  1. Mar 2022
    1. Author Response:

      Evaluation Summary

      Challa and Ryu et al. systematically evaluated various combinations of ADP-ribose-binding modules to make sensors detecting poly(ADP-ribose). They developed and tested two indicator designs optimized for analyses in cell culture (dimerization-dependent GFP-based) or intact tissues (split Nano luciferase-based). Overall, with further experimental controls and quantification, this timely set of cell biology probes will be useful to study the biological functions of ADP-ribosylation in cultured cells and whole organisms.

      We appreciate the positive and encouraging words from the reviewer. We also appreciate the helpful comments, criticisms, and suggestions, which we have endeavored to address fully.

      Reviewer 1 (Public Review):

      While these tools are more sensitive than existing tools, it is unclear whether a dynamic range of 6-fold (GFP) and 3-fold (luciferase) provides sufficient sensitivity for properly understanding PAR dynamics (PAR was thought to increase as much as 100-fold in DNA damage settings). In addition, it is unclear whether the fold increases in both fluorescence and luminescence correlate linearly with the traditional measures by western blot.

      We are pleased that the reviewer found our sensors to be potentially useful. The reviewer provided a number of excellent comments and suggestions that have served as a useful guide for improving our paper. We have carefully considered all of the comments, insights, and suggestions from the reviewer and revised the manuscript accordingly. We think this has strengthened our conclusions and improved the paper considerably. We thank the reviewer for the careful and thorough review of our paper.

      Figure 1F indicates on the western blot that there was a precipitous drop of PARylation after 5 min, but the GFP signal indicated a linear drop. It will be important to quantify the signals on western blots and test how they correlate with the GFP/luciferase data in scatter plots for the various sets of data. Would this system under-estimate the changes and not be sensitive enough to subtle changes of perhaps 1-2 fold as measured by traditional means?

      We agree with the reviewer that a comparison with existing PAR detection technologies will improve the manuscript. We have now performed a comparative analysis of ELISA, Western blot, and immunofluorescence assays with live cell imaging using PAR-T ddGFP (Figures 6A, 6C, 6D). The results indicate that the detection range of PAR-T ddGFP is comparable to the established PAR detection assays. In addition, we also compared the live cell luciferase assays using PAR-T NanoLuc to Western blotting (Figure 6B) and found that these two assays are able to detect PAR changes at comparable levels. We would also like to emphasize that these sensors were developed to improve our ability to detect PAR changes in living cells and animals, which the existing techniques are not capable of doing.
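
      The scatter-plot comparison the reviewer asks for amounts to correlating fold changes from two assays. The following is a minimal sketch of that calculation, not the authors' analysis: the function names are hypothetical and the numbers are placeholder values, not data from the paper.

```python
from math import sqrt


def fold_change(signals, baseline):
    """Express each signal relative to a baseline (e.g. the untreated sample)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return [s / baseline for s in signals]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Placeholder values for illustration only (NOT measurements from the paper):
# densitometry of Western blot bands and sensor readouts for the same samples.
western_blot = [1.0, 2.0, 4.0, 8.0]
sensor = [10.0, 14.0, 22.0, 38.0]

r = pearson_r(
    fold_change(western_blot, western_blot[0]),
    fold_change(sensor, sensor[0]),
)
print(f"Pearson r between assays: {r:.3f}")
```

      A value of r close to 1 would indicate that the two readouts change together linearly over the tested range; it says nothing about whether their absolute dynamic ranges match.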

      Similarly, how is their quantitation in Figure 2 compared with traditional immunofluorescence?

      We performed this comparison and observed that the changes in PAR levels as detected by live cell imaging using PAR-T ddGFP are comparable to the changes detected in immunofluorescence assays using the WWE-Fc reagent (Figure 6D and 6E).

      Lastly, for the luciferase signal in Figure 3B and C, the corresponding signals in western blots are missing. Therefore, it is difficult to estimate the background signal. If Niraparib, as in other figures, eliminates PAR signals on western blot, these data would indicate that half of the basal signal is background, which is rather high. Having said that, tool development is an evolutionary process. These tools will provide a good foundation for future development. Therefore, understanding these limitations (dynamic range, quantitative sensitivity correlation, and background) will provide a better assessment of the utility of these new tools for investigating PAR biology.

      We appreciate the reviewer’s concern about the high background signal in Niraparib-treated samples. To address this concern, we compared the dynamic range of PAR-T NanoLuc to Western blotting (Figure 6B) and found that the results from live cell luciferase assays using PAR-T NanoLuc are comparable to Western blotting using WWE-Fc. Of note, we were able to detect decreases in PAR levels with Niraparib using live cell luciferase assays with PAR-T NanoLuc, but not with Western blotting. Based on these analyses, we conclude that basal PAR levels are very low, so Niraparib treatment produces only a 50% decrease in PAR-T NanoLuc signal (Figure 6B, Figure 5A-5C). Note that the decrease in PAR-T NanoLuc signal is greater when UV-treated cells were pre-treated with Niraparib, which is consistent with the results from Western blot analysis (Figure 5A).

      Reviewer 2 (Public Review):

      In this study, the authors attempted to extend their own work and that of others in the field in developing probes to detect the signaling molecule poly-ADP-ribose (PAR) that can be used in the test tube, in cells, and in tumor models. Major strengths include the development of a set of probes with data demonstrating utility and efficacy. Further, the authors show the assay to be useful in cell models and tumor models. Some weaknesses include what appears to be a high level of background in the assay. Further, regarding methods, the exact probes (sequences) being evaluated are not defined. This is one of several new PAR probes being developed over the last few years but may have widespread utility due to the quantitative nature of the bioluminescent assay.

      We thank the reviewer for these thoughtful and encouraging comments, as well as the interesting, thought-provoking, and constructive criticisms that have prompted us to dig deeper and provide more evidence to support our claims.

      Reviewer 3 (Public review):

      The major drawback is that, while the authors demonstrated some applications of these PAR trackers (PAR-T) in both culture cells and in animals, the data of PAR-T ddGFP on cancer cells and the data of PAR-T Nano luciferase may not be sufficient to support the authors' claim that the new tool can detect spatial and temporal dynamics of PAR in cells and in animals. That said, the new tools can potentially expand the capability of cell biologists to visualize and study the PAR production process in both normal and disease states with improved sensitivity and tissue compatibility.

      We thank the reviewer for appreciating the potential utility of the PAR-T sensors, as well as the detailed and constructive criticisms that have prompted us to provide more evidence to support our claims. Addressing these comments has helped us to improve the paper.

      One of the major issues of this manuscript is the lack of time-course data for PAR-T luminescent sensors to demonstrate temporal monitoring of PAR levels in animals. If the binding of two split Nano Luciferase parts is irreversible, the application might be limited. However, according to the literature (Scientific Reports volume 11, Article number: 12535 (2021)), the split Nanoluc technology should be able to detect dynamic changes. Either way, a set of time-course data would be necessary. The authors need to provide evidence to support their statement "The high sensitivity and low signal to noise ratios of the PAR-Trackers described here enable spatial and temporal monitoring of PAR levels in cells and in animals."

      We agree with the reviewer’s comment that the original manuscript did not demonstrate that the PAR-T sensors can be used to detect spatio-temporal changes in PAR. To demonstrate that PAR-T NanoLuc can be used to detect time-dependent changes in PAR levels, we performed a time course of UV-mediated PARP-1 activation (Figure 5D). The results from this assay demonstrated that the dynamic changes in PAR in live cells, in response to DNA damage, can be recaptured using the PAR-T NanoLuc sensors. In addition, we also measured PARGi-mediated PAR accumulation in vivo in xenograft tumors (Figure 8 - figure supplement 1B-1D). We found that PAR can be detected readily in breast cancer cells when injected into mice. Upon treatment with PARGi, the luminescence from PAR-T NanoLuc increased significantly by 6 hours and then diminished by 24 hours. These data demonstrate that PAR-T NanoLuc can be used to track dynamic changes in PAR levels both in cells and in animals. While not in vivo, our work with spheroids also addresses this concern. See our response to the next comment below.

      Figure 2- figure supplement 2. For the detection of spatial dynamics of PAR signals in cancer spheroids, the authors did not provide sufficient evidence as only static images of different spheroids in different conditions were provided. And 2 out of 3 fields of view only include one spheroid. In addition, there is no time-course image data showing the spatial patterns of PAR in cancer cells are dynamic.

      We have now performed a quantitative analysis of multiple spheroids. As indicated in Figure 3B, we observed a significantly higher GFP fluorescence signal in spheroids derived from PAR-T ddGFP expressing cells compared to those expressing ddGFP or those treated with Niraparib. To address the reviewer’s concern about using PAR-T ddGFP for spatio-temporal changes in cells, we included a video for live cell imaging of H2O2-mediated increase in PAR-T ddGFP (Figure 2 - figure supplement 2, video). We also developed an analysis approach that allows us to quantify the signals from the core of the spheroids separately from the periphery of the spheroids. We also performed a time course in 3D cancer spheroids to visualize the spatio-temporal changes in PAR levels (Figure 3C and 3D). The results from this experiment demonstrate that the PAR levels in cells at the core of the spheroids are relatively resistant to Niraparib treatment, as the PAR levels in cells at the core of the spheroid decrease at a lower rate when compared to PAR in the cells at the outer layer of the spheroid.

      In the caption of Figure 2 -figure supplement 1 (B and C), it states "Immunofluorescence assay to track PAR formation in response to H2O2.", but there is no evidence showing any antibodies were used there.

      We thank the reviewer for pointing out this error. It should have been written as live cell imaging, not immunofluorescence assay. We made this correction.

      It seems that Figure 3B and C do not support the statement "we observed specific detection of firefly luciferase with D-Luciferin and NanoLuc with furimazine with no cross-reactivity", and it is unclear why the authors refer to Fig. 3B and C after that statement, as those data do not seem to support this claim. Similarly, the statement "Moreover, the luminescence of PAR-T Luc is only 30-fold lower than intact firefly luciferase." was not supported by Fig. 3B. In fact, the differences between PAR-T Luc and intact firefly luciferase were ~1000 fold in vivo, judging from Fig 5B. It is also unclear which data of the construct was used to plot Fig. 3C.

      We thank the reviewer for this comment. We changed the scale bar to represent the true scale for the luminescence from Nano luciferase and Firefly luciferase. This indicates that the brightness of PAR-T NanoLuc is 30-fold lower than intact firefly luciferase. In Figure 3C, we plotted the ratio of PAR-T NanoLuc to firefly luciferase.

      In Fig. 4C, it seems that Firefly luciferase was consistently brighter with PARGi, and I wonder if such a difference is statistically significant. The authors did not perform a two-way ANOVA test for the firefly luciferase dataset.

      We included the statistics to indicate that these changes were not significant.

      The statement "Moreover, none of these sensors can detect PAR accumulation in vivo." seems to lack support. Have the authors proved that with evidence? I would recommend using the following statement instead: "Moreover, none of these sensors has yet demonstrated detection of PAR accumulation in vivo."

      We made this change.

      For the in vivo experiment, it is unclear about the benefits of normalizing the PAR-T radiance to the Firefly luciferase since the signals from Firefly luciferase did not overlap well with that from the PAR-T nano luciferase, which may cause bigger variations.

      We thank the reviewer for raising this point. We normalize the luminescence from PAR-T NanoLuc to that from firefly luciferase to account for the variability in tumor size between the mice. We think this is an important control in the analysis. The luminescence from firefly luciferase represents the differences in tumor size between the mice. Hence, that signal is greater than the signal from PAR-T NanoLuc and is spread over a larger area.
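
      The normalization described above can be written as a simple per-mouse ratio. The sketch below is illustrative only: the function name is hypothetical and the radiance values are placeholders, not measurements from the study.

```python
import math


def normalized_radiance(nanoluc_radiance, firefly_radiance):
    """Ratio of sensor (PAR-T NanoLuc) signal to the tumor-size control (firefly)."""
    if firefly_radiance <= 0:
        raise ValueError("firefly radiance must be positive")
    return nanoluc_radiance / firefly_radiance


# Placeholder radiance values (NOT data from the paper). Mouse B carries a
# tumor twice the size of mouse A, so both raw signals are doubled.
mice = {
    "mouse_A": {"nanoluc": 2.0e5, "firefly": 2.0e8},
    "mouse_B": {"nanoluc": 4.0e5, "firefly": 4.0e8},
}

ratios = {
    name: normalized_radiance(m["nanoluc"], m["firefly"])
    for name, m in mice.items()
}
for name, ratio in ratios.items():
    print(name, ratio)
```

      In this toy case the raw NanoLuc signal differs two-fold between the mice, while the normalized ratio is the same, which is the behaviour the normalization is meant to provide when only tumor size differs.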

      Judging from the data of Fig 3 supplement 1E, the signal intensity from the split firefly luciferase-based PAR-T sensors was ~10000 fold less than intact firefly luciferase, not ~1000 fold. It makes more sense to give up the split firefly luciferase for ~10000 fold differences since the signal intensity from the split nano luciferase was ~1000 fold less than intact firefly luciferase (Fig 5B).

      We noted the reviewer's concern about the split firefly luciferase PAR-T. We agree with the reviewer that the split nano luciferase is brighter than the split firefly luciferase (Figure 4C and Figure 4 - figure supplement 1E). Although split nano luciferase is 1000-fold dimmer than the intact firefly luciferase in vivo (Figure 8B and Figure 8 - figure supplement 1A), this difference is only 30-fold in in vitro assays (Figure 4C). Hence, the comparison of sensors based on split firefly luciferase to split nano luciferase highlights our efforts to make a brighter sensor. Moreover, we included the split firefly luciferase data to compare the performance of WWE vs macrodomain in the development of the PAR-T NanoLuc sensor. Since firefly luciferase is frequently used for sensor development, we believe that it is important to include the results obtained from this sensor.

      "Therefore, developing tools to measure ADPR dynamics in cells and in vivo is critical for better understating the various biological processes mediated by ADPR". "understating" should be "understanding".

      We corrected this error.

    1. Author Response:

      Reviewer #1:

      A number of metabolic traits show differences between reciprocal crosses of inbred mouse strains. These can be conceptualized as parent-of-origin effects. Differential expression (DE) at an unimprinted locus can be a pleiotropic side-effect of allele-specific expression (ASE) at an imprinted locus. There is no parent-of-origin effect at the unimprinted locus, in the sense that there would be no change in expression at this locus if the parental origin of the two alleles were reversed while keeping parental origin of alleles at the imprinted locus unchanged. Sexual recombination will have the effect of randomizing alleles at the unimprinted locus relative to alleles at the imprinted locus.

      Expression of the imprinted gene Nnat and unimprinted gene F2r were correlated in the authors' analyses of reproductive fat pads from their mice and their interaction was predictive of basal glucose levels. In a single-cell analysis, using an existing database, expression of the paternally-expressed imprinted gene Nnat increases and expression of the unimprinted gene F2r decreases along an adipogenic trajectory in preadipocytes.

      One reason why small mammals, such as mice, socialize is to keep warm. Within a huddle or nest, heat generation consumes individual substrates for a communal benefit and this can create selection for imprinted expression when members of groups are asymmetric kin. This conflict has been discussed in the context of the effects of imprinted genes on brown adipocytes that use UCP1 to 'uncouple' oxidative phosphorylation in mitochondria (see Current Biology 18: R172). A UCP1-independent mechanism of non-shivering thermogenesis in skeletal muscle and beige adipocytes (see Frontiers in Endocrinology 11: 498) involves 'uncoupling' of the SERCA channel that is regulated by Nnat. The role of Nnat in SERCA-dependent thermogenesis is yet to be established.

      This is an interesting hypothesis, and one that extends our hypothesis by suggesting that Nnat alters beige thermogenesis, which would alter basal glucose levels. The primary effect would be adipogenic and the secondary effect could be thermogenic, both resulting in altered glucose levels. This would be very interesting to follow up on in subcutaneous adipose tissue, which has been shown to beige in mice. We have added text on this to the discussion.

      Reviewer #2:

      The authors conducted experiments to examine whether non-imprinted genes interacted with imprinted genes, explored gene pairs that may affect phenotype, and identified two genes, Nnat and Cdkn1c that may initiate parent-of-origin effects. A major strength is the testing of these predictions in a new cohort by manipulating phenotype by diet.

      I focus my review on the design. My major concern is the nature of the group assignment to diet, which would require different statistical analysis.

      "At three weeks of age, animals were weaned into same-sex cages and randomly placed on high fat (42% kcal from fat; Teklad TD88137) or low-fat (15% kcal from fat; Research Diets D12284) isocaloric diets" - To be sure I and readers can understand, were the animals first placed into same-sex cages, then the cage was randomly assigned to a diet? If so, the unit of analysis is the cage, and results need to be analyzed as though the animals are not independent within cages. This does not currently seem to be the case, and the analyses are not valid.

      We thank the reviewer for the comment and we have added detail to the methods for clarity. The animals in each cage for each generation represent animals from different litters. The experimental unit of analysis is the individual animal: for the RNAseq studies one animal from each cage was randomly selected for sequencing in the F1 population. For the F2 animals, animals that were randomly placed into a cage and diet were not genetically identical. Specific details are described in the methods.

      • Please describe how many animals were housed per cage and how many cages there were for each diet.

      These details have been added to the methods.

      • Please describe the method of randomization.

      We used a random number generator in R, and have included this detail in the text.
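
      For readers who want to see what such a randomization step can look like, the following is a minimal sketch in Python rather than R. It is an assumption-laden illustration, not the authors' script: the function name, the diet labels, the seed, and the animal identifiers are all hypothetical.

```python
import random


def assign_diets(animal_ids, diets=("high_fat", "low_fat"), seed=0):
    """Randomly assign animals to diets in (near-)equal group sizes.

    A fixed seed makes the allocation reproducible and auditable.
    """
    rng = random.Random(seed)
    shuffled = list(animal_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled animals out to the diets in turn.
    return {animal: diets[i % len(diets)] for i, animal in enumerate(shuffled)}


# Hypothetical animal identifiers, for illustration only.
animals = [f"animal_{i:02d}" for i in range(1, 11)]
allocation = assign_diets(animals, seed=2022)
for animal in animals:
    print(animal, allocation[animal])
```

      Recording the seed alongside the allocation lets a reader regenerate the same assignment, which is one way to satisfy the ARRIVE request to describe the randomization method.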

      To fully assess the methods and facilitate reproducibility, additional information is needed to describe items as recommended by the ARRIVE guidelines (https://arriveguidelines.org/sites/arrive/files/documents/ARRIVE%20guidelines%202.0%20-%20English.pdf). This includes: the number of animals represented in each experiment/figure (this is not clear throughout the text); the sex of animals used in each experiment; whether order of measurements or animal/cage positioning were randomized (or if not); whether any blinding was performed; housing conditions if available (e.g., room temperature and humidity, light/dark cycle), additional details about diet (whether ad lib, water quality), whether any animals were excluded from analysis after being assigned to a diet; cage type and bedding.

      We have added these details to conform to the ARRIVE guidelines as suggested.

      Reviewer #3:

      Macias-Velasco et al. aimed to demonstrate that non-imprinted genes could generate parent-of-origin effects on metabolic traits, such as glucose concentrations, through interactions with imprinted genes. They used four populations at different levels of intercrossing of inbred mice, and by doing so were able to demonstrate that non-imprinted genes interact with imprinted genes and thereby impact the animal's phenotype. They focused on two genes in particular, Nnat and F2r, in high fat-fed female mice as the covariation of these two genes associated with basal glucose concentrations and the relationship. They provided a biological validation of this relationship by demonstrating it consistently across multiple generations. Interactions between imprinted and non-imprinted genes have been previously demonstrated, but the present study took it one step further to identify one possible reason why parent-of-origin effects are so prevalent, by detailing the interactions of ASE and DE genes and how that interaction leads to a specific phenotype. The genes the authors identified appear to play a role in adipogenesis. The results may allow for better prediction of an individual's phenotype based on their genome.

      The authors conducted a lot of work, and provided a lot of data for the reader, but the paper can be strengthened by the following:

      In the results the authors refer to females on a high-fat diet. It would be useful for the reader to put this in a bigger context and explain the implication of diet. It was unclear whether the figures represented sexes combined, and if so, it would have been useful to show the results in males, even if they were null. The authors demonstrated that Nnat expression covaries with F2r in high fat-fed females, yet used available scRNAseq data collected from C57BL/6J epididymal adipose tissue to examine which cell types express these two genes and whether the negative correlation persisted across an adipogenic trajectory (which it did). In mice, the visceral fat pads in the perigonadal region are known as epididymal in males and periovarian in females. Although it was not explicitly stated that these cells were from males, it is presumed by the name of the fat pad. The authors should address possible limitations and considerations related to sex differences, and possibly even strain differences in these comparisons.

      The clarity of the methods and materials needs to be improved. Specifically, to allow others to reproduce the data and to provide greater transparency, in addition to following the eLife guidelines, the authors should follow the ARRIVE guidelines and cite them in the manuscript. With the multiple populations of mice used, it would make it easier on the reader if the sample size and sex were more clearly outlined. Although the authors state that the mice were randomly placed on high-fat or low-fat diet, how the animals were randomized was not outlined. Another possible consideration would be to include a figure outlining the study design, including sample size/sex for the various populations and components of the study. Also, please identify how the sample size was decided, and whether there were any inclusion/exclusion criteria.

      We thank the reviewer for the comments and have modified the text for clarity.

    1. Author Response:

      Reviewer #1:

      While the implications are compelling, a few other controls and analyses would better establish the link between arousal level and the TRR.

      First, it is difficult to link changes in task difficulty to arousal level without demonstrating that the subjects did not change their strategy between easy and difficult task conditions by, for example, looking directly at the more difficult targets instead of maintaining central fixation as the task required. Without this control, the changes reported in TRRs could be attributed to changes in eye movements and the concomitant changes in the visual field, especially given measurements were made in visual cortex.

      We have analyzed observers’ eye movements to rule out this possibility. Please see the new section of the results: “An eye-movement-evoked artifact could not account for task-related fMRI responses.”

      In the same vein, a more detailed or explicit differentiation between the stimulus or attention-evoked hemodynamic response and the TRR is necessary to help the reader evaluate the TRR without simultaneous eye tracking to remove trials where the visual field may have changed.

      Please see the new section of the supplement titled “Stimulus-evoked activity during the localizer” and the new supplementary figures S4 and S5.

      Given arousal is a loosely defined cognitive phenomenon, physiologic arousal markers (ex. pupil, heart rate, respiration) are commonly used to track changes in arousal level, as is the case in this work. The evidence in this work that arousal level changed between task conditions (ex. difficult and easy trials) requires a more detailed analysis to control for the large number of variables and determine the effects that survive. While an accompanying data set showed changes in pupil diameter in a manner consistent with arousal changes during the task, this data was recorded in a separate experiment. This does provide a source of eye movement data for potential control analysis.

      We have eye data from the pupillometry recordings outside the scanner. We have reanalyzed these data, comparing the variance in eye position between conditions. We did not find any statistically significant differences in this measure of fixation stability between conditions.
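
      One simple way to express fixation stability as a single number per condition is the summed variance of horizontal and vertical gaze position. The sketch below assumes that definition; the authors do not specify their exact measure or statistical test, and the gaze samples shown are placeholders, not recorded data.

```python
from statistics import variance


def fixation_dispersion(x_positions, y_positions):
    """Summed sample variance of gaze position (x and y), e.g. in deg^2.

    Larger values mean gaze wandered further from its mean position.
    """
    if len(x_positions) != len(y_positions):
        raise ValueError("x and y must have the same number of samples")
    return variance(x_positions) + variance(y_positions)


# Placeholder gaze samples in degrees of visual angle (NOT recorded data).
easy_x, easy_y = [-0.1, 0.0, 0.1], [0.0, 0.1, -0.1]
hard_x, hard_y = [-0.1, 0.1, 0.0], [0.1, -0.1, 0.0]

easy = fixation_dispersion(easy_x, easy_y)
hard = fixation_dispersion(hard_x, hard_y)
print(f"easy: {easy:.4f} deg^2, hard: {hard:.4f} deg^2")
```

      Per-observer values of this kind could then be compared between conditions with a paired test; which test is appropriate depends on the distribution of the measure and is left open here.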

      If the widespread BOLD responses were luminance-evoked, i.e., caused by eye movements changing the pattern of retinal stimulation, we would have expected to observe the strongest responses in V1 at eccentricities close to the representation of the screen edge. To test this hypothesis, we analyzed the BOLD signal amplitude across the representation of the visual field, but did not observe larger response amplitudes in the cortical locations corresponding to the edge of the screen.

      Lastly, the authors speculate about the origin of the TRR by comparing its magnitude and modulation in different task conditions across different levels of the visual cortical hierarchy (V1 vs. V2 vs V3). A direct statistical comparison of these effects would be necessary to convincingly demonstrate differences in the TRR across visual regions.

      We have performed this comparison and reported the results.

      Reviewer #2:

      The discussion suggests that this relates to a LC-NE arousal process. The connection is suggested by the data, but further work would be needed to cement this idea. The data are interesting, and a good window into further understanding of this effect.

      In the discussion line 330, they suggest that the TRR should be separately modeled and removed from fMRI data in preprocessing. While the authors have convinced this reader that the TRR is likely related to arousal, it is far from clear that this means that this effect should be removed from fMRI data in preprocessing. Many arousal effects exist naturally in fMRI data, and in brain activity in general. Many arousal effects are observable in spiking and LFPs. Since no spiking or LFPs were measured here, we don't know whether this signal is or is not related to spiking or LFPs (though some data from monkeys suggests a similar signal is hemodynamic only, it would take more convincing that the current TRRs arise from the same process as the previously reported primate literature).

      Agreed. We do not suggest that the TRR should be removed from fMRI data in all or even most scenarios. It can be removed if one has a specific reason to do so, e.g. separating a mixture of stimulus-evoked and task-related responses. We have revised our statement on this to clarify.

      Reviewer #3:

      However, a weakness of the paper is that the authors do not pursue the computational/functional significance, nor the biological drivers, of TRRs. For instance, linking TRRs to an explicit model of decision-making (beyond showing they covary with RTs and lapses), or further discussing their potential link to widespread arousal and movement variables in rodent calcium imaging and ephys data, would strongly increase the interest from those beyond the visual fMRI community.

      We have now discussed how our findings relate to a relevant computational model that links arousal and decision-making. And we have discussed how merging linear modeling of the TRR with computational models of decision-making may be a fruitful future direction for research. We have also discussed the possible links to widespread arousal-related cortical responses observed in rodent calcium imaging and ephys data.

    1. Author Response:

      JMML is a rare pediatric leukemia, emanating from mutations in the RAS pathway. One of the most frequent genetic causes is loss of function mutations in PTPN11. Using genetically-engineered zebrafish, the investigators show that a chronic inflammatory state is present.

      Comments:

      1. A mouse model for Noonan Syndrome with overlap with JMML, PTPN11 D61G , displays myeloproliferative, cardiac, and craniofacial disease. Here, the zebrafish ptpn11 D61G displayed a wide penetrance depending on allelic burden.

      2. The hematopoietic effects in the affected fish involve the myeloid compartment. There appears to be no zebrafish ortholog for GM-CSF, and instead the authors look at effects of Gcsf. They note enhanced GM colony formation -- but Gcsf should not promote macrophage development. In addition, the effect in human is that of spontaneous growth of CFU-GM. The authors would need to address the differences with human JMML malignant hematopoiesis.

      3. The use of MEK or PI3K inhibitors does not itself implicate a proinflammatory response (line 273). These agents are not anti-inflammatory but have a wide range of effects. They are as much anti-proliferative. To demonstrate inhibition of the proinflammatory response would require true anti-inflammatory agents. Dexamethasone targets lymphocytes, and may also reduce cytokine release by macrophages in fish. Controls should include genes that are biomarkers for inflammation (i.e., not l-plastin or c-myb).

      We have rephrased the text to make sure we did not unintentionally suggest that MEK or PI3K inhibitors are anti-inflammatory agents. We used MEK and PI3K inhibitors because these are known to inhibit signaling downstream of SHP2. As an anti-inflammatory agent, we used dexamethasone, which rescued many of the observed blood defects in Shp2-D61G mutant zebrafish embryos. We observed a rescue of the number of c-myb and l-plastin expressing cells (read-out for HSPCs). We have done additional experiments and found that the increase in the number of neutrophils and macrophages was largely rescued by dexamethasone, showing that dexamethasone targets the myeloid lineage. Moreover, dexamethasone rescued the inflammatory response in Shp2-D61G expressing embryos as assessed by expression of the inflammatory response genes tnfa, gcsfb and il1b (new Fig. 6D-G). Based on these data, we conclude that the inflammatory response in Shp2-D61G mutant zebrafish embryos may have a causal role in the pathogenesis of the NS/JMML-like MPN blood phenotype.

      Reviewer #3:

      The authors of this paper model the D61G mutation in zebrafish, creating a model consistent with the human Noonan syndrome, which is predisposed to a JMML and MPN like syndrome. They use RNA-seq to identify the potential cellular abnormalities in either the HSPC or monocyte/macrophage clusters, which nominates an inflammatory signature as being pathogenic. They complement this with analysis of human JMML patients, showing a similar inflammatory signature.

      The study nicely provides a new model that can be used as the basis of future studies in the field. Because the mutant variably displays phenotypes along a spectrum from NS to MPN, different researchers can choose to focus on this as they see fit. Where the manuscript falls short is in more clearly delineating the defect in HSPC vs. monocyte/macrophages (especially in comparing fish to human) and at least a hint of the involved mechanisms.

      In fish and human, we are analyzing RNA expression in HSPCs, consisting of HSC-like cells and early progenitors, but not fully differentiated monocytes and/or macrophages. We have used the single-cell sequencing dataset of zebrafish HSPCs to determine when pro-inflammatory gene expression was evoked. We used Monocle trajectory inference analysis and found that differentiation started in HSC-like cells and progressed in two directions in pseudotime, towards monocyte/macrophage progenitors in one direction and towards thrombocyte and erythrocyte progenitors in the other direction. The analysis showed that proinflammatory genes are evoked during early differentiation of monocyte/macrophage progenitor cells specifically in the Shp2D61G mutants. We have included this analysis in the new panels C-E in Fig. 4 and Fig. 4 – figure supplement 1 of the revised version of the manuscript, with accompanying textual changes in the Results section.

    1. Author Response:

      Reviewer #1:

      In this study, Barrasso et al. investigate the previously reported positive association between the commensal P. aminovorans and V. cholerae in the human gut during infection. The authors find that P. aminovorans and V. cholerae interact in vitro to form a dual-species biofilm. Using a suckling mouse model of infection, the authors also demonstrate that V. cholerae gut colonization is enhanced in the presence of P. aminovorans, and that this colonization enhancement depends on the ability of V. cholerae to produce biofilm exopolysaccharide and biofilm proteins. Overall, the experiments are well-performed and the findings are interesting. My major comment is that the authors should perform some further analysis to demonstrate a definitive causal relationship between the abundance of P. aminovorans and V. cholerae colonization, which would help to strengthen their conclusions.

      1) The authors find a positive correlation between the abundance of P. aminovorans and V. cholerae infection by the observation that 6 of 22 infected individuals harbored detectable levels of P. aminovorans in rectal swabs, compared to only 2 of the 36 uninfected individuals (Figure 1). The authors then go on to demonstrate that there is an increase in the colonization of V. cholerae (CFU) in the presence of P. aminovorans in the suckling mouse model of infection (Figure 2B and Figure 2C). The authors' conclusions would be strengthened by demonstrating a positive association between the presence of P. aminovorans and the abundance of V. cholerae in the human samples (from Figure 1). The authors demonstrate that the presence of P. aminovorans is associated with the number of people infected with V. cholerae, but not that the presence of P. aminovorans also leads to higher levels of V. cholerae in the 6 infected individuals harboring detectable levels of P. aminovorans, compared to the infected individuals that did not. This additional analysis would help to solidify the authors' conclusion that the presence of P. aminovorans enhances the colonization of V. cholerae during infection.

      In the prior study in which the association between P. aminovorans and V. cholerae infection was first noted (Midani F et al, JID, July13;218[4]:645-653), among the 4,181 different operational taxonomic units ([OTU], a taxonomic grouping based roughly on species-level identification) identified in the stool of persons infected with V. cholerae, Pa was among the top 5 OTUs found to be most abundant during infection in a machine learning analysis. This analytic method provided additional resolution about the correlative relationships between gut microbes and clinical outcomes beyond that of community structure-based analyses (Midani 2018). We have added new Supplementary File 1, which contains the full data (raw and normalized abundance) with counts in persons with and without Pa found in the stool from the Midani 2018 study. Total bacterial counts in the stool of each group did not differ, and the average Vc OTUs were higher in the group with Pa in the stool; however, this difference was not statistically significant (Supplementary File 1), which may have been due to an underpowered comparison. The wide range of Vc abundance in the human samples reflects the cross-sectional nature of these human data, in that study participants are likely to be at different stages of colonization or infection at the time of sampling (compared to mice, which are sampled at the same time since infection). The limitations of V. cholerae stool culture for diagnosis of infection are evident in Supplementary File 1, in which 16S detection of Vc and culture positivity were discordant in some samples (see column 3, Vc infection determination, in Supplementary File 1, all previously published data). Notably, stool culture is the gold standard for diagnosing V. cholerae infection. Rationale and additional information about these classifications and the machine learning analysis methods are included in the prior manuscript (Midani 2018).
      In summary, we feel that our human data from the prior study are not of sufficient sample size to fully resolve the question asked by the reviewer. We have added the statistical testing of Vc counts in Pa-infected and Pa-uninfected humans to the Results section of the revised manuscript, and we have included the raw and normalized abundance data in Supplementary File 1.

      2) It's surprising that the relative abundance of Proteobacteria does not change much after the introduction of >10^6 CFU of P. aminovorans, which would be expected to represent a significant proportion of the abundance of the total bacteria in the small intestine (Supplemental Figure 1). It would be important for the authors to determine whether the addition of P. aminovorans and not the displacement of other members of the Proteobacteria phylum leads to the increased V. cholerae CFU in Figure 2B and Figure 2C.

      To show with more resolution the impact of Pa on the gut microbiota of the mouse, we have added Panel C to Figure 2-figure supplement 1 (original Supplemental Figure 1), showing order-level abundance data within the Proteobacteria phylum (to which Pa belongs). Differences between the two groups were not found to be significant with FDR-adjusted significance testing at multiple taxonomic levels. As in humans, it may be the case that a small amount of Pa is sufficient to impact Vc colonization, and it could be that only small amounts of Pa remain after inoculation.

      3) Expression analysis should be performed to determine whether there is an increase in virulence factor expression in V. cholerae in the presence of P. aminovorans. This is important given that previous studies have demonstrated that biofilm growth leads to the upregulation of V. cholerae virulence factors, which may enhance colonization.

      We have added new Figure 7-figure supplement 1 showing the qRT-PCR results for the relative transcript levels of ctxA and tcpA in Vc monoculture and Vc-Pa co-culture under two different culture conditions. First, we compared virulence gene expression under the AKI growth conditions (Iwanaga et al, Microbiology and immunology, 30(11), 1075–1083) because El Tor biotype Vc does not express virulence genes optimally under standard lab growth conditions (DiRita et al, PNAS, 93(15):7991-5). Under the AKI conditions, we observed a slight increase in the transcript levels of tcpA, but not ctxA, in Vc co-cultured with Pa. It should be noted that, in addition to the use of a special medium, biphasic growth (static growth followed by shaking for an extended period) is needed to induce virulence gene expression in AKI. We reasoned that such growth conditions did not allow Vc to interact sufficiently with Pa to form a robust biofilm, likely preventing a strong induction of virulence gene expression in the AKI co-culture.

      We also measured virulence gene expression in regular LB medium under static growth conditions (similar to the conditions used to grow the co-culture biofilms). As discussed above, we noted that under these growth conditions the relative transcript levels of these virulence genes are low (e.g., compared to the relative transcript levels of vpsL), suggesting this is not an ideal condition for comparing virulence gene expression in mono- and co-culture. Nonetheless, we found that ctxA and tcpA transcript levels are slightly higher in the static LB co-culture (although only the ctxA comparison is statistically significant).

      Reviewer #3:

      The presence of the organism Paracoccus aminovorans in stool was previously shown to correlate with susceptibility of humans to infection with V. cholerae and to enhance agglutination and growth of V. cholerae. In this manuscript, the authors use a neonatal mouse model as well as in vitro models to demonstrate that the association between Paracoccus aminovorans and V. cholerae occurs in a VPS-dependent biofilm and enhances colonization of the neonatal mouse intestine.

      The strengths of this manuscript:

      1) Examination of P. aminovorans-V. cholerae interaction in the small intestine of the neonatal mouse model. V. cholerae colonizes the terminal ileum, yet most of the human microbiota studies examine stool, which is unlikely to be representative of the terminal ileum. In addition, adult models of infection such as the gnotobiotic or antibiotic-treated mouse display colonic but not true ileal colonization. Furthermore, this colonization is not dependent on the V. cholerae toxin co-regulated pilus, which is necessary for human infection. In fact, flow in the colon is slow enough to allow growth without true attachment to the surface. This may explain why, as the authors note, the diarrhea of cholera clears most of the microbiota from stool samples. Therefore, stool exiting the colon and the adult mouse are not ideal for studying the interaction between the microbiota and V. cholerae during infection. By using the neonatal mouse, the authors choose a host compartment that is relevant to human disease. The findings of the authors that P. aminovorans improves V. cholerae colonization of the small intestine are very convincing.

      2) Use of microscopy to detail the distribution of the two organisms in culture. Imaging clearly demonstrates subdomains of the biofilm that contain mixtures of P. aminovorans and V. cholerae.

      The weaknesses of this manuscript:

      1) Specific markers for VPS exopolysaccharide and P. aminovorans are not used: The authors conclude that V. cholerae increases VPS synthesis in response to P. aminovorans based on increased WGA staining in regions of biofilms where P. aminovorans is concentrated. One concern is that WGA is not a specific marker for VPS. It adheres to GlcNAC residues, which could also be present in an extracellular polysaccharide synthesized by P. aminovorans.

      In response to the reviewer’s question, we added negative controls (new Figure 4-figure supplement 1) showing that neither Pa cells alone nor the Vc ΔvpsL mutant show WGA staining. While it is true that WGA stains GlcNAc residues, for Gram-negative bacterial cells including Vc and Pa, the WGA lectin molecules do not pass through the outer membrane. Only when the outer membrane is significantly impaired will one see a strong WGA signal. Indeed, we always observe dead cells with a strong WGA signal, but those are easily distinguished from the VPS signal. We have updated Figure 6A-B accordingly to better show the local distribution of VPS.

      Furthermore, the authors image biofilms that include neon-green-expressing V. cholerae and unlabeled P. aminovorans by staining with FM4-64. Regions of the biofilm with predominantly FM4-64 staining are presumed to have greater numbers of P. aminovorans. This is more convincing because the shape of these cells is coccoid as might be expected for P. aminovorans. However, this is not a feature that provides specificity. Because the authors also indicate that these clusters of Pa are surrounded by V. cholerae-synthesized VPS, it is important to definitively identify these cells as Pa.

      We thank the reviewer for raising this point. In the new Figure 6A-B, we have provided clearer zoomed-in images in which Pa cells can be clearly distinguished from Vc cells by BOTH the absence of SCFP3A signal AND the characteristic cocci shape.

      2) There is no examination of activation of VPS synthesis or other virulence factors at the transcriptional level. The authors conclude that VPS synthesis is activated by WGA staining but do not provide additional data to show whether this activation occurs at the transcriptional or post-transcriptional level. If transcription activation of vps genes were observed, this would bolster the WGA staining result.

      In response to the reviewer’s request, we have obtained a strain harboring Pvpsl-mNeonGreen and we have repeated the pellicle imaging in the updated Figure 6C-D. Interestingly, subpopulations of Vc cells have elevated vpsL expression when co-cultured with Pa. In addition, we have performed qRT-PCR experiments to show that the relative transcript levels of vpsL are indeed higher in the co-culture (Figure 5).

    1. Author Response:

      Reviewer #1:

      The paper uses a microfluidic-based method of cell volume measurement to examine single cell volume dynamics during cell spreading and osmotic shocks. The paper successfully shows that the cell volume is largely maintained during cell spreading, but small volume changes depend on the rate of cell deformation during spreading, and cell ionic homeostasis. Specifically, the major conclusion that there is a mechano-osmotic coupling between cell shape and cell osmotic regulation, I think, is correct. Moreover, the observation that fast-deforming cells have a larger volume change is informative.

      The authors examined a large number of conditions and variables. It's a paper rich in data and general insights. The detailed mathematical model, and specific conclusions regarding the roles of ion channels and cytoskeleton, I believe, could be improved with further considerations.

      We thank the referee for the nice comment on our work and for the detailed suggestions for improving it.

      Major points of consideration are below.

      1) It would be very helpful if there is a discussion or validation of the FXm method accuracy. During spreading, the cell volume change is at most 10%. Is the method sufficiently accurate to consider 5-10% change? Some discussion about this would be useful for the reader.

      This is an important point and we are sorry if it was not made clear in our initial manuscript. We have now made it more clear in the text (p. 4 and Figure S1E and S1F).

      The important point is that the absolute accuracy of the volume measure is indeed in the 5 to 10% range, but the relative precision (repeated measures on the same cell) is much higher, rather in the 1% range, as detailed below based on experimental measures.

      1) Accuracy of absolute volume measurements. The accuracy of the absolute measure of the volume depends on several parameters which can vary from one experiment to another: the exact height of the chamber, and the biological variability from one batch of cells to another (we found that the distribution of volumes in a population of cultured cells depends strongly on the details of the culture, such as seeding density and substrate, which we normalized as much as possible to reduce this variability, as described in previous articles, e.g. see ref. 2). To estimate this variability overall, the simplest approach is to compare the average volume of the cell population in different experiments, carried out in different chambers and on different days.

      Graph showing the initial average volume of cells +/- STD for 7 spreading experiments and 27 osmotic shock experiments, expressed as a % deviation from the average volume over all the experiments.

      The average deviation is 10.9 ± 8%.

      2) Precision of relative volume measurements. When the same cell is imaged several times in a time-lapse experiment, as it is spreading on a substrate, or as it is swelling or shrinking during an osmotic shock, most of the variability occurring from one experiment to another does not apply. To experimentally assess the precision of the measure, we performed high time resolution (one image every 30 ms) volume measurements of 44 spread cells during 9 s. During this period of time, the volume of the cell should not change significantly, thus giving the precision of the measure.

      Graph showing the coefficient of variation of the volume (STD/mean) for each individual cell (n=44) across the almost 300 frames of the movie. This shows that on average the precision of volume measurements for the same cell is 0.97±0.21%. In addition, if more precision was needed, averaging several consecutive measures can further reduce the noise, a method which is very commonly used but that we did not have to apply to our dataset.
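
      The per-cell precision estimate described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the manuscript: the traces are synthetic (44 hypothetical cells, 300 frames each, with an assumed 1% Gaussian measurement noise), the function names are ours, and the choice of the sample standard deviation is an assumption. The `block_average` helper illustrates the remark that averaging consecutive measures reduces the noise.

```python
import random
import statistics


def coefficient_of_variation(volumes):
    """Return STD/mean (as a fraction) for repeated volume measures of one cell."""
    return statistics.stdev(volumes) / statistics.mean(volumes)


def precision_summary(traces):
    """Return (mean, STD) of the per-cell coefficients of variation, in percent."""
    cvs = [100.0 * coefficient_of_variation(trace) for trace in traces]
    return statistics.mean(cvs), statistics.stdev(cvs)


def block_average(volumes, k):
    """Average non-overlapping blocks of k consecutive measures."""
    n_blocks = len(volumes) // k
    return [sum(volumes[i * k:(i + 1) * k]) / k for i in range(n_blocks)]


# Synthetic data only (assumption): constant true volume per cell, 1% noise.
rng = random.Random(0)
true_volumes = [rng.uniform(1500.0, 3000.0) for _ in range(44)]
traces = [[rng.gauss(v0, 0.01 * v0) for _ in range(300)] for v0 in true_volumes]

mean_cv, std_cv = precision_summary(traces)
print(f"precision over {len(traces)} cells: {mean_cv:.2f} +/- {std_cv:.2f} %")
```

      On real data, `traces` would hold one list of FXm volume measurements per cell; the summary then plays the role of the 0.97±0.21% figure quoted above.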

      We have included these results in the revised manuscript, since they might help the reader to estimate what can be obtained from this method of volume measurement. We also point the reviewer to previous research articles using this method and showing both population averages and time-lapse data (refs. 2–8). Another validation of our volume measurement method comes from the relative volume changes in response to osmotic shock (Ponder’s relation) measured with FXm, which gave results very similar to the numbers of previously published studies. We actually performed these experiments to validate our method, since the results are not novel.

      2) The role of cell active contraction (myosin dynamics) is completely neglected. The membrane tether tension results, LatA and Y-compound results all indicate that there is a large influence of myosin contraction during cell spreading. I think most would not be surprised by this. But the model has no contribution from cortical/cytoskeletal active stress. The authors are correct that the osmotic pressure is much larger than hydraulic pressure, which is related to active contraction. But near steady state volume, the osmotic pressure difference must be equal to hydraulic pressure difference, as demanded by thermodynamics. Therefore, near equilibrium they must be close to each other in magnitude. During cell spreading, water dynamics is near equilibrium (given the magnitude of volume change), and therefore is it conceptually correct to neglect myosin active contraction? BTW, 1 solute model does not imply equal osmolarity between cytoplasm and external media. 1 solute model with active contraction was considered before, e.g., ref. 17 and Tao, et al, Biophys. J. 2015, and the steady state solution gives hydraulic pressure difference equal to osmotic pressure difference.

      This is an excellent point raised by the referee. We have two types of answers for this. First an answer from an experimental point of view, which shows that acto-myosin contractility does not seem to play a direct role in the control of the cell volume, at least in the cells we used here. Based on these results we then propose a theoretical reason why this is the case. It contrasts with the view proposed in the articles mentioned by the referee for a reason which is not coming from the physical principles, with which we fully agree, but from the actual numbers, available in the literature, of the amount of the various types of osmolytes inside the cell. We give these points in more details below and we hope they will convince the referee. We also now mention them explicitly in the main text of the article (p. 6-7, Figure S3F) and in the Supplementary file with the model.

      A. Experimental results

      To test the effect of acto-myosin contraction on cell volume, we performed two experiments:

      1) We measured the volume of the same cell before and after treatment with the Rho kinase (ROCK) inhibitor Y-27632, which decreases cortical contractility. The experiment was performed on cells plated on poly-L-lysine (PLL), as in the osmotic shock experiments, a substrate on which cells adhere, allowing the change of solution, but do not spread and remain rounded. This allowed us to evaluate the effect of the drug. Cells were plated on PLL-coated glass. The change of medium itself (with control medium) induced a change of volume of less than 2%, similar to control osmotic shock experiments (maybe due to shear stress). When the cells were treated with Y-27, the change of volume was similar to the change with the control medium (now commented in the text p. 6-7, Figure S3F). To make the analysis more complete, we distinguished the cells that remained round throughout the experiment from the cells which slightly spread, since spreading could have an effect on volume. Indeed, we observed that treatment with Y-27 induced more cells to spread (Figure S3F), probably because the cortex was less tense, allowing the adhesive forces on PLL to induce more spreading (ref. 9). Nevertheless, the spreading remained rather slow and the volume change of cells treated or not with Y-27 was not significantly different. This shows that, in the absence of fast spreading induced by Y-27, the reduction of contractility per se does not have any effect on the cell volume.

      Graphs showing proportion of cells that spread during the experiments (left); average relative volume of round (middle) and spread (right) control (N=3, n=77) and Y-27 treated cells (N=4, n=297).

      2) To evaluate the impact of a reduction of contractility in the total absence of adhesion, we measured the average volume of control cells versus cells which have been pretreated with Y-27, plated on a non-adhesive substrate (PLL-PEG treatment). This experiment showed that the volume of the cells evolved similarly in time for both conditions, proving that contractility per se has no effect on the cell volume or cell growth, in the absence of spreading.

      Graphs showing average relative volume of control (N=5, n=354) and Y-27 (N=3, n=292) treated cells plated on PLL-PEG (left); distributions of initial volume for control (middle) and Y-27 treated cells (right) represented on the left graph.

      Taken together these results show that inhibition of contractility per se does not significantly affect cell volume. It thus confirms our interpretation of our results on cell spreading that reduction of contractility has an effect on cell volume, specifically in the context of cell spreading, primarily because it affects the spreading speed.

      B. Theoretical interpretation

      In accordance with our experiments, the effect of contractility is implicitly included in our model because it modulates the spreading dynamics, which is an input to the model, i.e. through the parameters tau_a and A_0.

      We do not include the effect of contractility directly in the water transport equation because our quantitative estimates support that the contribution of the hydrostatic pressure to the volume (or the volume change) is negligible in comparison to the osmotic pressure, and this even for small variations near the steady-state volume. The main point is that the concentration of ions inside the cell is actually much lower than outside of the cell (refs. 10, 11). The difference is about 100 mM and corresponds mostly to nonionic small trapped osmolytes, such as metabolites (ref. 12). The osmotic pressure corresponding to this is about 10^5 Pa. Taking the cortical tension to be of the order of 1 mN/m and the cell size to be about ten microns, we get a hydrostatic pressure difference of about 100 Pa due to cortical tension. A significant change in cell volume, of the order observed during cell spreading (let’s consider a ten percent decrease), will increase the osmotic pressure of the trapped nonionic osmolytes by 10^4 Pa (their number in the cell remaining identical). For this osmotic pressure to be balanced by an increase in the hydrostatic pressure, the cortical tension would need to increase by a factor of 100, which we consider to be unrealistic. Therefore, we find it reasonable to ignore the contribution of the hydrostatic pressure difference in the water flux equation. It is also consistent with the new experiments presented above, which show that inhibition of cortical contractility changes the cell volume by less than what can be detected by our measures (thus likely at most in the 1% range). This is now explained in the main text and Supplementary file.
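
      The order-of-magnitude argument above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only and rests on assumptions not stated in the text: the van 't Hoff relation for a dilute solution, a temperature of 310 K, a spherical cell, and reading "cell size about ten microns" as a radius of 10 µm. All function and variable names are ours.

```python
R_GAS = 8.314        # gas constant, J/(mol K)
T_KELVIN = 310.0     # assumed temperature (37 degrees C)


def osmotic_pressure_Pa(conc_mM):
    """van 't Hoff estimate Pi = c R T; note 1 mM equals 1 mol/m^3."""
    return conc_mM * R_GAS * T_KELVIN


def laplace_pressure_Pa(tension_N_per_m, radius_m):
    """Hydrostatic pressure difference 2*gamma/R for a spherical cell."""
    return 2.0 * tension_N_per_m / radius_m


trapped_mM = 100.0        # trapped nonionic osmolytes (value from the text)
tension = 1e-3            # cortical tension, 1 mN/m (value from the text)
radius = 10e-6            # assumed cell radius, 10 microns

pi_trapped = osmotic_pressure_Pa(trapped_mM)        # order 1e5 Pa
dp_cortex = laplace_pressure_Pa(tension, radius)    # order 1e2 Pa

# A 10% volume decrease at fixed osmolyte number raises c by a factor 1/0.9.
delta_pi = osmotic_pressure_Pa(trapped_mM / 0.9) - pi_trapped   # order 1e4 Pa

# Fold increase in cortical tension needed to balance delta_pi hydrostatically.
required_fold = delta_pi / dp_cortex

print(f"osmotic pressure of trapped osmolytes: {pi_trapped:.2e} Pa")
print(f"Laplace pressure from cortex:          {dp_cortex:.2e} Pa")
print(f"osmotic change for -10% volume:        {delta_pi:.2e} Pa")
print(f"required tension fold-increase:        {required_fold:.0f}")
```

      With these assumed values, the required fold-increase comes out at roughly one hundred, in line with the estimate given in the response; taking ten microns as a diameter instead changes the numbers by a factor of two but not the conclusion.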

      Regarding the minimal model required to define cell volume, the reason why we believe a one-solute model is not sufficient is fundamentally the same as above: the concentration of trapped osmolytes is comparable to the total osmolarity, which means that their contribution to the total osmotic pressure cannot be discarded. Secondly, within the simplest one-solute model, the pump and leak dynamics fixes the inner osmolyte concentration but does not involve the actual cell size. The most natural term that depends on the size is the Laplace pressure (inversely proportional to the cell size in a spherical cell model). But as discussed above, this term may only permit osmotic pressure differences of the order of 100 Pa, corresponding to an osmolyte concentration difference of the order of 0.1 mM. That is only a tiny fraction of the external medium osmolarity, which is about 300 mM. Such a model could thus only work for extremely fine tuning of the pump and leak rates to values with less than about 1% variation. Furthermore, such a model could not explain finite volume changes upon osmotic shocks without involving huge (100-fold) cell surface tension variations, as discussed above. For these reasons, we believe that the one-solute model is not appropriate to describe our experiments, and we feel that a trapped population of nonionic osmolytes is needed to balance the osmolarity difference created by the solute pump and leak.
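
      The conversion from a Laplace pressure to an equivalent osmolyte concentration difference can be checked in the same way. This is a rough sketch assuming the van 't Hoff relation and a temperature of 310 K (both our assumptions); the names are hypothetical and the inputs are the round numbers quoted above.

```python
R_GAS = 8.314        # gas constant, J/(mol K)
T_KELVIN = 310.0     # assumed temperature (37 degrees C)


def concentration_equivalent_mM(pressure_Pa):
    """Invert Pi = c R T: concentration difference (mM) balancing a pressure."""
    return pressure_Pa / (R_GAS * T_KELVIN)   # mol/m^3 is numerically mM


laplace_Pa = 100.0      # Laplace pressure, order of magnitude from the text
external_mM = 300.0     # external medium osmolarity from the text

delta_c_mM = concentration_equivalent_mM(laplace_Pa)
fraction_of_medium = delta_c_mM / external_mM

print(f"concentration difference: {delta_c_mM:.3f} mM")
print(f"fraction of external osmolarity: {fraction_of_medium:.1e}")
```

      Under these assumptions the concentration difference is a small fraction of a millimolar, well below 1% of the external osmolarity, which is the sense in which a one-solute model would require fine tuning.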

      In the revised version of the manuscript, we have now added a section in Supplementary file and in the main text, explaining in more detail this approximation.

      3) The authors considered the role of Na, K, and Cl in the model, and used pharmacological inhibitors of the NHE exchanger. I think this part of the experiments and model is somewhat weak. I am not sure the conclusions drawn are robust. First, there are many ion channels/pumps regulating Na, K and Cl, the most important of which is the NaK exchanger. NHE also involves H, and this is not in the model. The ion flux expressions in the model are also problematic. The authors correctly include voltage and concentration dependences, but used a constant active term S_i in SM eq. 3 for active pumping. I am not sure this is correct. Ion pump fluxes have been studied and proposed expressions based on experimental data exist. A study of Na, K, Cl dynamics, and membrane voltage on cell volume dynamics was published in Yellen et al, Biophys. J. 2018. In that paper, they used different expressions based on previously proposed flux expressions. It might be correct that for small concentration differences, their expressions can be linearized or approximated to achieve similar expressions as here. But this point should be considered more carefully.

      We thank the reviewer for this comment. Indeed, we had not well justified our use of the NHE inhibitor EIPA. Our aim was not to directly affect the major ion pumps involved in volume regulation (which would indeed rather be the Na+/K+ exchanger), because that would likely strongly impact the initial volume of the cell and not only the volume response to spreading, making the interpretation more difficult. We based our choice on previous publications, e.g. ref. 13, showing that EIPA inhibited the main fast volume changes previously reported for cultured cells: it was shown to inhibit volume loss in spreading cells, as well as mitotic cell swelling (refs. 14, 15). Using EIPA, we also found that, while the initial volume was only slightly affected, the volume loss was completely abolished even in fast spreading cells (Y-27 and EIPA combined treatment, Figure S5H). This clearly shows that the volume loss behavior can be abolished without changing the speed of spreading, which was our main aim with this experiment.

      The most direct effect of inhibiting NHE exchangers is to change the cell pH (refs. 16, 17), which, given the low number of protons in the cell (negligible contribution to the cell's osmotic pressure), cannot affect the cell volume directly. A well-studied mechanism through which proton transport can have an indirect effect on cell volume is through the effect of pH on ion transporters or the coupling between NHE and the HCO3/Cl exchanger. The latter case is well studied in the literature (ref. 18). In brief, the flux of protons out of the cell through the NHE due to the Na gradient leads to an outflux of HCO3 and an influx of Cl. The change in Cl concentration will have an effect on the osmolarity and cell volume.

      We thus performed hyperosmotic shocks with this drug and we found that, as expected, it had no effect on the immediate volume change (the Ponder’s relation), but affected the rate of volume recovery (combined with cell growth). Overall, the cells treated with EIPA showed a faster volume increase, which is what is expected if the active pumping rate is reduced. This is in contrast with the above-mentioned mechanism of volume regulation, which would lead to a reduced volume recovery of EIPA-treated cells. This leads us to conclude that there is potentially another effect of NHE perturbation. Changing the pH will have a large impact on the functioning of many other processes; in particular, it can have an effect on ion transport (ref. 16).

      On the model side, the referee correctly points out that there are many ion transporters that are known to play a role in volume regulation which are not included in Eq. 3. In the revised manuscript we now start with a more general ion transport equation. We show that the main equation (Eq.1 - or Supplementary file Eq.13) relating volume change to tension is not affected by this generalization. This is because we consider only the linear relation between the small changes in volume and tension. We note that the generic description of the PML (Supplementary file Eqs.1-6) can be seen as general and does not require the pump and channel rates to be constant; both \Lambda_i and S_i can be a function of potential and ion concentration along with membrane tension. It is only later in the analysis that we do make the assumption that these parameters only depend on tension. This point is now made clear in the Supplementary file.

      There is a huge body of work both theoretical and experimental in which the effect of different ion transporters on cell volume is analyzed. The aim of this work is not to provide an analysis of cell volume and the effect of various co-transporters but is rather limited to understanding the coupling between cell spreading, surface tension and cell volume.

      To analytically estimate the sign of the mechano-osmotic coupling parameter alpha we use a minimal model. For this we indeed take the pumps and channels to be constant. As it is again a perturbative expansion around the steady state concentration, electric potential, and volume, the expression of alpha can be easily computed for a model with more general ion transporters. This generalization will come at the cost of additional parameters in the alpha expression. We decided to keep the simpler transport model because the goal of this estimate is merely to show that the sign of alpha is not a given and depends on the relative values of the parameters. Even for the simple model we present, the sign of alpha could be changed by varying parameters within reasonable ranges.

      Given these points, and the clarification of the reasons for using EIPA in our experiments, a full mechanistic explanation of the effect of this drug is beyond the scope of this work. For this reason we do not analyze the effect of EIPA on the model parameter alpha in detail. We have now clarified our interpretation of these results in the main text of the article.

      Reviewer #2:

      The work by Venkova et al. addresses the role of plasma membrane tension in cell volume regulation. The authors study how different processes that exert mechanical stress on cells affect cell volume regulation, including cell spreading, cell confinement and osmotic shock experiments. They use live cell imaging, FXm (cell volume) and AFM measurements and perform a comparative approach using different cell lines. As a key result the authors find that volume regulation is associated with cell spreading rate rather than absolute spreading area. Pharmacological assays further identified Arp2/3 and NHE1 as molecular regulators of volume loss during cell spreading. The authors present a modified mechano-osmotic pump and leak model (PLM) based on the assumption of a mechanosensitive regulation of ion flux that controls cell volume.

      This work presents interesting data and theoretical modelling that contribute new insight into the mechanisms of cell volume regulation.

      We thank the referee for the nice comments on our work. We really appreciate the effort (s)he made to help us improve our article, including the careful inspection of the figures. We think our work is much improved thanks to his/her input.

      Reviewer #3:

      The study by Venkova and co-workers studies the coupling between cell volume and the osmotic balance of the cell. Of course, a lot of work has already been done on this subject, but the main specific contribution of this work is to study the fast dynamics of volume changes after several types of perturbations (osmotic shocks, cell spreading, and cell compression). The combination of volume dynamics at very high time resolution and the robust fits obtained from an adapted Pump and Leak Model (PLM) makes the article a step forward in our understanding of how cell volume is regulated during cell deformations. The authors clearly show that:

      -The rate at which the cell deforms directly impacts the volume change

      -Below a certain deformation rate (either by cell spreading or external compression), the cells adapt fast enough not to change their volume. The plot dV/dt vs dA/dt shows a clear proportionality relation.

      -The theoretical description of volume change dynamics with the extended PLM makes the overall conclusions very solid.

      Overall the paper is very well written, contains an impressive amount of quantitative data, comparing several cell types and physiological and artificial conditions.

      We thank the referee for the positive comment on our work.

      My main concern about this study is related to the role of membrane tension. In the PLM model, the coupling of cell osmosis to cell deformation is made through the membrane-tension-dependent activity of ion channels. While the role of ion channels is extensively tested, it brings some surprising results. Moreover, the tension is measured only at fixed time points, and the comparison to theoretical predictions is not always as convincing as expected: when comparing fig 6I and 6J, I see that the predictions show that EIPA (+ or - Y27), CK-666 (+ or - Y27) and Y27 alone should have lower tension than in the control conditions, and this is clearly not the case in fig 6J. But I would not like to put too much emphasis on those discrepancies, as the drugs in the real case must have broad effects that may not be directly comparable to the theory.

      We apologize for the mislabeling of Figure 6I (now Figure 5I). This plot shows the theoretical estimate of the difference in tension (in units of homeostatic tension) between the case in which the cell loses volume upon spreading (as observed in experiments) and the hypothetical situation in which the cell does not lose volume upon spreading (alpha = 0). A positive value of the tension difference predicts that the cell tension would have been higher if the cell were not losing volume upon spreading, which is the case for the treatments with EIPA and CK-666 (+ Y27) and corresponds to what we found experimentally.

      It thus matches our experimental observations for drug treatments that reduce or abolish the volume loss during spreading and that show a higher tether force only at short times.

      We have corrected the figure and figure legend and explained it better in the text.

      But I wonder if the authors would have a better time showing that the dynamics of tension are as predicted by theory in the first place, as comparing theoretical predictions with experiments using drugs with pleiotropic effects may be hazardous.

      Actually, a recent publication (https://doi.org/10.1101/2021.01.22.427801) shows that tension follows volume changes during osmotic shocks, and overall finds the same dynamics of volume changes as in this manuscript. I am thus wondering if the authors could use the same technique as described in that paper (FLIM of the flipper probe) to study the dynamics of tension in their system, or at least refer to that paper to support their claim that tension is the coupling factor between volume and deformation.

      As was suggested by the referee, we tried to use the FLIPPER probe. We first tried to reproduce osmotic shock experiments by adding 4% PEG400 (+~200 mOsm) or 50% H2O (-~170 mOsm) to HeLa cells and measuring the average probe lifetime before and after the shock. We found a significantly lower probe lifetime for the hyperosmotic condition compared with control, and a non-significant but slightly higher lifetime for the hypoosmotic shock. The magnitude of the lifetime changes was comparable with that in the study cited by the reviewer, but the quality of our measurements did not allow us to obtain a better resolution. Next, we measured the average lifetime for control and CK-666+Y-27-treated cells 30 min and 3 h after plating, because the tether force values were highest for CK-666+Y-27 at 30 min. We did not see a change in lifetime in control cells between 30 min and 3 h (which we also did not see with tether pulling). Cells treated with CK-666+Y-27 showed a slightly lower lifetime than control cells, but at both 30 min and 3 h after plating, which means that it did not correspond to the transient effect of fast spreading but more probably to an effect of the drugs on the measurement.

      Graph showing FLIPPER lifetime before and after osmotic shock for HeLa cells plated on PLL-coated substrate. Left: control (N=3, n=119) and hyperosmotic shock (N=3, n=115); Right: control (N=3, n=101) and hypoosmotic shock (N=3, n=80). p-values were obtained by t-test.

      Graph showing FLIPPER lifetime for control cells just after plating on PLL-coated glass (the same control data as in the previous graph), and 30 min (control: N=3, n=88; Y-27+CK-666: N=3, n=130) and 3 h (control: N=3, n=78; Y-27+CK-666: N=3, n=142) after plating on fibronectin-coated glass. p-values were obtained by t-test.

      Because the cell-to-cell variability might mask the trend of single-cell changes in lifetime during spreading, we also tried to follow the lifetime of individual cells every 5 min during spreading. Most illuminated cells did not spread, while cells in non-illuminated fields of view spread well, suggesting that even with one image every 5 minutes and the lowest possible illumination, the imaging was too toxic to follow cell spreading in time. We could obtain measurements for a few cells, which did not show any particular trend, but their spreading was not normal. So we cannot really conclude much from these experiments.

      Graph showing FLIPPER lifetime changes for 3 individual cells plated on fibronectin-coated glass (shown in blue, magenta and green) and average lifetime of cells from non-illuminated field (cyan, n=7)

      Our conclusions are the following:

      1) We are able to visualize some change in the lifetime of the probe for osmotic shock experiments, similar to the published results, but with a rather large cell to cell variability.

      2) The spreading experiments comparing 30 minutes and 3 hours, in control or drug treated cells did not reproduce the results we observed with tether pulling, with a global effect of the drugs on the measures at both 30 min and 3 hours.

      3) Following single cells in time led to too much toxicity and prevented normal spreading.

      We think that this technology, which is still in its early development, especially in terms of the microscope setup that has to be used (we do not have it in our Institute, so we had to use a platform in another institute with limited experiment time), cannot be implemented within the time frame of the revision of this article to provide reliable results. We thus consider that these experiments belong to further development of the work and are beyond the scope of this study. It would be very interesting to compare in detail the older and more established method of tether pulling with the newer FLIPPER probe method, during cell spreading and in other contexts. To our knowledge this has never been done, and it cannot be done within the frame of this study. It is not clear from the literature that the two methods would measure the same thing in all conditions, even if they might match in some.

    1. Authors Response:

      Reviewer #2 (Public Review):

      The authors use representational similarity analysis on a combination of behavioral similarity ratings and EEG responses to investigate the representation of actions. They specifically explore the role of visual, action-related, and social-affective features in explaining the similarity ratings and brain responses. They find that social-affective features best explain the similarity ratings, and that visual, action-related, and social-affective features each explain some of the variance in the EEG responses in a temporal progression (from visual to action-related to social-affective).

      The stimulus set is nicely constructed, broadly sampled from a large set of naturalistic stimuli to minimize correlations between features of interest. I'd like to acknowledge and appreciate the work that went into this in particular.

      The analyses of the behavioral similarity judgments are well executed and interesting. The subject exclusion criteria and catch trials for online workers are smart choices, and the authors have tested a good range of models drawn from different categories. I find the case that the authors make for social features as determinants of behavioral similarity ratings to be compelling.

      I have a few questions and requests for additional detail about the EEG analyses. I appreciate that the authors have provided the code they used for all the analyses, and I'm sure that the answers to many if not all of my questions are there, but I don't have access to a Matlab license to run the code. Also, since the code requires familiarity with not just Matlab but with specific libraries to understand, I think that more description of the analysis in the paper would be appropriate.

      Some more detail is needed in the description of the multivariate classifier analysis. The authors write (line 597-599): "The two pseudotrials were used to train and test the classifier separately at each timepoint, and multivariate noise normalization was performed using the covariance matrix of the training data (Guggenmos et al., 2018). "

      I suspect I'm missing something here, because as written this sounds as if there was only one trial on which to train the classifier, which does not seem compatible with SVM classification. If only one trial was used to train the classifier, that sounds more like nearest-neighbor classification (or something else). Alternatively, if all different pseudo-trial averages - each incorporating a different subset of trials - were used for training, then that would seem to mean that some of the training pseudo-trials contained information from trials that were also averaged into the pseudo-trials used for testing. I don't know if this was done (probably not) but if it was it would constitute contamination of the test set. I think this part of the methods needs more detail so we can evaluate it. How many trials were used to train and to test for each iteration?

      Thank you for raising this issue; we agree that our Methods section was unclear on this point. We used split-half cross-validation. There was one pseudotrial for training per condition (which was obtained by averaging trials). There was no contamination between the training and test sets, because the data was first divided into separate training and test sets, and only afterwards averaged into pseudotrials for classification. This procedure was repeated 10 times with different data splits to obtain more reliable estimates of the classification performance. We rewrote the corresponding section to make this clearer:

      “Split-half cross-validation was used to classify each pair of videos in each participant’s data. To do this, the single-trial data was divided into two halves for training and testing, whilst ensuring that each condition was represented equally. To improve SNR, we combined multiple trials corresponding to the same video into pseudotrials via averaging. The creation of pseudotrials was performed separately within the training and test sets. As each video was shown 10 times, this resulted in a maximum of 5 trials being averaged to create a pseudotrial. Multivariate noise normalization was performed using the covariance matrix of the training data (Guggenmos et al., 2018). Classification between all pairs of videos was performed separately for each time-point. […] The entire procedure, from dataset splitting to classification, was repeated 10 times with different data splits.”

      We also performed the decoding procedure with a higher number of cross-validation folds and found very similar results.
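      The split-half procedure described above can be illustrated with a minimal Python sketch. This is not the analysis code released with the paper (which is in Matlab); the function names, the fixed shrinkage value and the simulated data are assumptions made for illustration only. Because each condition contributes a single training pseudotrial, the linear classifier is written here as the perpendicular bisector between the two training pseudotrials, which is what a hard-margin linear SVM reduces to for two training points.

```python
import numpy as np


def whitening_matrix(train_a, train_b, shrinkage=0.1):
    """Inverse square root of a noise covariance estimated from training trials only.

    The shrinkage toward the diagonal (0.1) is an illustrative assumption.
    """
    resid = np.vstack([train_a - train_a.mean(axis=0),
                       train_b - train_b.mean(axis=0)])
    cov = resid.T @ resid / (resid.shape[0] - 2)
    cov = (1 - shrinkage) * cov + shrinkage * np.diag(np.diag(cov))
    evals, evecs = np.linalg.eigh(cov)
    return evecs @ np.diag(evals ** -0.5) @ evecs.T


def split_half_accuracy(trials_a, trials_b, n_repeats=10, seed=0):
    """Pairwise decoding accuracy at one time point.

    trials_a, trials_b: arrays of shape (n_trials, n_channels) for two videos.
    """
    rng = np.random.default_rng(seed)
    half = len(trials_a) // 2
    accuracies = []
    for _ in range(n_repeats):
        # 1) Split single trials into training and test halves first.
        ia = rng.permutation(len(trials_a))
        ib = rng.permutation(len(trials_b))
        tr_a, te_a = trials_a[ia[:half]], trials_a[ia[half:]]
        tr_b, te_b = trials_b[ib[:half]], trials_b[ib[half:]]
        # 2) Noise normalization uses the training data only.
        W = whitening_matrix(tr_a, tr_b)
        # 3) Average within each half to form one pseudotrial per condition.
        ptr_a, ptr_b = tr_a.mean(axis=0) @ W, tr_b.mean(axis=0) @ W
        pte_a, pte_b = te_a.mean(axis=0) @ W, te_b.mean(axis=0) @ W
        # 4) Linear boundary halfway between the two training pseudotrials.
        w = ptr_a - ptr_b
        b = w @ (ptr_a + ptr_b) / 2
        correct = [(pte_a @ w - b) > 0, (pte_b @ w - b) < 0]
        accuracies.append(np.mean(correct))
    return float(np.mean(accuracies))


# Simulated example: 10 trials per video, 4 channels, well-separated means.
rng = np.random.default_rng(1)
video_a = rng.normal(0, 1, (10, 4)) + 5
video_b = rng.normal(0, 1, (10, 4)) - 5
accuracy = split_half_accuracy(video_a, video_b)
```

      Splitting before averaging is the step that keeps the test pseudotrials free of any training trial; averaging first and splitting afterwards would not give that guarantee.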

      I think a bit more detail is also necessary to clarify the features used for the classification. My understanding is that each timepoint was classified as one action vs each other action on the basis of all the electrodes in the EEG for a given temporal window. Is this correct? (I'm guessing / inferring more than a little here.)

      This is correct, and we agree that further clarification was needed in the text. We have added this:

      “Classification between all pairs of videos was performed separately for each time-point. Data were sampled at 500 Hz and so each time point corresponded to non-overlapping 2 ms of data. Voltage values from all EEG channels were entered as features to the classification model.

      The entire procedure, from dataset splitting to classification, was repeated 10 times with different data splits. The average decoding accuracies between all pairs of videos were then used to generate a neural RDM at each time point for each participant. To generate the RDM, the dissimilarity between each pair of videos was determined by their decoding accuracy (increased accuracy representing increased dissimilarity at that time point).”
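      As a hedged illustration of how pairwise decoding accuracies can be arranged into a neural RDM at each time point, the following Python sketch may help. The function name `build_rdm` and the input layout (one row per video pair, in the order produced by `itertools.combinations`) are assumptions for this example and are not taken from the released code.

```python
from itertools import combinations

import numpy as np


def build_rdm(pair_accuracies, n_videos):
    """Arrange pairwise decoding accuracies into symmetric RDMs.

    pair_accuracies: array of shape (n_pairs, n_timepoints), rows ordered as
    itertools.combinations(range(n_videos), 2).
    Returns an array of shape (n_timepoints, n_videos, n_videos).
    """
    pairs = list(combinations(range(n_videos), 2))
    acc = np.asarray(pair_accuracies, dtype=float)
    if acc.shape[0] != len(pairs):
        raise ValueError("expected one row per pair of videos")
    rdm = np.zeros((acc.shape[1], n_videos, n_videos))
    for (i, j), a in zip(pairs, acc):
        # Higher decoding accuracy is read as higher dissimilarity.
        rdm[:, i, j] = a
        rdm[:, j, i] = a
    return rdm


# Three videos, two time points; pairs are (0, 1), (0, 2), (1, 2).
example = build_rdm([[0.5, 0.6], [0.7, 0.8], [0.9, 1.0]], n_videos=3)
```

      The diagonal is left at zero, since a video is never decoded against itself.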

      It would be useful to know how many features constituted each feature space. For example, was motion energy reduced to one summary feature (total optic flow for whole sequence?) For "pixel value", is that luminance? (I suspect so, since hue is quantified separately, but I don't think this was specified).

      For motion energy, we used the magnitude of the optic flow, and calculated Euclidean distances between the vectorized magnitude maps rather than reducing them to summary features. We have included the dimensionality of each feature in Supplementary File 1b and we now refer to it in the text:

      “These features were vectorized prior to computing Euclidean distances between them (see Supplementary File 1b for the dimensionality of each feature).”
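      A minimal Python sketch of this step, under stated assumptions, is given below: each feature map is flattened to a vector and the pairwise Euclidean distances form the feature RDM. The helper names `flow_magnitude` and `feature_rdm` and the toy arrays are hypothetical; how the optic flow components are actually obtained is not specified here.

```python
import numpy as np


def flow_magnitude(u, v):
    """Per-pixel magnitude of an optic flow field with components u and v."""
    return np.hypot(u, v)


def feature_rdm(feature_maps):
    """Pairwise Euclidean distances between vectorized feature maps.

    feature_maps: sequence of equally shaped arrays, one per video.
    Returns an (n_videos, n_videos) distance matrix.
    """
    X = np.stack([np.asarray(m, dtype=float).ravel() for m in feature_maps])
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Toy example with two 2x2 "magnitude maps".
maps = [np.zeros((2, 2)), np.array([[3.0, 4.0], [0.0, 0.0]])]
distances = feature_rdm(maps)
```

      Keeping the full vectorized map, rather than a single summary value, preserves where in the frame the motion occurs.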

      Pixel value was indeed the luminance, and we have clarified this in the text.

      More broadly, I would appreciate a bit more discussion of the role of time in these analyses. Each clip unfolds over half a second, so what should we make of the temporal progression of RDM correlations? Are the social and affective features correlated with later responses because they take more time to compute (neurally speaking), or because they depend on longer temporal integration of information? These two are not even exactly mutually exclusive, and I realize that it may be difficult to say with certainty based on this data, but I think some discussion of this issue would be appropriate.

      This is a great point, although it is difficult to speculate based on this data. One way to get at this would be to examine how much social-affective processing relies on previously extracted features. Future work could look at the causality between early and later-stage EEG features (unfortunately our post-hoc attempts to address this via Granger-causal analysis were unsuccessful, likely due to insufficient SNR with our specific experimental design). Alternatively, this could be investigated in a follow-up experiment that varies how social information unfolds over time (e.g., images vs. videos or varying video duration). We now discuss this possibility in the manuscript:

      “Given the short duration of our videos and the relatively long timescale of neural feature processing, it is possible that social-affective features are the result of ongoing processing relying on temporal integration of the previously extracted features. However, more research is needed to understand how these temporal dynamics change with continuous visual input (e.g. a natural movie), and whether social-affective features rely on previously extracted information.”

    1. Author Response:

      Reviewer #1 (Public Review):

      In this study, the authors present a detailed analysis of the T cell receptor repertoire in mice examining how the age, the cell differentiation status, the tissue compartment distinguishing between spleen and bone marrow, and antigen exposure influence its composition. Looking at amino acid motif distributions and sharing patterns of nucleotide sequences within the individual clones, and comparing them between the different cell compartments and groups, they aim at identifying the main axes of influence that shape the T cell receptor repertoire within these mice. They find some interesting differences, with e.g. repertoires from different functional compartments being more separated within young animals compared to older ones. However, given the complexity of the different aspects that are investigated and compared, it is sometimes difficult to follow the main conclusions from each of the presented analyses. In addition, the main conclusion shown in Figure 7 that repertoire evolution is influenced by cell migration, differentiation and age/infection seems to be partly well known, as also acknowledged by the authors. However, it would be really interesting if, based on their analyses, the authors could put a "weight" to each of these features, i.e. if aging has a larger effect on repertoire differences than tissue compartments. This is currently not easily seen from their analyses. If possible this would increase the value of this study going beyond a descriptive presentation of the results and support the mechanistic relationships hypothesized within the discussion.

      We are grateful to the reviewer for these insightful comments. We have introduced a completely new analysis (new Figure 5 and two additional panels in Figure 6) which captures the quantitative shift in a two-dimensional space defined by two global statistical parameters of the repertoire (V and Triplet cosine similarity) as the repertoires move along the multidimensional trajectory illustrated in Figure 7 (old, now Figure 5A). To our knowledge, this analysis framework is novel. When combined with the existing analysis, this additional analysis reveals a clear hierarchy of impact of different immunological processes on repertoire diversification. We believe that the definition of this hierarchy is itself novel. In addition, we observe a CD4+/CD8+ lineage-dependent interaction between age and differentiation on the structure of the repertoire, providing additional novelty. Completely new results and discussion sections accompany this figure.
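      For readers unfamiliar with the metric, cosine similarity between two repertoires can be sketched in Python as follows. This is an illustration only and not the analysis pipeline of the paper: the gene labels are placeholders, and treating "triplets" as overlapping three-residue motifs of the CDR3 amino acid sequence is an assumption made for this example.

```python
import math
from collections import Counter


def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two count vectors stored as mappings."""
    keys = set(counts_a) | set(counts_b)
    dot = sum(counts_a.get(k, 0) * counts_b.get(k, 0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare an empty repertoire")
    return dot / (norm_a * norm_b)


def triplet_counts(cdr3_sequences):
    """Count overlapping three-residue motifs (assumed meaning of 'triplet')."""
    counts = Counter()
    for seq in cdr3_sequences:
        for i in range(len(seq) - 2):
            counts[seq[i:i + 3]] += 1
    return counts


# Placeholder V gene labels; real data would use the annotated gene names.
usage_young = Counter(["V1", "V2"])
usage_old = Counter(["V1", "V1", "V2", "V2"])
v_similarity = cosine_similarity(usage_young, usage_old)
```

      Because the measure depends only on the direction of the count vectors, two repertoires with the same relative usage but different sequencing depth are scored as identical.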

      Reviewer #2 (Public Review):

      This is a careful observational study of a rich dataset, quantifying the relationships between naive, regulatory, effector and memory subsets. It is notable for its thorough approach to analysing TCR diversity at multiple levels of granularity, from V-beta usage to nucleotide level. However, it is a little lacking in narrative and interpretation. As a result, my impression is that it doesn't present any results that expand significantly on our existing understanding of T cell biology. As the authors note, shifts in diversity of T cell subsets with age are already well established, and while multiple measures of diversity are explored, the results are broadly in agreement with each other. I wanted to be more enthusiastic about this study but it comes across as a tour de force in data exploration rather than something that sheds light on the forces shaping the structure and overlap of T cell repertoires.

      We agree with this reviewer and the prior reviewer that the main impact of this study was not clearly explained. We provide a quantitative framework which can be used to track the movement of repertoires along the multidimensional space illustrated in our Figure 7 (old) and can define a hierarchy of impact of different immunological processes in driving diversification of the repertoire. As outlined in detail in the response to reviewer 1, we think this perspective on statistical features of TCR repertoires goes significantly beyond previous studies and provides a robust quantitative framework for analysing the relative influence of internal or external selective pressure on the T cell compartment.

    1. Author Response:

      Reviewer #2 (Public Review):

      Zhang et al. describe a new method for inducing traumatic optic neuropathy in larger mammalian models that offers the additional advantage of allowing rapid administration of local therapies to the site of optic nerve injury. Furthermore, the authors build on their prior work which has demonstrated a neuroprotective effect of hypothermia provided that protease inhibitors are employed to protect against cold-induced microtubule damage. In the present manuscript, they show that an endonasal approach to accessing the optic nerve within the optic canal can be performed safely in goat without inducing optic nerve damage. They then demonstrate that experimental crush of the optic nerve within the optic canal results in progressive degeneration of retinal ganglion cell (RGC) neurons and of their axons which form the optic nerve; this occurs over a period of several months, a similar time course to traumatic optic neuropathy in humans. Transcriptional profiling of mRNA obtained from the optic nerve at its site of injury identified changes in gene expression related to molecular pathways involved in inflammation, ischemia, and cellular metabolism. The authors then proceed to apply local hypothermia or protease inhibitor administration (individually or in combination) to the site of optic nerve crush for two minutes and observed a decrease in axonal degeneration in the subsequent months, although without an improvement of conduction of visual information by the affected optic nerve. Finally, the authors describe a computer program which uses computed tomography scans to evaluate thousands of potential endonasal approaches to access the prechiasmal optic nerve in large and medium sized mammals, and they proceed to successfully perform optic nerve exposure and experimental crush in a macaque model.

      The authors' interpretation of the data is generally accurate; however, their conclusions about the impact of the work are somewhat overstated.

      We appreciate the reviewer’s insightful and precise comments, which definitely help improve our manuscript. We have scaled back the conclusions about the impact of our work, and revised our manuscript point by point according to the reviewer’s comments.

      Strengths:

      The prevailing use of rodent models of traumatic optic neuropathy in the field is problematic, and the authors' efforts to use larger mammalian models may be helpful in understanding the pathophysiology of and developing treatments for traumatic optic neuropathy in humans. The computer program that evaluates and recommends detailed surgical approaches and instrumentation is novel and would be quite useful to other investigators attempting to perform similar endonasal procedures. The authors make use of multimodal assessments of the goats and macaques [e.g. quantitative retina and optic nerve histology; optical coherence tomography measurements of retinal layers; pupillary light reflex assessment; and electrophysiology studies including visual evoked potential (VEP) and pattern electroretinography (PERG)] to convincingly demonstrate that the endonasal procedure itself can be performed without inducing progressive optic nerve damage and that experimental optic nerve crush using this procedure induces the expected profound decrease in optic nerve function, associated with progressive degeneration of RGC cell bodies and axons. The histological assessment of goat optic nerves following the local hypothermia/protease inhibitor treatment demonstrates a convincing reduction in the absolute number of RGC axons that are lost at 1 month after optic nerve crush.

      Thank you very much.

      Weaknesses:

      The premise that optic nerve crush within the optic canal is much more physiologically relevant than other existing animal models is overstated: while the optic canal is believed to be the most common site of injury to the optic nerve, most human cases of traumatic optic neuropathy occur as a result of indirect mechanisms rather than compression/crush; namely, stretching and shearing forces are applied to the optic nerve where it is tethered to the periosteum of the optic canal by its dura. The authors' example of a bony fragment compressing the optic nerve within the optic canal (Figure 1B) is relatively rare and would actually represent one of the few cases where a surgical intervention (to relieve acute optic nerve compression) might be considered clinically justifiable.

      We agree with the reviewer that “most human cases of traumatic optic neuropathy occur as a result of indirect mechanisms rather than compression/crush”. To address the reviewer’s concern, we updated our discussion as follows: “Recently, rodent models have been developed using indirect mechanisms (applying periorbital ultrasound or skull weight drop) to induce distal ON injury. Compared with direct optic nerve compression or crush, these models are likely more clinically relevant since most clinical TON cases are indirect and due to force transmission (39-41). However, due to force scattering, unwanted and uncontrolled collateral damage to the eyeball, contralateral optic nerve, orbit or skull often occurs in these models (39,40). Additionally, the success rate of these modeling methods is not as high as direct optic nerve crush; for example, 10% of mice died immediately after head weight drop (39). Moreover, extension of these modelling methods to large animal species has not been reported. Therefore, clinically translatable, local treatment of injured ONs via trans-nasal endoscopy cannot be performed in these small animal models.”

      To further address the reviewer’s concern, we have added the following statement to the “Limitations of this study” section in the revised manuscript: “Our TON model is clinically relevant in terms of injury site, subsequent spatiotemporal pattern of retrograde axonal degeneration, and availability of trans-nasal local treatment. However, the mechanism of optic nerve injury in our model differs from that in most clinical TON cases, in which the intra-canalicular optic nerve is injured by an indirect mechanism (stretching and shearing forces), rather than by direct compressing forces.”

      We respectfully disagree that “a bony fragment compressing the optic nerve within the optic canal (Figure 1B) is relatively rare”, at least in China. According to our previous clinical study of 1275 patients with indirect traumatic optic neuropathy (PMID: 27267448), bony fracture of the optic canal is not rare: 50% of patients had a visible optic canal fracture on high-resolution CT scans, and an additional 20% had a visible optic canal fracture under trans-nasal endoscopy (because endoscopy provides excellent illumination and a magnified view of the optic canal).

      The transcriptomic profiling at three locations along the visual pathway in post-trauma goats only showed differences in expression at the location of optic nerve injury, and not within the retina or proximal optic nerve on the affected side. The authors assert that the high rate of expression changes in pathways relevant to ischemia, inflammation and metabolism indicates "that targeting these pathways with local treatment could alleviate secondary damage." This is overstated. While such profiling may be useful for hypothesis generation, it was not followed up by any experiments to determine whether these expression changes are actually detrimental to the optic nerve. Some of them may very well be compensatory, such that inhibiting them may only exacerbate damage. Furthermore, given that these transcriptional changes are not seen in the retina (where RGC nuclei reside) or in the proximal optic nerve, this would suggest that the observed transcriptional changes at the site of injury are actually occurring in non-neuronal cells (e.g. astrocytes, oligodendrocytes, microglia). The authors should convey that the changes they observe are unlikely intrinsic to the optic nerve axons.

      We appreciate the reviewer’s insightful explanation and agree that our original claim is premature. We deleted the statement "targeting these pathways with local treatment could alleviate secondary damage" in the Results section. Additionally, we have replaced the previous sentence in the Discussion, which stated: "Instead, we targeted hypothermia to the injured pre-chiasmatic ON to prevent early changes in ischemia, inflammation, and metabolism transcripts (as revealed by RNA-sequencing at 1 dpi)." with “Instead, we targeted hypothermia to the injured pre-chiasmatic ON according to early transcriptomic changes in ischemia, inflammation, and metabolism pathways.”

      We also agree with the reviewer’s comment that these transcriptomic changes at the injured optic nerve mainly occurred in the micro-environment of non-neuronal cells. The major advantage of our large animal model is the ability to modulate the micro-environment around the axons of the distal optic nerve, including glial cells, vasculature, connective tissues and the extracellular matrix. To incorporate the reviewer’s comment, we have added the following statement in the Results section: “These transcriptomic changes at the injury site were unlikely intrinsic to the distal optic nerve, and more likely occurred mostly in non-neuronal cells in the micro-environment.”

      The lack of any rescue of the pupillary light reflex or of visual evoked potential responses after local hypothermia/protease inhibitor treatment suggests that the physiological significance of any anatomical rescue by this treatment is minimal. If a number of axons survive but cannot conduct sufficient visual signal to stimulate the pupil response or stimulate the visual cortex, then the local treatment (even when applied immediately after trauma in this model, unlike in human patients in which a delay of a number of hours would be required at the very least) cannot be considered a substantial success. The authors also characterize the goats at 1 month post-injury, so one cannot say whether the statistically significant improvement in axon loss with the combined treatment would be durable at the later 3-month time point.

      We agree with the reviewer that the local treatment with hypothermia/protease inhibitor did not achieve functional recovery of the visual pathway, and thus it could not be considered a substantial success. Our work provides a safe and clinically translatable approach to modulate the inhibitory extrinsic environment at the injury site. Although in our current study, application of local treatment with hypothermia/protease inhibitor does not achieve eye-to-brain functional recovery, and its long-term therapeutic effect is unclear, these results are encouraging because they show some benefit of local treatment. Another important feature of our large animal model is that local treatment of the extrinsic environment at the distal optic nerve can be combined with currently available intravitreal treatments, such as gene therapy to boost intrinsic axon regrowth capacity, to target both extrinsic and intrinsic factors. To address the reviewer’s comment, we have added the following to the discussion in the “Limitations of this study” section: “Additionally, the current local treatment did not achieve functional recovery of the eye-to-brain pathway, and its long-term therapeutic effect is unclear.”

      As the authors acknowledge, the use of larger mammals prevented them from conducting studies with a large n. As a result, their analyses are underpowered.

      We agree with the reviewer that, in an ideal scenario, increasing the sample size would improve the power of the analysis. In this study, however, ethical considerations and limitations of housing space and other resources constrained the sample size of large mammals. We include this point in our “Limitations of this study” section.

      Because of the dissimilarities between their model and the most common mechanism of human traumatic optic neuropathy and because of the lack of a clinically significant rescue of the optic neuropathy when the local hypothermia/protease inhibitor treatment was applied, the authors' assertion that their model may "trigger a paradigm shift" for traumatic optic neuropathy research should be scaled back.

      We agree with the reviewer that it is too early to state that our model may “trigger a paradigm shift”. We need more effective local treatments to prove the value of this model and the novel therapeutic approach. Therefore, we have deleted “trigger a paradigm shift” in the Introduction and Discussion.

    1. Author Response:

      Reviewer #1 (Public Review):

      Phillips and Rubin investigated the biophysical mechanisms that underlie the generation of respiratory-related burstlets and bursts in the rhythmogenic pre-Botzinger circuit. They show, using a computational modelling approach, that synaptically mediated intracellular calcium concentrations and Ca2+ release and uptake via the endoplasmic reticulum are a unifying mechanism of the generation of the respiratory rhythm and its recruitment at the motoneuronal level. This explanatory model is able to unify contradictory experimental findings and is further evaluated by modelling the effect of opioids on the generation of burstlets and bursts in the pre-Botzinger complex.

      The conclusions of this paper are mostly well supported by a sound computational modelling approach, however the current computational modelling data are largely based on experimental data of very few workgroups, while previous modelling approaches and experimental data that support anatomical network connectivity as a key feature for respiratory rhythm generation and transmission of burstlet/bursts to motorneuron pool were neglected.

      To our knowledge, the work supporting anatomical network connectivity as a key feature that the reviewer mentions (Ashhad & Feldman, Neuron, 2020; Slepukhin et al., arXiv: 2012.12486) makes the claim that in a general setting involving spread of activity by percolation without any additional biophysical mechanisms (such as synaptic plasticity, cf. Guerrier et al., PNAS, 2015), a specific distribution of synaptic weights is needed to produce burstlet generation that agrees with experimental findings. Our work shows that random connectivity without specially distributed weights is sufficient in a setting in which a fraction of neurons are endogenous bursters and the calcium-related recruitment mechanisms that we model are also present. Our approach of studying this biophysical framework independently from the alternative, based entirely on anatomical network connectivity, is important in order to characterize what properties can follow from the biophysical framework on its own. Once our paper is published, the two complementary frameworks and their corresponding predictions will be represented in the literature and can be tested in future experiments. We contend that this is more useful than if we had tried to combine both biophysical and connection-based frameworks in the present model, before considering the former on its own.

      The current model is exclusively focused on biophysical properties, and in particular calcium dynamics, to generate a unifying computational model for respiratory rhythm generation. However, a previous model and experimental data suggest the emergence of rhythmogenic activity in the pre-Botzinger complex may be largely determined by the local network connectivity as well as by the connectivity of the pre-Botzinger complex with the extended medullary and caudal pontine respiratory circuit. It would be interesting if this crucial component of a truly unifying model could be added, or at least discussed appropriately.

      We fully acknowledge that our work focuses on rhythm-generation and pattern-generation mechanisms in the preBotC. It is quite possible that these mechanisms can only function at full capacity when the neurons involved receive inputs from other parts of the extended respiratory circuit (e.g., Jones and Dutschmann, J. Neurophysiol., 2016). These inputs, for example, might help set the level of excitability of preBotC neurons (cf. Smith et al., J. Neurophysiol., 2007). Although adding any sort of detailed representation of this extended circuit is beyond the scope of this work, we agree that it is important to mention, to help put our work in a bigger picture context, and we have added text about this point in our revised Discussion and in our responses below to Reviewer #1’s Recommendations for the authors.

      Reviewer #2 (Public Review):

      Since its isolation in the transverse slice in 1991 (1), researchers have studied the preBotC with a focus on 2 related questions: how respiratory rhythm is generated, and how this rhythm is transformed into the pattern of inspiratory bursts, which are recorded at hypoglossal rootlets (XIIn), and in more intact preparations, from cervical ventral roots. The discovery of burstlets has provided support for the conjecture that the preBotC can be functionally parcellated into rhythm-generating and pattern-forming networks (2).

      Others have proposed that burstlets are rhythmogenic (2-4), via a stochastic percolation mechanism that synchronizes tonic spiking into burstlets, which in turn give rise to inspiratory drive. Phillips and Rubin challenge this account for its lack of detail and argue that it is weakly supported by features of burstlets that can be otherwise accounted for (burstlet onset and slope; percolation of activation following photostimulation of network subsets). Their points are well-taken, but two points need to be made. First, the percolation conjecture has gained traction because other more conventional mechanisms for rhythmogenesis have been shown not to apply, as nicely summarized in (4, 5). In particular, rhythmogenesis arising out of the activity of endogenous bursters (as is the case in their model) has failed to find empirical support: optical recordings of respiratory networks under conditions of synaptic blockade have not revealed appreciable numbers of endogenous bursters (this despite the fact that groups are recording from slices expressing genetically-encoded Ca2+ indicators in glutamatergic neurons in preBotC, using two-photon microscopy), older studies using conventional patch clamp methods identified negligible numbers of endogenous bursters (6, 7), and pharmacological disruption of specific endogenous bursting mechanisms has not silenced respiratory rhythm in the slice (8). Second, and more importantly, the processes that they model mediate the amplification of burstlets to bursts, and thus don't have obvious bearing on whether or not burstlets are generated by stochastic percolation or endogenous burster activity. As they state (lines 605-607) "the findings about burstlets and bursts presented in this work would have been obtained if the burstlet rhythm was imposed (Fig. 1) or if burstlets were generated by some other means". 
      Thus a discussion of the inadequacies of current percolation models of respiratory rhythmogenesis seems to be irrelevant to their main points, and fails to acknowledge that other more plausible mechanisms have been set aside because they were undermined by experimental findings. They should either explain why a percolation-based mechanism for the emergence of burstlets is incompatible with their model for CICR-mediated amplification of burstlets to network bursts, or they should remove these arguments, which distract from their main findings.

      We acknowledge that different experiments, done using different approaches and in various settings, have yielded heterogeneous results about the prevalence and importance of endogenous bursters in respiratory rhythm generation. Importantly, the emphasis in our work is on the biophysical mechanisms associated with neural recruitment in the burstlet-to-burst transition. We do not make the claim that a percolation-based mechanism for the emergence of burstlets is incompatible with our ideas about this transition. We have added a critical phrase to this text to clarify this point. Importantly, in the revision process, we have also performed additional simulations, now included in the paper, which show that our model gives similar recruitment and amplification of activity when INaP-based bursting is replaced by imposing rhythmic activity on the burstlet population; see Figure 4-Figure Supplement 1.

      It would be useful if the authors could elaborate on the applicability of their findings to less reduced preparations. The emergence of burstlets as well as the transition from predominantly burstlets to predominantly bursts is strongly dependent on network excitability, which is controlled by varying [K+]bath. Importantly, in their simulations burstlet activity falls silent at [K+]bath < 4 mM, and robust motor output only emerges for [K+]bath > 8 mM. While these modeling results jibe well with experimental results in the slice preparation, in more intact preparations, robust and stable respiratory rhythm is maintained at physiological levels ([K+]bath = 3 mM); in intact animals, 9 mM [K+]o is lethal. This raises the question of whether their model has explanatory power for respiratory rhythmogenesis in more intact preparations, or whether it is limited to describing fictive respiration in the slice.

      We thank the reviewer for highlighting this important point. The relevance of the current simulations to more intact preparations is now discussed in the revised manuscript. In in vitro slice preparations, the preBotC generally lacks sufficient intrinsic excitability to generate an inspiratory rhythm, presumably due to the loss of excitatory drive from higher brainstem centers such as the KF, PB, RTN, RO; see Smith et al., J. Neurophysiol., 2007. As such, elevating extracellular potassium in in vitro slice preparations is a standard procedure in the field, as it is required for generating rhythmic output under these conditions. However, elevating the preBotC excitability by other means, such as manipulations of extracellular Ca2+, does not appear to significantly impact preBotC rhythmicity (Ruangkittisakul et al., Respir Physiol Neurobiol. 2011) or the generation of bursts and burstlets (Kam et al., J. Neurosci. 2013). Moreover, burstlets can be seen in vivo via manipulations of preBotC and BotC excitability (Kam et al., J. Neurosci. 2013). Therefore, understanding burstlet and burst generation in the reduced preBotC slice preparation is highly relevant to more intact preparations. Investigation of how these higher brainstem centers impact preBotC inspiratory rhythm and pattern generation is beyond the scope of this study.

      Another aspect to this problem is that the rhythmogenic mechanism they have incorporated in their model has strong dependence on [K+]bath; this straightforwardly accounts for the transition from quiescence to burstlets, since within their models, endogenous bursters are quiescent at lower excitability levels. Of greater interest is the extent to which the [K+]bath dependence of the transition from burstlets to inspiratory bursting is again due to their choice of endogenous burster implementation. This is important, because it might enable CICR-mediated transitions from burstlets to network bursting that show less steep voltage dependence, and which are robust at more physiological [K+]o, thus enhancing the generalizability of their model. This was certainly a feature of an early study, in which I(CAN) was proposed as a mechanism for endogenous bursting, with weaker voltage dependence than the I(NaP)-based endogenous bursters implemented here (7).

      Our response to this comment has several aspects, which we present sequentially below.

      1) Qualitatively, the Kbath dependence of our model matches well with the Kbath dependence shown in Kallurkar et al., eNeuro 2019. That is, the burstlet frequency and burstlet amplitude increase with Kbath. We are unaware of any data that show a different Kbath vs burstlet frequency relationship. Moreover, if PSynCa is increased with Kbath, our model produces changes in the burstlet fraction that closely match those seen experimentally.

      2) It is not clear why we would want to show a less steep Kbath dependence (we are assuming that this, rather than voltage dependence, was what the reviewer was commenting on) of the burstlet-to-burst transition (burstlet fraction). First, our model actually has a very shallow slope in the Kbath vs burstlet fraction curve if Kbath is the only parameter varied (PSynCa fixed). This can be seen by considering a horizontal slice through Fig. 4 D-F. In order to match the slope of the Kbath vs burstlet fraction curve in experimental data, we actually needed to make the slope between Kbath and the burstlet fraction steeper. This was achieved by increasing PSynCa with Kbath (Fig. 4I). The justification for increasing PSynCa is presented on lines 422-442 of the discussion.

      3) Indeed, some earlier work identified Cd2+-sensitive, and presumably Ca2+/ICAN-dependent, endogenous bursters, and these bursters showed weaker voltage dependence (Thoby-Brisson and Ramirez, J. Neurophysiol., 2001). However, more recent studies have shown that ICAN is not involved in rhythm generation (Koizumi et al., 2018; Picardo et al., 2019). Similarly, application of Cd2+, which should eliminate Ca2+/ICAN-dependent bursters, does not affect rhythm generation and instead blocks preBotC motor output (Kam et al., 2013; Sun et al., 2019). That is, Cd2+ application eliminates preBotC network bursts but does not affect the ongoing burstlet rhythm.

      4) The Kbath value of approximately 5 mM where endogenous bursting first emerges in our model is also in agreement with existing experimental data (Del Negro et al., 2001; Mellen et al., 2010).

      5) In our model, a burstlet transitions into a burst when postsynaptic calcium transients in the pattern generating subpopulation are of a sufficient magnitude to trigger CICR, which leads to ICAN activation, depolarization, and ultimately recruitment of these neurons into a network burst. The rhythm in the burstlet population drives the postsynaptic Ca2+ transient in the pattern generating subpopulation. The mechanism that drives the burstlet rhythm has no impact on the dynamics of the postsynaptic Ca2+ transients, CICR, ICAN activation or recruitment of the pattern generating population. Therefore, the mechanism of rhythm generation does not impact the proposed mechanism underlying the burstlet-to-burst transition. To illustrate this point, we imposed a rhythm in the neurons comprising the rhythmogenic subpopulation. This was achieved by imposing a Poisson spike train in each neuron where the frequency was zero during the interburst interval. At the start of the burstlet, the frequency linearly ramps up over 250 ms to an imposed maximum firing rate and then linearly ramps back down to zero over an additional 250 ms; see Figure 4-figure supplement 1A. The burstlet frequency was set by changing the interburst interval, and the amplitude was set by changing the maximum firing rate at the peak of the burstlet (Figure 4-figure supplement 1, panels A and B). These simulations show that this network produces qualitatively similar bursts and burstlets, which depend on a CICR-mediated burstlet-to-burst recruitment process. Moreover, as in Fig. 4, the burstlet fraction primarily depends on PSynCa, whereas the burst frequency depends on PSynCa and the burstlet frequency.
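
      The imposed rhythm described in point 5 can be sketched as an inhomogeneous Poisson process with a triangular rate profile. The following is a minimal illustration and not the code used for the simulations: the function names, the 20 Hz peak rate, the 0.1 ms time step, and the per-step Bernoulli approximation are all assumptions made for this example. Only the 250 ms ramp up, the 250 ms ramp down, and the zero rate during the interburst interval come from the description above.

```python
import random


def triangular_rate(t_ms, burstlet_start_ms, ramp_ms=250.0, peak_hz=20.0):
    """Firing rate (Hz) at time t_ms: linear ramp up over ramp_ms, linear ramp
    down over another ramp_ms, and zero outside the burstlet window.
    peak_hz is a hypothetical value chosen for illustration."""
    dt = t_ms - burstlet_start_ms
    if dt < 0.0 or dt > 2.0 * ramp_ms:
        return 0.0
    if dt <= ramp_ms:
        return peak_hz * dt / ramp_ms
    return peak_hz * (2.0 * ramp_ms - dt) / ramp_ms


def imposed_spike_train(interburst_ms, n_cycles, ramp_ms=250.0, peak_hz=20.0,
                        step_ms=0.1, rng=None):
    """Spike times (ms) for one neuron. Each cycle is a burstlet of length
    2 * ramp_ms followed by a silent interburst interval. Spikes are drawn
    with one Bernoulli trial per time step, which approximates a Poisson
    process when rate * step is small."""
    rng = rng or random.Random()
    period_ms = 2.0 * ramp_ms + interburst_ms
    n_steps = int(round(period_ms * n_cycles / step_ms))
    spikes = []
    for i in range(n_steps):
        t = i * step_ms
        cycle_start = (t // period_ms) * period_ms  # burstlet starts each cycle
        rate_hz = triangular_rate(t, cycle_start, ramp_ms, peak_hz)
        if rng.random() < rate_hz * step_ms / 1000.0:
            spikes.append(t)
    return spikes
```

      In this sketch, lengthening `interburst_ms` lowers the burstlet frequency and raising `peak_hz` increases the burstlet amplitude, mirroring the two controls named in the response.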

      1. Smith JC, Ellenberger HH, Ballanyi K, Richter DW, Feldman JL. Pre-Botzinger complex: a brainstem region that may generate respiratory rhythm in mammals. Science. 1991;254(5032):726-9.
      2. Kam K, Worrell JW, Janczewski WA, Cui Y, Feldman JL. Distinct inspiratory rhythm and pattern generating mechanisms in the preBotzinger complex. J Neurosci. 2013;33(22):9235-45.
      3. Ashhad S, Feldman JL. Emergent Elements of Inspiratory Rhythmogenesis: Network Synchronization and Synchrony Propagation. Neuron. 2020;106(3):482-97 e4.
      4. Kallurkar PS, Grover C, Picardo MCD, Del Negro CA. Evaluating the Burstlet Theory of Inspiratory Rhythm and Pattern Generation. eNeuro. 2020;7(1).
      5. Del Negro CA, Funk GD, Feldman JL. Breathing matters. Nat Rev Neurosci. 2018;19(6):351-67.
      6. Del Negro CA, Koshiya N, Butera RJ, Jr., Smith JC. Persistent sodium current, membrane properties and bursting behavior of pre-botzinger complex inspiratory neurons in vitro. J Neurophysiol. 2002;88(5):2242-50.
      7. Thoby-Brisson M, Ramirez JM. Identification of two types of inspiratory pacemaker neurons in the isolated respiratory neural network of mice. J Neurophysiol. 2001;86(1):104-12.
      8. Del Negro CA, Morgado-Valle C, Feldman JL. Respiratory rhythm: an emergent network property? Neuron. 2002;34(5):821-30.
      9. Del Negro CA, Johnson SM, Butera RJ, Smith JC. Models of respiratory rhythm generation in the pre-Botzinger complex. III. Experimental tests of model predictions. J Neurophysiol. 2001;86(1):59-74.

      Reviewer #3 (Public Review):

      A key contribution of this study includes a demonstration that two sets of neurons coupled via excitation can drive network activity similar to that observed in mXII nerve during breathing. In particular, the authors formulate a link between synaptic excitation and intracellular calcium induced Ca2+ release mechanisms via positive feedback from ICaN. This forms the basis for recruitment of non-rhythmogenic neurons by rhythm generating neurons. Such a formulation seems to help explain the burstlet theory and support a percolation theory of network bursts put forward in the field.

      The manuscript is well written. Figures and figure legends are clear, and justify the results stated. The methodology is well laid out; however, references are missing in many places, so it is not clear where some equations came from (e.g., equations 19 through 24 and elsewhere). The authors state that the model code "will" be available on ModelDB. This should instead be submitted with the manuscript for review. Later, the code must be made available on a GitHub repository for wide dissemination and future updates by others. ModelDB has models which can only be downloaded but not extended for wider use. Oftentimes this leads to lack of technical help for future users and limits model use and enhancement.

      We thank the reviewer for the positive comments about readability and clarity. The reviewer makes an excellent point about ensuring that code sharing occurs in a way that allows for future use. We have made our model code available on GitHub at the following link:

      https://github.com/RyanSeanPhillips/Putting-the-theory-into-burstlet-theory

      The authors put forward a plausible mechanistic explanation for Ca2+ dependent recruitment of non-rhythmogenic neurons by linking synaptic excitation from the rhythm generating neurons to CICR in the former. Although attractive, there are numerous factors which control intracellular Ca2+ signaling and buffering. It would be important to clarify whether the assumption that dendritic depolarization due to synaptic inputs directly contributes to CICR as postulated in this model (the term PsynCa*Isyn in equation 19), has any (direct/indirect) empirical support either in preBotz neurons or elsewhere to ensure that this is not purely conjectural.

      The link between postsynaptic Ca2+ transients and CICR in the pattern generating preBotC subpopulation and equation (19) has empirical support from multiple studies, which are discussed on lines 422-444. Specifically, Mironov et al., 2008 showed that synaptically triggered Ca2+ transients in the distal dendrites of preBotC inspiratory neurons travel in a wave to the soma, where they activate TRPM4 currents (ICAN). This idea is further supported by Del Negro et al., 2011, which showed that dendritic Ca2+ transients precede inspiratory bursts, and Phillips et al., 2018, which showed that Cd2+-sensitive voltage-gated Ca2+ channels are primarily located in distal dendritic compartments. Taken together, these studies suggest that excitatory synaptic inputs to distal dendrites of preBotC inspiratory neurons trigger postsynaptic Ca2+ transients. In the model, we capture the synaptically triggered postsynaptic Ca2+ transient with equation (19), which specifies that a percentage (PSynCa) of the total postsynaptic current (ISyn) is carried by Ca2+ ions.
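
      As a rough illustration of the term described above, the sketch below splits a total postsynaptic current into the part carried by Ca2+ and the remainder. It is not equation (19) itself, which contains further terms not reproduced in this response. The function name, the choice to express PSynCa as a fraction between 0 and 1, and the example values are assumptions.

```python
def split_synaptic_current(i_syn, p_syn_ca):
    """Return (Ca2+ component, remaining component) of a postsynaptic current.

    i_syn    : total postsynaptic current (any consistent unit, e.g. pA)
    p_syn_ca : fraction of i_syn carried by Ca2+ ions, assumed to lie in [0, 1]
    """
    if not 0.0 <= p_syn_ca <= 1.0:
        raise ValueError("p_syn_ca must be a fraction between 0 and 1")
    i_ca = p_syn_ca * i_syn
    return i_ca, i_syn - i_ca


# Hypothetical values: an inward current of -80 pA with 25% carried by Ca2+.
i_ca, i_other = split_synaptic_current(-80.0, 0.25)
```

      Under this reading, increasing PSynCa enlarges the Ca2+ transient evoked by a given synaptic input without changing the total current.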

    1. Author Response:

      We thank the reviewers for their positive, constructive feedback. However, we would like to take the opportunity now to briefly address one comment from Reviewer #1:

      In addition, only pyramidal cells were considered here but long-range projecting GABA neurons have been recently reported in the prelimbic cortex (preprint from Malik et al., 2021) which suggests that this could be a possibility in the anterior cingulate cortex as well. So even if the starter cells identified in the present study were sufficient to detect other inputs to the hippocampus, I am not sure it is sufficient to completely rule out the existence of a scarce and potentially inhibitory projection to the hippocampus.

      We are quantifying the number of starter cells in our rabies experiment and will report these in the revised manuscript. However, the preprint from Malik et al. reported that optogenetic stimulation of the inhibitory projection from PFC did not evoke IPSCs in CA1 pyramidal cells (0/38 neurons tested). As such, if a similar inhibitory projection from ACC does indeed exist (it appears that it does, based on informal conversations with other researchers on social media) then we would not detect it in our monosynaptic rabies experiment, as we used the Emx1-cre mouse line to specifically map presynaptic inputs onto hippocampal glutamatergic neurons. Indeed, the preprint from Malik and colleagues provides evidence of a direct route by which prefrontal cortex can drive feedforward inhibition in CA1 while avoiding pyramidal cells. How the function of this projection differs from the indirect route via nucleus reuniens which, as we recently reported in another preprint (Andrianova et al., 2021, bioRxiv 2021.09.30.462517), also appears selective for interneurons over pyramidal cells, provides an exciting avenue for future research.

    1. Author Response:

      We thank the reviewers for their thoughtful evaluation of the manuscript and insightful comments. One reviewer commented on the lack of biological replicates for our scRNAseq. To mitigate batch effects, we sequenced a single technical replicate for each condition, but each consisted of several biological specimens (11-13 animals each) from multiple litters, totaling 49 animals. Pooling this large number of retinas was necessary for obtaining enough cells for analysis, and consequently, we have many biological samples represented. In addition, the proportions of microglia expressing specific markers in our scRNAseq match our published quantification by in situ hybridization (Anderson et al. Cell Reports 2019), giving us confidence that the proportions determined here are representative. The reviewer also asked whether the clusters with remodeling signatures are actually remodeling. We agree with the reviewer that we cannot infer function using transcriptomic data alone. However, our scRNAseq of microglia illustrates several clusters with high expression of lysosomal and lipid metabolism genes which are diminished or absent in retinas lacking developmental apoptosis (Bax KO), consistent with remodeling activity. A complete in vivo functional analysis of each remodeling cluster is beyond the scope of this manuscript, but we are currently probing this further for another publication.

    1. Author Response:

      Reviewer #2:

      Major comments:

      1. Despite the strong ERG phenotype, some 50% of the TADR mutant flies still show behavioral responses in the phototaxis axis, strongly arguing for a pathway acting in parallel to TADR. Comparison to a known blind mutant, such as HDC could clarify this issue.

      Because histidine is an essential amino acid, each cell can only obtain it from the extracellular environment. Although TADR is a specific histidine transporter, its expression pattern suggests that it is not the only histidine transporter. Supporting this, null tadr mutants are viable and have no growth phenotype. As photoreceptor cells may take up histidine via other histidine transporters, they retain a small histidine pool and synthesize some histamine via Hdc. Supporting this, histamine levels in tadr mutants are higher than in Hdc mutants, although histamine levels are greatly reduced in both mutants (Figure 5D). Consistent with reduced histamine levels, tadr^2 mutants exhibit weak phototactic behavior, indicating the presence of another histidine transporter in Drosophila photoreceptor cells. However, given that tadr^2 mutants displayed a complete loss of ON and OFF transients, greatly reduced histamine levels, and much less phototactic behavior, we speculate that TADR is the major histidine transporter, responsible for maintaining the histidine pool and keeping visual transmission at high frequencies. We added this part to the discussion section.

      1. Such a second pathway is also consistent with the level of histamine still present in TADR flies. Although curiously this issue is not specifically addressed by the authors, the level appears to be significantly higher than in HDC mutants (Figure 5D). This should be addressed.

      We thank the reviewer for raising this concern. Indeed, we found that levels of histamine in compound eyes of tadr^2 mutant flies were slightly higher than in eyes from Hdc^P217 mutants. As the reviewer mentioned, we cannot rule out the possibility that another pathway acts in parallel to TADR. We briefly explain this in the results section (highlighted). Moreover, histamine levels in tadr^2 mutant flies were higher than in Hdc^P217 mutants, suggesting that a small fraction of histidine could be supplied by other transporter systems.

      1. Beyond referring to a "complete disruption of the tadr locus", the molecular details of the mutant should be better explained: Does the mutant result in an in-frame or an out-of-frame fusion of exon3 and 5? What parts of the protein are deleted?

      We thank the reviewer for this suggestion. We verified the genomic details of the tadr^2 mutant, and indeed the tadr^2 mutation generates a truncated tadr mRNA with a frame-shift (deletion of 244 nt) at the truncated site. We added the molecular details of the mutant to the manuscript: “PCR amplification and sequencing of the tadr locus from genomic DNA isolated from wild-type and tadr^2 flies revealed a truncated tadr locus in mutant animals, resulting in an out-of-frame fusion of exon 3 and 5 (Figure 2-figure supplement 2B and 2C).” We also modified Figure 2-figure supplement 2C to show the cDNA sequence and corresponding amino acids around the truncated site.

    1. Author Response:

      Reviewer #1:

      NPM1-mutated acute myeloid leukemia (AML) is a frequent AML subtype for which new therapeutic approaches are needed. Immunotherapy may represent a promising strategy, to be combined with or alternative to chemotherapy. Particularly, vaccination strategies may be successful in NPM1-mutated AML, as mutant NPM1 is known to be immunogenic and specific T-cells may control disease relapse. In this study, Tripodo et al. performed a handful of experiments to demonstrate the feasibility and the antileukemic efficacy of a dendritic cell (DC) vaccine armed with neutrophil extracellular traps (NETs) derived from NPM1-mutated myeloid cells.

      While this work has the major strength of being novel and performed in vivo, the models used in the study, the number of replicates and the quality of some of the data presented seem insufficient.

      One of the major issues is the lack of a formal demonstration of clear antileukemic activity of this approach in leukemic mice. The authors first used a non-leukemic model, then, in their second set of experiments, a subcutaneously injected AML model. In this regard, I am worried that no effect may be seen in AML models where leukemia is engrafted in the bone marrow or in leukemic genetically modified mouse models.

      Following the Reviewer’s suggestion, we tested whether a vaccine effect could be detected against a leukemia model engrafted in the bone marrow. To this end, C1498-NPMc+ cells, which are also GFP+, were injected directly into the tibia of C57BL/6 mice to better mirror leukemia development and the cross-talk with the bone marrow microenvironment. By treating injected mice with NPMc+ NET DCs, we confirmed the efficacy of the vaccination schedule, showing reduced engraftment of C1498 cells within the BM of treated mice compared with untreated animals. In the same experiment, we compared mice vaccinated with DC+NPMc+NET with those vaccinated with DC+NPMc+peptides. Tumor take was evaluated by FACS as the number of GFP+ leukemia cells in flushed BM. We show a reduction in GFP+ cells in the BM of mice vaccinated with either the NPMc+ NET/DC vaccine or the control DC/peptide vaccine in comparison to non-vaccinated mice (new Figure 4G). However, only the reduction obtained with the NPMc+ NET/DC vaccine reached statistical significance (new Figure 4G). Also, the NPMc+ NET/DC vaccine was superior in inducing the proliferation of CD8 T-cells (evaluated as the frequency of CD8+Ki-67+ cells, Fig. 4I) and their production of TNF (new Figure 4I). Both vaccines sustained OX40 expression on CD8 T-cells (new Figure 4J) and reduced the frequency of exhausted PD1+TIM3+LAG3+ CD8 T-cells (new Figure 4K). -cells (new Figure 4L-M). The production of TNF by CD8 T-cells was higher in NPMc+ NET/DC vaccinated mice compared to controls. In these mice we also observed a trend in the induction of Tem (CD44+CD62L+) (new Figure 4O).

      Finally, both vaccines were able to induce auto-Ab against mutant NPMc (new Figure 4P); however, only NPMc+ NET/DC vaccinated mice developed Ab to MPO (new Figure 4Q). The latter might contribute to the anti-leukemia response that we observed particularly in the sc setting, since C1498 cells do express MPO (Mopin, A 2016).

      Overall, the data support the rationale for, and the advantage of, the newly proposed NET-based vaccines in AML. The use of leukemic blasts as the source of NETs provides the cell antigen repertoire displayed on the DNA threads, which are also endowed with adjuvant functions. In contrast, peptide-based vaccines require prior knowledge of the immunogenic peptides to be loaded into in vitro-activated DCs.

      Furthermore, it is unclear what would be the place of NET-DC among other DC vaccines, as there is no direct comparison in this study.

      We understand the point raised by the reviewer. The requested comparison between NPMc/NET and DC peptide vaccines was already present in the original manuscript (Figure 4C-E and Supplementary Figure 2 in the present resubmission) and indicated a more efficient reduction in tumor growth with NET/DC vaccines than with NPMc+ peptide-based vaccines. Following the Reviewer’s request, we performed additional in vivo experiments comparing the efficacy of the two vaccines in the orthotopic injection setting (intra-bone). Similarities and differences between the two vaccines in terms of leukemia take and CD8 T cell activation status are discussed above and in the manuscript, where the data are shown in the new Figure 4 (panels G-Q).

      Other issues include replicate numbers that need to be increased in some experiments, data representation which is not always appropriate and one panel which has been duplicated from a previously published work.

      To overcome the limited number of observations, we have now applied non-parametric approaches to obtain more robust results, as described in the new statistical analysis.

      In addition, for the new experiments we increased the number of observations, and the results of the new analysis, adjusted for the different experiments (block variable), are reported in the manuscript.
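As an illustration of how a non-parametric comparison can be adjusted for a block variable, the following is a minimal sketch of a permutation test in which group labels are shuffled only within each experiment. It is not necessarily the test used in the manuscript; the function name, the data layout, and the example values are hypothetical.

```python
import random
from statistics import mean


def blocked_permutation_test(blocks, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in group means,
    shuffling labels only within each block (experiment).

    blocks: list of (treated_values, control_values) tuples, one per experiment.
    Returns (observed_statistic, p_value).
    """
    rng = random.Random(seed)

    def statistic(bs):
        # Sum over blocks of the within-block difference in means.
        return sum(mean(t) - mean(c) for t, c in bs)

    observed = statistic(blocks)
    hits = 0
    for _ in range(n_perm):
        permuted = []
        for treated, control in blocks:
            pooled = list(treated) + list(control)
            rng.shuffle(pooled)  # relabel animals within this experiment only
            permuted.append((pooled[:len(treated)], pooled[len(treated):]))
        # Small tolerance so that exact ties count as "at least as extreme".
        if abs(statistic(permuted)) >= abs(observed) - 1e-12:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical values (e.g. % GFP+ cells) from two independent experiments.
example_blocks = [
    ([2.1, 1.8, 2.5, 1.9], [4.0, 3.6, 4.4, 3.9]),
    ([3.0, 2.7, 3.3], [5.1, 4.8, 5.5]),
]
stat, p = blocked_permutation_test(example_blocks, n_perm=2000)
```

Shuffling within blocks means that a systematic shift between experiments cannot by itself produce a small p-value, which is the purpose of treating the experiment as a block variable.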

      For all the experiments, we represented individual data either as dot plots with the group average and appropriate error bars, or as boxplots. The statistical methods section has been modified accordingly.

      We also thank the Reviewer for pointing out the issue regarding the IF panel, which has been replaced with the correct one.

    1. Author Response:

      Reviewer #1 (Public Review):

      1. Figure 1A discusses Horvath et al. multi-tissue and skin and blood clocks but more clocks could be applied. It is recommended to add Hannum et al., Levine et al. PhenoAge, Lu et al. GrimAge, and Teschendorff epitoc2 clocks.

      We have applied other epigenetic clocks to our dataset and commented on their results in the first results section (see Supplementary Figure 1A). These clocks appeared to rejuvenate later in the reprogramming process, suggesting that the epigenome may be rejuvenated in stages.

      1. The study reports substantial differences in DNA methylation and chronological ages, which might be due to passage number. The passage number of these cells should be listed, if possible. Additionally, there seems to be a deviation when applying EPIC chip data to the Horvath et al. clock compared to its original platform. The authors may address in the Methods section whether this inconsistency has been addressed.

      We tried to use the lowest passage number available to reduce the effect of in vitro culture on epigenetic age. As such, cells were used at passage four after purchasing. The exact passage number at purchase is unfortunately not available from Thermo Fisher. These details have been added to the methods section. As for applying EPIC data to the Horvath clock, work by McEwen et al (2018) shows that DNA methylation age is highly correlated between 450K and EPIC array platforms. Nevertheless, we have applied multiple epigenetic clocks to our dataset including those trained for EPIC array data and stated this point in the methods section.

      1. It is discussed that fibroblast morphology is reversed. It would be good to quantify this morphological dynamics. For instance, whether cell size undergoes transition from mesenchymal to epithelial lineages and if any reversal is observed.

      This is an interesting point and we have quantified the morphological changes using confocal microscopy and measuring a ratio of roundness (the maximum length divided by the perpendicular width) of individual cells before, during and after maturation phase transient reprogramming (MPTR). Cells became temporarily rounder during MPTR (lower ratio) and then returned to an elongated state (higher ratio) which matched that of the starting fibroblasts (Figure 1D).
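The ratio described above can be sketched in a few lines. This is only an illustration of the definition given in the response (maximum length divided by perpendicular width); the function name and the measurements are hypothetical and are not data from the study.

```python
def elongation_ratio(max_length, perpendicular_width):
    """Ratio of a cell's maximum length to its width measured perpendicular
    to that length. Values near 1 indicate a round cell; larger values
    indicate an elongated, fibroblast-like cell."""
    if max_length <= 0 or perpendicular_width <= 0:
        raise ValueError("lengths must be positive")
    return max_length / perpendicular_width


# Hypothetical measurements in micrometres: (maximum length, perpendicular width).
measurements = {
    "before": (120.0, 20.0),
    "during": (40.0, 32.0),
    "after": (115.0, 23.0),
}
ratios = {stage: elongation_ratio(length, width)
          for stage, (length, width) in measurements.items()}
```

With these made-up numbers the ratio falls during reprogramming and rises again afterwards, which is the pattern the response describes qualitatively.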

      1. The findings relate to cells >39 years old. Would the same significant effect be observed in younger cells?

      The effect of MPTR on cells of other ages (younger or older) is an interesting question (also raised by reviewer 3), but we feel it is beyond the scope of the current study. We have examined this idea in more detail in the discussion section.

      1. Citation 32 should be updated. Also, are there any common genes responsible for both rejuvenation and transient reprogramming?

      Citation 32 has been updated to the published article. Here, we are not able to distinguish genes responsible for cellular rejuvenation from genes responsible for iPSC reprogramming. Indeed, the genes required for each process may be (partially) overlapping or unique. A number of transcriptional programs are activated by the Yamanaka factors, and it may be possible to disentangle these in the future to determine which genes are required for rejuvenation and which are required for reprogramming.

      1. Figure 1F (the PCA figure) has a disproportional percentage of PC1 and PC2. As PC1 represents the reprogramming trajectory, does PC2 have any obvious biological meaning?

      PC2 is composed of CpG sites that are temporarily hypermethylated or hypomethylated during reprogramming. These CpG sites are near genes that are involved in asymmetric protein localisation (according to gene ontology enrichment). We have included these observations in the results section.

      1. Of the cells that failed to reprogram, do they undergo cellular senescence? What are their predicted ages?

      Cells that fail to reprogram appear to remain fibroblast-like (transcriptionally and epigenetically) and their epigenetic ages are unchanged (Figures 1A, 1D, 1F and 4B). The cell cultures continued to replicate for several weeks following the MPTR process with no obvious cellular senescence beyond what is usually observed in cultures of growing primary fibroblasts.

      1. It would be interesting to investigate multiple rounds of transient reprogramming, and the maximum number rounds that could be achieved.

      The effect of multiple rounds of MPTR is an interesting question as this may enhance the magnitude of rejuvenation. It is our view that this is beyond the scope of this manuscript, however, we have discussed this idea in the discussion section.

      1. Ohnishi et al. 2014 could be cited as a key article, which should not be missed when discussing the oncogenicity of these genes.

      The Ohnishi et al 2014 article has been cited in the discussion section.

      1. There are only 6 rejuvenation genes overlapping in figure 4f. Would DNAm age be reversed if all 6 genes are overexpressed? Would it erase cell identity?

      The genes in figure 4F have both rejuvenated levels of expression and DNA methylation, however, these genes are likely to be downstream of rejuvenation pathways as they are not known to possess transcription factor activity. As a result, upregulating and downregulating these genes as appropriate may not recapitulate the effects of reprogramming induced rejuvenation. In addition, there may be further overlaps that are not captured due to the limited CpGs covered by the DNA methylation array.

      Reviewer #3 (Public Review):

      In this manuscript, the authors investigate the potential of cellular reprogramming for the rejuvenation of age-associated phenotypes in human fibroblasts. Importantly, compared to previous studies in this field, the authors go one step further in the degree of induction of cellular reprogramming by expressing the reprogramming factors for a longer time until the maturation phase (MPTRs). Using this approach, they demonstrate not only the rejuvenation of fibroblasts at the transcriptional and epigenetic level but also the restoration of cellular identity following partial dedifferentiation. First, the authors showed that DNA methylation age is progressively decreased following the expression of the Yamanaka factors using a doxycycline-inducible lentiviral system. In addition, they demonstrate that although fibroblasts transiently lose their identity during reprogramming to the maturation phase (transient reprogramming intermediate), this identity is recovered after the expression of the factors has been terminated following doxycycline withdrawal. Subsequently, analysis of DNA methylation at promoters and enhancers associated with fibroblast identity reveals what the authors refer to as epigenetic memory, which might be responsible for the restoration of fibroblast identity following termination of transient reprogramming. Lastly, the authors elegantly demonstrate the rejuvenation of human fibroblasts following reprogramming at the transcriptional and epigenetic level using clocks based on these analyses, as well as the restoration of epigenetic marks and expression of dysregulated genes associated with fibroblast function.

      This is a very interesting manuscript that follows up on previous studies demonstrating the amelioration of age-associated phenotypes by cellular reprogramming. Although the concept presented in the manuscript is not completely novel, the authors try to go one step further by investigating the rejuvenation of aging phenotypes and recovery of cellular identity following a more extensive and longer reprogramming protocol. This manuscript elegantly reinforces the potential of cellular reprogramming for the rejuvenation of aging and proposes a hypothesis for the restoration of cellular identity after cellular reprogramming based on the existence of a certain degree of epigenetic memory in the cells. This manuscript will be of interest to scientists working in cellular reprogramming, aging and epigenetics. Nevertheless, before publication the authors would need to address the points stated below:

      Major points:

      • The timeline of induction and termination of cellular reprogramming by addition and withdrawal of doxycycline is not very clear across the manuscript. This is a critical aspect of the manuscript that should be better explained at the beginning of the results as well as in the methods section. Specifically, the authors only refer to the time of analysis following withdrawal of doxycycline in the legend of Figure 1B. Even there, the authors mentioned that "Sorted cells were also further cultured and grown in the absence of doxycycline for at least four weeks". The precise timeline of induction and termination of reprogramming by addition and withdrawal of doxycycline is critical for the whole message of the manuscript and therefore should be explained in much more detail. How long after doxycycline withdrawal were the transient reprogrammed fibroblasts analyzed?

      The amount of time that doxycycline was withdrawn has been further clarified in results and methods sections. This was 4 weeks in the first experiment and 5 weeks in the second experiment. Cells had returned to fibroblast morphology by four weeks in the second experiment, however, they needed to be further expanded to generate enough material for downstream analyses.

      • In the same line, the authors described in the methods section that transient reprogrammed sorted cells were replated on irradiated mouse embryonic fibroblasts (iMEFs) in fibroblast medium without doxycycline. Were the negative control cells processed in the same way and plated on top of iMEFs? Was there any effect observed due to the growth of fibroblasts on top of iMEFs? How long were the cells cultured on top of iMEFs before culturing under normal conditions? Could the authors explain the rationale for the use of iMEFs during this protocol?

      All cells (including the negative controls) were initially replated onto iMEFs after flow sorting to replicate the culture conditions before the flow sort and aid in the reattachment of cells to the culture surface. At subsequent passages, cells were replated without iMEFs. Culturing on iMEFs had no obvious effect on negative control cells, which were similar (based on transcriptome and morphology) to cells that had never been grown with iMEFs. These points have been added to the methods section.

      • There is a certain degree of contradiction between the results shown in some sections of the manuscript as well as the discussion of the results. On the one hand, the authors claim that reprogramming can rejuvenate fibroblasts not only at the transcriptional but also at the epigenetic level, more specifically at the level of DNA methylation, and that this rejuvenation is maintained even weeks after the induction of reprogramming has stopped. On the other hand, the authors propose that epigenetic memory, based on the absence of changes in DNA methylation in promoters and enhancers of fibroblast-related genes, allows the restoration of cellular identity following reprogramming. Why does epigenetic age not display this memory? Why do some regions maintain DNA methylation memory while the sites used for the analysis of epigenetic age by DNA methylation show changes in methylation status and therefore rejuvenation? The authors should discuss this important aspect of their data in more detail.

      This is a very interesting point. Cell identity CpG sites (at fibroblast specific enhancers) and age-associated CpG sites (used to calculate epigenetic age) are distinct. In addition, DNA methylation has different dynamics at different times during reprogramming, with cell identity CpG sites changing methylation during the stabilisation phase, and age-associated CpG sites changing methylation during the maturation phase. This suggests these sites may be regulated by different mechanisms. In addition, methylation changes at age-associated CpG sites include both gain and loss of methylation, implying these are targeted methylation changes rather than global changes. These points have been highlighted in the discussion section.

      • In Figure 3A the changes induced by reprogramming in the PC2 direction look orthogonal to the aging changes observed in PC1. Could the authors perhaps use more stringent criteria for the selection of age-related genes? How do they explain the direction of these changes? Along the same lines, have the authors tested the effect of MPTRs in young fibroblasts? Could young fibroblasts be included in Figure 3C?

      Our method for identifying age-associated genes is already quite stringent as we have applied Bonferroni multiple testing correction to the p values generated by the Pearson correlation analysis.
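For readers unfamiliar with the correction mentioned here, the following is a minimal sketch of Bonferroni adjustment applied to a set of per-gene p-values. The gene names, p-values, and the 0.05 threshold are hypothetical, and the Pearson correlation step that would produce the raw p-values is not shown.

```python
def bonferroni_select(p_values, alpha=0.05):
    """Apply Bonferroni correction to a mapping of gene -> raw p-value.

    Each raw p-value is multiplied by the number of tests (capped at 1.0);
    a gene is selected if its adjusted p-value is below alpha.
    Returns (adjusted_p_values, sorted_list_of_selected_genes).
    """
    n_tests = len(p_values)
    adjusted = {gene: min(1.0, p * n_tests) for gene, p in p_values.items()}
    selected = sorted(gene for gene, p_adj in adjusted.items() if p_adj < alpha)
    return adjusted, selected


# Hypothetical raw p-values from correlating expression with donor age.
raw_p = {"geneA": 1e-6, "geneB": 0.004, "geneC": 0.03, "geneD": 0.6}
adjusted, selected = bonferroni_select(raw_p)
```

Because the multiplier equals the number of genes tested, the threshold becomes very strict on transcriptome-wide data, which is why the correction is regarded as conservative.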

      We have not yet attempted MPTR on young fibroblasts. This is certainly an interesting future direction, but we feel it is beyond the scope of the current study. Instead, we have discussed this idea in the discussion section.

      Unfortunately, we cannot include young fibroblasts in figure 3C as they have been included in the training dataset for the transcription clock.

      • Based on the standard protocols used for the culture of fibroblasts using culture medium containing fetal bovine serum (FBS), I hypothesize that the recovery of cellular identity following reprogramming is mainly due to differentiation signals coming from factors present in the medium. For these reasons, knockout serum (KSR) is used at later stages (day 8) of reprogramming to allow generation of iPSCs. The authors should rule out the possibility that recovery of fibroblast identity is due to the culture of reprogrammed fibroblasts in FBS-containing medium. For this purpose, the authors should test whether the fibroblast identity can be recovered following doxycycline withdrawal by culturing the cells in KSR or 1% FBS-containing medium instead of 10% FBS. This is a very important concept for the message that the manuscript tries to communicate regarding an epigenetic memory responsible for the recovery of fibroblast identity.

      The effect of FBS in promoting the return to fibroblast identity is an interesting possibility. As mentioned above in response to the essential revisions, we attempted to investigate this by growing cells in fibroblast medium containing 10% KSR instead of 10% FBS after withdrawal of doxycycline; however, KSR-containing medium was unable to support long-term culture of fibroblasts. In addition, we examined the effect of 1% FBS on fibroblasts and found that their growth rate was substantially impeded, which would be a major confounder.

      Minor points:

      • The authors claim that their maturation phase transient reprogramming protocol (MPTRs) induces further rejuvenation compared to previous studies because of the use of a longer induction timeline. Nevertheless, the authors mentioned that cells took around 50 days to reach a fully pluripotent state. This is a very long timeline to reach pluripotency compared to previous studies inducing reprogramming in mouse or human fibroblasts, where typically iPSCs can be generated 2 or 3 weeks after expression of the reprogramming factors. Along the same lines, secondary systems based on the use of cells isolated from transgenic mice carrying a cassette for the expression of Yamanaka factors have been shown to be very rapid and efficient in the generation of iPSCs. For these reasons, the authors should not, in their discussion, make a direct comparison with previous studies based just on time of induction, since it is possible that systems with a more efficient or higher expression of the reprogramming factors, and therefore a much more rapid alteration of cell fate, have been used in previous studies.

      We could isolate iPSCs that could be cultured without doxycycline earlier than this at 30 days of reprogramming. However, we analysed iPSCs after 50 days of reprogramming to ensure the stabilisation phase had been completed and donor memory was erased. We agree that other reprogramming systems can achieve pluripotency faster than lentiviral based systems so we have acknowledged this in the discussion section.

      • In Figure 1D, principal component analysis shows "Not reprogramming" fibroblasts represented by a cross that cluster between young, old, and rejuvenated fibroblasts. What cells are the authors referring to? Are these cells represented in the diagram included in Figure 1B? Why do they cluster differently compared to the other fibroblasts?

      The ‘not reprogramming cells’ in figure 1E are reference data samples from our Sendai virus reprogramming experiment that were annotated as ‘failing to reprogram’ based on surface markers. Therefore, these samples are not in figure 1B. Like the ‘failing to reprogram intermediates’, they may be expressing the Yamanaka factors, and have therefore moved along PC1 relative to fibroblasts, but they have failed to upregulate SSEA4. We have clarified this in the figure legend and methods section.

      • The authors propose in the discussion that transient reprogramming protocols like the one presented in the manuscript could be used in vivo to safely induce the rejuvenation of tissues and organs. Although previous studies have shown that this is in fact the case, the authors should be aware that the recovery of cellular identity might be easier to achieve in vitro, where differentiation signals like the ones in FBS are present during culture, than in an adult, fully differentiated animal, where they are no longer present. For these reasons, the authors should be cautious when discussing the potential of in vivo application of these approaches and their safety.

      This is an interesting point and we have clarified that our method may be suitable for ex vivo approaches where our method is performed on cells in vitro.

      • The figure legend of Supplementary Figure 2B shows iPSC but this population is not present in the figure.

      The figure legend for supplementary figure 2B has been corrected to remove the iPSC group, which the reviewer correctly noticed is not present in the heatmap.

    1. Author Response:

      Reviewer #3 (Public Review):

      The authors aim to understand the functional roles of Atm transporters, a family of ABC transporters that localize to mitochondria and play important roles in transition metal homeostasis. To understand their functions, the authors captured the structures of Atm3 from the plant Arabidopsis in several functional states by cryo-EM: inward-facing, inward-facing with substrate GSSG bound, nucleotide-bound closed, and nucleotide-bound outward-facing states. Although many of those functional states have been reported for Atm orthologues in other species, the authors did elegant analyses on how the rareness of cysteine residues in the transport pathways could be very important for efficient glutathione transport. Moreover, the authors systematically show the unlikelihood of capturing an ABC transporter with both substrate and nucleotides bound. Although, conceptually, a functional state with substrate and nucleotides bound should be very transient to avoid backflow of the substrate, it is wonderful that the authors put in the effort to capture such a state and present a systematic principal component analysis (PCA) to show it may not be possible to attain such a state structurally. The use of PCA to analyze a plethora of ABC transporters in different functional states could be applicable to other systems. The conclusions of this paper are mostly well supported by data, with some aspects of analysis and a discrepancy that need to be addressed.

      1) In Figure 1, the authors report that the ATPase activities of AtAtm3 are quite different in detergent and membrane environments, with a more than 10-fold decrease in basal activity from detergent to POPC-containing nanodiscs. Is this composition a good representation of the Arabidopsis mitochondrial membrane? The nanodisc belts in Figure S4 look quite tight. Could that contribute to the decrease in ATPase activity? In one of the papers you cited, Schaedler et al., 2014, ATM3's activity is not stimulated by GSH, whereas in your data 10 mM GSH strongly stimulated AtAtm3. How would you explain the discrepancy? In terms of data analysis, the fit for the 10 mM GSH and 2.5 mM GSSG ATPase activity in nanodiscs is visually poor; adding the goodness of fit to the figure could help the reader assess the real quality of the data.

      In our experience, the ATPase rates of ABC transporters are highly dependent on the precise solubilization conditions, particularly for detergents, as some detergents or detergent combinations are stimulatory while others are inhibitory. In addition, the POPC used in the nanodiscs might not be the best substitute for the natural lipids that are found in the mitochondrial membrane. Given the variability that has been observed in ATPase activities for different solubilization conditions, we do not know which condition more likely reflects the true physiological activity. (As an aside, we would have thought that the basal ATPase rate should be zero in the absence of transported substrate to avoid uncoupled (and presumably wasteful) ATPase activity, but this has rarely been reported.)

      The Schaedler paper used GSH at two concentrations, 1.7 and 3.3 mM, to test the stimulation of ATPase activity. In our study, we used 10 mM GSSG for the ATPase activity assay to be consistent with our previous studies on NaAtm1. This discrepancy presumably reflects the different concentrations used in these studies, and other experimental conditions including the nature of the protein construct. Different expression systems were used in the two studies: Schaedler et al. employed L. lactis, while we used E. coli; it is possible that different lipids co-purified with the proteins and could influence the ATPase activity. In addition, we used a truncation with an 80-residue deletion, while Schaedler et al. utilized a 60-residue truncation; our initial characterization of various constructs showed that the 80-residue deletion had a higher ATPase activity, although we did not characterize these differences in detail, certainly not for publication. While both studies used AtAtm3 solubilized in DDM, it is possible that there are still differences in the solubilization conditions reflecting the precise way in which detergents are introduced. We have added a sentence noting this discrepancy in the GSH stimulation results, without ascribing a particular mechanism.

      We have added the goodness of fit to the figure.
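One common way to report goodness of fit is the coefficient of determination, R². The sketch below assumes a Michaelis-Menten dependence of ATPase rate on ATP concentration, which may not be the model or the metric used in the figure; the parameter values and the data points are hypothetical and serve only to show the calculation.

```python
def michaelis_menten(substrate, vmax, km):
    """Predicted rate for a given substrate concentration."""
    return vmax * substrate / (km + substrate)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical ATP concentrations (mM) and measured rates (arbitrary units).
atp_mM = [0.1, 0.25, 0.5, 1.0, 2.5, 5.0]
rates = [15.0, 35.0, 48.0, 70.0, 82.0, 92.0]

# Hypothetical fitted parameters; a real analysis would estimate these by
# non-linear regression rather than fixing them by hand.
vmax_fit, km_fit = 100.0, 0.5
predicted = [michaelis_menten(s, vmax_fit, km_fit) for s in atp_mM]
r2 = r_squared(rates, predicted)
```

An R² close to 1 indicates that the curve accounts for most of the variance in the measured rates, while a visibly poor fit of the kind the reviewer describes would give a noticeably lower value.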

      2) There are a couple of structural studies on Atm transporters available to date, and the authors did a couple of overall structural comparisons in Figure S3 and S5. However, it is still not clear how exactly those systems are similar or dissimilar, and what new structural insights are gained with these structures. It could be helpful to compare the substrate-binding sites side by side or to have a cartoon representation of different functional states in different systems. The authors brought this reader's attention to "a ~20 amino acid loop between TM1 and TM2 of AtAtm3 that would be positioned in the intermembrane space and is absent from the structures of ScAtm1 and NaAtm1" without further explanation. Does this loop have any proposed functional roles? Is it present in other Atm or ABC transporters?

      We included a new figure, Figure S4, comparing different homologs: AtAtm3, human ABCB7 (an AlphaFold model), human ABCB6, yeast Atm1 and prokaryotic Atm1. Among these five structures, only AtAtm3 presents a loop between TM1 and TM2. A quick protein sequence BLAST search in PubMed showed the presence of this loop in many other plant Atm transporters. This long loop has not been observed in any other structure we know of so far, but an external helix was observed for PglK, a lipid-linked oligosaccharide flippase, and this loop has been implicated in substrate interaction (Perez et al. Nature 524, 433 (2015); doi: 10.1038/nature14953).

  2. Feb 2022
    1. Author Response:

      Reviewer #1 (Public Review):

      In this manuscript, the authors challenge the long-standing conclusion that Orco and IR-dependent olfactory receptor neurons are segregated into subtypes such that Orco and IR expression do not overlap. First, the authors generate new knock-in lines to tag the endogenous loci with an expression reporter system, QF/QUAS. They then compare the observed expression of these knock-ins with the widely used system of enhancer transgenes of the same receptors, namely Orco, IR8a, IR25a, and IR76b. Surprisingly, they observe an expansion of the expression of the individual knock-in reporters as compared to the transgenic reporters in more chemosensory neurons targeting more glomeruli per receptor type than previously reported. They verify the expression of the knock-in reporters with antibody staining, in situ hybridization and by mining RNA sequencing data.

      Finally, they address the question of physiological relevance of such co-expression of receptor systems by combining optogenetic activation with single sensillum recordings and mutant analysis. Their data suggests that IR25a activation can modulate Orco-dependent signaling and activation of olfactory sensory neurons.

      The paper is well written and easy to follow. The data are well presented and very convincing due in part to the combination of complementary methods used to test the same point. Thus, the finding that co-receptors are more broadly and overlappingly expressed than previously thought is very convincing and invites speculation of how this might be relevant for the animal and chemosensory processing in general. In addition, the new method to make knock-ins and the generated knock-ins themselves will be of interest to the fly community.

      We thank the reviewer for their enthusiasm and support of our work!

      The last part of the manuscript, although perhaps the most interesting, is the least developed compared to the other parts. In particular, the following points could be addressed:

      • It would be good to see a few more traces and not just the quantifications. For instance, the trace of ethyl acetate in Fig. 6C, and pentyl acetate for 6G.

      Thank you for the suggestion. We have added a new figure supplement (Figure 6-Figure Supplement 3) with additional example traces for all odorants from Figure 6 for which we found a statistically significant difference between the two genotypes (Ir25a versus wildtype).

      • In Fig. 4D, the authors show the non-retinal fed control, which is great. An additional genetic control fed with retinal would have been nice.

      For these experiments, we followed a standard practice in Drosophila optogenetics to test the same experimental genotype in the presence or absence of the essential cofactor all-trans-retinal. This controls for potential effects from the genetic background. It is possible our description of these experiments was unclear (as also suggested by comments from Reviewer 2). As such, we have clarified our experimental design for the optogenetic experiments in the revised manuscript:

      Modified text: “No light-induced responses were found in control flies, which had the same genotype as experimental flies but were not fed all-trans retinal (-ATR), a necessary co-factor for channelrhodopsin function (see Methods).” and “Bottom trace is control animal, which has the same genotype as the experimental animal but was not fed the required all-trans retinal cofactor (-ATR).”

      Figure 4-Figure Supplement 1 legend: “In all optogenetic experiments, control animals have the same genotypes as the corresponding experimental animals but have not been fed all-trans retinal.”

      Methods: “For all optogenetic experiments, the control flies were of the same genotype as experimental flies but had not been fed all-trans retinal.”

      • It appears that mostly IR25a is strongly co-expressed with other co-receptors. The provided experiments suggest a possible modulation between IR25a and Orco-dependent neuronal activity. However, what does this mean? How could this be relevant? And moreover, is this a feature of Drosophila melanogaster after many generations in laboratories?

      We share this reviewer’s excitement regarding the numerous questions our work now raises. While testing additional functional ramifications of chemosensory co-receptor expression is beyond the scope of this work (but will undoubtedly be the focus of future studies), we did expand on what this might mean in the Discussion section of the revised manuscript. Previously, we had raised the hypothesis that chemoreceptor co-expression could be an evolutionary relic of Ir25a expression in all chemoreceptor neurons, or a biological mechanism to broaden the response profile of an olfactory neuron without sacrificing its ability to respond to specific odors. We now extend our discussion to raise additional possible ramifications. For example, we suggest that modulating Ir25a co-expression could alter the electrical properties of a neuron, making it more (or possibly less) sensitive to Orco-dependent responses. We also suggest that Ir25a co-expression might be an evolutionary mechanism to allow olfactory neurons to adjust their response activities. That is, most Orco-positive olfactory neurons are already primed to be able to express a functional Ir receptor if one were to be expressed. Such co-expression in some olfactory neurons might present an evolutionary advantage by ensuring olfactory responses to a complex but crucial biologically relevant odor, like human odors to some mosquitoes.

      Reviewer #2 (Public Review):

      In the present study, the authors: 1) generated knock-in lines for Orco, Ir8a, Ir25a, and Ir76b, and examined their expression, with a main focus on the adult olfactory organs. 2) confirmed the expression of these receptors using antibody staining. 3) examined the innervation patterns of these knock-in lines in the nervous system. 4) identified a glomerulus, VM6, that is divided into three subdivisions. 5) examined olfactory responses of neurons co-expressing Orco and Ir25a.

      The results of the first four sets of experiments are well presented and support the conclusions, but the results of the last set of experiments (the electrophysiology part) need some details. Please find my detailed comments below.

      We thank the reviewer for their support of our work and appreciating the importance of our findings. In the revised manuscript, we now provide the additional experimental details for the electrophysiology work as requested.

      Major points

      Line 167-171: I wonder if the authors also compared the Orco-T2A-QF2 knock-in with antibody staining of the antenna.

      We did perform whole-mount anti-Orco antibody staining on Orco-T2A-QF2 > GFP antennae (example image below). We saw broad overlap between Orco+ and GFP+ cells, similar to the palps. However, we did not include these results since quantification of these tissues is challenging for the following reasons:

      1. There are ~1,200 olfactory neurons in each antenna, many of which are Orco+.
      2. The thickness of the tissue makes determinations of co-localization difficult in whole-mount staining.
      3. Co-localization is further complicated by the sub-cellular localization of the signals: Orco antibodies preferentially label dendrites and weakly label cell bodies, while our GFP reporter is cytoplasmic and preferentially labels cell bodies.

      For these reasons, we focused on the numerically simpler palps for quantification. For the Ir8a-T2A-QF2 and Ir76b-T2A-QF2 lines, palp quantification was not an option as neither knock-in drove expression in the palps (and the available antibodies did not work with the whole-mount staining protocol). This is why we performed antennal cryosections to validate these lines. Below is an example image of the antennal whole-mount staining in the Orco-T2A-QF2 knock-in line, illustrating the quantification challenges enumerated above.

      *Co-staining of anti-Orco and GFP in Orco-T2A-QF2 > 10xQUAS-6xGFP antenna*

      Lines 316-319 (Figure 4D): It would be better if the authors compare the responses of Ir25a>CsChrimson to those of Orco>CsChrimson.

      The goal of the optogenetic experiments was to provide experimental support for Ir25a expression in Orco+ neurons in an approach independent to previous methods. Our main question was whether we could activate what was previously considered Orco-only olfactory neurons using the Ir25a knock-in. These experiments were not designed to determine if this optogenetic activation recapitulated the normal activity of these neurons. For these reasons, we did not attempt the optogenetic experiments with Orco>CsChrimson flies.

      Line 324-326: Why the authors tested control flies not fed all-trans retinal? They should test Ir25a-T2A-QF2>QUAS-CsChrimson not fed all-trans retinal as a control.

      We apologize for the confusion. The “control” flies we used were indeed Ir25a-T2A-QF2>QUAS-CsChrimson flies not fed all-trans retinal, as suggested by the reviewer. This detail was in the Methods, but was likely not clear. We have amended the main text in multiple locations to state the full genotype of the control fly more clearly:

      Modified text: “No light-induced responses were found in control flies, which had the same genotype as experimental flies but were not fed all-trans retinal (-ATR), a necessary co-factor for channelrhodopsin function (see Methods).” and “Bottom trace is control animal, which has the same genotype as the experimental animal but was not fed the required all-trans retinal cofactor (-ATR).”

      Figure 4-Figure Supplement 1 legend: “In all optogenetic experiments, control animals have the same genotypes as the corresponding experimental animals but have not been fed all-trans retinal.”

      Methods: “For all optogenetic experiments, the control flies were of the same genotype as experimental flies but had not been fed all-trans retinal.”

      Line 478-500: I wonder if the observed differences between the wildtype and Ir25a2 mutant lines are due to differences in the genetic background between both lines. Did the authors backcross Ir25a2 mutant line with the used wildtype for at least five generations?

      Yes, the mutants are outcrossed into the same genetic background as the wildtypes for at least five generations. Please see Methods, revised manuscript: “Ir25a2 and Orco2 mutant fly lines were outcrossed into the w1118 wildtype genetic background for at least 5 generations.”

      Line 1602-1603: Does the identification of ab3 sensilla using fluorescence-guided SSR apply to ab3 sensilla in Orco mutant flies? How does this ab3 fluorescence-guided SSR work?

      In fluorescence-guided SSR (fgSSR; Lin and Potter, PLoS One, 2015), the ab3 sensillum is GFP-labelled (genotype: Or22a-Gal4>UAS-mCD8:GFP), which allows this sensillum to be specifically identified under a microscope and targeted for SSR recordings. We generated fly stocks for fgSSR identification of ab3 in all three genetic backgrounds (wildtype, Orco mutant, Ir25a mutant).

      These three genotypes are described in the methods:

      “Full genotypes for ab3 fgSSR were:

      Pin/CyO; Or22a-Gal4,15XUAS-IVS-mcd8GFP/TM6B (wildtype),

      Ir25a2; Or22a-Gal4,15XUAS-IVS-mcd8GFP/TM6B (Ir25a2 mutant),

      Or22a-Gal4/10XUAS-IVS-mcd8GFP (attp40); Orco2 (Orco2 mutant).”

      Line 1602-1604: There is no mention of how the authors identified ab9 sensilla.

      Information on the identification of ab9 sensilla is under the optogenetics section of the methods: “Identification of ab9 sensilla was assisted by fluorescence-guided Single Sensillum Recording (fgSSR) (Lin and Potter, 2015) using Or67b-Gal4 (BDSC #9995) recombined with 15XUAS-IVS-mCD8::GFP (BDSC #32193).”

      Line 1648: what are the set of odorants that were used to identify the different coeloconic sensilla?

      We have added the specific odorants used for sensillar identification for coeloconic SSR in the Methods. The protocol and odorants used were:

      2,3-butanedione (BUT), 1,4-diaminobutane (DIA), ammonia (AM), hexanol (HEX), phenethylamine (PHEN), and propanal (PROP) were used to distinguish coeloconic sensilla:

      • Wildtype flies: Strong DIA and BUT responses identify ac2 and rule out ac4. Absence of strong AM response rules out ac1, absence of HEX response rules out ac3, absence of PHEN response further rules out ac4.

      • Ir25a mutant flies (amine responses lost, so cannot use PHEN and DIA as diagnostics): Strong BUT response and moderate PROP response identify ac2 and rule out ac4. Absence of strong AM response rules out ac1, absence of HEX response rules out ac3. Ac4 is further ruled out anatomically based on sensillar location compared to ac2.

      Revised text: “Different classes of coeloconic sensilla were identified by their known location on the antenna and confirmed with their responses to a small panel of diagnostic odorants: in wildtype flies, ac2 sensilla were identified by their strong responses to 1,4-diaminobutane and 2,3-butanedione. The absence of a strong response to ammonia was used to rule out ac1 sensilla, the absence of a hexanol response was used to rule out ac3 sensilla, and the absence of a phenethylamine response was used to rule out ac4 sensilla. In Ir25a mutant flies in which amine responses were largely abolished, ac2 and ac4 sensilla were distinguished based on anatomical location, as well as the strong response of ac2 to 2,3-butanedione and the moderate response to propanal (both absent in ac4). Ac1 and ac3 sensilla were excluded similarly in the mutant and wildtype flies. No more than 4 sensilla per fly were recorded. Each sensillum was tested with multiple odorants, with a rest time of at least 10s between applications.”
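
      The identification rules above can be read as a small decision function. The following is an illustrative sketch only, not code from the study: the three-level response coding (`none`, `moderate`, `strong`), the function name `is_ac2`, and the `location_consistent_with_ac2` flag are assumptions introduced here, and real recordings would need explicit spike-rate thresholds for each level.

```python
# Hypothetical coding of response strength; the study does not define numeric levels.
LEVELS = {"none": 0, "moderate": 1, "strong": 2}


def is_ac2(responses, ir25a_mutant=False, location_consistent_with_ac2=True):
    """Return True if a sensillum matches the ac2 diagnostic pattern.

    responses: dict mapping odorant abbreviations (BUT, DIA, AM, HEX, PHEN,
    PROP) to "none", "moderate", or "strong". Missing odorants count as "none".
    """
    def level(odorant):
        return LEVELS[responses.get(odorant, "none")]

    # Shared exclusions: no strong ammonia response (not ac1),
    # no hexanol response (not ac3).
    not_ac1 = level("AM") < LEVELS["strong"]
    not_ac3 = level("HEX") == LEVELS["none"]

    if ir25a_mutant:
        # Amine responses are lost, so DIA and PHEN cannot be used.
        positive = level("BUT") == LEVELS["strong"] and level("PROP") >= LEVELS["moderate"]
        not_ac4 = location_consistent_with_ac2  # anatomical criterion
    else:
        positive = level("DIA") == LEVELS["strong"] and level("BUT") == LEVELS["strong"]
        not_ac4 = level("PHEN") == LEVELS["none"]

    return positive and not_ac1 and not_ac3 and not_ac4
```

      For example, a wildtype sensillum with strong DIA and BUT responses and no AM, HEX, or PHEN response would be accepted as ac2, whereas the same sensillum with a strong AM response would be rejected.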

    1. Reviewer #1 (Public Review):

      In this manuscript, Yang et al. trained monkeys to play the classic video game Pac-Man and fit their behavior with a hierarchical decision making model. Adapting a complex behavior paradigm, like Pac-Man, in the testing of NHP is novel. The task was well-designed to help the monkeys understand the task elements step-by-step, which was confirmed by the monkeys' behavior. The authors reported that the monkeys adopted different strategies in different situations, and their decisions can be described by the model. The model predicted their behavior with over 90% accuracy for both monkeys. Hence, the conclusions are mostly supported by the data. As the authors claimed, the model can help quantify the complex behavior paradigm, providing a new approach to understanding advanced cognition in non-human primates. However, several aspects deserve clarification or modification.

      1. The results showed that the monkeys adopted different strategies in different situations, which is also well described by the model. However, the authors haven't tested whether the strategy was optimal in a given situation.

      Our approach to analyzing the monkeys’ behavior is not based on optimality. Instead, we centered our analysis on the strategies and showed that they described the monkeys’ behavior well. The model and its fitting process do not assume that the monkeys were optimizing for something. Nevertheless, the fitting results suggested that the strategies that the monkeys chose were rational, which supports the validity of our model. As we have pointed out above, optimality is hard to define in such a complex game. In particular, because most of the game is about collecting pellets, strategies that are only used in a small portion of the game can be ignored when searching for optimal solutions. We feel that further analyses on the issue of optimality would dilute the central message of the paper and have chosen not to include them here.

      According to the results, the monkeys didn't always perform the task in an optimal way, as well. Most of the time, the monkeys didn't actively adopt strategies in a long-term view. They were "passively" foraging in the task: chasing benefit and avoiding harm when they were approached. This "benefit-tending, harm-avoiding" instinct belongs to most of the creatures in the world, even in single-cell organisms. When a Paramecium is placed in a complex environment with multiple attractants and repellents, it may also behave dynamically by adopting a linear combination of basic tending/avoiding strategies, although in a simpler way. In other words, the monkeys were responding to the change of environment but not actively optimizing their strategy to achieve larger benefits with fewer efforts. The only exception is the suicides. Monkeys were proactively taking short-term harms to achieve large benefits in the future.

      One possible reason is that the monkeys didn't have enough pressure to optimize their choices since they will eventually get all the rewards no matter how many attempts they make. The only variable is the ghosts. Most of the time, the monkeys didn't really choose between different targets/ strategies. They were making choices between the chasing order of the options, but not the options themselves. It is similar to asking a monkey to choose either to eat a piece of grape or cucumber first, but not to choose one and give up the other one. A possible way to avoid this is to stop the game once the ghost catches the Pac-Man or limit each game's time.

      The game is designed to force the players to make decisions quickly to clear the pellets; otherwise, the ghosts would catch Pac-Man. Even in the monkey version of the game, where the monkeys always get another chance, Pac-Man deaths lead to long delays with no rewards. They will not be able to complete the game if they do not actively plan their route, especially in the late stage when they must reach the sparsely placed dots while escaping from the ghosts. In addition, we provided additional rewards when a maze is cleared in fewer rounds (20 drops if in 1 to 3 rounds; 10 drops if in 4 to 5 rounds; and 5 drops if in more than 5 rounds), which added motivation for the monkeys to complete a game quickly.
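
      The bonus schedule described above is a step function of the number of rounds needed to clear a maze. A minimal sketch, in which the function name `bonus_drops` is hypothetical and only the three tiers stated above are encoded:

```python
def bonus_drops(rounds_used):
    """Bonus reward (in drops) for clearing a maze in `rounds_used` rounds."""
    if rounds_used < 1:
        raise ValueError("a maze takes at least one round to clear")
    if rounds_used <= 3:   # 1 to 3 rounds
        return 20
    if rounds_used <= 5:   # 4 to 5 rounds
        return 10
    return 5               # more than 5 rounds
```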

      The monkeys’ behavior also suggested that they did not just adopt a passive strategy. Our analyses of the planned attack and suicide behavior clearly demonstrated that the monkeys actively made plans to change the game into more desirable states. Such behavior cannot be explained with a passive foraging strategy.

      2. It is well known that the value of an element is discounted by time and distance. However, in the model, the authors didn't consider it. A relevant problem will be the utility of the bonus elements, including the fruits and scared ghosts. Their utilities were affected not only by their value defined by the authors but also by effects, including their novelty and sense of achievement when they were captured, as the ghosts attracted relatively much more attention than the other elements (considering the number is 2 for them, see in figure 3E).

      These are good points, and our strategies could be built with more complexity to account for other potential factors. However, we focused our investigation on how to account for monkeys’ behavior with a set of strategies. A set of simple strategies with a small number of parameters would make a strong argument.

      Using a complex game such as Pac-Man allows us to investigate all of these interesting cognitive processes. We can certainly look at them in the future.

      3. The strategies are not independent. They are somehow correlated to each other. It may result in, in some conditions, false alarming of more strategies than the real, as shown in figure 2A.

      We have computed the Pearson correlations between the action sequences chosen with each basis strategy within each coarse-grained segment determined by the two-pass fitting procedure. As a baseline control, we computed the correlation between each basis strategy and a random strategy, which generates actions randomly. Most strategy pairs' correlations were lower than the random baseline. The results are now included in the Supplementary material (Appendix Figure 3).
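
      One way such a comparison could be set up is sketched below. This is not the analysis code from the study: the one-hot encoding of the four joystick directions, the number of random draws, the fixed seed, and all function names are assumptions made here for illustration, since the response does not state how actions were encoded numerically.

```python
import math
import random

ACTIONS = ("up", "down", "left", "right")  # assumed action set


def one_hot(sequence):
    """Flatten an action sequence into a 0/1 indicator vector (assumed encoding)."""
    return [1.0 if action == candidate else 0.0
            for action in sequence for candidate in ACTIONS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def strategy_correlation(seq_a, seq_b):
    """Correlation between the actions two strategies choose in one segment."""
    return pearson(one_hot(seq_a), one_hot(seq_b))


def random_baseline(seq, n_draws=200, seed=0):
    """Mean correlation between a strategy and uniformly random action sequences."""
    rng = random.Random(seed)
    values = [strategy_correlation(seq, [rng.choice(ACTIONS) for _ in seq])
              for _ in range(n_draws)]
    return sum(values) / len(values)
```

      Under this encoding, a pair of strategies would be compared by calling `strategy_correlation` on their action sequences for a segment and setting the result against `random_baseline` for the same segment.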

      Sometimes two strategies may give exactly the same action sequence in a game segment. To deal with this problem, we now include an extra step when we fit the model to the behavior, which is described in the Methods:

      “To ensure that the fitted weights are unique (Buja et al., 1989) in each time window, we combine utilities of any strategies that give exactly the same action sequence and reduce multiple strategy terms (e.g., local and energizer) to one hybrid strategy (e.g., local+energizer). After MLE fitting, we divide the fitted weight for this hybrid strategy equally among the strategies that give the same actions in the time segments.”
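
      The bookkeeping in this extra step can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function names `merge_identical` and `split_weights` are hypothetical, the MLE fit itself is omitted, and the weights in the usage note are made-up numbers used only to show the equal split.

```python
from collections import defaultdict


def merge_identical(actions_by_strategy):
    """Group strategies that produce exactly the same action sequence.

    Returns a dict mapping each hybrid name (e.g. "local+energizer")
    to the list of its member strategies.
    """
    groups = defaultdict(list)
    for name, sequence in actions_by_strategy.items():
        groups[tuple(sequence)].append(name)
    return {"+".join(members): members for members in groups.values()}


def split_weights(hybrid_weights, hybrids):
    """Divide each fitted hybrid weight equally among its member strategies."""
    weights = {}
    for hybrid, weight in hybrid_weights.items():
        members = hybrids[hybrid]
        for member in members:
            weights[member] = weight / len(members)
    return weights
```

      For instance, if `local` and `energizer` give the same actions in a window, they are fitted as one `local+energizer` term, and a fitted weight of 0.8 for that term would be reported as 0.4 for each.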

      Moreover, as the reviewer correctly reasoned, correlations between the strategies would yield possibly more strategies. However, our finding is that the monkeys were using a single strategy most of the time. This possible false alarm would go against our claim. Our conclusions stand despite the strategy correlations.

      It is hard to believe that a monkey can maintain several strategies simultaneously since it is out of our working memory/attention capacity.

      Exactly, and we are among the first to quantitatively demonstrate that the monkeys mostly relied on single strategies to play the game.

      Reviewer #2 (Public Review):

      In this intriguing paper, Yang et al. examine the behaviors of two rhesus monkeys playing a modified version of the well-known Pac-Man video game. The game poses an interesting challenge, since it requires flexible, context-dependent decisions in an environment with adversaries that change in real time. Using a modeling framework in which simple "basic" strategies are ensembled in a time-dependent fashion, the authors show that the animals' choices follow some sensible rules, including some counterintuitive strategies (running into ghosts for a teleport when most remaining pellets are far away).

      I like the motivation and findings of this study, which are likely to be interesting to many researchers in decision neuroscience and animal behavior. Many of the conclusions seem reasonable, and the results are detailed clearly. The key weakness of the paper is that it is primarily descriptive: it's hard to tell what new generalizable knowledge we take away from this model or these particular findings. In some ways, the paper reads as a promissory note for future studies (neural or behavioral or both) that might make use of this paradigm.

      I have two broad concerns, one mostly technical, one conceptual:

      First, the modeling framework, while adequate, is a bit ad hoc and seems to rely on many decisions that are specific to exactly this task. While I like the idea of modeling monkeys' choices using ensembling, the particular approach taken to segment time and the two-pass strategy for smoothing ensemble weights is only one of many possible approaches, and these decisions aren't particularly well-motivated. They appear to be reasonable and successful, but there is not much in the paper to connect them with better-known approaches in reinforcement learning (or, perhaps surprisingly, hierarchical reinforcement learning) that could link this work to other modeling approaches. In some ways, however, this is a question of taste, and nothing here is unreasonable.

      Thanks for the suggestion. In the new revision, we include a linear approximate reinforcement learning model (LARL) (Sutton, 1988; Tsitsiklis & Van Roy, 1997). The LARL model shares the same structure as a standard Q-learning algorithm but uses the monkeys’ actual joystick movements as the fitting target. The model, although computationally more complex than the hierarchical model, achieves a worse fitting performance.

      Second, there is an elision here of the distinction between how one models monkeys' behavior and what monkeys can be said to be "doing." That is, a model may be successful at making predictions while not being in any way a good description of the underlying cognitive or neuroscientific operations. More concretely: when we claim that a particular model of behavior is what agents "actually do," what we are usually saying is that (a) novel predictions from this model are borne out by the data in ways that predictions from competing models are not (b) this model gives a better quantitative account of existing data than competitors. Since the present study is not designed as a test of the ensembling model (a), then it needs to demonstrate better quantitative predictions (b).

      We concede the point that our model, while fitting the behavior well, does not directly prove that the monkeys actually solved the task in this way. The eye movement and pupil dilation analyses partly addressed this issue, as their results were consistent with what one would expect from the model. We also hope future recording experiments will provide neural evidence to support the model.

      But the baselines used in this study are both limited and weak. A model crafted by the authors to use only a single, fixed ensemble strategy correctly predicts 80% of choices, while the model with time-varying ensembling predicts roughly 90%. This is a clear improvement and some evidence that *if* the animals are ensembling strategies, they are changing the ensemble weights in time. But there is little here in the way of non-ensemble competitors. What about a standard Q-learning model with an inferred reward function (that is, trained to replicate monkeys' data, not optimal performance)? The perceptron baseline as detailed seems very poor as a control given how shallow it is. That is, I'm not convinced that the authors have successfully ruled out "flat" models as explanations of this behavior, only found that an ensembled model offers a reasonable explanation.

      We hope the new LARL model provides a better baseline control as a flat model. It performs better than the perceptron, yet much worse than our hierarchical model. Still, we must point out that, in theory, any hierarchical model can be matched in performance by a flat model (Ribas-Fernandes et al., 2011). The advantage of hierarchical models mainly lies in their smaller computational cost for efficient planning. Even in a much simpler task such as a four-room navigation task, a hierarchical model can plan much faster than a flat model, especially under conditions with limited working memory (M. Botvinick & Weinstein, 2014). Our Pac-Man task contains an extensive feature space while requiring real-time decision-making. The result is that a reasonably performing flat model would go beyond the limits of the cognitive resources available in the brain. Even for a complex flat model such as Deep Q-Network (which can be considered similar to a flat model since it does not explicitly plan with temporally extended strategies (Mnih et al., 2015)), the game performance is much worse than that of a hierarchical model (Van Seijen et al., 2017). The performance of the monkeys was unlikely to be achieved with a flat model. In addition, we trained the monkeys by introducing the game concepts gradually, with each training stage focusing on certain game aspects. The training procedure may have encouraged the monkeys to generalize the skills acquired in the early stages and use them as the basis strategies in the later training stages when the monkeys faced the complete version of the Pac-Man task.

      Reviewer #3 (Public Review):

      Yang and colleagues present a tour de force paper demonstrating non-human primates playing a full on pac-man video game. The authors reason that using a highly complex, yet semi controlled video game allows for the analysis of heuristic strategies in an animal model. The authors perform a set of well motivated computational modeling approaches to demonstrate the utility of the experimental model.

      First, I would like to congratulate the authors on training non-human primates to perform such a complex and demanding task and demonstrating that NHP perform this task well. From previous papers we know that even complex AI systems have difficulty with this task and extrapolating from my own failings in playing pac-man it is a difficult game to play.

      Overall the analysis approach used in the paper is extremely well reasoned and executed but what I am missing (and I must add is not needed for the paper to be impactful on its own) is a more exhaustive model search. The deduction the authors follow is logically sound but builds very much on assumptions of the basic strategy stratification performed first. This means that part of the hierarchical aspect of the behavioral strategies used can be attributed to the heuristic stratification nature of the approach. I am not trying to imply that I do not think that the behavior is hierarchically organized but I am implying that there is a missed opportunity to characterize that hierarchical'ness (maybe in a graph theoretical way, think Dasgupta scores) further.

      All in all this paper is wonderful. Congratulations to the authors.

      We thank the reviewer for the encouraging comments. We have included a new flat model in the new revision for comparison against our hierarchical model and discussed other experimental evidence to support our claim.

    1. Author Response:

      Reviewer #1:

      The authors have developed a method (scQuint) for analyzing alternative splicing using scRNA-Seq. The method performs both visualization and clustering and also differential analysis, although the differential analysis modeling is not novel and has been adopted from a bulk RNAseq method (Leafcutter) and applied to pseudobulked data by grouping reads from cells within a cell type. Therefore, the method is not able to capture the true splicing variation at the single-cell level. Also, authors have only applied the method to Smart-Seq2 data and therefore it is not clear if their method is applicable to 10x data which has much higher throughput compared to Smart-Seq2 and is able to capture rare cell types but is more challenging for splicing analysis due to its 3' bias and lower coverage.

      Authors have applied their method to two mouse scRNA-Seq datasets: Tabula muris (from multiple tissues) and BICCN (from brain) and provided a comprehensive analysis of alternative splicing in mouse cell types. They have found that cell-type-specific splicing is ubiquitous in mouse cell types and splicing variation augments the total gene expression variation as there is little overlap between top differentially spliced and differentially expressed genes. They also found that a considerable fraction of cell-type-specific splicing events involve novel transcripts. They applied predictive machine learning models to show that cell types can be well distinguished by the splicing information and identified relationships between the splicing changes in known splicing factors and the splicing changes in their target genes.

      The authors provide several biological findings regarding alternative splicing at cell-type-level and have shown how scRNA-Seq (despite being underutilized for splicing analysis so far) can expand our understanding of splicing mechanisms in single cells. Additionally, authors have made their data publicly available through interactive data browsers that can serve as a resource tool for future studies.

      We thank reviewer 1 for their thoughtful consideration of our manuscript and for their comments and recommendations. We wish to briefly address three remarks made in the reviewer’s summary. First, although our differential splicing model is based on that of LeafCutter, we make several substantive changes for the setting of single-cell experiments which affect its scalability and statistical performance. Furthermore, we do not use pseudobulked data, with the exception of our splicing factor regression analysis at the end of the results section. These facts have been made clearer in the manuscript. Finally, we discuss the challenges of working with 10x data in our response to comment 5.

    1. Author Response:

      Reviewer #2:

      This study examined the effect of land-use change on soil animal food webs in Sumatra, Indonesia using datasets of stable isotopes and metabolisms of 23 animal groups. They found that the calibrated 13C values of soil animals are generally higher in the rainforests compared to those in the plantations, and multidimensional metrics of the soil animal community (e.g., isotopic dispersion) differed across the land-use types. They also showed that the community food web metrics are influenced by environmental variables (e.g., soil pH and tree density and species richness). These results demonstrate that the conversion of rainforest to plantations could affect not only the above- but also the below-ground components of the ecosystems and likely the ecosystem functions.

      Strength:

      This is the first attempt to investigate the effect of land-use changes in soil food web structure with stable isotope and metabolism datasets. This study is based on a great amount of data of high quality (biomass and metabolism rates, high taxonomic resolution for some taxa) and collected from four land-uses, which were achieved through a well-organized project. It is also noteworthy that the measurement of stable isotopes of tiny soil invertebrates required the improvement of the continuous flow isotope ratio mass spectrometer. The collaborations among international groups and across different disciplines shows a direction to be followed in studies on global environmental issues.

      Weakness:

      The only weakness of this study is that the isotopic measurements were conducted on the high-rank taxonomic group level (order or family) based on an assumption that "the high-rank animal taxa in soil are generally consistent in their isotopic niches and reflect the trophic niches of species in most taxa" (L164-165). Although I am not sure if I understand this assumption correctly, the authors consider that different species in some taxa should have similar isotopic (trophic) niches. If so, it is difficult to accept this assumption because previous studies have already reported large variations in C and N isotope ratios (~ 6 permil) within the order or family (e.g., oribatids, collembola, ants, and earthworms). The isotopic variation exceeds those observed across the land-use types. Because of the considerable variation within the high-rank taxa, which species were included in the mixed samples (only 3 to 15 individuals for each taxonomic group, Line 176) should have affected the present results and thus the key claims in this paper. Therefore, it would be necessary to scrutinize whether the samples used for the isotopic analyses could represent the high-rank taxa and whether the isotopic values presented in this study are understandable in light of the previous knowledge of their biology. I suppose that this could be a main focus of this study. Based on the compiled isotopic datasets, previous work (Potapov et al. 2019, Funct. Ecol.) successfully shows that the high-rank taxa can be treated as trophic nodes in food web studies. However, the work does not demonstrate that the isotopic values of relatively few individuals of a high-rank taxon could be used as the representative values of the taxon.

      For the justification of using stable isotopes of high-rank taxonomic groups, please see our response to this point in ‘Essential Revisions’ above. For Oribatids and Collembola we had 15 individuals, and we include 829 samples to represent Oribatida, and 202 samples to represent Collembola. Soil ants and earthworms were typically represented by a single (few) dominant species per site in our study, which were represented with our measurements (see https://link.springer.com/10.1007/s10530-021-02539-y for earthworms and https://iopscience.iop.org/article/10.1088/1755-1315/771/1/012031 for ants).

    1. Author Response:

      Reviewer #1:

      In this work Warneford-Thomson et al. developed an approach for surveillance screening for SARS-CoV-2, which involves the isothermic amplification of a region of the SARS-CoV-2 nucleocapsid gene using RT-LAMP, followed by detection with deep sequencing. High-throughput and cost effectiveness is achieved by two sets of barcodes that allow up to about 37,000 samples to be combined into one deep sequencing run. Moreover, the authors demonstrate they can do the detection from saliva collected on paper, which should make sample collection easier.

      The main strength of the work lies in the technical aspects, including setting up multiple controls such as a detection of a human gene, and multiplexing with detection of the influenza virus.

      The main weakness is that there are multiple other papers either published or archived that use RT-LAMP for SARS-CoV-2 detection, deep sequencing for SARS-CoV-2 detection, or both. These are cited in the current work, which is very well written and presented. Whether this method is better than the others which have the same aim of developing cost-effective and high-throughput detection is not conclusively demonstrated as only 8 clinical saliva samples are examined.

      We do not wish to claim that our method is better than the others. We think it has advantages and disadvantages and certainly it should be further optimized before scaling it up to population level. We have added these considerations to the text (lines 376–80).

      Furthermore, the requirement for deep sequencing and batching many samples for cost-effectiveness will, in most situations, greatly increase turn-around time. This will make surveillance much less effective, since by the time results are fed back, the asymptomatically infected individual would have had more opportunity to transmit the infection to others.

      We argue that time from sample to result is mostly a function of logistics and not of the method. With a proper setup, the time from sample collection to results could be < 16 hours, which would be compatible with population-level surveillance. We added these considerations to the text.

      However, the deep sequencing step may be very useful for surveillance of circulating SARS-CoV-2 spike sequences to detect emerging variants within a population, provided this method can be modified to do it.

      We agree and we mention this possibility in the discussion.

      Reviewer #2:

      In 'COV-ID: A LAMP sequencing approach for high-throughput co-detection of SARS-CoV-2 and influenza virus in human saliva', Warneford-Thomson et al. present a novel methodology to perform large numbers of COVID-19 tests in parallel. Their approach takes unprocessed saliva and requires only a small number of experimental steps before the results are sequenced overnight to generate many thousands of results. This straightforward experimental design should allow the protocol to be expanded to a number of settings where population-level monitoring is required in order to contain outbreaks and reduce transmission. In this paper, the authors demonstrate the efficacy of their approach and perform a large number of benchmarking experiments to quantify its sensitivity, specificity and limitations of detection. They are able to detect artificially created infections (spike-ins) with as low as 5 virions per µL and all clinically available samples agreed with the standard RT-qPCR test. This method can detect both SARS-CoV-2 and Influenza infection and can also be applied to saliva samples which have been collected on filter paper, a strategy which will further simplify the testing regime.

      The authors have spent much time testing this approach but these have largely been limited to analysing artificially created infections. The only results which were obtained were from eight clinically derived samples which are presented in Figure 2E. Although all results from this approach agreed with the standard clinical test this is a small number of tests compared to the total number of tests which are reported in this paper. It is also only a small proof-of-principle experiment to justify a quick rollout of this technology.

      We have now performed COV-ID on 120 additional patient samples (new Figure 2-figure supplement 2). These new results are described in the text.

      The potential for this technology to perform rapid, high-throughput SARS-CoV-2 testing alongside the potential for very low sequencing costs (Figure 4G) is impressive. It is noted in the manuscript that this will require 96 unique barcodes but only 32 are tested here. All but three of these 32 work for the SARS-CoV-2 N2 primers and required STATH control but how will the remaining 67 primers be derived (i.e. is it realistic that this can be made to work to deliver the promise of this approach)?

      The current COV-ID patient barcodes are 5 base pairs long. This allows for 4^5 = 1,024 combinations. Out of an abundance of caution, we excluded barcodes with homology to the reverse complement of the RT-LAMP primers used in any of the experiments (i.e. primers for SARS-CoV-2 N2, STATHERIN, ACTIN, and influenza virus) and then selected a set of 32 with Hamming distances of at least 2 from each other. This is now described in more detail in the methods.

      Regarding the numbers, out of 1,024 5-bp barcodes, 404 were removed due to homology, leaving 620. Of these, we could find at least 163 with Hamming distance ≥ 2 from each other. Even with a substantial failure rate, this should allow for 96 working barcodes. If we had only considered clashes with N2 and STATHERIN primers, the number of available barcodes would be substantially higher.
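
      The filtering logic described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure used in the paper: the primer sequence is a made-up placeholder (not a COV-ID primer), "homology" is approximated as an exact match of the barcode inside the reverse complement of a primer, and the set is chosen greedily. With these placeholders the counts will not reproduce the 404 / 620 / 163 figures quoted above.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical placeholder sequence, NOT a real RT-LAMP primer.
PLACEHOLDER_PRIMERS = ["ACGTTGCAAGGCTTACGGAT"]


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def all_barcodes(length=5):
    """Enumerate every barcode of the given length (4^5 = 1,024 for length 5)."""
    return ["".join(p) for p in product(BASES, repeat=length)]


def clashes(barcode, primers):
    """Assumed homology rule: barcode occurs in a primer's reverse complement."""
    return any(barcode in reverse_complement(p) for p in primers)


def select_barcodes(candidates, min_distance=2):
    """Greedily keep barcodes at least min_distance from all already chosen."""
    chosen = []
    for bc in candidates:
        if all(hamming(bc, other) >= min_distance for other in chosen):
            chosen.append(bc)
    return chosen


candidates = all_barcodes()
usable = [bc for bc in candidates if not clashes(bc, PLACEHOLDER_PRIMERS)]
selected = select_barcodes(usable, min_distance=2)
print(len(candidates), len(usable), len(selected))
```

      A greedy pass is order-dependent and is not guaranteed to find the largest possible set, so a real design would likely also screen the selected barcodes empirically, as the response notes.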

      Overall, this is an interesting paper which has very clear real-world application to helping to defeat the ongoing COVID-19 pandemic, but some extra validations are needed to fully demonstrate its performance in clinical and/or public health settings.

    1. Author Response:

      Reviewer #1:

      The experiments are well designed, generally well controlled, and carefully conducted, and are thoughtfully and appropriately discussed. The authors make conclusions that are well supported by their results.

      When describing the aptamer knockdown of the PPS, the authors explain that the western blot was too noisy for monitoring the knockdown, which is frustrating for the reader and must have been frustrating for the authors. The authors instead counter-intuitively use qRT-PCR to monitor the transcript abundance of the PPS transcript in the aptamer system - this aptamer system is thought to be a modifier of protein, not transcription or transcript abundance. The authors describe that this has been seen once before (using aptamer knockdown of PfFis1), and the authors of that study speculate that the TetR-DOZI aptamer might be degrading the target mRNA. This is a plausible explanation, but it isn't quite clear from the description how this experiment was performed. The authors explain that the knockdown parasites grew normally for three days, but the parasites may be becoming sicker over this period. It's therefore possible that the decrease in PPS mRNA abundance is a product, rather than a cause of the growth defect. Sick or dying parasites could plausibly impact the PPS differently to the two chosen controls, particularly since both control genes chosen have substantially longer half-lives than the PPS mRNA (according to the Shock and DeRisi datasets). I therefore suggest that this experiment be performed in an IPP rescue scenario (where the parasites aren't dying) with biological replicates. There is no explanation of the replicates here, but the error bars in 6C are implausibly small for real biological replicates.

      To address these concerns, we have added western blot data showing down-regulation of PPS expression in -aTc +IPP conditions, relative to a loading control. We have also repeated the growth assay and RT-qPCR experiment (in biological triplicate) under IPP-rescue conditions. Parasite samples harvested on day 3 of the IPP-rescue assay were analyzed by RT-qPCR and show reduced PPS mRNA abundance that is similar to (and slightly lower than) that observed without IPP supplementation. This similarity is not surprising to us, since the day 3 harvest in the original growth assay (without IPP) was 3 days before observing a parasite growth defect in -aTc conditions. With respect to the mechanism of transcript loss in the aptamer/TetR-DOZI system, the fate of transcripts in this system has not been investigated in depth. However, DOZI is believed to target bound mRNA to P-bodies, which are a known site of mRNA degradation in cells. We have unpublished data with multiple parasite proteins tagged with the aptamer/TetR-DOZI system. In all cases, we see strong reductions in mRNA abundance in -aTc conditions, suggesting that such decreases are a general property of this knockdown system.

      Line 342 "These results directly suggest that apicoplast biogenesis specifically requires synthesis of linear polyprenols containing three or more prenyl groups." - I think that this might be overinterpreting those results - there could be a number of different reasons why polyprenols of different sizes do or don't rescue, including different solubility, diffusion, availability of transporters, predisposition to break down to useable subunits. Perhaps this needs a caveat.

      We have modified the text here to remove “directly” and to acknowledge uncertainty in beta-carotene uptake: “Although it is possible that β-carotene is not taken up efficiently into the apicoplast, rescue by decaprenol, which is similar in size and hydrophobicity to β-carotene, suggests that apicoplast biogenesis specifically requires synthesis of linear polyprenols containing three or more prenyl groups.” We have also added the statement that “this hypothesis is further supported by additional results described in the next two sections”, referring to our identification of an apicoplast-targeted polyprenyl synthase.

      Line 361 " the cytosolic enzyme, PF3D7_1128400" - I don't think we know the localisation of this protein based on the published data. The Gabriel et al study makes it clear the protein isn't apicoplast or mitochondrial, but it is punctate at stages in a pattern that doesn't look to me to be a straightforward cytosolic localisation (and the original authors don't describe it as cytosolic).

      We agree that the localization of PF3D7_1128400 requires further investigation. The Gabriel study, which (surprisingly) is the only study we found that has examined localization of this protein by microscopy, observed diffuse signal in trophozoites consistent with cytoplasmic localization, in addition to focal, punctate signals in schizonts that were distinct from the apicoplast or mitochondrion. The authors described their results as, “Analysis by fluorescence microscopy of live parasites confirms expression along the intra-erythrocytic cycle and shows FPPS/GGPPS localization throughout the cytoplasm and also forming spots, which increase in number as parasites mature from trophozoite to schizont stages.” For simplicity we referred to FPPS/GGPPS localization as cytoplasmic but agree that available data suggest a more complex localization that requires further studies to understand. We have modified the text to indicate that available data suggest a complex cellular distribution that includes both the cytoplasm and additional sub-cellular foci outside the apicoplast and mitochondrion.

      Line 423 "with strong prediction of an apicoplast-targeting transit peptide but uncertainty in the presence of a signal peptide". I don't think this describes well the bioinformatic analysis of the N-terminus. Although the experimental data are convincing that this is an apicoplast-targeted protein, bioinformatically this would not be predicted as an apicoplast protein. There is no obvious signal peptide, and "uncertainty" is too vague a descriptor. None of the versions of signalP, nor psort, predict this as possessing a signal peptide (which by definition means that PlasmoAP absolutely rejects it), and there is no obvious hydrophobic segment at the N-terminus that we would normally expect of a signal peptide. The toxoplasma hyperlopit doesn't suggest that the Toxoplasma orthologue is apicoplast, and the protein isn't found in the Boucher et al apicoplast proteome. This is somewhat of a mystery. It doesn't diminish the solid localisation data, with the excellent complementary data from IFA as well as the doxycycline+IPP experiment, but it should be pointed out clearly that this localisation isn't to be expected from the sequence analysis.

      We thank the reviewer for this perspective and agree that SignalP is unable to identify a signal peptide at the N-terminus of PPS. We have modified the text to remove our description of “uncertainty” and explicitly state that SignalP is unable to identify a canonical signal peptide at the N-terminus of PPS.

      We note that multiple proteins detected in the Boucher et al. apicoplast proteome also lack an identifiable signal peptide by SignalP yet are clearly imported into the apicoplast. These proteins include the key MEP pathway enzymes DXR (PF3D7_1467300) and IspD (PF3D7_0106900), holo ACP synthase (PF3D7_0420200), FabB/F (PF3D7_0626300), and the E1 subunit of pyruvate dehydrogenase (PF3D7_1446400). Thus, apicoplast import despite lack of identifiable signal peptide by SignalP is not unique to PPS but general to multiple (if not numerous) apicoplast-targeted proteins. These observations suggest to us that protein N-termini in Plasmodium can have sequence properties compatible with ER targeting that are broader and more heterogeneous than those of the other eukaryotic organisms that comprise the training sets upon which SignalP is currently based. It remains a future challenge to fully understand these properties.

      With respect to the lack of PPS detection in the Boucher et al. apicoplast proteome, PPS appears to have a very low expression level and unusual solubility properties that require overnight extraction of parasite pellets in 2% SDS (or LDS) for detection. In our experience, the RIPA extraction conditions (which contained 0.1% SDS) used in the Boucher et al. study are insufficient to solubilize PPS, which may explain lack of PPS detection in their study.

      To explicitly address these questions regarding PPS targeting to the apicoplast, we have added a new section to the Discussion to explore PPS targeting in the absence of a recognizable signal peptide, its unusual solubility properties and lack of detection in the Boucher et al. proteome, and planned future studies to further test, refine, and understand targeting determinants.

      With respect to Toxoplasma, T. gondii appears to also express two polyprenyl synthase homologs, TGME49_224490 and TGME49_269430, that are ~30% identical (in homologous regions) to PF3D7_1128400 (FPPS/GGPPS) and PF3D7_0202700 (PPS), respectively. TGME49_224490 appears to be targeted to the mitochondrion in T. gondii (based on MitoProt and HyperLOPIT analysis), in contrast to its P. falciparum homolog, PF3D7_1128400, which localizes to the cytoplasm and other cellular foci outside the mitochondrion. TGME49_269430 does not appear to target the apicoplast in T. gondii (based on HyperLOPIT data), which contrasts with our determination of apicoplast targeting for the P. falciparum homolog, PF3D7_0202700. These differing localizations may suggest distinct cellular roles for these homologs in T. gondii compared to P. falciparum. We are also aware of a recent study (Pubmed 34896149) showing that loss of MEP pathway activity in T. gondii (due to loss of apicoplast ferredoxin) does not impact apicoplast biogenesis, in contrast to our observations in P. falciparum based on FOS treatment, DXS deletion, and PPS knockdown. These distinct phenotypes further suggest differences in isoprenoid utilization and metabolism between T. gondii and P. falciparum that remain to be understood. We have added a new section to the Discussion to address these considerations.

      The section after line 344 "Iterative condensation of DMAPP with IP…", up until line 377 doesn't sit well within the section that has the heading "Apicoplast biogenesis requires polyprenyl isoprenoid synthesis". I suggest either creating a separate subheading for this material, or moving it into the start of the subsequent section "Localization of an annotated polyprenyl synthase to the apicoplast.".

      We thank the reviewer for this suggestion, which we have followed. We have moved the referenced text to the beginning of the subsequent section to better align the text with that section heading.

      Reviewer #2:

      Minor comments:

      The authors emphasize that this study reveals a previously unnoted interconnection between apicoplast maintenance and pathways that produce an output from the apicoplast to serve the cell. But is the prevailing view really that these two are separate? Isn't the interconnection already clear from many other studies and observations? E.g., the fatty acids produced inside the apicoplast provide membrane- and lipid- precursors for the rest of the cell as well as for the apicoplast itself (Botte et al., PNAS, 2013) (although not essential in Plasmodium blood stages). Other pathways that function inside the apicoplast such as the Fe-S cluster synthesis are critical to support enzymes that provide exported metabolites (e.g., IPP synthesis, IspG/H) and function in maintenance (e.g., MiaB) (Gisselberg et al., PLoSPath, 2013). Perhaps the authors could tone this conclusion down and acknowledge that maintenance and output are interconnected in other cases, which have been acknowledged in the literature.

      We thank the reviewer for this perspective and agree that in Toxoplasma as well as in mosquito- and liver-stage Plasmodium there are multiple apicoplast outputs (i.e., metabolic products exported from the apicoplast) that contribute to parasite fitness, including IPP, fatty acids, and coproporphyrinogen III. To clarify, we are specifically referring to blood-stage Plasmodium in our manuscript, when heme and fatty acid synthesis are dispensable and where the prior literature has intensely focused on IPP as the key essential output of the blood-stage apicoplast and consistently stated that IPP is not required for organelle maintenance.

      We agree that prior work has firmly established that apicoplast housekeeping functions (e.g., synthesis of proteins and Fe-S clusters) are required for organelle maintenance and to support IPP synthesis. However, our work is the first to demonstrate in blood-stage Plasmodium that the reverse is also true- that IPP as an essential apicoplast output is also required for organelle maintenance and that apicoplast maintenance and IPP synthesis are thus reciprocally dependent. We have modified the Discussion section to clarify these points and to explicitly acknowledge that apicoplast maintenance and other metabolic outputs may also be interdependent in Toxoplasma and other Plasmodium stages.

      Could the authors elaborate more on the leader sequence predicting apicoplast localization for the PPS characterized here and discuss why it might have been missed in previous detailed study of apicoplast localised proteins (Boucher et al., PlosBiol, 2018)?

      Please see our response above to Reviewer #1.

      Could the authors discuss conservation of the PPS gene(s) in other Apicomplexa with (e.g., T. gondii) and without (e.g., Cryptosporidium spp.) an apicoplast? This could be relevant for other people in the field and could give further insights into the enzyme's role in apicoplast maintenance.

      Please see our response above to Reviewer #1. Polyprenyl synthases are diverse enzymes that perform a variety of cellular functions, whose specific roles can differ between organisms. Although the two Plasmodium prenyl synthases show preferential homology with each of two different prenyl synthase homologs in Toxoplasma and Cryptosporidium (CPATCC_003578 and CPATCC_001801), the differing localizations of these homologs in each parasite suggest differing cellular roles. The differing dependence of apicoplast biogenesis on MEP pathway activity in T. gondii and P. falciparum and the absence of an apicoplast in Cryptosporidium further support differences in isoprenoid utilization and metabolism in these organisms. We have added a new section to the Discussion to address these considerations.

      Reviewer #3:

      The paper is very nicely written and was a true pleasure to read. The introduction is concise yet dense with all relevant background of our current understanding of functioning of the apicoplast in relation to IPP production and utilization. The rationale of the experiments and the interpretation of the results are presented clearly and everything is discussed well in the context of the current understanding of the field. The main conclusion of the paper that isoprenoid is not solely essential for critical functions elsewhere in the cell, such as prenylation-dependent vesicular trafficking but also for apicoplast biogenesis via its processing by an essential polyprenyl synthase conserved with plants and bacteria is well substantiated and very exciting. The authors demonstrate an equally beautiful and clever use of available and newly generated genetic mutants in combination with complementary pharmacological interventions and metabolic supplementation. There are no true major weaknesses that could jeopardize the conclusions or change the interpretation of the results. However, the authors do consistently perform statistical analyses on data obtained from individual cells obtained in no more than two independent experiments, which in my humble opinion does not qualify for statistical analysis. That said, the results are so clear-cut that no statistics are required to convince me, or to quote Ernest Rutherford: "If your experiment needs statistics, you ought to have done a better experiment."

      We thank the reviewer for these positive comments and suggestions. For growth assays, we have performed a third biological replicate and updated those figures and the indicated statistical analyses. For microscopy experiments, we have removed p values.

    1. Author Response:

      Reviewer #1:

      Significance: A central puzzle in evolutionary biology (and philosophy of biology) is the evolution of new (collective) entities that can evolve on their own right (e.g. the evolution of multicellular organisms from single cells). These evolutionary transitions are often conceptualized in terms of fitness decoupling (a fitness increase of the collective even as the fitness of the component particles decreases). Using a life-history model, the authors show that fitness decoupling is not possible when the conditions for fitness are the same. Thus, this paper has the potential to change how we think about the evolution of new collective entities.

      Strengths: This paper is conceptually rich and the overall argument is clear. Re-analyzing previous data/models using their new framework highlights new patterns of fitness change in these transitions of individuality, and as such, it provides novel and exciting avenues of research.

      Weaknesses: While the overall argument is clear, some of the details can be hard to follow (even as someone familiar with the literature). The initial description of their model is fairly clear, but given its conceptual novelty, the paper does not spend enough time developing the different concepts of fitness at the particle level.

      Moreover, it is not entirely clear what is at stake: what is the role of fitness decoupling in our understanding of fitness transitions? And how does the proposed mechanistic ("trade-off breaking") model serve as a replacement? It seems to me like trade-off breaking is a characteristic of many evolutionary innovations, not only of major transitions. It seems even possible to envision groups that allow for an escape in a trade-off without leading to the evolution of a new "Darwinian" individual.

      For example, one could conceive of a trade-off in zebras between time spent foraging and protection against predators. Coming together temporarily as a group is likely to allow for values outside this trade-off space (similar to those in Fig. 6). One could even imagine a new mutation that makes zebras switch activities (foraging/watching) depending on their position within the group. This mutation is only available to zebras that form groups (the phenotype does not exist in the absence of a group). But I would still want to argue that there is more to the evolution of new levels of individuality. Trade-off breaking seems (potentially) a necessary, but not sufficient step in these transitions.

      And while the language of the authors is careful to not suggest sufficiency, it is not entirely clear how this approach helps us understand the particularity of these transitions.

      Reviewer #1 asks first to clarify the stakes: what is the role of fitness decoupling in the explanation and how does tradeoff-breaking replace or supplement it? Second, they requested us to make a statement about the necessary or sufficient nature of tradeoff-breaking.

      With respect to the second point, we argue that tradeoff breaking is not sufficient, but is probably necessary for an ETI to occur.

      Let us now clarify the role of fitness decoupling and tradeoff breaking in the explanation of ETIs. It must be stressed that tradeoff breaking does not “replace” fitness decoupling; rather, tradeoff breaking is an event that cannot be understood readily in the framework of fitness decoupling. Thus, we claim that ETIs are better understood when seen through the lenses of traits and the evolutionary constraints that link them (i.e., tradeoffs) than via the export-of-fitness model (i.e., fitness decoupling). To illustrate this, we use the zebra herd example proposed by the reviewer. Coming together temporarily as a collective does not, in itself, constitute a tradeoff-breaking event, but rather simply a collective-formation event (similar to the first ace2 mutation in snowflake yeast or the first WS mutation in the Pseudomonas system). From this starting point, a number of mutations (i.e., changes in trait values) can be fixed in the population that improve the performance of zebras within this environment. This is the “fast” part of the evolutionary trajectory that occurs on the ancestral tradeoff, which we called “low hanging fruit mutations” in the manuscript. As a consequence, “optimal herds within the ancestral tradeoff” evolve. As stated in the manuscript, if we assume that the tradeoff on traits is identical for lone zebras and zebra herds and also assume that the ancestral lone zebras exhibit trait values that are optimal (within these constraints) for lone zebras, it follows that the low-hanging fruit mutations that improve the zebra herd will probably reduce counterfactual fitness. This lowering of counterfactual fitness is not due to a “transfer” between real and counterfactual fitness (because there is nothing to transfer between real and counterfactual worlds), but is a consequence of the differential contribution of the traits involved in the tradeoff to the two fitness quantities. However, this specificity of the tradeoff might be significant because it could lead to stabilisation of the new collectives through ratchetting.

      There is, indeed, “more to the evolution of new levels of individuality,” as pointed out by Reviewer #1. We claim that it involves rare mutations that would overcome the ancestral constraint and call them “tradeoff breaking mutations”. Tradeoff-breaking mutations are not bound by ancestral tradeoff; therefore, there is no a priori theoretical or biological reason to think they would have any positive or negative effect on counterfactual fitness. Here, we must stop using the zebra herd example because no tradeoff-breaking mutation occurred. However, the tradeoff-breaking lineages in the Pseudomonas example exhibit an improvement of both counterfactual and within-collective fitness. This observation does not fit within an export-of-fitness framework, but makes perfect sense in a traits-based view of ETIs—as a tradeoff-breaking mutation.

      Reviewer #2:

      This work reviews the influential "fitness decoupling" heuristic for understanding evolutionary transitions in individuality (ETIs), describes some of its limitations, and clarifies its interpretation. The review of the fitness decoupling account capably describes an interpretation of this framework that has frequently occurred in the literature, for example in Okasha 2006, Godfrey-Smith 2011, Hammerschmidt et al. 2014, Black et al. 2019, and Rose et al. 2020. However, it does not address the interpretation advanced by its authors, Richard Michod and colleagues, which they have clarified in several papers cited in the present work. Michod and colleagues have argued that the fitness decoupling account describes a changing relationship between the fitness of groups and the "counterfactual" fitness of their component cells, that is, the fitness the cells would have if they were removed from the group. This point is made explicitly in Shelton & Michod 2014 and Shelton & Michod 2020 and was present (though perhaps not as obvious) in Michod 2005 and later works, in contrast to the claim in the Glossary that this is a "relatively recent development of the fitness decoupling literature." The interpretation that Michod embraces is similar to what is here described as f2, the fitness of a "theoretical mono-particle collective", but that interpretation is not mentioned in the present work until Section 2.3. It is possible that an argument could be made that Michod and colleagues have not consistently interpreted fitness decoupling this way, or have made statements inconsistent with this interpretation, but no such argument is present in this work. Thus the impression conveyed is that Michod and colleagues consider decoupling of "commensurably computed fitnesses" possible, which is counter to their explicit statements on the topic.

      The description of the limitations of the fitness decoupling heuristic (Section 2) is useful and goes a considerable distance toward clarifying the ways in which fitness decoupling can rigorously be interpreted. However, the final assessment (Section 2.3) does not make a compelling case for its central argument, the lack of utility of the fitness decoupling concept. Elsewhere in the work, the ratcheting model of Libby and colleagues is referenced in comparison to the tradeoff-breaking approach, but Section 2.3 does not acknowledge the relationship between Libby and colleagues' model and the counterfactual interpretation of the fitness decoupling heuristic. For example, the argument in Libby and Ratcliff 2014 that "If any of the yeast that evolved high rates of apoptosis within clusters were to leave the group and revert to a unicellular lifestyle, they would find themselves at a competitive disadvantage relative to other, low-apoptosis unicellular strains." and in Libby et al. 2016 that "…if G cells were to revert to unicellular I cells, they would be quickly outcompeted" are counterfactual fitness arguments essentially similar to that of Shelton and Michod 2020 that "the fitness a cell would have on its own declines as the transition progresses." Section 2 makes a convincing case that commensurable fitnesses cannot be decoupled, but by fixating on commensurability, which is not relevant to the counterfactual interpretation of fitness decoupling, Section 2.4 fails to make a convincing case that "fitness-decoupling observations do little to clarify the process of an ETI." That is, "because they are not commensurable" does little to explain why the counterfactual interpretation of fitness decoupling "does little on its own to clarify the process of an ETI," since commensurability is not a claim that the counterfactual interpretation of fitness decoupling makes.

      We agree with the reviewer on two essential points: (1) the decoupling of commensurably computed fitness is impossible when collectives have a finite size and (2) counterfactual fitness is not commensurable to particle or collective fitness.

      While we recognise that Michod and collaborators did clarify that fitness decoupling referred to counterfactual fitness (although, to us, this becomes clear from 2015 onward), we argue that the fitness transfer (or export-of-fitness) metaphor implies (by its wording) a commensurability of fitnesses that undermines this welcome clarification.

      Indeed, for a quantity to be transferred from one place—or component—to another, the source and destination must be commensurable. It is incorrect to talk about a transfer between counterfactual and actual quantities. A better choice of words to discuss the relative change of counterfactual and actual quantities would avoid the physical transfer metaphor and focus instead on the correlation of the two quantities. It must be noted that, despite the clarification of counterfactual fitness, the word “transfer” continues to be used in recent work (Davison & Michod, 2021).

      This may seem like nitpicking; however, there is a real advantage in being careful about this. We do agree that, under some assumptions, counterfactual fitness would decrease while whole–life cycle particle fitness (or collective fitness) increases. From there, one might ask: what needs explaining? If one assumes an export-of-fitness framework, the transfer of fitness explains why it cannot be otherwise. If fitness decreases on one side, it must increase on the other. In other words, the existence of a tradeoff is taken for granted based on the improper physical metaphor. While there are strong reasons to think that such tradeoffs exist, they should be assessed in their own right and on a case-by-case basis rather than being assumed to hold. Otherwise, there is no way to make sense of the tradeoff-breaking scenario described in Section 4.

      By the same token, the metaphor of “decoupling” often associated with the export-of-fitness model is misleading because it is used to describe a part of the evolutionary dynamics where counterfactual particle fitness and whole–life cycle particle fitness are strongly dependent on one another (even if their changes are anticorrelated), through the existence of the tradeoff.

      Nevertheless, we welcome the reviewer’s urge to clarify our position and how this relates to Michod and colleagues’ counterfactual fitness proposal.

      The model based on trade-offs and trade-off breaking is useful and likely to be of interest to theorists interested in ETIs. The observation that this model can reproduce the (counterfactual) fitness-decoupling observation is useful in showing how the two models relate. The result that counterfactual fitness decoupling is a consequence rather than a cause of the evolutionary dynamics is an important point (though perhaps obvious in retrospect, since counterfactuals, things that do not happen, can't be the causes of anything).

      The caution in Section 3.3 that "the same [counterfactual fitness decoupling] observation will be made in any situation in which short-term costs are compensated by long-term benefits, not solely during ETIs" is a good point, and it sets up the argument that trade-off breaking is a "genuine marker for an ETI". However, no convincing case is made that the same criticism, that the observed phenomenon is not unique to ETIs, is not equally true of trade-off breaking. Some nice examples of trade-off breaking in the context of ETIs are given, but these do not amount to an argument that trade-off breaking is only observed during ETIs. The life history literature includes examples of trade-off breaking that are not related to ETIs, so it is not clear that trade-off breaking is either a reliable indicator of ETIs or superior in this respect to counterfactual fitness decoupling.

      This point is in line with one of the points made by Reviewer #1. We have now clarified our position with respect to the generality of the tradeoff-breaking approach.

      In the Discussion, the "inconveniences" associated with the fitness decoupling are cogent limitations of this heuristic. The "impossibility of decoupling between commensurable measures of fitness" is an important result, but it is not new and should thus probably not be presented as "[o]ur first main finding". Shelton and Michod 2014 includes a mathematical proof in the appendix that, given the model assumptions, "consideration of the births and deaths of colonies gives us exactly the same bottom line (fitness) as consideration of the births and deaths of lone cells." The second main finding, that "fitness decoupling observations cannot be reliably used as a marker for ETIs," is valid, but as described above, a convincing case is not made that trade-off breaking can be reliably used in this manner, either. Trade-off breaking may, however, be a useful way to think about ETIs in the other ways that are suggested, for example as key events and as stepping stones to new hypotheses.

      We have now clarified our position.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this paper, Qin et al. investigated the molecular mechanism of phospholamban (PLN) linked dilated cardiomyopathy (DCM), using structural approaches combined with biophysical measurements. Structures of the catalytic domain of protein kinase A (PKAc) in complex with PLN peptides (both wild-type and the R9C and A11E DCM mutants) provide insights into the mechanism of substrate recruitment and how it is perturbed in the disease state. Qin et al. show convincingly that the mutant peptides all have lower affinity for PKA than the wild-type peptide, suggesting models in which heterozygous DCM mutations act via sequestering PKA and thereby preventing phosphorylation of the wild-type peptide may be incorrect.

      The authors highlight significant differences between their structure of the WT-PLN:PKAc complex, which has a 1:1 stoichiometry, and a previous structure of the complex (PDB 3O7L), which has 1 PLN bound between two PKAc monomers (a 1:2 complex). The authors posit that the stoichiometry observed in 3O7L is an artifact of the crystal lattice, and does not occur in solution, supporting this with analysis of the elution volumes of the peptide complexes on size exclusion chromatography compared to PKAc alone. They further suggest that the AMP-PNP ligand included in the 3O7L structure is not bound, based on analysis of Fo-Fc maps calculated from the deposited coordinates. Inspecting 3O7L I am not convinced of this last point - it seems more likely that a technical error was made in assigning or refining the B-factor of the ligand in 3O7L, because there is clearly density present in SA-omit maps for the nucleotide.

      Taking these results together, the authors suggest a mechanism for DCM, whereby mutations in PLN result in lower affinity for PKA, and consequently reduced phosphorylation. This seems plausible and well supported by the data, although in the ADP-Glo assay used here, the reductions in phosphorylation observed for some of the mutant peptides are rather modest. However, as the authors state, it is plausible that even relatively subtle changes in PLN phosphorylation could have substantial effects on Ca2+ homeostasis via increasing SERCA inhibition.

      We thank the reviewer for the appreciation of our work.

      Reviewer #2 (Public Review):

      Strengths:

      The authors presented new high-resolution 3D crystal structures of the PKA catalytic domain (PKAc) in complex with PLN WT or mutant peptides (residues 8-22) containing the DCM-associated PLN mutations (R9C or A11E). These are novel and important data given that the present structures are dramatically different from those reported previously. The authors made a convincing argument that the 3D model reported previously may result from a crystallization artifact.

      By characterizing the interactions between the PKAc domain and PLN WT or DCM-associated mutant peptides using surface plasmon resonance (SPR) analysis, the authors convincingly showed that the DCM-associated PLN mutations at positions 9, 14, and 18 alter the conformation of the PLN peptide and reduce the binding affinity of the PLN peptide for PKAc. These data provide an explanation of how some DCM-associated PLN mutations at these positions reduce the level of PKA-dependent phosphorylation of PLN.

      The authors also performed nuclear magnetic resonance (NMR) to determine the structural dynamics of PLN WT, R9C, P-Ser16, and P-Thr17 peptides. These NMR structures combined with the SPR analysis also support their conclusion that PLN phosphorylation and DCM-associated PLN mutations have an impact on its conformation.

      We thank the reviewer for the comments.

      Weakness:

      The present study used PLN-derived peptides (aa 8-22). Although technically challenging, it is important to consider if the full-length WT or mutant PLN will behave the same as those observed with the peptides. This is especially crucial in light of the prior work showing substantially different structures using a different segment of PLN.

      We are fully aware of the risk of drawing conclusions from an isolated peptide rather than from full-length PLN, which is a transmembrane protein. Previous studies showed that the PLN peptide can serve as a good model substrate that is phosphorylated as efficiently as the full-length PLN protein (L. R. Masterson et al., Dynamics connect substrate recognition to catalysis in protein kinase A. Nat Chem Biol 6, 821-828 (2010); D. K. Ceholski, C. A. Trieber, C. F. Holmes, H. S. Young, Lethal, hereditary mutants of phospholamban elude phosphorylation by protein kinase A. The Journal of biological chemistry 287, 26596-26605 (2012)). These results, together with our biochemical results, suggest that the tail peptides are indeed active substrates of PKA. Owing to technical difficulties, we were not able to crystallize PKAc in complex with full-length PLN. To explain the potential difference between the peptides and full-length PLN, we added the following text to the discussion section: “Additionally, the trend of the reduced phosphorylation by DCM mutations can be significantly affected by the oligomerization state of PLN. Ceholski et al. showed that R9C severely inhibits PKA phosphorylation in the context of full-length pentameric PLN, but has a much milder effect in the context of full-length monomeric PLN or an isolated tail peptide [41].”

      Although it is convincing that DCM-associated PLN mutations likely reduce the interaction between PKAc and PLN (assuming that the peptides behave the same as the full-length PLN with respect to interaction with PKA) and, as a result, the PKA dependent phosphorylation of the mutant PLN, it is unclear how this impaired interaction between PKA and PLN mutant could explain the effects of the DCM-associated PLN mutations on SERCA function (either reduced or enhanced PLN-dependent inhibition of SERCA, as proposed previously). In this regard, can the authors predict if the DCM-associated PLN R9C mutation reduces or increases SERCA inhibition based on the results of their present study?

      It is indeed controversial how PLN mutations cause DCM. Previous studies have shown that the DCM mutations in PLN might change this regulation in either a phosphorylation-dependent or a phosphorylation-independent manner. Our results show that the mutations may act in both ways: 1) the mutations reduce the phosphorylation level of PLN, which has been shown to enhance the inhibition of SERCA and inhibit the uptake of Ca2+; 2) the mutations change the conformation of PLN before binding to PKA or SERCA, which could have additional consequences, such as altered assembly state of PLN, phosphorylation of PLN by CaMKII, or changes in interactions of PLN with the lipid membrane. These effects could act in either direction, reducing or increasing SERCA inhibition, so the net outcome is difficult to predict from our data. We added the following explanation to the discussion: “While decreased PLN phosphorylation is likely an important contributor to the physiological dysfunction associated with familial DCM, disease-causing mutations in PLN may have additional consequences, such as altered assembly state of PLN, phosphorylation of PLN by CaMKII, or changes in interactions of PLN with the lipid membrane. The influence of such factors on SERCA inhibition are unclear. In principle, they might further increase inhibition of SERCA and act in conjunction with lower PKA-mediated phosphorylation to manifest the disease symptoms. Conversely, it is possible that these factors could decrease the inhibition of SERCA, partially compensating for the decreased phosphorylation level, and mitigating the symptoms.”

      It is also unclear how reduced PKA phosphorylation of mutant PLN could lead to DCM. PLN is unlikely to be significantly phosphorylated by PKA at rest (in other words, PLN is likely to be phosphorylated by PKA during stress, i.e. during the adrenergic fight-or-flight response). Therefore, it is puzzling how such reduced PKA-dependent phosphorylation of PLN would significantly affect PLN function in the absence of a fight-or-flight response.

      As explained above, we think that this regulation could occur through both phosphorylation-dependent and phosphorylation-independent mechanisms. Even considering only the phosphorylation-dependent mechanism, the DCM phenotype could arise from a Ca2+ imbalance in the cell that accumulates over repeated cycles of cardiac muscle contraction as the effects of sporadic phosphorylation events build up chronically. It is also possible that the mutations affect the CaMKII-dependent regulation of PLN, which leads to DCM.

      Given that the DCM-associated PLN mutations have significant effects on the conformation of PLN itself, at least in the form of short-peptides, it is possible that these mutations could affect the folding, oligomerization, trafficking, degradation, etc., in addition to PKA-dependent phosphorylation. The relevance and contribution of reduced PKA-dependent PLN phosphorylation to DCM remain unresolved.

      We agree with the reviewer that both phosphorylation-dependent and phosphorylation-independent mechanisms could contribute to the DCM disease phenotype. It remains unresolved which factor is the major contributor. We have added a statement in the discussion (see point above).

      Reviewer #3 (Public Review):

      This manuscript describes an elegant study utilizing crystal structures to elucidate the disease mechanism of familial dilated cardiomyopathy. It has been known for decades that mutations in PLN are associated with DCM, but the underlying mechanism remains controversial. In my opinion, Prof Yuchi and co-authors did an excellent job of revealing the high-resolution crystal structures of PKA-phospholamban complexes, representing both the native and diseased states. Using a variety of biophysical and biochemical methods, including SPR, ADP-Glo, thermal melts, NMR, etc., the authors systematically investigated the correlations between the PLN conformation, the binding affinity, and the phosphorylation level. The mechanism of PKA phosphorylation of another related substrate, ALN, was also convincingly revealed. The results are very helpful for understanding the pathological mechanism of PLN-related DCM. More importantly, the atomic structures of PKA-phospholamban complexes lay a solid foundation for the structure-based rational design of therapeutic molecules that can reverse the effects of the DCM-causing mutations in the future, e.g. by stabilizing the interactions between PLN and PKA.

      We thank the reviewer for the appreciation of our work.

    1. Author response:

      Reviewer #2 (Public Review):

      This work by Castledine et al. addresses the important question of whether results from in vitro (laboratory-based) evolution studies may be useful for predicting evolution during phage therapy in a clinical setting. In order to explore this question, the authors cultured a set of bacterial isolates from a patient pre- and during phage therapy, as well as phages from several time points during therapy. They then experimentally evolved (in vitro) a mixture of the bacterial isolates from the patient in the absence of phage, or in the presence of phage using two different treatments (phage added once or added repeatedly). Overall, they observed similarities between the evolutionary outcomes (genomic and phenotypic) in vitro and in the patient. Resistance evolved rapidly in the patient and in vitro under phage selection, and similar genomic changes were observed in both environments. The approach of using bacterial isolates directly from the patient (as well as the phages used for therapy) in vitro is clever, and the observed similarities are compelling.

      We thank the reviewer for appreciating the novelty in our results and methodology.

      However, I think there are some limitations with the study that should be addressed in the text.

      In particular, (1) While the similarities in vitro and in the patient are quite interesting, there are some differences that were dismissed as being minor without justification. Calling the results "highly parallel" is a bit subjective - in vitro in the repeated phage treatment (which is suggested to be most similar to the clinical context), there did appear to be phage coevolution that was not observed in vivo. The tradeoffs/relationships between traits (as shown in Fig. 3) also differed to some extent.

      We agree this could have been more objectively phrased at the start of the discussion – this has been edited to reflect this. We have highlighted the differences between in vivo and in vitro treatments with respect to phage evolution. Moreover, we have also highlighted that the observed trade-offs had different underlying mechanisms which may not always result in parallel evolutionary changes between in vivo and in vitro environments.

      Additionally, for the genomic results only a subset of variants were plotted (those in genes of known function), but there were far more significant variants in genes of unknown function that were not included. It is difficult to assess whether the genomic findings are truly similar across environments if only a fraction of those results were presented in the manuscript.

      We chose to concentrate on genes of known function so that we could better understand their potential significance, and also because the figures and analyses (Figures 4 and 5) would become too large and complex to interpret with genes of unknown function included. This is especially true for Figure 5, which would have required us to show 284 rows if all genes had been included. Ultimately, whichever way we do this exploratory analysis, it is going to be difficult to see if findings are truly similar across environments because we only have a single patient who had phage therapy.

      However, we have redone the analysis with all of the significant genetic changes (SNPs and indels from both known and unknown genes) included.

      Figure 5 has been recreated and is now included as "Figure 5 - Figure supplement 1". All of the statistical analysis on (a) the number of SNP/indels seen (b) genetic distance from ancestor and (c) alpha diversity give quantitatively similar results. That is, although all the estimates are generally much higher after including many more genetic variants, all of the significant results from both the overall model fit and post-hoc multiple comparisons remain the same. One interesting result that came out of looking at all the genetic changes was that for genetic variants occurring in a gene of known function, 56% (28 out of 50) were de novo mutations, whereas this value was only 42% (98 out of 234) for variants in genes of unknown function.

      We then looked at the proportion of genetic variants (both in known and unknown genes) found in vitro that were also found in vivo. For genes of known function, 62% of genetic variants were found in vivo (31 of 50) and this was comparable to the 65% of genetic variants in genes of unknown function (153 of 234). Of the 26 genes of known function with differences identified in the in vitro analysis, 16 (61%) were also found to have genetic changes in vivo. The equivalent metric for genes of unknown function was 86% (85 of 99). Similar to in vitro, variants occurring in a gene of known function were more likely to be de novo mutations (77%) compared to variants occurring in a gene of unknown function (46%).
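
      The proportions quoted above can be recomputed directly from the counts given in the text. The following is a minimal sketch for readers who want to check the arithmetic; the dictionary labels and the `percent` helper are illustrative names, not part of the original analysis, and the printed values can differ from the quoted percentages by up to one point depending on how they are rounded.

```python
# Counts taken from the response text: (found in vivo, total found in vitro).
# The labels are illustrative and do not come from the original analysis.
counts = {
    "variants, known-function genes": (31, 50),
    "variants, unknown-function genes": (153, 234),
    "genes, known function": (16, 26),
    "genes, unknown function": (85, 99),
}


def percent(part, total):
    """Return part/total as a percentage."""
    return 100.0 * part / total


# Share of in vitro findings that were also observed in vivo.
shares = {label: percent(part, total) for label, (part, total) in counts.items()}

for label, value in shares.items():
    print(f"{label}: {value:.1f}%")
```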

      While these patterns and exploratory analyses are interesting, they have extremely limited statistical power and therefore do not alter the conclusions or results of the work presented. For these reasons, we have chosen not to include these results in the already long manuscript. We have added a line to say we have done it both ways:

      “We performed all downstream statistical analyses on (a) only genetic variants in genes of known function and (b) all genetic variants.”

      And we also added a line at the beginning of the genomic analysis results section:

      “Results were not affected whether we included only genetic variants occurring in genes of known function or all genetic variants (Figure 5-Figure supplement 1). As we were interested in attributing potential functions to the variants identified, we only present the results for genetic variants occurring in genes of known function.”

      (2) Much of the text is framed around whether in vitro outcomes are predictive of those in vivo, but this study only included results from a single patient. Thus, it is impossible to know whether these findings are by chance or representative of a more general relationship between in vitro and in vivo evolution.

      We agree that having a single patient for our in vivo comparison limits the generalisability of our results. We have highlighted this in the revised manuscript. However, the fact that our replicated in vitro experiments agreed broadly with our in vivo results and with those of other studies (finding resistance-virulence trade-offs) suggests that, at least in some circumstances, in vitro dynamics are predictive of in vivo dynamics. Further studies are clearly needed (and hopefully will arise as a consequence of this work) to determine the generalisability of this finding and the circumstances where this parallelism might break down.

      (3) Although the evolutionary outcomes appear to be similar, the pathogen was successfully cleared from the patient but persisted throughout experimental evolution. Whether the pathogen is successfully eliminated or not is presumably the most important clinical outcome, and while this difference is not surprising, it is an important one to point out to the reader. Essentially, evolution was similar to some extent but the consequences of evolution for bacterial persistence in each environment were quite different.

      We have now highlighted this difference to the reader in the revised manuscript.

    1. Author Response:

      Reviewer #2 (Public Review):

      The paper by Meyer and collaborators first describes the in silico identification of a putative beta 1-4-N-acetylglucosaminyltransferase in the model Crenarchaeon Sulfolobus acidocaldarius. Beta 1-4-N-acetylglucosaminyltransferases are involved in the N-glycosylation pathway for the synthesis of glycoproteins. To detect this enzyme, the authors have used as baits the bacterial enzyme MurG and the eukaryotic enzymes Alg13 and Alg14. These enzymes have no detectable similarities and it was not possible to detect their Sulfolobus homologs by simple BLAST search. However, they detected several putative candidates with very low sequence identity (10-17%) using Delta-BLAST. They selected one of them which was retrieved using the Alg14 protein of Saccharomyces cerevisiae as bait.

      We thank the reviewer for this comment. We tested all identified candidates by deletion approaches. However, Saci1262 was the most promising one, whose function could not be determined by the deletion approach. To confirm the function we developed the described in vitro assay. We restructured the text to make this point more clear.

      They report that the overall topology of this candidate protein is identical to those of MurG and that the N and C terminal part of the protein could correspond to the eukaryotic proteins Alg14 and Alg13, respectively. They give the name Agl24 to this protein (why 24?) and then describe the enzyme as if its identification was already demonstrated. Closely related homologues of this protein are present in most Crenarchaeota, except in Thermoproteales, but absent in other archaea (Fig. S7).

      We agree that this was confusing; we have changed the naming of Saci1262 to Agl24 after the confirmation of its function in the text. Based on the proposal for the naming of N-glycosylation pathway components in Archaea (Eichler et al. ”A proposal for the naming of N-glycosylation pathway components in Archaea”. Glycobiology 23:620–621) the enzyme was named archaeal glycosylation enzyme 24 (Agl24).

      At this stage of the manuscript (line 153), the identification of Agl24 is only based on the detection of several patches of conserved amino-acids between the different enzymes (Fig.1, B and D) and on a structural model that fits the structure of the homologous enzymes. I suggest that the authors slightly change this part of the manuscript by first describing how they modelled the structure of their candidate enzyme (this is not indicated in the M&M) and how they identified these conserved amino-acids.

      We have added the process used to obtain the structural model to the Methods section.

      They can conclude that the structural and sequence similarities (although low) suggest that they have selected the right candidate and name it (then follow up with their detailed comparison of the different enzymes). An interesting result is the identification of a motif (GGxGGH) conserved between Agl24, MurG and Alg14. If it's really the first time that this motif is detected, thanks to the identification of Agl24, this is worth emphasizing much more, especially since the authors demonstrate later on that the histidine is essential.

      We report in this manuscript for the first time the sequence conservation of this motif in bacterial, eukaryotic and archaeal homologs. However, the motif has been described as an extended motif G13GTGGHX2PXLAXAX2LX9G39 in MurG sequences (Crouvoisier et al., 2007). This is now mentioned in the text.

      Line 37 of the abstract: this sentence is ambiguous; there is no strong similarity between Agl24 and eukaryotic Alg13/14. There is possibly strong structural similarity, but it is not obvious from Figure 2A that it is higher than with MurG. Is it possible to quantify these similarities? It could also be wise to compare with the structure of a bacterial EpsF, since, from the phylogenetic analysis of the authors (see below), Agl24 could also exhibit more sequence similarities with EpsF than with MurG.

      We agree that this sentence in the abstract was ambiguous, and it has now been modified. The structural similarity can be quantified by the root-mean-square deviation (RMSD) of the C-alpha backbone atoms of our model compared to the published structures (now added to Table S2) or by the Global Model Quality Estimation (GMQE) score. The GMQE score is expressed as a number between 0 and 1, reflecting the expected accuracy of a model built with that alignment and template. GMQE is coverage dependent, and models covering only half of the target sequence are unlikely to get a score above 0.5. The values obtained for Agl24 are given in Table S2 and vary between 0.46 and 0.56. We also performed the structural prediction with AlphaFold, which resulted in a prediction very similar to that obtained with SWISS-MODEL (RMSD: 3.80). These findings are now reported in the manuscript.
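
      For readers unfamiliar with the RMSD measure mentioned above, the following is a minimal sketch of how it is computed from two matched sets of C-alpha coordinates. It assumes the two structures have already been superposed and that atoms are paired one-to-one; the function name and the toy coordinates are hypothetical and are not the values behind Table S2.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points.

    Assumes the structures are already superposed and the atoms are paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


# Toy example with two atoms; the second atom is displaced by 2 units along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
toy_rmsd = rmsd(model, reference)
print(f"RMSD = {toy_rmsd:.3f}")
```

      In practice, structure-comparison tools first find the optimal superposition and the residue pairing before applying this formula, so the sketch covers only the final step.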

      No EpsE or EpsF GT structures are published, therefore we could not compare those to Saci1262.

      In my opinion, the next section should not be "Agl24 is essential" but the biochemical characterization of the enzyme, which confirms the in silico prediction. The authors have produced and purified a recombinant Agl24 enzyme. They show that the enzyme is membrane bound and has an inverting beta-1,4-N-acetylglucosamine-transferase activity. The section "Agl24 is essential…" could possibly be removed and the result mentioned in the discussion, since the results are not conclusive concerning the biological role of Agl24 but confirm the previous observation of Zhang and colleagues made by transposon mutagenesis on the Agl24 homologue in Sulfolobus islandicus.

      We thank the reviewer for this important comment. We have added a short text to highlight that the essentiality of the gene indicated that we might have a candidate for the N-glycosylation enzyme, as our previous studies have indicated that N-glycosylation is essential in Sulfolobus.

      In comparison with the first part of this paper, the last part, dealing with evolutionary aspects, raises many problems. The authors do not seem familiar with evolutionary concepts, as indicated by the use of the term "lower eukaryotes" (lines 163, 175), which is not used by evolutionists since it is highly biased (a bit racist!), animals, including ourselves, being "higher" eukaryotes. This weakness is also apparent from the use of the expression "conserved from yeast to human" (line 57). Yeast and Human belong to the same eukaryotic subdivision (Opisthokonts), so this conservation does not testify to the presence of an enzyme in the Last Eukaryotic Common Ancestor (LECA). Similarly, the authors often limit their description of "lower eukaryotes" to Leishmania/Trypanosoma or Dictyostelium/Entamoeba. A more extensive survey of the Alg13/14 topology in all eukaryotic major groups would be necessary to conclude, for instance, that the protein was a monomer or a dimer in LECA.

      While we agree that the term “lower” is not technically correct (and probably slightly speciesist), it is a common colloquialism in the literature, including historically. Nonetheless, all instances of the term have been removed or changed to “deep-branching” or other more appropriate terminology. The mentions of non-Opisthokont Eukaryotes were limited to these particular Kinetoplastids and Amoebozoa mainly because their genes are fused, unlike in other Eukaryotes. However, it is evident from both earlier work (Lombard, 2016) and our updated analyses that the distribution of these genes among Eukaryotes is much wider. Specifically, line 57 was referring to (Lehle et al., 2006) and is not linked to our inference of an early presence of these genes in Eukaryotes in general.

      In the revised manuscript, we have searched for homologs in 1611 eukaryotic genomes, and demonstrated a very wide presence of the Alg14-13 proteins among them (Supplementary Data 1). Of these, we used a subset covering the known eukaryotic diversity and from these data there seem to be specific fusion events within the Eukaryotes. Similarly, any fused sequences in Archaea and Bacteria are sparse independent events and we can conclude that at the original acquisition (in LECA or otherwise), the genes were split.

      The authors have performed two single gene phylogenetic analyses of several groups of Agl24 homologues, including either the eukaryotic Alg13 or Alg14, which correspond to the N and C-terminal domains of Agl24. These phylogenies are valid to identify different subgroups of enzymes, but they are not reliable to provide real information about the evolutionary relationships between these different groups and within these groups (for instance, they did not recover the strong clade formed by Thaumarchaeota, Bathyarchaeota and Aigarchaeota which is present in all robust archaeal phylogenies, see for instance Adam et al., PMID: 28777382). This is not surprising considering the small size of the genes and the very low similarities between the different subgroups. Since the two phylogenies are rather congruent, in particular for the identification of the different subgroups, it could be interesting to perform a concatenation (removing Methanopyrus kandleri, which is a fast-evolving species and disturbs the phylogeny with Alg14).

      We agree with the reviewer that the internal topology of the TACK for Alg14-13 does not reflect the reference tree. As pointed out, these sequences are rather short and not very well conserved, which is reflected in our revised phylogenies where almost none of the relationships among TACK clades are strongly supported. Furthermore, as is evident from the various Asgard and the Thermofilales clades, ancient lateral transfer events are not out of the question. However, recovering a strongly supported TACK clade points towards an ancestral presence of that particular Agl24/Alg14-13 homolog in the lineage.

      Concerning a concatenation, we have followed the reviewer’s suggestion and removed the M. kandleri sequence from Alg14. We disagree on the subject of congruence between the two phylogenies; especially for the relationship between Asgards and Eukaryotes, the two trees are incongruent, with the respective topologies being strongly supported. Although it is not good practice to proceed with a concatenation in such cases, we did it, if for no other reason than to test the effect of mashing the two phylogenetic signals together. The concatenated dataset phylogeny is shown in Supplementary Figures S13 and S14 (collapsed and uncollapsed, respectively). Putting the concatenated dataset together required some planning, since we had to match the subsampled taxa for Eukaryotes and Bacteria in Alg13 and Alg14 and fuse the sequences semi-manually, and in some cases invert the gene order in fused eukaryotic sequences (see Methods section).
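
      The taxon-matching and fusing step described above can be illustrated with a minimal sketch. It assumes two already-aligned datasets held as dictionaries keyed by a shared taxon label; the function name, the labels and the toy sequences are hypothetical and are not taken from the Methods section.

```python
def concatenate_alignments(alg14, alg13, exclude=()):
    """Join two aligned sequence sets taxon by taxon (Alg14 part first).

    Only taxa present in both inputs and not listed in `exclude` are kept.
    """
    shared = sorted((set(alg14) & set(alg13)) - set(exclude))
    return {taxon: alg14[taxon] + alg13[taxon] for taxon in shared}


# Toy, made-up aligned sequences with hypothetical labels.
alg14 = {"taxon_A": "MKV-L", "taxon_B": "MRV-L", "M_kandleri": "MQQAL"}
alg13 = {"taxon_A": "GT-SH", "taxon_B": "GTASH", "M_kandleri": "GTQSH", "taxon_C": "GT-AH"}

# Drop the fast-evolving taxon before joining, as suggested by the reviewer.
combined = concatenate_alignments(alg14, alg13, exclude=["M_kandleri"])
```

      Taxa found in only one of the two datasets (here the hypothetical taxon_C) are dropped, which is one simple way to keep the concatenated alignment rectangular.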

      I personally identify 4 subgroups in the two phylogenies that I will discuss in some detail below.

      Group 1: A first group includes a wide variety of archaea belonging to different phyla. Importantly, Euryarchaeota and other archaea (including one sequence of Odinarchaeota) are well separated. This group possibly correspond to descendants of an ancestral enzyme that was present in the Last Archaeal Common Ancestor (LACA). If correct, this indicates the position of LACA in the two trees. Some Crenarchaeota are present in this part. Did they correspond to Thermoproteales or did some Crenarchaeota have both Alg24 and this form?

      We are now able to address this point with our new analysis (mainly using the revised phylogenies), since our updated local database for Archaea contains far more genomes, including all Asgards from (Liu et al., 2021). Moreover, we searched for Alg14-13 homologs against local databases of Eukaryotes and Bacteria (1611 and 25118 genomes respectively), instead of simply relying on the sequences from (Lombard, 2016). Finally, we added the MurG sequences from the respective datasets of (Lombard, 2016) for outgroup rooting of the Alg14 and Alg13 trees, plus we performed outgroup-free rooting with the Minimal Ancestor Deviation method (Tria et al., 2017). The two approaches yield the same root as noted in Figure 7 (that would be without the MurG clade for the MAD root).

      “Group 1”: In the revised rooted phylogenies, this group is sister to the clade containing all other homologs (excluding MurG). It can be divided into two subclades. One consists of mainly methanogenic Euryarchaeota interspersed with other Archaea, mainly DPANN. At least for Methanosarcinales and Methanococcales (and perhaps Methanomicrobiales), the distribution is wide enough and the clades strongly supported, so we can trace the Alg14-13 homolog to the base of each of these lineages but not necessarily to all Euryarchaeota. That is due to the homologs missing from other lineages and disagreements with archaeal reference phylogenies (e.g., Methanomicrobiales and Methanosarcinales should have been together and monophyletic, without the Methanococcales). Any further inferences are problematic, since we have made clear that we mainly stuck with the clades from (Lombard, 2016) and very divergent Alg14-13-like homologs exist throughout Archaea including, for example, in Bathyarchaeota, two in Methanobacteriales.

      The second subclade corresponds to the TACK superphylum with the addition of several Asgard branches therein. The latter correspond to multiple independent ancient lateral transfers from the TACK to different Asgard lineages. There exists a second divergent Thaumarchaeota clade, whose position is inconsistent between the Alg13/EpsF and Alg14/EpsE trees.

      We understand the reviewer’s inclination to label this clade as having emerged at the LACA; however, we believe that our rooted phylogenies suggest that the node corresponding to the LACA is deeper.

      Group 2: A second group corresponds to orthologues of Alg24 and includes Crenarchaeota (Sulfolobales, Desulfurococcales) and Bathyarchaeota. Questions: How many Bathyarchaeota? Are they widespread in Bathyarchaeota? Since Bathyarchaeota are also present in group 1 (same questions), these group 2 Bathyarchaeota could correspond to MAG contamination with Crenarchaeota? Or LGT between Crenarchaeota and Bathyarchaeota?

      “Group 2”: The reviewer’s description of this clade is fairly accurate. In the updated phylogenies, it is still composed primarily of Crenarchaeota with one Baldrarchaeota and a strongly supported cluster of Bathyarchaeota sequences (14 in total, mainly from o__B26-1) that indicates a single transfer event (Supplementary Figures 11, 12, 14) from Crenarchaeota to Bathyarchaeota. A number of the Bathyarchaeota genomes in “Group 2” also contain homologs in “Group 1”. The Thermofilales member B67-G16 also possesses one homolog in each of these groups.

      It is noteworthy that the preliminary phylogenies used for dataset cleanup/curation contained another very divergent clade composed of Bathyarchaeota (mainly members of g__PALSA-986 but others as well), which was omitted from the final datasets. Some of the sequences in that clade are also additional homologs to the ones included in our phylogenies. However, further examining the Agl24-like homologs in Bathyarchaeota is beyond the aims of this study.

      The enzymes of group 2 were probably not present in LACA, except if they have evolved more rapidly than the other bona fide archaeal enzymes of group 1 (drastic modification of their function?). More likely, they have been introduced at the base of the Sulfo/Desulfo clade from an unknown source (extinct lineage?).

      We mostly agree with that deduction. However, since most Crenarchaeota only contain a single Agl24-like homolog (unlike some Bathyarchaeota) and we know nothing about the function of “Group 1” enzymes, we would like to avoid further conjecture.

      Group 3: A third group corresponds to EpsF in Archaea and Bacteria Question: How widespread in Bacteria? Were they present in the Last Bacterial Common ancestor? They are sister group to the eukaryotic enzymes. Are they their orthologues? Could it be that the eukaryotic enzymes originated from bacterial EpsF via mitochondria?

      Group 4: A fourth group includes all Eukaryotes (monophyletic) and very few sequences of archaeal MAGs belonging to different phyla, a few Thorarchaeota and one Odinarchaeota, but also several Verstraetearchaeota, one Geothermarchaeota, and one Micrarchaeota (DPANN). Line 39 in the abstract is thus misleading since the phylogenetic analysis revealed similar sequences not only in two phyla of Asgard but also in Verstraetearchaeota, Geothermarchaeota, and Micrarchaeota! Moreover, these similarities remain very low. This does not fit with the classical situation observed in the universal tree of life, in which archaeal and eukaryotic proteins always exhibit a high level of similarity.

      “Groups 3 & 4”: We would like to address these two groups together.

      Firstly, bacterial sequences in the revised phylogenies are no longer confined to the EpsE/F-like clade (“Group 3”) but a few sequences are found among the Archaea of “Group 4”. When subsampling the non-MurG clades of our preliminary bacterial trees (Supplementary Data 1) we saw that the EpsE/F-like and Alg14-13-like homologs are quite ancient in some lineages (e.g., Actinobacteria, Cyanobacteria, some Firmicutes subgroups) but when we put sequences from all three domains together, there seem to have been multiple interdomain transfers between Bacteria and Archaea. The archaeal “Group 3” sequences, with the exception of some methanogens and Thermococcales are mainly from DPANN; there even exists a distinct Diapherotrites branch (Supplementary Figures S11, S12). Even though we cannot pinpoint its exact origin, as it is found among archaeal clades from “Groups 2 and 4”, “Group 3” didn’t emerge at the LBCA and is of archaeal origin.

      Other than the addition of a few bacterial sequences, the reviewer’s description of “Group 4” seems accurate. Nevertheless, we are not particularly surprised by the extremely low sequence conservation between Eukaryotes and the various archaeal lineages, seeing how many divergent homologs we have found throughout this analysis. The abstract has been altered to remove the mention of “similar” sequences, and all mentions of eukaryogenesis in both the abstract and manuscript have been emended based on the revised phylogenies.

      At the moment we have no way of testing whether Eukaryotes acquired Alg14-13 through mitochondria, in spite of the few bacterial sequences in “Group 4”. All our data point toward an archaeal origin (see below).

      The authors suggest a split (red arrow) at the origin of the Group 3 and 4. However, since the tree is unrooted, one cannot exclude a fusion at the origin of groups 1 and 2?

      More importantly, it is profoundly misleading to conclude from this analysis that eukaryotes emerge from Asgardarchaeota!!!! The position of the lonely archaeal sequences in group 4 suggests either problems of MAG reconstruction (contamination, recombination, mis annotation) or, more interestingly, independent LGT of proto Alg13/14 from proto-eukaryotes to these archaeal lineages. Moreover, the few sequences of Asgard present in group 4 only correspond to two Asgard phylums, while the number of Asgard phylums has skyrocketed in recent years. I did a rapid BLAST search and homologues of the Thorarchaeal sequences of group 4 are absent not only in Heimdall and Loki but also in Hela and Gerda. The authors could contact two Chinese groups who published recently preprint describing several additional Asgard phyla (Liu, Y et al. BioRxiv 2020, Xie R et al., BioRxiv 2021).

      We have to thank the reviewer for this comment and for sharing their thoughts with us, which ultimately spurred us to update our local databases and increase the number of Asgard genomes, without which the additional Asgard sequences in “Groups 1 and 2” wouldn’t have been found. On the issue of Archaea in “Group 4”, the monophyletic eukaryotic clade is found within a primarily archaeal clade, surrounded by several other archaeal homologs throughout the phylogeny, and not at root-level. While we did not go in-depth to check all the included Asgard MAGs for problems, the Alg14-13-like homologs are extremely widespread in the various archaeal lineages, covering essentially all Thorarchaeota, Odinarchaeota, and Verstraetearchaeota, which in our opinion would be a far-fetched series of MAG problems. Despite the incongruence on what the sister clade of Eukaryotes is (Odinarchaeota or Thorarchaeota), on the basis of the data we cannot infer that a transfer from Eukaryotes (proto- or otherwise) to Archaea occurred.

      Obviously, the authors have chosen to fit their paper into the mold of the now popular two domains (2D) scenario in which Eukaryotes emerged from Asgardarchaeota. There is presently a debate between proponents of the 2D and 3D (classical Woese) universal tree of life. The authors are obviously strong proponents of the 2D since they don't mention any of the papers that have recently supported the 3D scenario (Da Cunha et al., 2017, 2018). From line 457 to the end of the paper, all the discussion turned around the 2D model and the Asgard origin of eukaryotes! They possibly consider that the debate has been closed by the paper of Williams and colleagues (Nat Ecol Evol, 2020) who criticized the work of Da Cunha and colleagues. They should notice that Williams et al still obtained a 3D tree with RNA polymerase (supplementary figure 1) except when they use amino-acid recoding, a method that reduces the phylogenetic signal (Hernandez and Ryan, BioRxiv, 2020). A 3D tree was again obtained with the RNA polymerase (including those of giant viruses and the three eukaryotic RNA polymerases) by Guglielmini et al., PNAS, 2019. The debate should thus be considered as still open.

      In any case, the phylogenies presented by the authors are not universal trees of life and cannot be used in the 2D versus 3D debate. A proponent of the 3D scenario would say that the Odin sequence present in group 1 corresponds to the real position of Asgardarchaeota, in agreement with the results of Da Cunha et al (2017) who found that Asgardarchaeota are not sister group to eukaryotes but branch deep within archaea.

      We would like to abstain from criticizing any previously published phylogenies on 2D or 3D trees of life. The main point of our paper was to highlight the similarities in N-glycosylation between our crenarchaeal Agl24 and Eukaryotes.

      While we did point out a probable acquisition during eukaryogenesis, as would be implied by a 2D scenario, at no point in the original submission did we try to specifically address the 2D/3D debate, or go out of our way to discredit a 3D tree. That said, we are pointing out in the revised manuscript that the new phylogenies cannot directly support a 3D model. The Eukaryote branch covers their known diversity (barring taxa where we didn’t find any Alg14-13 homologs; Supplementary Data 1). With the exception of a Perkinsela sequence in Alg13, Eukaryotes are monophyletic and seem to emerge from within Archaea in all phylogenies. Moreover, in the rooted trees, they’re nowhere near forming a sister clade to archaeal homologs (all clades or even a subset of them). We hope to make clear in this version of the manuscript that the Asgard group forming the sister clade to Eukaryotes is inconsistent and not Heimdallarchaeota, as would be expected from most published 2D trees. We did not even find any Alg14-13 homologs in Heimdallarchaeota. Furthermore, the internal branching of Eukaryotes (Supplementary Figures S11, S12) does not even recover their major branches. It is unknown whether this is due to transfers or just poor signal, since sequence conservation in eukaryotic Alg14-13 is very low. Thus, even though we retain that acquisition through eukaryogenesis in a 2D tree is a possibility, our results are also compatible with a lateral transfer from Archaea very early in the history of Eukaryotes, which is 2D/3D-agnostic.

      Since the enzymes studied here are apparently absent in most Asgards, it is profoundly misleading to label Asgard the group close to eukaryote in the Cover art and to have a highlight claiming that eukaryotic Alg13/14 are closely related to the Asgard homologs, suggesting their acquisition during eukaryogenesis, since the number of these Asgard homologues are very limited.

      Indeed, the cover art could have been misleading; as such, we have removed it. See other parts of this response concerning how we handled the sampling of Asgard genomes and the new results.

      It is also profoundly misleading to conclude in the title that their result "strengthens the hypothesis of an archaeal origin of the eukaryal N-glycosylation". One can only say that archaeal and eukaryotic N-glycosylation pathways are evolutionarily related.

      We disagree with this comment: even if they did not originate during eukaryogenesis, the eukaryotic homologs are clearly archaeal in origin.

      However, in the case of Alg13/Alg14, it seems that these eukaryotic proteins are more closely related to bacterial enzymes (EpsF) than to their archaeal homologues (group 1 and 2)! We would like to know more about the phylogeny and distribution of EpsF in Bacteria and Archaea. According to the authors, they are only present in Euryarchaeota but widespread in Bacteria, suggesting a LGT from Bacteria to Archaea. Was this enzyme present in the Last bacterial common ancestor? In summary, the authors conclusions and formulations on the evolutionary part of their paper, especially in the title, the summary, the discussion and the Cover Art are misleading and should be corrected.

      We have now addressed the issue of EpsEF and their relationship with Alg14-13. To recap the contents of the manuscript and this response, EpsEF and Alg14/13-related sequences are very widespread in Bacteria but do not date to the LBCA. The EpsEF+Alg14-13 clade (“Groups 3 & 4”) is sister to Agl24 and archaeal in origin, although there exist multiple interdomain transfers between Archaea and Bacteria. EpsEF sequences are found in Euryarchaeota but also in DPANN. There exists a single gene split event at the base of the EpsEF-Alg14/13 clade followed by multiple independent gene fusion events in all three domains.

      Reviewer #3 (Public Review):

      Meyer, Benjamin et al. identified the enzyme involved in the transfer of the second GlcNAc residue on the nascent oligosaccharide in protein N-glycosylation of the thermophilic Crenarchaeon Sulfolobus acidocaldarius. Although N-glycosylation is well-known in Euryarchaeota, the enzymes involved in this process, their substrates, and the mechanisms followed to produce the mature glycan are still elusive in Crenarchaeota, a phylum belonging to TACK archaeal superphylum, which contains also Thaumarchaeota, Aigarchaeota, and Korarchaeota. The authors, by screening the data banks with the sequences of the bacterial MurG and yeast Alg13/Alg14, which catalyze the transfer of GlcNAc in N-glycosylation, identified a gene, named saci1262 and alg24 showing very low identity. The authors characterized in depth the product of this gene with a very complete approach. Firstly, the authors could demonstrate by molecular modelling that Alg24 enzyme shows a 3D structure similar to those of MurG and Alg13/Alg14, and catalytic residues similar to the latter enzyme. The functional characterization was very complete, showing that alg24 is essential in vivo, and that the recombinant Alg24 specifically uses UDP-GlcNAc and lipid-GlcNAc as donor and acceptor substrates, respectively. In addition, the enzyme was thermophilic, did not require metals for catalysis, and followed an 'inverting' reaction mechanism in which the anomeric configuration of the product is the opposite to that of the substrate. Experiments of site-directed mutagenesis demonstrated that His14 is essential for catalysis as predicted by sequence multialignments and inspection of the 3D models, while the role of Glu114, also invariant, remained obscure. Then, the phylogenetic analysis of Alg13/Alg14 on TACK archaeal superphylum showed that Alg24 are widespread among Archaea, suggesting that N-glycosylation in Eukaryotes was inherited from an ancient archaeal ancestor. This observation fostered the hypothesis that the first eukaryotic cell originated from Asgard superphylum.

      Strengths:

      The main question of the work, which is the enzyme involved in the first crucial step of protein glycosylation in Archaea? is of general interest in glycobiology. Although this process, in the past believed a peculiarity of Eukaryotes, has been well studied in Euryarchaeota, it is almost unknown in TACK superphylum that, being considered the closest to the Last Eukaryotic Common Ancestor, is a very interesting matter of study. The work shows several strengths:

      1) The authors unequivocally demonstrated that Alg24 is the enzyme catalyzing the transfer of the second GlcNAc unit on the nascent oligosaccharide, thereby completing the puzzle of the first step of N-glycosylation, for which only AlgH enzyme was known so far.

      2) The approach used to identify Alg24, the choice of the model system, the characterization of the enzyme are absolutely excellent and set a new standard to study N-glycosylation in Archaea.

      3) The identification and characterization of a novel Glycosyl Transferase is of great importance in glycobiology. GTs are elusive enzymes, difficult to purify, due to their instability and association to membranes, and to characterize because of their extreme specificity for donor and acceptor substrates. In addition, GT enzymatic assays use very expensive substrates and very laborious procedures. For this reason, characterized GTs are far less common than, for instance, glycoside hydrolases. This study is a milestone for glycobiology. GTs from thermophilic microorganisms could be interesting subjects of study in general. Thermostable GTs could be easier to purify and characterize than their mesophilic counterparts.

      4) The knock-out in vivo of alg24 gene, was possible because S. acidocaldarius model system is one of the few Crenarchaea for which reliable molecular genetics tools can be used. These experiments, confirmed that N-glycosylation is essential in Crenarchaeota as previously shown for AlgH and AlgB.

      Weaknesses:

      There are not many weaknesses in this work.

      1) How the characterization of Alg24 is directly connected to the evidence that N-glycosylation in Eukaryotes was inherited from an ancestral archaeal cell should be better explained.

      We thank the reviewer for this suggestion. We believe that in the revised manuscript and phylogenies we clarify how the Alg14-13 eukaryotic homologs are of archaeal origin (due to being nested within an otherwise archaeal clade) and the possible scenarios for their origin (eukaryogenesis or transfer).

      2) The novelty of the presence of Alg13/14 and Alg24 homologues in TACK superphylum shown in this paper should be commented in comparison with the available literature.

      We have added a new paragraph and references in the discussion part to highlight the novelty of Agl24, adding also the information that Agl24 will be assigned to a new GT family (see below).

      3) The Cover Art should be revised. The 'take home message' is not clear and the phylogenetic interdependencies of the different superphyla are a bit confusing.

      We have removed the cover art, as it might create more confusion than clarity.

    1. Author Response:

      Reviewer #1 (Public Review):

      • Line 141: It would be beneficial to better understand how the sequenced sample of the population corresponds to the PCR confirmed sample of the population, in order to understand possible selection biases in the sequence data. Could you elaborate on how the composition of sequenced PCR confirmed cases matches the composition of PCR confirmed cases, by the demographic characteristics listed in Table 1?

      Early in the pandemic (March-April), we tried to sequence every SARS-CoV-2 positive case diagnosed in our KWTRP laboratory from Coastal Kenya. However, with the sharp increase in the number of identified cases from the month of May 2020 onwards, and a limited in-house sequencing capacity, we changed strategy to sequence only a sub-sample of the identified positives. The criteria for sub-sampling included having a cycle threshold of < 30.0, spatial representation (at county level) and temporal representation (at month level). The consequent number and proportion of samples sequenced across the study period months and across the counties is summarized in Fig. 2C-E with the sample flow provided in Figure 2-figure supplement 1.
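
      The sub-sampling criteria listed above can be sketched as a simple stratified selection. This is an illustration only: the record fields, the number of samples drawn per county and month, and the use of random selection within each stratum are assumptions, not a description of the procedure actually followed.

```python
import random
from collections import defaultdict


def subsample_positives(cases, per_stratum, ct_cutoff=30.0, seed=0):
    """Pick up to `per_stratum` cases with Ct below the cutoff from each (county, month) stratum."""
    strata = defaultdict(list)
    for case in cases:
        if case["ct"] < ct_cutoff:  # keep only samples below the Ct cutoff
            strata[(case["county"], case["month"])].append(case)
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    selected = []
    for key in sorted(strata):
        group = strata[key]
        selected.extend(rng.sample(group, min(per_stratum, len(group))))
    return selected


# Made-up records for illustration only.
cases = [
    {"id": "s1", "county": "Mombasa", "month": "2020-06", "ct": 22.1},
    {"id": "s2", "county": "Mombasa", "month": "2020-06", "ct": 34.5},
    {"id": "s3", "county": "Mombasa", "month": "2020-06", "ct": 27.9},
    {"id": "s4", "county": "Kilifi", "month": "2020-07", "ct": 18.3},
]
picked = subsample_positives(cases, per_stratum=1)
```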

      In the revised manuscript we have provided a comparison of the demographic characteristics of the sequenced cases versus non-sequenced cases (shown as Table 2). The participants providing the sequenced and non-sequenced positive samples had a similar gender distribution and similar probabilities of being from either Wave one or Wave two. However, the sequenced and non-sequenced cases differed significantly in age distribution, nationality and travel history. Specifically, the sequenced sample had more participants in the 30–39 years age bracket than the non-sequenced sample, and a disproportionate representation of non-Kenyan nationals and persons with a recent international travel history.

      • Line 283: I am particularly interested in the observed inter county flows, but it is hard to interpret the numbers. Considering population sizes in each county, what are the phylogenetically observed import rates per 100,000? What are the rate ratios? Based on the observed data, is there any evidence that imports into coastal Kenya occurred statistically significantly through Mombasa?

      We thank the reviewer for these comments.

      In the revised manuscript we have added two new tables (1 & 4) which detail the population size in each of the six Coastal Kenya counties, population density and estimated import/export rates (per 100,000) for the counties.

      The alluvial plots are descriptive regarding genome flows. The underlying data on the pattern of virus movement is inferred using ancestral state reconstruction, which is an established phylogenetic approach that has been applied elsewhere to infer SARS-CoV-2 local and global movement (Wilkinson et al, Science 2021, Tegally et al, Nature, 2021).
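
      To make the idea of counting virus movement events concrete, the sketch below tallies location changes along the branches of a tree whose internal nodes have already been assigned a location. It assumes the reconstruction itself was done beforehand by a dedicated tool; the tree, node names and location labels are invented for illustration and the function is hypothetical.

```python
from collections import Counter


def count_location_changes(edges, node_location):
    """Count parent-to-child location changes along tree branches."""
    changes = Counter()
    for parent, child in edges:
        src, dst = node_location[parent], node_location[child]
        if src != dst:  # a change of location along a branch is one movement event
            changes[(src, dst)] += 1
    return changes


# Invented toy tree given as (parent, child) pairs.
edges = [("root", "n1"), ("root", "t1"), ("n1", "t2"), ("n1", "t3")]
# Invented locations, standing in for reconstructed ancestral states and tip labels.
node_location = {
    "root": "Global",
    "n1": "Mombasa",
    "t1": "Global",
    "t2": "Mombasa",
    "t3": "Kilifi",
}
changes = count_location_changes(edges, node_location)
```

      In this toy tree, a change from the global background into a county would be read as an import and a change between two counties as an inter-county event.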

      The results we obtained from ancestral state reconstruction, indicating that Mombasa was a major gateway for variants entering the coastal region of Kenya, are consistent with (a) the county showing the highest number of circulating lineages (n=28) compared to the other five counties of Coastal Kenya, (b) approximately half (n=21, 49%) of the detected lineages in coastal Kenya having had their first case identified in Mombasa and (c) Mombasa having had an early wave of infections compared to the other Coastal counties.

      We are not aware of an approach to consider statistical significance on these plots. The graphical display is based on the observed number of events, and we would argue this is more appropriate than presenting absolute rates, which would be susceptible to sampling bias.

      Is it possible to account for potential bias in sequence sampling in these calculations, perhaps as done in Bezemer et al AIDS 2021? It should be possible to adjust for the proportion of sequenced individuals in PCR confirmed individuals, and it might also be possible to back calculate infected cases from cumulative reported deaths and to adjust for the proportion of sequenced individuals in infected individuals?

      The reviewer suggests helpful methods to examine sampling bias, but we found this beyond scope here. Our method was based on ancestral location state reconstruction of the dated phylogeny. The approach has been used elsewhere to answer similar questions (Wilkinson et al, Science 2021, Tegally et al, Nature, 2021). The Bezemer paper uses the maximum parsimony ancestral state reconstruction algorithm implemented in phyloscanner, and the Bayesian method applied to impute incomplete sampling is applicable to chains of transmission, which we have not tried to reconstruct in our analysis.

      Considering my earlier recommendation to document sequence sampling representativeness in Table 1, if Mombasa is found to be oversampled relative to infections, then it might also be helpful to perform sensitivity analyses in which sequences from over-represented locations are down-sampled. Another option might be to consider the approaches considered in de Maio PLOS Comp Bio 2015, or Lemey Nat Comms 2020. Thank you for investigating potential caveats and substantiating your findings in more detail.

      In the revised manuscript we have clarified that our sequenced sample was proportional to the number of positive cases reported in the respective Coastal Kenya counties (see Fig. 2E and Table 1).

      The De Maio method incorporates BASTA (BAyesian STructured coalescent Approximation) into BEAST for phylogeographic analysis, and was used there to discriminate between a zoonotic reservoir and the implausible alternative of cryptic human transmission. Analyses developed from these methods would be valid and interesting to apply to our dataset but would be a major new analysis and beyond the scope of the present paper. We have therefore taken the approach of: a) more clearly acknowledging sampling bias (see below) and b) undertaking sensitivity analyses (Supplementary File 5, see below). Using the larger global background sequence sets selected in a different way (more geographically balanced than the first round, which was random), we still find that most of the virus introductions into coastal Kenya occurred via Mombasa, consistent with our previous analysis.

      The results are consistent with the case numbers in that Mombasa (i) experienced an earlier peak during wave one relative to other counties, (ii) had in total more cases than all the other five counties, and (iii) was commonly the first county of detection for many of the identified lineages in the region. However, relative to its population, the border county of Taita Taveta had a higher import rate (13.5 per 100,000 people) than Mombasa (11.6 per 100,000 people) (Table 4).
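
      For readers who want to reproduce the normalisation, a per-100,000 rate and a rate ratio can be computed as sketched below. The helper names are hypothetical, the event count and population in the first example are placeholders rather than values from Table 4, and only the two rates quoted above are taken from this response.

```python
def rate_per_100k(events, population):
    """Number of events per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events * 100_000 / population


def rate_ratio(rate_a, rate_b):
    """Ratio of two rates expressed on the same scale."""
    return rate_a / rate_b


# Placeholder numbers, not taken from the manuscript.
example_rate = rate_per_100k(events=5, population=200_000)

# Ratio of the two import rates quoted in the response
# (Taita Taveta 13.5 vs Mombasa 11.6 per 100,000 people).
taita_vs_mombasa = rate_ratio(13.5, 11.6)
```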

      Observations from our sensitivity analyses (Supplementary File 5) are included in the revised manuscript. We found that the absolute number of estimated viral imports/exports and intercounty transmission events fluctuated depending on the number of Coastal Kenya sequences and the size of the global comparison dataset, but with a clear pattern of (a) counted events increasing with sample size and (b) Mombasa County consistently leading in the number of events, whether imports or exports.

      • Line 292: The results are of course subject to differences in sequencing rates in each of the countries listed, and differences in reporting of these data.

      This is a valid concern. To mitigate the bias that arises with these differences, unlike in the previous comparison dataset where we randomly selected a specified number of samples per month for each continent, in the revised analysis we have done the selection at country level. We limited the comparison data to a maximum of 30 genomes per country per month per year. In this way, countries with high sequencing rates do not become overrepresented in our comparison dataset.
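
      A cap of this kind can be sketched as follows, reading "per month per year" as each calendar month of each year. The record layout, the ISO date strings and the random choice of which genomes to keep within an over-full group are assumptions made for illustration.

```python
import random
from collections import defaultdict


def cap_per_country_month(records, cap=30, seed=0):
    """Keep at most `cap` records per (country, calendar month) group."""
    groups = defaultdict(list)
    for rec in records:
        # rec["date"] is assumed to be "YYYY-MM-DD"; the first 7 characters give "YYYY-MM".
        groups[(rec["country"], rec["date"][:7])].append(rec)
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    kept = []
    for key in sorted(groups):
        group = groups[key]
        kept.extend(group if len(group) <= cap else rng.sample(group, cap))
    return kept


# Made-up records: 45 genomes from country "X" and 10 from country "Y" in the same month.
records = [{"id": f"x{i}", "country": "X", "date": "2020-08-15"} for i in range(45)]
records += [{"id": f"y{i}", "country": "Y", "date": "2020-08-20"} for i in range(10)]
capped = cap_per_country_month(records)
```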

      Some of these biases could be elicited through comparison to international travel data. For example, are the US and England also the top two countries from which most travellers arrive into Kenya? If such additional analyses are out of scope, it seems warranted to either strongly point to the substantial limitations of this analysis, or remove it altogether.

      We concur with the reviewer on the potential bias that could exist in conclusions that arise from inferring sources of importations based on genomic data alone, available from only a few countries. However, vital quality and curated international travel data into Kenya during the study period was not available to us at the time of this analysis. We have therefore agreed to remove the previous analysis on potential origins and destinations of observed Kenya lineages from the revised manuscript.

      What is perhaps striking is that Tanzania is entirely missing from this list, given extensive spread there. Another analysis that could be useful is a comparison of country specific lineage compositions, which might bypass some of the difficulties associated with substantial differences in sequence sampling/reporting rates.

      SARS-CoV-2 genomic data from Tanzania has not been publicly shared to date, and hence is not included. And as indicated above, we have removed the analysis that was trying to infer sources of SARS-CoV-2 importations into Kenya.

      To hypothesize on the potential lineages circulating in Tanzania, we have added a sentence detailing that 5 Pango lineages were identified among the 34 Tanzanian nationals who provided samples that were sequenced: B.1 (n=10), B.1.1 (n=10), B.1.351 (n=8), A (n=5) and A.23.1 (n=1)

      • Line 536: it seems problematic that the data used in the import/export analysis did not contain all available African sequences. Can these be included in the corresponding analysis please.

      In the revised manuscript we have included all accessible, good-quality and contemporaneous African genomes (n=21,150). However, due to the huge computational processing power needed to process the phylogenetics for such large sequence datasets, we split the analysis into two parts, each with approximately 10,000 genomes (see Figure 3-figure supplement 1).

      Notably with the increased sample size (including the analysis of 390 more genomes from coastal Kenya), we detected far more imports of SARS-CoV-2 into Coastal Kenya compared to our previous analysis (n=280 vs n=69) but only a modest change in exports (n=95 vs n=105) and inter-county virus movement events (239 vs 190).

      Reviewer #2 (Public Review):

      Agoti et al. analyzed SARS-CoV-2 samples collected from infected patients in coastal Kenya, collected between March 2020 and February 2021. This period spans the first two waves of COVID-19 in Kenya, and the authors aimed to understand the lineages circulating throughout the region, in comparison to the virus circulating elsewhere in Kenya and in the world. The manuscript is clearly written, and the figures and results are thorough and well described throughout. These data add to our understanding of COVID-19 in Kenya and in East Africa, and the discussion of how different lineages spread in Kenya (single clusters versus dispersed over several regions) is both interesting and potentially useful for informing public health measures.

      The analyses are well done and excellently presented, but this paper is significantly lacking in a discussion of how sampling bias may affect the stated conclusions. Additionally, the paper focuses almost exclusively on genomic data and fails to closely examine epidemiological factors that may better contextualize the results presented.

      We thank the reviewer for bringing this to our attention. We have added the paragraph below to the revised manuscript.

      “Sampling bias is a potential limitation of this study arising from the fact that (a) demographic characteristics (age distribution, travel history and nationality) of the sequenced versus non-sequenced sub-sample differed significantly, (b) <10% of confirmed SARS-CoV-2 infections in Coastal Kenya were sequenced, prioritizing samples with a Ct value of <30.0 (Table 1); (c) the Ministry of Health case identification protocols were repeatedly altered as the pandemic progressed (Githinji et al., 2021) and (d) sampling intensity across the six Coastal counties differed, probably in part due to varied accessibility of our testing center that is located in Kilifi County (Figure 1A and Table 1). This may have skewed the observed lineage and phylogenetic patterns. To better contextualize the genomic analysis results, close examination of the case metadata is important, but unfortunately a lot of the metadata was missing (e.g., travel history, nationality, Table 2), which made it hard to integrate genomic and epidemiological data in an analysis. Although all analyzed genomes had > 80% coverage, very few were complete or near complete (>97.5%, n=344) due to amplicon drop-off or low sample quality and this may have reduced the overall phylogenetic signal.”

      Specifically:

      1) The authors do not discuss the potential effects of sampling on their import/export analyses. For example, they find that the USA and England are in the top six country sources of SARS-CoV-2 importation into coastal Kenya, as well as in the top six country destinations of viral export from the region. These two countries have generated huge numbers of sequences compared to the rest of the world, which may clearly bias these findings. While the authors do evaluate the sensitivity of their analyses by repeating them with different global subsamples, it is unclear if these subsamples corrected for large discrepancies in available data from different parts of the world.

      We concur and appreciate that sampling bias is indeed a common limitation in the type of analysis we have undertaken given the variation in data collection across geographies. Some of the approaches we took to correct for this have been highlighted in our responses to reviewer #1.

      In the revised manuscript, we have undertaken a reanalysis with a larger and more representative dataset at all scales of observation (Figure 3-figure supplement 1). Specifically, for the global dataset, we have revised our sub-sampling script to pick the comparison dataset uniformly across months and countries for non-African countries. All the available African genomes have been included in our analysis, including 605 collected in Kenya outside the coastal region.
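
      The sub-sampling idea described above can be illustrated with a minimal sketch. This is not the authors' script: the record fields (`country`, `month`), the function names and the `per_stratum` cap are all assumptions made for illustration. The sketch keeps every African genome and draws at most a fixed number of genomes from each (country, month) group for non-African countries, so that heavily sequenced countries do not dominate the comparison set.

```python
import random
from collections import defaultdict


def subsample_uniform(records, per_stratum, seed=0):
    """Pick up to `per_stratum` records from each (country, month) group."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for rec in records:
        strata[(rec["country"], rec["month"])].append(rec)
    picked = []
    for key in sorted(strata):  # sorted for a deterministic group order
        group = strata[key]
        picked.extend(rng.sample(group, min(per_stratum, len(group))))
    return picked


def build_comparison_set(records, african_countries, per_stratum, seed=0):
    """Keep all African genomes; subsample the rest evenly by country and month."""
    african = [r for r in records if r["country"] in african_countries]
    other = [r for r in records if r["country"] not in african_countries]
    return african + subsample_uniform(other, per_stratum, seed)


# Hypothetical toy data: one heavily sampled country and two sparsely sampled ones.
toy = (
    [{"id": f"usa{i}", "country": "USA", "month": "2020-03"} for i in range(50)]
    + [{"id": f"fra{i}", "country": "France", "month": "2020-03"} for i in range(2)]
    + [{"id": f"ken{i}", "country": "Kenya", "month": "2020-03"} for i in range(3)]
)
comparison = build_comparison_set(toy, {"Kenya"}, per_stratum=2)
```

      With this cap, the 50 toy genomes from the heavily sampled country contribute no more than the two from the sparsely sampled one, which is the balancing effect the response describes.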

      Similarly, the authors find that new variant introductions were mainly through Mombasa city, but most of the Kenyan sequences were from this region, so it is perhaps unsurprising that more lineages were found there. The authors should repeat their analyses with a more representative global subsample, or at the very least discuss these caveats in the discussion and discuss what other evidence there may be to support their findings.

      Our sequencing rate by county is approximately proportional to the total number of cases seen in the county (Table 1 and Figure 2E). For Coastal Kenya, the revised manuscript included 389 additional genomes from coastal Kenya that became available while the manuscript was under review.

      Thus, in the revised manuscript, we have addressed the valid sampling bias concerns of the reviewers and editor by: (i) increasing the number of analyzed genomes in our dataset for previously under-represented periods and regions, (ii) including contemporaneous Kenyan genomes from outside the coastal counties in our import/export analysis, (iii) including all available Africa genomes in the analysis and selecting a balanced global sub-sample for inclusion in the analysis. In addition, we have also provided a paragraph in the discussion section highlighting sampling bias as a caveat to interpretation of the findings of the current study:

      “The accuracy of the inferred patterns of virus importations to and exportations from coastal Kenya is in part dependent on both the representativeness of our sequenced samples for Coastal Kenya and the comprehensiveness of the comparison data from outside Coastal Kenya. Our sequenced sample was proportional to the number of positive cases reported in the respective Coastal Kenya counties (Figure 2E and Table 1). Also, we carefully selected comparison data to optimize chances of observing introductions occurring into the coastal region (e.g. by using all Africa data). But still there remained some important gaps, e.g. non-coastal Kenya genomic data was limited (n=605). Despite this, we think the results from ancestral state reconstruction indicating that Mombasa is a major gateway for variants entering coastal Kenya are consistent with the following: (a) the county showed the highest number of lineages circulating (n=28) during the study period compared to the other five Coastal Kenya counties, (b) approximately half (n=21, 49%) of the detected lineages in coastal Kenya had their first case identified in Mombasa, (c) Mombasa had an early wave of infections compared to the other Coastal counties and (d) it is the most well connected county in the region to the rest of the world (large international seaport and airport, major railway terminus and several bus termini).”

      2) Restriction measures enforced by the Kenyan government are briefly introduced at the very beginning of the manuscript and then mentioned at the very end as a possible explanation for observed transmission patterns. However, there is very limited discussion of the potential effect of restriction measures throughout, and no formal analyses are presented using this kind of epidemiological information. Adding formal analyses to back up the hypothesis that relaxation of interventions may have driven the second wave of infections would make this paper much stronger and potentially more interesting.

      In the revised manuscript, we have detailed the restriction measures the government of Kenya put in place in the introduction, methods, and results sections and discussed, where appropriate, how we think they impacted the observed transmission patterns. We have added Supplementary Table 1, which provides the dates the various measures took effect or were relaxed.

      In a separate piece of work (Brand et al, 2021 published in Science journal, 10.1126/science.abk0414), we investigated the potential drivers of the first three waves of infection observed in Kenya and we have appropriately referenced this in the revised manuscript.

      We feel that additional analyses on the impact of the restriction measures on SARS-CoV-2 epidemiology and the lineage patterns observed are beyond the scope of this work whose focus was primarily genomic epidemiology.

      3) Generally, the text of the manuscript focused on waves of SARS-CoV-2 transmission, while the analyses presented data aggregated by month. A clearer connection between month and wave (particularly visually, on the figures themselves) would aid in interpretation of the data presented.

      This is a valid concern and a good suggestion. In the revised manuscript, for all temporal plots, we have added a line to demarcate when we switched from wave one to wave two period. Similarly, for several analyses, we have provided aggregations by wave period rather than by month.

      4) One of the strengths of this manuscript is the depth to which the authors discuss the detection of specific lineages in coastal Kenya. However, there is limited discussion of these results in the context of when various lineages appeared or disappeared globally, though these details are presented in a table. Discussing the appearance of the various lineages (was it surprising to see a particular lineage at a certain time or in a certain place?) would also improve this manuscript.

      In the revised manuscript, we have compared the patterns of lineage detection locally compared to all Kenya and to all continents in the newly added Figure 3. We have also discussed this aspect for the most frequent 4 lineages in both Wave one and Wave two.

    1. Author Response:

      Reviewer #1:

      Hauser et al. analyze two large datasets of GPCR-G protein interactions/couplings ("Inoue" and "Bouvier"), comparing and combining them with the widely-used literature-based Guide to Pharmacology (GtP) database. As the Inoue and Bouvier datasets were based on different experimental setups, this enables the identification of which couplings are supported by more than one method. The authors also establish a normalization protocol that enables a move from qualitative to quantitative comparisons and identify couplings that might be either below or above a rigid threshold. Overall, the paper describes a new resource and the methodologies used to build this resource. The resulting coupling map is available through the GPCRdb website, a widely used resource in the field.

      The authors have thus improved the ability of researchers to assess prior results and compare them to their own new data. This resource clearly and significantly upgrades options currently available and will likely be of interest and prove quite useful to scientists both in academia and in industry.

      We thank the reviewer for so nicely describing the study and its prospective application.

      Weaknesses include:

      • The data is described mostly by broad numbers, such as the number of receptors or coupling in a subset, or percentages. While this is helpful to understand the data, this reviewer found it hard to follow the mountain of numbers. A suggestion would be to add a section where the authors pick selected examples of particular experimental data and show how their combined database can resolve previously unanswered (or wrongly answered) questions of GPCR/G protein coupling.

      We have removed numbers in several places throughout Results where we had included multiple measures, e.g., absolute numbers and percentages. Furthermore, where an overall number has been broken down into distributions, e.g., across different G proteins or families thereof, we moved other numbers to parentheses.

      The different sections of Results that answer questions of GPCR-G protein coupling are now presented more clearly: we have updated their headings and grouped them all in a subsection of Results called “Research Advances – Insights on GPCR-G protein selectivity”. These sections are all based on our “combined database”/coupling map. In each such section, we start at the overall level, covering all GPCRs and/or G proteins, but then give selected examples that are woven into and exemplify the text. This approach has also been used in the new Results section “Differential tissue expression gives G proteins in the same family large spatial selectivity”, which gives selected examples of G proteins with specific tissue expression profiles.

      Given that the paper has already exceeded the maximum of 5,000 words by quite a bit, we think that this approach of weaving selected examples into each selectivity insight section is the most appropriate, and that it brings most clarity. Furthermore, we hope that readers will be inspired to use our coupling map to generate additional questions for future experiments.

      • The paper does not reveal new biological findings. For example, while some emphasis is placed on new data on G15, it would be helpful to take the extra step and use this to suggest new biological insights.

      eLife’s author guidelines (https://reviewer.elifesciences.org/author-guide/types) state that “Tools and Resources articles do not have to report major new biological insights or mechanisms, but it must be clear that they will enable such advances to take place, for example, through exploratory or proof-of-concept experiments.” In case this manuscript is published as a Tools and Resources paper, it may therefore be sufficient to provide the foundation for future studies to reveal new biological findings.

      Nevertheless, the coupling map led to biological findings relating to patterns and mechanisms of GPCR-G protein selectivity that were not described in the original studies. I.e., while this study did not generate new data, it arrived at new insights based on published data. This seems to be in line with eLife’s publication format “Research Advances” (https://reviewer.elifesciences.org/author-guide/types), and the Analysis format of several other journals. Some insights described herein have not been presented before while others have been updated in scope and precision. Furthermore, we have added a new section of Results with insights on G protein expression profiles and co-expression.

      We have clarified this by updating the headings of the sections that present these insights, and grouped them under a common subheading of Results termed “Research Advances – Insights on GPCR-G protein selectivity”. However, in case we have overlooked very recent studies describing some of the same biological insights, we would please like to ask for their references and would be more than willing to revise the manuscript again to incorporate them. Furthermore, if the Reviewer is missing a particular analysis that is critical to understand GPCR-G protein coupling, please let us know.

      • The authors cautiously label couplings supported by only one dataset as "unsupported". It would seem more helpful to grade couplings by a reliability scale, providing users with a wider set of data. Perhaps only couplings that are directly conflicted by negative data should be labeled as unsupported?

      We understand that the term “unsupported” has been used in a confusing way. We have now replaced this term with “unique” and explained all terms in Table 1 of the revised manuscript.

      To address the need for a means to grade or filter couplings by reliability, we have added the following paragraph to the manuscript:

      “To enable any researcher to use the coupling map, we have availed a “G protein couplings” browser (https://gproteindb.org/signprot/couplings) in GproteinDb (2). By default, this browser only shows “supported” couplings with evidence from two datasets, but there is an option (first blue button) to change the level of support to only one (for most complete coverage of GPCRs) or to three (for the highest confidence) sources. We propose a standardized terminology to describe couplings based on their level of experimental support from independent groups (Table 1). The criterion of supporting independent data, and the terms “proposed” and “supported”, are already used by the Nomenclature Committee of the International Union of Basic and Clinical Pharmacology (NC-IUPHAR) for GPCR deorphanization. Furthermore, the online coupling browser allows any researcher to use only a subset of datasets, or to apply filters to the Log(Emax/EC50), Emax, and EC50 values. Finally, users can filter datapoints based on a statistical reliability score in the form of the number of SDs from basal response.”
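
      The support-level filter described in this paragraph can be sketched in a few lines. This is only an illustration under stated assumptions, not the GproteinDb implementation: the function name, the data layout (each dataset as a set of receptor/G protein pairs) and the receptor names `R1` to `R3` are hypothetical, and GtP is treated here as if it listed couplings at the same level as the other two datasets.

```python
def couplings_by_support(datasets, min_support=2):
    """Return couplings reported by at least `min_support` datasets.

    `datasets` maps a dataset name to a set of (receptor, g_protein) pairs.
    The result maps each retained pair to the sorted list of its sources.
    """
    sources = {}
    for name, pairs in datasets.items():
        for pair in pairs:
            sources.setdefault(pair, set()).add(name)
    return {
        pair: sorted(names)
        for pair, names in sources.items()
        if len(names) >= min_support
    }


# Hypothetical toy couplings; receptor names are placeholders.
toy_datasets = {
    "GtP": {("R1", "Gs"), ("R2", "Gi/o")},
    "Inoue": {("R1", "Gs"), ("R2", "Gi/o"), ("R3", "G12/13")},
    "Bouvier": {("R1", "Gs"), ("R3", "Gq/11")},
}

one_source = couplings_by_support(toy_datasets, min_support=1)   # widest coverage
two_sources = couplings_by_support(toy_datasets, min_support=2)  # browser default
three_sources = couplings_by_support(toy_datasets, min_support=3)  # highest confidence
```

      Raising `min_support` trades coverage for confidence, mirroring the one-, two- and three-source options described for the browser.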

      Furthermore, we have added references to the online G protein coupling browser in the:

      (1) Introduction ending: “On this basis, we develop a unified map of GPCR-G protein couplings that can be filtered or intersected in GproteinDb …”, (2) Fig. 2 legend ending: “Note: Researchers wishing to use this coupling map, optionally after applying own reliability criteria or cut-offs, can do so for any set of couplings in GproteinDb (1).” (3) Fig. S2 ending: “Unique couplings are hidden by default in the online G protein couplings browser in GproteinDb, as they await the independent support by a second group.”

      To many scientists the most reliable option is to involve NC-IUPHAR. Gloriam is a corresponding member of NC-IUPHAR, which has mentioned the possibility of involving its many worldwide pharmacological experts to update GtP on a case-by-case basis for receptors. For example, many of the “novel” couplings jointly supported by Bouvier and Inoue may be added. This option is advantageous as it involves experts in each receptor system (often with knowledge of other relevant studies) and is backed by the authoritative organization.

      • Given that this manuscript includes authors from both the Inoue and Bouvier studies, I can understand why they are not directly assessing which of the two datasets (in relation to the GtP) might be more accurate. Nevertheless, I believe this assessment should be done and that the advantages and disadvantages of the two experimental systems discussed clearly.

      We believe that the three-way intersection of couplings is the most informative and therefore preferred over individual comparison of each of the Inoue and Bouvier datasets to GtP. GtP is unfortunately not suitable as a stand-alone resource, neither to contradict nor to support couplings (on the G protein subtype level). This is because GtP is incomplete (especially for G12/13) and does not provide any information on the level of G protein subtypes, only families. The three-way intersection always uses GtP but adds a second dataset on top of this when validating a third dataset. Our manuscript already included a three-way intersection of datasets, allowing readers to conclude which dataset might be more accurate (then Fig. 3 and Spreadsheet 3) on a per-G protein basis.

      In the revised manuscript, we have rewritten this section, which now has the heading “Bouvier’s and Inoue’s biosensors appear more sensitive for G15, and Gs and G12, respectively”. We have also made a completely new figure, Fig. 7, which more clearly illustrates for which G proteins Bouvier and Inoue may have overrepresented or underrepresented couplings. This section specifically investigates the question of whether differential sensitivity can explain “unique” couplings. However, such unique couplings can either be due to overrepresentation or instead be true positives that are missing in GtP because of incompleteness and in the other biosensor due to lower sensitivity. Unfortunately, we will not be able to distinguish these possibilities until the research community has gained additional datasets from independent biosensors with equally high sensitivity.

      Whereas our study compares datasets rather than experimental systems, we have added a paragraph in the Discussion describing which aspects should be considered when choosing a biosensor. There, we reference a review from last year dedicated to biosensors and describing their pros and cons (3), and the accompanying paper by Bouvier et al. (4), comparing several aspects of the experimental system used by Inoue et al. It is also important to note that the most advantageous biosensor may not be one of the two for which data is analyzed in our paper. For many studies, researchers may instead be better off with another biosensor, for example those from Lambert/Martemyanov (5), Roth (2) (Gαβγ sensors first described in (6-11)) or Inoue (unpublished dissociation assays using wt G proteins fused with LgBit and HiBit). These are all referenced in the Discussion.

      References:

      1. Pandy-Szekeres G, Esguerra M, Hauser AS, Caroli J, Munk C, Pilger S, et al. The G protein database, GproteinDb. Nucleic Acids Res. 2022;50(D1):D518-D25. 10.1093/nar/gkab852
      2. Olsen RHJ, DiBerto JF, English JG, Glaudin AM, Krumm BE, Slocum ST, et al. TRUPATH, an open-source biosensor platform for interrogating the GPCR transducerome. Nat Chem Biol. 2020;16(8):841-9. 10.1038/s41589-020-0535-8
      3. Wright SC, Bouvier M. Illuminating the complexity of GPCR pathway selectivity – advances in biosensor development. Curr Opin Struct Biol. 2021;69:142-9. https://doi.org/10.1016/j.sbi.2021.04.006
      4. Avet C, Mancini A, Breton B, Gouill CL, Hauser AS, Normand C, et al. Effector membrane translocation biosensors reveal G protein and β-arrestin profiles of 100 therapeutically relevant GPCRs. bioRxiv. 2021:2020.04.20.052027. 10.1101/2020.04.20.052027
      5. Masuho I, Martemyanov KA, Lambert NA. Monitoring G Protein Activation in Cells with BRET. Methods Mol Biol. 2015;1335:107-13. 10.1007/978-1-4939-2914-6_8
      6. Gales C, Rebois RV, Hogue M, Trieu P, Breit A, Hebert TE, et al. Real-time monitoring of receptor and G-protein interactions in living cells. Nat Methods. 2005;2(3):177-84. 10.1038/nmeth743
      7. Gales C, Van Durm JJ, Schaak S, Pontier S, Percherancier Y, Audet M, et al. Probing the activation-promoted structural rearrangements in preassembled receptor-G protein complexes. Nat Struct Mol Biol. 2006;13(9):778-86. 10.1038/nsmb1134
      8. Schrage R, Schmitz AL, Gaffal E, Annala S, Kehraus S, Wenzel D, et al. The experimental power of FR900359 to study Gq-regulated biological processes. Nat Commun. 2015;6:10156. 10.1038/ncomms10156
      9. Breton B, Sauvageau E, Zhou J, Bonin H, Le Gouill C, Bouvier M. Multiplexing of multicolor bioluminescence resonance energy transfer. Biophys J. 2010;99(12):4037-46. 10.1016/j.bpj.2010.10.025
      10. Bunemann M, Frank M, Lohse MJ. Gi protein activation in intact cells involves subunit rearrangement rather than dissociation. Proceedings of the National Academy of Sciences of the United States of America. 2003;100(26):16077-82. 10.1073/pnas.2536719100
      11. Janetopoulos C, Jin T, Devreotes P. Receptor-mediated activation of heterotrimeric G-proteins in living cells. Science. 2001;291(5512):2408-11. 10.1126/science.1055835

      Reviewer #2:

      This study is a meta-analysis of previously reported studies on G protein-coupled receptor (GPCR) coupling to G proteins. The data sets are from three distinct sources: a compendium compiled by the International Union of Basic & Clinical Pharmacology (IUPHAR), and two data sets compiled by two separate laboratories. Each of these data sets describes the coupling of members of the superfamily of non-sensory GPCRs (~200 genes) to the large family of G protein alpha subunits (~20 genes). The authors try to arrive at a consensus for receptor-G protein coupling from the three data sets, as well as identify and highlight differences or incongruencies. Compiling these vast data sets into a unified format will be extremely useful for investigators to understand receptor and effector relationships. The meta-analysis will help to deconvolute the complex physiology and pharmacology underlying hormone or drug actions acting on receptor superfamilies. A better understanding of receptor-G protein selectivity and/or promiscuity will ultimately help in identifying safer therapeutics.

      We appreciate the summary and the explanation of the usefulness of our meta-analysis and its potential impact.

    1. Author Response:

      Reviewer #3 (Public Review):

      In both neurons and glia (astrocytes, microglia, and oligodendrocytes) of patients with amyotrophic lateral sclerosis (ALS) and/or frontotemporal dementia (FTD), the DNA/RNA-binding protein TDP-43 is mislocalized from the nucleus to the cytoplasm where it forms pathological inclusions. Because this subcellular redistribution leads to TDP-43 depletion from the nucleus, the pathogenic mechanism may involve (1) the loss of nuclear function, (2) a gain of cytoplasmic function, or (3) a contribution from both. Heo, Dongeun et al. tease apart this first possibility by investigating the depletion of TDP-43 within specific stages of the oligodendrocyte lineage in vivo and raising the possibility of glial damage in disease progression. The authors found that the consequences of TDP-43 deletion in oligodendrocytes were dependent on the stage of oligodendrocyte maturation. First, they find that deletion of TDP-43 from oligodendrocyte precursor cells (OPCs) resulted in their rapid death; however, OPCs that retained TDP-43 expression repopulated to their normal density. Secondly, they find that constitutive deletion of TDP-43 from early premyelinating oligodendrocytes resulted in seizures and early lethality. Similar conditional deletion of TDP-43 from early premyelinating oligodendrocytes in the adult CNS induces abnormal morphological changes, motor discoordination (but no seizures), and premature lethality. Meanwhile, constitutive deletion of TDP-43 from myelinating oligodendrocytes did not lead to any gross phenotypes or shortened lifespan. However, early deletion led to oligodendrocyte degeneration and astrogliosis followed by oligodendrocyte regeneration. Interestingly, in both early and mature oligodendrocytes loss of TDP-43 led to morphological changes, thinner myelin, fewer myelinated axons, and aberrant myelination. These results are very interesting and open the door for further questions, such as why are the oligodendrocytes mislocalizing their myelination targets? Why does early deletion of TDP-43 cause such drastic phenotypes? Lastly, to understand the molecular consequences of TDP-43 deletion in both early and late myelinating oligodendrocytes the authors perform RNAseq on FACS isolated KO cells and controls. The authors uncover that loss of TDP-43 from oligodendrocytes in the adult CNS leads to altered splicing of key regulators of oligodendrocyte growth and morphogenesis.

      These findings complement those of Wang, J et al 2018, which show that TDP-43 in mature oligodendrocytes in the spinal cord is indispensable for their proper functioning, including myelination and cell survival. Although Wang, J et al 2018 saw no apparent harm to motor neurons of mice with TDP-43 deleted in mature oligodendrocytes, the mice did have progressive motor deficits and early lethality, similar to Heo, Dongeun et al.

      Strengths:

      To investigate the role of TDP-43 within distinct stages of oligodendrocyte maturation, the authors used four different Cre and CreER mouse lines: 1) Pdgfra-CreERT2, 2) Mobp-iCre, 3) Mog-iCre, and 4) Mobp-iCreERT2, thus allowing them to inactivate Tardbp in both the developing and mature CNS. By performing thorough analysis at each discrete stage within the oligodendrocyte lineage, the authors uncovered a differential requirement for TDP-43 in cell survival and structural maintenance as OPCs transform into early and late myelinating oligodendrocytes. These results are important because they elucidate the contribution of a DNA/RNA-binding protein to oligodendrocyte development. Additionally, the results of the conditional deletion of TDP-43 from early premyelinating oligodendrocytes in the adult CNS are critical for understanding how nuclear depletion of TDP-43 from oligodendrocytes might contribute to disease pathogenesis.

      The authors also performed bulk RNAseq on early and late myelinating oligodendrocyte controls and TDP-43 KO cells. In doing so, not only did they uncover hundreds of differentially expressed (DE) genes between each control and KO, but thousands of DE genes between the two controls. This experiment also confirmed that Mobp-iCre and Mog-iCre mouse lines were able to target different stages of oligodendrocyte development. This dataset is very exciting to both the developmental glial biology community and to those trying to understand the molecular mechanisms within glia that contribute to neurodegenerative disorders.

      Minor weaknesses:

      In Figure 1, the authors observe that in the cKO mice, the OPCs are dying because they observe a lack of NG2 staining. Is it possible the OPCs have changed to another cell identity that is NG2- in the absence of TDP-43? TUNEL staining would clarify that indeed the cKO OPCs are dying. Furthermore, the authors note that despite the extensive death of OPCs, they do not see signs of GFAP+ astrogliosis. Is there instead an increase in microglia activation? Throughout the paper, the authors use only GFAP+ astrogliosis to measure widespread inflammation. It would be more compelling to also look at the contribution of microglia or other inflammatory markers to measure inflammation.

      To perform fate-mapping of TDP-43 deficient OPCs, we crossed the Cre-dependent EGFP reporter (RCE) to Pdgfra-CreER x Tardbp floxed mice (PDGFRα-TDP43). By assessing EGFP expression, we determined that disrupting Tardbp expression in OPCs results in rapid degeneration of OPCs within one month. The dramatic reduction in EGFP^+ OPCs indicates that knockout of Tardbp results in cell loss rather than downregulation of NG2 and/or a shift in cell identity.

      We agree that it would also be informative to extend these studies to examine other markers of neuroinflammation. We settled on GFAP immunostaining for visualization purposes, as cortical GFAP immunoreactivity is limited in control mice, providing the greatest sensitivity and has been established as a robust biomarker of neuroinflammation. As the external consequences are not the primary focus of this study and are likely to be both complex and time dependent, we feel that these studies are outside the scope of the present study.

      As shown in Figure 3, loss of TDP-43 in oligodendrocytes at early and mature stages leads to similar profound phenotypes within both the Mobp-TDP43KO and Mogp-TDP43KO mouse lines. However, only early when TDP-43 is deleted using the Mobp-TDP43KO, are there severe physical phenotypes in the mice and early lethality. However, the authors show that there is no change in the density of ASPA+ mature oligodendrocytes in Mobp-TDP43KO and Mogp-TDP43KO at any stage. If there is an increase in the turnover of oligodendrocytes and oligodendrocyte number stays the same, can the authors speculate in their discussion what they believe is causing the severe seizure and lethality phenotypes in the Mobp-TDP43 KO mice? The authors mention that there is an increase in astrogliosis. Are they suggesting this change in astrocyte activity could promote the severe phenotypes and early lethality? Because motor neuron number is not affected by TDP-43 deletion, but no direct measurements of motor neuron activity were taken, it is hard to make sense of the phenotypes observed.

      In the Discussion, we speculated that early deletion of TDP-43 in oligodendrocytes (Mobp-TDP43 KO) compromises their long-term survival. This gradual degeneration of oligodendrocytes, which is masked at the population level by OPC mediated regeneration, nevertheless induces widespread astrogliosis, indicative of progressive neuroinflammation. Oligodendrocyte loss has previously been shown to induce neuroinflammation, seizures and neurodegeneration (Traka et al. 2016). We agree that there are many interesting features that remain to be understood about the phenotypic consequences of TDP-43 loss within oligodendrocytes, such as possible reorganization of myelin patterns (see Orthmann-Murphy, Call, et al. 2020) and mechanistic links between oligodendrocyte degeneration, gliosis and inflammation. We are pursuing some of these analyses now, but they necessitate extensive interrogation of the transgenic mice using new assays. Thus, we feel that these experiments are outside the scope of the current study.

    1. Author Response

      Reviewer #2 (Public Review):

      In the results of Fig. 2, the proteins are emitted at distance epsilon from the cortical boundary. From there, they locally perform 1D diffusion to the boundary, so most of them would readsorb once they diffuse a distance epsilon. Only a small fraction would extend past epsilon, which I assume is why the concentration drops by orders of magnitude beyond epsilon. Is such a concentration drop realistic given typical numbers of proteins in cells?

      This is a good point. In [29], McInally et al. investigate kinesin-13 concentrations in Giardia and find that the concentration drops sharply near the pole (by about three to four orders of magnitude), as surmised by the referee. The drop-off in our model is comparable, in orders of magnitude, to the decrease in concentration that McInally et al. observe close to the pole.

      It should be clarified if the proposed size scaling is independent of the specific choice of the distance epsilon of the point of protein release from the anterior pole. I don't see any reason why this distance should increase with cell size as epsilon = 0.05 R (on page with equation 5). It's unclear if the size scaling of the concentration gradient might be dependent on the assumption epsilon ~ R.

      Figure R1 shows the dependence of the gradient on epsilon: the concentration gradient from the pole is unaffected everywhere beyond the source.

      Figure R1. Concentration gradient for cells with the source at different distances from the pole (ϵ). Concentration profiles with differing source points: we start very close to the pole and move further away. The radius of the sphere is 10 μm, the diffusion constant is D = 1 μm^2/s, and the transport speed along the cortex is v = 1 μm/s.
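      As a rough orientation for the parameter values quoted in the caption, the characteristic timescales they imply can be estimated with a few lines of arithmetic. This is only a back-of-envelope sketch and not the model used in the paper: the 3D diffusion estimate t ~ R^2/(6D), the pole-to-pole path length πR, and the Péclet-like ratio vR/D are assumptions introduced here for illustration.

```python
import math

# Parameter values taken from the Figure R1 caption.
R = 10.0  # sphere radius, in micrometres
D = 1.0   # diffusion constant, in micrometres^2 per second
v = 1.0   # transport speed along the cortex, in micrometres per second

# Assumption: characteristic time to diffuse a distance R in three dimensions.
t_diffusion = R**2 / (6 * D)

# Assumption: time to be carried along the cortex from one pole to the other,
# i.e. along half a great circle of length pi * R.
t_transport = math.pi * R / v

# Assumption: a Peclet-like ratio comparing directed transport with diffusion.
peclet = v * R / D

print(f"diffusion time  ~ {t_diffusion:.1f} s")
print(f"transport time  ~ {t_transport:.1f} s")
print(f"Peclet-like ratio = {peclet:.1f}")
```

      With these values the two timescales are of the same order (tens of seconds), so neither process can be neglected in such an estimate.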

    1. Author Response

      Reviewer #1 (Public Review):

      Canonical miRNA-targeting involves pairing between the miRNA seed region (nucleotides 2-7, counting from the miRNA 5' end) and a target mRNA. Pairing downstream of the seed can also influence target recognition, and in some cases 3' pairing can compensate for imperfect seed complementarity. In this study, McGeary et al. investigated the features of such miRNA 3' compensatory sites in a high-throughput manner by adapting the RNA bind-n-seq (RBNS) method used previously to characterize binding of purified Argonaute2-miRNA complexes to a random pool of target RNAs.
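      The seed-pairing rule summarized above can be illustrated with a short sketch. It derives the target sequences complementary to miRNA nucleotides 2-7 and 2-8 and classifies a target by the strongest canonical site it contains. The site-type names (6mer, 7mer-A1, 7mer-m8, 8mer) follow the commonly used convention rather than anything stated in this review, the let-7a sequence is assumed from the miRBase entry for hsa-let-7a-5p, and the function names are hypothetical.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Assumed sequence of let-7a (5' to 3'), used here only as an example input.
LET7A = "UGAGGUAGUAGGUUGUAUAGUU"


def reverse_complement(rna):
    """Return the reverse complement of an RNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def canonical_sites(mirna):
    """Build the target motifs for the four common canonical site types."""
    match_2_7 = reverse_complement(mirna[1:7])  # pairs with miRNA nt 2-7
    match_2_8 = reverse_complement(mirna[1:8])  # pairs with miRNA nt 2-8
    return {
        "8mer": match_2_8 + "A",     # nt 2-8 match plus an A opposite nt 1
        "7mer-m8": match_2_8,        # nt 2-8 match
        "7mer-A1": match_2_7 + "A",  # nt 2-7 match plus an A opposite nt 1
        "6mer": match_2_7,           # nt 2-7 match only
    }


def classify_site(mirna, target):
    """Return the strongest canonical site type found in target, or None."""
    sites = canonical_sites(mirna)
    for name in ("8mer", "7mer-m8", "7mer-A1", "6mer"):
        if sites[name] in target:
            return name
    return None


print(canonical_sites(LET7A))
print(classify_site(LET7A, "GGCUACCUCAGG"))
```

      A 3'-compensatory site, the subject of the study, is by definition one that this kind of seed-only classifier would miss or downgrade, because its seed match is imperfect and the affinity comes from pairing further downstream.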

      Strengths: To focus on 3'-compensatory sites, which are rare in random libraries, the authors designed libraries of RNAs containing imperfect seed complementarity followed by 25 nucleotides of random sequence. This approach allowed investigation of a range of 3' pairing possibilities far more extensive than any previous work. Results provide several unexpected findings. Contrary to the prevailing model that miRNA nucleotides 13-16 are most efficacious for 3' pairing, the authors found the optimal position varies between miRNA sequences and is often shifted to include G nucleotides in the miRNA. The number of unpaired nucleotides bridging seed and 3'-paired regions is also a factor, with certain let-7 sites preferring an offset of +4 target nucleotides, indicating a high-affinity target-binding mode previously unknown. Additionally, the contribution of miRNA 3' pairing correlates poorly with predictions from nearest-neighbor parameters. Overall findings greatly expand insights into miRNA 3' pairing and provide metrics for improving target prediction.

      Weaknesses: Conclusions are drawn entirely from RBNS data sets, leading to a few limitations. Affinity measurements are limited to relative KD values, making comparison to other work in the field indirect and potentially problematic. For example, let-7 target sites in lin-41 have 11-19 3' compensatory pairing, +1nt offset, which (based on Fig. 2B and 2C) has a greater relative KD than the let-7 8mer canonical site. However, a recent result showed an in vivo lin-41 reporter with two 8mer sites is less repressed than the same reporter bearing the wild-type 3'-compensatory sites (1). In the absence of KD values and/or cellular repression data for these specific sites, the noted differences are difficult to reconcile. Additionally, analyses assume miRNA-target complementarity directly correlates with physical pairing between miRNA and target. However, because physical pairing occurs within the Argonaute2-miRNA complex, this may not always be the case.

      1. Duan, Ye, Isana Veksler-Lublinsky, and Victor Ambros. "Critical contribution of 3'non-seed base pairing to the in vivo function of the evolutionarily conserved let-7a microRNA." bioRxiv (2021).

      Although conclusions from our affinity measurements agree with those of the Duan et al. (2021) bioRxiv submission with respect to the relative importance of pairing to let-7a nucleotides 11 and 12 compared to that of pairing to nucleotides 18 and 19 (Figure 7—figure supplement 1D), some of the more detailed conclusions do indeed differ. These differences might be due to our measurement of site affinity rather than repression (as mentioned by the referee). Alternatively, they might be due to either differences between flanking sequences of the sites or differences between human and C. elegans systems. With respect to whether two 8mer sites without 3′ pairing are as effective as the endogenous lin-41 sites, the results of Duan et al. (2021) are another step removed from ours because they measure mutant phenotypes rather than lin-41 repression. Nonetheless, as the reviewer suggests, the strain in which lin-41 seed pairing is restored and 3′ pairing is disrupted has phenotypes consistent with insufficient repression by let-7, which would not be expected if relative affinity of 8mer sites matched the affinities of the endogenous lin-41 sites.

      To investigate the relationship between 3′-pairing affinity in vitro and repression in cells, we performed a massively parallel reporter assay and have included these new results in our revision (Figure 3). Overall, we found that affinity in vitro corresponded well to repression in cells (r² = 0.71). With respect to the example mentioned by the referee, we observed that dual 8mer sites imparted repression similar to that of the lin-41 sites (Figure 3B, bottom). We also observed some benefit of the endogenous flanking-sequence context of the lin-41 sites, but this benefit extended to the other sites as well, including to the 8mer sites (Figure 3—figure supplement 2). Thus, differences between our results and those of Duan et al. (2021) appear to be primarily attributable to differences between our two systems, perhaps a difference between human and C. elegans.

      With respect to the referee’s last point, we use “pairs” and “pairing” as synonymous with “Watson–Crick matches” and “complementarity.” In our revision, we have clarified that our use of “pairing” refers to potential pairing, not physical pairing.

      Reviewer #3 (Public Review):

      The Bartel Lab tackles the elusive role of the 3' part of miRNAs to contribute to the binding of target RNAs. In short, the presented data lead to the following conclusions:

      1- The positions most important for 3′ pairing differed between different miRNAs;

      2- Compared to Grimson et al. 2007, the authors show that preferred pairing often does not correspond precisely to positions 13-16, but it does always at least partially overlap that stretch of nucleotides;

      3- Two distinct 3′-binding modes seem to exist. Yet, arriving at that conclusion (which is at the core of the title) is not easy for the reader (see below);

      4- Increasing miRNA length can sometimes improve 3′ binding affinity, but it cannot substitute for other features required for high affinity to the miRNA 3′ region.

      5- Central to the paper and underlying several analyses, the authors show that parameters derived from interactions of purified RNAs in solution are not directly relevant to miRNAs associated with AGO2;

      6- GG/GC/CG dinucleotides in positions 13-16 most likely participate in productive 3' pairing, and extra Gs beyond this stretch also favor.

      7- Importantly, there is a functional difference between 3′-supplementary and 3′- compensatory pairing in regard to the presence of mismatches in the seed.

      8- By using chimeric miRNAs, the authors separate effects of seed-mismatches, to those effects derived from the length, position, offset, and nucleotide-identity preferences of the 3′ region;

      9- Finally, the two different 3' binding modes presented in this manuscript help rationalizing some aspects of target-dependent miRNA degradation (TDMD).

      The title: Should the term "seed mismatch" be included to highlight one of the most important aspects of the paper?

      We have changed the title to “MicroRNA 3′-compensatory pairing occurs through two binding modes, with affinity shaped by nucleotide identity and position,” to indicate that the conclusions of the title were derived using seed-mismatch sites.

      The Introduction: Well-written and informative, but perhaps too long. The authors should explain why they have chosen Ago2 for all their experiments, when they continuously refer to "AGOs" in the Introduction.

      The Introduction has been shortened to enhance readability. We have also added justification for using AGO2, pointing out that this paralog is typically the most highly expressed and is the one most frequently used by others for biochemical and structural studies of AGO–miRNA complexes.

      The results: Specific comments: The authors jump from Fig. 1A to Fig. 1C. Fig. 1B is mentioned at the Introduction. Should Fig. 1B be moved to the supplement?

      Although we jump from Figure 1A to Figure 1C in the Results section, Figure 1A and Figure 1B are both cited in the Introduction section. Because of the importance of Figure 1B for the Introduction, we have opted to keep this panel in the main text.

      The authors mainly focus on let-7a and two well-known miRNAs: miR-1 and miR-155. The RNA bind-n-seq analysis reveals different binding behaviors. Are those miRNAs representative? To what extent does the analysis provided by the authors approach a (nearly) full picture of 3' miRNA binding modes?

      We initially observed evidence for the positive-offset binding mode in our analysis of let-7a (Figures 2 and 3) but not in our analyses of miR-1 and miR-155 (Figure 4). Nonetheless, we also observed evidence of this second binding mode in our analyses of miR-124, lsy-6, and miR-7, the three other miRNAs with AGO2-RBNS datasets (Figure 5—figure supplement 4). With respect to the question of whether there are more than two binding modes, we have no evidence for additional binding modes but of course cannot rule out this possibility.

      The (many) figures displaying color-gradient squares to calculate Kds are elegant but I would argue that replacing some of them by tables and numbers would be more informative and less demanding for the eye of the reader.

      Although replacing each of the color-gradient squares with a number might be more informative when comparing values for just a few sites, we wonder if digesting the entire table of numbers would be less demanding for the reader. Because replacing the heat maps with tables of numbers would also take much more space, we have opted to keep the heat maps.

      I would also suggest bringing TargetScan back into the Discussion (as in the previous paper by McGeary et al. 2019), to highlight the benefits of the biochemical approach on top of the powerful and universally used TargetScan.

      Ideally, the insights gained in this type of study could be used to improve the ability of TargetScan to predict the efficacy of sites with 3′ pairing. However, even with machine learning, we will need binding information for 3′ pairing of more miRNAs before we can build a model that generalizes to miRNAs of any sequence and thereby improve TargetScan.

      A general comment goes towards the presentation of the data. In contrast to other manuscripts, the authors rely on a unique type of data, that emerges from binding assays on nitrocellulose membranes, and their quantification. For a better visualization, I would encourage the authors to include examples of such bindings and quantifications.

      As the reviewer points out, most data of this paper come from AGO-RBNS experiments, which include a filter-binding step to separate library molecules that are bound from those that are unbound. However, because these data are derived from sequencing of amplicons prepared from RNA extracted from the nitrocellulose membranes, they cannot be visualized in the same manner as data from classical filter-binding experiments.

    1. Author Response

      Reviewer #2 (Public Review):

      In this manuscript Shore et al determine that Nucleoporin 107 (Nup107) is required in developing female somatic cells [intermingled cells (ICs)] for proper ovarian development. The authors propose that Nup107 is required for proper orientation of ICs during development to ensure proper function of escort cells during adulthood. They show that loss of Nup107 results in ectopic germline stem cells (GSCs) away from the GSC niche (primarily cap cells and terminal filament cells) and that these ectopic GSCs display hypermorphic Bone Morphogenic Protein (BMP) signaling. The authors also find that Nup107 regulates expression of the transcription factor doublesex (dsx) and share a common transcriptional target of AdamTS-A. Through knockdown/rescue experiments, the authors show that expression of Dsx-F can rescue phenotypes observed with ovarian somatic knockdown of Nup107 and that ovarian somatic knockdown of AdamTS-A mimics loss of Nup107 and dsx (including loss of escort cell membrane protrusions and enhanced BMP signaling). These data provide an interesting non-cytoplasmic to nuclear transport mechanism for Nup107 in regulating oogenesis.

      This is a well written manuscript with proper experimental analysis (sufficient n's, proper statistics). However, it is unclear how the authors came to some of their conclusions and how their results significantly enhance findings from prior studies exploring how disruption of organization of somatic cells of the developing female gonad influences adult escort cell protrusions and preventing expansion of BMP signaling to ensure proper germline stem cell cyst differentiation.

      Points to consider:

      1. It was unclear reading the first part of the paper how the adult phenotypes were connecting to defects during larval development. It would further strengthen the rationale for describing the adult phenotypes first if the authors perhaps show the larval gonad defects (abnormal IC stacking/arrangement) prior to showing adult phenotypes. This would also help with citing previous studies that have also found the correlation with incorrect IC placement and ectopic GSC-like cells in adulthood (e.g., Tseng et al., 2018, Stem Cell Reports).

      We appreciate this insight and have changed the manuscript as advised, beginning with the larval gonads (line number 116) and then progressing to the adult ovaries (line number 156).

      1. It would also help the reader to know that nuclear pore complex proteins have roles outside of nuclear-cytoplasmic transport early in the manuscript as opposed to only in the discussion.

      As suggested, we have included this in the introduction (line number 63).

      1. It is unclear the model the authors are proposing for the Nup107-Dsx-AdamTs relationship. Are the authors proposing that Nup107 can regulate the import of Dsx or is directly regulating dsx transcription (based on the RNA-sequencing results)? A little more explanation would be helpful.

      We have elaborated on this in the discussion (line numbers 482-512).

      Reviewer #3 (Public Review):

      This is an interesting story, providing a molecular explanation for XX-OD, caused by mutations in the Nup107 gene. Overall, the experiments are thoroughly conducted, and the results are important in understanding XX-OD. However, there are some issues that need to be addressed, as the data presented in this study still leave some gaps that need attention. Whereas I do not think that all the gaps need to be filled for a paper to be published, the gaps that remain in this paper leave confusion and inconsistencies that need to be addressed.

      1. Dsx is the major target of Nup107. Although it is clear that Dsx is downregulated in the Nup107 mutant, how exactly Nup107 regulates Dsx expression remains entirely unclear. Does Nup107 function as a transcriptional regulator of Dsx?

      As previously mentioned, this is an important question that we hope to answer in future. However, we think that the answer to this question is beyond the scope of this work as our data have clearly demonstrated that Dsx-F acts downstream of Nup107 to regulate activities of unique cell types in the larval as well as adult ovaries to modulate germline soma interactions underlying ovarian morphology and GSC development.

      1. Nup107  Dsx axis is required for escort cells to encapsulate germ cells to allow the downregulation of BMP signaling in germ cells, which in turn allows differentiation of germ cells. Whereas this axis appears to operate, the relationship between germ cell encapsulation and BMP signaling is quite unclear. Is encapsulation upstream or downstream of BMP modulation? Authors provide the evidence that adamTS-A, which modulates the amount of extracellular BMP ligands, is the downstream target of Dsx, and adamTS-A appears to regulate encapsulation. This makes it unclear whether encapsulation is required to down regulate BMP, or BMP regulates encapsulation (and if the latter, what is achieved by encapsulation that leads to germ cell differentiation?)

      This is an excellent point, and we followed the reviewer’s suggestion to resolve this issue. To answer this question, we knocked down coracle activity specifically in the ECs (new Figure 4-Figure supplement 2 & 3). This turned out to be highly informative, and we thank the reviewer for encouraging us to do this experiment.

      This manipulation revealed that encapsulation is likely upstream of BMP, as disruption of the cellular processes responsible for the encapsulation of germ cells leads to dysregulation of BMP signaling. Our data thus argue that Nup107, Dsx and AdamTS-A likely function in ECs and are necessary for the formation and maintenance of the cellular protrusions that are required for restricting the BMP signal emanating from the GSC niche. These observations also suggest that AdamTS-A, which typically resides in the extracellular matrix, is essential for the proper formation and/or maintenance of these cellular protrusions. Altogether, these observations indicate that AdamTS-A modulates BMP signal distribution indirectly by regulating the density/length/activity of the cellular protrusions that restrict the propagation of the BMP signal. We have included a short discussion of this possibility at a relevant juncture in the revised discussion. See the new addition to the discussion, “AdamTS-A modulates BMP signaling via regulation of EC extensions” (line number 556).

      1. Dsx regulates germ cell differentiation in a female specific manner. I am somewhat puzzled by distinct phenotypes of Dsx depletion described in this paper (germ cell differentiation defect in female only) compared to other previous reports on Dsx function in male and female germline. Is there any sex-transformation phenotype? The part of this manuscript that describes Dsx appears to be detached from the context (published literature on Dsx), and it's somewhat difficult for me to interpret the results.

      As mentioned previously (and also in the manuscript), we have not addressed the effect of compromising dsx in males since the effect of Nup107 is largely female specific and males are not obviously affected by Nup107 mutation. Whether this is unique to the specific point mutation in nup107 remains to be determined, however.

      We have addressed the question regarding previous reports in great detail above, in point number two of the editor and in the manuscript as well (See new addition to discussion, “Dsx is essential for proper ovarian development” line number 535).

    1. Author Response

      Reviewer #1 (Public Review):

      The genetic destruction of the prothoracic gland by the cell death gene Grim is an accepted method. They need to indicate in this case, how long the prothoracic gland is detectable in the third instar after exposure to the elevated temperature to inactivate the GAL80. For instance, are any gland cells still functional at the time of critical weight? How does this affect the onset of Achaete progression which normally occurs beginning at 5 hr after ecdysis which they show is dependent on low levels of ecdysone? Do you get the same result if you place them at the elevated temperature 12 hr after ecdysis to the 2nd instar, a time when the ecdysteroid titer has already risen to cause the molt to the third instar?

      These are all valid points, and future experiments should explore manipulating larvae at earlier developmental times. We have not tried transferring L2 larvae to higher temperatures, but would be keen to explore this option. We would need to work out the timing of the ecdysteroid pulse in L2 larvae reared at 17 °C first to make this work.

      Our previous publication shows that some PG cells are still visible at 42 h after L3 ecdysis in PGX animals, although the whole ring gland is dramatically reduced in size (Herboso et al 2015 Supp Fig 3E,F). However, given the growth and patterning defects are similar to those found in previous studies where ecdysone signalling was inhibited in the wing disc (Mirth et al, 2009; Herboso et al 2015), and because these animals cannot pupariate, it is clear that the residual cells do not produce sufficient ecdysone to promote proper development. We believe the gland cannot produce normal ecdysone titres even early in the third instar because:

      • Delays in patterning for Achaete and Senseless in the wing discs are similar to those we found when over expressing a dominant negative form of the ecdysone receptor specifically in the wing (Mirth et al 2009)

      • The growth defects in the wing discs are similar to other manipulations that reduce ecdysone synthesis in the prothoracic gland, like expressing RNAi against smt3 (Herboso et al 2015)

      • In response to Reviewer 3’s recommendation, we have analysed the ecdysone titres in fed PGX and control larvae and found that PGX produce significantly lower, albeit variable, 20E titres than controls (Figure 2 Supplement 1).

      One concern with the paper is why the authors begin contrasting the effects of temperature on embryonic development and wing disc patterning in the Discussion on p25. This discussion seems irrelevant to the experimental manipulations done in this paper concerned with nutrition and hormone levels. Nothing was done relative to environmental temperature effects except to genetically kill the prothoracic gland, the source of ecdysone. I recommend omission of this section of the Discussion.

      We were attempting to make a point about rates of patterning across developmental contexts, but obviously missed the mark. We have deleted this paragraph.

    1. Author Response

      Reviewer #1 (Public Review):

      Identifying private peptides for generating personalised cancer vaccines is a promising approach to launch robust anti-tumor response; however, the challenges remain in developing an effective process to achieve that. In this manuscript, the authors present an interesting and powerful pipeline (PeptiCRAD) to achieve this goal by examining CT26 model. Overall, this manuscript is well written and presented. Despite that this work presents interesting findings and pipeline, I have the following concerns. I do feel that this manuscript will improve if these concerns can be addressed.

      We thank the reviewer very much for having appreciated the quality and the originality of our work.

      1. It will be critical to confirm TILs and T cells in draining lymph nodes indeed recognise the peptide used in Figure 7-8 by ELISPOT of IFNg.

      We agree with the reviewer's comment on the need to confirm, by functional characterization in an IFN-γ ELISPOT assay, that TILs and T cells in draining lymph nodes recognise the peptide used in Figures 7-8. In our experience, the ELISPOT assay works best when fresh samples are employed; additionally, splenocytes provide enough cells to test the reactivity of individual mice to a single peptide. Because the samples from Figures 7 and 8 were frozen, we decided to repeat the animal experiment according to the Figure 7 treatment schedule and then perform the ELISPOT on splenocytes freshly harvested from the mice. Following the previous results, we selected the best group (PeptiCRAd1) to further investigate the peptide response; untreated mice (Mock) and virus alone (VALO-mD901) were used as controls. Interestingly, the peptide deconvolution showed T-cell reactivity to one peptide (RYLPAPTAL, peptide 2) (Figure 1A) in the PeptiCRAd1 group; in contrast, no T-cell reactivity was observed for SYLPPGTSL (peptide 1) (Figure 1B). These data highlight the role of an individual antigen in eliciting a specific anti-tumor T-cell response, making it an interesting candidate for further proof-of-concept studies in an animal experimental setting.

      Figure 1. Interferon-γ ELISPOT results. Harvested splenocytes from the treatment groups (as indicated in the figure) were functionally characterized in an IFN-γ ELISPOT assay; the individual response to SYLPPGTSL A) and RYLPAPTAL B) for each mouse is reported as IFN-γ spot-forming cells (SFC)/10^6 splenocytes. The data are depicted as single-dot plots, and mean + SEM is shown. (Virus=VALO-mD901, PC=PeptiCRAd).

      1. It would be interesting to see if this pipeline can be used to identify human peptides in human melanomas.

      We thank the reviewer for pointing out that this pipeline could be used to identify human peptides in human melanomas. Indeed, the work described here is a proof of concept meant to be translated to the human setting. To this end, we have two ongoing projects in the lab that exploit the same pipeline to investigate the human epithelioid and human mesothelioma ligandome landscapes. Regarding the latter, we are investigating four different human cell lines (H2B, MSTO211H, H2452 and JL1). As shown in the picture below (Figure 2), the peptide length distribution showed an enrichment in 9mers in both replicates (Rep1 and Rep2), in line with a ligandome profile. The analysis of the binders revealed that most of them were good binders (according to EL-Rank score) for at least one of the alleles of each cell line. Following the pipeline reported in this manuscript, we applied two different approaches to select candidate peptides; the first approach relied on RNA-seq analysis to check which source proteins of the peptides isolated in the ligandome analysis were reported as upregulated or downregulated in resected tumor compared to normal tissues.

      (From https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE51024, accession GSE51024.) The second approach (analysis still ongoing) will use the HEX software.

      Regarding the epithelioid project, we have analysed the ligandome profile of two human cell lines: NEPS and VA-ES-BJ. Please find below an example of the ligandome analysis for the VA-ES-BJ cell line (Figure 3). Overall, the analysis outcome was similar to published datasets (amino acid length distribution, Gibbs clustering profile, number of binders), confirming the good quality of the ligandome landscape identified. Next, we applied HEX analysis to narrow down the list of peptide candidates for further testing. We are currently collecting additional epithelioid cell lines to expand our cohort of samples.

      Figure 2 Mesothelioma project (data not published)

      Figure 3 Epithelioid project (data not published)

    1. Author Response

      Reviewer #2 (Public Review):

      The authors model drug uptake with a lag time (t0), after which there is a constant rate of drug uptake. But why is there such a lag time at all? This would suggest a positive feedback loop in drug binding. However, then one would not necessarily expect a constant drug uptake rate afterwards. The rationale for this model should be better explained. Correlating the fluorescence measurements (as in Fig. 1) with the single-cell elongation rates (as in Fig. 5) could help to identify if the lag in drug uptake coincides with the lag in cell growth.

      We agree with the reviewer that the lag in drug uptake coincides with the lag in cell growth; our data in Figure 2 clearly support the correlation between the lag in roxithromycin-NBD uptake and the lag in cell growth during treatment. We also agree with the reviewer that other intracellular mechanisms could contribute to the lag in drug uptake, including a positive feedback loop in drug binding or a positive feedback loop between efflux and drug accumulation (Le et al. mBio 12, e00676, 2021).

      We have added this new information on lines 330-333 of our revised manuscript.

      We have also rephrased the text on lines 103-104 and 216-217 to clarify that the drug uptake rate is not constant and that we have taken this into account by modelling the drug uptake rate as a dynamical variable (described by the second differential equation of the model).

      Finally, we have now explicitly stated that the primary focus of our model is to capture the phenomenology of drug accumulation in our experiments, for example in terms of the measured lag in drug uptake and the time-varying uptake rate. For this reason we have not made assumptions regarding the underlying biological mechanisms (e.g. positive feedback). However, more detailed models will be necessary in the future as we begin to dissect the mechanisms underlying antibiotic accumulation dynamics in individual cells.

      We have now discussed this information on lines 209-212 of our revised manuscript.

      The fact that t0 and k1 are always strongly anti-correlated suggests that these two fit parameters are not independent and simply reflect the same underlying process. It would be critical to clarify this point for the entire analysis of correlations between fit parameters. To this end, the confidence intervals for the fit parameters and their correlations should be estimated using a suitable numerical optimization algorithm. It only makes sense to interpret correlations between fit parameters obtained from different cells if such an analysis shows that the fit parameters are independent (uncorrelated or only weakly correlated) for each individual cell.

      Following the reviewer's input we have measured the correlation between the different kinetic parameters for each individual cell. We found that for 86% and 79% of cells across all antibiotic treatments there was a positive correlation between t0 and k1 and between t0 and Fmax in the posterior Bayesian distribution (i.e. the distribution of parameter values for which the model behaviour matches the data). In contrast, at the population level (using the maximum likelihood estimate for each cell) we found a significantly negative correlation between t0 and k1 for the accumulation of polymyxin B, octapeptin and roxithromycin probes and between t0 and Fmax for the accumulation of polymyxin B, octapeptin, linezolid and trimethoprim probes (Table 2). Finally, we found a significantly positive correlation between k1 and Fmax for the accumulation of polymyxin B, ciprofloxacin and roxithromycin probes (Table 2). The latter correlation was partially imposed by the definition of Fmax in the model (as already acknowledged in the submitted manuscript) and in fact we found that 78% of the cells displayed a positive correlation between these two parameters at the single-cell level. Taken together these data demonstrate that the measured negative correlation between t0 and k1 for the accumulation of polymyxin B, octapeptin and roxithromycin is not due to the fact that t0 and k1 reflect the same underlying process.

      This new information is discussed on lines 267-270 and 275-277 of our revised manuscript.
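
      The distinction drawn above between within-cell (posterior) correlations and population-level correlations can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the numbers are synthetic, the posterior draws are simulated from a correlated Gaussian rather than obtained from the accumulation model, the posterior mean is used as a stand-in for the per-cell point estimate, and the function names are hypothetical. It only shows that the two correlations can have opposite signs.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_cell(rng, t0_mean, k1_mean, n_draws=500, rho=0.6):
    """Synthetic posterior draws of (t0, k1) for one cell, correlated by rho > 0."""
    t0s, k1s = [], []
    for _ in range(n_draws):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        t0s.append(t0_mean + 1.0 * z1)
        k1s.append(k1_mean + 0.05 * (rho * z1 + math.sqrt(1 - rho ** 2) * z2))
    return t0s, k1s


rng = random.Random(0)
within, point_t0, point_k1 = [], [], []
for _ in range(200):
    # Hypothetical cell-to-cell variation: cells with a late onset (large t0)
    # are assigned a slow uptake rate (small k1), i.e. a negative population trend.
    t0_mean = rng.uniform(10, 60)
    k1_mean = 2.0 - 0.02 * t0_mean + rng.gauss(0, 0.1)
    t0s, k1s = simulate_cell(rng, t0_mean, k1_mean)
    within.append(pearson(t0s, k1s))
    # Posterior mean used here as a stand-in for the per-cell point estimate.
    point_t0.append(sum(t0s) / len(t0s))
    point_k1.append(sum(k1s) / len(k1s))

frac_positive = sum(r > 0 for r in within) / len(within)
population_r = pearson(point_t0, point_k1)
print(f"cells with positive within-cell correlation: {frac_positive:.0%}")
print(f"population-level correlation of point estimates: {population_r:.2f}")
```

      In this toy setting the within-cell correlation reflects how the fit trades one parameter against the other for a single trace, whereas the population-level correlation reflects how the best-fit values co-vary between cells, so the two need not agree in sign.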

      The results in Figure 4 are confusing: It seems unlikely that cells are a large enough 'antibiotic sink' to protect neighbouring cells, especially given that cells do not seem to be affected by the nutrient uptake of their neighbours. Furthermore, the opposite correlation (where drug accumulation increases with more 'screening' cells) is very hard to rationalize. A plausible explanation for this effect would be needed. Here, it would be helpful to estimate the molecule numbers (concentrations), uptake rates, and diffusion coefficients of the different antibiotics, compare them to those of nutrient molecules in the growth medium, and explain based on this why the 'screening' cells can have different effects for these molecules on the relevant time and length scales. Without more support there is a concern that these observations may be due to technical artefacts.

      Following the reviewer's input, we have now explicitly advanced the hypothesis that delayed accumulation of membrane targeting drugs in bacteria screened by other cells could be explained by a transient reduction in the extracellular drug concentration around these bacteria (compared to the concentration in the main microfluidic chamber) due to rapid drug binding to the membranes of screening cells. In accordance with this hypothesis, when we ran 2-dimensional numerical simulations of drug diffusion in channels hosting bacteria with a high drug absorption rate (g = 0.2 mol m⁻² s⁻¹, see Methods), we found a gradient in extracellular drug concentration along the channel length: for the first 90 min post drug addition, the concentration was highest around the bacterium without screens and lowest around the bacterium with four screens (new Figure 3-figure supplement 1A). On the contrary, in the presence of bacteria with a low absorption rate (g = 0.002 mol m⁻² s⁻¹), the extracellular drug concentration equilibrated along the channel length within 2 min post drug addition to the device (Figure 3-figure supplement 1C). Accordingly, in the presence of bacteria with a high absorption rate, the intracellular drug concentration (that we simply modelled as the concentration at the bacterial surface) reached saturation levels in the bacterium without screens within minutes post drug addition, whereas the bacterium with four screens reached saturation levels 90 min post drug addition (Figure 3-figure supplement 1B). Conversely, bacteria with a low absorption rate slowly accumulated the drug independently of the number of screens (Figure 3-figure supplement 1D). Therefore, according to these simplified 2-dimensional transport simulations (i.e. we take into account neither efflux nor transport across the gram-negative double barrier), delayed accumulation of membrane targeting drugs in bacteria screened by other cells is due to a transient reduction in the extracellular drug concentration around these bacteria, whereas other mechanisms must underpin increased roxithromycin accumulation in screened bacteria, and this phenomenon should be investigated further in future studies. Finally, we would also like to reiterate that mechanisms other than the microcolony architecture must underlie phenotypic variants with reduced antibiotic accumulation. In fact, we registered significant cell-cell differences in antibiotic accumulation even within the same subpopulation of bacteria with the same number of screening cells (Fig. 3).

      These new data are discussed in the revised manuscript on lines 386-406 and 796-815 (where we have now reported details about the COMSOL simulations) and in the new Figure 3-figure supplement 1.
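
      The screening argument above can be sketched with a much simpler toy model than the 2-dimensional COMSOL simulations reported in the manuscript. The sketch below is an assumption-laden illustration, not a reproduction of them: it is 1-dimensional, uses arbitrary units, replaces the surface flux g with a first-order uptake constant `k`, and gives each cell a finite binding capacity `b_max` so that it eventually saturates. The function name, geometry and all parameter values are hypothetical; only the 100-fold ratio between the high and low uptake constants echoes the text.

```python
def saturation_times(k, n_cells=5, spacing=3, b_max=5.0,
                     d=1.0, dx=1.0, dt=0.4, threshold=0.9, max_steps=50_000):
    """Time at which each cell has bound `threshold` of its capacity.

    Explicit finite-difference diffusion along a dead-end channel. Grid point 0
    touches the main chamber (concentration held at 1.0); the far end is closed.
    Cells are ordered from the one nearest the chamber (no screens) to the
    deepest one (n_cells - 1 screens).
    """
    lam = d * dt / dx ** 2
    assert 2 * lam + k * dt <= 1.0, "time step too large for a stable explicit update"
    sites = [spacing * (i + 1) for i in range(n_cells)]
    n_grid = sites[-1] + spacing
    c = [0.0] * n_grid            # free drug concentration along the channel
    c[0] = 1.0                    # open end: fixed chamber concentration
    bound = {s: 0.0 for s in sites}
    t_sat = {s: None for s in sites}
    for step in range(1, max_steps + 1):
        new = c[:]
        for i in range(1, n_grid):
            right = c[i + 1] if i + 1 < n_grid else c[i - 1]  # mirror point: no-flux wall
            new[i] = c[i] + lam * (c[i - 1] - 2 * c[i] + right)
            if i in bound:
                # Uptake slows down as the cell approaches its binding capacity.
                uptake = k * c[i] * (1 - bound[i] / b_max) * dt
                new[i] -= uptake
                bound[i] += uptake
                if t_sat[i] is None and bound[i] >= threshold * b_max:
                    t_sat[i] = step * dt
        c = new
        if all(t is not None for t in t_sat.values()):
            break
    return [t_sat[s] for s in sites]


high = saturation_times(0.2)     # strongly absorbing cells
low = saturation_times(0.002)    # weakly absorbing cells
for label, times in (("high absorption", high), ("low absorption", low)):
    ratio = times[-1] / times[0]
    print(label, [round(t) for t in times], "deepest/outermost ratio:", round(ratio, 2))
```

      In this toy model, strongly absorbing cells deplete the drug before it reaches the deeper cells, so saturation is delayed with each additional screen, whereas weakly absorbing cells let the channel equilibrate first and saturate at similar times regardless of position.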

    1. Author Response

      Reviewer #1 (Public Review):

      Zaffagni et al. investigated the host cell response in a transcriptome level upon expression of viral proteins of SARS-CoV-2. They found that expression of Nsp14, highly conserved non-structural protein induces a dramatic remodeling of transcriptome that mimics SARS-CoV-2 infection. They revealed functional impacts of Nsp14 in various transcriptomic aspects such as transcript abundance, alternative splicing, and transcriptomic remodeling in a time course manner. They found IMPDH2, the rate-limiting enzyme in GTP biosynthesis as a key mediator of Nsp14 effects on host transcriptome, posing IMPDH2 and Nsp14 as a therapeutic target against SARS-CoV-2.

      The paper revealed various transcriptomic effects upon Nsp14 expression. But biological relevance of infected cells should be verified on these effects. It would be better to explain the mechanistic link among these observations and some data need to be further validated to support their conclusions.

      1. Are the alternative splicing pattern and increased circRNAs upon Nsp14 expression also observed in SARS-CoV-2 infection?

      We thank the reviewers for the insightful question, which helps us to put our data in the context of the physiological events of SARS-CoV-2 infection.

      Regarding alternative splicing, a previous publication showed that upon infection ~90% of the genes show altered splicing events, mainly intron retention events (Banerjee et al. 2020). Previous studies attributed this to another viral protein, Nsp16, as expression of Nsp16 suppresses mRNA splicing of a minigene by binding U1 and U2 (Banerjee et al. 2020). However, those studies are mainly based on experiments in which Nsp16 is expressed as an exogenous protein. Interestingly, we see a similar trend when we express Nsp14, with retention of more than 2,000 introns. This suggests that the massive intron retention observed during SARS-CoV-2 infection might be due to both Nsp16 and Nsp14. We have added some sentences contemplating this possibility, while making clear that, although there is a known mechanism by which Nsp16 expression provokes intron retention, this is not the case for Nsp14, whose effect could be indirect (i.e., through activation of IMPDH2, since pharmacological inhibition of this protein partially rescues some intron retention effects, or through another unknown pathway). In addition, we re-analyzed a published dataset of HEK293T-hACE2 cells infected with SARS-CoV-2 (Sun et al. 2021) and showed that there is a 10% overlap between the alternative splicing events in this dataset and in our dataset (in the HEK293T dataset a much smaller number of genes showed altered splicing upon infection). This is far more than expected by chance (p-value < 10^-15), indicating that Nsp14 expression partially recapitulates the effects on alternative splicing that occur during infection. Notably, this might be an underestimate because the sequencing depth of the published dataset was lower than that of our dataset. In any case, these effects on splicing strengthen the idea that the changes in gene expression during infection parallel and are comparable to those we observed upon Nsp14 overexpression.
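
      As an illustration of how an overlap between two sets of splicing events can be compared with chance, the sketch below computes a one-sided hypergeometric tail probability. This is an assumption on our part: the response does not state which test was used, and the event counts in the usage lines are hypothetical placeholders rather than the values from either dataset.

```python
from math import comb


def overlap_p_value(universe, set_a, set_b, overlap):
    """P(X >= overlap) when set_b events are drawn at random from `universe`
    events, of which set_a belong to the first list (hypergeometric tail)."""
    upper = min(set_a, set_b)
    total = comb(universe, set_b)
    tail = sum(comb(set_a, k) * comb(universe - set_a, set_b - k)
               for k in range(overlap, upper + 1))
    return tail / total


# Hypothetical counts, for illustration only.
n_tested = 50_000      # splicing events testable in both datasets
n_nsp14 = 2_000        # events changed upon Nsp14 expression
n_infection = 500      # events changed upon infection
n_shared = 50          # events found in both lists (10% of the infection list)

expected = n_nsp14 * n_infection / n_tested
p = overlap_p_value(n_tested, n_nsp14, n_infection, n_shared)
print(f"expected by chance: {expected:.0f} events, observed: {n_shared}, p = {p:.2e}")
```

      The choice of the background set (`n_tested`) strongly affects the result, so it should be restricted to events that were detectable in both experiments.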

      Regarding circRNAs, previous studies showed that circRNAs are generally degraded in the context of viral infection (T.-C. Chen et al. 2020 and Liu et al. 2019), so it is not possible to perform this comparison, as it would be masked by this general phenomenon. Nevertheless, we think that the effect of Nsp14 on circRNA expression is very interesting. Indeed, it could be related to the observed anti-innate immunity activity of Nsp14. One proposed model is that circRNAs can trap PKR in the cytoplasm to repress its activity (Liu et al. 2019). Upon viral infection, they are degraded, and PKR can shuttle into the nucleus to trigger immune surveillance. Indeed, upon SARS-CoV-2 infection, circRNAs are mostly downregulated (our own unpublished data, manuscript in preparation), suggesting that Nsp14 might antagonize innate immunity by upregulating some circRNAs. We agree that we did not address this in the present study. In any case, we have now included this explanation and discussion in the new version of the manuscript. We thank the reviewers for bringing this up, as it has significantly improved the manuscript.

      1. The authors showed that IMPDH2 contributes to Nsp14-mediated transcriptome changes. I wonder whether catalytic activity of IMPDH2 also affects the alternative splicing events mediated by Nsp14 expression. Given that the GC content is associated with the sensitivity to Nsp14-mediated alternative splicing, I am curious whether increased GTP level upon Nsp14 expression could be related to the alternative splicing events. How are the alternative splicing events when IMPDH2 inhibitor (MPA) was treated to Nsp14-expressed cells (comparing Nsp14+MPA to Nsp14+DMSO as Figure 6E)?

      This is indeed an interesting point, which we have now addressed experimentally. Unfortunately, we could not perform alternative splicing analysis on the original dataset generated for comparing Nsp14+MPA and Nsp14+DMSO because it is a 3’ RNA-seq experiment. To overcome this problem and address this point, we performed RT-qPCR to detect some of the intron retention events observed upon Nsp14 expression. First, we confirmed that Nsp14 expression induces an increase in these intron retention events (Figure 6G and Figure 6 - figure supplement 6G, red bars). Second, we showed that both MPA and MZR treatment partially rescue the splicing of these introns (see Figure 6G and Figure 6 - figure supplement 6G). Taken together, these data demonstrate that IMPDH2 also mediates the alternative splicing events induced by Nsp14, and the idea that this might be mediated by the increased GTP is indeed a compelling hypothesis. We thank the reviewer for raising this point.

      In addition, and as mentioned above, we see increased circRNA levels upon Nsp14 expression, which might derive from increased back-splicing. In the original version of the manuscript, we showed that the expression of circRNAs is partially rescued upon MPA treatment, and we have now confirmed this result using another inhibitor (Mizoribine, abbreviated as MZR). These results are now included in Figure 6 - figure supplement 2F. These data indicate that IMPDH2 is involved in modulating circRNA expression, probably through regulation of biosynthesis, but we cannot exclude that IMPDH2 or other pathways might also increase circRNA stability.

      1. It would be nice to provide information of a responsible domain of Nsp14 for its effect on the host transcriptome. Also, I wonder whether this domain is required for its interaction with IMPDH2, which would further validate the IMPDH2-mediated Nsp14 effect on host transcriptome.

      We thank the reviewer for the insightful suggestion. In response to this comment, we generated an Nsp14 mutant lacking the N7-guanine-methyltransferase activity (Nsp14 D331A). Then, we compared how the expression of this construct affects some Nsp14 targets (mRNAs and circRNAs) by RT-qPCR (Figure 4G and 4H). Interestingly, we found that the N7-guanine-methyltransferase domain is required for mediating the gene expression changes induced by Nsp14 WT, as the levels of the tested Nsp14 targets are not affected by expression of the N7-guanine-methyltransferase Nsp14 mutant. Importantly, we verified by Western blot that the mutant protein was expressed at levels comparable to WT Nsp14 (Figure 4 - figure supplement 2E). We thank the reviewer for suggesting that we check more carefully which domain of Nsp14 mediates the described effects. We believe that the information obtained with the N7-guanine-methyltransferase mutant provides a more comprehensive view of the mechanism. We agree with the reviewer that it would be worthwhile in the future to check whether this domain is crucial for the interaction with IMPDH2 by performing CoIP or in vitro activity experiments.

      1. To make their conclusion "IMPDH2 is a key mediator of the effects of Nsp14 on the transcriptome of the hosting cell." more compelling, a rescue experiment using wild-type or catalytic dead mutant of IMPDH2 is needed. Or at least, the authors should confirm whether MPA effect on Nsp14-mediated transcriptome change can be reproducible using another IMPDH2 inhibitor.

      We understand the concern of the reviewer and hence we have performed the suggested experiment and showed that the effect on mRNAs, circRNAs, and alternative splicing can also be partially reverted by using a second IMPDH2 inhibitor (Mizoribine - MZR). Specifically, MPA targets the binding site of the cofactor, NAD+/NADH, while Mizoribine targets the binding pocket of the natural substrate, inosine monophosphate (IMP) (Liao et al. 2017). We present the new data in Figure 6 - figure supplement 2. We thank the reviewer for the suggestion, which we believe significantly improved the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      This is a very solid and exciting study.

      We thank the reviewer for finding our study to be very solid and exciting.

      I have several suggestions, comments and questions:

      1. The authors focused on examining the role of C129 as a regulator of PTPN22 redox sensitivity based on a published crystal structure of the catalytic domain. It would be great if they could demonstrate the existence of the disulfide bond between C129 and C227 also experimentally (in T cells).

      As we understand it, it is requested that the disulfide bond between C227 and C129, as previously suggested by Tsai et al. (2009) (1) with pure protein, should be documented to actually occur in the activated T cells. We fully agree that this would improve the study and we have therefore made several attempts to demonstrate this oxidation, or the oxidation state of the active site Cys residue in PTPN22 in situ. However, as we had also expected, it has proven to be technically very challenging. Nevertheless, as the functional consequence of the PTPN22 oxidation and the effect of the C129S mutation is clearly documented in the mouse, using in vivo experiments, we still think it is valid to conclude that the reversible oxidation state of PTPN22 as well as the involvement of the Cys129 residue regulates the function of PTPN22 in vivo, which is the main conclusion of our study.

      1. To this end, there are other cysteine residues in the vicinity of C227, such as C231, that might be involved in the redox regulation of PTPN22. The authors should at least discuss their possible involvement.

      It is correct that Tsai et al. (2009) (1) found that mutating C231 to serine dramatically reduced phosphatase activity, thus suggesting its importance in catalysis. Reactivation assays showed higher reactivation rates for C231S mutants, and they suggested that C231 suppresses reactivation in a reducing environment by competing with C227 for reduction in the catalytic pocket. Therefore, C231 could also be a target for negative regulation of PTPN22. However, our project was from the start limited to the intention of studying whether PTPN22 could be shown to be redox regulated in vivo through modification of key cysteine residues, and the aim has not been to give the full picture of how the molecule is regulated. We have now extended this point in the discussion in the paper.

      1. How is mutation of C227 affecting T cell function? Are the effects similar with those of C129S?

      This would be interesting, but analyzing whether the cysteine at position 227 also regulates T cell activation would require creating another transgenic mouse (C227S), which is outside the scope of the study. As said above and clearly described in the study, we have focused on the redox-mediated effects through C129 and hope that the reviewer can agree with us that this rather focused study is solid and fully sufficient for publication on its own merits.

      1. Although the in vitro evaluation of the PTPN22 activity is of highest quality, it would be good to demonstrate that C227 redox status is modified under physiological conditions. 25-100 µM H2O2 is a high concentration that might not be reached within a cell and might be lethal for T cells.

      See response to point 1.

      1. C129 seems not to be mutated in patients with autoimmunity but is an excellent tool to test the importance of C227 redox regulation and the findings of this study suggest that its over-oxidation will support autoimmune responses. When considering the clinical relevance of the study, a drug that will protect the oxidation of the catalytic cysteine and/or stabilize the disulfide bond would have beneficial effects. The authors could test such pharmacological modulators in isolated T cells.

      Indeed, such modulators would be very interesting to test; however, developing such drugs can hardly be demanded to be within the scope of this study. We have however included a statement on this topic in the Discussion of the manuscript.

      1. The authors discuss that NOX2-derived ROS most likely originate from antigen presenting cells. I fully agree with this discussion. However, some studies have proposed that NOX2 plays an important role also in T cells, a finding which was not confirmed by other following studies. It would be great if the authors could address this controversial issue in regards to their findings.

      We believe that the finding that the ROS that modify PTPN22 in fact come from the interacting APC, rather than from the T cell itself, is very important. However, we have not made a major point of this as we have shown that aspect before in other studies, and in the current paper we wanted to focus on the take-home message that PTPN22 could hereby be shown to be redox regulated in vivo. However, the last word about the source of ROS has not been said. The controversy over whether the Ncf1-containing NOX2 complex is functionally expressed in T cells stems from the paper by Jackson et al. in Nat Immunol 2004 (2). We have not been able to reproduce those findings, and in addition we have never detected a NOX2-dependent response in pure T cells, as has also been shown in several of our papers. There are certainly many pitfalls: contaminating NOX2-expressing cells, NOX2-containing exosomes and peroxides, and even NOX2 complexes picked up by interactions with antigen presenting cells. However, it is dangerous to completely exclude that Ncf1 could be expressed at minimal levels or that a functional NOX2 complex can indeed be formed in T cells, and we all know that minute levels of any peroxide produced by cells could have an impact on cellular functions. But, based on present knowledge, we conclude that T cells do not functionally express Ncf1-containing NOX2 complexes. We have now added two references to clarify this point (3, 4; refs. 38 & 39 in the manuscript).

      1. Fig. 1: Is the addition of bicarbonate affecting the pH and thus the activity of PTPN22?

      No, we believe that addition of bicarbonate is not acting by an altered pH but is instead required for formation of peroxymonocarbonate when reacting with H2O2, which is subsequently the molecular species that bypasses the cellular antioxidant systems in order to oxidize the active site Cys residues of target PTPs. This was shown by us in an earlier publication (Dagnell et al, ref. 11 in the manuscript) (5) and a sentence has now been added in the Discussion to further emphasize this point.

      1. The H2O2 concentration dependence of PTPN22_C129S should also be shown as for WT (see Fig. 1B)

      We agree with the reviewer that titration of the mutant with additional H2O2 concentrations could potentially have been done, but we thought that the side-by-side comparison of WT and C129S enzyme using 0 µM, 25 µM or 50 µM H2O2, as in Fig. 1D, was a sufficient comparison of H2O2 sensitivity. Unfortunately, we do not have the possibility to analyze more purified C129S mutant protein at the moment and it would require a major effort to run those additional experiments. We therefore hope that the reviewer will agree that the data as currently presented are sufficient.

      1. Quantification of the slope based on only 3 measuring points is not accurate (Fig. 1D).

      Each data point in those curves represents the mean ± S.D. derived from duplicate samples run three different times, with clearly very low standard deviations. Thus, we believe that the data are reliable and that the statistically significant difference between the slopes of WT and the C129S mutant, as shown in the figure, is trustworthy.

      1. The pinna thickness measurements shown in Fig. 3B and C suggest that in NCF1 mice C129S has no effect. However, the thickness in NCF1 mice is already much higher than in WT mice (compare B and C). Does this mean that NOX2-derived ROS are the only factor that affects C227 redox properties?

      The decreased ROS due to the Ncf1 mutation is likely to have consequences for the functions of many proteins in different pathways, and not only for PTPN22. The sum effect is that the Ncf1 mutated mice respond more strongly than the wild type, which explains the difference. However, the main message here is that if there is no ROS from the NOX2 complex, the effect of the PTPN22 mutation is lost.

      1. The results shown in Fig. 5D could be moved to a supplementary figure.

      We prefer to keep it within Fig. 5 as it is more logical in the context of the other parts of this figure. Of course, if there is a space or layout problem, we can consider moving it.

      1. The calcium measurements are not convincing and the differences are rather small. The y axis labels show 50K, 100K etc. Are these ratio values? If yes, the imaging settings need to be optimized. Why is the mutant labeled as Pep? How is the C129S affecting calcium signaling? These observations need to be examined in more detail, or maybe calcium is not playing an important role.

      We agree that the differences in calcium measurements are not very large, but the measurements have been repeated several times and there is a significant difference, as shown. The calculation is done on the slope of the curve, which is independent of the absolute values given on the y-axis. We agree that the figure was not properly labeled and have now changed this.

      1. I would suggest a more extensive evaluation of the proteomic data presented in Fig. 6D. The results might be very exciting and can further increase the impact of this study.

      We fully agree with this. We have chosen not to go into the details of the results of the proteomic analysis. The data shown confirm our conclusion, and we did not plan to identify the downstream targets of the PTPN22 oxidative regulation. Highlighting some of these targets will require biological confirmation, which can be done but must await future work. The full dataset has however been deposited in PRIDE for any reader interested in analyzing the results further.

      1. Is 24h BSO treatment not toxic for the T cells (ferroptosis)?

      We have not seen any evidence for toxicity upon the BSO treatment of T cells in vitro, which however has been more thoroughly checked by others. Gringhuis et al (JI, 2000) (6) have shown immunofluorescence staining on T cells 72 hours post BSO treatment with intact cell membranes. Additionally, Carilho et al. (Chem. Cent. J., 2013, 7:150) (7) noted no changes in Jurkat T cell viability after 24 hours at a maximum dose of 100 µM BSO.

      Reviewer #3 (Public Review):

      The manuscript by James, Chen Hernandez et al. reveals a novel function for PTPN22 oxidation in T-Cell activation. The authors used a broad array of methods to demonstrate that PTPN22 is catalytically impaired in addition to being more sensitive to reversible oxidation in vitro. In the characterization process, the authors found that PTPN22 could be directly reduced by Thioredoxin Reductase and that oxidation of PTPN22 could be easily monitored by the appearance of a faster migrating band in non-reducing gels. Supporting the hypothesis that the catalytic Cysteine forms a disulfide with a backdoor Cysteine (Cys129), the authors found that this C129S mutant is prone to oxidation and cannot be reduced back to its active form by Thioredoxin Reductase. Using a new mouse model in which this key Cysteine of PTPN22 is mutated to a Serine residue (PTPN22C129S mutant) and can presumably not form a stabilizing redox intermediate between the catalytic Cys residue and this backdoor Cys (C227-C129), the authors study how the oxidation prone mutant affects T-Cell activation. The authors find that the C129S mutant mouse showed an increased T-Cell dependent inflammatory response that was dependent on activation of the reactive oxygen species-producing enzyme NOX2. These data add an interesting redox twist to the function of PTPN22 in T-Cells that contributes to the conversation on the protective effects of reactive oxygen species against inflammatory diseases in vivo.

      Strengths:

      The in vitro characterization of the WT and C129S mutant form of PTPN22 is very thorough. Determination of the Km and Kcat highlights the differences between the two enzymes that go beyond redox regulation of the phosphatase. The reduction studies are masterfully done and highlight a novel reduction mechanism that merits further study in cells. Demonstrating that PTPN22C129S is prone to oxidation in vitro is a key and technically challenging result that may be applicable to other members of the PTP family that also form disulfides with a backdoor cysteine. Showing that PTPN22C129S mice (backcrossed to B6Q mice, making them susceptible to autoimmune arthritis) displayed higher T cell activation in two models (DTH and GPI), in addition to studies in T cells stimulated with collagen, increased this reviewer's confidence that the PTPN22C129S mouse exhibited a T-cell-dependent inflammatory response phenotype similar to the PTPN22 knockout phenotype. Validation of T-cell signaling events in PTPN22C129S T cells was in line with the in vitro characterization of the phosphatase.

      We thank the reviewer very much for the detailed summary of our findings and the appreciative words.

      Weaknesses: Although the paper has many strengths, some important weaknesses need to be addressed by the authors. In particular, the authors need to better characterize their mouse model and determine if PTPN22 is reversibly oxidized following TCR activation. If PTPN22 is oxidized, does it form an intramolecular disulfide between C227 and C129? The proposed model, that PTPN22C129S is more prone to oxidation, also has to be validated in vivo. Although this could be technically challenging in theory, the authors have shown that the migration pattern of the oxidized enzyme is different from that of the reduced enzyme. Another major issue is that PTPN22 does not appear to be expressed in CD4+ T cells unless these cells are activated in vitro with anti-CD3/CD28 for 24 hours. This makes acute CD3-stimulation studies of CD4+ T cells - such as the measurement of acute calcium influx in Fig. 5E - very difficult to interpret. Perhaps the authors should explain why acute signal transduction studies in Figure 6 were performed in lymph node cells. If the reason is that PTPN22 (WT and C129S mutant) expression is higher, the authors should provide immunoblots for PTPN22 in these cells. Since the PTPN22C129S mouse model has not been sufficiently validated, the claims of the authors are unfortunately weakened and the underlying molecular mechanisms do not completely support their conclusions. However, given the clear in vitro work provided in figures 1 and 2, it is this Reviewer's opinion that the authors can address the issues related to the oxidation status of PTPN22 and of PTPN22C129S in vivo, support their claims, and make a significant contribution to the field.

      We again thank the reviewer for the detailed summary of our findings and for the suggestions. With regards to the in vivo oxidation status of PTPN22, please see the discussion above.

      1. Tsai SJ, Sen U, Zhao L, Greenleaf WB, Dasgupta J, Fiorillo E, et al. Crystal structure of the human lymphoid tyrosine phosphatase catalytic domain: insights into redox regulation. Biochemistry. 2009;48(22):4838-45.
      2. Jackson SH, Devadas S, Kwon J, Pinto LA, Williams MS. T cells express a phagocyte-type NADPH oxidase that is activated after T cell receptor stimulation. Nat Immunol. 2004;5(8):818-27.
      3. Gelderman KA, Hultqvist M, Holmberg J, Olofsson P, Holmdahl R. T cell surface redox levels determine T cell reactivity and arthritis susceptibility. Proc Natl Acad Sci U S A. 2006;103(34):12831-6.
      4. Gelderman KA, Hultqvist M, Pizzolla A, Zhao M, Nandakumar KS, Mattsson R, et al. Macrophages suppress T cell responses and arthritis development in mice by producing reactive oxygen species. J Clin Invest. 2007;117(10):3020-8.
      5. Dagnell M, Cheng Q, Rizvi SHM, Pace PE, Boivin B, Winterbourn CC, et al. Bicarbonate is essential for protein-tyrosine phosphatase 1B (PTP1B) oxidation and cellular signaling through EGF-triggered phosphorylation cascades. J Biol Chem. 2019;294(33):12330-8.
      6. Gringhuis SI, Leow A, Papendrecht-Van Der Voort EA, Remans PH, Breedveld FC, Verweij CL. Displacement of linker for activation of T cells from the plasma membrane due to redox balance alterations results in hyporesponsiveness of synovial fluid T lymphocytes in rheumatoid arthritis. J Immunol. 2000;164(4):2170-9.
      7. Carilho Torrao RB, Dias IH, Bennett SJ, Dunston CR, Griffiths HR. Healthy ageing and depletion of intracellular glutathione influences T cell membrane thioredoxin-1 levels and cytokine secretion. Chem Cent J. 2013;7(1):150.
    1. Author Response:

      Reviewer #1:

      The paper by Liu et al investigates the question of whether the mitochondrial protein import component Tom70 might be involved in the coordination of biogenesis and localization of mitochondrial proteins. It follows a smart hypothesis that positions Tom70 in a coordinating role of nuclear-encoded mitochondrial gene expression and subsequent protein incorporation. The paper shows that Tom70 overexpression uniquely promotes the expression of numerous mitochondrial proteins and that Tom70's mitochondrial localization is required for this. The data then suggest that both mtDNA and a combination of transcription factors are involved in Tom70 controlled nuclear gene expression. The authors then find that Tom70 is also required to dampen nuclear mitochondrial gene expression during import stress. Importantly, Tom70 and numerous other import machinery components become depleted with age in yeast, and the same is true for mitochondrial membrane potential. This is a very strong part of the paper. Tom70 OE rescues this effect, and Tom70 OE extends survival. Finally, suggestive data show that the age-dependent Tom70 depletion is due to reduced expression and enhanced degradation.

      This is an interesting study that uses cutting-edge methods. However, a clear focus of the paper is somewhat missing. The paper touches on many topics that remain unresolved. These include the role of CR and the role of LCD-containing TFs as an explanation for the age-dependent decline of Tom70. There is a role of mtDNA in the Tom70 OE but the link to transcription factors remains unaddressed. For example, degradation of Tom70 is investigated via MDCs, but is autophagy involved? There is a large amount of data in the manuscript that cover a lot of territory, but further mechanistic insights would significantly enhance the paper.

      The authors appreciate the enthusiasm and comments from the reviewer. The focus of our manuscript is to report the discovery that Tom70 moonlights to regulate the biogenesis of mitochondrial proteins and mtDNA (Figure 1-3). After identifying this new role of Tom70, we applied it to understand physiological processes including the mitochondrial import-related stress response and aging-related mitochondrial dysfunctions (Figure 4-7). We did these application studies because we think it is necessary to understand the physiological significance of this role of Tom70, an approach that was strongly encouraged by senior colleagues in the aging field during the progress of our project.

      Specifically, we removed the LCD/proteostasis part according to the reviewers’ suggestions. We include caloric restriction (CR) because this role of Tom70 can explain phenotypes previously observed in the aging field, and it is important to connect our discoveries to previous work, as colleagues in the field encouraged during the project. We think it is a strength of our study that it generated new knowledge that can be used to address previous questions. Regarding mtDNA, we focused on the role of Tom70 in mtDNA biogenesis as a necessary and immediate extension of our study on the biogenesis of mitochondrial proteins. Indeed, the detailed mechanism by which Tom70 signals through secondary messengers and multiple TFs to achieve these biogenesis functions is still not clear, and resolving it will be our future goal. Similar to our manuscript, the seminal study that identified PGC-1 as a key mitochondrial biogenesis regulator also reported the increase of mitochondrial proteins and mtDNA through the induction of a few TFs by PGC-1 (Wu et al., 1999). Regarding MDCs, we simply applied previous knowledge from Hughes and Gottschling’s study that Fis1/Mdv1 regulate the sorting of Tom70 into MDCs in response to acute vacuole stress (Hughes et al., 2016). They have carefully dissected the role of autophagy downstream of MDC formation. However, our purpose is to determine what regulates Tom70 abundance, not how MDCs are disposed of during aging.

      Reviewer #2:

      The authors test the hypothesis that components of the TOM complex regulate efficient mitochondrial biogenesis by coordinating the synthesis of mitochondrial proteins with the rate of mitochondrial protein import. In general, the experiments are well developed and the findings and topic are likely of broad interest. The weaknesses are mainly related to the underdeveloped approaches and the vagaries related to the mechanism(s) by which Tom70 influences transcription of mitochondrial components.

      The authors performed a survey of TOM components (proteins required for protein translocation from the cytosol into mitochondria) and found that overexpression of Tom70 was sufficient to increase accumulation of 4 GFP-tagged mitochondrial proteins that localize to each of the four mitochondrial sub-compartments suggesting that Tom70 has a unique role in mitochondrial biogenesis.

      Interestingly, the authors demonstrate that Tom70 is required to limit transcription of mitochondrial components when Tim23 is impaired. In doing so, Tom70 prevents the aggregation of mitochondrial proteins that fail to be imported into mitochondria.

      The authors demonstrate that mtDNA is required for the increase in mitochondrial component transcription upon Tom70 overexpression. This is an exciting observation. However, experiments to understand the phenomenon are not considered.

      The authors appreciate the enthusiasm and comments from the reviewer. The focus of our manuscript is to report the discovery that Tom70 regulates the biogenesis of mitochondrial protein and mtDNA. According to reviewers’ suggestion, we tested several possibilities and the results have been discussed. The detailed mechanism of how Tom70 signals through the secondary messengers to achieve this function will be the next focus of our lab.

      Interestingly, overexpression of Tom70 prevents the decline in mitochondrial function typically observed in aging cells, while Tom70-deletion accelerated the loss of mitochondrial function.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Catela is a very interesting study investigating terminal selection factors in spinal motor neurons. While the field has focused largely on the development and specification of motor neurons, much less attention has been garnered on gene expression programs that endow the mature motor neuron with adult stage terminal characteristics. This question has recently been tackled in the nematode roundworm C. elegans, but a terminal selector code in mammals is lacking. The work here showing sustained activity of Hoxc8 acting as a terminal selector is interesting and may point towards this kind of encoding being a general rule throughout nervous systems, from invertebrates to vertebrates. In addition to the strengths of this work

      We are glad the reviewer finds our study interesting and comments on its evolutionary implications.

      However, this work would benefit from added approaches beyond RNAseq and RNA in situ to strengthen the data. Additional neuroanatomical, physiology, and behavior data would certainly also strengthen this work - especially since one would expect to see more phenotypes in hoxc8 mutants beyond only misexpression of downstream genes. There are a wealth of motor behavior/motor acuity tasks that can be performed in the mouse and adding such experiments would certainly strengthen this paper.

      We agree. Our new behavioral data (Fig. 6) nicely complement the molecular analysis of brachial motor neurons in Hoxc8 MNΔearly and Hoxc8 MNΔlate mice.

      Reviewer #2 (Public Review):

      Different motor neurons located in different parts of the spinal cord are known to perform distinct functions. It is known that these differences are specified during embryonic development by the expression of different transcription factors. But it is unknown how these differences are maintained in the adult spinal cord. In this manuscript, Kratsios and colleagues propose that the Hox transcription factor Hoxc8 acts as a terminal selector for brachial motor neurons in the developing mouse spinal cord. They perform a series of experiments in which Hoxc8 is deleted from embryonic (e12) and early postnatal (p8) motor neurons and show that this transcription factor is required for the establishment and/or maintenance of a set of terminal differentiation genes in this motor neuron population. Notably, a similar and larger body of work from this lab has previously focused on the role of terminal selector genes, including Hox factors, in the worm C elegans nervous system development. The main conclusions of this current study are 1) Hox factors also control mouse neuronal terminal differentiation, suggesting an evolutionarily conserved role; 2) Hox genes such as Hoxc8 act through multiple downstream effectors, including other transcription factors such as Irx family; and 3) single transcription factors such as Hoxc8 can have multiple distinct roles in terminal differentiation based on the timing of their expression/action. This latter point is perhaps the most interesting conclusion of this paper as it helps to uncover why Hox gene expression may be maintained in post-mitotic neurons beyond initial cell specification.

      This is an exciting paper and will be of broad interest to the spinal cord field. Their demonstration of similar logic in mouse as to what they reported earlier for C. elegans demonstrates the evolutionarily conserved mechanism by which Hox genes function in terminal differentiation of spinal motor neurons. Strengths of this study include the detailed transcriptomics analyses at different time points in mouse and functional studies using conditional mouse knockouts.

      Overall, the conclusions in this paper are well-supported and of high quality. However, one of the authors' main conclusions is that Hoxc8 expression helps control terminal differentiation of spinal motor neurons. But they identify relatively few potential terminal effector genes (8) that seem to require Hoxc8 at any stage. This is especially evident when examined in the context of the initial RNA seq analysis of wildtype e12 vs p8 motor neurons, in which there are >3000 differentially expressed genes between those time points. Therefore, while Hoxc8 may have some role in brachial motor neuron differentiation, it appears to not be a very significant role, or at least does not seem so based on the analyses presented here. The authors might wish to either clarify this point or try to put their findings in a broader context so the readers can appreciate the importance of Hoxc8 in motor neuron differentiation and the potential involvement of other collaborating factors.

      We completely agree and have modified the text and figures (Fig. 4F, new Fig. 5 - figure supplement 1) accordingly to clarify the importance of Hoxc8 in motor neuron terminal differentiation and the involvement of Hoxc8 collaborators.

      Of note, we found six other Hox genes to be expressed continuously in brachial MNs; these could act as Hoxc8 collaborators. Based on studies conducted in mice and chick embryos at early stages of MN development, the strongest candidate is Hoxc6 (Catela et al., 2016, PMID: 26904955; Jung et al., 2010, PMID: 20826310). Deletion of either Hoxc6 or Hoxc8 at the MN progenitor stage (with Olig2-Cre) results in similar axon guidance defects in brachial MNs, and these early defects can be molecularly explained by Hoxc6 and Hoxc8 collaborating to control the expression of Ret, an axon guidance molecule (Catela et al., 2016, PMID: 26904955). Interestingly, when we interrogated available ChIP-Seq datasets from mouse ESC-derived motor neurons in which Hoxc6 or Hoxc8 expression is induced with doxycycline (Bulajic et al., 2020), we found that Hoxc6 and Hoxc8 bind the same cis-regulatory regions of the terminal differentiation markers (e.g., Mcam, Pappa, Glra2) that we identified with our in vivo genetic studies (new Fig. 5; new Fig. 5 – figure supplement 1).

      We have modified the Results and Discussion to clarify this important point, as well as to highlight the potential involvement of additional factors outside the Hox family, such as Islet-1.

    1. Author Response

      Reviewer #1 (Public Review):

      Dias et al proposed a new method for genotype imputation and evaluated its performance using a variety of metrics. Their method consistently produces better imputation accuracies across different allele frequency spectrums and ancestries. Surprisingly, this is achieved with superior computational speed, which is very impressive since competing imputation software has had decades of performance optimization.

      The main weakness in my opinion is the lack of software/pipeline descriptions, as detailed in my main points 3-6 below.

      We have made the source code and detailed instructions publicly available on GitHub. The computational pipeline for autoencoder training and validation is available at https://github.com/TorkamaniLab/Imputation_Autoencoder/tree/master/autoencoder_tuning_pipeline.

      1. In the neural network training workflow, I am worried it will be difficult to compute the n by n correlation matrix if n is large. If n=10^5, the matrix would be ~80GB in double precision, and if n=10^6, the matrix is ~8TB. I wonder what is n for HRC chromosome 1? Would this change for TOPMed (Taliun 2021 Nature) panel which has ~10x more variants? I hope the authors can either state that typical n is manageable even for dense sequencing data, or discuss a strategy for dealing with large n. Also, Figure 1 is a bit confusing, since steps E1-E2 supposedly precede A-D.

      We included more details in the Methods section to address this question. It is true that computing the entire matrix is computationally intensive. To avoid this cost, we calculated the correlations in a sliding box of 500 x 500 common variants (minor allele frequency (MAF) >= 0.5%). In other words, no matter how dense the genomic data are, the n x n size is always fixed at 500 x 500. Larger datasets will not influence this, as the additional variants fall below the MAF >= 0.5% threshold. Thus, memory utilization will be the same regardless of chromosome length or database size. Please note that the end-user does not need to perform this correlation calculation to run imputation, since we already provide the genomic coordinates of the local minima, or “cutting points”, of the genome. This computational burden remains on the developer side. The reviewer is right to point out that the ordering in Figure 1 is misleading; we have corrected this in the revision.
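
      The fixed-size correlation box described in this response can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the 0/1/2 genotype encoding, and the `step` parameter are assumptions made for illustration, and only the 500-variant window and the MAF >= 0.5% filter come from the response.

```python
import numpy as np


def dense_matrix_bytes(n, bytes_per_entry=8):
    """Memory needed for a full n x n matrix of double-precision values."""
    return n * n * bytes_per_entry


def windowed_correlations(genotypes, maf_threshold=0.005, window=500, step=None):
    """Yield (start, corr) for successive windows of common variants.

    genotypes: (samples, variants) array of alternate-allele counts (0/1/2).
    step: hypothetical parameter; defaults to non-overlapping windows.
    Assumes every retained variant actually varies across samples.
    """
    allele_freq = genotypes.mean(axis=0) / 2.0
    maf = np.minimum(allele_freq, 1.0 - allele_freq)
    common = np.flatnonzero(maf >= maf_threshold)  # drop rare variants first
    step = window if step is None else step
    for start in range(0, len(common), step):
        idx = common[start:start + window]
        if len(idx) < 2:
            break
        # Each block is at most window x window, whatever the total variant count.
        yield start, np.corrcoef(genotypes[:, idx], rowvar=False)
```

      Under these assumptions each block holds at most 500 x 500 doubles (about 2 MB), whereas the full matrix for n = 10^5 variants would need about 80 GB.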

      1. I have a number of questions/comments regarding equations 2-4. (a) There seems to be no discussion on how the main autoencoder weight parameters were optimized? Intuitively, I would think optimizing the autoencoder weights are conceptually much more important than tuning hyper-parameters, for which there are plenty of discussions.

      These parameters are optimized through the training process described in “Hyperparameter Initialization and Grid Search / Hyperparameter Tuning”, where both the hyperparameters and edge weights are determined for each autoencoder for each genomic segment. There are 256 genomic segments in chromosome 22, and each segment has a different number of input variables, sparsity, and correlation structure. Thus, there is a unique autoencoder model that best fits each genomic tile (e.g., each autoencoder has different weights, architecture, loss function, regularizers, and optimization algorithms). Therefore, while there are some commonalities across genomic tiles, there is no single answer for the number of dimensions of the weight matrix, or for how the weights were optimized. Instructions on how to access the parameters and hyperparameters of each of the 256 autoencoders are now shared through our source code repository at https://github.com/TorkamaniLab/imputator_inference.

      We included an additional explanation clarifying this point in the Hyperparameter Tuning subsection of the Methods.

      (b) I suppose t must index over each allele in a segment, but this was not explicit.

      That is correct, t represents the index of each allele in a genomic segment. We included this statement in the description of equation 2.

      (c) Please use standard notations for L1 and L2 norms (e.g. ||Z||_1 for L1 norm of Z). I also wonder if the authors meant ||Z||_1 or ||vec(Z)||_1 (vectorized Z)?

      We included a clarification in the description of equation 3. ||W||_1 and ||W||_2 are the standard L1 and L2 norms of the autoencoder weight matrix (W).

      (d) It would be great if the authors can more explicitly describe the auto-encoder matrices (e.g. their dimensions, sparsity patterns if any...etc).

      As we answered in comment 2.a, each of the 256 autoencoders for each genomic segment is unique, so it would be infeasible to describe the architecture, parameters, optimizers, loss function, and regularizers of each one. We realized it would be more suitable to share this information in a software repository and have now done so.

      1. It is not obvious if the authors intend to provide a downloadable software package that is user-friendly and scalable to large data (e.g. HRC). For the present paper to be useful to others, I imagine either (a) the authors provide software or example scripts so users can train their own neural network, or (b) the authors provide pretrained networks that are downloaded and can be easily combined with target genotype data for imputation. From the discussion, it seems like (b) would be the ultimate goal, but is only part dream and part reality. It would be helpful if the authors can clarify how current users can benefit from their work.

      We have now shared the pre-trained autoencoders (including model weights and inference source code) and instructions on how to use them for imputation. These resources are publicly available at https://github.com/TorkamaniLab/imputator_inference. We have added this information to the Data Availability subsection of the Methods.

      1. Along the same lines, I also found the description of the software/pipeline to be lacking (unless this information is available on the online GitHub page, which is currently inaccessible). For instance, I would like to know which of the major data imputation formats (VCF/BGEN, etc.) are supported? Which operating systems (Windows/Linux/macOS) are supported? I also would like to know if it is possible to train the network or run imputation given pre-trained networks, if I don't have a GPU?

      We have now made the GitHub repository publicly available. The description of the requirements and steps performed in the hyperparameter tuning pipeline is available at https://github.com/TorkamaniLab/Imputation_Autoencoder/tree/master/autoencoder_tuning_pipeline.

      1. Typically, imputation software supplies a per-SNP imputation quality score for use in downstream analysis. This is important for interpretability as it helps users decide which variants are confidently imputed and which ones are not. For example, such a quality score can be estimated from the posterior distribution of an HMM process (e.g. Browning 2009 AJHG). Would the proposed method be able to supply something similar? Alternatively, how would the users know which imputed variants to trust?

      We included further clarification in the Data Availability section of the Methods: Imputation data format. The imputation results are exported in Variant Call Format (VCF) containing the imputed genotypes and imputation quality scores in the form of class probabilities for each of the three possible genotypes (homozygous reference, heterozygous, and homozygous alternate allele). The probabilities can be used for quality control of the imputation results.
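
      One way such class probabilities could be used for quality control is sketched below. This is an illustrative assumption, not the behaviour of the authors' software: the function names, the array layout, and the 0.9 confidence cut-off are hypothetical, and parsing of the VCF itself is left out.

```python
import numpy as np


def summarize_genotype_probs(probs):
    """Summarize per-genotype class probabilities.

    probs: (variants, samples, 3) array holding P(hom ref), P(het), P(hom alt).
    Returns the most likely genotype (0/1/2 alternate alleles), its
    probability, and the expected alternate-allele dosage.
    """
    probs = np.asarray(probs, dtype=float)
    best_call = probs.argmax(axis=-1)
    confidence = probs.max(axis=-1)
    dosage = probs[..., 1] + 2.0 * probs[..., 2]
    return best_call, confidence, dosage


def confident_variants(probs, min_mean_confidence=0.9):
    """Flag variants whose mean best-call probability passes a cut-off.

    The 0.9 default is an arbitrary illustrative threshold.
    """
    _, confidence, _ = summarize_genotype_probs(probs)
    return confidence.mean(axis=1) >= min_mean_confidence
```

      A variant whose probabilities are spread almost evenly across the three classes would fail this filter, while one with a dominant class in most samples would pass.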

      We included this clarification in the manuscript and in the readme file of the inference software repository https://github.com/TorkamaniLab/imputator_inference.

      1. I think the authors should clarify whether input genotypes must be prephased. That is, given a trained neural network and a genotype data that one wishes to impute, does the genotype data have to be phased? The discussion reads "our current encoding approach lacks phasing information..." which can be understood both ways. On a related note, I hope the authors can also clarify if the validation and testing data (page 7 lines 1423) were phased data, or if they were originally unphased but computationally phased via software such as Eagle 2 or Beagle 5.

      The input genotypes are not phased, and no pre-phasing was performed before imputation. We included further clarification in the Methods section, stating “All input genotypes from all datasets utilized in this work are unphased, and no pre-phasing was performed.” We also included further clarification in the Discussion section.

      1. It is unclear if the reported run times (Figure 6) include model training time, or if they are simply imputing the missing genotypes given a pre-trained autoencoder? For the latter, I think the comparison may still be fair if users never have to train models themselves. However, if users currently have to train their own network, I feel it is imperative to also report the model training time, even if in another figure/table.

      The end-users do not have to train the models; the computational burden of training remains on the developer side, so the runtimes refer to the task of imputing the missing genotypes given a pre-trained autoencoder set. This allows for distribution without reference datasets. We included further clarification in the Performance Testing and Comparisons subsection of the Methods.

      Reviewer #2 (Public Review):

      In this manuscript the authors introduce a segment based autoencoder (AE) to perform genotype imputation. The authors compare performance of their AE to more traditional HMM-based methods (e.g. IMPUTE) and show that there is a slight but significant improvement on these methods using the AE strategy.

      In general the paper is clearly presented and the work is timely, but I have some concerns with respect to the framing of the advances presented here along with the performance comparisons.

      Specific Points:

      1. The authors aren't doing a good enough job presenting the work of others in using deep neural networks for imputation or using autoencoders for closely related tasks in population genetics. For instance, the authors say that the RNN method of Kojima et al 2020. is not applicable to real world scenarios, however they seem to have missed that in that paper the authors are imputing based on omni 2.5 at 97% masking, right in line with what is presented here. It strikes me that the RNNIMP method is a crucial comparison here, and the authors should expand their scholarship in the paper to cover work that has already been done on autoencoders for popgen.

      This is an important comparison that we misrepresented. We have now separated out this particular application of RNN-IMP in the introduction of the manuscript. The major difference is that RNN-IMP needs to be retrained on different input genetic variants, much like a standard HMM-based method. The computational burden of RNN-IMP remains on the end-user side. The computational complexity of this model appears to be tremendous, given that the only example the authors provided with their software uses 100 genomes from the 1000 Genomes Project to perform imputation on Omni by de novo training on the data. Given that their approach does not achieve the benefits of distributing a generalizable pre-trained neural network, and given the computational burden associated with training these models on the 60K+ genomes we use in our manuscript, we have opted for stating the benefits and downsides of their approach in the introduction.

      1. With respect to additional comparisons-Kenneth Lange's group recently released a new method for imputation which is not based on HMM but is extremely fast. The authors would be well served to extend their comparisons to include this method (MendelImpute)-it should be favorable for the authors as MendelImpute is less accurate than HMMs but much faster.

      We appreciate the reviewer pointing out this additional method, however their parent manuscript clearly shows substantially inferior imputation performance relative to BEAGLE/Minimac etc. which we already compare against. There is not much to gain by performing this comparison. Our autoencoder-based approach is already generating results that are competitive with the best and most cited imputation tools, which are all HMM-based and outperforming MendelImpute. The outcome of this comparison is forecasted based upon the parent manuscript.

      1. The description of HMM based methods in lines 19-21 isn't quite correct. Moreover-what is an "HMM parameter function?"

      Thank you for catching this. We were referring to parameter estimation and have corrected this in the manuscript.

      1. Using tiled AEs across the genome makes sense given the limitations of AEs generally, but this means that tiling choices may affect downstream accuracy. In particular-how does the choice of the LD threshold determine accuracy of the method? e.g. if the snp correlation threshold were 0.3 rather than 0.45, how would performance be changed?

      This choice is driven by the limitations of cutting-edge GPUs. 0.45 is the threshold that returns the minimum number of tiles spanning chromosome 22 with an average size per tile that fits into the video memory of GPUs. While developing the tiling algorithm, we tested lower thresholds, which made the tiles smaller and more abundant, and thus made the GPU memory workload less efficient (e.g., many tiles resulted in many autoencoders per GPU, which caused CPU-GPU communication overhead). Due to the obstacles related to computational inefficiency, CPU-GPU communication overhead, and GPU memory limits, we did not proceed with model training on tiles generated with other correlation thresholds. We’ve added a paragraph explaining this choice in the manuscript.

      1. How large is the set of trained AEs for chromosome 22? In particular, how much disk space does the complete description of all AEs (model + weights) take up? How does this compare to a reference panel for chr22? The authors claim that one advance is that this is a "reference-free" method - it's not - and that as such there are savings in that a reference panel doesn't have to be used along with the genome to be imputed. While the latter claim is true, instead a reference panel is swapped out for a set of trained AEs, which might take up a lot of disk space themselves. This comparison should be given and perhaps extrapolated to the whole genome.

      This is an interesting point. For comparison, the total combined uncompressed size of all pre-trained autoencoders together is 120GB, or 469MB per autoencoder. The size of the reference data, HRC chromosome 22 across ~27,000 samples is 1GB after compression – or nearly 10X the autoencoder size. Moreover, unlike in HMM-based imputation, the size of the pre-trained autoencoders does not increase as a function of the reference panel sample size. The size of the autoencoders remains fixed since the number of model weights and parameters remains the same regardless of sample size – though it will expand somewhat with the addition of new genetic variants. Another point to consider is that privacy concerns associated with distribution of reference data are mitigated with these pretrained autoencoders.

      1. The results around runtime performance (Figure 6) are misleading. Specifically HMM training and decoding is being performed here, whereas for the AE only prediction (equivalent to decoding) is being done. To their credit, the authors do mention a bit of this in the discussion, however a real comparison should be done in Figure 6. There are two ways to proceed in my estimation - 1) separate training and decoding for the HMM methods (Beagle doesn't allow this, I'm not sure of the other software packages) 2) report the training times for the AE method. I would certainly like to see what the training times look like given that the results as presented require 1) a separate AE for each genomic chunk, 2) a coarse grid search, 3) training XGBoost on the results from the coarse grid search, and 4) retraining of the individual AEs given the XGBoost predictions, and 5) finally prediction. This is a HUGE training effort. Showing prediction runtimes and comparing those to the HMMs is inappropriate.

      We consider only prediction in the runtime comparisons because only the prediction side is done by the end-user, whereas the computational burden of training remains on the developer side. For the HMMs, we included only the prediction time as well (excluding the time for data loading/writing, computing model parameters, and HMM iterations). The pre-trained autoencoders, when distributed, can take as input any set of genetic variants to produce the output without any additional training or fine-tuning required.

      1. One well known problem for DNN based methods including AEs is out-of-sample prediction. While Figure 5 (missing a label by the way) sort of gets to this, I would have the authors compare prediction in genotypes from populations which are absent from the training set and compare that performance to HMMs. Both methods should suffer, but I'm curious as to whether the AEs are more robust than the HMMs to this sort of pathology.

      Our test datasets in Figures 4 and 5 are independent of the reference dataset. MESA, Wellderly, and HGDP are all independent datasets, never used for training, nor model selection. Only HRC was used as reference panel or for training, and ARIC was used for model selection during tuning. We included a statement in the methods clarifying this point.

      Reviewer #3 (Public Review):

      Over the last 15 years or so genotype imputation has been an important and widely-used tool in genetic studies, with methods based on Hidden Markov Models (HMMs) and reference panels emerging as the dominant approach. This paper suggests a new approach to genotype imputation based on denoising autoencoders (DAE), a type of neural network. This approach has two nice advantages over existing methods based on Hidden Markov Models (HMMs): i) once the DAE is trained on a reference panel the reference panel can be discarded, and users do not need access to the reference panel to use the DAE; ii) imputation using a DAE is very fast (training is slow, but this step is done upfront so users do not need to worry about it). The paper also presents data showing that the tuned DAE is competitive in accuracy with HMM methods.

      I have two main concerns.

      First, it is unclear to me whether the accuracy presented for the tuned DAE (eg Figure 3, Table 4) is a reliable reflection of expected future accuracy. This is because the tuning process was quite extensive and complex, and involved at least some of the datasets used in these assessments. While the paper correctly attempts to guard against overfitting and related issues by using separate Training, Validation and Testing data (p7), it seems that the Testing data were used in at least some of the development of the methods and tuning (eg p14, "A preliminary comparison of the best performing autoencoder..."; Figure 2 and Table 2, all involve the Testing data). Because of the complexity of the process by which the final DAE was arrived at it is unclear to me whether there is a genuine concern here, but it would seem safest and most convincing at this point to do an entirely independent test of the methods on genotype data sets that were not used at all up to this point.

      MESA, Wellderly, and HGDP were not used for training or for tuning; they are completely independent, so all the results shown for these datasets are independent tests. Only HRC and ARIC were used for training and validation/tuning, respectively. We included a statement in the Methods section clarifying this point.

      Moreover, HGDP in particular includes 828 samples from 54 different populations representing all continental populations and including remote populations like Siberia, Oceania, etc. This reference panel is described in more detail in the reference below and likely represents the most diverse human genome dataset available. Thus, we have externally validated generalizability on a dataset with much greater diversity than our training dataset:

      Bergström A, et al. Insights into human genetic variation and population history from 929 diverse genomes. Science. 2020 Mar 20;367(6484):eaay5012.

      Second, there is a potentially tricky issue of to what extent distributing a black box DAE trained on a reference sample is consistent with data sharing policies. Standards of data sharing have evolved over the last decade. Generally there currently seems to be little hesitation to publicly share "single-SNP summary data" such as allele frequency information from large reference panels, whereas sharing of individual-level genotype data is usually explicitly forbidden. It is not quite clear to me where sharing the fit of a DAE falls here, or how much information on individual genotypes the trained DAE contains. The current manuscript does not adequately address this issue.

      Currently there are no official data sharing restrictions on deep learning data. We are aware that future policies may arise, and we have started a collaboration with Oak Ridge National Laboratory to explore differential privacy techniques and privacy concerns for these autoencoders. Another point to consider is that the autoencoders segment the genome, making reconstruction of an individual genome impossible even if reference data were somehow recoverable from the neural networks. Regardless, this is an interesting and important point that should be addressed in the manuscript and we have added a paragraph discussing this point.

      Reviewer #4 (Public Review):

      In this manuscript, Dias et al proposed a novel genotype imputation method using autoencoders (AE), which achieves comparable or superior accuracy relative to the state-of-the-art HMM-based imputation methods after tuning. The idea is innovative and provides an alternative solution to the important task of genotype imputation. The authors also conducted some experiments using three different datasets as targets to showcase the value of their approach. The overall framework of the method is clearly presented but more technical details are needed. The results presented showed slight advantage of AE imputation after tuning but more comprehensive evaluations are needed. In particular, the authors didn't consider post-imputation quality control. The reported overall performance (R2 in the range of 0.2-0.6) seems low and inconsistent with the imputation literature.

      Overall, the method has potential but is not sufficiently compelling in its current form.

      We show an average accuracy of 0.2-0.6 in Table 4, but that is the average R2 per variant across all variants (no MAF filtering or binning applied). The reviewer points out that the accuracy should be R2>0.8, but this R2>0.8 refers to common variants only (allele frequency >1%), and we have shown R2>0.8 for these variants (Figure 4). The aggregate accuracy displayed in Table 4 is lower because the vast majority of variants fall below the 1% allele frequency threshold.
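
      The gap between an unfiltered average and a per-frequency-bin average can be illustrated with a minimal sketch. The per-variant values below are hypothetical and chosen only for illustration; they are not taken from Table 4 or Figure 4, and the function name `summarize_r2` is an assumption of this sketch.

```python
from statistics import mean

# HYPOTHETICAL per-variant records: (minor allele frequency, imputation R2).
# Most variants are rare, mirroring the situation described in the response.
variants = [
    (0.0005, 0.10), (0.0008, 0.15), (0.001, 0.20), (0.002, 0.25),
    (0.004, 0.30), (0.006, 0.35), (0.008, 0.40),
    (0.05, 0.85), (0.20, 0.92), (0.40, 0.95),
]

MAF_THRESHOLD = 0.01  # "common" means allele frequency > 1%


def summarize_r2(records, threshold=MAF_THRESHOLD):
    """Return the mean R2 over all variants, common variants, and rare variants."""
    all_r2 = [r2 for _, r2 in records]
    common = [r2 for maf, r2 in records if maf > threshold]
    rare = [r2 for maf, r2 in records if maf <= threshold]
    return {"all": mean(all_r2), "common": mean(common), "rare": mean(rare)}


summary = summarize_r2(variants)
for label, value in summary.items():
    print(f"{label:>6}: mean R2 = {value:.3f}")
```

      With these illustrative numbers, the unfiltered mean falls well below the mean for common variants simply because rare variants outnumber common ones.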

      The references below demonstrate this issue and agree with our results:

      References:

      Rubinacci S, Delaneau O, Marchini J. Genotype imputation using the positional Burrows Wheeler transform. PLoS genetics. 2020 Nov 16;16(11):e1009049.

      McCarthy S, Das S, Kretzschmar W, Delaneau O, Wood AR, Teumer A, Kang HM, Fuchsberger C, Danecek P, Sharp K, Luo Y. A reference panel of 64,976 haplotypes for genotype imputation. Nature genetics. 2016 Oct;48(10):1279.

      Vergara C, Parker MM, Franco L, Cho MH, Valencia-Duarte AV, Beaty TH, Duggal P. Genotype imputation performance of three reference panels using African ancestry individuals. Human genetics. 2018 Apr;137(4):281-92.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Rekler and Kalcheim examines the role of neural tube-derived retinoic acid (RA) in neural crest development. They observe that the onset of expression of the RA-synthesizing enzyme RALDH2 in the dorsal neural tube coincides with the end of neural crest production. The authors propose that this local source of RA is essential to activate the transcription of Bambi and other BMP inhibitors, leading to the disruption of BMP signaling. Loss of BMP activity at the dorsal neural tube would halt neural crest production, leading to the establishment of the definite roof plate. Thus, precise temporal regulation of RALDH2 in the dorsal neural tube would dictate the timing of neural crest production and the segregation of PNS and CNS progenitors.

      Previous studies have already identified a role for RA in the control of the timing of neural crest production. Martinez-Morales et al (JCB 2011) have shown that during early trunk development, mesoderm-derived RA works with FGF signaling to jumpstart the BMP/Wnt cascade that drives neural crest migration in the trunk. Rekler and Kalcheim chose to focus on a distinct function of RA at a later timepoint. The main contribution of the present study is the demonstration that - at later stages - RA produced by the neural tube has the opposite effect, acting to inhibit the BMP/Wnt cascade and halt neural crest production. Thus, RA would be a major regulator of the timing of neural crest production, acting to both trigger and repress neural crest migration.

      The study's strengths lie in an experimental strategy that allows the authors to manipulate RA function in a stage-specific manner and therefore uncover a later role for the signaling system in neural crest production. The authors also show that RA inhibition results in an incomplete fate switch and results in the generation of cells that share regulatory features of neural crest and roof plate cells. A significant limitation of the study is that the molecular mechanisms that endow RA signaling with stage-specific functions remain unknown. This is particularly important since the early vs. late RA seem to have opposing effects, acting to either promote or terminate neural crest production.

      We thank this referee for her/his positive comments on our manuscript. We agree with the referee that a key question is understanding how RA signaling is differentially interpreted over time given its multistage activity in dorsal NT development.

      This is based on the following findings: Years ago, we uncovered that the balance between activities of BMP/Wnt and noggin in the dorsal NT triggers the onset of NC EMT. Martinez-Morales et al. strengthened our findings by reporting that a balance between somitic RA and FGF works on the reported BMP/Wnt modules to initiate the process. This group found that at gastrulation stages, RA is required for NC specification, as revealed by analysis of VAD quail embryos. Next, during somite formation, somitic RA is necessary for the onset of emigration of specified NC progenitors but at advanced somite stages it is dispensable for the subsequent maintenance of cell emigration. Presently, we find that RP-derived RA ends NC production. Together, this highlights a dynamic behavior of RA at 4 sequential stages of NC ontogeny. Clearly enough, the two first effects are mediated by an influence of RA on dorsoventral patterning of the early NT, as distribution of ventral NT markers was strongly affected. In our case, RA from the nascent RP has no such effects, suggesting that RP-derived RA acts at a post-patterning phase to specifically affect the dorsal NT.

      All things considered, we think that the problem is not simply a binary question of “opposing functions of RA signaling in starting or terminating NC production”. Instead, it is the understanding of a differential interpretation to the same morphogen by progenitor cells with changing states and at sequential stages.

      To the referee’s request, we began addressing the question of how RA inhibits BMP signaling close to the RP stage. To this end, we decided first to examine the temporal regulation of Raldh2 expression that is restricted to the RP stage, and is therefore a prerequisite for the late activity of RA. Whereas repressing RA activity extends the NC phase including the continuous transcription of Foxd3, Sox9 and Snail2 (Fig.3), we now found that extending the activity of each of these transcription factors close to the RP stage represses the onset of Raldh2 transcription in the nascent RP (new Fig. 9). We interpret these results to mean that as long as NC genes are active in the dorsal NT (NC stage), local Raldh2 and consequent RA synthesis in the NT does not take place, so Raldh2 in RP is repressed by NC-specific traits. The significance of these data is twofold: first, they explain the late onset of Raldh2 production at the RP stage. Second, since we also report the reciprocal result, that RA represses NC genes (Fig.3), we conclude that a cross repressive interaction exists between NC and RP-specific genes downstream of RA, being an emerging temporal property of the network. These data further indicate that the changing roles of RA throughout development of the dorsal neural primordium largely depend on a different interpretation of the signal mediated by changing and mutually repressive codes.

      We have now presented these data in Fig.9. To clarify our thoughts further, we now provide a working model summarizing the effects of RA in NC to RP transition (Fig.10B).

      Our article uncovers for the first time, and thoroughly documents, a role of local RA activity on the end of NC production and ensuing RP architecture. We believe that a comprehensive elucidation of the molecular mechanism responsible for inhibition of BMP signaling by local RA is the next obligatory step. We show in this study the selective activation of BMP inhibitors by endogenous RA and previously found that one of them, Hes/hairy, indeed inhibits BMP signaling and NC EMT (Nitzan et al, 2016). Therefore we propose that upregulation of BMP inhibitors by RA is a possible mechanism. However, we also predict that this is not the only one, and a deeper understanding of this problem is beyond the scope of the present study.

      Additional possibilities that fit with our data were now discussed: RA expression in somites vs. RP can be regulated by different enhancers and thus have distinct functions. For example, a specific enhancer driving expression of Raldh2 was found to be activated only at the definitive RP stage (Castillo et al., 2010). This enhancer contains Tcf binding sites and thus may be activated by Wnt signaling. In turn, as we show, RP-derived Raldh2 and resulting RA could negatively feed-back on Wnt signaling in the formed RP either directly or through BMP acting upstream of Wnt (now presented in Fig. 10B).

      Another possible scenario is that RA represses BMP signaling by inactivating Smad proteins via ubiquitination, as shown to be the case in selected cell lines (Sheng et al., 2010). These possibilities were discussed and await systematic exploration.

      Comments:

      Previous studies have demonstrated that early RA production (presumably from the mesoderm) is necessary for the expression of early dorsal neural tube / neural crest genes like Pax7, Msx, Wnt1, and even BMP ligands. This is in contrast to the local source of RA, which presumably would be silencing these genes. Thus, mesoderm-derived RA would have the opposite effect in these progenitors than the RA synthesized in the neural tube. The study does not provide a mechanism that explains these stage-specific effects of the morphogen.

      As elaborated in our reply above to the general comment, we believe that RA, whether emanating from somites or nascent RP, provides an initial signal that is later relayed upon target factors unique to each stage. It is possible that the precise source of the factor plays a role; along this line we showed that somitic RA is dispensable for late events; reciprocally, there is no RA synthesis in the early NT that could affect NC cells. Having said that, there is RA activity in the NT at both stages and the output is still different. Hence, there should be more to this: In the revised version, we report that NC and RP-specific genes stand in a mutually repressive interaction downstream of RA, and this may contribute to the stage-specific effects of the morphogen.

      The effects of RA manipulation are often examined with non-quantitative techniques, like in situ hybridization (Fig. 2, 3). The incorporation of quantitative approaches (e.g., qPCR) would allow for the precise characterization of phenotypes (and better estimation of penetrance, etc.). Furthermore, the study lacks molecular/biochemical strategies to define the regulatory linkages between genes and pathways. This is a considerable limitation of the study since it prevents the establishment of a regulatory axis that would directly connect RA signaling to the BMP pathway.

      As the referee may notice, most genes examined are not restricted solely to the dorsal NT/RP domains. Since it is technically not accurate to isolate only the regions of interest for qPCR analysis, collecting entire NTs following unilateral or bilateral electroporations for qPCR would be highly inaccurate. In situ hybridization and immunohistochemistry provide a precise tool to assess the spatial localization of the transcripts/proteins of interest. To note is that in all cases examined, development of the color reactions was for the same length of time for control and experimental cases and photography was performed under identical conditions. Furthermore, in most cases, effects between treatments were dramatic, readily apparent at a qualitative level and easily quantifiable from ISH or fluorescent images.

      As to regulatory linkages between genes and pathways, the referee is correct; we do not demonstrate direct molecular interactions between the different players at the biochemical level. The present study provides a wealth of novel data connecting morphogens such as RA with BMP and Wnt activities, and those with a variety of downstream genes specific for either NC or RP stages. The next step will be to ask about the precise nature of the linkages between specific molecules/pathways.

      The function (and the regulation) of RALDH2 at the dorsal neural tube has been studied thoroughly, and RA is a known player in the dorsal-ventral patterning of the CNS. It is not clear to what extent the phenotypes observed by the authors are due to the disruption of a neural crest-intrinsic mechanism or if they are secondary to the overall changes in the cellular organization of the neural tube caused by loss of RA.

      This is a good point as RA is known to have multiple effects on NT development whose nature changes with stage. Available data emanating from young caudal neural plate explants and from VAD embryos that lack RA showed that early RA signaling from developing somites is required for ventral patterning of the neural tube (motoneurons and V1, V2 interneurons) and for neuronal differentiation (Diez del Corral et al, 2003, Sockanathan and Jessell, 1998, Liu et al, 2001, Maden et al, 1996). These effects were shown to depend, at least partly, on antagonistic activities of RA and FGF in mesoderm which affect ventral, but not dorsal NT patterning (Diez del Corral et al, 2003). Our study focuses on a later stage when D-V neural tube patterning is already established.

      To address the referee’s comment, we now examined the effects of RA attenuation on expression of Pax7, a dorsal factor, and Hb9, a motoneuron-specific protein. We found that RARa403 does not affect the localization and/or extent of expression of Hb9, and causes only a mild 12% increase in the area of expression of Pax7. Consistent with these results, we also show in several figures that in the absence of RA signaling pSmad and Wnt activities, Foxd3, Snai2 and Sox9 expression patterns are prolonged in time but not in D-V extent.

      These data corroborate that the effects documented are directed to the dorsal NT and do not result from overall changes in D-V patterning. The data were now added as Fig.7 Supplementary 1.

      The authors rely solely upon overexpression constructs to manipulate the activity of the RA signaling pathway, which may be prone to artifacts. Furthermore, both overexpression constructs aim at inhibiting RA activity. This limits the impact of the work since there is no demonstration that RA is sufficient to activate BMP inhibitors and halt neural crest production.

      The tools we used to repress RA signaling consist of RARa403 that acts as a pan-dominant negative construct to abrogate receptor activity, and Cyp26A1, an enzyme that degrades RA. To activate RA signaling in a ligand-independent manner, we now implemented VP16-RAR-alpha in the revised version of this manuscript. All these tools are extensively and routinely employed in the literature in a variety of animal species and were shown to act in vivo as expected both by others and further confirmed by us in the present study. Having said that, we are currently optimizing the CRISPR-Cas9 method for gene editing of RA-specific genes and hope to succeed in the near future.

      We have now performed experiments to address the sufficiency of RA. Data were now added as Fig.5 Supp.2 and Supp.3, and Fig.6 Supp.2.

      As we expected, gain of RA function at NC stages is not sufficient to prematurely activate BMP inhibitors like BAMBI, to end prematurely BMP signaling (pSMAD) or NC EMT, to alter the dynamics of expression of NC-specific genes, or to cause an earlier appearance of RP-specific traits. This is fully consistent with RA being active at NC stages when BMP/Wnt signaling, NC EMT, etc. are operational. The fact that RA is necessary but not sufficient for these processes further suggests that the key is how NC cells at various stages of their ontogeny, and then RP cells, differentially interpret the signal given the profound changes in cellular and molecular landscapes apparent between these stages.

      Reviewer #2 (Public Review):

      The manuscript presents a novel role for RA signaling during development as the mediator of the switch that occurs in the dorsal neural tube after the neural crest cells have migrated and the roof plate forms. The finding is interesting and novel as the events that take place at the end of neural crest stage are poorly understood. The strengths of the manuscript are that the study is well planned and executed to show the interesting phenotype of delayed/disturbed roof plate formation accompanied with prolonged neural crest stage caused by inhibition of RA signaling in the dorsal neural tube. The results also show that RA signaling marks the RP territory and inhibits the DI1 interneurons from invading the region. The results bring novel information to the field. The original finding of the involvement of RA in the process was revealed in a RNAseq screen comparison between the neural crest and the roof plate (which was recently published by the same lab). However, the current study doesn't use any new technology such as high throughput screens or high resolution or live imaging etc., but rather relies mainly on "old fashioned" techniques: electroporation to induce transient inhibition of RA signaling in the dorsal neural tube followed by analysis of the phenotype by using chromogenic in situ hybridization. The chosen techniques are sufficient to convincingly show the point the authors want to make and the study serves as a reminder that fancy new techniques are not necessarily a requirement for creating a solid story. The manuscript is also well written and easy to follow.

      We thank this referee for a very positive feedback on our study. Although we are always motivated by the implementation of new techniques, we agree that the primary goal is to answer a biologically meaningful question with suitable means.

      Finally, the manuscript links the activation of RA signaling to the decline of BMP signaling and specifically the upregulation of BMP inhibitors in the dorsal neural tube at the end of the NC stage, but in its current form the proof of this proposed link remains weak.

      Our article uncovers for the first time, and thoroughly documents, a role of local RA activity on the end of NC production and ensuing RP architecture. We believe that a comprehensive elucidation of the molecular mechanism responsible for inhibition of BMP signaling by local RA is the next obligatory step. We show in this study the selective activation of BMP inhibitors by endogenous RA and previously found that one of them, Hes/hairy, indeed inhibits BMP signaling and NC EMT (Nitzan et al, 2016). Therefore we propose that upregulation of BMP inhibitors by RA is a possible mechanism. However, we also predict that this is not the only one, and a deeper understanding of this problem is beyond the scope of the present study.

      Additional possibilities that fit with our data were now discussed: RA expression in somites vs. RP can be regulated by different enhancers and thus have distinct functions. For example, a specific enhancer driving expression of Raldh2 was found to be activated only at the definitive RP stage (Castillo et al., 2010). This enhancer contains Tcf binding sites and thus may be activated by Wnt signaling. In turn, as we show, RP-derived Raldh2 and resulting RA could negatively feed-back on Wnt signaling in the formed RP either directly or through BMP acting upstream of Wnt (this was now presented in a working model in Fig. 10B).

      Another possible scenario is that RA represses BMP signaling by inactivating Smad proteins via ubiquitination, as shown to be the case in selected cell lines (Sheng et al., 2010). These possibilities were discussed and await systematic exploration.

      Similarly, the manuscript does not address the consequences of exposure of RA to the dorsal neural tube during NC stage and it thus remains unknown whether RA signaling is sufficient to end the NC stage and activate roof plate formation prematurely. Additional experiments of this kind would help clarify the role of RA in the dorsal neural tube and the reciprocal roles of the two signaling pathways (RA and BMP).

      We have now performed experiments to address the sufficiency of RA. Data were now added as Fig.5 Supp.2 and Supp.3, and Fig.6 Supp.2, and discussed.

      As we expected, gain of RA function at NC stages is not sufficient to prematurely activate BMP inhibitors, to end prematurely BMP signaling (pSMAD) or NC EMT, to alter the dynamics of expression of NC-specific genes, or to cause an earlier appearance of RP-specific traits.

      This result is totally understandable in light of RA being anyway active (but not produced) in NT at NC stages (original Fig.1) when BMP/Wnt signaling, a NC-specific gene network, and NC EMT are operational.

      The fact that RA is necessary but not sufficient for these processes further suggests that the key is in the following, perhaps complementary mechanisms: 1) a different interpretation of the same signal by NC progenitors at sequential stages of their ontogeny and then by RP cells, accounted for by the profound changes in cellular and molecular landscapes apparent between these stages. 2) the possibility that somite-derived versus RP-derived RA are differentially interpreted by the dorsal NT cells owing, for example, to a distinctive mode of ligand presentation (e.g., by CRABP1 expressed in RP but not NC, etc.).

    1. Author Response

      Reviewer #1 (Public Review):

      Weaknesses: 1. The paper lacks some clarification in some terms used in the paper such as "round chondrocytes". Additional histological data on the organ cultures would provide supporting evidence on the role of CNP in chondrocytes.

      In bone research fields, growth plate chondrocytes are generally classified into three major cell types; round, columnar, and hypertrophic chondrocytes, and “round chondrocytes” is generally used for specifying growth plate chondrocytes. We have briefly explained cells assessed in the Materials and Methods section of the revised manuscript.

      In response to the comment, we histologically analyzed cultured CNP-treated bones from the chondrocyte specific Trpm7-knockout (Trpm7fl/fl, 11Enh-Cre+/−) and control (Trpm7fl/fl, 11Enh-Cre−/−) mice. The new data obtained clearly indicate that CNP treatments extend the columnar chondrocytic zones in control bones but not in chondrocyte-specific Trpm7-knockout bones (Figure 7 in the revised manuscript). We further analyzed in detail CNP-treated metatarsal bones from wild-type mice. CNP treatments consistently expanded the columnar chondrocytic zones but did not affect the cell densities (Figure 7-figure supplement 1 in the revised manuscript). Moreover, CNP-promoted extracellular matrix production seemed to contribute largely to the extension of columnar chondrocyte zone toward bone outgrowth stimulation.

      1. The authors also generated an NPR2 chondrocyte-specific null mouse model. It would be interesting to include some data on the phenotype of these mice by micro-CT, histology, etc. More specific details are provided below.

      Overview of histological analysis in chondrocyte-specific Npr2-knockout (Npr2fl/fl, Col2a1-Cre+/−) mice has been previously reported (Nakao K., et al., Sci Rep., 5;10554, 2015). Since femoral bones from E17.5 mice were too fragile to analyze using micro-CT, we performed von Kossa-staining of femoral bones from the chondrocyte-specific Npr2-knockout (Npr2fl/fl, Col2a1-Cre+/−) and control (Npr2fl/fl, Col2a1-Cre−/−) mice instead. As reasonably expected, femoral bones from the knockout mice exhibited insufficient mineralization.

      The reviewer believes the authors achieved their aims, while some additional supporting data and clarifications are needed.

      The overall summary of the data provided is supportive, methods utilized are well justified, again, additional histological data would support the overall summary.

      Data generated and presented in the report will enhance our understanding on the role of CNP and possible utilization of novel therapeutic targets for CNP for the treatment of bone growth diseases.

      We thank the reviewer for his valuable and helpful comments.

      Reviewer #2 (Public Review):

      Strength: 1. Experiments are well designed, performed with proper control and conclusion drawn from data is highly convincing. 2. Valid animal model (floxed mice) using chondrocyte specific deletion of gene were used to show in vivo effect of gene deletion on bone formation. 3. Preclinical model of OA were used in vivo using mice as an experimental animal, OA progression were characterized using established grading system. 4. All the Western blots are shown along with densitometric quantification. 5. Ex vivo molecular mechanism is also provided which added strength in the manuscript.

      Methods, results and data interpretation: 1. Methods section is adequate and describes enough details to replicate in independent study. 2. Relevant statistics are used to infer conclusion from data. 3. There is no objective error in presenting the data, conclusions drawn from experiments are convincing.

      We thank the reviewer for the encouraging comments.

    1. Author Response:

      Reviewer #1 (Public Review):

      This paper provides experimental and modeling analysis of the inter-brain coupling of socially interacting bats, and reports that coordinated brain activity evolves at a slower time scale than the activity describing the differences. Specifically, the paper finds that there is an attracting submanifold corresponding to the mean (or "common mode") of neural activity, and that the dynamics in the orthogonal eigenmode, corresponding to the difference in brain activity, decays rapidly. These rapid decays in the difference mode are referred to as "catch up" activity.

      There are two main findings:

      1) Neural activity (especially higher frequency LFP activity in the 30-150Hz range) is modulated by social context. Specifically, the ratio of the averaged, moment-to-moment MEAN:DIFF ratio is much higher when the bats are in a single chamber, clearly indicating that the animals are coordinating their neural activity. This change also seems to hold -- although not as striking -- in lower-frequency LFP and spiking activity.

      2) The time scales of the mean vs. difference dynamics are segregated: the "difference dynamics" evolve at a faster time scale than the "similarity dynamics". This seems to be well supported.

      The basic finding is presented in Figure 1. The rest of the paper is focused on a modeling study to garner further insight into the dynamics.

      Weaknesses:

      This is an entirely phenomenological paper, and while it claims to garner "mechanistic insight", it is unclear what that means.

      We regret not clarifying sufficiently what we meant by “mechanistic insight.” The insight is the following: functional across-brain coupling acts as positive feedback to the mean component of neural activity, which amplifies it and slows it down; at the same time, it acts as negative feedback to the difference component, which suppresses it and speeds it up. Thus, findings (1) and (2) in the reviewer’s summary above can be explained by the same model mechanism. As the reviewer pointed out below, the details of the model are complex, which could have made the simple mechanism above opaque. Thus, we analyzed two simplified versions of the model to make the mechanistic insight clear. This is detailed below in our response to the reviewer’s comment on model complexity.
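
      The feedback argument can be made concrete with a short derivation. This is a sketch under an assumption: the model equation is not reproduced in this response, so the linear form below, with a symmetric coupling matrix built from the self-coupling C_S and the across-brain coupling C_I, is assumed rather than quoted. The symbols a_M, a_D, b_M, and b_D (mean and difference components of activity and input) are introduced here for illustration.

```latex
% ASSUMED model form (not quoted from the manuscript):
\tau \frac{d\mathbf{a}}{dt} = C\,\mathbf{a} + \mathbf{b},
\qquad
\mathbf{a} = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix},
\qquad
C = \begin{pmatrix} C_S & C_I \\ C_I & C_S \end{pmatrix}

% Mean and difference components:
a_M = \tfrac{1}{2}(a_1 + a_2), \qquad a_D = \tfrac{1}{2}(a_1 - a_2)

% Adding and subtracting the two rows decouples the dynamics:
\tau \frac{da_M}{dt} = (C_S + C_I)\, a_M + b_M,
\qquad
\tau \frac{da_D}{dt} = (C_S - C_I)\, a_D + b_D
```

      Under this assumed form, with C_S < 0 for stability and 0 < C_I < -C_S, the mean component relaxes at rate |C_S + C_I|/τ, which is smaller than the rate |C_S - C_I|/τ of the difference component. A positive C_I therefore weakens the restoring force on the mean (larger and slower) and strengthens it on the difference (smaller and faster).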

      The basic idea of the model is simple and somewhat interesting, but the details are extremely complex. There are many examples of this, but the method used to "regress out" the behavior was very hard to interpret.

      The method for regressing out behavior was described in Materials and Methods section 3.10, and we regret having neglected to reference it in the main text. We now reference it at the first instance in the main text where this is relevant.

      On the face of it, the model is extremely simple: a two-state linear dynamical system. However, this simplistic description buries extreme complexity. The model is extremely complex as it involves a large number of parameters (e.g., time switching 'b' values, the values of which are completely unclear), the switching over time of these parameters based on hand-scored animal behavioral state, and the complex mix of markovian and linear dynamical systems theoretic results.

      As the reviewer pointed out, the core of the model is very simple: a linear dynamical system that models neural activity coupling. The model mechanism of positive and negative feedback, which is responsible for reproducing the two experimental results summarized by the reviewer above, is contained in this core (see Materials and Methods section 3.7 for details). On top of this, the model has a layer of complexity, involving a Markov chain model of behavior and a large number of behavioral parameters. This layer of complexity is independent from the feedback mechanism of the core of the model. Thus, while it makes the model more biologically realistic, it is not required to reproduce the two main experimental results. To explicitly show this, and to better understand the dependence of model behavior on its parameters, we analyzed two reduced versions of the model.

      The first reduced model replaces the behavioral inputs with white noise. The original model is [equation missing in source], where a is neural activity, C is the coupling matrix, b is behavioral modulation, and τ is a time constant. b is where the complexity lies, as it is simulated using a Markov chain and involves many parameters. To strip away this layer of complexity, we replaced b with noise having a simple structure, namely, the mean and difference components of b having identical, flat power spectra. Importantly, this noise input does not induce correlation between bats, and it amounts to inputs of the same magnitude and same timescales to the mean and difference components of a. The resulting reduced model has only two parameters, the functional self-coupling C_S and functional across-brain coupling C_I (for simplicity, τ can be absorbed into the other parameters).

      We are interested in the two results the reviewer summarized above: (1) the mean component of neural activity having a larger variance than the difference component; (2) the mean component having a slower timescale than the difference component. In the manuscript, these are respectively quantified using the variance ratio and the power spectral centroid ratio of the mean and difference components. The reduced model allowed us to derive analytical expressions for these two quantities (see Materials and Methods section 3.8 for details). We found that they have very simple dependence on the functional coupling parameters: the variance ratio (mean variance divided by difference variance) is approximately [expression missing in source], and the centroid ratio (mean centroid divided by difference centroid) is approximately [expression missing in source].

      This parameter dependence is visualized below (note that the color maps are in log scale, and the white spaces are regions where the model is unstable).

      In the experimental data, the mean component had larger variance and lower power spectral centroid than the difference component. This corresponds to the parameter regime of positive C_I (enclosed by dashed lines). Thus, a positive C_I acts as positive feedback to the mean component and negative feedback to the difference component, modulating their variance and timescales in opposite directions. This is consistent with the analysis of the original model in Materials and Methods section 3.7. In the revised manuscript, we’ve now added analysis of this reduced model to the Results section, and the above figure has been added as Figure 3I-J.
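
      The white-noise reduced model can also be explored numerically. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the linear form τ da/dt = C a + b with a symmetric coupling matrix C = [[C_S, C_I], [C_I, C_S]], independent unit white-noise inputs to each bat, and illustrative values C_S = -1 and C_I = 0.5. Under those assumptions the mean and difference components are independent Ornstein-Uhlenbeck processes whose stationary variance ratio is (C_S - C_I)/(C_S + C_I); this closed form follows from the assumed equation and may differ from the approximate expressions derived in the manuscript.

```python
import math
import random


def simulate_reduced_model(c_s, c_i, tau=1.0, dt=0.01, n_steps=200_000, seed=0):
    """Euler-Maruyama simulation of the ASSUMED model tau * da/dt = C a + b.

    C = [[c_s, c_i], [c_i, c_s]] and b is independent white noise per bat,
    so the input itself induces no correlation between the two bats.
    Returns the mean and difference components of the activity over time.
    """
    rng = random.Random(seed)
    a1 = a2 = 0.0
    mean_trace, diff_trace = [], []
    noise_scale = math.sqrt(dt) / tau
    for _ in range(n_steps):
        b1 = rng.gauss(0.0, 1.0)
        b2 = rng.gauss(0.0, 1.0)
        da1 = (c_s * a1 + c_i * a2) * dt / tau + noise_scale * b1
        da2 = (c_i * a1 + c_s * a2) * dt / tau + noise_scale * b2
        a1 += da1
        a2 += da2
        mean_trace.append(0.5 * (a1 + a2))
        diff_trace.append(0.5 * (a1 - a2))
    return mean_trace, diff_trace


def variance(xs):
    """Population variance of a sequence."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def lag_autocorr(xs, lag):
    """Autocorrelation at a given lag; higher values indicate slower dynamics."""
    m = sum(xs) / len(xs)
    num = sum((xs[i] - m) * (xs[i + lag] - m) for i in range(len(xs) - lag))
    den = sum((x - m) ** 2 for x in xs)
    return num / den


c_s, c_i = -1.0, 0.5  # illustrative values only
mean_trace, diff_trace = simulate_reduced_model(c_s, c_i)
var_ratio = variance(mean_trace) / variance(diff_trace)
predicted_ratio = (c_s - c_i) / (c_s + c_i)  # follows from the assumed model
mean_ac = lag_autocorr(mean_trace, 50)
diff_ac = lag_autocorr(diff_trace, 50)
print(f"variance ratio (simulated): {var_ratio:.2f}")
print(f"variance ratio (assumed closed form): {predicted_ratio:.2f}")
print(f"lag-50 autocorrelation, mean vs. difference: {mean_ac:.2f} vs. {diff_ac:.2f}")
```

      With a positive C_I, the simulated mean component should show both a larger variance and a higher lagged autocorrelation (slower dynamics) than the difference component, in line with the two observations discussed above.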

      The reviewer has stated a concern regarding the large number of parameters that set the input level according to behavioral state (b_resting, b_(social grooming), b_fighting, etc.). These parameters are important for ensuring that the model outputs realistic levels of behaviorally modulated neural activity (discussed below in our reply regarding model fit), but they are not important for the main results on variance and timescales. To demonstrate this, we studied a second reduced model. This model is identical to our original model except that, for each simulation, each of the behavior parameters (b_fighting, etc.) was independently drawn from the uniform distribution from 0 to 1. Despite the completely random behavioral parameters, this reduced model reproduces the variance and timescales results just like the original model, as shown in the figure below (compare with Figure 3E-F).
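
      The second reduced model can be sketched in the same spirit. This is a hedged illustration, not the authors' code: the linear form τ da/dt = C a + b and the symmetric coupling matrix are assumed as before, only three of the scored behaviors are included, each bat follows its own independent Markov chain with a hypothetical switching probability, and the coupling values are illustrative. The only element taken from the response is that each behavioral input level is drawn independently from the uniform distribution from 0 to 1.

```python
import random

# Three of the behaviors named in the response; the study scored 11.
BEHAVIORS = ["resting", "social grooming", "fighting"]


def simulate_random_behavior_model(c_s, c_i, tau=1.0, dt=0.01, n_steps=200_000,
                                   switch_prob=0.01, burn_in=2_000, seed=1):
    """Simulate the ASSUMED model tau * da/dt = C a + b with behavioral inputs.

    Each behavior's input level is drawn from Uniform(0, 1). Each bat switches
    behavior via its own simple Markov chain (HYPOTHETICAL transition rule:
    with probability switch_prob per step, pick a behavior uniformly at random).
    """
    rng = random.Random(seed)
    b_level = {name: rng.random() for name in BEHAVIORS}
    states = [rng.choice(BEHAVIORS), rng.choice(BEHAVIORS)]
    a1 = a2 = 0.0
    mean_trace, diff_trace = [], []
    for step in range(n_steps):
        for bat in (0, 1):
            if rng.random() < switch_prob:
                states[bat] = rng.choice(BEHAVIORS)
        b1, b2 = b_level[states[0]], b_level[states[1]]
        da1 = (c_s * a1 + c_i * a2 + b1) * dt / tau
        da2 = (c_i * a1 + c_s * a2 + b2) * dt / tau
        a1 += da1
        a2 += da2
        if step >= burn_in:  # discard the initial transient
            mean_trace.append(0.5 * (a1 + a2))
            diff_trace.append(0.5 * (a1 - a2))
    return mean_trace, diff_trace


def variance(xs):
    """Population variance of a sequence."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


behavior_mean, behavior_diff = simulate_random_behavior_model(c_s=-1.0, c_i=0.5)
behavior_var_ratio = variance(behavior_mean) / variance(behavior_diff)
print(f"variance ratio with random behavioral inputs: {behavior_var_ratio:.2f}")
```

      Because the dynamics are linear, rescaling the random input levels changes both variances by the same factor, so the ratio in this sketch is set by the coupling parameters and the input timescale rather than by the particular behavioral levels drawn.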

      To summarize, the reduced models allowed us to identify the simple parameter dependence of the modeling results, and showed that the simple linear dynamical system at the core of the original model is sufficient to reproduce the two main experimental observations.

      Indeed, a fundamental weakness of the model is that the Markov chain is taken as an "input" to the 2-state linear systems model, as if somehow the neural state does not affect the state transitions.

      Yes, this is a limitation of our model. We’ve added a discussion of this limitation, as well as future directions for overcoming it, in the Discussion section. The reason we did not model neural control of behavioral transitions is that it is under-constrained by existing data. While the brain obviously controls behaviors, not every part of the brain controls every behavior. Of the 11 behaviors observed in this study, we do not know which of them are controlled by the bat frontal cortex, and we do not know how they might be controlled (i.e., what specific spatiotemporal activity patterns affect behaviors in what ways). Without this knowledge, it’s unclear how to implement neural control of behavior in the model. This knowledge requires perturbation studies (lesion, inactivation, or activity manipulation) to establish causal relationships from neural activity to specific behaviors in the bat, which will be an important future direction.

      On the other hand, as the reviewer stated, our model included behavioral modulation of neural activity. It is well known that in mammals, arousal and movement modulate neural activity globally across cortex (McGinley et al., 2015, Neuron). Thus, given that different behaviors in general involve different levels of arousal and movement, our model included behavior-dependent modulation of frontal cortical neural activity. Finally, for the reviewer’s convenience, we also quote below the paragraph addressing this issue in the revised Discussion. “Another limitation of our model is the “open-loop” nature of the relationship between behavior and neural activity. Specifically, we modeled neural activity as being modulated by behavior, but behavior was modeled using a Markov chain that is independent from the neural activity. In reality, neural activity and behavior form a closed-loop, with different social behaviors being controlled by the neural activity of specific neural populations in specific brain regions. Thus, an important future direction is to close the loop by incorporating neural control of social behaviors into models of the inter-brain relationship in bats. This will require future experimental studies to identify which frontal cortical regions and populations in bats are necessary or sufficient to control social behaviors, as well as the detailed causal relationship from neural activity to social behavior. Furthermore, as social interactions can occur at multiple timescales, it will be interesting to investigate how these are controlled by neural activity at different timescales, and how those timescales are shaped by functional across-brain coupling. In summary, such a closed-loop model will shed light on how inter-brain activity patterns and dynamic social interactions co-evolve and feedback onto each other.”

      Further, the Markov assumption is not rigorously tested.

      We have now tested the Markov assumption, using the following methods. We compared three models of bat behaviors: (1) the independent model, where the behavioral state at a given time point is independent from the state at other time points; (2) the 1st-order dependency model, where the behavioral state at a given time point depends on the state at the previous time point only; (3) the 2nd-order dependency model, where the behavioral state at a given time point depends on the states at the two previous time points. The Markov assumption corresponds to model (2), which is used as a part of the main model of the paper. Note that models with longer time-dependencies (≥3) were not tested because the number of parameters grows exponentially with model order and our dataset is not large enough to fit them.

      To compare the three models, we split the behavioral data into a training set and a test set, fitted each model on the training set (Laplace smoothing was used to avoid assigning zero probability to unobserved events), and calculated the log-likelihood of the test set under each model. The figure below shows the cross-validated likelihoods for the behavioral data of one-chamber (A) and two-chambers (B) sessions, which were fitted separately; circles and error bars are means and standard deviations across 100 random splits of the data into training and test sets.

      As the figure above shows, the 1st-order model had the highest likelihood on average. This does not necessarily prove that bat behavior obeys the Markov assumption (if we had a lot more data, we might be able to fit better 2nd-order and higher-order models). But this does mean that, given the amount of data we have, the best model that we can fit is the 1st-order Markov chain. Thus, this result supports our usage of the Markov chain in the main model of the paper. In the revised manuscript, the above figure is included as Figure 3—figure supplement 2A-B, and the analysis is described in Materials and Methods section 3.5.
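The model comparison described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the synthetic "sticky" three-state sequence, the single contiguous train/test split, and the smoothing constant `alpha=1.0` are all assumptions made for the example. It fits order-0 (independent), order-1 (Markov), and order-2 models by counting transitions, applies add-one (Laplace) smoothing, and scores each on held-out data.

```python
import math
import random
from collections import Counter, defaultdict


def fit_counts(seq, order):
    """Count next-state occurrences for every context of length `order`."""
    counts = defaultdict(Counter)
    for t in range(order, len(seq)):
        counts[tuple(seq[t - order:t])][seq[t]] += 1
    return counts


def log_likelihood(seq, counts, order, n_states, start, alpha=1.0):
    """Held-out log-likelihood with add-alpha (Laplace) smoothing.

    `start` is the first scored time point, so that models of different
    order are evaluated on exactly the same observations.
    """
    total = 0.0
    for t in range(start, len(seq)):
        context_counts = counts.get(tuple(seq[t - order:t]), Counter())
        numerator = context_counts[seq[t]] + alpha
        denominator = sum(context_counts.values()) + alpha * n_states
        total += math.log(numerator / denominator)
    return total


def simulate_chain(n_steps, n_states, stay_prob, rng):
    """Hypothetical first-order chain: stay with `stay_prob`, else switch."""
    state, seq = 0, [0]
    for _ in range(n_steps - 1):
        if rng.random() >= stay_prob:
            state = rng.choice([s for s in range(n_states) if s != state])
        seq.append(state)
    return seq


def compare_orders(train, test, n_states, orders=(0, 1, 2)):
    start = max(orders)
    return {
        k: log_likelihood(test, fit_counts(train, k), k, n_states, start)
        for k in orders
    }


rng = random.Random(0)
seq = simulate_chain(3000, 3, 0.8, rng)
scores = compare_orders(seq[:2000], seq[2000:], n_states=3)
for order, score in scores.items():
    print(f"order {order}: held-out log-likelihood = {score:.1f}")
```

Repeating the split many times and averaging, as in the analysis described above, would give the means and standard deviations per model order; the number of contexts grows as the number of states raised to the model order, which is why higher orders need far more data.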

      No model selecting or other model validation appears to be done.

      To evaluate model fit, we simulated our model using experimentally observed behaviors (rather than simulating behaviors using a Markov chain), and compared the simulated neural activity with the experimentally observed activity (see Materials and Methods section 3.6 for detailed procedures). The comparison for an example experimental session is shown below, where we’ve plotted the experimentally observed neural activity and behaviors for bat 1 (A) and bat 2 (B), along with the simulated neural activity. The correlation coefficients between data and model are indicated above each plot. These are representative examples, as the average correlation over all sessions and bats is 0.72 (standard deviation is 0.10). This figure was added to the revised manuscript as Figure 3—figure supplement 1.

      In evaluating model fit, we realized that the model in the original manuscript produced outputs with a DC offset different from that of the data. Thus, in the revised manuscript (including the figure above), we added one more behavioral parameter (b_constant) that adjusts the DC offset, which is a parameter that reflects the effect of a baseline arousal level on neural activity (Materials and Methods section 3.4). Note that, since the only effect of this parameter is to adjust the DC offset of neural activity, it does not change any of the results in the paper.

      In short, the model, while very interesting, is so complex that it is literally impossible to evaluate. The authors report literally no shortcomings of their model. They do not report parameter estimation methods. They do not report fitting errors or other model validation metrics. The only evaluation is whether it can produce certain outputs that are similar to biological data. While the latter is certainly important, all models are wrong, and it is essential to have a model simple enough to understand, both in terms of how it works and how it fails.

      The comments on the complexity of the model and on fitting errors have been addressed above. Regarding parameter estimation methods, they were described in Materials and Methods section 3.14, and we regret having neglected to directly reference it in the original manuscript. We now reference the section in the legend of Figure 3A, where the parameters are first introduced. Briefly, the behavioral parameters (b_resting, b_fighting, etc.) were simply chosen to be the average neural activity during the respective behaviors from the data; the other parameters were chosen by hand to roughly match the levels of activity from the data, keeping within the parameter regime identified from the analyses. As we showed above, these parameters provide a reasonable fit to the data.

      The reason we chose the parameters heuristically in this way, rather than by minimizing some error objective, is the following. Our goal was to build a model that could qualitatively reproduce the experimental findings in a robust manner, that is, without fine-tuning of parameters. Thus, we analyzed the model to understand how model behaviors depend on the parameters, and to identify the parameter regime that reproduces the qualitative trends seen in the data (Figure 3I-J; Materials and Methods sections 3.7 and 3.8). Guided by these analyses, we chose parameters heuristically without algorithmic fine-tuning.

      Finally, following suggestions from reviewer 1 and reviewer 3, we have added discussions of shortcomings of the models (the last two paragraphs of the Discussion). With these discussions of model limitations, along with the presentation of simple insights into model mechanism from the reduced models above, we believe we have now presented a model that is “simple enough to understand, both in terms of how it works and how it fails.”

      In general, while the basic finding is fairly interesting, and the experiments and their findings are highly relevant to the field, the modeling and its explication fall short.

      It is not that it is wrong or bad; however, it is not clear that such a complex model increases our understanding beyond the experimental findings in Figure 1, and if it does, there has to be a major caveat that the model itself is not carefully vetted.

      Based on the reviewer’s comments on the model’s complexity, we have analyzed reduced versions of the model to understand its simple underlying mechanisms, as described above. This goes beyond the experimental findings in Figure 1, as it provides a computational mechanism that could give rise to those experimental findings. Moreover, based on the reviewer’s comments, we have more carefully vetted the model, by evaluating model fit and testing different behavioral models that do or do not assume the Markov property. Finally, we now discuss caveats of the model in the Discussion section, including the open-loop nature of the model as pointed out by the reviewer.

    1. Author Response

      Reviewer #1 (Public Review):

      Yang et al. use computational modeling to explore how neurons can co-regulate different properties (firing rate, excitability thresholds, energy consumption) by adjusting ion channel expression. To do so, they rely on the activity-dependent channel expression model introduced in O'Leary et al. 2014 and assume that any regulation loop can be boiled down to such a model. They thus propose a parallel feedback loop regulation model, in which each loop regulates a property by chasing its target value and regulating ion channel expression according to an integral control law.

      The authors start by proving experimentally and computationally ion channel pleiotropy. This preliminary analysis is clearly developed and confirms/rediscovers in CA1 neurons known facts originally observed in invertebrate systems like the STG. They subsequently use their regulation model to provide elegant geometric explanations for the emergence of ion channels correlations and for the success or failure of homeostatic regulation, as a function of the number of regulated properties and the number of ion channels.

      The model they rely on exhibits two basic characteristics. Suppose the model possesses N different types of ion channels.

      1. As in the model in O'Leary et al. 2014, when taken in isolation, the regulator of each neural property possesses an N-1 dimensional submanifold of steady-states in the N-dimensional maximal conductance space where the target for the regulated property is reached.

      2. Regulation loops are independent of each other.

      Given these two characteristics, in the presence of M regulated neural properties, the regulation target is reached simultaneously for each of them on an (N-M)-dimensional joint target submanifold of homeostatic steady states, i.e., the intersection of M (N-1)-dimensional target submanifolds. Depending on initial conditions, homeostatically regulated maximal conductances will spread out along this submanifold, thus creating N-M dimensional correlations. This (mathematically) elementary observation leads to the various general conclusions of the paper, the main of which are i) that increasing the number of regulated properties increases ion channel correlations (by reducing the dimension of the joint target submanifold) and ii) that increasing the number of regulated properties makes homeostatic regulation more likely to fail (when the intersection between the target submanifolds is empty). This simple geometric view on multi-property regulation is neat and, most importantly, experimentally verifiable/falsifiable.
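The dimension counting in this paragraph can be illustrated with a linear stand-in. The sketch below assumes, purely for illustration, that each regulated property is a linear function of the N maximal conductances, so that each target set is a hyperplane and the joint target set has dimension N minus the rank of the M-by-N sensitivity matrix. The random sensitivity matrices and the name `solution_dimension` are hypothetical; in the actual model the dependence is nonlinear and the target sets are curved submanifolds.

```python
import numpy as np


def solution_dimension(sensitivities):
    """Dimension of {g : A g = targets}, assuming a solution exists.

    `sensitivities` is an M x N matrix A (M properties, N conductances);
    by rank-nullity the solution set has dimension N - rank(A).
    """
    a = np.atleast_2d(np.asarray(sensitivities, dtype=float))
    return int(a.shape[1] - np.linalg.matrix_rank(a))


rng = np.random.default_rng(0)
n_channels = 5

# Add regulated properties one at a time and watch the joint target set shrink.
dims = [
    solution_dimension(rng.normal(size=(m, n_channels)))
    for m in range(1, n_channels + 1)
]
print(dims)
```

Each additional independent property removes one dimension from the joint target set, which is the sense in which more regulated properties tighten conductance correlations.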

      The main drawback of this approach is that characteristics 1 and 2 (above) of the model used in the paper are non-generic and make the model a useful but maybe oversimplified (and fragile) testbed. Let me develop.

      Property 1 reflects one of the main limitations of the model in O'Leary et al. 2014, namely, that this model provides biologically meaningful results and predictions only when initial conditions are small and in the absence of any disturbance to its regulation dynamics (including the presence of multiple master regulators). This follows exactly from the fact that, in that model, the homeostatic regulator has zero effect once the state reaches the N-1 dimensional submanifold of target steady states. Thus, different initial conditions or exogenous disturbances will arbitrarily spread conductances on this submanifold, making it unrealistically unrobust to both types of perturbations. This limitation was overcome in a simple, biologically plausible way in Franci et al. 2020 by adding a molecular regulatory network between the homeostatic sensor and the regulated conductances. The revised model still exhibits the same correlated variability between maximal conductances as the original model. But only in the revised model the correlation ray exhibits biologically-meaningful levels of robustness both to disturbances and initial conditions. In dynamical systems terminology, the model in O'Leary et al. 2014 is structurally unstable (non generic) whereas its 2020 revision is structurally stable (generic). Crucially, in the 2020 model the homeostatic target is approximately reached at a unique steady state (i.e., the target submanifold is zero-dimensional) but the presence of a slow direction v_slow in the regulation space amplifies heterogeneities and disturbances, which leads to correlated variability (along v_slow).

      We have identified several issues, and will address each in turn.

      Initial conditions: Our results do not rely on initial conditions being small. This is first illustrated in Fig. 4 with the conditions immediately post-perturbation. Whereas O’Leary et al. (2014) presented results in the context of development (starting from nothing), we have focused on recovery from an acute perturbation (recovering from something). The statement that our model provides “biologically meaningful results and predictions only when initial conditions are small” is inaccurate – this depends on the question being asked. We use (a simplified version of) the O’Leary regulation mechanism to show that correlations arise in different ways depending on the dimensionality of solution space. This does not depend on initial conditions and, as explained in a new figure (Fig. 8), the dimensionality of solution space still has an important impact on the correlations that emerge under noisy conditions. We believe these to be meaningful results.

      Unrobustness / structural instability: Noise is an important, ongoing perturbation that we did not consider in the original version of our paper. Franci et al. (2020) show that noise disrupts solutions found by the simple regulation mechanism used by O’Leary et al. (2013, 2014). The basis for this – that the simple regulation mechanism brings conductance densities to the solution manifold but does not control their spread across the manifold – is now highlighted in our revised text (lines 208-210). As now shown in a new figure (Fig. 8), ion channel correlations are disrupted by noise, depending on solution space dimensionality, but other correlations arise. Importantly, our regulation mechanism successfully maintains regulated properties near their target values despite noise; in that respect, regulation is robust.

      Conductance densities do not remain / return to a restricted location on the solution manifold. In that sense, the solution set is not attractive, which, if we understand correctly, is equivalent to being unrobust, non-generic, and structurally unstable. Being unstable sounds bad, but from our perspective, so long as the system can regulate properties to their target values, the conductance density combinations used to do so are not a main concern (see Discussion, lines 310-314). Indeed, we highlight that conductance densities do not converge on the same restricted space if we control different ion channels (Fig 4) or if we control the same ion channels with different regulation rates (Fig 7). In fact, being too stable would reduce the hidden variability required to explain why neurons that appear similar respond differently to a perturbation, which is a concern. Furthermore, even if there is an attractive subspace, this could arise from regulation of other unaccounted for properties (which would reduce solution space dimensionality) rather than because of cooperative molecular interactions, which is to say that there are alternative explanations for attractive solution sets. Addressing such issues arguably extends beyond the scope of our study.

      Unbounded spread: Noise-induced spread would be a problem if it caused conductance densities to spread infinitely, but there are biological mechanisms (apart from cooperative molecular interactions) that prevent this. (i) Conductance densities cannot become negative. One conductance density hitting zero prevents other conductance densities from continuing to rise when solution space dimensionality is low (see Fig. 8C). In other words, a lower bound is sufficient to prevent infinite spread under certain conditions. (ii) Upper bounds undoubtedly emerge from saturation of rate-limiting steps controlling the transcription, translation and/or insertion of ion channels (lines 657-659, including reference 89). With upper and lower bounds on conductance densities, the solution space is bounded regardless of its dimensionality. When the solution manifold is bounded, conductance densities cannot spread infinitely. Newly added simulations also revealed, under noisy conditions, that conductance densities drift in different preferred directions depending on regulation rates (see Fig. 8–figure supplement 1). The direction of that drift can reduce the likelihood of conductance densities ever reaching an upper bound.

      In summary, we have verified that noise impacts the (stability of) solutions found by our simple homeostatic regulation mechanism. We have clarified how ion channel correlations in our model are affected. Unbounded spread is not a problem that necessitates cooperative molecular interactions. Other issues such as the initial conditions and attractiveness of the solution set (which are in fact connected) are interesting but not directly relevant to our study. We have struggled to grasp all the mathematical arguments presented in Franci et al. (2020) and were unable to implement that regulation mechanism in our model (see below), but we hope that our newly added simulations and edits to the text address concerns about Property 1.

      Characteristic 2 of the model used in this paper is also non-generic. Consider the simple homeostatic parallel control scheme,

      output = x+y

      tau_x*dx/dt = tgt - (x+y)

      tau_y*dy/dt = tgt - (x+y)

      which (in line with the present paper) reaches the desired target output tgt on the 1-dimensional subspace of steady states x+y=tgt. Let's now introduce small coupling between the two variables as follows

      output = x+y

      tau_x*dx/dt = tgt - (x+y) - epsilon*y

      tau_y*dy/dt = tgt - (x+y) - epsilon*x

      where epsilon>0 is small. It is easy to verify that the new model has a unique exponentially stable steady state given by

      x = y; x+y= 2/(2+epsilon)*tgt ~ tgt (for epsilon sufficiently small)

      Thus, introducing an arbitrarily small coupling between the two regulation loops is sufficient to change the dimension of the target subspace from 1 to zero (without introducing new regulated properties!) while only leading to a small (O(epsilon)) error in the regulated property. It is also easy to show that the uncoupled/parallel model is not structurally stable, while the weakly coupled model is.
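The steady-state algebra of the two schemes above can be checked numerically. The sketch below only solves the steady-state conditions as written (the time constants tau_x and tau_y drop out once the derivatives are set to zero); it does not examine stability. The function name and the values tgt = 1 and epsilon = 0.01 are chosen for illustration.

```python
import numpy as np


def coupled_steady_state(tgt, eps):
    """Solve the steady-state conditions of the weakly coupled scheme.

    Setting dx/dt = dy/dt = 0 gives
        x + (1 + eps) * y = tgt
        (1 + eps) * x + y = tgt
    """
    a = np.array([[1.0, 1.0 + eps],
                  [1.0 + eps, 1.0]])
    return np.linalg.solve(a, np.array([tgt, tgt]))


tgt, eps = 1.0, 0.01
x, y = coupled_steady_state(tgt, eps)
output = x + y

# Without coupling, both conditions reduce to x + y = tgt: the coefficient
# matrix has rank 1, so the steady states form a 1-dimensional line.
uncoupled_rank = int(np.linalg.matrix_rank(np.array([[1.0, 1.0],
                                                     [1.0, 1.0]])))

print(f"x = {x:.6f}, y = {y:.6f}, output = {output:.6f}")
print(f"relative error in output: {abs(output - tgt) / tgt:.4f}")
print(f"rank of uncoupled steady-state system: {uncoupled_rank}")
```

For small epsilon the output differs from tgt by a fraction epsilon/(2+epsilon), which is the O(epsilon) error mentioned above.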

      These observations lead to the question of whether the mathematical/computational results of the paper are realistic or whether they are artifacts of the non-generic modeling assumptions used for the regulation loops.

      We do not understand the rationale for coupling the feedback loops. Is the motivation for doing this (i) because of experimental evidence that feedback loops are coupled, or (ii) because coupling solves some problem? We suspect it is the latter, but we will try to address each issue in turn.

      (i) With respect to experimental evidence, firing rate and energy efficiency have different target values (and different units) and the difference from target ought to be calculated separately. This relates to whether both error signals are encoded by calcium. As addressed at some length in our Discussion (see lines 332-351), we do not believe that all error signals are encoded by calcium and that feedback loops are coupled because of a common feedback signal. Even if multiple error signals are encoded by calcium, we suspect that such signals need to be functionally independent, e.g. spatially segregated and independently sensed (see lines 336-339), which would preclude the feedback loops from being coupled. We highlight evidence of other ways to encode error signals. If different error signals are encoded differently, it seems unlikely that feedback loops are coupled in the manner proposed here.

      (ii) With respect to solving some problem by coupling the feedback loops, we return to points raised above in connection with Property 1. Our main concern in this study is whether outputs can be regulated to their target value. Newly added simulations demonstrate that this occurs with uncoupled feedback loops even in the presence of noise (Fig. 8). Accordingly, we really do not see the impetus for coupling the feedback loops, especially since the experimental evidence (based on our interpretation of it) points away from this (see above).

      The comment “without introducing new regulated properties” suggests that we considered additional regulated properties to stabilize the solution for property X. On the contrary, we considered additional regulated properties because we believe that neurons simultaneously regulate many properties. That regulating property Y stabilizes the solution for property X is an interesting observation, not a contrived explanation, and might obviate the need for cooperative molecular interactions (see above) if such a need exists. Insofar as uncoupled feedback loops can regulate multiple outputs, even in the presence of noise, coupling is unnecessary from our perspective and might actually compromise co-regulation. We are not starting from the assumption that the target subspace has dimension 0, and we are uncomfortable with that assumption (see above re. solutions being “too stable”). We nevertheless tried to adapt the suggested implementation, but it opened up a can of worms (see below).

      If the question is ultimately whether coupling is required to enable co-regulation of multiple properties under noisy conditions, the answer is no, as now shown in Figure 8C.

      Reviewer #2 (Public Review):

      Yang, Shakil, Ratté, and Prescott conducted a combined dynamic clamp and modeling exploration on the geometry and dimensionality of single-output and multi-output solution sets embedded in the cellular parameter spaces of single neurons, i.e. CA1 pyramidal neurons for the dynamic clamp studies, and a generic single-compartment model with several varieties of sodium and potassium channels for the computational studies. Both types of neurons were stimulated with a randomly fluctuating current, and output measures (termed 'properties') of the neurons were measured or calculated, including rheobase, firing rate, energy consumption, and energy efficiency per spike. Ion channel maximal conductances were then varied, and the dependence of the output properties on the conductance values and their combinations were explored.

      The authors define as a single-output solution set the subset of maximal conductance space containing conductance combinations that produce a value of one of the output properties within a tolerance range around a target value. These single-output solution sets can take the shape of points, curves, surfaces or volumes in parameter space. A multi-output solution set is then defined as the subset of parameter space that produces values within a tolerance range for multiple properties simultaneously, and thus lies at the intersection of several single-output solution sets (if such an intersection exists).

      A major focus of the work is on the effect of channel pleiotropy - the impact of one ion channel type on more than one cellular output property - on the shape of solution sets, and whether homeostatic regulation schemes that adjust ionic membrane conductances to maintain and restore output properties in a target range can "find" the solution sets, and maintain the neuron within the solution set. The regulation schemes explored in this work directly use error signals of the output properties (the difference between an output property produced by a neuron and its target value) to increase or decrease maximal membrane constants, with pre-assigned regulation time constants.

      The study is systematically executed, and results support the main conclusion, that successful regulation of n independent neuronal output properties requires that at least n ion channel densities be adjustable for a unique solution to exist, and at least n+1 for degenerate solutions. This conclusion explicitly organizes previous results obtained by other modeling studies into a coherent framework, but is not surprising.

      The authors speculate that the need for neurons to regulate several of their output features may have provided the evolutionary drive for the highly diverse sets of voltage-dependencies and kinetics observed in different ion channels in nature. This speculation is intriguing, but also raises the question whether and how the unrealistically simple (and not very diverse) set of model ion channel characteristics used in this work may have impacted the extent and shape of single-property and multi-property solution sets: none of the ion channels in the model appear to have inactivation variables, and the two primarily varied ionic currents, I_Na and I_K, are identical in their voltage-dependence and activation dynamics; they differ only in their reversal potentials. It is likely that channels with such similar characteristics are more able to compensate for each other, and therefore produce more extensive and differently shaped solution sets, than more dissimilar channels.

      Further exploration of the influence of solution set dimensionality on the existence and tightness of linear correlations between pairs of maximal conductances demonstrates that higher-dimensional solution sets, and larger tolerances around output property target values, lead to fewer and weaker correlations. This again is an insight that puts an organizing perspective on previous studies.

      Finally, the paper provides examples of conductance regulation schemes that rely on multiple error signals (deviations of output properties from their respective target values) to differentially regulate different membrane conductances. These schemes are shown to successfully regulate neuron models and allow them to "find" solution sets in many circumstances (if solutions exist). While providing a proof-of-principle that echoes previous work by others, the biological interpretability of these regulation schemes is somewhat limited, because they cannot be tied back to how the molecular machinery in a neuron would implement them. For example, it is not clear how a neuron would measure its energy efficiency per spike.

      We thank the reviewer for the detailed and insightful summary of our paper. We completely agree with the shortcomings identified by the reviewer with respect to specific molecular machinery. We kept our model deliberately simple because many details are not yet experimentally established, and it was not the goal of this study to uncover such details (which would have required very different experiments/simulations).

    1. Author Responses

      Reviewer 2 (Public Review):

      This work analyzes, for the first time, changes in information flow in developing dissociated neuronal cultures using their recently developed continuous-time transfer entropy (TE) estimator. This is a timely study, since the field of network and systems neuroscience critically needs better estimators for structural, effective and/or functional connectivity. Recent technical developments allow us to track hundreds, and even thousands of neurons during development (both in vitro and in vivo) in several organisms. However, current tools to assess changes in connectivity across time are severely lacking, and this study directly tackles this problem. Their method is the state of the art and appears to be extremely well suited to this task since it is able to deal with information flow across multiple time-scales and deals with the sparsity/multiple comparisons problem with strict statistical testing.

      The authors apply their transfer entropy estimator to a publicly available dataset (Wagenaar et al, 2006) consisting of multielectrode array (MEA) recordings from dissociated cortical cultures during development. The original dataset consists of over 50 cultures from 8 different batches (with different plating densities) recorded between DIV (day in vitro) 3 to 35. For this study the authors selected 4 cultures at 3-4 time points to claim that 1) Information flow undergoes a dramatic increase during development. 2) The spatial structure of information flow is “locked-in” early. 3) During bursting activity, nodes (neurons/electrodes) play a specialized role that is also “locked-in” early.

      The activity of dissociated cultures is highly heterogeneous, and an appropriate sample size is needed to assess the significance of any observed features or patterns. This is well described in the original work that provided the dataset used in this study “An extremely rich repertoire of bursting patterns during the development of cortical cultures”, (Wagenaar D.A., et al, 2006, BMC Neurosci). For example, in the discussion section they state “[...] that cross-plating variability was larger than variability between sister cultures implies that it is crucial to use cultures from several different platings to obtain unbiased results.” The current study uses only 4 cultures (from 2 different batches) recorded at 4 time points (sampled around 1 week apart on average) that might belong to the original categories of “fixed-bursting” and “superbursts”. In this work, results from these 4 cultures are often reported on a case-by-case basis, and sometimes without any statistical significance assessment and with unclear summary statistics. Given that, the validity and significance of the results is difficult to assess in their current form.

      We agree with the reviewer that clarifying summary statistics and statistical significance across cultures will improve the paper, and have addressed this as follows:

      The new Tables III and IV in the revised manuscript contain summary statistics of the results across the different cultures, including significance tests of these summary statistics. These tables are referred to and interpreted in the main results text.

      We further agree with the reviewer that the analysis will be improved by including more data.

      A major strength of the TE estimator framework developed by the authors is that it can account for the statistical significance of any TE estimate. However, it is unclear how this significance test is used throughout most of the results. Figures 1a and 3 to 8 appear to consistently include points with 0 TE that have an impact on the measured quantities, like means, quartiles and correlations.

      In all the analyses presented in the paper, whenever the null hypothesis of zero TE between a given source and target could not be rejected (that is, the TE estimate was not found to be statistically significant), the value of the TE between that source and target was set to zero for all subsequent analysis. It is important to retain that zero value in the analysis so as to see the overall trends in how the information flows, or the lack thereof, change. For the means, for example, if we were to remove an edge with zero TE from the mean on an earlier day, that would artificially inflate the earlier day’s mean in comparison to the mean information flow on a later day.
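
      The effect of retaining zeros on the means can be sketched as follows. This is a minimal illustration only: the function name `mean_te` and all TE values are hypothetical and are not taken from the manuscript or the dataset.

```python
from statistics import mean


def mean_te(te_values, keep_zeros=True):
    """Mean TE over edges; non-significant edges are stored as 0.0."""
    if keep_zeros:
        values = list(te_values)
    else:
        # Dropping the zeros averages over the significant edges only.
        values = [v for v in te_values if v > 0.0]
    return mean(values) if values else 0.0


# Hypothetical TE estimates (arbitrary units) for the same four edges
# on two recording days. 0.0 marks a non-significant estimate.
early_day = [0.0, 0.0, 0.0, 0.2]
later_day = [0.1, 0.2, 0.1, 0.2]

early_kept = mean_te(early_day)
later_kept = mean_te(later_day)
early_dropped = mean_te(early_day, keep_zeros=False)
later_dropped = mean_te(later_day, keep_zeros=False)

print("zeros kept:   ", early_kept, later_kept)
print("zeros dropped:", early_dropped, later_dropped)
```

      With the zeros kept, the mean rises from the early day to the later day. With the zeros dropped, the single significant early edge makes the early mean look larger than the later one, which is the inflation described above.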

      This step in the analysis (setting non-significant values to zero and retaining them in analysis) was strongly implied in the original submission (for instance, see line 113 of the original submission). However, it was only explicitly stated as a step in the construction of the functional networks. This was an oversight on the part of the authors. We thank the reviewer for bringing this to our attention.

      We have added sentences explicitly stating that we used a value of zero whenever the TE was not significant on lines 135-137 and lines 181-183 of the revised manuscript.

      Additionally, the correlation plots across days (figures 3 to 7) include least-squares fits that are often dominated by what appear to be outliers in the data (or possibly non-significant TE values). The estimates of Spearman correlation might also suffer from a similar issue due to “ties”.

      We do agree that the plotted least squares fits appear at times to be dominated by outliers in the data: this is precisely the reason that we utilised Spearman instead of Pearson correlations for the quantitative analysis of the relationships here.

      Further, the Spearman correlation deals naturally with ties by taking the mean rank of all tied points [1].
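
      The mean-rank treatment of ties can be sketched in a few lines. This is an illustrative implementation written for this note, not the code used in the study; the helper names `average_ranks`, `pearson` and `spearman` are assumptions.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the mean ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

      Under this scheme, several edges that share a TE of zero all receive the same mean rank, so the ties do not impose an arbitrary ordering on the correlation.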

      Regarding the “locked-in” information flow, evidence is always presented through Spearman correlations across TE scores at different days. These values are often not significant or show a weak correlation (Figures 3 and 4). An early “lock-in” of information flow would imply not only pairwise correlations, but also a long temporal correlation of a node (or edge) TE score across several days.

      As above, we have provided a more systematic analysis of summary statistics / statistical significance of the trends across all experiments. To clarify the point that we are making with respect to lock in: our argument is that once the information flows are established, they are then substantially correlated with flows on later days. The early recordings in this dataset have either none or very few statistically significant information flows: without such information flows established yet in these early recordings, we’re unable to observe longer correlations of them.

      We have added a brief discussion to the second last paragraph of section II E of the revised submission discussing the lack of information flows on these earlier days to be locked in.

      The study of information flow within bursts is really interesting. As the authors point out, TE appears to be well poised to measure this contribution, and their burst-local TE measure appears to be equivalent to other methods that condition TE estimates on population-wide activity levels, e.g., Stetter et al, PLOS Comp Biol, 2012. In here, they analyze the correlations between the burst-local TE measures and the burst position (in time) of a node.

      We thank the reviewer for highlighting the work of Stetter et al., which calculated the TE conditioned on bursting activity. As there are some similarities between that work and our burst-local TE, it is definitely worth highlighting where the similarities and differences between these approaches lie. We thank the reviewer for bringing this to our attention. We would point out, however, that Stetter et al. extract the bursting activity and then calculate the (conditional) TE based on this bursting activity alone. By contrast, we are just extracting the contributions from the spikes that occur during bursts. The core difference comes down to how, for the burst-local TE, the non-bursting activity is still used in the estimation of log probability densities for the contributions of the spikes that occurred during bursts. In the Stetter et al. approach, the non-bursting activity is ignored in the estimation of these densities.

      We have added a brief discussion of the work of Stetter et al. to the end of section IV H of the revised submission.

      For the existence of time ordering the authors mention “[...] cultures often follow an ordered burst propagation [23, 36]”. But that does not appear to be a universal property of developing cultures. It is unclear whether the cultures used in this study show consistent temporally ordered bursting patterns. From the 2 cited references, in Maeda et al, the bursting pattern and temporal ordering changes from burst to burst (see Fig. 3). It is uncertain that an average “burst position” can be defined for any given node. Similarly, in Schroeter et al, there are several characteristic MUA patterns (Fig 4C), and even there, it might not be possible to define a consistent temporal ordering.

      With regard to the burst ordering, it was not our intention to imply that there is a consistent ordering in the burst propagation. Rather, our claim is that some nodes tend to burst earlier or later on average and that this average burst location is correlated with the burst-local information flow. Indeed, although Schroeter et al. point out in Fig 4C that there are several characteristic MUA patterns, in Fig 4B they plot the average “MUA flow profile”. From that figure, it is clear that some nodes exhibit a remarkably clear tendency to spike earlier in the burst propagation than other nodes. They then base much of their further analysis on this fact by comparing “the relative mean of peak times in the propagation chain” (that is, the mean burst position) of each node with the node’s degree in the functional network. In terms of the Wagenaar dataset, by inspecting the plots in Figs 5a and 5c of the revised and original submission, where we plot the mean burst position vs the burst-local TE, we see that there is a wide dispersion in these mean positions. This indicates that certain nodes exhibit a clear tendency to burst either later or earlier in the burst propagation.

      The reviewer has highlighted to us that we have perhaps not sufficiently emphasized the fact that we are analyzing the mean burst position of what might be an inconsistent burst ordering. We thank the reviewer for bringing this to our attention.

      In the abstract, we have changed “spike ordering” to “average spike ordering”. In the author summary, we have changed “burst position” to “average burst position”. We have also made changes to line 70, as well as extensive changes to section IV D and some more minor changes to the 7th paragraph of the discussion.

      References

      [1] Jerome L Myers, Arnold D Well, and Robert F Lorch Jr. Research design and statistical analysis. Routledge, 2013.

    1. Author Response

      Reviewer #1 (Public Review):

      In Figure 1A, the authors should show TEM images of control mock treated samples to show the difference between infected and healthy tissue. Based on the data shown in Figure 1B-E that the overexpression of GFP-P in N. benthamiana leads to formation of liquid-like granules. Does this occur during virus infection? Since authors have infectious clones, can it be used to show that the virally encoded P protein in infected cells does indeed exist as liquid-like granules? If the fusion of GFP to P protein affects its function, the authors could fuse just the spGFP11 and co-infiltrate with p35S-spGFP1-10. These experiments will show that the P protein when delivered from virus does indeed form liquid-like granules in plants cells. Authors should include controls in Figure 1H to show that the interaction between P protein and ER is specific.

      We agree with the reviewer and appreciate the helpful suggestion. As suggested, we added TEM images of control mock-treated barley leaves. We also carried out immuno-electron microscopy to show the presence of BYSMV P protein in the viroplasms. Please see Figure 1–Figure supplement 1.

      BYSMV is a negative-stranded RNA virus, and is strictly dependent on insect vector transmission for infecting barley plants. We have tried to fuse GFP to BYSMV P in the full-length infectious clones. Unfortunately, we could not rescue BYSMV-GFP-P into barley plants through insect transmission.

      In Figure 1H, we used a PM localized membrane protein LRR84A as a negative control to show LRR84A-GS and BYSMV P could not form granules although they might associate at molecular distances. Therefore, the P granules were formed and tethered to the ER tubules. Please see Figure 1–Figure supplement 4

      Data shown in Figure 2 do demonstrate that the purified P protein could undergo phase separation. Furthermore, it can recruit viral N protein and part of viral genomic RNA to P protein induced granules in vitro.

      Because the full-length BYSMV RNA is 12,706 nt long and is difficult to transcribe in vitro, we cannot show whether the BYSMV genome is recruited into the droplets. We have softened the claim and now state that the P-N droplets can recruit the 5′ trailer of the BYSMV genome, as shown in Figure 3B. Please see lines 22, 177 and 190.

      Based on the data shown in Figure 4 using phospho-null and phospho-mimetic mutants of P protein, the authors conclude that phosphorylation inhibits P protein phase separation. It is unclear based on the experiments, why endogenous NbCK1 fails to phosphorylate GFP-P-WT and inhibit formation of liquid-like granules similar to that of GFP-P-S5D mutant? Is this due to overexpression of GFP-P-WT? To overcome this, the authors should perform these experiments as suggested above using infectious clones and these P protein mutants.

      As is known, phosphorylation and dephosphorylation are reversible processes in eukaryotic cells. Therefore, as shown in Figures 5B and 6B, the GFP-PWT protein has two bands, corresponding to P74 and P72, which represent the hyperphosphorylated and hypophosphorylated forms, respectively. Only overexpression of NbCK1 induced a high ratio of P74 to P72 in vivo, and then abolished phase separation of BYSMV P.

      In Figure 5, the authors overexpress NbCK1 in N. benthamiana or use an in vitro co-purification scheme to show that NbCK1 inhibits phase separation properties of P protein. These results show that overexpression of both GFP-P and NbCK1 proteins is required to induce liquid-like granules. Does this occur during normal virus infection? During normal virus infection, P protein is produced in the plant cells and the endogenous NbCK1 will regulate the phosphorylation state of P protein. These are reasons for authors to perform some of the experiments using infectious clones. Furthermore, the authors have antibodies to P protein and this could be used to show the level of P protein that is produced during the normal infection process.

      We detected that the P protein exists in two phosphorylation forms in BYSMV-infected barley leaves, and that λPPase treatment decreased the P44 phosphorylation form. These results indicate that endogenous CK1 cannot phosphorylate BYSMV P completely.

      Based on the data shown in Figure 6, the authors conclude that phase separated P protein state promotes replication but inhibits transcription by overexpressing P-S5A and P-S5D mutants. To directly show that the NbCK1 controlled phosphorylation state of P regulates this process, authors should knockdown/knockout NbCK1 and see if it increases P protein condensates and promote recruitment of viral proteins and genomic RNA to increase viral replication.

      In our previous studies, BLAST searches showed that the N. benthamiana and barley genomes encode 14 CK1 orthologs, most of which can phosphorylate the SR region of BYSMV P. Therefore, it is difficult to make knockdown/knockout lines of all the CK1 orthologs. Accordingly, we generated a double point mutant (K38R and D128N) of HvCK1.2, in which the kinase activity was abolished. Overexpression of HvCK1.2DN inhibits endogenous CK1-mediated phosphorylation of BYSMV P, indicating that HvCK1.2DN is a dominant-negative mutant.

      It is important to note that both replication and transcription are required for efficient infection by negative-stranded RNA viruses. Accordingly, our previous studies revealed that both PS5A and PS5D are required for BYSMV infection. Expression of HvCK1.2DN in the BYSMV vector therefore inhibits virus infection by impairing the balance of endogenous CK1-mediated phosphorylation of BYSMV P.

      Reviewer #2 (Public Review):

      The manuscript by Fang et al. details the ability of the P protein from Barley yellow striate mosaic virus (BYSMV) to form phase-separated droplets both in vitro and in vivo. The authors demonstrate P droplet formation using recombinant proteins and confocal microscopy, use FRAP to demonstrate fluidity, and observe droplet fusion. The authors also used an elaborate split-GFP system to demonstrate that P droplets associate with the tubular ER network. Next, the authors demonstrate that the N protein and a short fragment of viral RNA can also partition into P droplets. Since Rhabdovirus P proteins have been shown to phase separate and form "virus factories" (see https://doi.org/10.1038/s41467-017-00102-9), the novelty of this work is the rigorous and conclusive demonstration that the P droplets only exist in the unphosphorylated form. The authors identify 5 critical serine residues in IDR2 of the P protein whose hyper-phosphorylation prevents droplet formation. Next, the authors conclusively demonstrate that the host kinase CK1 is responsible for P phosphorylation using both transient assays in N. benthamiana and a co-expression assay in E. coli. These findings will likely lead to future studies identifying cellular kinases that affect phase separation of viral and cellular proteins and increase our understanding of the regulation of condensate formation. Next, the authors investigated whether P droplets regulated virus replication and transcription using a minireplicon system. The minireplicon system needs to be better described, as the results were seemingly conflicting. The authors also used a full-length GFP-reporter virus to test whether phase separation was critical for virus fitness in both barley and the insect vector. The authors used 1,6-hexanediol, which broadly suppresses liquid-liquid phase separation, and concluded that phase separation is required for virus fitness (based on reduced virus accumulation with 1,6-HD). However, this conclusion is flawed, since 1,6-hexanediol is known to cause cell toxicity and likely created a less favorable environment for virus replication, independent of P protein phase separation. These and other issues are detailed below:

      1. In Figure 3B, the authors display three types of P-N droplets including uniform, N hollow, and P-N hollow droplets. The authors do not state the proportion of droplets observed or any potential significance of the three types. Finally, as "hollow" droplets are not typically observed, is there a possibility that a contaminating protein (not fluorescent) from E. coli is a resident client protein in these droplets? The protein purity was not >95% based on the SDS-PAGE gels presented in the supplementary figures. Do these abnormalities arise from the droplets being imaged in different focal planes? Unless some explanation is given for these observations, this reviewer does not see any significance in the findings pertaining to "hollow" droplets.

      Thanks for your constructive suggestions. We removed the "hollow" droplets as suggested. We think that the hollow droplets might be an intermediate form of LLPS. Please see pages 7 and 8 of the revised manuscript.

      1. Pertaining to the sorting of "genomic" RNA into the P-N droplets, it is unlikely that RNA sorting is specific for BYSMV RNA. In other words, if you incubate a non-viral RNA with P-N droplets, is it sorted? The authors' conclusion that genomic RNA is incorporated into droplets is misleading in the sense that a very small fragment of RNA was used. Cy5 can be incorporated into full-length genomic RNAs during in vitro transcription and would be a more suitable approach for the conclusions reached.

      Thanks for your constructive suggestions. Unfortunately, we could not obtain in vitro transcripts of the full-length genomic RNA (12,706 nucleotides). We have softened the claim and now state that the P-N droplets can recruit the 5′ trailer of the BYSMV genome, as shown in Figure 3B. Please see lines 22, 177 and 190.

      According to previous studies (Ivanov et al., 2011), the Rhabdovirus P protein can bind to nascent N molecules, forming a soluble N/P complex, to prevent N from encapsidating cellular RNAs. Therefore, we suppose that the P-N droplets can incorporate viral genomic RNA specifically.

      Reference: Ivanov I, Yabukarski F, Ruigrok RW, Jamin M. 2011. Structural insights into the rhabdovirus transcription/ replication complex. Virus Research 162:126–137. DOI: https://doi.org/10.1016/j.virusres.2011.09.025

      1. In Figure 4C, it is unclear how the "views" were selected for granule counting. The methods should be better described as this reviewer would find it difficult to select fields of view in an unbiased manner. This is especially true as expression via agroinfiltration can vary between cells in agroinfiltrated regions. The methods described for granule counting and granule sizes are not suitable for publication. These should be expanded (i.e. what ImageJ tools were used?).

      We agree with the reviewer that it is important to select fields of view in an unbiased manner. We selected representative views and provided large views in the new supplementary figures. In addition, we added detailed methods in the revision. Please see Figure 4–Figure supplement 1, Figure 5–Figure supplement 1, and Methods (lines 489-498).

      1. In Figure 4F, the authors state that they expected P-S5A to only be present in the pellet fraction since it existed in the condensed state. However, WT P also forms condensates and was not found in the pellet, but rather exclusively in the supernatant. Therefore, the assumption of condensed droplets only being found in the pellet appears to be incorrect.

      Many thanks for pointing this out. This method is based on a previous study (Hubstenberger et al., 2017). The centrifugation method might precipitate large granules more efficiently than small granules. As shown in Figure 4B, GFP-PS5A formed large granules, and therefore GFP-PS5A was mainly found in the pellet. In contrast, GFP-PWT existed only in small granules and in a fusion state, so most of the GFP-PWT protein was in the supernatant and only a little was in the pellet. These results also indicate the increased phase separation activity of GFP-PS5A compared with GFP-PWT. Please see the new Figure 4F.

      Reference: Hubstenberger A, Courel M, Benard M, Souquere S, Ernoult-Lange M, Chouaib R, Yi Z, Morlot JB, Munier A, Fradet M, et al. 2017. P-Body Purification Reveals the Condensation of Repressed mRNA Regulons. Molecular Cell 68(1): 144-157 e145.

      1. The authors conclude that P-S5A has enhanced phase separation based on confocal microscopy data (Fig S6A). The data presented is not convincing. Microscopy alone is difficult for comparing phase separation between two proteins. Quantitative data should be collected in the form of turbidity assays (a common assay for phase separation). If P-S5A has enhanced phase separation compared to WT, then S5A should have increased turbidity (OD600) under identical phase separation conditions. The microscopy data presented was not quantified in any way and the authors could have picked fields of view in a biased manner.

      Thanks for your constructive suggestions. As suggested, turbidity assays were performed to show both GFP-PWT and GFP-PS5A had increased turbidity (OD600) compared with GFP. Please see Figure 4–Figure supplement 3.

      1. The authors constructed minireplicons to determine whether mutant P proteins influence RNA replication using trans N and L proteins. However, this reviewer finds the minireplicon design confusing. How is DsRFP translated from the replicon? If a frameshift mutation was introduced into RsGFP, wouldn't this block DsRFP translation as well? Or is start/stop transcription used? Second, the use of the 2x35S promoter makes it difficult to differentiate between 35S-driven transcription and replication by L. How do you know the increased DsRFP observed with P5A is not due to increased transcription from the 35S promoter? The RT-qPCR data is also very confusing. It is not clear that panel D is only examining the transcription of RFP (I assume via start/stop transcription) whereas panel C is targeting the minireplicon.

      Thank you for your questions, and we are sorry for the lack of clarity regarding the minireplicon vectors. We updated Figure supplement 14 to show replication and transcription of the BYSMV minireplicon, a negative-stranded RNA virus derivative. In addition, we inserted an A after the start codon to abolish the translation of GFP mRNA, which allows us to observe phase separation of GFP-PWT, GFP-PS5A, and GFP-PS5D during virus replication. Using this system, we wanted to show the localization and phase separation of GFP-PWT, GFP-PS5A, and GFP-PS5D during replication and transcription of BYS-agMR. Please see Figure 6–Figure supplement 1.

      1. Pertaining to the replication assay in Fig. 6, transcription of RFP mRNA was reduced by S5A and increased by S5D. However, the RFP translation (via Panel A microscopy) is reversed. How do you explain increased RFP mRNA transcription by S5D but very low RFP fluorescence? The data between Panels A, C, and D do not support one another.

      Many thanks for pointing this out! We also noticed these interesting results, which have been repeated independently. As shown in the illustration of the BYSMV-agMR system in Figure 6–Figure supplement 1, the relative transcriptional activities of the different GFP-P mutants were calculated from the normalized RFP transcript levels relative to the gMR replication template (RFP mRNA/gMR), because replicating minigenomes are the templates for viral transcription.

      Since GFP-PS5D supported decreased replication, the ratio of RFP mRNA/gMR increased even though the RFP mRNA level with GFP-PS5D did not increase. In addition, the number of foci with GFP-PS5D is much lower than with GFP-PWT and GFP-PS5A, indicating that mRNAs in GFP-PS5D samples may contain aberrant transcripts that cannot be translated into the RFP protein. In contrast, mRNAs in GFP-PS5A samples are translated efficiently. These results are consistent with our previous studies using free PWT, PS5A, and PS5D.
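
      The arithmetic behind this normalization can be sketched as follows. The function name `relative_transcription` and the numbers are hypothetical and chosen only to illustrate the ratio; they are not measurements from the study.

```python
def relative_transcription(rfp_mrna, gmr):
    """Transcriptional activity as RFP mRNA normalized to the gMR template."""
    if gmr <= 0:
        raise ValueError("gMR level must be positive")
    return rfp_mrna / gmr


# Hypothetical normalized RT-qPCR levels (arbitrary units).
# Same RFP mRNA level in both samples, but a lower gMR level in the
# second sample, as expected when replication is reduced.
wt = relative_transcription(rfp_mrna=1.0, gmr=1.0)
s5d = relative_transcription(rfp_mrna=1.0, gmr=0.25)

print("WT ratio: ", wt)
print("S5D ratio:", s5d)
```

      In this sketch the ratio rises fourfold purely because the denominator (the replicating template) falls, with no change in the absolute RFP mRNA level.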

      Reference: Gao Q, et al. 2020. Casein kinase 1 regulates cytorhabdovirus replication and transcription by phosphorylating a phosphoprotein serine-rich motif. The Plant Cell 32(9): 2878-2897.

      1. The authors relied on 1,6-hexanediol to suppress phase separation in both insect vectors and barley. However, the authors disregarded several publications demonstrating cellular toxicity by 1,6-hexanediol and a report that 1,6-HD impairs kinase and phosphatase activities (see below). doi: 10.1016/j.jbc.2021.100260,

      We agree with the reviewer that 1,6-hexanediol induces cellular toxicity. Therefore, we removed these results; their removal does not affect the main conclusions of our study.

      1. The authors state that reduced accumulation of BYSMV-GFP in insects and barley under HEX treatment "indicate that phase separation is important for cross-kingdom infection of BYSMV in insect vectors and host plants." The above statement is confounded by many factors, the most obvious being that HEX treatment is most likely toxic to cells and as a result cannot support efficient virus accumulation. Also, since HEX treatment interferes with phosphorylation (see REF above) its use here should be avoided since P phase separation is regulated by phosphorylation.

      We agree with the reviewer that 1,6-hexanediol induces cellular toxicity and thereby affects infections by BYSMV and other viruses. In addition, 1,6-hexanediol would inhibit LLPS of cellular membraneless organelles, such as P-bodies, stress granules, Cajal bodies, and the nucleolus, which also affect different virus infections directly or indirectly. Therefore, we removed these results; their removal does not affect the main conclusions of our study.

      Reviewer #3 (Public Review):

      Membrane-less organelles formed through liquid-liquid phase separation (LLPS) provide spatiotemporal control of host immune responses and other cellular processes. Viruses are obligate pathogens that proliferate in host cells, which makes their RNAs and proteins likely targets of immune-related membrane-less organelles. To successfully infect and proliferate in host cells, viruses need to efficiently suppress the immune function of those immune-related membrane-less organelles. Moreover, viruses also generate exogenous membrane-less organelles/RNA granules to facilitate their proliferation. Accordingly, host cells also need to target and suppress the functions of exogenous membrane-less organelles/RNA granules generated by viruses, the underlying mechanisms of which are still mysterious.

      In this study, Fang et al. investigated how a plant kinase confers resistance against viruses by modulating the phosphorylation and phase separation of the BYSMV P protein. They first characterized the phase separation features of the BYSMV P protein. They also discovered that droplets formed by the P protein recruit viral RNA and another viral protein in vivo. The phase separation activity of the P protein is inhibited by phosphorylation of its intrinsically disordered region. Combined with their previous study, this study demonstrated that host casein kinase 1 (CK1) decreases the phase separation of the P protein by increasing its phosphorylation. Finally, the authors claimed that the phase separation of the P protein facilitates BYSMV replication but decreases its transcription. Taken together, this study uncovered the molecular mechanism by which plants regulate viral proliferation by decreasing the formation of exogenous RNA granules/membraneless organelles. Overall, this paper tells an interesting story about host immunity targeting viruses by modulating the dynamics of exogenous membraneless organelles, and uncovers the modulation of viral protein phase separation by a host protein, which is a hotspot in plant immunity. The writing is logical.

      Thanks for your positive comment on our studies.

    1. Author Response

      Reviewer #1 (Public Review):

      Yang, Bhoo-Pathy, Brand et al detail their investigation of a large Swedish cohort compared with age matched controls to estimate the risk of short- and long-term cardiotoxicities of breast cancer therapies in a general breast cancer patient population. They find that breast cancer patients are at significantly increased risk of developing arrhythmia and heart failure both within the first year of cancer diagnosis as well as at least 10 years after. Interestingly, they find that there is an increased risk of ischemic heart disease within the first year after diagnosis, but no increased risk of ischemic heart disease in the long term.

      The authors should be commended for this large cohort study that achieves its goal of identifying the incidence and hazard ratio of cardiotoxicity associated with breast cancer treatment within a general breast cancer population. Their findings of increased risk of heart failure in patients treated with anthracyclines and trastuzumab is consistent with multiple prior studies in the field of cardio-oncology and adds to the validity of the data.

      The finding that there is only a slightly increased (and statistically insignificant) risk of ischemic heart disease after left sided radiotherapy is quite interesting, and as noted by the authors, differs from prior understandings about risk of ischemic heart disease associated with breast radiation therapy. Without data on mean heart dose or total radiation administered the results are hypothesis generating, but should not be utilized to guide medical decision making.

      One of the major limitations of this study is that the authors' goal is to identify the incidence and risk of cardiotoxicity associated with the various breast cancer treatment regimens and determine these risks over time, and as noted by the authors, the registry utilized only includes planned treatment not whether patients did receive this therapy (and what dose of therapy). This is a key point that should be emphasized when interpreting the results.

      As noted by the reviewer, the Stockholm-Gotland Breast Cancer Register only included the intended treatment, without the detailed dosage of the therapy. However, the agreement between intended and administered treatment was about 95% in Sweden (Löfgren et al., BMC Public Health, 2019). We have now further explained this in the discussion section.

      In Discussion: “Overall, our results indicate only small risk of heart disease due to radiotherapy in women treated in Sweden after year 2000. Further studies with detailed information on the mean heart dose of radiation or total cumulative radiation dose administered are therefore needed to confirm and provide more context to this finding.”

      In Discussion: “Besides, the Stockholm-Gotland Breast Cancer Register only records intended treatment, not whether patients actually received these therapies. However, the agreement between the intended and administered breast cancer treatment in Sweden has been previously reported to be about 95% (Löfgren et al., 2019).”

      There are several conclusions included in the discussion section that are not supported by the data from the results section and the authors should be careful to suggest mechanisms of cardiotoxicity from an observational population-based study. Examples include suggesting anthracyclines cause cardiotoxicity of the myocardium but not the cardiac vessels; attributing early increased risk of ischemic heart disease to emotional distress alone; and that inhibition of HER2 receptors in myocytes may explain cardiotoxicity caused by trastuzumab. These are interesting hypotheses that would be better supported by references to lab/animal model studies.

      We thank the reviewer for the suggestions and have now added the reference for the suggested mechanisms of cardiotoxicity with lab/animal model studies in the discussion section.

      In Discussion: “As the long-term risk was observed for heart failure but not ischemic heart disease, the cardiotoxic effect of chemotherapy might act mainly on the myocardium, mediated by DNA double-strand breaks through topoisomerase (Top) 2β, rather than on the cardiac vessels (Lyu et al., 2007).”

      In Discussion: “The finding that risk of ischemic heart disease in breast cancer patients was only transiently elevated after diagnosis is not unexpected, considering the emotional distress of dealing with a new cancer diagnosis in the patients, which may lead to higher short-term rates of ischemic heart disease (Fang et al., 2012; Schoormans, Pedersen, Dalton, Rottmann, & van de Poll-Franse, 2016). In addition, surgery after breast cancer diagnosis might increase the risk of arterial thromboembolism (Gervaso, Dave, & Khorana, 2021), which includes myocardial infarction, and the effect appears to attenuate one year after diagnosis (Navi et al., 2017; Navi et al., 2019).”

      In Discussion: “The cardiotoxic effect of trastuzumab, meanwhile, may be explained by inhibition of the HER2 receptors in myocytes, which activates the mitochondrial apoptosis pathway through modulation of Bcl-xL and -xS, which regulates cell development and growth (Grazette et al., 2004; Yeh & Bickford, 2009).”

      The authors succeed in highlighting the increased risk of cardiotoxicity associated with breast cancer treatment in the observed patient population. Rather than exploring the mechanism of cardiotoxicity for the treatment regimens observed, the data presented may be more useful to propose a longitudinal cardiac monitoring schedule for patients who have been treated for breast cancer, and who the current data suggest, are at long term risk for heart failure and arrhythmia.

      As we found an increased long-term risk of heart failure in breast cancer patients, especially in those treated with anthracyclines + taxanes and trastuzumab, we suggest a prolonged longitudinal cardiac monitoring schedule of ten or more years for these patients. We have added this suggestion in the discussion section.

      In Discussion: “Analysis by time since diagnosis revealed long-term increased risks of arrhythmia and heart failure following breast cancer diagnosis, suggesting that a longitudinal cardiac monitoring schedule might be helpful to improve cardiac health in breast cancer patients.”

      Reviewer #2 (Public Review):

      This is a registry-based study in which patients diagnosed with locoregional breast cancer (stage I-III) from 2001 to 2008, between the ages of 25 and 75, were compared to a randomly sampled cohort of 10 women matched by year of birth and for three specific cardiac conditions as outlined in the key objective. Data were gathered by cross-referencing subjects' unique identification numbers in the Swedish Cancer Register, Patient Register, Cause of Death Register, and Migration Register. The Prescribed Drug Register was reviewed to gather information about prescribed medication, perhaps to infer the comorbid conditions for which medication was prescribed. Breast cancer treatment-specific information was missing in some cases, and use of anti-HER2 therapy was presumed from HER2/neu status in some cases. While the primary objective of the study, to show evidence of an increase primarily in heart failure and arrhythmias, seems to have been met in this patient registry-based study, there is some question about the specificity of the data, since they were gathered from various registers and are subject to operator-dependent biases.

      Strengths: The study is a long-term follow-up of patients treated with potentially cardiotoxic drugs, confirming the previously known association of specific heart diseases with the use of these drugs. The longest follow-up seems to be 16 years for the earliest cohort (2001), and the minimum approximately 10 years for the 2008 cohort. This study confirms that the long-term risk remains even after treatment is completed and suggests that more robust cardiac function monitoring guidelines for survivors may be warranted.

      Weaknesses: This is a patient register based study. As outlined above, data was extracted by cross referencing various patient registers. Since the data was dependent on the ICD codes entered in the patient register, there seems to be potential for missed information.

      The Swedish Patient Register has high validity for the heart diseases analyzed in this study, with a positive predictive value between 88% and 98% when the main diagnosis in the register is used. However, it is still possible that we have missed some information on heart disease, and we have emphasized this limitation in the discussion section.

      In discussion: “The Swedish Patient Register has high validity for heart failure, arrhythmia and ischemic heart disease (with positive predictive value between 88%-98%) (Hammar et al., 2001; Ludvigsson et al., 2011), by analysing main diagnoses only. However, misclassification of heart diseases may still have occurred.”

      Preexisting comorbidities were also extracted through Patient Registers hence may be subject to same potential for missed information.

      The Swedish Patient Register has relatively high validity for the majority of comorbid diseases. However, patients without severe symptoms of the diseases might be treated in the primary health care centers, which were not included in the patient register. We have therefore pointed out this limitation in the discussion section.

      In discussion: “In addition, preexisting comorbidities extracted from the patient registers may not include those patients with slight symptoms.”

      In addition, information for use of Trastuzumab was extrapolated from the Her2neu status of the patient when such information may not have been accessible through Prescribed Drug Registers.

      As the majority of HER-2 positive patients were treated in the clinics, the Swedish Prescribed Drug Register does not record their treatment information. Because ~90% of HER-2 positive cancers were treated with trastuzumab between 2005 and 2008 in the Stockholm-Gotland region, we used HER-2 positivity as a proxy for trastuzumab treatment. We have now further explained this in the methods section.

      In Materials and Methods: “As ~90% of HER-2 positive cancers were treated with trastuzumab between 2005 and 2008 in the Stockholm-Gotland region and the Swedish Prescribed Drug Register does not cover data on treatment with trastuzumab, HER-2 positivity was used as a proxy when no registry data on trastuzumab was available during this time period (30% of the HER-2 positive patients had missing information on trastuzumab).”

      It is also unclear if there was any protocol in place for cardiac monitoring for patients receiving cardiotoxic chemotherapy or Anti Her2neu agents.

      In Sweden, there is no cardiac monitoring for chemotherapy in routine clinical practice. For HER2-therapy, cardiac monitoring with a thorough cardiac assessment prior to treatment, including history, physical examination, and determination of left ventricular ejection fraction before, during and right after treatment has been mandatory since introduction in clinical routine. We have now added this information to the discussion.

      In discussion: “As there is no cardiac monitoring for chemotherapy in routine clinical practice and cardiac assessment is only performed prior to and during the treatment period for HER-2 positive patients in Sweden, a longer-term cardiac monitoring program might be helpful for these patients.”

      Reviewer #3 (Public Review):

      This matched analysis uses data from patients newly diagnosed with breast cancer in the Stockholm-Gotland Breast Cancer Register and data from patients in the general female population in Sweden to ask whether breast cancer diagnosis (and subsequent treatment of breast cancer) is associated with an increased rate of heart disease after treatment. It is impossible to answer this question in a randomized controlled setting, and it would be unethical to randomize patients to not be treated for their cancer, so a matched approach would seem to make sense at face value. However, I have some concerns about the analysis that I believe impede answering the research aims.

      1. With regard to the matched analysis of time to heart disease diagnosis, I have several critiques/questions. First, for the breast cancer cohort, were patients with a diagnosis of heart disease prior to cancer diagnosis included in the analysis? If so, how was the event (which precedes time = 0) incorporated into the analysis? If not, please make sure to note this important restriction. I think the latter approach is the better and correct one.

      As suggested by Referee 3, we have now excluded those patients with a diagnosis of heart disease prior to cancer diagnosis. We have updated the results and the methods section accordingly.

      In Materials and Methods:

      “We included all patients diagnosed with non-metastatic breast cancer (stages I-III) and without prior diagnosis of heart disease at age 25 to 75 years (N = 8015).”

      Second, for the matched cohort, what is time = 0 for these persons? i.e. how does one interpret "Time since diagnosis" on Figure 1 for a patient who has not been diagnosed with breast cancer?

      We apologize for this lack of clarity and have revised the label to “Time since index date (= date of diagnosis, which is the same date for the corresponding matched individuals from the general population)” in Figure 1.

      Third, how was the matching incorporated into the FPM? Presumably there should be a frailty term of some sort to indicate the matched groups, within which there is expected to be correlation.

      In the flexible parametric survival model for matched cohort data, a shared frailty term was incorporated into the model to indicate the matched cluster. The maximum (penalized) marginal likelihood method was used to estimate the regression coefficients and the variance of the frailty. We have added this explanation to the Methods section.

      In Materials and Methods: “Considering the correlation within the matched clusters, a shared frailty term (as random effects) was incorporated into the model and the maximum (penalized) marginal likelihood method was used to estimate the regression coefficients and the variance for the frailty.”

      2. It is noted that Kaplan-Meier curves were used to estimate the cumulative incidence of heart disease. How was death of the patient prior to diagnosis of heart disease handled? I do not think that Kaplan-Meier is the correct approach here, but rather an Aalen-Johansen-type estimator that treats death as a competing event. See e.g. https://pubmed.ncbi.nlm.nih.gov/10204198/ A Kaplan-Meier estimate will tend to overestimate the event rate when competing events are counted as censoring.

      As suggested by the reviewer, we have now used the Aalen-Johansen method to estimate the cumulative incidence of heart disease and revised the text in the Methods, as well as the tables and figures in the supplement.

      In Materials and Methods: “Aalen-Johansen estimation was used to assess the cumulative incidences of heart diseases in breast cancer patients and matched reference individuals, while other causes of death were considered as competing events.”
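
      The reviewer's point about overestimation can be illustrated with a minimal sketch. The toy follow-up data below are hypothetical (not from the study), and the two functions are simplified hand-written estimators rather than the software the authors used. The sketch compares the Aalen-Johansen cumulative incidence with the naive 1 - Kaplan-Meier estimate that censors competing deaths.

```python
# Toy data (hypothetical, not from the study): follow-up time in years and
# event code: 0 = censored, 1 = heart disease, 2 = death from other causes.
TIMES = [1, 2, 3, 4, 5, 6]
EVENTS = [2, 1, 2, 1, 0, 1]


def aalen_johansen_cif(times, events, cause=1):
    """Cumulative incidence of `cause` at the end of follow-up,
    treating every other non-zero event code as a competing event."""
    at_risk = len(times)
    event_free = 1.0  # probability of no event of any type so far
    cif = 0.0
    for t in sorted(set(times)):
        here = [e for ti, e in zip(times, events) if ti == t]
        d_cause = sum(1 for e in here if e == cause)
        d_any = sum(1 for e in here if e != 0)
        # Increment = P(event-free just before t) * cause-specific hazard at t
        cif += event_free * d_cause / at_risk
        event_free *= 1 - d_any / at_risk
        at_risk -= len(here)
    return cif


def one_minus_km(times, events, cause=1):
    """Naive 1 - Kaplan-Meier that treats competing events as censoring."""
    at_risk = len(times)
    surv = 1.0
    for t in sorted(set(times)):
        here = [e for ti, e in zip(times, events) if ti == t]
        d_cause = sum(1 for e in here if e == cause)
        surv *= 1 - d_cause / at_risk
        at_risk -= len(here)
    return 1 - surv


aj = aalen_johansen_cif(TIMES, EVENTS)
km = one_minus_km(TIMES, EVENTS)
print(f"Aalen-Johansen: {aj:.3f}, naive 1 - KM: {km:.3f}")
```

      In this toy example the naive estimate is larger than the Aalen-Johansen estimate, because subjects who died of other causes are treated as if they could still develop heart disease.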

      3. The sentence "Missing indicators were included for the analysis of these covariates in the model" and the results in Table 3 suggest that some missing values were analyzed 'as is', meaning that missingness was used as a category itself. This of course is not desirable, and there exist methodology and software for handling these data more appropriately, e.g. multiple imputation with chained equations. For example, how does one interpret that 'unknown chemotherapy' status is positively associated with heart failure, but less so than anthracycline-based chemotherapy?

      Missingness of the type of adjuvant treatment was considered as a category in the previous version of our manuscript. To address potential biases resulting from missing data, we have now used multiple imputation with chained equations and revised the methods and Table 3 accordingly.

      In Materials and Methods: “Multiple imputation with chained equations was used to deal with the treatment categories with missing information. We replaced the missing data with 10 rounds of imputations and all the covariates were included in the imputation model.”
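
      The response does not state how estimates from the 10 imputed datasets were combined. A minimal sketch of the standard approach, Rubin's rules, is given below as an assumption; the log hazard ratios and standard errors are hypothetical placeholders, not study results.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates with Rubin's rules.

    Returns the pooled estimate and its standard error.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical log hazard ratios and standard errors from m = 10 imputations.
log_hrs = [0.30, 0.34, 0.28, 0.33, 0.31, 0.29, 0.35, 0.32, 0.30, 0.28]
std_errs = [0.10] * 10

pooled, se = pool_rubin(log_hrs, [s ** 2 for s in std_errs])
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"Pooled HR = {math.exp(pooled):.2f} "
      f"(95% CI = {math.exp(low):.2f}-{math.exp(high):.2f})")
```

      The between-imputation term widens the confidence interval to reflect uncertainty about the missing treatment categories, which a single "missing" indicator category cannot do.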

      4. The reported HRs at the top of page 10 seem incongruous with the FPM model demonstrated in Figure 1, since there is clearly a non-linear relationship between the hazard and the outcome. In other words, there is little sense in which the hazards are proportional at all time points.

      As shown by the FPM model in Figure 1, HRs were not constant over time since index date. Therefore, in the revised version, we report HRs separately for <1, 1-2, 2-5, 5-10 and 10-17 years after diagnosis. We have revised the abstract, methods, and Table 2.

      In Abstract: “Time-dependent analyses revealed long-term increased risks of arrhythmia and heart failure following breast cancer diagnosis. Hazard ratios (HRs) within the first year of diagnosis were 2.14 (95% CI = 1.63-2.81) for arrhythmia and 2.71 (95% CI = 1.70-4.33) for heart failure. HR more than 10 years following diagnosis was 1.42 (95% CI = 1.21-1.67) for arrhythmia and 1.28 (95% CI = 1.03-1.59) for heart failure. The risk for ischemic heart disease was significantly increased only during the first year after diagnosis (HR=1.45, 95% CI = 1.03-2.04).”
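
      As a quick arithmetic check on the hazard ratios quoted above: if the confidence intervals were computed on the log scale (an assumption, though standard for hazard ratios), each point estimate should equal the geometric mean of its interval limits, and the standard error of the log HR can be recovered from the interval width. A minimal sketch:

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

# Hazard ratios and 95% CIs as quoted in the revised abstract.
REPORTED = {
    "arrhythmia, <1 year": (2.14, 1.63, 2.81),
    "heart failure, <1 year": (2.71, 1.70, 4.33),
    "arrhythmia, >10 years": (1.42, 1.21, 1.67),
    "heart failure, >10 years": (1.28, 1.03, 1.59),
    "ischemic heart disease, <1 year": (1.45, 1.03, 2.04),
}


def log_scale_summary(lower, upper):
    """Return (SE of log HR, geometric mean of the CI limits)."""
    se_log = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return se_log, math.sqrt(lower * upper)


for label, (hr, lower, upper) in REPORTED.items():
    se_log, midpoint = log_scale_summary(lower, upper)
    print(f"{label}: HR = {hr}, CI midpoint = {midpoint:.2f}, "
          f"SE(log HR) = {se_log:.3f}")
```

      The wider intervals in the first year, when fewer events have accrued, correspond to larger standard errors on the log scale.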

      In Materials and Methods: “We compared the risk of heart diseases in breast cancer patients with that observed in the matched cohort, using a flexible parametric model (FPM) with time since index date as the underlying time scale.”

      In Results: “A short-term increase in risks of arrhythmia and heart failure was found in breast cancer patients (Table 2, Figure 1; HR in the first year = 2.14, 95% CI = 1.63-2.81 for arrhythmia and 2.71, 95% CI = 1.70-4.33 for heart failure).”

      5. It seems unlikely that breast cancer diagnosis could ever be 'protective' for ischemic heart disease. A more constrained model that does not allow for the possibility of HR < 1 could provide a more sensible estimate of this time-dependent HR.

      To the best of our knowledge, an inverse association between breast cancer and the long-term risk of ischemic heart disease is possible, considering that some of the reproductive risk factors for breast cancer have a protective effect on the risk of ischemic heart disease. We now discuss this in the Discussion.

      In Discussion: “The long-term lower risk of ischemic heart disease in breast cancer patients compared to age-matched women might be explained by the opposite role of reproductive factors in breast cancer and ischemic heart disease. Younger age at menarche and older age at menopause are associated with an increased risk of breast cancer, whereas a decreased risk of ischemic heart disease was found among these women (Collaborative Group on Hormonal Factors in Breast, 2012; Okoth et al., 2020).”

    1. Author Response

      Reviewer #1 (Public Review):

      Lopez and Wingreen propose the idea of noise-averaging cooperation (NAC), or within-population cross-feeding driven by noisy metabolism in microbes. The authors reasoned that since microbes are small, they are prone to noisy metabolism, which limits growth rate. If related bacteria can share metabolites to average out noise (e.g., in biofilms), then population growth rate can be improved and, in theory, the irreversible growth arrest of individuals can sometimes be prevented. The authors predict substantial noise-driven growth inefficiencies from single-cell protein abundance data, review evidence for NAC, and propose how to detect NAC in microbial populations.

      Although this paper would be greatly strengthened by experimental tests (some of which may not be too difficult to do), I did enjoy reading it, and the writing is clear and thoughtful. The problem of "cheaters" (cells that take metabolites but do not leak any) will naturally arise, although the problem is mitigated in biofilms. Discussions on that will be useful.

      We agree that the issue of “cheaters” deserves additional attention in the manuscript. To this end, we have augmented our discussion of NAC’s evolutionary stability to include a discussion of the existing literature on the evolution of cooperation. We now situate NAC and the results of our biofilm model in this larger context. We note that our spatial model results show that the benefits of NAC can be “privatized” among cooperators, a key requirement for the evolution of cooperation.

    1. Author Response

      Reviewer #1 (Public Review):

      Oxenford and colleagues outline the basic principles of a new software tool which they developed to combine the documentation and correlation of various data sets relevant for the implantation and the control of the location of deep brain stimulation electrodes. The concept behind their Lead-OR tool is a logical extension of a software tool which they developed earlier, the Lead-DBS package.

      Multimodal data representation undoubtedly will be a step forward. It is of particular relevance that the toolbox which is shown will be made openly available by open-source platforms.

      The introduction of this new tool holds great promise for future research. In particular, the use of this tool might result in a more uniform recording of the correlation of neurophysiological findings with the exact location of deep brain stimulation electrodes and ultimately with clinical outcome. A great advantage of this new software is also its flexibility, with the option to include other sources as well, such as new atlases and anatomical data.

      The conclusions of this paper are well supported by the data which is shown. In particular the figures nicely support the claims made in the manuscript. The clinical series of 52 patients with Parkinson disease gives an example how the new software can be used. Nevertheless, it will be necessary to demonstrate the feasibility of the tool in future clinical studies.

      The software will also be useful when applying segmented leads. The authors could expand on this subject. It is certainly a disadvantage of the current software that recordings of local field potential cannot be incorporated yet. At least this should be possible post hoc.

      The discussion touches upon many controversial topics and ambiguous scenarios, but it is overall well balanced. The limitations of the study are outlined very openly.

      We would like to thank the reviewer for this accurate summary and their positive words about our manuscript.

      Reviewer #2 (Public Review):

      Oxenford et al. describe a novel open-source DBS visualization software package, Lead-OR, that aims to fill a gap in the intraoperative visualization of DBS trajectories. While theirs is certainly not the first or only attempt at achieving this, the described software is unique in combining an open-source approach with integration of multimodality data, including integration with the most commonly used planning and microelectrode recording platforms. Their software has the potential to take intraoperative DBS visualization to the next level by combining patient-specific imaging with intraoperative electrophysiology and new normalization tools to incorporate external atlases. While some may find this approach unnecessary given the trend towards decreased reliance on MER in DBS for movement disorders, the tools described will still be useful for retrospective and research analyses. The true potential of Lead-OR lies in its future development, as integration across platforms grows and other developers add to its capabilities over time.

      We would like to thank the reviewer for this accurate summary and the positive evaluation of our manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, the authors investigated whether a vaccine consisting of RNA replicons delivered by lipid inorganic nanoparticles (LION) elicits a protective immune response against SARS-CoV-2 variants upon heterologous challenge. They also provide evidence of its significant efficacy through pathological analysis of the lung in a hamster model. However, this study presents descriptive data with few mechanistic studies of the immune response. Concerns with the manuscript relate to the data describing the relevance of the protective effects in vivo and the data supporting the interpretation of vaccination efficacy against multiple SARS-CoV-2 strains.

      Specific concerns: In a previous study (Erasmus et al., 2020a), the authors developed a novel vaccine and demonstrated that this vaccine, harboring an alphavirus-derived repRNA, induced antibody responses in mice and macaques. In this manuscript, the authors demonstrate that this vaccine, harboring SARS-CoV-2 variant-derived repRNA of the pre-fusion type, has significant cross-neutralization activity and confers protective immunity against SARS-CoV-2 variants.

      1. Can the antibodies produced after immunization with repRNA expressing pre-fusion stabilized spike protein/LION bind to S1, S2, or the RBD? Please define the reactivity of these antibodies and compare it with that of antibodies elicited by the native form.

      We thank the reviewer for this suggestion; we have updated Figure 2 of the manuscript with this additional data.

      2. The authors demonstrated that this novel vaccine has significant efficacy with heterologous neutralizing activities. Please provide some evidence for the reasons. For instance, can this novel vaccine (with the pre-fusion type) induce the production of cross-reactive antibodies against SARS-CoV variants? It would also be better to define the epitopes of these antibodies.

      We postulate that the cross-protective efficacy is due to cross-reactive neutralizing antibodies, as shown in Figure 3, although the neutralizing titers are diminished for some combinations of vaccine and challenge strain. Additionally, thanks to the reviewer's suggestion to characterize antibodies binding to domains within S, we noted that the apparent increase in breadth of the B.1.1.7 vaccine did appear to correlate with this vaccine’s ability to drive higher S1-binding antibody responses relative to the A.1 vaccine.

      3. In the hamster model, this novel vaccine showed significant protective effects on lung pathology. Please provide some data showing that the vaccine induces T cell responses in hamsters, measured by the frequency of antigen-specific CD4 or CD8 T cells and by cytokines.

      We chose to focus on neutralizing antibodies as these appear to be the primary correlate of protection against disease (https://www.nature.com/articles/s41591-021-01377-8). In this study, we therefore did not collect samples to measure T-cell responses to the vaccine. However, we have shown induction of T-cell responses in mice and non-human primates receiving the A.1-targeted vaccine (https://pubmed.ncbi.nlm.nih.gov/32690628/), suggesting T-cell responses may have been induced in hamsters. We have included an updated discussion section with discussion on this limitation of our study.

      Minor concerns: Mislabeling in Figure 5B, D, F in the manuscript. Please correct it.

      We thank the reviewer for catching this error and have corrected the text in the manuscript.

      Reviewer #2 (Public Review):

      This paper aims to develop second-generation vaccines that protect against multiple SARS-CoV-2 variants of concern. For this purpose, the authors developed new vaccine candidates composed of SARS-CoV-2 spike protein derived from B.1.1.7 (alpha) and B.1.351 (beta) variants. The essential backbone of the vaccines they used contains alphavirus-derived sequences to be self-amplifying, and one containing spike protein of the Wuhan strain is already in clinical trials. They demonstrated no significant difference in virus removal and pathogenesis in the lower respiratory tract. However, the titer of in vitro neutralizing activity and virus removal ability in the upper respiratory tract were decreased against the strains different from the vaccine strain.

      Overall, their data are convincing and valuable as a platform for a new vaccine against SARS-CoV-2 VoC in the future. Besides, I have some comments to strengthen their argument.

      1) The challenge experiments in Figure 4, Figure 5, Figure 6, and Figure 7 lack data on infection protection against B.1.617.2 (delta strain). It is better to add B.1.617.2 to the challenge experiments and neutralizing assay in Figure 3. The addition of data against B.1.1.529 (Omicron) is ideal.

      We appreciate the reviewer’s suggestion. However, these studies were completed prior to having a working B.1.617.2-stock which was difficult to achieve due to mutations arising in tissue culture. Nevertheless, we believe the neutralizing titers achieved against B.1.617.2 in figure 3 would suggest significant efficacy against B.1.617.2 infection. We found neutralizing titers against B.1.617.2 by the three vaccines were similar or greater than seen against B.1.351, a VoC for which we still saw near complete protection. Additionally, the complete replacement of the B.1.617.2 VoC with B.1.1.529 now makes further testing against B.1.617.2 of limited benefit. Efficacy testing against the B.1.1.529 VoC is still ongoing.

      2) There are no data on T cell responses to vaccines, even in mice. If their vaccine can also induce T-cell responses, it would be more attractive. At least, it would be better to discuss the potential contribution of T-cell responses since alphavirus-based replicating RNA vaccines could be one of the nice vaccine platforms to elicit T-cell responses, according to previous works. (For example, McKay PF et al., Nat Commun. 2020 Jul 9;11(1):3523.)

      As mentioned in response to reviewer 1, we chose to focus on neutralizing antibodies as these appear to be the primary correlate of protection against disease (https://www.nature.com/articles/s41591-021-01377-8). We agree that data on T-cell responses would be helpful but we did not collect samples to evaluate T-cell responses during the course of the studies presented here. We have shown induction of T-cell responses in mice and non-human primates receiving the A.1-targeted vaccine (https://pubmed.ncbi.nlm.nih.gov/32690628/). We have included an updated discussion section with discussion on this limitation of our study.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors use ribosome profiling (RiboSeq) and RNA sequencing (RNASeq) to characterise the transcriptome and translatome of two PRRSV species as well as the host in response to infection. One particularly exciting feature of the study is that the analysis is carried out at different times of infection, which shows how both the virus and the host regulate their gene expression. The authors identify several new regulatory mechanisms of virus gene expression. Unexpectedly, they also find that the frameshifting efficiency at the ORF1ab frameshifting site changes with time. This contradicts the dogma in the field, which states that frameshifting is constant and has evolved to be constant to produce a particular ratio of the two protein isoforms. The strength of the paper is in its comprehensive analysis. The paper is extremely rich in data, with 12 main and 23 Supplemental Figs and 11 Supplemental Tables, all of them rather complex. The main weakness is that it is written in a technical language that will be hardly readable by a non-specialist readership. Unfortunately, the authors do not do a good job of guiding the reader through their findings and hardly identify the most important findings, while leaving the details to the specialists. This is particularly exemplified in Fig. 12, which should present the summary of the findings and would be extremely helpful, but hardly provides any text at all. This is potentially a very interesting paper, but the impact on the field could be increased considerably by better presentation of the work.

      We would like to thank this reviewer for the positive comments about the scientific findings, and for their suggestions for improving the presentation of the work. This outside perspective was very useful in helping us see which parts of the paper required clearer explanation or less detail, which can be hard to discern when very close to the work. We have incorporated all of this reviewer’s suggestions and we think this has improved the manuscript and made it easier to follow.

      Reviewer #2 (Public Review):

      The authors used the ribosome profiling technique to study gene expression at transcriptional and translational levels in cells infected with porcine reproductive and respiratory syndrome virus (PRRSV-1 and PRRSV-2). The ribosome profiling was carried out on the cells at different time points within the first 12 hours of infection, thus providing information on gene expression changes during the course of infection.

      The analysis of ribosome profiling data is exceptionally detailed and includes scrupulous characterization of footprint read lengths, de novo prediction of translated ORFs, characterisation of local pauses and differential gene expression of host and viral genes. The RNA-seq analysis is on par with that: the authors did a superb job of characterising the composition of the viral transcriptome, including identification of heteroclite RNAs and defective interfering RNAs. This provided the authors with reliable information for the interpretation of translational mechanisms responsible for the translation of ORFs discovered with ribosome profiling data.

      A specific focus of the manuscript was placed on the characterisation of two instances of ribosomal frameshifting occurring in PRRSVs. In addition to "canonical" -1 frameshifting at a slippery sequence stimulated by downstream RNA secondary structure (common to many viruses), the PRRSV genome contains an additional frameshifting site whose efficiency is stimulated by a viral protein. The authors demonstrated that the efficiency of this frameshifting increases over time, which is expected since the concentration of the stimulating protein is increasing. Furthermore, the authors found that the efficiency of "canonical" frameshifting also changes. The authors describe this as surprising since it directly contradicts the common description of its function as "setting the fixed ratio" between the synthesized products upstream and downstream of the frameshift site. Perhaps it is not so surprising in hindsight: given that frameshifting depends on so many different factors (folding states of RNA pseudoknots, which are dynamic, ribosome density upstream, etc.), it would be more surprising if the efficiency of frameshifting were indeed fixed. I think the "fixed ratio" was proposed mainly to draw a distinction from ribosomal frameshifting occurring in cellular genes (like antizyme or bacterial release factor 2), where there seems to be only one functional product, but its synthesis level depends on the efficiency of frameshifting sensing certain conditions. It is great though that the authors observed such changes, and I agree with the authors' speculation that this is unlikely to be unique to PRRSVs.

      While I found the work to be largely descriptive, the authors did not shy away from speculating about potential mechanisms responsible for observed regulation. The manuscript is hard to get through simply due to its large length and a lot of data, but reading it is rewarding.

      Again, we would like to thank this reviewer for their positive comments about the work, and to reiterate that hopefully the revised version of the manuscript will be easier to read.

      Reviewer #3 (Public Review):

      The manuscript by Cook et al. describes the first comprehensive gene expression analysis of two species of PRRSV, an important agricultural pathogen. Using ribosome profiling and RNA-sequencing, the authors systematically analyze the transcriptome of the virus and its translation, and their temporal kinetics. The analysis revealed non-canonical RNA species that are suggested to contribute to translation of parts of ORF1ab, changing the stoichiometry between the NSPs. In addition, the authors use the ribosome profiling data to identify novel overlapping ORFs, including a conserved uORF in the 5' leader, and to analyze the efficiency of frame-shift in two sites in the viral genome, one of which is trans-regulated by the viral nsp1β. The frame-shift efficiency in both sites is presented to be increasing late in infection. The authors also present conservation analysis from hundreds of available genomes. Finally, analysis of host gene expression uncovers a pattern suggesting translation inhibition of induced transcripts, and by comparing a WT virus to a mutant virus lacking the nsp2 site frame-shift, the authors identify a gene (TXNIP) whose expression is affected by nsp2TF.

      In this rigorous work, the authors uncover new insights on an important pathogen, which can be of value to the wider field of virology. However, due to technical issues, a few of the authors' claims may require reconsideration.

      We are grateful to this reviewer for their comments on the rigour and the impact of the work, as well as the suggestions for improvement which they included in their more detailed review. Within the detailed review, this reviewer expressed some concerns that ribosome run-off (seen in Figure 1—figure supplement 1 [formerly Supplementary Figure 1]) might confound the comparison of ribosome densities in different regions of the viral genome (particularly ORF1ab). However, this run-off only noticeably affects the first ~100 nt of host CDSs, which is very small compared to the ~12,000 nt total length of ORF1ab. The regions of ORF1ab in which we compare ribosome density in our study are almost all > 1,000 nt downstream of this ~100 nt run-off region and will therefore not be significantly affected by run-off. The exception to this is our assessment of heteroclite sgRNA translation, where the “heteroclite” region does include the first ~100 nt of ORF1a. As such, run-off may have a slight effect on this analysis, but we expect this to be minor, as the ~100 nt run-off region represents only a small proportion of the 1,550-nt “heteroclite” region. Further, any such effect would actually lead to under-estimation of heteroclite sgRNA translation, by artefactually reducing the relative RPF density in the heteroclite region. This would therefore strengthen our conclusion that our data provide evidence for heteroclite sgRNA translation.
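
The proportions behind this argument can be checked with simple arithmetic. The sketch below uses only the approximate lengths quoted in the response (~100 nt of run-off, ~12,000 nt for ORF1ab, 1,550 nt for the "heteroclite" region); the function name is a hypothetical helper, not part of the authors' analysis.

```python
# Approximate lengths quoted in the response, in nucleotides.
RUNOFF_NT = 100          # region noticeably affected by ribosome run-off
ORF1AB_NT = 12_000       # approximate total length of ORF1ab
HETEROCLITE_NT = 1_550   # "heteroclite" region, which includes the start of ORF1a


def runoff_fraction(region_nt: int, runoff_nt: int = RUNOFF_NT) -> float:
    """Fraction of a region that overlaps the run-off window at its 5' end."""
    return min(runoff_nt, region_nt) / region_nt


orf1ab_fraction = runoff_fraction(ORF1AB_NT)
heteroclite_fraction = runoff_fraction(HETEROCLITE_NT)

print(f"ORF1ab: {orf1ab_fraction:.1%} of the region")
print(f"heteroclite region: {heteroclite_fraction:.1%} of the region")
```

With these round numbers the run-off window covers under 1% of ORF1ab and roughly 6.5% of the heteroclite region, in line with the description of the effect as minor.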

    1. Author Response

      Reviewer #2 (Public Review):

      Romand et al. investigate the role of hyperphosphorylated guanosine nucleotides (ppGpp) in the acclimation of plant chloroplasts to nitrogen limitation. The signaling role of ppGpp as an alarmone is well established in the stringent response of bacteria. The stringent response allows bacteria to adapt to amino acid or carbon starvation and other acute abiotic stress conditions by downregulating resource-consuming cell processes. A series of studies, including the current one, have demonstrated the retention of the bacterial-type ppGpp-mediated signaling response in plant and algal chloroplasts. The current study convincingly demonstrates the involvement of ppGpp in remodeling of the photosynthetic machinery under nitrogen limitation. Using three Arabidopsis RSH lines (two underaccumulators and one overaccumulator of ppGpp), the authors show that ppGpp is required for preventing excess ROS accumulation, oxidative stress and death of cotyledons under nitrogen-limiting conditions. The authors show a transient accumulation of ppGpp upon nitrogen limitation, which is followed by a sustained increase in the ratio of ppGpp to GTP. There is a prompt decline in the maximum photochemical efficiency of photosystem II (PSII) and in linear electron transport under nitrogen deficiency in wild-type and ppGpp overaccumulator plants. However, mutants with low amounts of ppGpp show a delayed decrease in these photosynthetic parameters. ppGpp is further shown to decrease (or degrade) photosynthetic proteins, and a remodeling of PSII that involves uncoupling of LHCII from the reaction center core is suggested to occur under nitrogen starvation. The authors also show a ppGpp-mediated downregulation of chloroplast gene transcription and a coordinated plastid-nuclear gene expression under nitrogen deficiency.

      Strengths:

      1. The conclusions of this paper are mostly well supported by data. With three different RSH lines, there is a convincing demonstration of the specific involvement of ppGpp in nutrient acclimation. The line carrying conditional overexpression of Drosophila ppGpp hydrolase (MESH) nicely complements the RSH lines and strengthens many of the conclusions. This is a detailed analysis of ppGpp function in a plant species. The data supplement accompanying each main figure is extensive and helpful.

      2. The genomic analysis in nitrogen replete and deplete wild type uncovers an interesting regulation of RSH enzymes at the transcriptional level. This is likely to be part of a signaling response that works in conjunction with allosteric modulation of RSH activity under nitrogen limitation.

      3. The large-scale analysis of plastid and nuclear gene transcripts supports the involvement of ppGpp in coordinated repression of plastid and nuclear gene transcription.

      4. By the inclusion of mitochondrial genes and proteins in their analysis, the authors clearly show that the ppGpp action is limited to plastids and does not extend to mitochondria, which like chloroplasts, have a bacterial ancestry.

      5. The thorough demonstration of the involvement of ppGpp in low nitrogen acclimation of photosynthetic metabolism adds greatly to the understanding of plant abiotic stress tolerance mechanisms and ppGpp function in both plants and bacteria.

      We thank the reviewer for these observations on our work.

      Weaknesses:

      1. With two earlier reports from a different laboratory (Maekawa et al 2015 and Honoki et al 2018) showing the involvement of ppGpp in acclimation to nitrogen deficiency, the novelty of the current study is diminished. The authors mention that the double mutant (rsh2 rsh3) used by Honoki et al does not show a clear phenotype other than a delay in Rubisco degradation. It is not clear to me why the lack of two major RSH isoforms, involved in synthesis of ppGpp under light, would not produce any phenotype. This discrepancy should be discussed further in the manuscript.

      The work of Maekawa et al., 2015 and Honoki et al., 2018 was indeed important for highlighting the potential involvement of ppGpp in the acclimation to nitrogen deficiency. However, these studies were based on the constitutive overaccumulation of ppGpp. Here, we demonstrate a physiological requirement for ppGpp signalling by the plant to allow acclimation to an abiotic stress. We consider this to be a major step forwards in understanding the role of ppGpp in plants, and one of the few examples of a physiological requirement for ppGpp in plants.

      We mention the use of an RSH2 RSH3 mutant by Honoki et al. 2018 while putting our results into the context of previous findings in the discussion. We draw the reviewer's attention to our analysis of an RSH2 RSH3 mutant in this study, in which the mutant phenotype was indistinguishable from that of the RSH quadruple mutant (rshQM) (Figure 2-figure supplement 1, panel B). Therefore, we do indeed consider that RSH2 and RSH3 are the main RSH isoforms involved in ppGpp-mediated acclimation to nitrogen deficiency, and we state this (see p7 l161-164 in original manuscript). As we explain in the discussion, there are probably technical reasons for the discrepancy with the results reported by Honoki et al. 2018. We also note here that the RSH2 RSH3 mutants used in our study and by Honoki et al. 2018 are not identical: the same SAIL insertion SAIL_305_B12 was used for rsh2, while the rsh3 allele used by Honoki et al. was the GABIkat insertion GABI129D02 and the one used here was SAIL_99_G05. We now add this difference in the genetic identity of the mutants as an additional potential explanation for the different findings in the two studies.

      2. The authors at times show a tendency to overinterpret their results. A ppGpp-mediated repression of chloroplast transcription and translation is sufficient to explain most of the observations in this study. However, the authors seem to go beyond this simple explanatory framework by invoking specific roles for ppGpp in remodeling of PSII antenna-core interaction and in blocking of PSII reaction center repair. There is no data in the manuscript in support of these two propositions. A coordinated decrease in synthesis of most chloroplast proteins, including the D1 reaction center protein of PSII, is sufficient to explain the decrease in Fv/Fm. There is no evidence in the manuscript for "photoinactivation gaining an upper hand via ppGpp-mediated signaling".

      The circuit breaker analogy of PSII photoinhibition that the authors discuss in support is just an interpretation. The remodeling of PSII antenna-core interaction, likewise, could be a simple consequence of the ppGpp-mediated decrease in D1 protein synthesis. The high antenna-core ratio under nitrogen starvation likely reflects the lag in the decrease of LHCB1 (which eventually decreases significantly by day 16).

      Since ppGpp-signaling primarily affects plastid transcription and translation, there is a rapid decrease in plastid psbA gene product (D1) relative to the nuclear-encoded LHCB1. The unconnected LHCII might simply be a result of the mismatch in antenna-core stoichiometry rather than an active regulation of PSII functional assembly by ppGpp.

      We have re-worked the discussion to make these points more clearly, and also to tone down certain points where we may have over-stretched our interpretation.

      We think that our interpretation is essentially the same as the reviewer’s: the ppGpp-mediated inhibition of chloroplast translation and transcription is sufficient to explain the majority of our results. In the discussion we also consider the possibility that ppGpp stimulates the active degradation of some chloroplast proteins, and put this in the context of studies showing that N-starvation activates the specific proteolysis of certain photosynthetic proteins in Chlamydomonas and affects the half-lives of different chloroplast proteins in plants. We do not propose or present data suggesting that ppGpp has any other specific targets/effectors (for example within the PSII repair cycle or in remodelling PSII stoichiometry), although we also cannot exclude the possibility of targets in these processes.

      We think that the ppGpp-dependent change in PSII stoichiometry during N-starvation is not just a side effect of a general downregulation or a temporary mismatch, as suggested; given its size, persistence and effect on photosynthesis, it is likely to be part of the acclimation process. For example, the ppGpp-dependent drop in Fv/Fm is maintained at day 16 and even beyond (Fig 2D). We also see that photosynthetic proteins are still degraded in low ppGpp mutants (Fig. 3A), but that the high Fv/Fm is maintained throughout. Given these points, the fact that the alteration of PSII stoichiometry is caused not by the direct action of ppGpp on PSII but via transcription/translation does not mean that it is unimportant or plays no role in acclimation. Other studies report that PSII RC inactivation can protect PSI (e.g. Tikkanen et al. 2014), and ppGpp may be working in a similar fashion here by reducing the flow of energy into the photosynthetic electron transport chain. This interpretation is consistent with our results showing that wild-type plants and high ppGpp plants (rsh1-1) accumulate less ROS and ROS-related damage than plants defective in ppGpp biosynthesis (Fig. 1).

      3. The work is mostly descriptive of the involvement of ppGpp in low nitrogen tolerance without any data on how the nitrogen deficiency is sensed by the RSH enzymes and how ppGpp orchestrates the multi-faceted acclimatory response. Perhaps these aspects are beyond the scope of the current manuscript, but they could be discussed more.

      We agree that these are very important questions, and also that they are out of the scope of the current work. We think that our work goes beyond the descriptive by demonstrating the physiological functions of ppGpp signalling during nitrogen deficiency and by providing a framework for how it occurs (i.e., downregulation of chloroplast function and avoidance of excess oxidative stress).

      Reviewer #3 (Public Review):

      The manuscript by Romand et al. explores the role of guanosine penta- and tetraphosphate, ppGpp, in the acclimation of plants to nitrogen limitation. It shows that an early and transient ppGpp accumulation - and a controlled ppGpp/GTP ratio - is necessary for a proper acclimation of plants to such stress. The pathway is shown to act on remodeling the photosynthetic machinery and downregulating photosynthesis during stress, thus limiting ROS damage to the plants. This regulation most likely takes place by affecting chloroplast transcription, maintaining the balance between nucleus- and chloroplast-encoded proteins.

      The manuscript proposes a thorough analysis of the ppGpp-induced response, including extensive wild type and mutant analyses at the gene and protein expression level as well as at the physiological level under nitrogen limitation, together with heterologous expression of ppGpp hydrolase from Drosophila. The conclusions are carefully backed by the data (but for the lack of gene expression analysis in the high ppGpp line, rsh1-1), and the figures and text are clear, well written and easy to follow. Altogether it represents a solid new step in improving the comprehension of the plant response to nitrogen limitation, as well as of the role of ppGpp in plants and possibly throughout the green lineage. An alternative hypothesis to a photoprotective role for ppGpp could be discussed, in that photoprotection may be an indirect effect due to photosynthetic protein degradation enabled by ppGpp, possibly through modulation of the ppGpp/GTP ratio affecting chloroplast protease activity.

      On this last point we agree with the reviewer: our data indicate that the photoprotective role of ppGpp is via the ppGpp-dependent control of the abundance of photosynthetic proteins. This is indirect in the sense that we have no evidence that ppGpp itself interacts with components of the photosynthetic machinery. However, as discussed below, we do not think that photoprotection is just a side effect of ppGpp’s action: we show that the capacity to synthesize ppGpp is required for avoiding the generation of ROS and tissue death.

    1. Author Response

      Reviewer #1 (Public Review):

      In this paper, the authors examine the role of feedback from primary visual cortex (V1) to the dorsolateral geniculate nucleus of the thalamus (dLGN) under a variety of visual stimulus conditions. This is a well-defined circuit originating from a specific population of Layer 6 cells in the cortex, and the authors test the role of this projection by recording in dLGN during silencing of V1 via ChR2 expression in PV inhibitory cells. This is a well-established technique for strong silencing of cortex. However, because there are other disynaptic pathways from V1 to thalamus, they also perform a similar set of experiments using more targeted optogenetic inhibition of a genetically-defined class of Layer 6 (NTSR1) cells that make up most of the L6 corticothalamic projections. The fact that these experiments elicit similar results supports their interpretation that these direct projections are largely responsible for the observed results. While previous studies have manipulated corticothalamic projections pharmacologically, via V1 lesions, or via optogenetics, the authors rightly point out that most previous studies have focused on simple parametric stimuli and/or have been performed in anesthetized animals. The results of this study suggest feedback during natural visual stimuli and locomotion reveal effects that are distinct from these previous studies.

      Overall, these are important and carefully-performed experiments that significantly advance our understanding of the role of corticothalamic feedback to the dLGN.

      We thank the reviewer for the appreciation of our methods and results.

      The authors' suggestion that the different effects observed during simple and complex stimuli may be due to increased surround suppression during the full-field gratings seems reasonable, but I didn’t understand how the analysis of blank periods during these two conditions supported this argument. It wasn’t clear to me what mechanisms would be expected to support the alternative outcome, where suppressing feedback during the blank periods interleaved with the two different stimuli would have different effects, unless they are testing whether natural movies elicit some longer-lasting state change that would change the results observed during blank periods. This seems somewhat implausible, and unless the authors wish to expand the study to include different stimulus sizes, I think the interpretation regarding surround suppression is best left to the discussion, where it is already treated well.

      We thank the reviewer for the recommendation. We fully agree that explaining the difference in CT feedback across blanks, gratings, and movies will require more experiments. We have followed the recommendation of the reviewer and removed the interpretation related to differences in surround suppression from the results section and treat it now in the discussion only.

      The paper would benefit from more clearly highlighting results that agree or disagree with previous studies, with a brief mention of how the authors interpret these similarities or differences. For example the results of Olsen et al 2012 seem to be consistent with what the authors observe here with gratings but not with natural movies, and although Olsen et al performed some awake recordings, I think the LGN recordings were all under anesthesia. Specifically highlighting these differences (and suggesting an interpretation for them) would help emphasize the novelty of the study.

      We thank the reviewer for the recommendation and now highlight throughout the results and discussion where our results agree or disagree with previous studies. As mentioned by the reviewer, we have similar results for gratings to the results obtained by Olsen et al. (2012), although in our study we have not explicitly centered the full-field gratings on the RFs and we have not measured surround suppression. The results for the blank stimuli and the movies, however, are different, at least in terms of how CT feedback affects firing rate. A key insight of our study, at least in our view, is that CT feedback effects might well differ for different stimuli, and understanding the underlying mechanism (e.g., differential engagement of the excitatory and indirect inhibitory CT feedback pathway) will be an important avenue of research in the future.

      The authors should comment more on the spatial extent of V1 silencing and potential effects of the variability observed across mice, especially given that they appear to have made only a single injection of ChR2 to label PV cells. While silencing with this method extends beyond the injection site, it probably doesn’t cover all of V1. Was any analysis done of variability across mice based on the size or location of the ChR2 expression measured post-hoc?

      Unfortunately, we did not preserve enough slices to precisely quantify the extent of expression across animals. However, visual inspection of the slices revealed that even a single injection typically resulted in a widespread pattern of expression. In fact, we think that the spatial extent of PV neuron activation was determined not so much by the virus expression but rather by the photoactivation light. With a distance of 0.5 ± 0.1 mm of the optical fibre from the cortical surface, most of V1 was covered by light. A previous study performing a quantitative characterization of the lateral spread of optogenetic suppression by PV activation demonstrates that pyramidal neuron firing can be suppressed 2–3 mm from the laser center (Li et al., 2019). Hence, we think that variability in opsin expression across mice is unlikely to have a substantial impact on our results.

      The decrease in reliability and sparseness during running is attributed partially to increased eye movements. In cortex this has been studied in awake animals with natural movies in a variety of studies where the opposite effects are observed including Froudarakis et al 2014 where there was a small increase in both metrics during running, and Reimer et al 2014 where reliability strongly increased during pupil dilation. If there is enough data to condition on running periods where eye movements are stable or dilation outside of running to measure the effects of feedback suppression during these periods, this would be useful information.

      We thank the reviewer for bringing up this interesting issue. We fully agree that our results recorded in dLGN are different from those measured by Froudarakis et al. (2014) and Reimer et al. (2014) in V1.

      As suggested by the reviewer, we have repeated the analysis proposed by Reimer et al. (2014) to identify periods in the movie with the most rapid pupil dilation / constriction in the face of continuous changes in overall luminance. Besides the effects of pupil dilation / constriction on firing rate, we have computed reliability both according to what we had used throughout our manuscript and in the way proposed in Reimer et al. (2014), which resembles our measure of SNR. We find that both measures of reliability are unaffected by pupil dilation.

      Interestingly, in the meantime other studies have also reported that reliability might be differently affected by behavioral state in V1 compared to dLGN. For instance, Nestvogel and McCormick (2022) found that, consistent with our results, variability of membrane potential in visual thalamic neurons was not significantly altered by locomotion or whisker movement.

      Reviewer #2 (Public Review):

      Spacek et al. study the corticothalamic feedback of different visual stimuli on visual thalamus. With optogenetic suppression of visual cortex feedback and simultaneous multi-channel recordings in visual thalamus, the authors succeeded in acquiring important data about this essential feedback loop in awake, behaving animals. The authors show in detail that the cortical feedback acts as a gain factor in thalamus for the transmission of signals from retina to cortex. They also show that naturalistic scenes result in robust feedback from cortex. As expected from anatomy, the authors find that modulatory feedback from cortex and modulatory input from brain stem act rather independently on thalamus. The paper is technically very impressive and the results are important for a wide range of readers.

      We thank the reviewer for the positive feedback.

      It is advisable to revise the Introduction and Discussion to better integrate the new findings into the existing literature.

      We thank the reviewer for this advice, and have revised the title, abstract, introduction and discussion to better integrate our new findings into the existing literature, and highlight our advances in relation to previous findings.

      The authors distinguish between awake, resting state and running state. However, the awake, resting state in mice comprises a wide range of alertness levels. This range of alertness will most likely affect the bursting probability of thalamocortical neurons.

      We thank the reviewer for this comment. So far, our manuscript had only taken locomotion as a proxy for behavioral state, as locomotion typically goes along with increased pupil size (Erisken et al., 2014; McGinley et al., 2015) and increased levels of arousal (McGinley et al., 2015; Vinck et al., 2015). To also study the effects of locomotion-independent arousal, we have now applied the analysis mentioned by the reviewer: following methods originally suggested by Reimer et al. (2014), we identified periods of the movie presentation without locomotion that corresponded to the upper or the lower quartile of pupil size change. Similar to the results that Reimer et al. (2014) found for primary visual cortex, we observed that firing rate in dLGN is enhanced during times when the pupil was dilating faster than usual vs. when it was constricting faster than usual. Like the effects of running, the modulations by pupil-indexed arousal persisted even with V1 suppression. We present these new results in Figure 5 - Supplement 2.
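
The selection step described above (stationary periods falling in the upper or lower quartile of pupil-size change) can be sketched as follows. This is a simplified illustration, not the authors' pipeline or the exact procedure of Reimer et al. (2014): the function name, the sample-to-sample difference as the measure of pupil change, and the synthetic traces are all assumptions made for the example.

```python
import math
import statistics


def split_by_pupil_change(pupil, running):
    """Return sample indices in the top and bottom quartile of pupil-size change.

    Only transitions where the animal is stationary at both samples are used
    (hypothetical helper; pupil change is taken as a simple first difference).
    """
    changes = [
        (i, pupil[i + 1] - pupil[i])
        for i in range(len(pupil) - 1)
        if not running[i] and not running[i + 1]
    ]
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles([c for _, c in changes], n=4)
    dilating = [i for i, c in changes if c >= q3]
    constricting = [i for i, c in changes if c <= q1]
    return dilating, constricting


# Synthetic example: a slowly oscillating pupil trace with one running bout.
pupil = [math.sin(2 * math.pi * i / 50) for i in range(200)]
running = [50 <= i < 100 for i in range(200)]
dilating, constricting = split_by_pupil_change(pupil, running)
```

Firing rates could then be compared between the `dilating` and `constricting` sample sets, separately for control and V1-suppression trials.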

    1. Author Response

      Reviewer #1 (Public Review):

      The goal of this Tools and Resources article was to present a new method for optogenetic stimulation and optical imaging at the same time in two different cortical layers in vivo, and through 3 sets of experiments, highlight the promise and wide applicability of this method.

      The method itself presents an elegant solution to several outstanding drawbacks among the many recent innovations in these lines of methodology, including high expense, lack of specificity and excessive brain tissue damage. The paper provides what I believe to be a fair account of the capabilities and limitations of existing methods and a clear description of how the new method builds on and overcomes these.

      The three sets of experiments work well because they demonstrate reliability and feasibility in replicating previous findings from older techniques such as the phenomenon of 'backpropagation-activated calcium spike firing' and net inhibitory influence of layer 2/3 cells on layer 5 cells, while also extending beyond those findings by verifying that some effects generalise to other areas than previous observations - the layer 2/3-5 interaction previously seen in primary somatosensory is here extended to motor cortex - and uncovering interesting phenomena that are relatively unexplored to date - the great variability in the degree of mirroring of activity in two layers receiving axonal input from the same thalamic area.

      The method presents exciting possibilities for the fine-grained study of cortical microcircuits and how they enable perception and cognition and relate to behaviour. The simplicity and low cost of the solution opens it up to a wider range of laboratories globally, and its low-profile imprint on the cortex ensures that it most likely reflects activity of normal, intact, rather than damaged, cortical tissue.

      We thank Reviewer #1 for recognizing “exciting possibilities” and advantages of our method.

      Reviewer #2 (Public Review):

      This manuscript reports a new approach to conduct neural activity imaging and manipulation in two different cortical layers. Two periscopes, each constructed from a micro-prism, a GRIN lens and a multi-mode fiber, can be inserted into the brain at different depths, and each can perform either imaging or optogenetics. The authors demonstrated a few applications: stimulation of L5 soma and superficial layer dendrites to evoke backpropagating action potentials; optogenetically stimulating cells in L2/3 and observing the response in L5 to investigate interaction between cells in two different layers; and simultaneously recording axon terminals from the posteromedial thalamic nucleus at two different depths in cortex. This work combines the ideas of fiber photometry to access deep layers and using a microprism to turn the optical field of view by 90 deg.

      Major strengths

      • Using microprism to perform layer specific imaging or optogenetics.

      • Low cost

      • Demonstrations of a few applications that require layer specific imaging and optogenetics.

      We thank Reviewer #2 for recognizing these strengths.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, the authors describe a generative model-based framework to better analyze stochastic growth data, including bacterial cell growth.

      This work is well-supported by simulations and data analysis and will likely be of interest to those trying to understand the processes governing bacterial growth, as well as those studying stochastic growth processes in biology more broadly.

      We thank Reviewer 1 for appreciating the methods used in the work and its scope.

      It would be good to have a more extensive discussion about what is specifically new here. This is not my particular field, so a bit more of an introduction to the methods beyond binning (if any) that have emerged to understand these data would be helpful.

      Binning and linear regression are the most commonly used methods for probing the correlations between variables obtained using single cell data analysis [Ho et al. (2018), Jun et al. (2018)]. In this paper, we try to address the pitfalls associated with applying these simple procedures. In the revised version, we put more emphasis on linear regression throughout the manuscript, for example in the “Introduction”:

      “While binning may provide a smooth non-linear relation between variables, linear regression is used to find a linear relationship between the variables. In addition to binning, we use the ordinary least squares regression where the slope and the intercept of the best linear fit line are obtained by minimizing the squared sum of the difference between the dependent variable raw data and the predicted value. Here, the best fit/the best linear fit is obtained using the raw data and not the binned data. Similar to binning, the assumption underlying linear regression is that our knowledge of x-axis variable is precise while the noise is in the y-axis variable.”
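
The two procedures contrasted in this passage can be sketched in a few lines. The example below is a minimal illustration on synthetic data, assuming noise only in the y-axis variable (the assumption the quoted text attributes to both methods); the function names and the data-generating line are hypothetical and are not taken from the manuscript.

```python
import random
import statistics


def ols_fit(x, y):
    """Ordinary least squares on the raw data: minimize squared residuals in y."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bin_means(x, y, n_bins=10):
    """Average y within equal-width bins of x; returns (bin_center, mean_y) pairs."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for xi, yi in zip(x, y):
        idx = min(int((xi - lo) / width), n_bins - 1)  # put x == hi in the last bin
        sums[idx] += yi
        counts[idx] += 1
    return [
        (lo + (i + 0.5) * width, sums[i] / counts[i])
        for i in range(n_bins)
        if counts[i] > 0
    ]


# Synthetic data: precise x, noisy y around the line y = 2x + 1 (assumed).
rng = random.Random(0)
x = [rng.uniform(0, 1) for _ in range(2000)]
y = [2.0 * xi + 1.0 + rng.gauss(0, 0.1) for xi in x]

slope, intercept = ols_fit(x, y)
binned = bin_means(x, y)
```

When the noise really is confined to the y-axis variable, as here, both the binned means and the fitted line track the underlying relation; the pitfalls discussed in the paper arise when that assumption does not hold.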

      Following this comment, we added the best linear fits to Figures 2C, 2D and 3A. We also try to point out clearly the sections that are not novel to the paper. For example, the adder model in Figure 1A has been discussed previously. We emphasize this in the revised text:

      “This previously discussed example demonstrates and reiterates the use of statistical analysis on single-cell data to understand the underlying cell regulation mechanisms.”

      Similarly, the novel results of the paper, such as obtaining the best linear fits for the plots of ln(L_d/L_b) vs ⟨λ⟩T_d and its flipped axes, are based on a class of models studied by Eun et al. (2018). We try to make this clearer on p. 7, line 133 of the revised manuscript:

      “For that purpose, we use a previously studied model [Eun et al. (2018)] which considers growth to be exponential with the growth rate distributed normally and independently between cell cycles with mean growth rate ⟨λ⟩ and standard deviation CV_λ⟨λ⟩.”
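
To illustrate why the choice of axes matters for such a model, the sketch below simulates exponential growth with a normally distributed growth rate and fits a line both ways. It is a toy example under stated assumptions, not the model of Eun et al. (2018) or the authors' simulation: in particular, ln(L_d/L_b) is drawn independently of the growth rate around ln 2, and all parameter values are arbitrary.

```python
import math
import random
import statistics


def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y on x; returns (slope, intercept)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rng = random.Random(1)
MEAN_RATE = 1.0        # <lambda>, arbitrary units (assumed value)
CV_RATE = 0.1          # CV_lambda (assumed value)
SIGMA_LOG_RATIO = 0.1  # spread of ln(L_d/L_b) around ln 2 (assumed value)

log_ratio, scaled_time = [], []
for _ in range(20_000):
    growth_rate = rng.gauss(MEAN_RATE, CV_RATE * MEAN_RATE)
    f = rng.gauss(math.log(2), SIGMA_LOG_RATIO)  # ln(L_d/L_b) for this cycle
    generation_time = f / growth_rate            # from L_d = L_b * exp(rate * T_d)
    log_ratio.append(f)
    scaled_time.append(MEAN_RATE * generation_time)

# <lambda>T_d regressed on ln(L_d/L_b), and the flipped regression.
slope_time_on_ratio, _ = ols_slope_intercept(log_ratio, scaled_time)
slope_ratio_on_time, _ = ols_slope_intercept(scaled_time, log_ratio)
```

Under these assumptions the fit with ln(L_d/L_b) on the x-axis is expected to give a slope close to 1, whereas the flipped fit gives a clearly shallower slope even though growth is exactly exponential, because growth-rate noise then enters the x-axis variable.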

      Reviewer #2 (Public Review):

      The final result (Fig 4) is somewhat disconnected from the majority of the paper that precedes it. Specifically, the authors’ procedure that resolves exponential vs. nonexponential growth results in E. coli in alanine being deemed exponential (Fig 2B) only to later be revealed as non-exponential (Fig 4A), albeit weakly. Furthermore, the procedure advertised as distinguishing exponential from linear growth (Fig 3B), when applied to the data, reveals neither (Fig 4). This makes the main point of the paper (the demonstration and resolution of pitfalls) feel disconnected from its application to a particular case, which is more nuanced and likely leaves many questions unanswered.

      We attempt to resolve the gap between Figure 4 and the text preceding it by showing that the growth rate vs age plot can be used to infer the mode of growth, including, but not limited to, exponential and linear growth. To verify this, we simulate the adder model for cells undergoing super-exponential growth. The binned growth rate trend as a function of age for super-exponential growth is shown in Figure 3B and the following text is added to the revised manuscript,

      “Thus, the two growth modes (exponential and linear) could be differentiated using the growth rate vs age plot (for details see Section 5.7). However, the growth rate vs age plots can be used to infer the mode of growth beyond the two discussed above. We show this by using simulations of cells following the adder model and undergoing faster than exponential or super-exponential growth (see Section 5.11.2 for details). In such a case, the growth rate is expected to increase. This increase in growth rate is shown in Figure 3B using simulations. The binned data trend (red triangles) again matches the growth rate mode used in the simulations (red dotted line). Thus, the growth rate vs age plots are a consistent method to distinguish linear from exponential and super-exponential growths.”

      The details about the simulation of super-exponential growth have been added to Section 5.11.2. We have also added theoretical predictions (dotted lines of the same color) to the growth rate vs age curves of the different models shown in Figure 3B of the revised manuscript; these agree well with our simulations.
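
      The qualitative argument above (constant growth rate for exponential growth, a decreasing rate for linear growth, an increasing rate for super-exponential growth) can be checked with a short noise-free sketch. It assumes the growth rate is defined as (1/L) dL/dt and that age is time since birth divided by the generation time. The super-exponential form used here, with a quadratic term in ln L, is a hypothetical choice and not necessarily the one simulated in Section 5.11.2.

```python
import math

L_B = 1.0                 # length at birth (arbitrary units, assumed)
T_D = 1.0                 # generation time (arbitrary units, assumed)
LAM = math.log(2) / T_D   # rate that doubles the length in one generation

def exponential(t):
    return L_B * math.exp(LAM * t)

def linear(t):
    return L_B * (1.0 + t / T_D)  # also doubles over one generation

def super_exponential(t, c=0.2):
    # Hypothetical form: ln L gains a quadratic term, so the rate rises with age.
    return L_B * math.exp(LAM * t + c * t * t)

def growth_rate_vs_age(length_fn, t_d, n_points=5, dt=1e-6):
    """Return [(age, (1/L) dL/dt)] at evenly spaced times in the cell cycle,
    where age = t / t_d, using a central finite difference."""
    out = []
    for i in range(n_points):
        t = dt + (t_d - 2 * dt) * i / (n_points - 1)
        dldt = (length_fn(t + dt) - length_fn(t - dt)) / (2 * dt)
        out.append((t / t_d, dldt / length_fn(t)))
    return out

def rates(length_fn):
    """Growth rates only, ordered by increasing age."""
    return [r for _, r in growth_rate_vs_age(length_fn, T_D)]

for name, fn in [("exponential", exponential),
                 ("linear", linear),
                 ("super-exponential", super_exponential)]:
    print(name, [round(r, 3) for r in rates(fn)])
```

      The sketch contains no noise, binning, or adder-model division rule, so it only reproduces the expected trend lines, not the simulated data points.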

      The title (“To bin or not to bin...”) implies that binning is the main culprit behind potentially misleading analysis, but I would argue that in the end, it is linear regression. Each of the two main pitfalls and their resolution would be unchanged if the data were never binned, I believe. Binning affects the apparent curvature of the y vs x relationship, but this reads as a more minor point. Therefore, the title may be a bit misleading in service of its poeticism.

      We thank the reviewer for the comment and have changed the title to one that better fits the manuscript: “Distinguishing different modes of growth using single-cell data”.

      Reviewer #3 (Public Review):

      Kar et al. examine an interesting and important question of how to make sense of large sets of observational data, specifically cell length data, which may or may not be consistent with various underlying biological mechanisms. As datasets improve in their technical quality (increasing spatiotemporal resolution, increasing numbers of observations), there is hope that the community will be able to resolve differences between underlying cell biological mechanisms of cell size homeostasis. As the authors point out, these interpretations and analyses require statistical analysis that can accurately perform the model selection or parameter estimation task of interest.

      We thank Reviewer 3 for the comments. Indeed, our message in the paper is to use underlying biological models to aid the inference of biological mechanisms using various statistical analysis methods.

      1. The authors succeed in bringing attention to the issue of appropriate binning when analyzing large datasets. The authors focus their figures and discussion on an important and practical issue, as many researchers perform linear regression on binned data. The title and framing of the manuscript imply that it will provide a comparison with statistical methods that do not involve binning. The authors look at different choices of binning dimensions, but do not sufficiently explore the power of their generative model to perform (un)weighted regression or parameter estimation from the not-binned data. They do explore the unbinned data from an analytical statistics approach in sections 5.4.1 and 5.5, but this is not yet extensively explored in the figures and/or discussion.

      We apologize for the confusion. Throughout the paper, we perform linear regression on the raw data and not the binned data. We have now added a clarification statement in the revised text,

      "Here, the best fit/the best linear fit is obtained using the raw data and not the binned data."

      Further, we have changed the title of the manuscript to “Distinguishing different modes of growth using single-cell data”. The paper aims to bring forth the issues in both binning and linear regression, which arise from similar sources, i.e., the intrinsic noise affecting the x-axis variable and the inspection bias. We discuss these issues in relation to the mode of growth in single cells, as the new title now states.

      In addition to section 5.4.1 and 5.5 where we discuss calculating the best fit line for exponential and linear growth, we also try to explicitly mention linear regression on non-binned data in the figures, and the Results and Discussion section. This is shown by the addition of the best linear fit/best fit in Figures 2C, 2D and 3A.

    1. Author Response

      Reviewer #1 (Public Review):

      Maji et al demonstrate co-storage of prolactin (PRL) and galanin (GAL) as functional amyloids in secretory granules of the female rat. In a series of detailed experiments, they show that both hormones promote their aggregation to amyloid. They show that PRL and GAL co-localize in the pituitary and that there is co-fibril formation, forming a new type of hybrid fibril. They further demonstrate that there is a unidirectional cross-seeding of GAL aggregation by PRL seeds, while cross-seeding by mixed fibrils does not occur. Molecular dynamics studies show that co-aggregation of PRL and GAL induces the formation of a β-sheet at the protein surface. Overall, more efficient storage of the hormones in secretory granules is demonstrated, as well as faster release, as compared to the homotypic counterparts. Strengths include the rigorous techniques that were used, including biophysical techniques with transmission electron microscopy and the use of molecular dynamics to delineate the mechanism of PRL and GAL interactions at the atomic level. An additional strength is the novel observation of the unidirectional, heterotypic templating competency of PRL fibril seeds for GAL monomers, inducing GAL fibril formation.

      We appreciate your comments and thank you for providing deep insight into our manuscript and stating the importance of our findings.

      Reviewer #2 (Public Review):

      Research on peptide hormones released from the Pituitary, including Prolactin, has shown that the hormones are stored as functional amyloids. Furthermore, it is well established that Prolactin and Galanin are co-stored in secretory granules of the anterior pituitary until they are released into the bloodstream. However, the mechanism by which hormones are stored and released remains a mystery. This study describes the co-aggregation and functional heterotypic amyloid formation of Prolactin and neuropeptide Galanin in secretory granules. This study suggests that the Prolactin and Galanin interact with each other at high specificity and form functional amyloids. These functional amyloids are heterotypic. Moreover, they demonstrated that Prolactin-Galanin amyloids can form surface-induced secondary fibrils on the surfaces of others. Galanin forms secondary fibrils on Prolactin seeds and Prolactin does not form secondary fibrils on Galanin seeds, indicating that this process occurs in a highly regulated manner. Additionally, they analyzed the release of hormone monomers from amyloids in vitro. They found that Prolactin-Galanin functional amyloids are released faster than amyloids formed by Prolactin or Galanin homotypic fibrils. To understand the interactions between Prolactin and Galanin at the atomic level, they have also performed molecular dynamics simulations and docking studies.

      A high point of the study is the identification of Prolactin's capability to cross-seed Galanin, which leads to the formation of amyloid fibrils. In contrast, Galanin failed to cross-seed Prolactin. This emphasizes the specificity and regulation of functional amyloid formation. Additionally, the understanding of the interactions between Prolactin and Galanin at the atomic level from MD simulations strengthens the findings. The results of this study did not confirm the possibility of heteromeric fibril formation.

      In this study, the authors succeeded in achieving their goals and their conclusions were backed up by their results.

      Undoubtedly, this work will have a significant impact on the field of endocrinology and protein aggregation. By studying secretory granules of the pituitary gland, the researchers have moved one step closer to understanding peptide hormone synthesis and release.

      We appreciate your thorough analysis of our manuscript and your comments on the relevance of our findings.

      Reviewer #3 (Public Review):

      Maji and coworkers present a tour de force study of the coaggregation of two hormones, prolactin and galanin. Protein aggregation in vivo is much more of a "messy" affair than in the tidy lab of an Eppendorf tube and the authors demonstrate an intimate collaboration between these two hormones in the aggregation process. Their work ranges from IHC of tissue slices over experimental biophysics to computational studies, presented in a clear, user-friendly and illustrative manner and overall, the conclusions are sound. I found the cartoon diagrams in various figures particularly helpful and of high quality.

      Whether their conclusions can be extended to higher-order complexes between multiple hormones (even closer to real life) is the next question to address - but the mind boggles at the number of possibilities to explore.

      We thank you for your detailed analysis of the studies in our manuscript and for highlighting the importance of our work.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, the authors find CpGs within 500Kb of a gene that associate with transcript abundance (cis-eQTMs) in children from the HELIX study. There is much to admire about this work. With two notable exceptions, their work is solid and builds/improves on the work that came before it. Their catalogue of eQTMs could be useful to many other researchers that utilize methylation data from whole blood samples in children. Their annotation of eQTMs is well thought out and exhaustive. As this portion of the work is descriptive, most of their methods are appropriate.

      Unfortunately, their use of results from a model that does not account for cell-type proportions across samples diminishes the utility and impact of their findings. I believe that their catalog of eQTMs contains a great deal of spurious results that primarily represent the differences in cell-type proportions across samples.

      Lastly, the authors postulate that the eQTM gene associations found uniquely in their unadjusted model (in comparison to results from a model that does account for cell type proportion) represent cell-specific associations that are lost when a fully-adjusted model is assumed. To test this hypothesis, the authors appear to repurpose methods that were not intended for the purposes used in this manuscript. The manuscript lacks adequate statistical validation to support their repurposing of the method, as well as the methodological detail needed to peer review it. This section is a distraction from an otherwise worthy manuscript. But provide evidences that enriched for cell sp CpGs.

      Major points

      1. Line 414-475: In this section, the authors are suggesting that CpGs that are significant without adjusting for cell type are due to methylation-expression associations that are found only in one cell type, while associations found in the fully adjusted model are associations that are shared across the cell types. I do not agree with this hypothesis, as I do not agree that the confounding that occurs when cell-type proportions are not accounted for would behave in this way. Although restricting their search for eQTMs to only those CpGs proximal to a gene will reduce the number of spurious associations, a great deal of the findings in the authors' unadjusted model likely reflect differences in cell-type proportions across samples alone. The Reinius manuscript, cited in this paper, indicates that gene-proximal CpGs can have methylation patterns that vary across cell types.

      Following reviewers’ recommendations, we have reconsidered our initial hypothesis about the role of cellular composition in the association between methylation and gene expression. Although we still think that some of the eQTMs only found in the model unadjusted for cellular composition could represent cell specific effects, we acknowledge that the majority might be confounded by the extensive gene expression and DNA methylation differences between cell types. Also, we recognize that more sophisticated statistical tests should be applied to prove our hypothesis. Because of this, we have decided to report the eQTMs of the model adjusted for cellular composition in the main manuscript and keep the results of the model unadjusted for cellular composition only in the online catalogue.

      2. Line 476-488: Their evidence due to F-statistics is tenuous. The authors do not give enough methodological detail to explain how they're assessing their hypothesis in the results or methods (lines 932-946) sections. The methods they give are difficult to follow. The results in figure S19A are not compelling. The citation in the methods (by Reinius) does not make sense, because Reinius et al did not use F-statistics as a proxy for cell type specificity. The citation that the authors give for this method in the results does not appear to be appropriate for this analysis, either. Jaffe and Irizarry state that a CpG with a high F-statistic indicates that the methylation at that CpG varies across cell type. They suggest removing these CpGs from significant results, or estimating and correcting for cell type proportions, as their presence would be evidence of statistical confounding. The authors of this manuscript indicate that they find higher F-statistics among the eQTMs uniquely found in the unadjusted model, which seems to only strengthen the idea that the unadjusted model is suffering from statistical confounding.

      We recognize our misinterpretation of the F-statistic in relation to cellular composition. We have removed this part from the updated version of the manuscript.

      3. The methods used to generate adjusted p-values in this manuscript are not appropriate as they are written. Further, they are nothing like the methods used in the paper cited by the authors. The Bonder paper used permutations to estimate an empirical FDR and cites a publication by Westra et al for their method (below). The Westra paper is a better one to cite, because the methods are clearer. Neither the Bonder nor the Westra paper uses the BH procedure for FDR.

      Westra, H.-J. et al. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat. Genet. 45, 1238-1243 (2013).

      We apologize for this misleading citation. Although Bonder et al applied a permutation approach to adjust for multiple testing, our approach was inspired by the method applied in the GTEx project (GTEx consortium, 2020), using CpGs instead of SNPs. The citation has been corrected in the manuscript. Moreover, we have explained the whole multiple-testing procedure in more detail in the Material and Methods section (page 14, line 316):

      “To ensure that CpGs paired to a higher number of Genes do not have higher chances of being part of an eQTM, multiple-testing was controlled at the CpG level, following a procedure previously applied in the Genotype-Tissue Expression (GTEx) project (Gamazon et al., 2018). Briefly, our statistic used to test the hypothesis that a CpG-Gene pair is significantly associated is based on considering the lowest p-value observed for a given CpG and all its paired Genes (e.g. those in the 1 Mb window centered at the TSS). As we do not know the distribution of this statistic under the null, we used a permutation test. We generated 100 permuted gene expression datasets and ran our previous linear regression models obtaining 100 permuted p-values for each CpG-Gene pair. Then, for each CpG, we selected among all CpG-Gene pairs the minimum p-value in each permutation and fitted a beta distribution that is the distribution we obtain when dealing with extreme values (e.g. minimum) (Dudbridge and Gusnanto, 2008). Next, for each CpG, we took the minimum p-value observed in the real data and used the beta distribution to compute the probability of observing a lower p-value. We defined this probability as the empirical p-value of the CpG. Then, we considered as significant those CpGs whose empirical p-values were significant at a 5% false discovery rate using the Benjamini-Hochberg method. Finally, we applied a last step to identify all significant CpG-Gene pairs for all eCpGs. To do so, we defined a genome-wide empirical p-value threshold as the empirical p-value of the eCpG closest to the 5% false discovery rate threshold. We used this empirical p-value to calculate a nominal p-value threshold for each eCpG, based on the beta distribution obtained from the minimum permuted p-values. This nominal p-value threshold was defined as the inverse cumulative distribution of the beta distribution evaluated at the empirical p-value threshold. Then, for each eCpG, we considered as significant all eCpG-Gene pairs with a p-value smaller than the nominal p-value threshold.”
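
      The steps of this procedure can be sketched in a few lines. This is a simplified illustration and not the authors' pipeline or the FastQTL implementation. It assumes the first shape parameter of the beta distribution is fixed at 1 (the distribution of the minimum of k independent uniform p-values), so that the fit, the cumulative distribution and its inverse all have closed forms; the procedure described above does not state this restriction. All function names and toy numbers are hypothetical.

```python
import math
import random

def fit_beta1k(min_pvalues):
    """Maximum-likelihood estimate of k for Beta(1, k), given the minimum
    p-value obtained for one CpG in each permutation."""
    n = len(min_pvalues)
    return -n / sum(math.log1p(-p) for p in min_pvalues)

def empirical_pvalue(p_min_observed, k):
    """Probability, under Beta(1, k), of a minimum p-value at or below
    the one observed in the real data."""
    return 1.0 - (1.0 - p_min_observed) ** k

def nominal_threshold(p_emp_threshold, k):
    """Inverse CDF of Beta(1, k): the nominal p-value whose empirical
    p-value equals the genome-wide empirical threshold."""
    return 1.0 - (1.0 - p_emp_threshold) ** (1.0 / k)

def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; returns one boolean per
    input p-value (True = significant at FDR alpha)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            max_rank = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= max_rank
    return rejected

# Toy null example: one CpG paired with 20 Genes, 100 permutations.
rng = random.Random(1)
n_genes = 20
perm_min = [min(rng.random() for _ in range(n_genes)) for _ in range(100)]
k_hat = fit_beta1k(perm_min)            # effective number of independent tests
p_emp = empirical_pvalue(1e-4, k_hat)   # empirical p-value for an observed minimum of 1e-4
print(k_hat, p_emp)
```

      In a full analysis, `benjamini_hochberg` would be applied to the empirical p-values of all CpGs, and `nominal_threshold` would then convert the resulting genome-wide empirical threshold back into a per-CpG nominal threshold.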

      References:

      GTEx consortium, The GTEx Consortium atlas of genetic regulatory effects across human tissues, Science (2020) Sep 11;369(6509):1318-1330. doi: 10.1126/science.aaz1776.

      Reviewer #2 (Public Review):

      Strength:

      Comprehensive analysis. Considering genetic factors such as meQTL and comparing results with adult data are interesting.

      We thank the reviewer for his/her positive feedback on the manuscript. We agree that the analysis of genetic data and the comparison with eQTMs described in adults are two important points of the study.

      Weakness:

      • Manuscript is not summarized well. Please send less important findings to supplementary materials. The manuscript is not well written, which includes every little detail in the text, resulting in 86 pages of the manuscript.

      Following reviewers’ comments, we have simplified the manuscript. Now only the eQTMs identified in the model adjusted for cellular composition are reported. In addition, functional enrichment analyses have been simplified without reporting all odds ratios (OR) and p-values, which can be seen in the Figures.

      • Any possible reason that the eQTM methylation probes are enriched in weak transcription regions? This is surprising.

      Bonder et al also found that blood eQTMs were slightly enriched for weak transcription regions (TxWk). Weak transcription regions are highly constitutive and found across many different cell types (Roadmap Epigenomics Consortium, 2015). However, hematopoietic stem cells and immune cells have lower representation of TxWk and other active states, which may be related to their capacity to generate sub-lineages and enter quiescence.

      Given that we analyzed whole blood and that ROADMAP chromatin states are only available for blood-specific cell types, each CpG in the array was annotated to one or several chromatin states by taking a state as present in that locus if it was described in at least 1 of the 27 blood-related cell types. By applying this strategy we may be “over-representing” TxWk chromatin states if TxWk states are cell-type specific. As a result, even if each blood cell type might have few TxWk, many positions can be TxWk in at least one cell type, inflating the number of CpGs considered as TxWk. This might have affected some of the enrichments.
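
      The inflation described here can be made concrete with a toy sketch of the "present in at least one cell type" rule. The CpG identifiers, the three cell types and their state assignments are invented for illustration, and the helper names are hypothetical; only the state labels follow the chromatin-state naming used above.

```python
def union_annotation(states_by_cell_type):
    """Map each CpG to the set of chromatin states observed in at least
    one cell type. Input: {cell_type: {cpg_id: state}}."""
    annotation = {}
    for per_cpg in states_by_cell_type.values():
        for cpg, state in per_cpg.items():
            annotation.setdefault(cpg, set()).add(state)
    return annotation

def fraction_with_state(annotation, state):
    """Fraction of CpGs whose annotation (a set or a single label) includes state."""
    hits = sum(1 for s in annotation.values()
               if state == s or (isinstance(s, set) and state in s))
    return hits / len(annotation)

# Made-up example: each cell type has TxWk at a different single CpG.
toy = {
    "cell_type_A": {"cpg1": "TxWk", "cpg2": "Quies", "cpg3": "Quies"},
    "cell_type_B": {"cpg1": "Quies", "cpg2": "TxWk", "cpg3": "Quies"},
    "cell_type_C": {"cpg1": "Quies", "cpg2": "Quies", "cpg3": "TxWk"},
}

per_cell_type = {ct: fraction_with_state(m, "TxWk") for ct, m in toy.items()}
pooled = fraction_with_state(union_annotation(toy), "TxWk")
print(per_cell_type, pooled)
```

      In this toy case one third of the CpGs are TxWk within any single cell type, yet every CpG carries the TxWk label after pooling, which is the kind of over-representation the response describes.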

      On the other hand, CpG probe reliability depends on methylation levels and variance. TxWk regions show high methylation levels, which tend to be measured with more error. This also might have impacted the results; however, the analysis considering only reliable probes (ICC > 0.4) showed similar enrichment for TxWk.

      Besides these, we do not have a clear answer for the question raised by the reviewer.

      References:

      Bonder MJ, Luijk R, Zhernakova DV, Moed M, Deelen P, Vermaat M, et al. Disease variants alter transcription factor levels and methylation of their binding sites. Nat Genet. 2017;49:131–8. Available from: http://www.ncbi.nlm.nih.gov/pubmed/27918535

      Roadmap Epigenomics Consortium, Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, Kheradpour P, Zhang Z, Wang J, Ziller MJ, Amin V, Whitaker JW, Schultz MD, Ward LD, Sarkar A, Quon G, Sandstrom RS, Eaton ML, Wu YC, Pfenning AR, Wang X, Claussnitzer M, Liu Y, Coarfa C, Harris RA, Shoresh N, Epstein CB, Gjoneska E, Leung D, Xie W, Hawkins RD, Lister R, Hong C, Gascard P, Mungall AJ, Moore R, Chuah E, Tam A, Canfield TK, Hansen RS, Kaul R, Sabo PJ, Bansal MS, Carles A, Dixon JR, Farh KH, Feizi S, Karlic R, Kim AR, Kulkarni A, Li D, Lowdon R, Elliott G, Mercer TR, Neph SJ, Onuchic V, Polak P, Rajagopal N, Ray P, Sallari RC, Siebenthall KT, Sinnott-Armstrong NA, Stevens M, Thurman RE, Wu J, Zhang B, Zhou X, Beaudet AE, Boyer LA, De Jager PL, Farnham PJ, Fisher SJ, Haussler D, Jones SJ, Li W, Marra MA, McManus MT, Sunyaev S, Thomson JA, Tlsty TD, Tsai LH, Wang W, Waterland RA, Zhang MQ, Chadwick LH, Bernstein BE, Costello JF, Ecker JR, Hirst M, Meissner A, Milosavljevic A, Ren B, Stamatoyannopoulos JA, Wang T, Kellis M. Integrative analysis of 111 reference human epigenomes. Nature. 2015 Feb 19;518(7539):317-30. doi: 10.1038/nature14248. PMID: 25693563; PMCID: PMC4530010.

      • The result that the magnitude of the effect was independent of the distance between the CpG and the TC TSS is surprising. Could you draw a figure where x-axis is the distance between the CpG site and TC TSS and y-axis is p-value?

      As suggested by the reviewer, we have taken a more detailed look at the relationship between the effect size and the distance between the CpG and the TC’s TSS. First, we confirmed that the relative orientation (upstream or downstream) did not affect the strength of the association (p-value=0.68). Second, we applied a linear regression between the absolute log2 fold change and the log10 of the distance (in absolute value), finding that they were inversely related. We have updated the manuscript with this information (page 22, line 504):

      “We observed an inverse linear association between the eCpG-eGene’s TSS distance and the effect size (p-value = 7.75e-9, Figure 2B); while we did not observe significant differences in effect size due to the relative orientation of the eCpG (upstream or downstream) with respect to the eGene’s TSS (p-value = 0.68).”

      Results are shown in Figure 2B. Of note, we winsorized effect size values in order to improve the visualization. The winsorizing process is also explained in Figure 2 legend. Moreover, we have done the plot suggested by the reviewer (see below). It shows that associations with smallest p-values are found close to the TC’s TSS. Nonetheless, as this pattern is also observed for the effect sizes, we have decided to not include it in the manuscript.
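
      For readers unfamiliar with winsorizing, the idea is to clamp extreme values to chosen quantiles rather than discard them. The sketch below is a generic illustration: the quantile limits used by the authors are given in the Figure 2 legend and are not reproduced here, so the 1% and 99% defaults and the function name are assumptions.

```python
def winsorize(values, lower_q=0.01, upper_q=0.99):
    """Clamp values below the lower quantile or above the upper quantile
    to those quantiles (nearest-rank quantiles on the sorted data)."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[round(lower_q * (n - 1))]
    hi = ordered[round(upper_q * (n - 1))]
    return [min(max(v, lo), hi) for v in values]

# Toy effect sizes with two extreme values (made-up numbers).
effects = [-9.0, -0.4, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.5, 0.6, 12.0]
print(winsorize(effects, lower_q=0.1, upper_q=0.9))
```

      Winsorizing changes only the plotted values; a regression such as the one reported above would normally be run on the unmodified effect sizes unless stated otherwise.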

      • Concerned about too many significant eQTMs. Almost half of genes are associated with methylation. I wonder if false positives are well controlled using the empirical p-values. Using empirical p-values with permutation may mislead, especially since you only use 100 permutations. I wonder whether the result would be similar if they compare their result with the traditional way, either adjusting p-values using p-values from entire TCs or adjusting p-values using a gene-based method as commonly used in GWAS. Compare your previous result with my suggestion for the first analysis.

      Although the number of genes (TCs) whose expression is associated with DNA methylation is quite high, we do not think this is due to inadequate control of false positives. Our approach is based on the method used by GTEx (GTEx consortium, 2020) and implemented in the FastQTL package (Ongen et al. 2016) to control for false positives in eQTL discovery. As in GTEx, we ran 100 permutations to estimate the parameters of a beta distribution, which we used to model the distribution of p-values for each CpG. Then, to correct for the number of TCs among significant CpGs, we applied False Discovery Rate (FDR) at a threshold < 0.05. Finally, we defined the final set of significant eQTMs using the beta distribution defined in a previous step.

      For illustration, we compared the number of eQTMs obtained with our approach to what we would obtain by applying only the FDR method (adjusted p-value < 0.05), getting fewer associations with our approach: eQTMs (45,203 with FDR vs 39,749 with our approach), eCpGs (24,611 vs 21,966) and eGenes (9,937 vs 8,886). Among the 8,886 significant eGenes, 6,288 are annotated to coding genes, thus representing 27% of the 23,054 TCs annotated to coding genes included in the array.

      References:

      GTEx consortium, The GTEx Consortium atlas of genetic regulatory effects across human tissues, Science (2020) Sep 11;369(6509):1318-1330. doi: 10.1126/science.aaz1776.

      Ongen et al. Fast and efficient QTL mapper for thousands of molecular phenotypes, Bioinformatics (2016) May 15;32(10):1479-85. doi: 10.1093/bioinformatics/btv722. Epub 2015 Dec 26.

      • I recommend starting with cell type specific results. Without adjusting cell type, the result doesn't make sense.

      As suggested by other reviewers, we have withdrawn the model unadjusted for cellular composition.

      Reviewer #3 (Public Review):

      Although several DNA methylation-gene expression studies have been carried out in adults, this is the first in children. The importance of this is underlined by the finding that surprisingly few associations are observed in both adults and children. This is a timely study and certain to be important for the interpretation of future omic studies in blood samples obtained from children.

      We agree with the reviewer that eQTMs in children are important for interpreting EWAS findings conducted in child cohorts such as those of the Pregnancy And Childhood Epigenetics (PACE) consortium.

      It is unfortunate that the authors chose to base their reporting on associations unadjusted for cell count heterogeneity. They incorrectly claim that associations linked to cell count variation are likely to be cell-type-specific. While possible, it is probably more likely that the association exists entirely due to cell type differences (which tend to be large) with little or no association within any of the cell types (which tend to be much smaller). In the interests of interpretability, it would be better to report only associations obtained after adjusting for cell count variation.

      Following reviewers’ recommendations, we have reconsidered our initial hypothesis about the role of cellular composition in the association between methylation and gene expression. Although we still think that some of the eQTMs only found in the model unadjusted for cellular composition could represent cell specific effects, we acknowledge that the majority might be confounded by the extensive gene expression and DNA methylation differences between cell types. Also, we recognize that more sophisticated statistical tests should be applied to prove our hypothesis. Because of this we have decided to report the eQTMs of the model adjusted for cellular composition in the main manuscript and keep the results of the model unadjusted for cellular composition only in the online catalogue.

      Several enrichments could be related to variation in probe quality across the DNA methylation arrays.

      For example, enrichment for eQTM CpG sites among those that change with age could simply be due to the fact age and eQTM effects are more likely to be observed for CpG sites with high quality probes than low quality probes. It is more informative to instead ask if eQTM CpG sites are more likely to have increasing rather than decreasing methylation with age. This avoids the probe quality bias since probes with positive associations with age would be expected to have roughly the same quality as those with negative associations with age. There are several other analyses prone to the probe quality bias.

      See answer to question 2, below.

    1. Author Response

      Reviewer #1 (Public Review):

      Peter Dietrich and his collaborators performed a complex experimental study aiming at exploring an interactive effect of selection history (offspring of plants grown in low- and high-diversity plots), soil origin (soil from low- and high-diversity plots) and experimental treatments (drought or nitrogen addition) on performance of four grass species. The authors did so to examine eco-evolutionary feedbacks between plant community diversity and global change drivers. Specifically, the authors hypothesize that decline in species richness due to the drivers can induce a selection regime that will select for traits that will make species more vulnerable to the further effects of global change drivers, amplifying thus the initial diversity decline. The authors indeed found that all three factors, and their interaction can affect plant performance, though the effects detected here were often species-specific.

      We thank the reviewer for the positive evaluation of our study. In the revised version, we express more clearly the fact that the plant responses to global change were species-specific. We agree that this is a highly relevant finding that needs to be stressed.

      Reviewer #2 (Public Review):

      The authors present work from a greenhouse experiment testing the influence of plant and soil histories on seedling responses to global change. They grew seedlings of 4 grass species via seeds collected from different historical levels of plant community diversity (2 vs 6 species) as well as in home and away soil inoculum and a combination of these. The authors find that certain plant species respond differently to global change depending on the historical plant diversity (6 vs 2 species) and to a lesser extent the soil history. These effects were primarily species specific and affected plant traits rather than biomass.

      Strengths of this study include the thorough experimental approach and novel question regarding how plant diversity may modulate plant-soil interactions under global change. Weaknesses of this study include weak or unclear support for several of the proposed hypotheses as well as lack of clear results to support the main conclusions and title of this paper.

      We are thankful for the detailed advice to improve our manuscript. We added additional text to the introduction to better introduce our hypotheses and rephrased the title, abstract, and conclusion to better match our results.

    1. Author Response

      Reviewer #2 (Public Review):

      The visual system must extract two basic features of visual stimuli: luminance, which we perceive as brightness, and contrast, the change in luminance over space or time (this paper focuses on changes over time). Contrast is separately processed by ON and OFF pathways, which encode luminance increments or decrements, respectively. Contrast must be robustly detected even if the overall luminance changes rapidly, as might occur if an animal is moving in and out of shadows. This paper addresses how such a luminance correction occurs in the fly.

      In the fly, three types of first-order interneurons - L1, L2, and L3 - transmit information from photoreceptors to the medulla, where ON and OFF encoding emerges. Previous work suggested that all three interneurons primarily encode contrast signals and that they project to distinct pathways: L1 to the ON pathway and L2 and L3 to the OFF pathway. Ketkar et al. show that, contrary to this model, these interneurons encode both contrast and luminance in specific ways and are not cleanly segregated into ON versus OFF inputs.

      This study reveals several new insights into early visual processing that are interesting and well-supported by the data:

      1) The authors show that behavioral responses to ON stimuli can compensate for rapid changes in luminance. However, the purported sole input to the ON pathway, L1, shows activity that is highly dependent on luminance. This suggests that a luminance correction must arise downstream of L1. These results are analogous to findings previously made by the same group regarding the OFF pathway (Ketkar et al., 2020). The previous paper showed that L2 provides contrast information to the OFF pathway, and L3 provides luminance information to allow for a luminance correction in downstream contrast encoding. But unlike the multiple inputs to the OFF pathway, the ON pathway was thought to only receive input from L1, provoking the question of whether L1 is able to provide both contrast and luminance information.

      2) Using well-designed calcium imaging studies, the authors surveyed the responses of the three interneurons and found that they encode different stimulus features: L1 encodes both contrast and luminance, L2 purely encodes contrast, and L3 purely encodes luminance (with a different dependence than L1). These are interesting and important findings revealing how both contrast and luminance encoding are distributed across the three interneurons.

      3) Using neuronal manipulations, the authors dissected the contributions of the three interneurons to ON and OFF behavior under changing luminance. These experiments showed that L1 and L3 are required for the luminance correction in the behavior. Moreover, the finding that all three interneurons contribute to both ON and OFF behavior contrasts with the existing model of segregated pathways. Thus, this paper could change the way we think about early visual processing in the fly: rather than relaying similar information to distinct downstream pathways, first-order interneurons relay distinct information to common pathways.

      Overall, the major claims of this paper are important and supported by the experiments. There are just a few concerns that I would note:

      Thank you for the overall positive evaluation of our work, as well as for the constructive criticism, which we are going to address below.

      1) The authors state that they have shown luminance invariance in ON behavior (e.g. line 376-377 of the Discussion), but this is not entirely accurate: the ON behavior decreases as luminance increases. This is still an interesting effect since it's the opposite of what L1 activity does, so it's clear that the circuit is implementing a luminance correction, but it is not "luminance invariance".

      As pointed out in response to essential comment #2, we carefully edited the manuscript to talk about ‘near’ luminance invariance, or data approaching luminance invariance. More prominently, we rephrased the text to highlight the need for a luminance gain to scale behavioral responses to contrast, even if the resulting behavior is not entirely luminance invariant.

      2) The visual stimuli presented for most imaging experiments (full-field) are not the same as those presented for behavior (moving edges). It is possible neuronal responses and their encoding of luminance and contrast may differ if tested with the moving edge stimuli (if so, this would be concerning). The authors did image L1 with both types of stimuli and could compare these responses. Also, testing behavior at 34º and imaging at 20º presents a possible discrepancy in comparing these data.

      We use moving ON edges in Figure 1, and these data suggest that the transient response of L1 scales with step changes in luminance, consistent with data in Figure 2B. Although we did not point this out in the paper, the L1 responses in Figure 1 also decay to different response levels, consistent with the luminance-sensitive component that static stimuli reveal in Figure 2. Furthermore, for other ongoing projects in the lab, we have, for example, measured physiological responses in L2 with the same stimuli used in behavior, and there is no discrepancy with the data reported here. Overall, given the vast literature on Drosophila and other flies, there is no reason to believe that LMCs would respond any differently to moving vs. static stimuli.

      We can additionally point out that the behavioral data of L3 silencing (at 34ºC) nicely correlate with physiological contrast responses of L1 and L2 (at 20ºC, predicted from electrophysiological recordings for LMCs in Ketkar et al. 2020, measured for L1 here). Many previous studies, for example in motion detection, have linked data from physiological recordings at room temperature with behavioral experiments done at higher temperature (e.g., Ammer et al., 2015; Clark et al., 2011; Creamer et al., 2019; Fisher et al., 2015; Leonhardt et al., 2017; Salazar-Gatzimas et al., 2016; Serbe et al., 2016; Silies et al., 2013; Strother et al., 2017). We therefore do not think that these are major concerns.

      3) I find it puzzling that silencing L1 has little effect on ON behavior at 100% contrast and varying luminance (Figure 3A), but severely affects ON behavior to 100% contrast (and lower values) when different contrasts are interleaved (Figure S1). The authors note this but do not provide a clear explanation of why this might be the case. Aside from mechanism, it is not clear whether the difference is due to varying luminance in the first experiment or varying contrast in the second one (e.g. they could test 100% contrast without varying luminance).

      The two stimulus sets used here do not allow us to pinpoint why the L1 silencing phenotype differs between them, since they differ in more than one respect, as discussed above (see point 4 in “Essential Revisions”). We now include two additional experiments that dissect the role of different stimulus parameters (Supp. Figure 2). To understand whether the difference is due to varying luminance, we tested responses to ON edges of fixed (100%) contrast and luminance at the same stimulus parameters (motion duration, speed) as used in Figure 3, and did not find reduced turning responses when silencing L1. Thus, varying luminance does not change the effect of L1 on ON behavior. However, when repeating this experiment with a bright inter-stimulus interval, L1 silencing led to a strong response deficit. Therefore, differences in the interval luminance explain the differences in the L1 silencing phenotype observed not only in this study but also across studies. Although we hypothesize a role of contrast adaptation that may function differently with altered contrast statistics, a more detailed investigation would be necessary to understand the mechanism. Nevertheless, our experiments allow us to conclude that L1 is not the sole major input to the ON pathway, even though it is required under certain stimulus conditions.

      4) I do not entirely agree with the authors' interpretation of the L1 ort rescue experiment for OFF behavior. They state that rescue flies "responded similarly to positive controls". However, the graph shows that the rescue flies generally fall in between the mutant and heterozygote control flies; they resemble the controls at low luminance but resemble the mutants at high luminance. One may conclude that L1 is sufficient to enhance OFF behavior at low luminance, but it is a stretch to say it's a complete rescue.

      Sorry, we just meant to say that they “responded similarly to positive controls (...) at low luminance”, but the sentence was badly written. We corrected this to: “L1 ort rescue flies responded similarly to positive controls at low luminances, rescuing responses to OFF edges at dim backgrounds.”

      5) The authors typically use t-tests to analyze experiments with 2 variables (genotype and luminance) and 3 or more conditions per variable. This is not the most appropriate statistical test; typically one would use a two-way ANOVA. At the least, it should be clear whether they are performing corrections for multiple comparisons if performing many t-tests on the same dataset.

      Thank you for the suggestion. We now use a two-way ANOVA followed by corrected pairwise comparisons and state this clearly in the figure captions (also addressed above in essential comment #5).
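
      The response does not name the correction applied to the pairwise comparisons. As a minimal sketch, assuming Holm's step-down procedure as one common choice, the adjustment of a family of pairwise p-values could look like the following. The function name and the p-values are hypothetical and are not taken from the manuscript.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the original order.

    Assumption: Holm's method stands in for the unspecified correction.
    """
    m = len(p_values)
    # Visit the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank k-1) is multiplied by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values from three genotype comparisons.
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
significant = [p < 0.05 for p in adjusted]
```

      With these made-up inputs only the first comparison remains below 0.05 after adjustment, which shows why stating the correction in the figure captions matters when many tests are run on one dataset.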

      Reviewer #3 (Public Review):

      Ketkar et al combine calcium imaging and behavioral experiments to investigate the encoding of luminance and contrast in 3 first-order interneurons in the Drosophila lamina: L1, L2, and L3, as well as the role of these signals in moving ON edge behavior across luminance. The behavioral experiments are well performed. The rescue experiments are particularly interesting. Together with silencing they support and nicely extend previous work showing that L1/2/3 are not simply segregated between ON and OFF pathways. My main issue is the link that the authors make between the cellular responses and the behaviors performed and therefore the overall conclusions and claims of the paper about the roles of contrast vs luminance encoding of each neuron type (particularly L1) in the behaviors.

      Major concerns:

      1) The authors state that the main behavior they study, namely optomotor response to moving light edges at 100% contrast, is "luminance invariant". A strict definition of this would be that behavioral responses are constant with increasing luminance. However, there are very few plots in this paper where this is the case. In almost all examples, the response is decreasing with respect to increasing luminance. The authors do qualify a "nearly" invariant behavior, but this does not change the fact that interpretation of the data in the context of the framing of the paper is often problematic.

      We thank the reviewer for this critical comment. The main point (that we apparently failed to make clear enough) is that there is a clear requirement for a luminance gain. Physiological LMC responses measured using calcium imaging to ON stimuli in Figure 1, or predicted from previous electrophysiological recordings to OFF stimuli in (Ketkar et al., 2020) cannot account for any of the (control) behavioral data. We now edited the text to tone down statements about luminance invariance, and instead highlighted the need for a luminance gain.

      2) The manuscript would benefit from clear definitions of luminance and contrast, as well as an explanation of how contrast and luminance sensitivity can be inferred from experiments. In particular, the authors use transient vs. sustained response properties in L1, L2, and L3 as indicators of contrast and luminance sensitivity, but this is not stated clearly. It would be important to explain this to the reader early on.

      We now added definitions of general terms to the introduction and added data and analysis to the manuscript (Figure S1, and Figure 2B-D) to more clearly test which component of the neurons’ responses encode contrast or luminance.

      3) In the manuscript, it is often stated that "calcium imaging experiments reveal that each first order interneuron is unique in its contrast and luminance encoding properties" (line 110). This was shown clearly for L2 and L3 in their previous work in Ketkar et al. 2020, with a well-designed two-step stimulus that was able to tease apart contrast vs. luminance invariance. Unfortunately it does not seem that this level of experimental detail and analysis is applied to L1 here. In particular, the authors state "L1 encodes both contrast and luminance in distinct response components" (line 112), in the summary of their findings. I would not agree that the authors have actually shown this properly in this manuscript.

      Addressed above, in point 6 of “Essential Revisions”

      4) The results, as they are stated, are at times not well supported by the data. The manuscript would benefit from a careful assessment of the accuracy and precision of the language used to interpret the data. Sometimes just moving some conclusions to the discussion and explaining the assumptions made to reach a particular conclusion would be enough. A few examples:

      We carefully edited the entire manuscript, in addition to addressing the specific points below.

      o Figure 2: "Lamina neuron types L1-L3 are differently sensitive to contrast and luminance". It is overall true that, from the raw traces, the responses are different. However, the quantification in C-E only pertains to luminance.

      As stated above, we now did further analysis on the contrast encoding properties of L1 and L2 and pointed out the major differences between these neurons (Figure 2B-D).

      o Figure 3: "L1 is not required but sufficient for ON behavior across luminance". The data convincingly shows this. I would however point out that the statement "this data [..] highlights its behavioral relevant role of its luminance component" line 231 is an overstatement.

      We deleted this statement at the end of the paragraph.

      o Figure 6: "L1 luminance signal is required and sufficient for OFF behavior" the data presented shows convincingly that when L1 is inactive the behavior becomes (more) intensity variant. However, it does not show that it is the "luminance signal" in L1 that is required for this effect. In general, because L1 has a sustained and a transient response, it is difficult to strictly implicate one or the other in supporting any behavior, short of manipulating L1 to make it fully transient or fully sustained.

      We agree. The figure title now reads “L1 function is required and sufficient for OFF behavior”.

      o It is often not clear which conclusions stem from this work and which from their previous work Ketkar et al. 2020, or even other previous work on contrast sensitivity in particular. Clarifying this might help with my concern about statements not well supported by the data in this paper, and also justify their overall novelty. In general the manuscript assumes familiarity with this previous work, which is not always helpful for the reader.

      As stated above, we now more clearly separate previous findings from novel findings in the abstract, and throughout the text. We also expanded the introduction to better explain the core concepts that are needed to understand this work, without having read Ketkar et al. 2020.

    1. Author Response

      Reviewer #1 (Public Review):

      In this paper, Fernandes et al. take advantage of synthetic constructs to test how Bicoid (Bcd) activates its downstream target Hunchback (Hb). They explore synthetic constructs containing only Bcd, Bcd and Hb, and Bcd and Zelda binding sites. They use these to develop theoretical models for how Bcd drives Hb in the early embryo. They show that Hb sites alone are insufficient to drive further Hb expression.

      The paper's first half focuses on how well the synthetic constructs replicate the in vivo expression of hb. This approach is generally convincing, and the results are interesting. Consistent with previous work, they show that Bcd alone is sufficient to drive an expression profile that is similar to wild‐type, but the addition of Hb and Zelda are needed to generate precise and rapid formation of the boundaries. The experimental results are supported by modelling. The model does a nice job of encapsulating the key conclusions and clearly adds value to the analysis.

      In the second part of the paper, the authors use their synthetic approach to look at how the Hb boundary alters depending on Bcd dosage. This part asks whether the observed Bcd gradient is the same as the activity gradient of Bcd (i.e. the "active" part of Bcd is not a priori the same as the protein gradient). This is a very interesting problem and good the authors have tried to tackle this. However, the strength of their conclusions needs to be substantially tempered as they rely on an overestimation of the Bcd gradient decay length.

      Comments:

      ‐ My major concern regards the conclusions for the final section on the activity gradient. In the Introduction it is stated: "[the Bcd gradient has] an exponential AP gradient with a decay length of L ~ 20% egg‐length (EL)". While this was the initial estimate (Houchmandzadeh et al., Nature 2002), later measurements by the Gregor lab (see Supplementary Material of Liu et al., PNAS 2013) found that "The mean length constant was reduced to 16.5 ± 0.7%EL after corrections for EGFP maturation". The original measurements by Houchmandzadeh et al. had issues with background control, that also led to the longer measured decay length. In later work, Durrieu et al., Mol Sys Biol 2018, found a similar scale for the decay length to Liu et al. Looking at Figure 5, a value of 16.5%EL for the decay length is fully consistent with the activity and protein gradients for Bcd being similar. In short, the strength of the conclusions clearly does not match the known gradient and should be substantially toned down.

      The reviewer is right: several studies aiming to quantitatively measure the Bicoid protein gradient ended up with quite different decay lengths.

      A summary of the various decay lengths measured, and the method used for these measurements is given below:

      As indicated, these measurements are quite variable among the different studies and the differences can potentially be attributed to different methods of detection (antibody staining on fixed samples vs fluorescent measurements on live sample) or to the type of protein detected (endogenous Bicoid vs fluorescently tagged).

      We agree with the reviewer that, given these discrepancies, the exact value of the Bcd protein gradient decay length is not known and that we only have measurements that put it between 16 and 25% EL (see the Table above). Therefore, we agree that we should tone down the difference between the protein vs activity gradient and focus on the measurements of the effective activity gradient decay length allowed by our synthetic reporters. This allows us to revisit the measurement of the Hill coefficient of the transcription step‐like response, which is based on the decay length of the Bcd protein gradient, assumed in previously published work to be 20% EL (Gregor et al., Cell, 2007a; Estrada et al., 2016; Tran et al., PLoS CB, 2018). Importantly, the new Hill coefficient allows us to set the Bcd system within the limits of an equilibrium model.
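
      To illustrate why the inferred Hill coefficient depends on the assumed decay length, here is a minimal sketch. It assumes an exponential Bcd gradient and a Hill-type response, for which the slope of the response at the half-maximal position equals -n / (4 * decay length). All function names and numerical values (boundary at 45% EL, n = 5) are hypothetical and serve only to show the scaling.

```python
import math


def bcd_concentration(x, c0=1.0, decay_length=20.0):
    """Assumed exponential gradient; x and decay_length in % egg length."""
    return c0 * math.exp(-x / decay_length)


def hill_response(c, half_max_conc, n):
    """Assumed Hill-type input/output function with coefficient n."""
    return c**n / (c**n + half_max_conc**n)


def hill_from_boundary_slope(slope_per_el, decay_length):
    """Invert slope = -n / (4 * decay_length), valid at the half-maximal point."""
    return 4.0 * decay_length * abs(slope_per_el)


def boundary_slope(x_boundary, n, decay_length, h=1e-4):
    """Central-difference slope of the response at the boundary position."""
    k = bcd_concentration(x_boundary, decay_length=decay_length)

    def response(x):
        c = bcd_concentration(x, decay_length=decay_length)
        return hill_response(c, k, n)

    return (response(x_boundary + h) - response(x_boundary - h)) / (2 * h)


# Hypothetical boundary generated with n = 5 and a 20% EL gradient.
slope = boundary_slope(45.0, n=5.0, decay_length=20.0)
n_if_20 = hill_from_boundary_slope(slope, 20.0)
n_if_16_5 = hill_from_boundary_slope(slope, 16.5)
```

      Under these assumptions, the same measured boundary slope yields a Hill coefficient that scales linearly with the assumed decay length: a shorter decay length implies a proportionally smaller coefficient.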

      As mentioned by the reviewer, it is possible that the decay length of the protein gradient measured using antibody staining (Houchmandzadeh et al., Nature, 2002) was not correct due to issues with background controls. Such measurements were also performed in Xu et al. (2015), and they agree with the original measurements (Houchmandzadeh et al., Nature 2002). As indicated in the table above, all the other measurements of the Bcd protein gradient decay length were done using fluorescently tagged Bcd proteins, and we cannot exclude the possibility that the wt and tagged proteins have different decay lengths due to potentially different diffusion coefficients or half‐lives. Before drawing any conclusion on the exact value of the endogenous Bcd protein gradient decay length, it is essential to measure it again in conditions that correct for the background issues for immuno‐staining, as was done in Liu et al., PNAS, 2013 for the Bcd‐eGFP protein. In that study, the authors only measured the decay length of the Bcd fusion protein using immuno‐staining for the Bcd protein. Unfortunately, they did not measure again the decay length of the endogenous Bcd protein gradient using immuno‐staining and the same procedure for background control. Therefore, they do not firmly exclude the possibility that the endogenous and tagged Bcd proteins have different decay lengths.

      We thank the reviewer for this comment, which helped us to clarify the message. In addition, as there is clearly an issue with the measurements of the Bcd protein gradient, we added a section in the SI (Section E) and a Table (Table S4) describing the various decay lengths measured for the Bcd or the Bcd‐fluorescently tagged protein gradients in previous studies. In the discussion, together with the possibility that there might be a protein vs activity gradient (as we originally proposed and believe is still a valid possibility), we also discuss the alternative possibility proposed by the reviewer, which is that the protein and activity gradients have the same decay length but that the decay length of the Bcd protein gradient was potentially not correctly evaluated.

      ‐ All of the experiments are performed in a background with the hb gene present. Does this impact on the readout, as the synthetic lines are essentially competing with the wild‐type genes? What controls were done to account for this?

      We agree with the reviewer that this concern might be particularly relevant at the hb boundary, where a nucleus has been shown to contain only ~700 Bicoid molecules (Gregor et al., Cell, 2007b). However, ~1000 Bicoid binding regions have been identified by ChIP‐seq experiments in nc14 embryos (Hannon et al., eLife, 2017) and, given that several Bcd binding sites are generally clustered together in a Bcd region, the number of Bcd binding sites in the fly genome is likely larger than 1000. It is much greater than the number of Bicoid binding sites in our synthetic reporters. Therefore, we think that it is unlikely that adding the synthetic reporters (which in the case of B12 only represents at most 1/100 of the Bcd binding sites in the genome) will severely alter the competition for Bcd binding between the other Bcd binding sites in the genome. Additionally, the insertion of a BAC spanning the endogenous hb locus with all its Bcd‐dependent enhancers did not affect (as far as we can tell) the regulation of the wildtype gene (Lucas, Tran et al., 2018).

      We have added a sentence concerning this point in the main text (lines 108 to 111).

      ‐ Further, the activity of the synthetic reporters depends on the location of insertion. Erceg et al. PLoS Genetics 2014 showed that the same synthetic enhancer can have different readout depending on its genomic location. I'm aware that the authors use a landing site that appears to replicate similar hb kinetics, but did they try random insertion or other landing site? In short, how robust are their results to the specific local genome site? This should have been tested, especially given the boldly written conclusions from the work.

      This concern of the reviewer has been tested and is addressed in Fig. S1, where we compare two random insertions of the hb‐P2 transgene (on chromosome II and III; Lucas, Tran et al., 2018) and the insertion at the VK33 landing site that was used for the whole study. As shown in Fig. S1, the dynamics of transcription (kymographs) are very similar. In the main text, the reference to Fig. S1 is found in the Materials and Methods section (bottom of the 1st paragraph concerning the Drosophila stocks, line 518).

      ‐ Related to the above, it's also not obvious that readout is linear ‐ i.e. as more binding sites are added, there could be cooperativity between binding domains. This may have been accounted for in the model but it is not clear to me how.

      The reviewer is totally correct. It is clear from our data that the readout is not linear: going from B6 to B9 (an increase of 1.5 X in the number of BS) leads to a 4.5 X greater activation rate, which argues against independent activation of transcription by individually bound Bcd TFs. In contrast, adding 3 more sites has almost no impact when comparing B9 to B12 (even though this corresponds to an increase of 1.33 X in the number of BS). This issue has been rephrased in the main text (lines 200 to 203) and further developed for the modeling aspects in the SI section C and Figure S3. It is also discussed in the second paragraph of the discussion (lines 380 to 383).
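
      The arithmetic behind this argument can be made explicit with a small sketch. If each bound Bcd molecule contributed independently, the activation rate would be expected to scale roughly in proportion to the number of binding sites, that is, with an exponent of about 1. The "effective exponent" below is only an illustrative summary and is not a quantity defined in the manuscript; the function name is hypothetical.

```python
import math


def effective_exponent(site_ratio, rate_ratio):
    """Exponent a such that rate_ratio == site_ratio ** a (illustrative only)."""
    return math.log(rate_ratio) / math.log(site_ratio)


# B6 -> B9: 1.5 X more binding sites, 4.5 X greater activation rate (from the text).
b6_to_b9 = effective_exponent(9 / 6, 4.5)

# Under independent, additive contributions the rate ratio would equal the site ratio.
independent_case = effective_exponent(9 / 6, 9 / 6)
```

      The B6 to B9 step gives an exponent well above 1, whereas the near-absent change from B9 to B12 would give an exponent near 0. A single proportional rule therefore cannot describe both steps.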

      ‐ It would be good in the Introduction/Discussion to give a broader perspective on the advantages and disadvantages of the synthetic approach to study gene regulation. The intro only discusses Tran et al. Yet, there is a strong history of using this approach, which has also helped to reveal some of the approach's shortcomings. E.g. Gertz et al. Nature 2009 and Sharon et al. Nature Biotechnology 2012. Again, I may have missed it, but from my reading I cannot see any critical analysis of the pros/cons of the synthetic approach in development. This is necessary to give readers a clearer context.

      One sentence was added in the introduction concerning this point (lines 79 to 82).

      A short review concerning the synthetic approach in development has also been added at the beginning of the discussion (lines 347 to 359).

      Reviewer #2 (Public Review):

      It is known that Bicoid increases in concentration across the syncytial division cycles, the gradient length scale for Bicoid does not change, and hunchback also increases in concentration during the syncytial cycles but the sharp boundary of the hunchback gradient is constantly seen despite the change in concentration of Bicoid. This manuscript shows that by increasing the Bicoid concentration or by adding Zelda binding sites, the expression of hunchback can be recapitulated to that of a previously studied promoter for hunchback.

      I have the following comments to understand the implications of the study in the context of increasing concentrations of Bicoid during the syncytial division cycles:

      ‐ Bicoid itself is also increasing over the syncytial division cycles, how does this change in concentration of Bicoid affect the activation of the hunchback promoter given the cooperative binding of Bicoid and Bicoid and Zelda as documented by the study?

      We thank the reviewer for this remark about the dynamics of the Bcd gradient, which we may have taken for granted. A seminal work on the dynamics of the Bcd gradient using fluorescent‐tagged Bcd (Gregor et al, Cell, 2007a) has shown that the gradient of Bcd nuclear concentration (this nuclear concentration is the one that matters for transcription) remains stable over nuclear cycles, despite a global increase of Bcd amount in the embryo. This can be explained by the fact that Bcd molecules are imported into the nuclei and that the number of nuclei doubles at every cycle, such that both processes compensate each other. Thus, we assumed that the gradient of Bcd nuclear concentration was stable over nc11 to nc13.

      We have clarified this assumption in the model section in the manuscript (lines 165‐168).

      Supporting our assumption, when looking at the transcription dynamics regulated by Bcd, in Lucas et al, PLoS Gen, 2018, we observed very reproducible expression pattern dynamics of the hb‐P2 reporter at each cycle nc11 to nc13. Such reproducibility in the pattern dynamics was also observed in this current work for hb‐P2, B6, B9, B12 and H6B6 reporters (Fig. S6A). Also, in Lucas et al, PLoS Gen, 2018, the shift in the established boundary positions of hb‐P2 reporter between nc11 to nc13 is ~2%EL (approximately a nucleus length ~10μm) and it is thus marginal.

      In addition, as mentioned in the text (lines 105 to 107), we only focused our analysis on nc13 data which are statistically stronger given the higher number of nuclei analyzed. Thus, any change of Bcd nuclear concentration that would happen over nuclear cycles will not matter.

      Concerning Zelda: Zelda’s transcriptional activity when measured on a reporter with only 6 Zld binding sites changes drastically over the nuclear cycles, with strong activity at nc11 and much weaker activity at nc13 (Fig S4A). This indicates that the changes in expression pattern dynamics of Z2B6 from nc11 to nc13 are caused predominantly by decreasing Zelda activity: the effect of Zld on the Z2B6 promoter is very strong during nc11 and nc12. It is also very strong at the beginning of nc13 (even though the Z6 reporter is almost silent) and becomes a bit weaker in the second part of nc13 (Fig S4B‐D).

      ‐ Does the change in concentration of Bicoid across the nuclear cycles shift the gradient similar to the change in numbers of Bicoid binding sites?

      In both Lucas et al, PLoS Gen, 2018 and in this work (Fig. 1, Fig. 3 and Fig. S6A), we found that the positions of the expression boundary are very reproducible and stable in time for hb‐P2, B6, B9, B12, H6B6 during the interphase of nc12 to nc13. For hb‐P2, the averaged shift of the established boundary position in nc11, 12 and 13 is within 2 %EL. This averaged shift between the cycles is of similar magnitude to the difference caused by embryo‐to‐embryo variability within nc13 (~2 %EL) (Gregor et al, Cell, 2007b, Lucas et al, PLoS Gen, 2018). This shift is much smaller than the difference between the expression boundary positions of B6 and B9 (~ 8 % EL) and between B6 and Z2B6 (~17.5 %EL) in nc13.

      For these reasons, we conclude that the differences between the expression patterns of B6, B9 and Z2B6 are caused predominantly by changing the TF binding site configurations of the reporters, rather than by variability in the Bcd gradient.

      The assumption of gradient stability has been clarified in the previous answer and in the manuscript (lines 165‐168).

      ‐ The intensity is a little higher for B9 and B12 at the anterior in 2B? Is this statistically different? Is this likely to change the amount of Bicoid expression at the locus and lead to more robust activation?

      We performed statistical tests to distinguish the spot intensities at the anterior pole for every pair of reporters in Fig. 2B (hb‐P2, B6, B9 and B12). All p‐values from pair‐wise KS tests are greater than 0.067, suggesting that the spot intensities at the anterior pole are not distinguishable between these reporters.

      We have clarified this in the manuscript (line 157).
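
      The pair-wise comparison described in this response can be sketched as follows. The code computes only the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical distribution functions) for every pair of reporters; obtaining p-values would require a statistics library. The intensity values are hypothetical placeholders, not data from the study.

```python
from bisect import bisect_right
from itertools import combinations


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max absolute difference between the ECDFs."""
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The supremum is attained at one of the observed data points.
    for value in a_sorted + b_sorted:
        ecdf_a = bisect_right(a_sorted, value) / len(a_sorted)
        ecdf_b = bisect_right(b_sorted, value) / len(b_sorted)
        d = max(d, abs(ecdf_a - ecdf_b))
    return d


# Hypothetical anterior spot intensities (arbitrary units) for each reporter.
intensities = {
    "hb-P2": [1.0, 1.2, 0.9, 1.1],
    "B6": [0.9, 1.1, 1.0, 1.2],
    "B9": [1.1, 1.3, 1.0, 1.2],
    "B12": [1.2, 1.0, 1.1, 1.3],
}

# One KS statistic for each of the six reporter pairs.
pairwise = {
    (r1, r2): ks_statistic(intensities[r1], intensities[r2])
    for r1, r2 in combinations(intensities, 2)
}
```

      Four reporters give six pairs, so the reported threshold (all p-values greater than 0.067) summarizes six separate tests.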

      ‐Is the fraction of active loci not changing across the syncytial cycles when the concentration of Bicoid also changes, and is this consistent with the synthetic promoters?

      To measure the reproducibility of the expression pattern dynamics in different nuclear cycles, we compared the boundary position of the fraction of active loci pattern as a function of time for all hbP2 and synthetic reporters (Fig. S6A). In this figure panel, for all reporters except Z2B6, the curves in nc12 and nc13 largely overlap, suggesting high reproducibility in the pattern dynamics between cycles and consequently low sensitivity to the subtle variation in the Bcd nuclear concentration gradient between the cycles.

      For Z2B6, we attributed the difference in pattern dynamics between nc12 and nc13 to the changes in Zelda activity, as validated independently with a synthetic reporter with only 6 Zld binding sites (Fig. S4A).

      ‐How do the numbers of Hb BS change the expression of Hb? H6B6 has 6 Hb BS whereas the Hb‐P2 has 1? Are more controls needed to compare these 2 contexts?

      As our goal was to determine to which mechanistic step of our model each TF (Bcd, Hb, Zld) contributed, we added BS numbers that are much higher than in the hb‐P2 promoter. The added number of Hb BS remains very low when compared to the total number of Hb binding sites in the entire genome (Karplan et al, PLOS Gen, 2011); therefore, it is very unlikely to affect the endogenous expression of Hb protein.

      We clarified this in the manuscript (lines 211 to 212).

      Does Zelda concentration change across the syncytial division cycles? How does the change in concentration in the natural context affect the promoter activation of Hb?

      Zelda concentration is stable over the nuclear cycles, as observed with the fluorescently‐tagged Zld protein (Dufourt et al., Nat Com, 2018). However, Zelda’s transcriptional activity when measured on a reporter with only 6 Zld binding sites changes drastically over the nuclear cycles, with strong activity at nc11 and much weaker activity at nc13 (Fig S4A, this work).

      The impact of this change in Zld activity can be observed with the Z2B6 promoter, with the expression boundary moving from the posterior region toward the anterior region over the nuclear cycles (Fig. S4B‐D). However, we don’t detect any changes in the expression pattern dynamics of hb‐P2 over the nuclear cycles (Fig. S6A and in Lucas et al., PLoS Gen, 2018).

      We have clarified this in lines 250‐251 of the main manuscript.

      ‐Changing the dose of Bicoid shifts the boundary of hunchback expression. It would be nice to model or test this in the context of varying doses of Zelda, or even to reason about it with respect to varying doses of Zelda across the syncytial division cycles.

      We thank the reviewer for this insight. We did not perform any experiment reducing the amount of Zelda in the embryo. However, in a previous study (Lucas et al., PLoS Genetics, 2018), we observed that the hb boundary shifted towards the anterior when the amount of Zelda was decreased. This is consistent with the dose of Zelda being critical for setting the boundary position and the threshold Bcd concentration required for activation. Nevertheless, as Zelda is distributed homogeneously along the AP axis, it cannot per se bring positional information to the system.

      Reviewer #3 (Public Review):

      I think the framing could be improved to better reflect the contribution of the work. From the abstract, for example, it's unclear to me what the authors think is the most meaningful conclusion. Is it the observations about the finer details of TF regulation (bursting dynamics), the fact that Bcd is probably the sole source of "positional information" for hb‐p2, that Bcd exists in active/inactive form, or the fact that an equilibrium model probably suffices to explain what we observe? The first sentence itself seems to suggest this paper will discuss "dynamic positional information", in which case it's somewhat misleading to say this kind of work is "largely unexplored"; Johannes Jaeger in particular has been a strong proponent of this view since at least 2004. On that note some particularly relevant recent papers in the Drosophila early embryo include:

      1) Jaeger and Verd (2020) Curr Topics Dev Biol

      2) Verd et al. (2017) PLoS Comp Biol

      3) Huang, Amourda, et al. and Saunders (2017) eLife

      4) Yang, Zhu, et al. (2020) eLife [see also the second half of Perkins (2021) PLoS Comp Biol for further discussion of that model]

      ‐Some reviews from James Briscoe also discuss this perspective.

      We agree with the reviewer that the phrasing of the abstract was not clear enough to emphasize the contribution of the work and we are also sorry if it suggested that the dynamic positional information is largely unexplored because this was not at all our intention.

      We rephrased the abstract aiming to better highlight the most meaningful conclusions.

      ‐I would also recommend modifying the title to reflect the biology found in the new results.

      We modified the title to better reflect the new results: “Synthetic reconstruction of the hunchback promoter specifies the role of Bicoid, Zelda and Hunchback in the dynamics of its transcription”

      ‐A major point that the authors should address is the design of the synthetic constructs. From table S1, the sites are often very closely linked (4‐7 base pairs). From the footprint of these proteins, we know they can cover DNA across this size (see, https://pubmed.ncbi.nlm.nih.gov/8620846/). As such, there may be direct competition/steric hindrance (see https://pubmed.ncbi.nlm.nih.gov/28052257/). What impact does this have on their interpretations? Note also that the native enhancer has spaced sites with variable identities.

      We completely agree with the reviewer's comment: we named our reporters according to the number (N) of Bcd binding site sequences that they contain, even though we cannot prove definitively that they can be bound simultaneously by N Bcd molecules. It is thus possible that B9 is not a true B9 but an effective B6 (i.e., B9 can only be bound simultaneously by 6 molecules) if, for instance, the binding of a Bcd molecule to one site prevented the binding of another Bcd molecule to a nearby site (as proposed by the reviewer in the case of direct competition or steric hindrance).

      Even though we cannot exclude this possibility, we think that our use of B6, B9 and B12, in reference to the 6 Bcd BS of the hb‐P2 promoter, is relevant for several reasons: i) some of the Bcd BS in the hb‐P2 promoter are also very close to each other (see Table S1); ii) the synthetic constructs were designed by multimerizing a series of 3 strong Bcd binding sites with a spacing similar to that of the closest sites in the hb‐P2 promoter (as shown in Figure 1A and Table S1); iii) in vitro footprinting experiments have shown that the Bicoid protein binds more efficiently to sites of the hb‐P2 promoter that are close to each other, which has even been interpreted as binding cooperativity (Ma et al., 1996); iv) even though these experiments were not performed with full‐length proteins, two molecules of the paired homeodomain (from the same family of DNA binding domains as Bcd) are able to bind simultaneously to two binding sites separated by only 2 base pairs. This binding to very close sites is even cooperative, whereas when the two sites are 5 or more base pairs apart, simultaneous binding occurs without cooperativity (Wilson et al., 1993).

      Conversely, as it is very difficult to demonstrate that 9 Bcd molecules can effectively bind to our B9 promoter, it is very difficult to know exactly how many binding sites for Bcd the hb‐P2 contains, and a large debate concerning not only the number but also the identity of the Bcd sites in the hb promoter is still ongoing (Park et al., 2019; Ling et al., 2019).

      As we cannot exclude the possibility that B9 is an effective B6, it remains possible that B9 and hb‐P2 (which is supposed to contain only 6 sites) have the same number of effective Bcd binding sites, and this could explain why the two reporters have very similar transcription dynamics and features.

      Regarding other interpretations in the manuscript, we identified two other aspects that would be affected if our synthetic reporters have fewer effective sites than the number of sites they carry. The first concerns the synergy: the 1.5‐fold increase in the number of sites from B6 to B9 might be over‐estimated, but this would only strengthen the synergistic effect, given the 4.5‐fold difference in activity between the two reporters (Fig. S3). The second concerns the discussion of the Hill coefficient and the decay length, where the effective number of binding sites (N) is required to determine the limit of concentration sensing (Fig. 5). This would be particularly important for the hb‐P2 promoter.
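      The arithmetic behind the synergy argument can be sketched as follows. The "synergy ratio" used here (fold change in activity divided by fold change in site number) is an illustrative assumption, not a quantity defined in the manuscript, and the function name is hypothetical. A ratio above 1 means that activity grows faster than the number of sites.

```python
def synergy_ratio(activity_fold, sites_low, sites_high):
    """Fold change in activity per fold change in binding-site number.

    Hypothetical illustrative metric: values above 1 indicate a
    more-than-proportional (synergistic) gain in activity.
    """
    site_fold = sites_high / sites_low
    return activity_fold / site_fold


# Nominal case: B6 -> B9 is a 1.5-fold increase in sites, 4.5-fold in activity.
nominal = synergy_ratio(4.5, 6, 9)

# If B9 behaved as if it had only 8 effective sites, the site fold change
# would be smaller, so the same 4.5-fold activity gain implies more synergy.
fewer_effective_sites = synergy_ratio(4.5, 6, 8)
```

      Under these assumptions the nominal ratio is 3, and any reduction in the effective number of sites in B9 pushes the ratio higher, which is the direction of the argument made above.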

      Except for these specific points, we do not think that the possibility that the reporters do not contain exactly as many effective binding sites as proposed has a large impact on our interpretations or on the general message conveyed in this manuscript. Most importantly, it is very clear that our B6 and B9 reporters differ only by three Bcd binding sites and yet have very distinct expression dynamics: while B9 recapitulates almost all transcription features of hb‐P2, B6 is far from achieving this. Similarly, H6B6 and Z2B6 have very different transcription features from B6, and these differences have been key for understanding the mechanistic functions of the three TFs we studied.

      This point has been added to the Discussion (lines 400 to 414).

    1. Author Response

      Reviewer #2 (Public Review):

      In this interesting and beautifully illustrated study, the authors are addressing the question of the emergence of craniofacial tissues by dissecting the interplay between skeletal muscle progenitors and associated connective tissue cells. By combining sophisticated lineage-tracing single cell RNA-seq experiments with potent computational analysis tools followed by in situ validations, the authors have identified a population of Myf5+ bipotent progenitors that give rise to both muscle and connective tissue. However, some conclusions are solely based on the RNA-Seq data that would require further experimental validations.

      We thank the Reviewer for evaluating our work and for their encouraging comments. We agree with the assessment and have now added more in-situ validations and quantifications to support the in-silico analyses.

      Reviewer #3 (Public Review):

      In this manuscript, Grimaldi et al. present evidence for the existence of Myf5+ bipotent progenitors for myogenic and connective lineages in the dorsal regions of the mouse head, which is not populated by neural crest cell-derived connective tissue. The study relies heavily on scRNA-seq dataset obtained from cell populations sorted at defined time points, and refined computational analysis, including trajectory and gene network inference using the established tools RNA velocity and SCENIC, respectively. The proposed model is partially validated by in situ staining experiments, including genetic labeling, which identified Pdgfra+ non-myogenic cells within the Myf5+ lineages, notably in association with extraocular muscles (EOM). The authors propose a myogenic origin for the connective tissue, in regions devoid of neural crest cells, and show that loss of Myf5 function causes an increase in the proportion of Sox9+ cells among Myf5+ lineage cells, which is consistent with a binary fate choice from Myf5+ progenitors. The authors tentatively identify signaling molecules and transcription regulators underlying both fate decisions and cell-cell communications between myogenic and non-myogenic cell populations.

      The general message of the study offers a potentially new paradigm to study neural crest cell-independent mesodermal fate decision in the vertebrate head, and is thus poised to augment our understanding of craniofacial development, and potential diseases.

      Unfortunately, there are shortcomings that strongly reduce enthusiasm for this manuscript. Strictly speaking, there is no clear demonstration for the existence of bipotent progenitors in the absence of clonal analysis. The study relies excessively on computational analysis of descriptive scRNA-seq datasets, with a general paucity of secondary experimental validation. The manuscript would benefit from a refined focus on the key point, and addition of validation for the initial conclusions, at the expense of somewhat convoluted analyses (e.g. Figs. 6 and 7)

      We thank the Reviewer for their constructive comments. We have now provided additional in-situ validations on embryos and quantifications to support the in-silico analyses.

    1. Author Response:

      Evaluation Summary:

      This paper will be of interest to researchers who perform single-molecule fluorescence imaging experiments as well as those who want to include machine learning in their data analyses. The authors have developed a machine learning algorithm that addresses some of the data analysis challenges in the field of single-molecule fluorescence imaging. The methods are rigorously benchmarked using simulated data and tested using real data. There are some concerns whether Tapqir is general enough for use by the broader community of single-molecule fluorescence researchers.

      We thank the reviewers for their thorough review of the manuscript. In response to the reviewer comments, we posted to bioRxiv a revised manuscript with new data and edits to text. Concerns about generality are addressed in the revised manuscript and in the responses to specific reviewer comments below.

      Reviewer #1 (Public Review):

      "Bayesian machine learning analysis of single-molecule fluorescence colocalization images" by Ordabayev, et al. reports the development, benchmarking, and testing of a Bayesian machine learning-based method, which the authors name Tapqir, for analyzing single-molecule fluorescence colocalization data. Unlike currently available, more conventional analysis methods, Tapqir attempts to holistically model the microscopy images that are recorded during a colocalization experiment. Tapqir uses a physics-based, global model with parameters describing all of the features of the experiment that are expected to contribute to the recorded microscopy images, including shot noise of the spots and background, camera noise, size and shape of the spots, and specific- and non-specific binders. Based on benchmarking on simulated data with widely varying properties (e.g., signal-to-noise; amounts, rates, and locations of specific and non-specific binders; etc.), Tapqir generally does as well and, in some cases, better than currently existing methods. The authors also test Tapqir on real microscopy images with similarly varying properties from studies that have been previously published by their research group and demonstrate that their Tapqir-based analysis is able to faithfully reproduce the previously published results, which were obtained using the more conventional analysis methods available at the time the data were originally published. This is a well-designed and executed study, Tapqir represents a conceptual and practical advance in the analysis of single-molecule fluorescence colocalization experiments, and its performance has been comprehensively and rigorously benchmarked on simulated data and tested on real data. The conclusions of this study are well supported by the data, but some of the limitations of the method need to be clarified and discussed in more depth, as outlined below.

      1. Given that the AOI is centered at the target molecule and there is a strong prior for the binder also being located at the center of the AOI, the performance of Tapqir is dependent on several variables of the microscopy/optical system (e.g., the microscope point-spread function, magnification, accurate alignment of target and binder imaging channels, accurate drift correction, etc.). Although this caveat is mentioned and some of these factors are listed in the main text of the manuscript, the authors could have expanded this discussion in order to clarify the extent to which the performance of Tapqir depends on these factors.

      We added relevant new data to the revised manuscript in Table 5. The question about alignment accuracy is now discussed in the Materials and Methods:

      “Tests on data simulated with increasing proximity parameter values σxy (true) (i.e., with decreasing precision of spatial mapping between the binder and target image channels) confirm that the cosmos model accurately learns σxy (fit) from the data (Figure3–Figure Supplement 3D; Table 5). This was the case even if we substituted a less-informative σxy prior (Uniform vs. Exponential; Table 5).

      The CoSMoS technique is premised on colocalization of the binder spots with the known location of the target molecule. Consequently, for any CoSMoS analysis method, classification accuracy will in general decline when the images in the target and binder channels are less accurately mapped. However, for the Tapqir cosmos model, low mapping precision has little effect on classification accuracy at typical non-specific binding densities (λ = 0.15; see MCC values in Table 5).”
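      For readers unfamiliar with the MCC values cited from Table 5, the Matthews correlation coefficient summarizes a binary classification (here, target-specific spot present or absent) in a single number between -1 and 1. The sketch below uses the standard definition. The function name and the convention of returning 0 for a degenerate confusion matrix are illustrative choices, not taken from the Tapqir code.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/tn: correctly classified positives/negatives
    fp/fn: false positives/false negatives
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Degenerate case (a whole row or column is empty); 0 by convention.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

      A value of 1 corresponds to perfect classification, 0 to performance no better than chance, and -1 to complete disagreement with the ground truth.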

      The more general point about priors is now addressed in the Materials and Methods as follows:

      “All simulated and experimental data sets in this work were analyzed using the prior distributions and hyperparameter values given above, which are compatible with a broad range of experimental conditions (Table 1). Many of the priors are uninformative and we anticipate that these will work well with images taken on variety of microscope hardware. However, it is possible that highly atypical microscope designs (e.g., those with effective magnifications that are sub-optimal for CoSMoS) might require adjustment of some fixed hyperparameters and distributions (those in Eqs. 6a, 6b, 11, 12, 13, 15, and 16). For example, if the microscope point spread function is more than 2 pixels wide, it may be necessary to increase the range of the w prior in Eq. 13. The Tapqir documentation (https://tapqir.readthedocs.io/en/stable/) gives instructions for changing the hyperparameters.”

      2. The Tapqir model has many parameters, each with its own prior. The majority of these priors are designed to be uninformative and/or weak and the only very strong prior is the probability that a specific binder is located at or very near the center of the AOI. The authors could have tested and commented on how the strength of the prior on the location of a specific binder affects the performance of Tapqir.

      The revised manuscript includes new data on and expanded discussion of this point. In our model, the position of a target-specific spot relative to the target position has a prior distribution illustrated as the green curve in Figure 2-Figure supplement 2. Importantly, the peak in this distribution does not have an a priori set width. Instead, the width of the peak is a model hyperparameter, σxy, that is learned from the image data set without user intervention. To make sure that this point is understood, we expanded and clarified the relevant Methods section and modified the legend of Figure 2-Figure supplement 2.

      To address the reviewers’ specific question, we constructed simulated data sets with different mapping precision values and analyzed them; the results are presented in the (new) Table 5 and discussed:

      “The CoSMoS technique is premised on colocalization of the binder spots with the known location of the target molecule. Consequently, for any analysis method, classification accuracy declines when the images in the target and binder channels are less accurately mapped. For the Tapqir cosmos model, low mapping precision has little effect on classification accuracy at typical non-specific binding densities (λ = 0.15; see MCC values in Table 5).”

      3. Given the priors and variational parameters they report, the authors show that Tapqir performs robustly and seems to require no experiment-to-experiment optimization. This is expected to be the case for the simulated data, since they were simulated using the same model that Tapqir uses to perform the analysis. With regard to the real data, however, it is quite likely that this is due to the fact that the analyzed data all come from the same laboratory and, therefore, likely the same microscope(s). It would have therefore been very useful if the authors would have listed and discussed which microscope settings, experimental conditions, and/or other considerations, beyond those described in point 1 above, would result in a need for re-optimization of the priors and/or variational parameters.

      As noted above, we now address this point in the Materials and Methods as follows:

      “All simulated and experimental data sets in this work were analyzed using the prior distributions and hyperparameter values given above, which are compatible with a broad range of experimental conditions (Table 1). Many of the priors are uninformative and we anticipate that these will work well with images taken on variety of microscope hardware. However, it is possible that highly atypical microscope designs (e.g., those with effective magnifications that are sub-optimal for CoSMoS) might require adjustment of some fixed hyperparameters and distributions (those in Eqs. 6a, 6b, 11, 12, 13, 15, and 16). For example, if the microscope point spread function is more than 2 pixels wide, it may be necessary to increase the range of the w prior in Eq. 13. The Tapqir documentation (https://tapqir.readthedocs.io/en/stable/) gives instructions for changing the hyperparameters.”

      4. Based on analysis of the simulated data shown in Figure 5, where the ground truth is known, the use of Tapqir to infer kinetics is less accurate than the use of Tapqir to infer equilibrium binding constants. The authors do a great job of discussing possible reasons for this. In the case of the real data analyzed in Figure 6 and in Figure 6 - Figure Supplements 1 and 2, the kinetic results obtained using Tapqir have different means and generally larger error bars than those obtained using Spot-Picker. To more comprehensively assess the performance of Tapqir versus Spot-Picker, the authors could have used the association and dissociation rates to calculate the corresponding equilibrium binding constants and then compared these kinetically calculated equilibrium binding constants to the population-calculated equilibrium binding constants that the authors calculate and report in the bottom plot in Panel D of Figure 6 and Figure 6 - Figure Supplements 1 and 2. This would provide some information on the accuracy of the kinetics in that the closer the kinetically and population-calculated equilibrium binding constants are to each other, the more accurately the kinetics have been estimated. Performing this type of analysis for the kinetics obtained using Tapqir and Spot-Picker would have allowed a more comprehensive comparison of the two methods.

      This comment seems to reflect a misunderstanding. Fig. 6 and its figure supplements do not report any dissociation kinetics or binding equilibrium constants. Instead, they report ka (pseudo first-order target-specific association rate constant), kns (pseudo first-order target non-specific association rate constant), and Af (the active fraction, i.e., the fraction of target molecules capable of association with binder). The ka and Af values from the two methods agree within experimental uncertainty for all four data sets analyzed. The kns values differ, but as we point out:

      “We noted some differences between the two methods in the non-specific association rate constants kns. Differences are expected because these parameters are defined differently in the different non-specific binding models used in Tapqir and spot-picker (see Materials and Methods).”

      (There is additional discussion of this point in Materials and Methods). The reviewer is correct that the estimated uncertainties (i.e., error bars in panels D) in ka and Af are generally larger for Tapqir than for spot-picker. This is expected, for the reasons that we explain:

      “In general, previous approaches in essence assume that spot classifications are correct, and thus the uncertainties in the derived molecular properties (e.g., equilibrium constants) are systematically underestimated because the errors in spot classification, which can be large, are not accounted for. By performing a probabilistic spot classification, Tapqir enables reliable inference of molecular properties, such as thermodynamic and kinetic parameters, and allows statistically well-justified estimation of parameter uncertainties. This more inclusive error estimation likely accounts for the generally larger kinetic parameter error bars obtained from Tapqir compared to those from the existing spot-picker analysis method (Figure 6, Figure 6–Figure Supplement 1, Figure 6–Figure Supplement 2, and Figure 6–Figure Supplement 3). ”

      Reviewer #2 (Public Review):

      The work by Ordabayev et al. details a Bayesian inference-based data analysis method for colocalization single molecule spectroscopy (CoSMoS) experiments used to investigate biochemical and biophysical mechanisms. By using this probabilistic framework, their method is able to quantify the colocalization probabilities for individual molecules while accounting for the uncertainty in individual binding events, and accounting for camera and optical noise and even non-specific binding. The software implementation of this method, called Tapqir, uses a Python-based probabilistic programming language (PPL) called pyro to automate and speed-up the optimization of a variational Bayes approximation to the posterior probability distribution. Overall, Tapqir is a powerful new way to analyze CoSMoS data.

      Tapqir works by analyzing small regions (14x14 pixels) of fluorescence microscopy images surrounding previously identified areas of interest (AOI). The collection of images of these AOIs through time are then analyzed collectively using a probabilistic model that accounts for each time frame of each AOI and is able to determine whether up to K "binders" (K=2 here) are present and which of them is specifically bound. This approach of directly modeling the contents of the image data is relatively novel, and few other examples exist. The details of the probabilistic model used incorporate an impressive amount of physical insight (e.g., camera gain) without overparameterization.

      We thank the reviewer for these positive comments.

      The gamma-distributed noise model used in Tapqir captures quite a lot of physics and, given the analyses in Figs. 3-6, clearly works, but might be limited to certain types of cameras used in the fluorescence microscopy (e.g., EMCCDs). For instance, sCMOS cameras have pixel-dependent amplification and noise profiles, rather than a single gain parameter, and are sometimes approximately modeled as normal distributions with both mean and variance having an intensity-dependent and independent contribution that is different for each pixel on the camera. It is unclear how Tapqir performs on different cameras.

      In the revised manuscript, we expanded the discussion of the Image likelihood component of our model to emphasize that 1) all data sets we analyze are experimental or simulated EMCCD images, 2) sCMOS images have the different noise characteristics alluded to by the reviewer, and 3) optimal sCMOS image analysis might require a modified model, possibly including the ability to use per-pixel calibration data as a prior as was done in super-resolution work (now cited) that uses sCMOS data.

      sCMOS cameras have in recent years become very popular for some kinds of single-molecule imaging (e.g., PALM/STORM or live-cell single-particle tracking). However, for the low-background/low-signal in vitro single-molecule TIRF that is our target application for the approach described in the manuscript, EMCCD is still preferable over sCMOS for many, but not all, imaging conditions (see https://andor.oxinst.com/learning/view/article/what-is-the-best-detector-for-single-molecule-studies). Thus, we think there will be plenty of interest in the approach we describe in the manuscript even if (which is not certain) the program functions better with EMCCD than with sCMOS images.

      Going forward to develop and test an sCMOS-targeted version of the model, as we have done for EMCCD, will require revised model and code, but will also necessitate accurately simulating sCMOS CoSMoS images, obtaining experimental sCMOS CoSMoS images reflecting a broad range of realistic experimental conditions, and using the simulated and experimental images to test the new model. These may well be useful things to do in the future but would be a considerable step beyond the scope of the present manuscript.
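      To make the camera-noise discussion concrete, here is a minimal sketch of a gamma noise model with a single gain parameter, of the kind suited to EMCCD data. It assumes a parameterization in which each pixel has mean equal to the noise-free intensity and variance equal to gain times that intensity, plus a constant camera offset. This is our reading of the single-gain model discussed above, not the Tapqir implementation. The function name and numerical values are hypothetical.

```python
import random


def sample_pixel(mean_intensity, gain, offset, rng):
    """Draw one noisy pixel value from a single-gain gamma noise model.

    Assumed parameterization: shape = mean/gain and scale = gain, so that
    the gamma part has mean = mean_intensity and variance = gain * mean_intensity.
    """
    shape = mean_intensity / gain
    return offset + rng.gammavariate(shape, gain)


rng = random.Random(0)  # fixed seed for reproducibility
pixels = [sample_pixel(100.0, 7.0, 90.0, rng) for _ in range(20000)]
```

      An sCMOS-oriented model would instead need per-pixel gain, offset and read-noise terms, which is why a single-gain sketch like this one would not transfer directly.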

      The variational Bayes solution used by Tapqir provides many computational benefits, such as numerical tractability using pyro and speed. It is possible that the exact posterior, e.g., as obtained using a Markov chain Monte Carlo method, would be insignificantly different with the amount of data typical for CoSMoS experiments; however, this difference is not explored in the current work.

      We agree. However, since we have not done any analyses using MCMC, there is nothing in particular that we can say about it in the context of CoSMoS data analysis. Implementation of an MCMC approach using our model will be easier in the future because the Pyro developers are currently working to optimize the implementations of MCMC methods in their software.

      The intrinsic use of prior probability distributions in any Bayesian inference algorithm is extremely powerful, and in Tapqir offers the opportunity to "chain together" subsequent analyses by using the marginalized posteriors from one experiment as the basis for the priors for subsequent experiments (e.g., in \sigma^{xy}) for extremely high accuracy inference. While the manuscript discusses setting and leveraging the power of priors, it does not explore the power of such "chaining" and the positive effects upon accuracy.

      Chaining is beneficial in principle. However, in practice it will help significantly only if the uncertainty in the posterior parameter values from the non-chained analysis is larger than the experiment-to-experiment variability in the “true” parameter values. For σxy we obtain very narrow credible intervals without chaining (Table 1). In our judgement, these are unlikely to be made more accurate by using prior information from another experiment where such factors as microscope focus adjustment may be slightly different.
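      The condition stated above can be illustrated with a simple Gaussian approximation. In this sketch (an assumption made for illustration, not part of the Tapqir model), the posterior from a previous experiment is reused as a prior after being broadened by the experiment-to-experiment variability in the true parameter value, and is then combined with the current, non-chained posterior by adding precisions. All names are hypothetical.

```python
import math


def chained_posterior_sd(sd_current, sd_previous, sd_between):
    """Posterior SD after chaining, under a Gaussian approximation.

    sd_current:  posterior SD from the current experiment analyzed alone
    sd_previous: posterior SD from the earlier experiment
    sd_between:  experiment-to-experiment SD of the true parameter value
    """
    # The earlier posterior is less informative about the current experiment
    # because the true value may have drifted between experiments.
    prior_var = sd_previous ** 2 + sd_between ** 2
    posterior_var = 1.0 / (1.0 / sd_current ** 2 + 1.0 / prior_var)
    return math.sqrt(posterior_var)
```

      With `sd_current = sd_previous = 0.01` and `sd_between = 0.1`, chaining narrows the posterior by well under 1%, whereas with no between-experiment variability the same chaining would shrink the SD by a factor of about 1.4.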

      A significant number of CoSMoS experiments use multiple, distinct color fluorophores to probe the colocalization of different species to the target. The current work focuses only upon analyzing data with a single color-channel. Extensions to multiple independent wavelengths are computationally trivial, given the automated variational inference ability of PPLs such as pyro, and would increase the impact of the work in the field.

      Our current approach can be used to analyze multi-channel data simply by analyzing each channel independently. However, we agree that there would be advantages to joint analysis of multiple wavelength channels (especially if there is crosstalk between channels) and that implementing multi-channel analysis is a logical extension of our study. It is straightforward (though not trivial, in our experience) to implement such multi-wavelength models. However, testing the functioning of candidate models and validating them using simulation and experimental data would require extensive work that in our view goes beyond what is reasonable to include in the present manuscript.

      Tapqir analysis provides time series of the probability of a specific binding event, p(specific), for each target analyzed (c.f., Fig. 5B), and kinetic parameters are extracted from these time series using secondary analyses that are distinct from Tapqir itself.
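      As a rough illustration of such a secondary analysis, the sketch below converts a p(specific) time series into bound and unbound dwell times by simple thresholding. This is only one possible approach and is not necessarily the one used in the manuscript. The function name, the 0.5 threshold and the frame time are hypothetical.

```python
from itertools import groupby


def dwell_times(p_specific, frame_time, threshold=0.5):
    """Split a p(specific) trace into bound and unbound dwell durations.

    Frames with p(specific) >= threshold are treated as bound. The first and
    last runs are truncated by the recording window, so a real kinetic
    analysis would need to treat them as censored.
    """
    bound = [p >= threshold for p in p_specific]
    on_times, off_times = [], []
    for state, run in groupby(bound):
        duration = sum(1 for _ in run) * frame_time
        (on_times if state else off_times).append(duration)
    return on_times, off_times
```

      Hard thresholding discards the classification uncertainty that the probabilistic output is designed to preserve, so it should be read as a baseline rather than a recommended procedure.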

      The method reported here is well designed, sound, and its utility is well supported by the analyses of simulated and experimental data sets reported here. Tapqir is a cutting-edge image analysis approach, and its proper treatment of the uncertainty inherent to CoSMoS experiments will certainly make an impact upon the analysis of CoSMoS data. However, many of the (necessary) assumptions about the data (e.g., fluorescence microscopy) and desired information (e.g., off-target vs on-target binding) are quite specific to CoSMoS experiments and therefore limit the direct applicability of Tapqir for the analysis of other single-molecule microscopy techniques. With that in mind, the direct Bayesian inference-based analysis of image data, as opposed to integrated time series, as demonstrated here is very powerful, and may encourage and inspire related methods to be developed.

      Our approach is a powerful way to analyze CoSMoS data in part because it is specific to CoSMoS – it is premised on a physics-based model that incorporates known features of CoSMoS experiments. We agree that the general approach could be adapted to other image analysis applications.

      Reviewer #3 (Public Review):

      In this manuscript, the authors seek to improve the reproducibility and eliminate sources of bias in the analysis of single molecule colocalization fluorescence data. These types of data (i.e., CoSMoS data) have been obtained from a number of diverse biological systems and represent unique challenges for data analysis in comparison with smFRET. A key source of bias is what constitutes a binding event and if those events are colocalized or not with a surface-tethered molecule of interest. To solve these issues, the authors propose a Bayesian-based method in which each image is analyzed individually and locally around areas of interest (AOIs) identified from the surface tethered molecules. A strength of the research is that the approach eliminates many sources of bias (i.e., thresholding) in analysis, models realistic image features (noise), can be automated and carried out by novice users "hands-free", and returns a probability score for each event. The performance of the method is superb under a number of conditions and with varying levels of signal-to-noise. The analysis on a GPU is fairly quick (overnight) in comparison with by-hand analysis of the traces, which can take days or longer. Tapqir has the potential to be the go-to software package for analysis of single molecule colocalization data.

      The weaknesses of this work involve concerns about the approach and its usefulness to the single-molecule community at large as well as a lack of information about how users implement and use the Tapqir software. For the first item, there are a number of common scenarios encountered in colocalization analysis that may exclude use of Tapqir including use of CMOS rather than EM-CCD cameras, significant numbers of tethered molecules on the surface that are dark/non-fluorescent, a high density/overlapping of AOIs, and cases where event intensity information is critical (i.e., FRET detection or sequential binding and simultaneous occupancy of multiple fluorescent molecules at the same AOI). In its current form, the use of Tapqir may be limited to only certain scenarios with data acquired by certain types of instruments.

      In the following paragraphs, we address 1) concerns about application to CMOS, 2) dark target molecules, 3) overlapping AOIs, and 4) application to methods (e.g., smFRET) that require extraction of both colocalization and intensity data.

      1) Application to CMOS images.

      In the revised manuscript, we expanded the discussion of the Image likelihood component of our model to emphasize that 1) all data sets we analyze are experimental or simulated EMCCD images, 2) sCMOS images have the different noise characteristics alluded to by the reviewer, and 3) optimal sCMOS image analysis might require a modified model, possibly including the ability to use per-pixel calibration data as a prior as was done in super-resolution work (now cited) that uses sCMOS data.

      sCMOS cameras have in recent years become very popular for some kinds of single-molecule imaging (e.g., PALM/STORM or live-cell single-particle tracking). However, for the low-background/low-signal in vitro single-molecule TIRF that is our target application for the approach described in the manuscript, EMCCD is still preferable over sCMOS for many, but not all, imaging conditions (see https://andor.oxinst.com/learning/view/article/what-is-the-best-detector-for-single-molecule-studies). Thus, we think there will be plenty of interest in the approach we describe in the manuscript even if (which is not certain) the program functions better with EMCCD than with sCMOS images.

      Going forward to develop and test an sCMOS-targeted version of the model, as we have done for EMCCD, will require revised model and code, but will also necessitate accurately simulating sCMOS CoSMoS images, obtaining experimental sCMOS CoSMoS images reflecting a broad range of realistic experimental conditions, and using the simulated and experimental images to test the new model. These may well be useful things to do in the future but would be a considerable step beyond the scope of the present manuscript.

      2) Dark target molecules.

      In their detailed comments, the reviewers suggested a “no target molecules in sample” (NTIS) control instead of the “no fluorescent target molecules in control AOIs” (NFTICA) design that we illustrate in Fig. 1. Both types can be used as a Tapqir control dataset without any modification of the program or model. We have edited the Fig. 1 caption to explain that either type is acceptable. The reviewers are correct that, all else being equal, NTIS may be better if the target molecules are incompletely labeled. However, in practice experimenters usually know the fraction of molecules that are labeled and reduce the fluorescent target molecule surface density to hold the fraction of spots with two or more coincident target molecules (fluorescent or not) below a chosen threshold (typically 1 % or less), negating the possible advantage of NTIS (but at the expense of collecting less data per sample). On the other hand, NFTICA has the practical advantage that it is a control internal to the sample and is thus immune to problems caused by temporal or sample-to-sample variability (e.g., of surface properties).
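
      The density trade-off described above can be made concrete with a short calculation. This is only a sketch: it assumes that target molecules (fluorescent or not) land on the surface independently, so that the number of molecules per resolvable spot is Poisson distributed with mean `lam`. That assumption and the function names are introduced here for illustration and are not taken from the manuscript.

```python
import math

def multi_occupancy_fraction(lam):
    """Fraction of occupied spots that hold two or more molecules.

    Assumes (hypothetically) a Poisson number of molecules per spot
    with mean `lam`, counting labeled and unlabeled molecules alike.
    """
    p_occupied = -math.expm1(-lam)                    # P(N >= 1) = 1 - exp(-lam)
    p_two_or_more = p_occupied - lam * math.exp(-lam)  # P(N >= 2)
    return p_two_or_more / p_occupied

def max_mean_occupancy(threshold, tol=1e-9):
    """Largest Poisson mean whose multi-occupancy fraction stays at or
    below `threshold`, found by bisection (the fraction rises with lam)."""
    lo, hi = 1e-6, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if multi_occupancy_fraction(mid) <= threshold:
            lo = mid
        else:
            hi = mid
    return lo

if __name__ == "__main__":
    lam = max_mean_occupancy(0.01)  # the 1 % threshold mentioned in the text
    print(f"mean molecules per spot: {lam:.4f}")
    print(f"fraction of spots occupied: {-math.expm1(-lam):.4f}")
```

      Under this assumption, holding double occupancy to about 1 % requires that only a few percent of possible spot positions carry a target, which illustrates why lowering the density costs data per sample.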

      3) Overlapping AOIs.

      The method does not require non-overlapping AOIs – we used partially overlapping AOIs in the experimental data analyzed in the manuscript. Even though our analysis used larger AOI sizes (and hence, more overlap) than the spot-picker method, there was good agreement in the results, indicating that overlap does not cause any undue problems.

      In the revised manuscript Results section we added the following discussion of the effect of AOI size:

      “Since target-nonspecific spots are built into the cosmos model, there is no need to choose excessively small AOIs in an attempt to exclude non-specific spots from analysis. We found that reducing AOI size (from 14 x 14 to 6 x 6 pixels) did not appreciably affect analysis accuracy on simulated data (Table 2). In analysis of experimental data, smaller AOI sizes caused occasional changes in calculated p(specific) values reflecting apparent missed detection of a few spots (Figure 3–Figure supplement 4). Out of caution, we therefore used 14 x 14 pixel AOIs routinely, even though the larger AOIs somewhat reduced computation speed (Table 2 and Figure 3–Figure Supplement 4).”

      4) Methods requiring extraction of intensity data.

      The cosmos model we describe in the manuscript does not incorporate phenomena where the spot intensity at a single target changes, such as when there is FRET or multiple binders. As we point out in the final paragraph of the Discussion, more elaborate versions of the cosmos model that incorporate these phenomena could be developed. This would entail implementation, optimization, and validation with simulations and real data of the new model, which is beyond the scope of the present manuscript.

      Second, for adoption by non-expert users, information is missing in the main text about practical aspects of using the Tapqir software including a description of inputs/outputs, the GUI (I believe Tapqir runs at the command line but the output is in a GUI), and if Tapqir integrates the kinetic modeling or not.

      This information is given in the online Tapqir documentation. The kinetic analysis (as in Fig. 6) is a simple Python script that is run after Tapqir; the instructions for using it are included in the documentation. Tapqir runs can be initiated using either a CLI or GUI. Output can be viewed in Tensorboard, in a Tapqir GUI, and/or passed to a Jupyter notebook or Python script for further analysis, plotting, etc.

      Given that a competing approach has already been published by the Grunwald lab, it would be useful to compare these methods directly in both their accuracy, usefulness of the outputs, and calculation times.

      The reviewer does not explain why comparing with the Grunwald method would be preferable to the comparison with spot-picker that is included in the manuscript. To be sure there is no misunderstanding, the following are the same for the two methods and therefore are not reasons to prefer one or the other of these methods for the comparison in Fig. 6 (see also Discussion):

      1) Like Tapqir, both spot-picker and Grunwald methods analyze 2-D images, not integrated intensities.

      2) Unlike Tapqir, neither spot-picker nor Grunwald is fully objective; both require subjective selection of classification thresholds by the analyst in order to tune the algorithm performance for analysis of a particular dataset.

      3) Neither spot-picker nor Grunwald is a Bayesian method. “Bayesian” in the Grunwald paper title refers to their excellent work on a separate analytical method (described in the same paper) for evaluating the number of binder molecules colocalized with a target spot; this method is not relevant to a comparison with the model presented in our manuscript.

      4) Unlike Tapqir, neither spot-picker nor Grunwald estimates classification probabilities. Instead, they simply assign binary spot/no-spot classifications that do not convey to downstream analyses the extent of uncertainty in each classification.

      5) Neither spot-picker nor Grunwald has been validated previously using simulated image data. Consequently, the validity of image classification has not been established for either.

      The comparison of Fig. 6 and supplements does not claim to and is not intended to show that Tapqir is better than spot-picker for real experimental data; we cannot make such a claim for these or any other methods because we do not know the true kinetic process and rate constants that generated the experimental data. Instead, our comparison uses experimental data sets with a broad range of characteristics (Table 1) to show that Tapqir yields similar association rate constants to those produced by spot-picker even though the former is objective and automatic while the latter requires subjective tuning by an analyst. Our choice to use spot-picker over Grunwald for this comparison was dictated by the fact that among the co-authors we have an expert in the use of spot-picker, whereas we lack comparable expertise with Grunwald. We have little doubt that Grunwald would also produce results similar to the other methods in the hands of an expert user who is able to subjectively adjust classification parameters.

      Along these lines, the utility of calculating event probability statistics (Fig. 6A) is not well fleshed-out. This is a key distinguishing feature between Tapqir and methods previously published by Grunwald et al. In the case of Tapqir, the probability outputs are not used to their fullest in the determination of kinetic parameters. Rather a subjective probability threshold is chosen for what events to include. This may introduce bias and degrade the objective Tapqir pipeline used to identify these same events.

      This comment reflects a misunderstanding. No probability threshold is used in the kinetic analyses (Figs. 5 and 6). Instead, we make full use of the p(specific) probability output using the posterior sampling strategy that is illustrated in Fig. 5B and is described in the Results and in Materials and Methods. In the revised manuscript we modified the Results section to further emphasize this point.
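
      The general idea of carrying p(specific) into a kinetic estimate without a threshold can be sketched as follows. This is a minimal illustration and not the analysis script distributed with Tapqir: the function names, the synthetic p(specific) array, and the simple time-to-first-binding estimator (which ignores nonspecific binding and censoring) are all assumptions made for the example.

```python
import numpy as np

def sample_first_binding_times(p_specific, frame_time, rng):
    """Draw one binary trajectory per AOI from p(specific) and return the
    time to first binding for every AOI in which a binding event occurs.

    p_specific: array of shape (n_aoi, n_frames) with values in [0, 1].
    """
    # Bernoulli draw per AOI and frame: True means "specific spot present".
    bound = rng.random(p_specific.shape) < p_specific
    has_event = bound.any(axis=1)
    first_frame = bound.argmax(axis=1)  # index of the first True per AOI
    return (first_frame[has_event] + 1) * frame_time

def rate_samples(p_specific, frame_time, n_samples=200, seed=0):
    """Repeat the sampling to get a distribution of apparent rate estimates.

    Each estimate is 1 / mean(time to first binding), the maximum-likelihood
    rate for uncensored exponential waiting times (a deliberate simplification).
    """
    rng = np.random.default_rng(seed)
    rates = np.full(n_samples, np.nan)
    for i in range(n_samples):
        times = sample_first_binding_times(p_specific, frame_time, rng)
        if times.size > 0:
            rates[i] = 1.0 / times.mean()
    return rates

if __name__ == "__main__":
    # Synthetic p(specific): low before a random onset frame, high afterwards.
    rng = np.random.default_rng(1)
    n_aoi, n_frames = 50, 200
    p = np.full((n_aoi, n_frames), 0.02)
    for aoi, onset in enumerate(rng.integers(10, 100, size=n_aoi)):
        p[aoi, onset:] = 0.95
    rates = rate_samples(p, frame_time=1.0)
    low, high = np.percentile(rates, [2.5, 97.5])
    print(f"apparent rate: {rates.mean():.4f} per frame "
          f"(95% interval {low:.4f} to {high:.4f})")
```

      Because every frame contributes in proportion to its probability, uncertain classifications widen the spread of the sampled rates rather than being forced into a binary call.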

      Finally, the manuscript could be improved by clearly distinguishing between the fundamental approach of Bayesian image analysis from the Tapqir software that would be used to carry this out.

      We have revised the manuscript to adopt this recommendation. We now call the mathematical model “the cosmos model” and use “Tapqir” to refer to the software.

      A section devoted to describing the Tapqir interface and the inputs/outputs would be valuable. In the manuscript's current form, the lack of information on the interface along with the potential requirement for a GPU and need for the use of a relatively new programming language (Pyro) may hamper adoption and interest in colocalization methods by general audiences.

      Description of the interface and inputs/outputs is given in the online Tapqir documentation.

      Users do not need to own a GPU; they can instead run the program on a readily available cloud computing service. We have now added to Table 1 data showing that computation time on the Google Colab Pro cloud service is actually faster than that on our local GPU system. Colab Pro is inexpensive, readily accessible, and user friendly. We have added to the user manual a tutorial that shows how to run a sample data set using Tapqir on Colab.

      Users do not need any knowledge of Pyro to use Tapqir; Pyro is merely used internally in the coding of Tapqir.

    1. Author Response

      Evaluation Summary:

      In this manuscript, the authors provide promising results for the treatment of age-related sarcopenia with AdipoRon, a drug that targets the receptors for adiponectin. This is a well done study using an agonist (AdipoRon) involved in lipid and mitochondrial metabolism regulation to mitigate age related muscle loss in mice.

      Thank you for these positive comments – we are excited about the potential of this agent as a means to prevent and treat sarcopenia.

      Reviewer #1 (Public Review):

      Strengths and Accomplishments:

      1) This study tests an exciting potential intervention for sarcopenia, is well supported by prior literature investigating the effects of AdipoRon in age-related metabolic diseases, and now extends these data into aging.

      2) The study uses a diversity of techniques and systems (in vitro, in vivo, and ex vivo, chronic and acute treatments, young and old mice) to investigate the effects and relevant mechanisms of AdipoRon from the level of the whole organism into the muscle fibers and further into cellular signaling pathways.

      3) Similar cellular findings across species and cell types argues for strong conservation of the downstream effects of AdipoRon.

      4) This study provides coherent and conserved downstream molecular mechanisms (e.g. PGC-1a) and physiological changes (fiber types, mitochondrial function, insulin sensitivity) that should be readily translatable into mechanistically-designed non-human primate and human clinical studies.

      5) The presentation is well organized and logical, showing the effects of chronic AdipoRon treatment in old and then young male mice, followed by acute treatment in young mice and cells, moving from clinical to physiological to cellular/molecular findings.

      We thank the reviewer for these comments and are eager to continue our work in this area – the absence of a pharmacological agent to treat sarcopenia is a major gap in geriatric and rehabilitation medicine.

      Weaknesses and Limitations:

      1) The key mouse studies are underpowered, resulting in inconclusive rotarod data in the aged group and no behavioral testing in the young group. There is no other whole-organism functional data to support the clinical relevance of the ex vivo and postmortem physiological and molecular findings.

      We acknowledge that this study is a first step toward identifying an agent to prevent and treat sarcopenia. As this reviewer will appreciate, it can be challenging to overcome the natural increase in variance that occurs with age; the upshot is that the magnitude of effect needs to be quite large to find statistically significant outcomes when conducting comparisons among aged animals. We are careful to distinguish between statistically significant differences and those that are numerically different. We would argue that the absence of statistical significance does not mean that an observation is not informative or biologically meaningful. We would like to add that the data described here prompted a follow-up study in male and female mice where our plans include more extensive investigation of the functional, tissue-level, and systemic outcomes of AdipoRon treatment. In addition, we have applied for funding to support a nonhuman primate study on the impact of AdipoRon on sarcopenia, physical function, and metabolism, which we anticipate will have greater translational value.

      2) The in vitro cellular experiments use fibroblasts and immune cells, which supports an argument for broad conservation of AdipoRon mechanisms but does not directly support the primary muscle physiological findings.

      The purpose of including the cell culture experiments was to define the cellular response to AdipoRon and to determine whether indices such as gene expression that hinted at changes in metabolism were actually associated with functional differences in cellular energetics. In response to this comment and comments from the other reviewers, we have conducted experiments in differentiated C2C12 myotubes and confirm that the molecular signatures initially reported in the fibroblasts and primate PBMCs are also induced in the murine myotube culture model in response to AdipoRon. We fully acknowledge that differentiated C2C12, being “muscle like”, are a better model for interrogating the mechanisms of AdipoRon action as they relate to skeletal muscle specifically.

      3) Using different strains for young and old mice limits the interpretation of young vs old differences, which could be due to strain differences instead.

      We appreciate this point; at the time that this work was undertaken we were constrained by what was available. We would note that we see the same responses to AdipoRon in our ongoing study that was conducted in a C3B6F1 hybrid line and included mice of 6, 22, and 28 months of age. We are confident that the results described here are genuinely reflective of the actions of AdipoRon as a function of age of the treated mouse and not due to differences in mouse genetic background.

      Reviewer #2 (Public Review):

      This is a straightforward study, demonstrating utility of an agonist targeting energy metabolism pathways in aging mouse muscle. Rather than the treatment improving muscle function generally, it appears selective to muscles predominantly affected by age-related muscle loss (type II fibers). As the authors acknowledge, these results need to be replicated in females, as they only looked at male mice.

      We agree that it will be important to follow up this study using a cohort with males and females (see response to Reviewer 1 comment 1 above). Interestingly, our nonhuman primate studies have indicated that there is sex dimorphism in skeletal muscle aging. Males are bulkier and with greater gains there are apparently greater losses as the animals advance in age. The females show less of a decline in total muscle with age than males in terms of DEXA estimates of appendicular muscle bulk; however, at the cellular and molecular level it is clear that aging is having an impact. We have identified improvements in metabolic indices including fasting insulin and RER in mice treated with AdipoRon, and improvement in endurance treadmill performance that was significant in AdipoRon-treated males. That study, which includes more animals and two aging time points, will allow for detailed tissue and molecular level analysis so that we can identify which processes are sensitive to aging and to AdipoRon and track those pathways against physical performance at the individual level.

      Reviewer #3 (Public Review):

      In this manuscript the authors sought to investigate whether an adiponectin-receptor agonist could reduce the incidence of sarcopenia in aged mice. The authors provide compelling evidence that AdipoRon improves skeletal muscle function in aged mice, remodels muscle fibers, and appears to improve mitochondrial function at least in vitro.

      The authors provide multiple lines of evidence for the effects of AdipoRon, from live measurements of muscle function in aged rodents, to ex-vivo muscle activity assays, to in vitro assessment of mitochondrial function and activation of pathways involved in mitochondrial remodeling.

      The experiments complement one another very well, though it is unclear why two different strains were used for young and old mice, nor why the analysis was restricted to male animals only.

      Overall, the study details a promising intervention to restore muscle function in elderly individuals and identifies a druggable pathway that can be exploited for this goal.

      We thank the reviewer for these positive remarks. We acknowledge the limitation of looking only at male mice at this stage and have undertaken a follow up study that includes both sexes. The use of AdipoRon was inspired by our work in caloric restriction where adiponectin increase is a hallmark of CR in rodents and in nonhuman primates. At the time that this work began we had access to these aged male mice from NIA but the young mice came from an internal colony. We are eager to share these data in the hope that others will be interested in testing AdipoRon as a means to prevent or treat sarcopenia and would love to see studies in rehabilitative research being undertaken too.

    1. Author Response:

      Reviewer #2 (Public Review):

      The authors have developed a new method that allows for two-color STED imaging. They have applied this method to measure spine head size and PSD95 changes following exposure to an enriched environment.

      Strengths

      -The new method is well-described and seems to have considerably less crosstalk than previous attempts at in vivo two-color STED imaging. The analyses and controls of the method are compelling. I think that this method could be valuable for examining how different components of the synapse are changing in response to sensory or environmental changes.

      -The method is appropriate for measuring the size of PSD95 and spine head size in the enriched environment paradigm they use here. They find that in the short-term spine head size and PSD95 size are not always correlated.

      -They also find that there is less variability in the spine head size in animals in an enriched environment.

      Weaknesses

      -The authors use an enriched environment plasticity paradigm to showcase the method and measure spine head and PSD95 size and how they change over short periods of time. This particular biological study is not well-motivated and there is not a stated reason for studying the short-term (30-120 minutes) dynamics of PSD95 and spine head size, and their correlations. They also show that the variability in spine head size is decreased with the enriched environment, but do not show what the implications of that change would be from a biological point of view for synaptic dynamics or synaptic function.

      -The authors show that there are differences in the morphology of PSD95 between mice reared in enriched environments and those in control environments. While this quantification is done blindly by three different analysts, it is not done in a quantitative way. Also the authors do not show or explain the biological relevance of differences in the morphologies of PSD95, thus it is not clear what this measure means for synaptic plasticity or function.

      -The authors use a cranial window preparation, which is commonly used in the literature. However, it is not clear how long they wait to image the mice after the cranial window. Previous work from Xu et al. (PMID: 17417634) suggests that there is an increase in glial activation for a period of up to a month after surgery. The authors have not shown the degree of glial activation that follows after their surgeries and if they have not waited a month, there may be upregulation of microglia, which may alter synaptic stability (also demonstrated in the same paper). The authors have not discussed this point or the implications for their findings.

      We thank the reviewer for his/her valuable input.

      The time-scale we study is similar to what is known from structural changes after LTP and thus we wanted to study the same time scale in vivo. We revised the motivation and explained better the biological relevance of the observed changes. We absolutely agree with the reviewer on his/her concern for chronic imaging. However, we performed acute experiments and imaged directly after implanting the window in the same session. After imaging the mice were sacrificed.

      Reviewer #3 (Public Review):

      Wegner et al. use two-color STED to follow spines and their PSDs in layer 1 of mouse visual cortex over 2 hours under anesthesia. They compare mice that were kept in an enriched environment (EE) to control mice housed in standard laboratory cages. Spines in EE mice are larger and show larger fluctuations in size. PSDs in EE mice shrink during anesthesia and tend to change their nanostructure. Very importantly, changes in spine size were not driven by PSD size changes, or vice versa. Technologically, this is a landmark study, as tracking two different labeled structures in individual synapses at the nanoscale can obviously be applied to a large number of synaptic proteins and organelles, two at a time. Single-color superresolution microscopy is much less useful, as 'puncta in space', without cellular context, are difficult to interpret. This pioneering work is the first proof-of-concept of two-color in-vivo STED and of major importance for the community. Although stochastic processes seem to drive much of the synaptic dynamics under anesthesia, the environment shapes the spine size distribution and affects synaptic dynamics in a lasting fashion.

      One major comment:

      l.259: "These results suggest that Ctr housed mice undergo stronger morphological changes." This I find a bit misleading. What about: These results suggest that anesthesia induces stronger morphological changes in Ctr housed mice? Altogether, a discussion of the potential effects of anesthesia on spine/PSD dynamics is missing (see e.g. Yang et al., DOI: 10.1371/journal.pbio.3001146). The fact that there was weak correlation between spine head and PSD fluctuation could have something to do with the state of suppressed activity the system was in during imaging. Under conditions of intense processing of visual information, changes might have been more rapid and more tightly correlated. This could be mentioned as a perspective for the future - to visually stimulate the anesthetized animal.

      We agree with the reviewer that it should be mentioned here that the morphological change was observed under anesthesia. However, the sentence suggested by the reviewer is also a bit misleading since it suggests that the anesthesia has triggered the change. We think that anesthesia might affect the amplitude and dynamics of the observed changes but does not induce the change. Thus, we rephrased as follows: These results suggest that Ctr housed mice undergo stronger morphological changes under anesthesia.

      We absolutely agree about the potential influence of the anesthesia on the spine and PSD95 nanoplasticity and added the following comment. Of course, we would like to perform the measurement in the future also in awake mice and after visual stimulation.

      Added to discussion: However, it was shown that MMF anesthesia reduces spiking activity and mildly increases spine turnover in the hippocampus (Yang et al., 2021). Thus, the plasticity of spine heads and PSD95 assemblies might be different in the awake state and under intense processing of visual information.

    1. Author Response:

      Reviewer #2 (Public Review):

      The reported study includes an overall well-conducted and well-presented set of experiments. Ample data are reported and a clear and conclusive picture of the findings is portrayed.

      1. The Introduction falls short of providing the background needed for fully appreciating the current findings and their importance. The authors don't present the current understanding regarding the role of 4-vinylanisole in locusts (mostly their own work). Nor do they present the accepted knowledge of the control of sexual maturation in locusts (mostly several decades-old work). Moreover, the importance of reproductive synchrony in the life history of gregarious locusts, including its tentative roles in maintenance of the homogeneity and integrity of the swarm, in ensuring high density conditions for the next generation, and more, is also not adequately presented.

      We appreciate the reviewer’s helpful comments. In response, we have revised the Introduction to expand on the significance of reproductive synchrony in the ecological adaptation of gregarious locusts and on research progress in the control of sexual maturation in locusts. The revised text reads: “Depending on population density, locusts display striking phenotypic plasticity, with a cryptic solitarious phase and an active gregarious phase (Wang and Kang, 2014). Gregarious locusts, compared to solitarious conspecifics, show much higher synchrony in physiological and behavioral events, such as egg hatching and sexual maturation, as well as synchronous feeding and marching behaviors (Norris, 1954, Uvarov, 1977). Reproductive synchrony in gregarious locusts provides benefits for individuals in several aspects, such as a more favorable microenvironment, lower risk of predation, efficient foraging, as well as more encounters with mates; it therefore ensures high density conditions for the next generation and is essential for maintenance of the locust swarm (Beekman et al., 2008, Maeno et al., 2021). Some sort of vibratory stimulus, maternal microRNAs, and SNARE protein play important roles in the egg-hatching synchrony of gregarious locusts (Chen et al., 2015b, He et al., 2016, Nishide and Tanaka, 2016). It has been revealed that the presence of mature male adults has an accelerating effect on the synchrony of sexual maturation of immature male and female conspecifics in two locust species, Schistocerca gregaria and Locusta migratoria (Norris, 1952, Loher, 1961, Guo and Xia, 1964, Norris, 1964). The accelerating effects of several prominent volatiles released by gregarious mature males on male maturation have been demonstrated in the desert locust. Four volatile pheromones (benzaldehyde, veratrole, phenylacetonitrile, and 4-vinylveratrole) have significant stimulatory effects on sexual maturation of male adults, with phenylacetonitrile having the most pronounced effect (Mahamat et al., 1993, Assad et al., 1997). However, how conspecific interaction affects female sexual maturation remains unclear, and the pheromones that contribute to maturation synchrony of females have not been determined so far”. In the current study, we identify 4-vinylanisole as a key pheromone promoting sexual maturation synchrony by validating the roles of five gregarious male-abundant volatiles one by one, rather than by following up on our previous work on 4-VA. We have therefore elaborated in the Discussion on the multiple functions of 4-VA, as both an aggregation pheromone and a maturation-accelerating pheromone, in the formation and maintenance of locust swarms.

      2. Research on pheromonal signaling in locusts have traditionally focused on compounds with a putative role in density-dependent phase-specific behaviors. Hence, it is common to compare the response of crowd-reared vs. solitary locusts to applied chemicals. The challenge, however, is maintaining the density context, while attempting to conduct controlled similar experiments with locusts of the two phases (i.e. keeping the solitary phase locusts isolated, while the gregarious locusts must always be crowded). This is even more challenging when studying reproductive physiology. By the basic nature of the two phases, there can be a multitude of interacting factors (behavioral and/or physiological) affecting the much-desired reproductive synchronization in gregarious locusts, while such synchronization is not expected at all in solitary ones (it may even be claimed to have no fitness-related advantage).

      3. In general, the authors of the current report have dealt well with these challenges, taking extra care to conduct multiple controls and making an effort to specifically test all the possible factors. However, there are several points that raise some uncertainties. For example:

      o If I am not mistaken, females of both phases were included in the study only if already mated by day A+7 (LL355-357). While this is reasonable for gregarious locusts, it may not be suitable for the solitary locusts, imposing an undesired and unequal selection criterion.

      We thank the reviewer for the comments. We do not think the criterion (mated at PAE 6-7 days) causes significant bias in either gregarious or solitarious locusts. In fact, the requirement of mating before PAE 7 days is used to rule out effects on oviposition synchrony caused by differences in mating age among individuals. This criterion is applied only in the analysis of the first oviposition date. Given consistent mating times, oviposition consistency in gregarious female adults may largely reflect sexual maturation synchrony among individuals (Figure 1A). In the subsequent experiments, we concentrate on the regulation of sexual maturation and use only virgin females.

      o In the test of the effects of conspecifics interactions, 10 gregarious locusts provided stimulation to the tested gregarious female, while only one insect was the stimulating factor for the solitary female.

      Actually, we carried out two independent experiments to test the effects of conspecific interactions. For the comparison of female sexual maturation synchrony between typical gregarious and solitarious phases, the population densities were kept in the solitarious context (Figure 1D). For the locust emission treatments, ten solitarious locusts were used to ensure stimulation at the same density level (Figure 1F). Both experiments suggested that solitarious male adults had no effect on female sexual maturation.

      o It is not clear how egg pods were attributed to specific gregarious females (maintained in groups of 10)

      We thank the reviewer for the comments. To monitor the oviposition activity of each gregarious female in a group, locusts were individually marked, and their first oviposition times were determined by collecting egg pods every 4 hours each day after mating. Females that had newly laid eggs could be easily distinguished by a much thinner abdomen with white foam around the ovipositor. We have provided the method details in the revised manuscript.

      Overall, since the focus of this study is actually not on the comparison between the phases, it might have been beneficial to the readers if the focus was on the gregarious locusts only, with maybe a couple of experiments conducted on solitary insects and presented separately.

      We understand the reviewer’s concern. Actually, the aim of this study is to explore the mechanism underlying sexual maturation synchrony by comparing phase- and sex-dependent conspecific interactions in locusts. The reproductive synchrony of gregarious locusts, in both first oviposition time and sexual maturation, might not be apparent without comparison with solitarious locusts, although the mechanistic studies were mostly performed in gregarious locusts. Moreover, phase-dependent comparison of volatile contents helps us to screen candidate volatiles responsible for the acceleration of sexual maturation synchrony in females.

      4. Assuming that within a locust group there is overall agreement in the age of males and females, there seems to be a not-fully-explained mismatch between the age of max 4-VA release by males (linearly increasing with age) and the age of max effect in females (critical period at A+3-4)

      We appreciate the reviewer’s query. We have added discussion of the “mismatch” between the age-dependent release of 4-VA by males and the age of maximal effect in females (PAE 3-4 days). The added text reads: “We find that the release of 4-VA by gregarious males continuously increased after adult eclosion, with maximal 4-VA release at PAE 8 days. The age of maximal 4-VA production seems, on the surface, not to match the developmental stage at which females are sensitive to 4-VA (PAE 3-4 days). In insects, it is very common for males to mature earlier than females (Alonzo, 2013). In the locust, male adults also reach sexual maturity several days earlier than females. In a given locust population, individuals emerge as adults successively over a couple of days, not in a completely synchronous period. Therefore, the age-dependent increase in 4-VA release by gregarious male adults presents a persistent stimulus for less-developed young female adults, and thus maximizes synchronous maturation of female locusts, which could reduce male competition for mates”.

      5. Similar to the introduction, the discussion section also does not present comprehensive arguments regarding the importance of reproductive synchronization in female locusts. Points that could have been discussed include: females' oviposition disrupting migration, synchronization affecting sexual selection, accelerating intra-sex competition over mates as well as oviposition sites, and more.

      We appreciate the reviewer’s helpful suggestions. We have added discussion of this point following these suggestions. The added text reads: “Reproductive synchrony involves consistency in maturation, mating, and egg laying, among which sexual maturation synchrony serves as the most foundational step for oviposition uniformity (Hassanali et al., 2005). The extremely high energy cost of female reproduction could restrict migration to the pre-, post-, or inter-oviposition periods in locusts, and thus have crucial effects on the collective movement of local populations (Min et al., 2004). Given this, balanced timing of sexual maturation among female members is essential for the maintenance of locust swarms. We here demonstrated that young female adults reared with older gregarious male adults show faster and more synchronous sexual maturation in the migratory locust, supporting the accelerating role of crowding in the sexual maturation of females (Guo and Xia, 1964, Norris and Richards, 1964). Together with the previously reported accelerating effects of older gregarious male adults on the sexual maturation of immature males (Torto et al., 1994, Mahamat et al., 2000), young adults of both sexes living in gregarious conditions show more synchronous maturation than individuals reared in isolation. The consistent maturation in both sexes will greatly reduce intra- and inter-sex competition for mates and thus ensure reproductive synchrony in whole locust populations. We demonstrated that a single minor component (4-VA) of the volatiles abundantly released by gregarious male adults is sufficient to induce the maturation synchrony of female adults. By comparison, four volatiles (benzaldehyde, veratrole, phenylacetonitrile, and 4-vinylveratrole) showed stimulatory effects on male maturation (Mahamat et al., 2000). 
Thus, there might exist sex-dependent modes of action of maturation-accelerating pheromones: multi-component pheromones for males and a single active component for females, possibly due to different selective pressures on the two sexes in response to social interaction. Future work will test this hypothesis by determining whether 4-VA has maturation-accelerating effects on male adults in the migratory locust”.

      Reviewer #3 (Public Review):

      Strengths: Grouping behavior for marching, sexual maturation, swarming, oviposition and egg hatching in gregarious locusts is complex and is mediated by a combination of olfactory, tactile, and visual cues to ensure synchronous behavior. The authors show that only olfactory cues released by gregarious adult males mediate maturation synchrony of females. This finding is a confirmatory result of a well-established phenomenon for maturation synchrony in both sexes of adult locusts, although in this study, the authors focused on only females. Further, the authors validated their findings using gene editing techniques to show that maturation synchrony was diffused in Or35-/- mutant adult females but not in wild type females exposed to adult male volatiles, and they identified the individual component 4-vinylanisole, among five male-abundant volatiles, as promoting synchronous sexual maturation only in females 3-4 days post adult eclosion (PAE). The use of molecular and single sensillum recordings, followed by physiological experiments focused on the interaction between this specific adult pheromone and juvenile hormone to validate the behavioral results found for females, adds scientific value to the study.

      Weaknesses: Firstly, synchronous and accelerated sexual maturation of young adults by older pheromone-producing ones, is a primer effect driven by males and this facilitates 'integration and cohesion' of both sexes of adults. In my view, the fact that this study focused on only females but not on both sexes, weakens the contribution of the study towards increased understanding of the biology/ecology of locusts.

      We accept the reviewer’s comment that synchronous and accelerated sexual maturation of young adults by older pheromone-producing ones occurs in both sexes. In fact, early studies have reported that mature males can accelerate the sexual maturation of young males through several candidate compounds (Mahamat et al., 1993, Chemoecology; Mahamat et al., 2000, International Journal of Tropical Insect Science). However, the effects of conspecific interaction on the sexual maturation of females are rarely reported. Moreover, distinct volatiles that can accelerate female sexual maturation had not been characterized before this work. Therefore, we focus on female sexual maturation synchrony in the current study. A comparison of the regulatory mechanisms underlying sexual maturation synchrony in males and females is discussed in the revised manuscript.

      There are also weaknesses in the methods, such as focusing on only the five-abundant male volatiles based on heat maps. Basically, the decision as to which components in adult male volatiles may be contributing to sexual maturation should be made by antennae of different ages of PAE females and males to avoid selecting only abundant compounds based on artificial intelligence (AI). Since most studies in this subject area have demonstrated that there is no direct correlation between volatile abundance and detection at the periphery or central nervous systems of an insect, I believe that the authors will agree with me that often some of the minor volatile components tend to contribute more to the chemical ecology of an insect than the more abundant components. Without testing minor components identified in male volatiles as a blend or individually, as additional controls to increase the robustness of the study, I am not convinced that the authors have fully achieved their aim in identifying a male-produced volatile that promotes sexual maturation in females.

      We agree with the reviewer that the activities of volatiles are not always determined by their absolute contents. In fact, in our work, the selection of candidate compounds affecting female sexual maturation did not rely on the absolute content of these volatiles, but was mainly based on comparative analysis of their relative contents between gregarious and solitarious male adults, because only volatiles from gregarious male adults could accelerate the sexual maturation of females (Figure 1C-F). In the revision process, given that the volatiles released by gregarious males, rather than by gregarious females or solitarious males, have accelerating effects on female sexual maturation, we performed further comparative analysis of volatile contents among these three groups (G-males, G-females, and S-males). Compared to the volatiles released by G-females and S-males, only five volatiles display significantly higher emission in G-males (PAN, guaiacol, 4-VA, veratrole, and anisole). The roles of the five candidate volatiles in female sexual maturation were individually validated by removing the volatiles from the stimulation blend one at a time. The results showed that only the omission of 4-VA from the blend abolished the accelerating effect on sexual maturation synchrony of gregarious females (Figure 2B). Based on these findings, we inferred that 4-VA plays a major role in promoting female sexual maturation synchrony.

      JH experiments- My main concern is the lack of proper controls to fully investigate the interactive effect of the male-produced pheromone promoting sexual maturation and juvenile hormone production. JH titers were not measured in females exposed to the other male-abundant compounds including PAN, guaiacol, veratrole and anisole or blend/individual minor components.

      We understand the reviewer’s query. In fact, the potential role of the JH pathway was first inferred from the RNA-seq analysis of CC-CA, which showed that the expression levels of JH metabolism-related genes were significantly affected by 4-VA treatment at PAE 3-4 days. The JH titer after 4-VA treatment was then measured to support the involvement of JH in 4-VA-accelerated sexual maturation in female adults. Since the other male-abundant compounds (PAN, guaiacol, veratrole, and anisole) were excluded because omitting any one of these four volatiles from the blend did not abolish the accelerating effect (Figure 2B), we do not think it is necessary to measure their effects on JH titers in females.

      Another notable weakness is the 'JH Rescue Experiment'. The authors did not inhibit JH synthesis in the corpora allata (allalectomized locusts) in treated locusts before injecting the JH-analog methoprene to accelerate maturation and reproduction in females.

      We thank the reviewer for the comments. The JH rescue experiments in Figure 4D-F were performed in Or35 mutant females, which showed lower JH levels and a lower sexual maturation rate. Thus, the JH analog was applied to Or35^-/- females to test whether activation of the JH pathway could restore the sexual maturation rate and Vg expression. To provide additional evidence, we performed additional rescue experiments in WT females by inhibiting JH synthesis using Precocene (PI) before JH treatment. The results showed that PI treatment significantly reduced the sexual maturation rate and Vg expression in 4-VA-exposed WT females, whereas JH treatment after PI application clearly restored the sexual maturation rate and Vg expression (Figure 4G-I).

    1. Author Response:

      Reviewer #1 (Public Review):

      In this report, Shekhar et al. have profiled developing retinal ganglion cells from embryonic and postnatal mouse retina to explore the diversification of this class of neurons into specific subtypes. In mature retina, scRNAseq and other methods have defined approximately 45 different subtypes of RGCs, and the authors ask whether these arise from a common postmitotic precursor, or many distinct subtypes of precursors. The overall message is that subtype diversification arises as a "gradual, asynchronous fate restriction of postmitotic multipotential precursors". The authors find that over time, clusters of cells become "decoupled" as they split into subclusters. This process of fate decoupling is associated with changes in the expression of specific transcription factors. This allows them to predict both lineage relationships among RGC subtypes and the time during development when these specification events occur. Although this conclusion is based almost entirely on a computational analysis of the relationships among cells sampled at discrete times, the evidence presented supports the overall conclusion. Future experimental validation of the proposed lineage relationships of RGC subtypes will be needed, but this report clearly outlines the overall pattern of diversification in this cell class.

      We thank the reviewer for their thoughtful assessment of our study.

      Reviewer #2 (Public Review):

      The manuscript "Diversification of multipotential postmitotic mouse retinal ganglion cell precursors into discrete types" by Shekhar and colleagues represents an in-depth analysis of an additional transcriptomic dataset of retinal single cells. It explores the progression of retinal ganglion cell diversity during development and describes some aspects of fate acquisition in these postmitotic neurons. Altogether the findings provide another resource on which the neural development community will be able to generate new hypotheses in the field of retinal ganglion cell differentiation. A key point that is made by the authors regards the progression of the number of ganglion cell types in the mouse retina, i.e., how and when neuronal "classes diversify into subclasses and types" (also p. 125). In particular, the authors would like to address whether postmitotic neurons follow either a predetermination or a stepwise progression (Fig. 2a). This is indeed a fascinating question, and the analysis, including the one based on the Waddington-OT method, is conceptually interesting.

      Comments and questions:

      Is the transcriptomic diversity, based on highly variable genes (the number of which is not detailed in the study), a robust proxy to assess cell types? One could argue that early on, predetermined cell types are specified by a small set of determinants, both at the proteomic and transcriptomic level, and that it takes several days or weeks to generate the cascade that allows the detection of transcriptional diversity at the level of >100 gene expression levels.

      We had tested the dependence of our results on the number of highly variable genes (HVGs) used. This analysis, shown in Figure 2h, demonstrates that results are robust over the range tested – 1244-3003 total HVGs. Since the analysis in the paper employs 2800 HVGs (~800-1500 at each stage), we are confident that we are in comfortable excess of the number at which we would need to worry. We have expanded the discussion to avoid confusion on this point. We also address the possibility that a small set of determinants is sufficient to define cell state in a transcriptomic study. This is a common argument, but we believe it is a tenuous one. We believe that the only way a small number of genes can truly define cell state is if they are expressed at very high levels. If these are expressed at high levels, they should be detected in our data and should drive the clustering. If they are expressed at extremely low levels, then given the nature of molecular fluctuations in cells, they cannot be expected to serve as a stable scaffold for differentiation. Indeed, a small set of determinants (usually transcription factors) may be necessary to specify a cell type. However, sufficiency of specification usually requires the expression of a much larger number of downstream regulators.

      Since there are many RGC subsets (45) that share a great number of their gene expression, is it possible that a given RGC could transition from one subset to another between P5 and P56? Or even responding to a state linked to sustained activity? Was this possibility tested in the model?

      We cannot address the possibility that cells swap types postnatally so that the cells comprising type X at P5 are not the same ones that comprise type X at P56. It does seem pretty unlikely, as the cell types are well-separated in transcriptional space (~250 DE genes on average). Regarding activity, we have made some initial tests by preventing visually evoked activity from birth to P56 in three different ways (dark-rearing and two mutant lines). We find no statistically significant effect on diversification. These results are currently being prepared for publication.

      The authors state that early during development there is less diversity than later. This statement seems obvious, but how much less? Can this be due to differences in differentiation stage? At E16, RGCs are a mix of cells born from E11 to E16, with the latter barely located in the GCL. Does this tend to show a continuum that is probably lost when the analysis is performed on cells isolated a long time after they were born (postnatal stages)? Alternatively, would it be possible to compare RGCs that have been labeled with birth-dating methods?

      Regarding the amount of diversification, we quantified this using the Rao diversity index (Figure 2h), which suggests an overall 2-fold increase in transcriptional diversity at P56 compared to the early stages. The continuum likely arises because cells at early stages are close to the precursor stage and not very differentiated. Regarding combining RNA-seq with birthdating, although elegant methods now make this combination possible, it falls beyond the scope of this study.
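
      As a rough illustration of how a Rao-type diversity index behaves, the sketch below computes Rao's quadratic entropy, Q = sum over i, j of d_ij * p_i * p_j, for toy clusters. The Euclidean distance between cluster centroids, the function name, and all numbers are assumptions chosen for illustration; the response does not specify the distance measure or normalization used for Figure 2h.

```python
from math import sqrt, isclose

def rao_diversity(freqs, centroids):
    """Rao's quadratic entropy: Q = sum_i sum_j d_ij * p_i * p_j.

    freqs     -- cluster sizes (normalized to proportions internally)
    centroids -- one coordinate tuple per cluster (hypothetical embedding)
    The distance d_ij is Euclidean here, which is an assumption.
    """
    total = sum(freqs)
    p = [f / total for f in freqs]
    q = 0.0
    for i, ci in enumerate(centroids):
        for j, cj in enumerate(centroids):
            d_ij = sqrt(sum((a - b) ** 2 for a, b in zip(ci, cj)))
            q += d_ij * p[i] * p[j]
    return q

# Toy "early" stage: two equally sized, nearby clusters.
early = rao_diversity([50, 50], [(0.0, 0.0), (1.0, 0.0)])

# Toy "late" stage: four equally sized, well-separated clusters.
late = rao_diversity(
    [25, 25, 25, 25],
    [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)],
)

print(early, late)  # the late value is larger than the early value
```

      Because the index weights pairwise dissimilarity by cluster proportions, it increases both when clusters move apart and when cells are spread over more clusters.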

      Comparing data produced by different methods can be challenging. Here the authors compared transcriptomic diversity between embryonic datasets produced with 10X Genomics (E13 to P0) and postnatal P5 data that were produced using a different procedure (Drop-seq). Is it possible to verify that the differences observed are not due to the different methods?

      It is correct that most of the P5 data was produced using Drop-seq, but that dataset also includes transcriptomes obtained by the 10X method. The relative frequencies of RGC clusters and the average gene expression values obtained using the two methods were highly correlated (Reviewer Fig. 1). This is now pointed out in the “Methods.”

      Reviewer Fig. 1. Comparison between the relative frequency of types (left) and the average gene expression levels (right) at P5 between 10X data (y-axis) and Drop-seq data (x-axis). R corresponds to the Pearson correlation coefficient. The axes are plotted in the logarithmic scale.
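
      The kind of comparison summarized in Reviewer Fig. 1 can be sketched as a Pearson correlation between per-cluster values from the two platforms. The frequencies below are invented, and computing R on log-transformed values is an assumption: the legend states only that the axes are logarithmic, not how R was calculated.

```python
from math import log10, sqrt, isclose

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical relative frequencies of four clusters on each platform.
dropseq_freq = [0.10, 0.05, 0.02, 0.01]
tenx_freq = [0.20, 0.10, 0.04, 0.02]  # toy case: exactly proportional

# Correlate in log space (an assumption, see the note above).
r = pearson_r(
    [log10(v) for v in dropseq_freq],
    [log10(v) for v in tenx_freq],
)
print(round(r, 6))
```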

      It might be important to verify the conclusion that diversity is lower at E13 vs P5, given that three times fewer cells (5900 vs 180000) were analyzed at the early stage (BrdU, EdU, CFSE...). A simple downsampling prior to the analysis may help.

      Although we collected different numbers of cells at different ages, we noted in the text that these differences do not influence the number of clusters. Regarding P5 specifically, Rheaume et al. (whom we now discuss) obtained very similar results to ours with only 6000 cells (3x lower).
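
      For readers unfamiliar with the control the reviewer proposes, downsampling amounts to drawing a random subset of cells from the larger dataset so that both stages contribute equal numbers before clustering. The sketch below is illustrative only: the function name, the seed, and the cell counts (rounded from the approximate figures mentioned in this exchange) are assumptions, and the response does not state that such a step was performed.

```python
import random

def downsample(cell_ids, n, seed=0):
    """Return n cell IDs drawn without replacement (all of them if n is too large)."""
    if n >= len(cell_ids):
        return list(cell_ids)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(cell_ids, n)

# Hypothetical cell barcodes for a large and a small dataset.
large_dataset = [f"cell_{i}" for i in range(18000)]
small_dataset = [f"cell_{i}" for i in range(6000)]

matched = downsample(large_dataset, len(small_dataset))
print(len(matched))
```

      Clustering would then be rerun on the matched subset to check whether the number of clusters changes with cell number.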

      Ipsilateral RGC: It is striking that the DEGs between C-RGC and I-RGC reflect a strong bias, with cells scored as "ipsi" being immature RGCs while the others ("contra") are much more mature. This bias comes from the way ipsilateral RGCs were "inferred" using non-specific markers. Can the authors repeat the analysis by identifying RGCs using more robust markers (e.g., EphB1)? Would it be possible to select I-RGCs and C-RGCs that share the same level of differentiation? Previous studies already identified an I-RGC signature using more specific set-ups (Wang et al., 2016, from retrogradely labelled RGCs; Lo Giudice et al., 2019, with an I-RGC-specific transgenic mouse).

      We are not sure how the reviewer concludes that the putative I-RGCs are more immature than the putative C-RGCs. As discussed earlier, insofar as expression levels of pan-RGC markers are indicative of maturational stage, we found no evidence that clustering is driven by maturation gradients. Thus, we expect our putative I-RGCs and C-RGCs not to differ in differentiation state. Following the reviewer’s suggestion, we now include EphB1 (Ephb1) in our I-RGC signature. The impact of replacing Igfbp5 with Ephb1 on the inferred proportion of I-RGCs within each terminal type was minimal (Reviewer Fig. 2). We would like to note that to assemble our I-RGC/C-RGC signatures we relied on data presented in Wang et al. (2016). Outside of well-established markers (e.g., Zic2 and Isl2), we chose the RNA-seq hits in Wang et al. that had been validated histologically in the same paper or that are correlated with Zic2 expression in our data. This nominated Igfbp5, Zic1, Fgf12, and Igf1.

      Reviewer Fig. 2. Comparison of inferred I-RGC frequency within each terminal type (points) using two I-RGC signatures reported in the paper. For the y-axis we used Zic2 and EphB1.

      It would be important to discuss how their findings differ from those of others (including Rheaume et al., 2018). To make a strong point, I-RGCs should be isolated at a stage of final maturation (P5?) and using retrograde labelling, which is a robust method to ensure the ipsilateral identity of postnatal RGCs.

      We cite Rheaume et al. in several places. In fact, there is good transcriptional correspondence between our dataset and theirs (Figure S1i), despite the differences in the number of cells profiled (~6000 vs ~18000) and technologies (10X vs. Drop-seq/10X). We now mention this in the text. Note also that we had compared our P56 data with Rheaume et al.’s P5 data in an earlier publication (Tran et al., 2019) and observed a similar tight correspondence between clusters. Zic1 is expressed in I-RGCs (Wang et al., 2016) at early stages, and in our dataset its expression at E13 and E14 is similar to that of Zic2 (Supplementary Fig. 8); postnatally, however, it marks W3B RGCs (Tran et al., 2019), many of which project contralaterally (Kim et al., J. Neurosci. 2010). Regarding retrograde labeling, as noted above, additional experiments would take a prohibitively long time (up to a year) to complete.

      It is unclear how well Zic1 and Igf1 can be used as I-RGC markers. Can the authors specify how specific to I-RGCs they are? Have they been confirmed as markers using retrograde labelling experiments?

      We have relied on previous work, primarily from the Mason lab, to choose I-RGC and C-RGC markers. Igf1 is a C-RGC marker that is expressed in a complementary fashion with Igfbp5, an I-RGC marker as noted in Wang et al, 2016. They also perform ISH to show that Igf1 is not expressed in the VT crescent, while Igfbp5 is (see Fig. 5 in Wang et al., 2016). Similarly, Zic1 is also cited in Wang et al. as an RNA-seq hit for I-RGCs. Although Zic1 was not validated using ISH, we found its expression pattern to be highly correlated with Zic2 at E13 (Supplementary Fig. 8c).

      The enrichment procedure may deplete the RGC subpopulation that express low levels of Thy1 or L1CAM. A comparison on that point could be done with the other datasets analysed in the study.

      We presume the reviewer is referring to the data of Lo Giudice and Clark/Blackshaw, which we show in comparison to ours in Figure S1. In both of those studies, all retinal cells were analyzed, whereas we enriched RGCs. As noted in the text, RGCs comprise a very small fraction of all retinal cells, so Lo Giudice and Clark/Blackshaw lacked the resolution to resolve RGC diversity at later time points. Indeed, there is no whole retina dataset available in which RGCs are numerous enough for comprehensive subtyping. Our approach to this issue was to collect RGCs with both Thy1 and L1 at E13, E14, E16 and P0, with the idea that the markers might have complementary strengths and weaknesses. In fact, at each age, all clusters are present in both collection types, although frequencies vary. This concordance supports the idea that neither marker excludes particular types. We now stress this point in the Results and in the Supplementary Fig. 2 legend.

      In supplemental Fig. S1e: why do cells from the "Clark" dataset cluster only on the right side of the UMAP, while the others are more evenly distributed?

      Actually, both the Clark et al. and Lo Giudice et al. datasets are predominantly clustered on the right side of the UMAP. This reflects the methodological difference noted above: they profiled the whole retina, whereas we isolated RGCs. Thus, their datasets contain a much higher abundance of RPCs and non-neurogenic precursors compared to ours. The right clusters represent RPCs due to their expression of Fgf15 and other markers, while the left clusters represent RGCs based on their expression of Nefl. Indeed, a main reason for including these plots was to illustrate the relative abundance of RGCs in our data (also see Supplementary Fig. S1h).

      What could explain that CD90 and L1CAM population are intermingled at E14, distinct at E16, and then more mixed at P0?

      We believe the reviewer is referring to Supplementary Figs. S2a-c. Given the temporal expression level changes in Thy1 and L1cam (Supplementary Fig. S1c) in RGCs, a likely possibility is that they enrich RGC precursor subsets at different relative frequencies. We now note this in the Supplementary Fig. 2 legend.

      On Fig. 6: the E13 RGCs seem to be segregated into early-born RGCs expressing Eomes and later-born RGCs expressing Neurod2. Thus, fate coupling with P5 seems to suggest that the Eomes population at P5 may have been generated first, and the Neurod2 population later. Is that possible?

      That the Eomes RGCs are specified before Neurod2 RGCs is one of our conclusions from the fate decoupling analysis (Figures 6f-h). Whether this is because the former arise from early born cells and the latter arise from later born cells is not clear. There is disagreement in the literature on whether ipRGCs are born at a different time than other RGCs, so we prefer not to make a comment.

      Methods: The Methods section is extensive, and yet it is presented in a rather complex manner so that it is difficult to understand for a broad audience. It would be valuable if the authors could simplify or better explain some parts (the WOT section in particular).

      We believe that the sections on animals, molecular biology and histology are quite straightforward, but agree that the sections describing the computational analysis are hard going. We have modified them in several places as requested. As regards better explanation of the WOT, we now precede that section with an “overview” as a way of making it easier to follow. (We had already included an overview of the clustering procedures.) We have also provided further detail on some of the reviewer’s subsequent questions on this section, including the use of HVGs, the Classifier, and the strategy for inferring I-RGCs (see below). Perhaps most important, we have worked to make the “Results” and “Discussion” sections accessible to a broad audience.

      *Highly variable genes (HVG) used for clustering and dimensionality reduction: how many of them and what are they? Are they the same used for each stage?

      Since clustering was performed at each stage independently, we determined HVGs at each stage separately using a statistical method introduced in one of our previous studies (Pandey et al., Current Biology, 2018). The total numbers of HVGs at each stage were as follows: E13, N=1094; E14, N=834; E16, N=822; P0, N=881; P5, N=1105; P56, N=1510.

      We note that these are not necessarily the same at each stage due to the temporal variation in gene expression. Together these correspond to 2854 unique genes (union of all HVGs). The WOT analysis was done using this full set.
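
      The relationship between the per-stage HVG counts and the size of their union can be made concrete with a small calculation. The counts and the union size are those quoted in this response; the toy gene sets and their names are hypothetical and serve only to show how a union over stage-specific sets is formed.

```python
# Per-stage HVG counts and union size as quoted in the response.
reported_counts = {
    "E13": 1094, "E14": 834, "E16": 822,
    "P0": 881, "P5": 1105, "P56": 1510,
}
reported_union_size = 2854

# Summing per-stage counts tallies a gene once for every stage in which
# it is an HVG, so the sum exceeds the number of unique genes.
total_with_repeats = sum(reported_counts.values())
repeat_selections = total_with_repeats - reported_union_size
print(total_with_repeats, repeat_selections)

# Toy example with hypothetical gene names: the union keeps each gene once.
toy_hvgs = {
    "E13": {"GeneA", "GeneB", "GeneC"},
    "E14": {"GeneB", "GeneC", "GeneD"},
    "P56": {"GeneC", "GeneE"},
}
toy_union = set().union(*toy_hvgs.values())
print(sorted(toy_union))
```

      The gap between the summed counts and the union reflects genes that are highly variable at more than one stage.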

      *In the methods p9: "The common features G = GR ∩ GT are used to train a third classifier ClassR on the reference atlas AR. This ensures that inferred transcriptomic correspondences are based on "core" gene expression programs that underlie cell type identity rather than maturation-associated genes." Could the authors explain the relevance of using a third model and, more importantly, are there any genes eliminated through this procedure that could be important in driving the diversification process? If so, would it be possible to estimate their number and relative impact?

      The rationale for this was as follows. Our goal is to map cells from one time point to a type at another time point. The naïve way to do this would be to use a classifier trained entirely at either of the time points. However, the features of such a classifier are likely to include genes that are not expressed at the earlier time point, and the classifier is therefore likely to generate spurious mappings (since the sets of cluster-specific genes are not identical). Therefore, we sought to train a classifier using genes that are part of conserved transcriptional signatures at both time points, which corresponds to the third model.

      When this filtering was not performed, the temporal correspondences in the supervised classification model were less specific than those reported. In particular, ARI values dropped by about 15% on average. The simple reason for this is that a cluster-specific gene at E13 (for example) may no longer be expressed at E14, and vice versa. Thus, by restricting the features to a common set of cluster-specific genes, we obtained the “best possible” transcriptomic correspondences between clusters at consecutive time points. We note that the correspondences obtained in this way (Figure 3) were recovered through WOT when the results of the latter were collapsed at the cluster level (Supplementary Fig. 5).
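      The feature restriction described above can be sketched as follows. This is only an illustration under stated assumptions: the gene names, the expression values, and the helper functions `common_features` and `restrict` are hypothetical and are not taken from the authors' code.

```python
# Sketch of restricting classifier features to genes that are cluster-specific
# at BOTH time points (G = G_R ∩ G_T). All names and values are hypothetical.

def common_features(genes_ref, genes_test):
    """Return the shared cluster-specific genes in a stable (sorted) order."""
    return sorted(set(genes_ref) & set(genes_test))

def restrict(expression, genes, keep):
    """Keep only the columns of `expression` whose gene name is in `keep`."""
    idx = [genes.index(g) for g in keep]
    return [[row[i] for i in idx] for row in expression]

genes = ["GeneA", "GeneB", "GeneC", "GeneD"]   # column order of the matrix
g_ref = ["GeneA", "GeneB", "GeneC"]            # cluster-specific at one stage
g_test = ["GeneB", "GeneC", "GeneD"]           # cluster-specific at the other

# Two toy cells (rows) by four genes (columns).
cells = [[1.0, 0.0, 2.0, 5.0],
         [0.0, 3.0, 1.0, 0.0]]

shared = common_features(g_ref, g_test)
reduced = restrict(cells, genes, shared)
print(shared)
print(reduced)
```

      A classifier trained on `reduced` would then see only genes that are informative at both time points, which is the intent of the third model.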

      *Methods page 15: Inference of ipsilaterally-projecting RGC types. Wouldn't it be more valuable to consider more markers to distinguish RGC precursors?

      As indicated before, we used I-RGC genes and C-RGC genes reported in Wang et al., 2016 (Table 2), in addition to the well-known markers Zic2 and Isl2. Here, we prioritized genes that had been histologically validated (Figs. 4 and 5) and that were expressed in our data (Sema3e and Tbx20 were not considered as these were undetectable at E13 in our data). Following the reviewer’s earlier suggestion, we also noted that including Ephb1 in our signature minimally impacts the results.

      Discussion: *Is there some plasticity that allows the RGC subgroups to switch over time? (If we were to record the transcriptome of the same cell over time, would one observe that the cell belongs to another cluster / subgroup?)

      One can only speculate. Other than long-term in vivo imaging combined with vital type-specific markers we know of no way to experimentally address the possibility that cells swap types postnatally so that the cells comprising type x at P5 are not the same ones that comprise type x at P56. It does seem pretty unlikely though.

      *While the data appears technically rigorous, and the number of cells sequenced very high, the results seem redundant with several prior studies and the discrepancies are not sufficiently discussed.

      We are confused by this point, since the reviewer does not cite the papers to which s/he refers. To our knowledge there is no study at present that has described RGC diversification, so it is not clear what would be discrepant.

    1. Author Response:

      Joint Public Review:

      A highly robust result when investigating how neural population activity is impacted by performance in a task is that the trial-to-trial correlations (noise correlations) between neurons are reduced as performance increases. However, the theoretical and experimental literature so far has failed to account for this robust link, since reduced noise correlations do not systematically contribute to improved availability or transmission of information (often measured using decoding of stimulus identity). This paper sets out to address this discrepancy by proposing that the key to linking noise correlations to decoding, and thus bridging the gap with performance, is to rethink the decoders we use: instead of decoders optimized to the specific task imposed on the animal on any given trial (A vs B / B vs C / A vs C), they hypothesize that we should favor a decoder optimized for a general readout of stimulus properties (A vs B vs C).

      To test this hypothesis, the authors use a combination of quantitative data analysis and mechanistic network modeling. Data were recorded from neuronal populations in area V4 of two monkeys trained to perform an orientation change detection task, where the magnitude of orientation change could vary across trials, and the change could happen at cued (attended) or uncued (unattended) locations in the visual field. The model, which extends previous work by the authors, reproduces many basic features of the data, and both the model and data offer support for the hypothesis.

      The reviewers agreed that this is a potentially important contribution, that addresses a widely observed, but puzzling, relation between perceptual performance and noise correlations. The clarity of the hypothesis, and the combination of data analysis and computational modelling are two essential strengths of the paper.

      Overall this paper exhibits a new factor to be taken into account when analysing neural data : the choice of decoder and in particular how general or specific the decoder is. The fact that the generality of the decoder sheds light on the much debated question of noise correlations underscores its importance. The paper therefore opens multiple avenues for future research to probe this new idea, in particular for tasks with multiple stimuli dimensions.

      Nonetheless, as detailed below, the reviewers believe the manuscript's clarity could be further improved on several points, and some additional analysis of the data would provide a more straightforward test of the hypothesis.

      1. It would be important to verify that the model reproduces the correlation between noise and signal correlations since this is really a key argument leading to the authors' hypothesis.

      We have incorporated this verification of the model into the manuscript, as referred to below in the Results:

      “Importantly, this model reproduces the correlation between noise and signal correlations (Figure 2–figure supplement 1) observed in electrophysiological data (Cohen & Maunsell, 2009; Cohen & Kohn, 2011). This correlation between the shared noise and the shared tuning is a key component of the general decoder hypothesis. We observed this strong relationship between noise and signal correlations in our recorded neurons (Figure 2–figure supplement 1A) as well as in our modeled data (Figure 2–figure supplement 1B). Using this model, we were able to measure the relationship between noise and signal correlations for varying strengths of attentional modulation. Consistent with the predictions of the general decoder hypothesis, attention weakened the relationship between noise and signal correlations (Figure 2–figure supplement 1C).”

      The new figure is as below:

      Figure 2–figure supplement 1. The model reproduces the relationship between noise and signal correlations that is key to the general decoder hypothesis. (A) As previously observed in electrophysiological data (Cohen & Maunsell, 2009; Cohen & Kohn, 2011), we observe a strong relationship between noise and signal correlations. During additional recordings collected during most recording sessions (for Monkey 1 illustrated here, n = 37 days with additional recordings), the monkey was rewarded for passively fixating the center of the monitor while Gabors with randomly interleaved orientations were flashed at the receptive field location (‘Stim 2’ location in Figure 1C). The presented orientations spanned the full range of stimulus orientations (12 equally spaced orientations from 0 to 330 degrees). We calculated the signal correlation for each pair of units based on their mean responses to each of the 12 orientations. We define the noise correlation for each pair of units as the average noise correlation for each orientation. The plot depicts signal correlation as a function of noise correlation across all recording sessions, binned into 8 equally sized sets of unit pairs. Error bars represent SEM. (B) The model reproduces the relationship between noise and signal correlations. Signal correlation is plotted as a function of noise correlation, binned into 20 equally sized sets of unit pairs (n = 2000 neurons), for each attentional modulation strength (green: least attended; yellow: most attended). The results were averaged over 50 tested orientations. (C) The slope of the relationship between noise and signal correlations (y-axis) decreases with increasing attentional modulation (x-axis). This suggests that noise is less aligned with signal correlation with increasing attentional modulation.
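      The two quantities defined in this legend can be sketched for a single pair of units as below. This is a minimal illustration that assumes each unit's data are stored as one list of trial responses per orientation; the toy responses, the data layout, and the function names are hypothetical and are not the authors' analysis code.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def signal_correlation(resp_a, resp_b):
    """Correlate the two units' mean responses across orientations (tuning)."""
    mean_a = [sum(trials) / len(trials) for trials in resp_a]
    mean_b = [sum(trials) / len(trials) for trials in resp_b]
    return pearson(mean_a, mean_b)

def noise_correlation(resp_a, resp_b):
    """Average, over orientations, of the trial-to-trial correlation."""
    per_ori = [pearson(ta, tb) for ta, tb in zip(resp_a, resp_b)]
    return sum(per_ori) / len(per_ori)

# resp[o][t]: response on trial t to orientation o (toy values, 3 orientations).
unit_a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
unit_b = [[8, 6, 4, 2], [16, 14, 12, 10], [24, 22, 20, 18]]

sig = signal_correlation(unit_a, unit_b)
noise = noise_correlation(unit_a, unit_b)
print(round(sig, 6), round(noise, 6))
```

      In this toy pair the tuning is shared (signal correlation near 1) while the trial-to-trial fluctuations are opposed (noise correlation near -1), which shows that the two measures are computed from different parts of the data.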

      2. Testing the hypothesis of the general decoder:

      2.1 In the data, the authors compare mainly the specific (stimulus) decoder and the monkey's choice decoder. The general stimulus decoder is only considered in fig. 3f, because data across multiple orientations are available only for the cued condition, and therefore the general and specific decoders cannot be compared for changes between cued and uncued. However, the hypothesized relation between mean correlations and performance should also be true within a fixed attention condition (cued), comparing sessions with larger vs. smaller correlation. In other words, if the hypothesis is correct, you should find that performance of the "most general" decoder (as in fig. 3f) correlates negatively with average noise correlations, across sessions, more so than the "most specific" decoder.

      We have added a new supplementary figure to the manuscript:

      Figure 3–figure supplement 1. Based on the electrophysiological data, the performance of the monkey’s decoder was more related to mean correlated variability than the performance of the specific decoder within each attention condition. (A) Within the cued attention condition, the performance of the monkey’s decoder was more related to mean correlated variability (left plot; correlation coefficient: n = 71 days, r = -0.23, p = 0.058) than the performance of the specific decoder (right plot; correlation coefficient: r = 0.038, p = 0.75). The correlation coefficients associated with the two decoders were significantly different from each other (Williams’ procedure: t = 3.8, p = 1.5 x 10^-4). Best fit lines plotted in gray. Data from both monkeys combined (Monkey 1 data shown in orange: n = 44 days; Monkey 2 data shown in purple: n = 27 days) with mean correlated variability z-scored within monkey. (B) The data within the uncued attention condition showed a similar pattern, with the performance of the monkey’s decoder more related to mean correlated variability (n = 69 days, r = -0.20, p = 0.14) than the performance of the specific decoder (r = 0.085, p = 0.51; Williams’ procedure: t = 2.0, p = 0.049). Conventions as in (A) (Monkey 1: n = 42 days – see Methods for data exclusions as in Figure 3C; Monkey 2: n = 27 days).

      2.2 In figure 3f, a more straightforward and precise comparison is to use the stimulus decoders to predict the choice, and test whether the more specific or the more general can predict choices more accurately.

      We have added a new panel to Figure 3 (Figure 3G) that illustrates the results of this analysis comparing whether the specific or more-general decoders predict the monkey’s trial-by-trial choices more accurately:

      Figure 3… (G) The more general the decoder (x-axis), the better its performance predicting the monkey’s choices on the median changed orientation trials (y-axis; the proportion of leave-one-out trials in which the decoder correctly predicted the monkey’s decision as to whether the orientation was the starting orientation or the median changed orientation). Conventions as in (F) (see Methods for n values).

      The description of this new panel in the Results section is as below:

      “Further, the more general the decoder, the better it predicted the monkey’s trial-by-trial choices on the median changed orientation trials (Figure 3G).”

      The updated Methods section describing this new panel is as below:

      “For Figure 3G, we performed analyses similar to those performed for Figure 3F, in that we tested each stimulus decoder: ‘1 ori’ decoders (n = 8 decoders; 1 specific decoder for either the first, second, fourth, or fifth largest changed orientation, for each of the 2 monkeys), ‘2 oris’ decoders (n = 12 decoders; 1 decoder for each of the 6 combinations of 2 changed orientations, for each of the 2 monkeys), ‘3 oris’ decoders (n = 8 decoders; 1 decoder for each of the 4 combinations of 3 changed orientations, for each of the 2 monkeys), and ‘4 oris’ decoders (n = 2 decoders; 1 decoder for the 1 combination of 4 changed orientations, for each of the 2 monkeys). However, unlike in Figure 3F, where the performance of the stimulus decoders was compared to the performance of the monkey’s decoder on the median orientation-change trials, here we calculated the performance of the stimulus decoder when tasked with predicting the trial-by-trial choices that the monkey made on the median orientation-change trials. We plotted the proportion of leave-one-out trials in which each decoder correctly predicted the monkey’s choice as to whether the orientation was the starting orientation or the median changed orientation.”
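      The decoder counts quoted in this Methods text follow from choosing k of the four non-median changed orientations, for each of the two monkeys. A short check of that arithmetic, with orientation labels chosen here purely for illustration:

```python
from itertools import combinations

# The four changed orientations other than the median one (labels are ours).
changed_orientations = ["first", "second", "fourth", "fifth"]
n_monkeys = 2

decoder_counts = {}
for k in range(1, len(changed_orientations) + 1):
    # Every subset of k orientations defines one decoder per monkey.
    subsets = list(combinations(changed_orientations, k))
    decoder_counts[k] = len(subsets) * n_monkeys

print(decoder_counts)
```

      The resulting counts per decoder class (8, 12, 8 and 2) match the n values given for the ‘1 ori’ through ‘4 oris’ decoders.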

      3. The main goal of the manuscript is to determine the impact of noise correlations on various decoding schemes. The figures however only show how decoding co-varies with correlations, but a direct, more causal analysis of the effect of correlations on decoding seems to be missing. Such an analysis can be obtained by comparing decoding on simultaneously recorded activity with decoding on trial-shuffled activity, in which noise-correlations are removed.
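      The trial-shuffling control mentioned in this comment is commonly implemented by permuting trial order independently for each neuron within a stimulus condition. The sketch below illustrates that general idea only; it is not an analysis from the manuscript, and the function name, seed, and toy responses are assumptions.

```python
import random

def shuffle_trials(responses, seed=0):
    """Permute trial order independently for each neuron.

    responses[n] holds neuron n's responses to repeated presentations of ONE
    stimulus. Independent permutations keep each neuron's response
    distribution but break the trial-by-trial pairing between neurons, which
    is what carries the noise correlations.
    """
    rng = random.Random(seed)
    shuffled = []
    for trials in responses:
        permuted = list(trials)   # copy, so the recorded data stay untouched
        rng.shuffle(permuted)
        shuffled.append(permuted)
    return shuffled

# Two toy neurons whose responses co-fluctuate across five trials.
simultaneous = [[1.0, 2.0, 3.0, 4.0, 5.0],
                [1.1, 2.1, 2.9, 4.2, 5.1]]
shuffled = shuffle_trials(simultaneous)
print(shuffled)
```

      A decoder would then be trained and tested once on `simultaneous` and once on `shuffled`, and the difference in performance attributed to the correlations.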

      We have added the following Discussion section to address this point:

      “The purpose of this study was to investigate the relationship between mean correlated variability and a general decoder. We made an initial test of the overarching hypothesis that observers use a general decoding strategy in feature-rich environments by testing whether a decoder optimized for a broader range of stimulus values better matched the decoder actually used by the monkeys than a specific decoder optimized for a narrower range of stimulus values. We purposefully did not make claims about the utility of correlated variability relative to hypothetical situations in which correlated variability does not exist in the responses of a group of neurons, as we suspect that this is not a physiologically realistic condition. Studies that causally manipulate the level of correlated variability in neuronal populations to measure the true physiological and behavioral effects of increasing or decreasing correlated variability levels, through pharmacological or genetic means, may provide important insights into the impact of correlated variability on various decoding strategies.”

      4. How different are the four different decoders (specific/monkey, cued/uncued)? It would be interesting to see how much they overlap. More generally, the authors should discuss the alternative that attention modulates also the readout/decoding weights, rather than or in addition to modulating V4 activity.

      We have added the following to the manuscript:

      “A fixed readout mechanism

      A prior study from our lab found that attention, rather than changing the neuronal weights of the observer’s decoder, reshaped neuronal population activity to better align with a fixed readout mechanism (Ruff & Cohen, 2019). To test whether the neuronal weights of the monkey’s decoder changed across attention conditions (attended versus unattended), Ruff and Cohen switched the neuronal weights across conditions, testing the stimulus information in one attention condition with the neuronal weights from the other. They found that even with the switched weights, the performance of the monkey’s decoder was still higher in the attended condition. The results of this study support the conclusion that attention reshapes neuronal activity so that a fixed readout mechanism can better read out stimulus information. In other words, differences in the performance of the monkey’s decoder across attention conditions may be due to differences in how well the neuronal activity aligns with a fixed decoder.

      Our study extends the findings of Ruff and Cohen to test whether that fixed readout mechanism is determined by a general decoding strategy. Our findings support the hypothesis that observers use a general decoding strategy in the face of changing stimulus and task conditions. Our findings do not exclude other potential explanations for the suboptimality of the monkey’s decoder, nor do they exclude the possibility that attention modulates decoder neuronal weights. However, our findings together with those of Ruff and Cohen shed light on why neuronal decoders are suboptimal in a manner that aligns the fixed decoder axis with the correlated variability axis (Ni et al., 2018; Ruff et al., 2018).”

      5. Quantifying the link between model and data:

      5.1 The text providing motivation for the model could be improved. The motivation used in the manuscript is, essentially, that the model allows to extrapolate beyond the data (more stimuli, more repetitions, more neurons). The dangers of extrapolation beyond the range of the data are however well known. A model that extrapolates beyond existing data is useful to design new experiments and test predictions, but this is not done here. Because the manuscript is about information and decoding, a better motivation is the fact that this model takes an actual image as input, and produces tuning and covariance compatible with each other because they are constrained by an actual network that processes the input (as opposed to parametric models where tuning and covariance can be manipulated independently).

      We have modified the manuscript as below:

      “Here, we describe a circuit model that we designed to allow us to compare the specific and monkey’s decoders from our electrophysiological dataset to modeled ideal specific and general decoders. The primary benefit of our model is that it can take actual images as inputs and produce neuronal tuning and covariance that are compatible with each other because of constraints from the simulated network that processed the inputs (Huang et al., 2019). Parametric models in which tuning and covariance can be manipulated independently would not provide such constraints. In our model, the mean correlated variability of the population activity is restricted to very few dimensions, matching experimentally recorded data from visual cortex demonstrating that mean correlated variability occupies a low-dimensional subset of the full neuronal population space (Ecker et al., 2014; Goris et al., 2014; Huang et al., 2019; Kanashiro et al., 2017; Lin et al., 2015; Rabinowitz et al., 2015; Semedo et al., 2019; Williamson et al., 2016).”

      “Our study also demonstrates the utility of combining electrophysiological and circuit modeling approaches to studying neural coding. Our model mimicked the correlated variability and effects of attention in our physiological data. Critically, our model produced neuronal tuning and covariance based on the constraints of an actual network capable of processing images as inputs.”

      We have also removed the Results and Discussion text that suggested that the model allowed us to extrapolate beyond the data.

      5.2 The ring structure, and the orientation of correlations (Fig 2b) seem to be key ingredients of the model, but are they based on data, or ad-hoc assumptions?

      We have modified the manuscript to clarify this point, as below:

      “As the basis for our modeled general decoder, we first mapped the n-dimensional neuronal activity of our model in response to the full range of orientations to a 2-dimensional space. Because the neurons were tuned for orientation, we could map the n-dimensional population responses to a ring (Figure 2B, C). The orientation of correlations (the shape of each color cloud in Figure 2B) was not an assumed parameter, and illustrates the outcome of the correlation structure and dimensionality modeled by our data. In Figure 2B, we can see that the fluctuations along the radial directions are much larger than those along other directions for a given orientation. This is consistent with the low-dimensional structure of the modeled neuronal activity. In our model, the fluctuations of the neurons, mapped to the radial direction on the ring, were more elongated in the unattended state (Figure 2B) than in the attended state (Figure 2C).”

      5.3 In the model, the specific decoder is quite strongly linked to correlated variability and the improvement of the general decoder is clear but incremental (0.66 vs 0.83) whereas in the data there really is no correlation at all (Fig 3c). This is a bit problematic because the authors begin by stating that specific decoders cannot explain the link between noise correlations and accuracy but their specific decoder clearly shows a link.

      We appreciate this point and have modified the manuscript as below:

      “Indeed, we found that just as the performance of the physiological monkey’s decoder was more strongly related to mean correlated variability than the performance of the physiological specific decoder (Figure 3C; see Figure 3–figure supplement 1 for analyses per attention condition), the performance of the modeled general decoder was more strongly related to mean correlated variability than the performance of the modeled specific decoder (Figure 3D). We modeled much stronger relationships to correlated variability (Figure 3D) than observed with our physiological data (Figure 3C). We observed that the correlation with specific decoder performance was significant with the modeled data but not with the physiological data. This is not surprising as we saw attentional effects, albeit small ones, on specific decoder performance with both the physiological and the modeled data (Figure 3A, B). Even small attentional effects would result in a correlation between decoder performance and mean correlated variability with a large enough range of mean correlated variability values. It is possible that with enough electrophysiological data, the performance of the specific decoder would be significantly related to correlated variability, as well. As described above, our focus is not on whether the performance of any one decoder is significantly correlated with mean correlated variability, but on which decoder provides a better explanation of the frequently observed relationship between performance and mean correlated variability. The performance of the general decoder was more strongly related to mean correlated variability than the performance of the specific decoder.”

      “Our results suggest that the relationship between behavior and mean correlated variability is more consistent with observers using a more general strategy that employs the same neuronal weights for decoding any stimulus change.”

      6. General decoder: Some parts of the text (eg. Line 60, Line 413) refer to a decoder that accounts for discrimination along different stimulus dimensions (eg. different values of orientation, or different color of the visual input). But the results of the manuscripts are about a general decoder for multiple values along a single stimulus dimension. The disconnect should be discussed, and the relation between these two scenarios explained.

      We have modified the manuscript as below:

      “Here, we report the results of an initial test of this overarching hypothesis, based on a single stimulus dimension. We used a simple, well-studied behavioral task to test whether a more-general decoder (optimized for a broader range of stimulus values along a single dimension) better explained the relationship between behavior and mean correlated variability than a more-specific decoder (optimized for a narrower range of stimulus values along a single dimension). Specifically, we used a well-studied orientation change-detection task (Cohen & Maunsell, 2009) to test whether a general decoder for the full range of stimulus orientations better explained the relationship between behavior and mean correlated variability than a specific decoder for the orientation change presented in the behavioral trial at hand.

      This test based on a single stimulus dimension is an important initial test of the general decoder hypothesis because many of the studies that found that performance increased when mean correlated variability decreased used a change-detection task…”

      “We performed this initial test of the overarching general decoder hypothesis in the context of a change-detection task along a single stimulus dimension because this type of task was used in many of the studies that reported a relationship between perceptual performance and mean correlated variability (Cohen & Maunsell, 2009; 2011; Herrero et al., 2013; Luo & Maunsell, 2015; Mayo & Maunsell, 2016; Nandy et al., 2017; Ni et al., 2018; Ruff & Cohen, 2016; 2019; Verhoef & Maunsell, 2017; Yan et al., 2014; Zénon & Krauzlis, 2012). This simple and well-studied task provided an ideal initial test of our general decoder hypothesis.

      This initial test of the general decoder hypothesis suggests that a more general decoding strategy may explain observations in studies that use a variety of behavioral and stimulus conditions.”

      “This initial study of the general decoder hypothesis tested this idea in the context of a visual environment in which stimulus values only changed along a single dimension. However, our overarching hypothesis is that observers use a general decoding strategy in the complex and feature-rich visual scenes encountered in natural environments. In everyday environments, visual stimuli can change rapidly and unpredictably along many stimulus dimensions. The hypothesis that such a truly general decoder explains the relationship between perceptual performance and mean correlated variability is suggested by our finding that the modeled general decoder for orientation was more strongly related to mean correlated variability than the modeled specific decoder (Figure 3D). Future tests of a general decoder for multiple stimulus features would be needed to determine if this decoding strategy is used in the face of multiple changing stimulus features. Further, such tests would need to consider alternative hypotheses for how sensory information is decoded when observing multiple aspects of a stimulus (Berkes et al., 2009; Deneve, 2012; Lorteije et al., 2015). Studies that use complex or naturalistic visual stimuli may be ideal for further investigations of this hypothesis.”

      7. Some statements in the discussion such as l 354 "the relationship between behavior and mean correlated variability is explained by the hypothesis that observers use a general strategy" should be qualified : the authors clearly show that the general decoder amplifies the relationship but in their own data the relationship exists already with a specific decoder.

      We have modified the manuscript as below:

      “Our results suggest that the relationship between behavior and mean correlated variability is more consistent with observers using a more general strategy that employs the same neuronal weights for decoding any stimulus change.”

      “Together, these results support the hypothesis that observers use a more general decoding strategy in scenarios that require flexibility to changing stimulus conditions.”

      “This initial test of the general decoder hypothesis suggests that a more general decoding strategy may explain observations in studies that use a variety of behavioral and stimulus conditions.”

      8. Low-Dimensionality, beginning of Introduction and end of Discussion: experimentally, cortical activity is low-dimensional, and the proposed model captures that. But some of the reviewers did not understand the argument offered for why this matters for the relation between average correlations and performance. It seems that the dimensionality of the population covariance is not relevant: the point instead is that a change in amplitude of fluctuations along the f'f' direction necessarily impacts performance of a "specific" decoder, whereas changes in all other dimensions can be accounted for by the appropriate weights of the "specific" decoder. On the other hand, changes in fluctuation strength along multiple directions may impact the performance of the "general" decoder.

      We have modified the manuscript as below:

      “These observations comprise a paradox because changes in this simple measure should have a minimal effect on information coding. Recent theoretical work shows that neuronal population decoders that extract the maximum amount of sensory information for the specific task at hand can easily ignore mean correlated noise (Kafashan et al., 2021; Kanitscheider et al., 2015b; Moreno-Bote et al., 2014; Pitkow et al., 2015; Rumyantsev et al., 2020; for review, see Kohn et al., 2016). Decoders for the specific task at hand can ignore mean correlated variability because it does not corrupt the dimensions of neuronal population space that are most informative about the stimulus (Moreno-Bote et al., 2014).”

      “Our results address a paradox in the literature. Electrophysiological and theoretical evidence supports that there is a relationship between mean correlated variability and perceptual performance (Abbott & Dayan, 1999; Clery et al., 2017; Haefner et al., 2013; Jin et al., 2019; Ni et al., 2018; Ruff & Cohen, 2019; reviewed by Ruff et al., 2018). Yet, a specific decoding strategy in which different sets of neuronal weights are used to decode different stimulus changes cannot easily explain this relationship (Kafashan et al., 2021; Kanitscheider et al., 2015b; Moreno-Bote et al., 2014; Pitkow et al., 2015; Rumyantsev et al., 2020; reviewed by Kohn et al., 2016). This is because specific decoders of neuronal population activity can easily ignore changes in mean correlated noise (Moreno-Bote et al., 2014).”

    1. Author Response:

      Reviewer #1 (Public Review):

      The recent development of AlphaFold2 has improved the ability to predict protein fold from sequence. However, this approach typically yields a defined structural fold, while it is known that proteins exhibit structural diversity through different conformations. In particular, membrane transport proteins and receptors are known to adopt distinct conformational states in order to allow for alternate access or signaling across the membrane. In this study, the authors demonstrate that by reducing the size of the input sequence alignment fed into AlphaFold2, conformational diversity in the structural predictions is increased, with some of these corresponding to known experimentally determined structures. They test this with a diverse set of transporters where the structures have been solved in both inward and outward facing conformations, as well as GPCRs in active and inactive states. Decreasing the size of the sequence alignment from 5120 to 32 leads to a general increase in conformational diversity among the predicted structures, and these structures are generally bounded by the experimental structures. The RMSF analysis of residues amongst the different models corresponds to RMSD of residues in the experimental structures, and principal component analysis demonstrates that these models connect the two known conformations. Altogether, this analysis validates that the ability to predict alternate conformations of transporters and receptors is already present in AlphaFold2.

      This validation is important, but further analysis is necessary to move beyond a demonstration and towards a procedure for predicting relevant conformations. Along these lines, quantification of the robustness of the approach along different parameters is needed. Furthermore, the study stops short of defining how to statistically weed through the ensemble of models to predict meaningful conformations. AlphaFold2 may generate highly accurate models, but how does the user pick which ones are likely to be relevant? Therefore, this is an interesting study that is expected to be broadly impactful for the study of all proteins, not just membrane proteins tested here. However, limitations remain on the interpretation of the results and a clarification is needed to demonstrate how others may use this approach to predict new biologically relevant conformations.

      We believe the approach used in this manuscript can only sample the energy minimum; identification of the relevant individual states of interest will likely require experimental validation. Thus, we have modified the text to reinforce this point in various parts of the manuscript. We have edited the text at the end of “Introduction”:

      "Finally, we propose a modeling pipeline for researchers interested in sampling alternative conformations of specific membrane proteins, which we apply to the structurally unknown GPR114/AGRG5 adhesion GPCR as an example."

      Additional clarification is provided in “Results and Discussion” subsection “Distributions of predicted models relative to the experimental structures”:

      "Indeed, the models with the most extreme PC1 values were also among the most accurate: average TM-scores were 0.94 for the top one, top three and top ten PC1 models, and Pearson correlation coefficients between PC1 and TM-scores of the ensemble of models exceeded 0.8 for all transporters in this dataset. Moreover, the experimental structures virtually always flanked the AF2 models along PC1. The exception, PTH1R, was determined in a partially inactive and active conformation29, suggesting that models extending beyond the former state along PC1 may represent the fully inactive conformation. Therefore, these results indicate that accurate representative models of conformations of interest can be selected from the extreme positions along PC1."

      Finally, we have added a sentence in subsection "Concluding remarks":

      "Accurate representatives of distinct conformers were generally obtainable with exhaustive sampling and could be identified by performing PCA and selecting models at the extreme positions of PC1."
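      The selection step described above (PCA over the ensemble, then picking the models at the extremes of PC1) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes the models have already been superimposed and flattened into one coordinate row per model, and the function name `select_extreme_models` and its `n_select` parameter are hypothetical.

```python
import numpy as np


def select_extreme_models(coords, n_select=1):
    """Project models onto PC1 and return the models at both extremes.

    coords: array of shape (n_models, n_features), e.g. flattened C-alpha
            coordinates of pre-superimposed models (an assumption of this sketch).
    Returns (pc1_scores, lowest_indices, highest_indices).
    """
    x = np.asarray(coords, dtype=float)
    x = x - x.mean(axis=0)  # center each coordinate across the ensemble
    # Rows of vt are the principal axes of the centered data.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    pc1 = x @ vt[0]  # score of every model along the first principal component
    order = np.argsort(pc1)
    lowest = order[:n_select]
    highest = order[-n_select:][::-1]
    return pc1, lowest, highest
```

      The sign of PC1 is arbitrary, so which extreme corresponds to which conformation (for example inward- or outward-facing) has to be decided by comparison with a reference structure or other external information.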

      Reviewer #3 public review:

      This manuscript describes a workflow for using AlphaFold2 (AF2) to model membrane proteins in different conformations. It then evaluates the models generated by this workflow on eight different membrane protein structures representing different structural classes and mechanisms. The authors conclude that AF2 can provide models with reasonable accuracy and conformational diversity of membrane proteins, but additional improvements are needed to be able to sample biologically relevant conformations.

      In principle, the research presented in this study is timely and can be of general interest to the community. It attempts to address the question of whether AF2 can accurately predict membrane protein dynamics. As the authors state, they provide "a hack" for modeling membrane proteins with AF2. My main concern with this manuscript is that the adopted workflow needs to be optimized and assessed more rigorously, in order to support the conclusions regarding the usefulness of AF2 for modeling membrane proteins.

      In addition to the importance of the topic, some strengths of the study include: focusing on proteins representing different folds and families, using different measures for structural evaluation, and presenting several examples in greater detail, particularly of important human proteins.

      My specific comments can be found below:

      A significant concern is that the Methods section of this manuscript is lacking. Additional details are needed in order to be able to evaluate the validity of the approach and reproduce these results. I list below some specific issues.

      The alignments used to develop the models should be provided. Specific details on how the visual inspection of the alignments guided their refinement should also be included. I could imagine that the alignment quality may correlate with model accuracy. This is an important analysis to include.

      We introduced modifications to the manuscript to clarify that all alignment subsampling was performed randomly by the AF2 program. As the major modification discussed here is the reduction of the size of the MSA subsampled at each iteration of the program, our pipeline did not provide an opportunity for either modification or saving of the alignments by the user. Analysis of the alignments responsible for producing specific models is therefore not possible.
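      For readers unfamiliar with the idea, random subsampling of an alignment to a reduced depth can be illustrated with the toy sketch below. It is not AF2's internal code: the function `subsample_msa`, its arguments, and the choice to always keep the first (query) row are assumptions made purely for illustration.

```python
import random


def subsample_msa(msa, depth, seed=None):
    """Return the query (first row) plus randomly chosen other sequences.

    msa:   list of aligned sequences, with the query sequence first.
    depth: maximum number of sequences to keep, including the query.
    seed:  optional seed; a fresh random draw is made when it is None.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    rng = random.Random(seed)
    query, others = msa[0], msa[1:]
    k = min(depth - 1, len(others))  # cannot draw more sequences than exist
    return [query] + rng.sample(others, k)
```

      Because each call draws a different subset unless a seed is fixed, repeated predictions at a shallow depth see different slices of the alignment, which is the source of diversity the response refers to.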

      For some of the targets, the template-based modeling clearly improved sampling of various conformations and for others it did not. The authors only vaguely discussed this observation without providing a detailed analysis. For example, how were the templates selected for the template-based modeling? Was the performance of AF2 dependent on the sequence similarity between the template(s) and the target? These are critical points that are needed to understand the utility of the approach and how one can adopt the proposed workflow.

      In response to a similar comment made by another Reviewer, we have expanded the relevant section in Methods regarding the use of templates. However, due to the relatively small size of this test set, a thorough quantitative analysis is likely not currently possible.

      A key conclusion of this study is that there is no one-model fits-all approach with AF2 for accurately sampling the conformational space of membrane proteins. Although this conclusion sounds plausible, the authors do not provide significant evidence to support it: they tested the performance of the models for a very limited set of parameters. For example, they only used a few MSA depths, and they do not report performances for templates with different similarities to the target. Also, is it possible that a "one-model fits-all" exists for particular folds or families? For example, LAT1 and MCT1 each represent very large protein families and a clear workflow for each would represent an important advance in the field.

      Per the recommendation of another Reviewer, we carried out a more rigorous analysis of MSA depths (see Figure 1 - figure supplement 1). However, these results support our general conclusion that there are too few proteins to confidently identify the optimal set of parameters for accurate prediction of multiple conformations. We have rewritten a sentence in “Concluding remarks”:

      "Thus, while the results presented here provide a blueprint for obtaining AF2 models of alternative conformations, they also argue against an optimal one-size-fits-all approach for sampling conformational space of every protein with high accuracy."

      How were misfolded models identified? Providing a reference is not sufficient here. It is also stated that "padding MSAs with additional sequences had the desirable effect of decreasing the proportion of these models, it also limited the extent to which alternative conformations were sampled. Thus, our results revealed a delicate balance that must be achieved to generate models that are both diverse and natively folded. No general pattern was readily apparent regarding the ideal MSA depth required to achieve this balance.". While this is an interesting initial observation, finding a pattern in the ability to detect those misfolded structures (for at least some folds or protein families) could increase the impact of the work.

      We have rewritten this paragraph and remade Figure S2 (now numbered Figure 1 - figure supplement 1) in response to a similar comment made by another Reviewer.

      In general, the definition of the different conformations is nuanced for each structural class and a better explanation is needed for those proteins that are discussed in greater detail. For example, when discussing one of these proteins, MCT1, the authors state: "One target, MCT1, was exclusively modeled by AF2 in either IF or fully occluded conformations regardless of MSA depth. Notably, these results closely parallel those reported by DeepMind during their attempt to model multiple conformations of LmrP in CASP14.". Could the authors elaborate on this statement? Could they provide quantitative data defining how occluded and open conformations are defined? Many of the readers are unlikely to know the LmrP example from a previous publication.

      We agree with this statement and have rewritten the paragraph to remove the reference to CASP14 in this section.

      The authors evaluate the models on structures that were not included in the AF2 training set. It would be useful to provide the list of the PDB ids that were included in the training of the AF2 version that was used in this study. This is important because the structures of some of these proteins were solved a few years ago with minor differences, even though they were classified as a "different conformation". As mentioned in the point above, the definition of "different conformation" can be highly nuanced depending on the protein family and the mechanism used by the protein.

      We have edited the first paragraph of “Results and Discussion” to more explicitly state that the structures of the proteins used in this test set were entirely absent from the version of the PDB used to train AF2. This design decision was critical in allowing us to sidestep this question of whether the conformations of interest, or similar conformations, were present or absent from the training set.

      In the section "Alternative conformations cannot be predicted for proteins with structures in the training set", the results should be described in a more quantitative way. Specifically, the following statement should be accompanied by quantitative data: "virtually every transporter model superimposed nearly perfectly with the training set conformation, and none resembled the alternative conformation".

      Per recommendations made by another Reviewer, we have added metrics to quantify the similarity of these models to the training set conformers. This also allows us to establish the similarity of these predictions to those of MCT1.

    1. Author Response:

      Reviewer #2 (Public Review):

      This work aimed to advance knowledge of the roles of polycystin-1 and polycystin-2 (PC-1, PC-2) in the vascular endothelium. For this, the authors developed tamoxifen-inducible Cre-lox models to delete PC-1, PC-2 or both specifically in endothelial cells of mice. Evidence is presented that flow or shear stress activates PC-1-dependent current in endothelial cells, which is associated with NOS and KCa channel activation, smooth muscle hyperpolarization, and flow-dependent vasodilation. The Jaggar laboratory has recently reported that deletion of endothelial PC-2, a member of the TRP family, leads to loss of flow-induced Ca2+ influx, NOS and SK/IK activation, reduced vasodilation, and higher blood pressure. Thus, the novelty of the current work is the finding that PC-1 is similarly critical for activation of this pathway by flow, and that it is a physical interaction between membrane-localized PC-1 and PC-2 that underlies complex activation by flow.

      Strengths of the current study include the use of powerful inducible knockout models in combination with a wide array of in vivo and ex vivo methods to test hypotheses. Thus, conclusions are based on multiple approaches and are mostly well supported. However, there are some concerns, specifically related to a lack of clarity on the interactions and purported interdependence between PC-1 and PC-2 that warrant further consideration.

      1. The prospective impact of the current study is based on the suggestion that interactions between PC-1 and PC-2 via coiled-coil domains are required for activation of inward current by flow. However, the authors did not show evidence, via fluorescence imaging or otherwise (e.g., coIP), that peptides generated to disrupt this interaction actually do so. Does treatment with the coiled-coil domain peptides cause a shift in the PC-1-to-PC-2 distance (using TIRF-SMLM as in Fig 5)?

      We have performed new experiments and now show that scrambled peptides of either the PC-1 or PC-2 coiled-coil domains do not alter flow-activated I_Cat in endothelial cells (Figure 6E-G, Figure 6 – figure supplement 2). In contrast, peptides corresponding to the coiled-coil domains present in PC-1 or PC-2 similarly inhibit flow-activated cation currents in endothelial cells.

      Multiple different domains in PC-1 and PC-2 physically interact to form the heteromeric complex. Several groups have demonstrated that PC-1 and PC-2 couple via their C-terminal coiled-coils (Qian et al., Nat. Genet. 1997; Zhu et al., PNAS 2011; Yu et al., PNAS 2009; Tsiokas et al., PNAS 1997). Recombinant PC-1 and PC-2 lacking coiled-coils also interact via N-terminal loops (Babich et al., JBC 2004; Feng et al., JBC 2008). The structure of a PC-1/PC-2 heterotetramer that lacked N- and C-termini was resolved using cryo-EM and indicated that a region between TM6 and TM11 of PC-1 interdigitates with PC-2 (Su et al., Science 2018). As such, it is unlikely that the coiled-coil domain peptides physically separate PC-1 and PC-2 subunits. Rather, these data suggest that coiled-coil domain coupling in PC-1 and PC-2 is required for flow to activate non-selective cation currents in endothelial cells. In response to your comment, we have expanded discussion of this point in the manuscript.

      2. The use of immunoFRET to test for PC-1/PC-2 proximity is not ideal. At minimum, proper negative controls (e.g., use of cells from KO models) should be provided to demonstrate the specificity of this technique for PC-1/PC-2 interactions in endothelial cells.

      We agree. As suggested, we have performed new experiments and now provide immunoFRET data for Pkd1 ecKO and Pkd2 ecKO endothelial cells (Figure 4B, C; Figure 4 - figure supplement 1A, B). These data show that N-FRET between PC-1 and PC-2 antibodies is extremely low in Pkd1 ecKO and Pkd2 ecKO endothelial cells.

      3. The authors conclude that PC-1/PC-2 clusters in KO cells in SMLM experiments are likely due to non-specific antibody binding. While I agree with this, it raises a question as to the meaning of cluster size data. Considering that the approach relies on fluorophore-tagged antibodies, which cannot be assumed to be in 1:1 stoichiometry with proteins of interest, how relevant is cluster size?

      This is an interesting question. All immunofluorescence techniques rely on the use of antibodies to tag proteins. We recognize that the size of clusters reflects the size of both the proteins and antibodies. We now include text in the Discussion stating that the size of the PC-1 and PC-2 clusters reported is the size of both the proteins and the antibodies.

      4. Based on data shown in Figure 1, the authors conclude that there is a reduction in inward current with flow. Since the applied technique measures total current, couldn't this result also reflect an increase in outward current (e.g., K+) due to flow that depends on the presence of Ca2+? Also related to these data, the magnitude of initial flow-induced transient current was quite variable (~8 - ~45 pA). Was this due to differences in cell size? The authors should consider expressing data from current recordings in terms of density (pA/pF).

      We agree that presenting data as current density is useful and now do so throughout the manuscript, including in Figure 1. The results in Figure 1 do reflect a flow-activated increase in K+ current, as this response is partially inhibited by apamin/TRAM-34 (Mackay et al., eLife 2020). We state that there is “a reduction in inward current” because the entire current range in these experiments is negative of 0 pA.
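      Current density is the whole-cell current divided by the cell's membrane capacitance, which serves as a proxy for membrane area and so normalizes for cell size. A minimal sketch of the conversion follows; the function name and the capacitance values in the usage note are hypothetical and are not taken from the study.

```python
def current_density(current_pa, capacitance_pf):
    """Convert a whole-cell current (pA) to current density (pA/pF).

    capacitance_pf is the measured membrane capacitance of the same cell.
    """
    if capacitance_pf <= 0:
        raise ValueError("membrane capacitance must be positive")
    return current_pa / capacitance_pf
```

      For example, two cells carrying inward currents of -8 pA and -45 pA would have similar densities if their capacitances differed in roughly the same proportion, which is why the reviewer's question about cell size is best answered in pA/pF.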

      5. Endothelium-specific deletion of PC-1 increased blood pressure, implying that the proposed role for PC-1 is generally applicable to the resistance arterial network; yet here, only small mesenteric vessels were studied. Given the known heterogeneity in the regulation of vascular tone by shear stress among different arterial beds, is the identified role of PC-1 observed outside of the mesenteric circulation?

      We agree that the blood pressure phenotype in Pkd1 ecKO mice suggests that flow activates PC-1 in endothelial cells of other vascular beds to induce vasodilation. We have now discussed this concept in the manuscript.

    1. Author Response:

      Reviewer #2:

      In this study, Jairaman et al. used iPSC-derived microglia in which the AD-associated TREM2 gene has been knocked out to determine the impact of TREM2 loss of function on receptor-evoked Ca2+ signaling and chemotaxis. Cytosolic Ca2+ measurements were performed using the genetically encoded ratiometric indicator Salsa6f previously developed by this laboratory. The authors made the critical discovery that loss of TREM2 leads to enhanced sensitivity and increased Ca2+ signaling of microglia to purinergic agonists, in particular to ADP. They showed that store-operated Ca2+ entry in response to passive (maximal) store depletion by SERCA blockers was not altered in TREM2 KO cells. Rather, the enhanced sensitivity of the TREM2 KO cells was shown to originate from an upregulation of the purinergic receptors P2YR12 and P2YR13, leading to a left shift in EC50 of Ca2+ responses to ADP. The enhanced Ca2+ responses of TREM2 KO cells were associated with altered directional chemotaxis, whereby TREM2 KO cells showed enhanced displacement but reduced directionality. This phenotype was rescued with the application of P2YR antagonists in ADP-dependent chemotaxis assays. These results are novel, significant and of potentially broad impact to the pathology of AD. Although the molecular mechanisms of how lack of TREM2 leads to enhanced P2YRs is beyond the scope of this study, one moderate criticism of this manuscript pertains to lack of insights on how enhanced cytosolic Ca2+ leads to reduced directional chemotaxis and the potential effector proteins/pathways mediating this effect. Other relatively moderate issues and suggestions regarding controls have also been noted.

      We thank the reviewer for the positive comments and a thorough evaluation of the manuscript. We have addressed specific points related to the mechanism of purinergic Ca2+ signaling in TREM2 KO microglia. The issue of precisely how Ca2+ regulates chemotaxis differently in WT and TREM2 KO microglia, while no doubt a very important question, would involve a comprehensive evaluation of how Ca2+ affects different proteins involved in microglial motility and is best suited to a follow-up study.

    1. Author Response:

      Reviewer #1:

      The manuscript by Piccolo and colleagues employs an in vitro neuruloid system to investigate the role of the Hippo/YAP signaling pathway in early ectodermal fate specification. The authors examine YAP expression in forming neuruloids and test how manipulation of Hippo/YAP signaling affects their cellular composition. They observe that YAP expression is dynamic and enriched in cells occupying the periphery of the neuruloid. Overactivation of YAP activity by the Lats-kinase inhibitor TRULI leads to an expansion of TFAP2A+ cells (NNE) at early stages and of KRT18+ cells (epidermal) at later stages of development. Accordingly, the authors propose that YAP acts as a lineage determinant that (i) promotes a NNE fate during early development and (ii) impacts the fate of NNE cells by promoting an epidermal instead of a neural crest fate. Finally, the authors report that neuruloids developed with cells harboring mutations characteristic of Huntington's disease display elevated YAP activity.

      The study takes advantage of the neuruloid system to examine the role of Hippo-Yap in early development and disease. A strength of the study is the use of the neuruloid as a proxy for the human embryo, which allows the authors to examine the control of spatial patterning in early development (in both wild type and altered cellular states). Yet, this model also presents significant limitations. Some of the results indicate a high degree of variability in YAP activity (and ectodermal patterning) in neuruloids obtained from different inductions. This raises the concern that the neuruloid system may interfere with Hippo/YAP. Furthermore, the model proposed by the authors is not consistent with the functional manipulations with pharmacological agents (e.g., pharmacological activation of YAP results in an increase of both neural and NNE cells; inhibition of YAP does not result in the expected phenotypes).

      We thank the reviewer for her/his compliments on our work. The reviewer also points to the limitations of our neuruloid models and asks for clarifications.

      The authors propose that YAP activation promotes a non-neural ectodermal (NNE) fate in early neuruloids, and subsequently drives NNE to differentiate into epidermis. However, manipulation of Hippo signaling with pharmacological inhibitors does not entirely support this, as treatment of neuruloids with agonist TRULI leads to expansion of both the PAX6 neural population and the NNE Tfap2a population. A prediction of the model is that treatment with verteporfin should neuralize the organoids, which is not the case (Fig 6A). This disconnect between the model presented in Figure 6D and the experimental results should be addressed by the authors.

      We would like to thank the reviewer for this request. In our experiments we observed a dual effect of YAP activation (or HD mutation). As noted by the reviewer, ectodermal lineage specification occurs both early (increased NNE induction) and late (enhanced epidermis differentiation and contraction of NC differentiation). Moreover, we observed a structural consequence of increased YAP activation in neuruloids: failure of the NE domain to fully close. Following the reviewer's suggestion, we have now included an additional panel in Figure 6 to illustrate the phenotype alongside the difference in ectodermal lineage specification (panel E). We have also added a paragraph to the Discussion that highlights the architectural aspect of the observed phenotype.

      Regarding the interpretation of the effect of pharmacological inhibition of YAP, we believe that the result of verteporfin treatment on WT neuruloids indicates that YAP activity is not required for this specification but can skew the differentiation towards NNE and epidermis. This is now included in the Results, and a new paragraph has been added in the Discussion directly addressing this point.

      The study at times conflates YAP expression with activation of the Hippo-YAP pathway. While the images in figures 1,2, and 4 show changes in YAP expression, confirmation of Hippo-YAP pathway activity should include the use of a reporter (e.g., HOP-Flash) or at least high magnification images showing translocation of YAP to the nucleus. Overall, inclusion of better quantification of YAP-activity is crucial to support the manuscript's conclusions (the authors should also state the number of micropatterns used in each quantitative experiment).

      Our evidence correlating YAP nuclear localization with activity is based on: (i) immunoblots (Figures 1D and 2B); (ii) confocal image analysis (Figures 1E, 2D, and 4B); and (iii) induction of YAP target-gene expression, as demonstrated by our scRNA-seq analysis, which occurs in the same epidermal (KRT18+) lineage cells that display the highest levels of YAP nuclear accumulation (Figure 2). However, to strengthen this argument and following the reviewer’s advice, we have now added magnified confocal microscope images of YAP/DAPI staining used to measure nuclear YAP localization at D4 (Figure 1—figure supplement 5). We have also added slowed and magnified videos of YAP-GFP/H2B-mCherry (and YAP-GFP alone) at D3-D4, which illustrate the dynamic accumulation of YAP in the nucleus of cells upon BMP4 stimulation (Figure 1—video 2, Figure 1—video 3, Figure 1—video 5, Figure 1—video 6, Figure 4—video 2 and Figure 4—video 3). Finally, the number of colonies analyzed for each experiment is now added in the Figure Legends.

      A limitation of the study is that it does not investigate the possibility that Hippo/Yap could be affecting cell proliferation in the different lineages, instead of acting as a cell fate determinant. This is particularly important since Hippo is affected by cell density, which varies from the center to the periphery of the neuruloid. Different rates of proliferation over several days could potentially lead to drastic changes in neuruloid cellular composition.

      To address the reviewer’s legitimate point and to assess potential effects of YAP activation in HD-neuruloids, we performed three sets of experiments. First, we performed RNA-velocity analysis to determine the cellular trajectories within each lineage (Figure XA, below), and calculated the velocity of Seurat’s “cell cycle-associated” genes in each cell population in our scRNA-seq dataset at D4. This analysis indicates that the three ectodermal progenitors have a comparable rate of cell division, with NE being slightly faster than the others and epidermis being the slowest (Figure XB). However, these differences are subtle: the mean velocity of these cell-cycle genes within each population is not significantly different across the three ectodermal lineages (Figure XC). Second, comparison of velocity values between WT and HD highlighted a significant HD-associated increase in the dynamics of cell-cycle-associated genes only within the NE population (Figure XD), consistent with the observation that YAP is ectopically active in this lineage. This increase is also not very dramatic, for the mean velocity of these genes is not significantly different in any comparison at this time (Figure XE).

      Figure X. Proliferation rate analysis of D4 neuruloids from the scRNA-seq dataset. A) Transcriptional trajectories were identified in the three ectodermal lineages. B) Velocity of cell-cycle-associated genes shows that NNE lineages (NC and E) are slightly faster than NE. C) However, this is not significant at the mean population level. D) NE in HD D4 neuruloids displays a subtle increase in the velocity of cell-cycle-associated genes. E) This effect disappears at the mean population level.

      Finally, quantification of the number of mitotic nuclei per colony as marked by phospho-histone H3 (Kim et al., 2017) at different time points, demonstrated that YAP activation by TRULI leads to an increase in cell proliferation, especially in late neuruloids. This evidence is now presented in the new Supplemental Figure 4—figure supplement 3. We thank the reviewer for bringing this point to our attention.

      It is also important to note that our study does not suggest that YAP is a bona fide cell-fate determinant, but rather that the global phenotypic signature of YAP activation is influenced by differential regulation of cell-cycle dynamics. Moreover, inasmuch as YAP inhibition with verteporfin does not affect neuruloid formation, we believe that YAP is more of a booster signal operating on top of differentiation programs.

      The results of the study contradict previous reports, and some of these contradictions are not sufficiently addressed. The authors state that the activation of YAP in culture leads to a "complete loss of NC-like SOX10+ colonies"; however, a number of studies in in vivo models support a role for YAP as a positive regulator of neural crest specification.

      We thank the reviewer for pointing to the results observed in model systems. We have now included a paragraph in which we acknowledge that YAP has been previously associated with NC specification and survival. However, it should be noted that these conclusions are based on data obtained from non-human model organisms such as Xenopus, or relied on differentiation protocols that are independent of BMP4 stimulation. We believe that the phenotype of unbalanced specification of the NNE depends on an epistatic relationship between BMP4 and Hippo-YAP pathway, which might play a crucial role during human neurulation.

      Furthermore, the authors briefly speculate on the finding that Huntington's disease neuruloids have high YAP activity (whereas tissues from patients have low activity), but there is no real clear link to the pathophysiology of the disease.

      In our in vitro assay that recapitulates aspects of human neurulation, we observed an early increase (D4) followed by a later decline (D7) in YAP activity associated with HD mutation, which is comparable to a dysregulation of the Hippo pathway that was observed in HD patients. To better clarify this aspect and its potential implication during embryogenesis we have now expanded our Discussion on the possible connection between HD and embryonic development.

      Experimental results presented in different figures are often inconsistent throughout the manuscript. This should be examined by the authors since it suggests a lack of reproducibility in the neuruloid protocol. For example, the expression of TFAP2A at D4 neuruloids is a sparse halo at D4 in Fig4D, but robust in Fig1E.

      The reviewer is correct in pointing to a certain degree of variability between experiments, especially during the period (D4) when the first NNE lineages begin to emerge (e.g., Supplemental Figure 4—figure supplement 2). However, because parallel experimental conditions such as comparison with HD samples or TRULI treatment show consistent trends, we believe that our interpretation of these results is fundamentally correct.

      The western blot in fig1D shows bands for tYAP and pYAP at D4, while in Fig2B the bands are not present (Fig1D also shows double bands for both markers while fig2B presents single bands).

      There are several alternatively spliced isoforms of human YAP (Vrbský et al., 2021). Immunoblots for YAP in YAP-GFP biallelically tagged cell lines (Figure 1—figure supplement 1B) show that two isoforms are detectable at pluripotency. During neural induction (D1-D3) both isoforms are downregulated, and upon BMP4 stimulation the larger isoform (top band) is primarily upregulated, so that from D4 onwards only the top band is visible (Figure 2B). To clarify this point, we now discuss this in the Results and include Supplemental Figures with the quantification of the top and bottom bands (D1-D4, Figure 1D) and only of the top band (D4-D7, Figure 2B and Figure 1—figure supplement 4).

      As Hippo responds very quickly to cell density, mechanical forces, etc., these inconsistencies could affect the proposed analyses.

      As previously mentioned, we have assessed the effect on proliferation rate due to YAP activation by TRULI or HD mutation in neuruloids by scRNA-seq analysis and by counting the number of mitotic cells at different times. Our manuscript leaves open the relationship between HTT mutation and YAP hyperactivation, which likely is mediated in part by these factors, but we do address possible connections in the discussion.

      Reviewer #2:

      This manuscript by Piccolo et al. identifies YAP signalling as a key player in lineage determination during development of early human ectoderm. Additionally, the authors show that neuruloids generated using cells engineered to express penetrant levels of CAG repeats in the HTT gene display aberrant YAP signalling during ectodermal specification and that this phenotype can be partially rescued by inhibition of this pathway. This is an interesting study, and the similarity of the YAP-activated neuruloids and the HD neuruloids is striking. The value of this work would be increased by providing experiments to definitively demonstrate the role of YAP signalling in NNE specification and in HD neuruloids.

      We also thank this reviewer for her/his compliments on our work. The reviewer also expresses specific recommendations listed below:

      Specific comment: The authors describe the emergence of non-neuronal ectoderm (NNE) at the edges of the printed island cell colony and neuronal ectoderm (NE) within this circular colony. However, they do not show images of any lineage markers confirming that these regions are, in fact, NNE and NE.

      We show in Figure 1E that the edges of the neuruloids are positive for TFAP2A, a marker for the NNE lineage. In Figure 4D we also show TFAP2A at the edge (NNE) and PAX6 at the center (NE). Additionally, the spatial identity of the various ectodermal lineages was fully characterized in our previous study (Haremaki et al., 2018).

      They also don't show that this YAP-GFP cell line recapitulates endogenous fix-and-stains of YAP in these colonies.

      Figure 1E shows YAP expression at D4 by immunolabeling for YAP/DAPI acquired by confocal microscopy, which recapitulates that of immunofluorescence detection of nuclear YAP, shown in Figure 4B, and the results obtained by live fluorescence (YAP-GFP/H2B, Figure 4A).

    1. Author Response:

      Reviewer #2:

      Weaknesses:

      The competition assay used in this study may not truly reflect the competitiveness of SSIMS males. The mating assay used 20 virgin WT females and 4 males (including both WT and SSIMS), resulting in a 5:1 sex ratio, so the males are not really competing for females. A more competitive ratio (such as WT females: WT males: SSIMS males at 1:1:1) should be designed to address this. Also, the sperm competition assay mixed the mated WT females with SSIMS males for 12 days, allowing plenty of time for the females to remate with these males. Therefore, it is more like a sperm replacement assay than a competition assay. The authors should either repeat it with a strict time control, or soften their statements about sperm competitiveness.

      We have repeated the experiment at a 1:1:1 ratio as suggested. The new results are reported in the revised Figure 3. It is not clear to us how the timing of the mating experiments differentiates sperm competition versus sperm displacement, but we agree that sperm displacement is a better term to describe what we did. We have repeated the sperm displacement experiment with strict time control based on several published literature precedents and describe the results in the revised manuscript.

      Some necessary information or statistics are not shown or mis-presented. For example, the alternative splicing diagram in Figure 1c likely was taken from the original transformer gene, but here it's the tTA gene so the male intron should be removed since it's not in the construct;

      We have revised text in the manuscript to clarify some of these points. First of all, the male intron is still in the construct, even though we fused the intron to the tTA gene. The alternative splicing between males and females is caused by use of alternative 5' splice sites, which means the intron that is spliced out in males is just a smaller section of the intron that is spliced out in females. Use of an alternative 5' splice site in males means that a protein-coding sequence with multiple stop codons is incorporated to the mature mRNA. We do not support the precise splicing mechanism with empirical data in this paper, but this has been done in a number of previous publications (https://doi.org/10.1016/j.ibmb.2014.06.001; https://doi.org/10.1371/journal.pone.0056303).

      Because the construct works as predicted (100% female lethality in the absence of tetracycline), and we did not change the genetic design in a way that would impact the mechanism of female lethality, we think there is little reason to believe that the splicing is occurring in a different way.

      the panels of Figure 2 were not consistent to the legend and confusing; the statistics for different tetracycline concentration tests were not shown in Figure 2 or text to answer their hypothesis "(to) optimize rearing of SSIMS stock, …..we titrated Tet in the food";

      We re-wrote the text describing Figure 2 to make the results clearer. We clarified in the legend that the symbol signifies p<0.0001 (we were not trying to imply that all experiments had this level of significance, only the ones marked with the symbol in the figure). We removed the word ‘optimize’ from the main text. Optimization was not the true aim of the experiment, and as the reviewer points out, we did not statistically determine an optimal concentration of Tet. Our main goal was to show a dose-dependent response in the number of females surviving on Tet-free medium, which the data support and which does not require statistical support.

      Figure 3b shows 5-8 day old females were used but in the text it's 5-6 day, and it didn't mention the duration of the first crossing and time lag until the second crossing which are critical in such experiments; the conclusion and statistics for Figure 3c among tests with mixed males should also be mentioned.

      We have corrected the figure (now Figure 3c) to indicate that the females were 5-6 days old. The first mating was for 5-6 days and there was no lag time between being co-housed with different males. We have performed multiple new experiments in revision that have been added to Figure 3. We have revised the discussion of these new experiments (and how they relate to the originally performed experiments) in the revised submission.

      The discussion is largely directed towards the merits of SSIMS but misses some key points that might decide how it can be translated into applications or transferred to other species. First, the actual basis for the tTA lethality employed in this study is still unknown, so it is subject to suppression by pre-existing inherent variation in the targeted field population. The same may be true for any gene-overexpression-based lethality, including the EGI lines generated here. Second, the complete penetrance observed with the relatively small sample size here can hardly be used to predict field or mass-rearing conditions. A previous study showed that mutations in such lethal constructs could occur at a frequency of one in 10,000, and a typical SIT program releases millions of sterile insects every week. Third, while the authors claimed SSIMS is "one of the most complex engineered systems in insects", they also proposed that "the genetic design is likely to be portable to other species" without mentioning any potential obstacles along the way. Therefore, efforts should be made to give a full picture of SSIMS, including rain and sunshine.

      We have added discussion of possible failure modes for this genetic biocontrol approach to the discussion section. We have also added text to discuss how the complexity of SSIMS is a potential obstacle to its translation to non-model organisms.

    1. Author Response:

      Reviewer #1 (Public Review):

      This manuscript is a follow-up of an earlier manuscript using the LRET technology, but extends the study by identifying a new "open" state and using experimental distance constraints to provide molecular models of the different states. All in all, the manuscript is well written, the experiments are described in sufficient details and experiments are done to high quality with the appropriate controls. The data corroborate the partially open state as published early, but extend the study to a second, open state. It is very good to see that the observed states are not only present in the catalytic head but the authors also use the full-length protein and find similar states. However, in the present manuscript, I find the conceptual advance with respect to the mechanism of MR somewhat limited. The authors curiously do not include any DNA in their structural studies, so the observed states are only relevant for the free MR complex, but not the complex "in action" bound to DNA where quite different conformations might occur. As one consequence, the structurally proposed states do not directly correlate with the functional nuclease states that are necessarily bound to DNA. Perhaps as a consequence, in the author's model, Rad50 is merely a gate-keeper for Mre11, but this is not the case as recent structural work shows that Rad50 forms a joint DNA binding surface with Mre11. Likewise, biochemical studies are done with physiologically unclear/less relevant 3' exonuclease activity only, but not with the physiological important 5' endonuclease activity. In my opinion, it is important for a publication in a journal with the scope of eLife and addressed to a broad audience to provide structural analysis in the presence of DNA and validate the structures using the endonuclease activity.

      We thank the reviewer for these comments.

      Specific recommendations:

      1) Instead of using the physiologically unclear exo activity, I suggest using the more relevant endonuclease activity to validate the mutants.

      We now include plate- and gel-based endonuclease activity assays, using a variety of DNA substrates, for all of the validation mutants. We have expanded Fig. 3 and included a new Supplemental Fig. S4 to show this data. We have expanded the Results section of the modified manuscript to present and discuss these findings.

      2) Since the authors mutated one side of the newly identified/proposed salt bridges, I also suggest testing whether a charge reversal on both sides of the salt bridge rescues the phenotype. I find this important because MR has quite many conformations, and mutating a single residue might not unambiguously validate the proposed conformation; a rescue by a charge-reversed salt bridge is much stronger.

      We thank the Reviewer for this suggested experiment, and we tried to do it. Although we were successful in generating each of the charge reversal mutations in full-length Rad50, all of the mutants unfortunately had issues with either expression or purification. For example, the 6x His-tag for several of the new Rad50 mutants was not accessible to the TEV protease for cleavage indicating that the mutated proteins were mis-folded (the His-tag of the WT full-length Rad50 is readily cleaved off by TEV). As such, we did not feel confident using these proteins in subsequent MR activity assays.

      3) Since all LRET experiments are done without DNA, the authors do not capture relevant DNA processing states and comparison of structural (w/o DNA) and biochemical data (w/ DNA) is not really justified, in my opinion. Also, they might miss critical conformations. Is there a technical reason for not including DNA in the LRET studies?

      We have collected LRET data on ATP-bound MRNBD in the presence of a hairpin DNA or a ssDNA as substrates. We still observe three states in the presence of both DNAs; however, the open conformation appears to be slightly more compact (i.e., closer distance between Rad50NBD protomers) in the presence of ssDNA. As described above, we have added to the Results section of the modified manuscript and included a new figure (Fig. 4) describing these data.

      4) If the authors want to claim processive movement coupled to partially open/open state interchanges, they should provide experimental evidence. Where would the energy come from for such a movement, this is not clear from the model?

      On the surface, ATP hydrolysis by Rad50 would seem to be the perfect source of energy for the conformational changes that drive the sequential and/or processive nuclease functions of the MR complex. However, the D313K mutant is not as good at ATP hydrolysis as the wild type enzyme (Fig. 3E), and the data in Fig. 3 and Supplemental Fig. S4 clearly demonstrate that D313K is by far the best nuclease. If the free energy for the movement does not come from ATP hydrolysis, where else could it come from? Richardson and co-workers measured a release of -5.3 kcal mol⁻¹ (-22.17 kJ mol⁻¹) of free energy for the hydrolysis of a DNA phosphodiester bond (Dickson, K.S. et al. 2000 J. Biol. Chem. 275:15828–15831). Thus, the free energy released from the Mre11 nuclease activity could be the driving force for the conformational changes we propose. We have made this point in the Discussion of the revised manuscript.
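
      The unit conversion behind the two free-energy figures quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the function and variable names are hypothetical, and it assumes the thermochemical calorie (1 kcal = 4.184 kJ), which the response does not state explicitly.

```python
# Assumption: thermochemical calorie, 1 kcal = 4.184 kJ (not stated in the response).
KJ_PER_KCAL = 4.184


def kcal_to_kj(value_kcal_per_mol: float) -> float:
    """Convert a molar energy from kcal/mol to kJ/mol."""
    return value_kcal_per_mol * KJ_PER_KCAL


# Free energy of DNA phosphodiester bond hydrolysis as quoted in the response.
delta_g_kcal = -5.3
delta_g_kj = kcal_to_kj(delta_g_kcal)

print(f"{delta_g_kcal} kcal/mol corresponds to {delta_g_kj:.3f} kJ/mol")
```

      Under this assumption the result agrees with the quoted kJ value to within rounding in the second decimal place.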

      5) The SAXS data for the "open" state do not validate the model, in my opinion. Experimental data and model are not inconsistent, but the curve looks to me as if the open state is perhaps much more flexible (i.e. an ensemble) or extended? Please comment.

      We agree with the Reviewer on this point. We have updated Fig. 5A (original Fig. 4) to include the two-state fits to the experimental SAXS data. Although the multi-state fit to the apo MR SAXS data is better than any of the single model fits (χ² = 1.05 vs. 1.26, respectively), the χ² is still larger than the multi-state fits to the ATP-bound MR SAXS data. Thus, an additional unobserved conformation (perhaps the so-called “extended”) might be present in solution for apo MRNBD. We have added a sentence to the revised manuscript with this point.

      To explore the possibility that the previously described “extended” structure might be contributing to the SAXS data, we built a model of the extended conformation of Pf MRNBD based on the Tm MRNBD structure (PDB: 3QG5) and used Rosetta to connect the coiled-coils and add the linker to the Mre11 HLH. When this model was used in the FoXS calculations for the apo SAXS data, the χ² was 4.77 (versus χ² of 1.26 for the “open” model). The MultiFoXS two-state fit gave 90% open + 10% closed (χ² of 1.04), whereas the three-state fit gave 65% open + 20% extended + 15% part open (χ² of 0.84). Thus, there is some improvement when using the extended model, but since that model is not measurable in our LRET experiments and we are unsure of its validity as we have modeled it for Pf MR, we have chosen to omit it from the analysis.
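
      As a rough illustration of how single-state and two-state fits to a scattering profile can be compared, the following Python sketch scores a model against data with a scaled chi-square statistic. It is a simplified stand-in under stated assumptions, not the FoXS or MultiFoXS implementation: it fits only one scale factor (no offset or hydration-layer parameters), all function names are hypothetical, and the intensity values are made-up toy numbers rather than data from the manuscript.

```python
def chi_square(i_exp, i_model, sigma):
    """Return (chi2, scale) for a model profile fitted to experimental data.

    chi2 = mean(((I_exp - c * I_model) / sigma) ** 2), where c is the
    least-squares optimal scale factor. Simplified: no offset term.
    """
    num = sum(e * m / s ** 2 for e, m, s in zip(i_exp, i_model, sigma))
    den = sum(m * m / s ** 2 for m, s in zip(i_model, sigma))
    scale = num / den
    chi2 = sum(((e - scale * m) / s) ** 2
               for e, m, s in zip(i_exp, i_model, sigma)) / len(i_exp)
    return chi2, scale


def mix_profiles(profile_a, profile_b, weight_a):
    """Population-weighted average of two model profiles (two-state ensemble)."""
    return [weight_a * a + (1.0 - weight_a) * b
            for a, b in zip(profile_a, profile_b)]


# Toy profiles (hypothetical numbers, not data from the manuscript).
model_open = [10.0, 8.0, 5.0, 2.0]
model_closed = [10.0, 6.0, 3.0, 1.0]
sigma = [0.1, 0.1, 0.1, 0.1]

# Synthetic "experiment": a 90% / 10% mixture at half the model intensity scale.
i_exp = [0.5 * v for v in mix_profiles(model_open, model_closed, 0.9)]

chi2_single, _ = chi_square(i_exp, model_open, sigma)
chi2_two_state, scale_two_state = chi_square(
    i_exp, mix_profiles(model_open, model_closed, 0.9), sigma)

print(f"single-state chi2: {chi2_single:.4f}")
print(f"two-state chi2:    {chi2_two_state:.4f}")
```

      In this toy case the two-state mixture reproduces the synthetic data exactly, so its score is lower than that of the single model. With real data, adding states will generally lower the score, which is why an improvement alone does not establish that an extra conformation is present.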

      6) Distance errors for the full complex are much smaller than those for the catalytic module only (Fig. 1d). Does that mean that the full complex is more rigid, please comment?

      From looking at the data presented in Fig. 1D, it is logical to suggest that the full-length complex may be more rigid or better defined by the LRET data. However, we note that there are nearly as many distance errors which are similar between MRNBD and MR as there are MR errors less than MRNBD. And although many are not identical, most are of a similar magnitude. Because of this, we do not think the variations in LRET errors are systematic (i.e., related to a more rigid full-length complex).

    1. Author Response:

      Reviewer #1 (Public Review):

      Summary

      The authors have discovered and characterized a novel genetic pathway responsive to hypoxia, which acts in parallel to the canonical response through activation of Hypoxia-Inducible Factor (HIF). Specifically, the authors discovered that the Caenorhabditis elegans nuclear hormone receptor NHR-49, ortholog to mammalian PPAR-alpha, is essential for survival under hypoxic conditions and regulates target gene expression that is hif-1-independent; identifying an essential role of autophagy. Further the authors discover both positive and negative regulators of NHR-49 and a putative feedback loop.

      Overall analysis

      The genetic analysis conducted by the authors is outstanding. However, the study is lacking in a few key areas and the authors may have over-interpreted results in a few places, which diminishes my overall enthusiasm. These concerns are addressable and doing so would greatly strengthen the manuscript. I highlight individual major concerns below, and save minor concerns and specific suggestions for private recommendations for the authors.

      Major concerns

      1 The authors have provided strong genetic evidence for a parallel mechanism to canonical HIF-1 activity in response to hypoxia. The authors should more rigorously test whether there is evidence for cross-talk between the two mechanisms. In the discussion the authors highlight findings in mammals that support this possibility. For example, does loss of one lead to hyperactivation of the other in an attempt to compensate for hypoxia?

      We thank the reviewer for suggesting these interesting experiments to examine cross-talk!

      Specific examples:

      • In regards to lines 425-426, does loss of hpk-1 stabilize HIF-1 (or does hpk-1(oe) repress hif-1)?

      We attempted to study HPK-1–HIF-1 cross talk via GFP imaging of the UL1447 HIF-1::GFP strain after hpk-1 RNAi (Figure R4, below). However, although we did observe an increase in GFP levels in hypoxia (vs. normoxia), we did not observe nuclear localization, possibly due to the rapid degradation of HIF-1 in normoxia, which occurs inevitably during our experimental procedure. We therefore opted not to include these data in the manuscript.

      Figure R4: Regulation of HIF-1::GFP. Quantification of GFP levels in UL1447 (unc-119(ed3) III; leEx1447 [hif-1::GFP + unc-119(+)]) adult animals expressing HIF-1::GFP. Animals were fed EV RNAi or nhr-49, hif-1, hpk-1, or nhr-67 RNAi as indicated and exposed to 4 hr of 0.5% O2 without recovery (three repeats totalling >30 individual animals per strain). *, ***, **** p < 0.05, 0.001, 0.0001 (two-way ANOVA corrected for multiple comparisons using the Tukey method).

      • Does loss of hif-1 or nhr-49 alter the expression, stability, or activity of the other (either under normoxic or hypoxic conditions)?

      We appreciate the reviewer’s interest in examining the interaction between nhr-49 and hif-1. To address this, we generated an NHR-49::GFP;hif-1(-) strain and analysed it by imaging after exposure to normoxia or hypoxia. Although loss of hif-1 does result in a slight whole-body up-regulation of NHR-49::GFP, this increase was not significant (new Figure 2—figure supplement 1C, D). Higher magnification images did not show a tissue-specific effect in NHR-49::GFP increase in the hif-1(-) background either (new Figure 2D, Figure 2—figure supplement 1E, F). For reasons mentioned above, the HIF-1::GFP;nhr-49(RNAi) experiment was inconclusive.

      • Can overexpression of either hif-1 or nhr-49 rescue the developmental defects caused by loss of the other (i.e. overexpress hif-1 in nhr-49 mutant animals, and vice versa).

      With the new NHR-49::GFP;hif-1(-) strain, we were able to study compensatory effects of overexpressing NHR-49 in hif-1 mutants by performing embryo hypoxia survival experiments (new Figure 2E). Excitingly, while NHR-49 overexpression does not provide enhanced hypoxia survival at baseline (vs. non-GFP siblings), NHR-49 overexpression rescued the deficiency of hif-1 mutants. This suggests that nhr-49 can partially compensate for loss of the hif-1 pathway. Testing whether HIF-1::GFP overexpression rescues nhr-49 loss requires non-GFP sibling controls. Although the UL1447 strain expresses HIF-1::GFP from an extrachromosomal array, in our hands, we never observed non-GFP worms (i.e. 100% HIF-1::GFP offspring), and therefore were unable to test whether HIF-1 overexpression compensates for nhr-49 loss.

      • Does NHR-67 negatively regulate hif-1 (specificity to NHR-49)?

      As noted above, we were unfortunately unable to conclusively assess HIF-1::GFP levels, likely due to rapid degradation during the normoxia that occurs during animal harvest.

      2 The role of autophagy in hypoxia should be explored in greater detail. While the evidence presented by the authors clearly demonstrates autophagy is essential for hypoxic survival, autophagy is an important component of many biological processes. Thus, it's critical to distinguish whether autophagy is merely required (perhaps for very indirect reasons) or whether autophagy is a part of an adaptive response to hypoxia. The authors (Miller lab) previously failed to find a role for autophagy in hypoxia (Fawcett et al. 2015 Aging Cell), which should be addressed. Has autophagy been previously linked to hypoxia in C. elegans? The novelty of this discovery should be discussed in greater detail.

      We appreciate that the link of autophagy to hypoxia survival needed to be examined further. We now provide substantial new evidence showing that not only are autophagy genes and autophagosome formation induced in hypoxia, but also that mutations in autophagy genes result in hypoxia sensitivity. In our opinion, this strongly supports a key role for autophagy in hypoxia adaptation.

      We note that the study by Fawcett et al., 2015 examined only two genes in hypoxia, bec-1 and unc-51, neither of which was found to be regulated by hypoxia in our RNA-seq analysis. Another study from the Miller lab found that an 18-hour anoxia exposure of L2/L3 stage C. elegans results in a significant induction of autophagy in the intestine (Chapin et al., 2015; Fig 4B, C). Although the conditions in this study are different from ours (anoxia vs. hypoxia, exposure time, animal developmental stage), this study, like ours, thus finds that low oxygen availability induces autophagy. Besides the Miller lab, there are several other publications that show an important role for autophagy in hypoxia adaptation across species (Samokhvalov et al., 2008; Zhang et al., 2008). Especially relevant to our manuscript is a recent paper published while we were revising our manuscript, which shows that autophagy gene induction is HIF-1 independent in Drosophila melanogaster (Valko et al., 2021). This agrees well with our exciting new discoveries. We have revised the text to better discuss this context.

      3 The authors have possibly over-interpreted their results in Figure 4B and the possibility that NHR-49 acts cell non-autonomously. The authors speculate that tissue specific genetic rescue by NHR-49 over-expression could indicate the existence of a signaling molecule (line 499). Ectopic over-expression of a transcription factor within one tissue is always tricky to interpret, as it may not be physiologically relevant, which I fear may be the case as rescue is achieved when NHR-49 is over-expressed within any tissue (i.e. there is no specificity). An alternative explanation, which is a more indirect model, is that NHR-49 over-expression shifts metabolism within a tissue to generate metabolites that are released throughout the organism to sustain it during hypoxia.

      We thank the reviewer for this excellent point, and agree that indirect action of NHR-49 remains a possibility. We have added discussion to this point in the revised manuscript.

      4 As an extension of MC#3, the authors demonstrate that NHR-49 is induced throughout the animal after hypoxia (Figure 5A). Presumably the sites of NHR-49 induction (tissues) equate to the sites where nhr-49 is necessary. However, the images within 5A cannot be resolved to identify individual tissues; higher resolution images are necessary, and quantification of GFP expression within individual tissues could lend biological insight.

      We now provide higher resolution images of NHR-49::GFP in Figure 2D, Figure 2—figure supplement 1E, F.

      5 The gene expression analysis is lacking details. For example, the RNA-seq data shown in Figure 3A&B is confusing. The numbers in the text do not match the figure and it is unclear whether the intersection in the Venn Diagram represent inverse relationships (i.e. the proportion of genes that are upregulated in wild-type that are either hif-1 or nhr-49 dependent). Greater detail and explanation is needed, as presented little biological insight can be discerned from the Figure 3A&B. Next, qRT-PCR validation of autophagy gene expression found in Figure 3C should be provided with that result. Lastly, are there existing datasets for changes in gene expression of C. elegans exposed to hypoxia? If so, how do the datasets compare?

      We apologize for the confusion and have revised our text describing the RNA-seq analysis as well as the Figure legend. We also provide validation of the RNA-seq data with GFP-reporters and have compared our dataset to a previous study on hypoxia dependent gene regulation in C. elegans.

      6 The authors identify a putative negative feedback loop between NHR-67 and NHR-49, and suggest this regulation is at the protein level (Figure 5F,G) based on a translational reporter and not transcriptional regulation based on qRT-PCR results and similar results previously found with hpk-1 (Figures S5A, 7a, and a previous study). However, the authors should more rigorously rule out dynamic changes in expression between tissues that cannot be ascertained by qRT-PCR (i.e. test whether nhr-49p::GFP expression is altered after nhr-67(RNAi) +/- hypoxia).

      We agree and have more rigorously studied this interaction.

    1. Author Response:

      Reviewer #1 (Public Review):

      The manuscript by Gaubitz et al. reports structures of the yeast clamp loader (RFC)-sliding clamp (PCNA) complex in 6 different states during the clamp loading cycle. Although structures of yeast, human, E. coli and T4 clamp-loader-clamp complexes have been determined previously in various states, a major advance of the authors' work is to obtain structures of distinct intermediates in a single system. These structures provide a detailed description of the conformational changes in both RFC and PCNA during clamp loading, explaining ordered PCNA and primer/template DNA binding by RFC, the mechanism of clamp opening and closing, and the regulation of RFC's ATPase activity. In addition, the structures reveal differences in the mode of primer/template recognition between yeast and T4/E. coli clamp loaders. RFC melts the final base pair of the primer/template duplex using a separation pin in RFC-A, which is not seen in T4 or E coli clamp loaders. The authors confirm this interesting and unexpected observation biochemically. Although the authors speculate this mechanism could be used for distinguishing primer/template substrates from other DNA structures, the physiological significance of DNA melting and base flipping by RFC remains unclear. Overall, the findings reveal new nuances of the clamp loading cycle but the manuscript could be strengthened by solidifying the importance of base flipping for substrate recognition and RFC function.

      We thank the reviewer for their support. As mentioned above, we report new experiments examining the role of base-flipping residues on DNA binding and cell physiology (Figure 6 – Figure Supplements 2&3).

      Reviewer #2 (Public Review):

      In this study, Gausman et al. use cryo-electron microscopy to elucidate structures of complexes between the eukaryotic clamp loader (RFC) and its ligands, the DNA polymerase processivity clamp (PCNA) and DNA. Clamp loaders and clamps are required for DNA replication and repair in all domains of life. Understanding of the molecular mechanisms of clamp loading is not only important for DNA replication and repair, but also because clamp loaders are members of a larger group of motor proteins which are critical to many aspects of cellular metabolism. To date, our structural understanding of clamp loader mechanisms is based on comparison of structures for different clamp loader-ligand intermediate complexes from a variety of organisms including E. coli, yeast, bacteriophage, and humans. This paper presents the first structural data for multiple clamp loader-ligand intermediate complexes from a single organism, Saccharomyces cerevisiae, and sheds new light on protein-ligand interactions. Importantly, this work highlights structural features of the clamp loader that give rise to the order of ligand binding where the clamp loader binds and opens the clamp before binding DNA.

      To capture clamp loader-ligand complexes, RFC was bound to the slowly hydrolyzable ATP analog, ATPγS, and intermediate complexes were further stabilized by protein crosslinking, predominantly intramolecular crosslinking of RFC subunits. Two types of RFC-PCNA complexes were observed, one in which the PCNA ring is closed and a second where it is open. A family of closed complexes was observed in which three of the five RFC subunits contact the surface of the PCNA ring. Rigid body modeling suggests that this closed complex is dynamic such that the plane of the ring 'swings' relative to the clamp loader to potentially allow all five clamp loader subunits to engage the clamp to open the ring. In the open complex, the diameter of the complex expands and the opening in the PCNA ring is large enough to allow dsDNA to enter the ring and the chamber formed by the RFC subunits. A large hinge-like conformational change in the RFC-A subunit on going from closed to open complexes creates a channel for the ssDNA template to bind and exit the chamber. These remarkable structures show that the clamp loader is not in a suitable conformation to bind DNA prior to forming an open clamp complex, which favors clamp binding before DNA binding.

      This manuscript provides remarkable insight into intermediate complexes that exist in the clamp loading reaction pathway. Having a family of structures for a single clamp loader and clamp provides a clearer picture of and highlights differences in clamp loading mechanisms from different organisms. Overall, this work is well done, but perhaps some of the mechanistic conclusions drawn from static structures should be viewed with caution in the absence of rigorous dynamic or kinetic approaches.

      1) Crosslinking the proteins to stabilize intermediates could potentially bias the pool of conformations that are observed.

      We agree that crosslinking can bias the population of conformations observed. That is one of the reasons why we refrain from interpreting the number of particles in each class as being representative of the actual population of that intermediate.

      2) A statistical analysis of the differences in the ATPase activities of wild-type and mutant clamp loaders would be helpful to determine whether the mutations have an effect on the activity. Moreover, steady-state ATPase activity was measured in this experiment and these rates may not reveal differences in rates of intermediate steps in the clamp loading reactions. For the mutations to affect the ATPase activity, they would have to either change the rate of the rate-limiting step in the pathway or change the identity of the rate-limiting step. Thus, the decrease in ATPase activity for W638G mutant could be interesting if statistically significant.

      As mentioned above, we now report this analysis and interpretation in more detail.

      3) Given that Phe-582 and Trp-638 seem to be important for binding DNA at the 3' end, an analysis of the effects of mutations to these residues on DNA binding activity would be informative.

      As mentioned above, we report experiments examining these residues on DNA binding (Figure 6 – Figure Supplement 1) and new experiments measuring ATPase activity in the presence of various DNA substrates (Figure 6 – Figure Supplement 2).

      4) Kinetic data in the literature support a mechanism in which the clamp loader hydrolyzes ATP prior to clamp closing. In the absence of supporting kinetic data, it may be overinterpreting structural data to assert that the clamp loader need not hydrolyze ATP prior to closing the clamp.

      As mentioned above in detail, we agree and have toned down the interpretation.

    1. Author Response:

      Reviewer #1:

      Suction feeding is recognized as a nearly ubiquitous prey capture mode in fishes, and the hydrodynamics of these flows are reasonably understood. Provoni et al. deal with a role that is as important but much less understood, i.e. the role of these flows in intra-oral prey transport. Specifically, they ask how the flows within the buccal cavity can help transport the prey into the mouth. The major obstacle to understand these flows is that they are internal, so it was difficult to quantify them.

      Here, the authors developed and used a technique that enabled tracking of tracer particles that are smaller and less dense than the prey, thereby improving our ability to quantify the flows. They show that the suction flows can be directed towards the esophagus at least in one of the species they used, and that repeated bidirectional flows can be used to redirect particles trapped at the branchial basket towards the esophagus. In doing so, they highlight the role of the suction flows in transport of food, providing a possible explanation to the ubiquity of suction flows even among fish that don't rely on the external flows to capture their prey.

      Quantifying internal flows is a demanding task, and the paper presents new and exciting data. As is typical of new techniques, there are important limitations to its current use. The tracers are larger than those used for particle imaging velocimetry, and are heavier than the water. Therefore, they don't track the water precisely. It is difficult to predict the error generated by this limitation, because it depends on the velocity gradients in the flow (for example accelerations). Additionally, the number of tracers is limited, so they provide a partial representation of the flows within the mouth. It stands to reason that particles drawn from different locations will have different trajectories; however, this is not quantitatively analyzed.

      Particle tracking lends itself to a quantitative analysis of the transport flows, but unfortunately the paper does not take full advantage of this capacity. A quantitative description of the intake flow patterns, which are currently described only qualitatively, and a quantitative estimate of the volume of water that passes near the esophagus are examples of this potential. Other important parameters, such as efficiency, could potentially be derived directly.

      The most important message of the results presented is that the flow of water inside the mouth has a functional role in moving the prey towards the esophagus, and that it can differ between species. These results teach us that the suction flows are important not only to prey capture but also to prey transport.

      Thank you for your interesting public review.

      In a revised version of the paper, we have evaluated the effect of size and density on tracking performance of our particles using CFD analysis. The results are presented in a new Figure (Figure 8), which is further illustrated by a video (Figure 8 – video 1). It serves to quantify the limitations due to particle size and buoyancy imperfection.

      The CFD results indicate that the finite size (1.4 mm diameter) and density (up to about 1050 kg m⁻³) of the current sample of particles (Figure 7) do not hinder a realistic assessment of the suction flows generated by fish. A small deviation of the path in the direction of gravity can be expected (Figure 8b), but this should be smaller than 1 mm even for the heaviest particles of 1050 kg m⁻³. Lag during flow acceleration and overshoot during deceleration for the slightly negatively buoyant particles were relatively small (Figure 8c) and therefore acceptable for describing general patterns of flow during suction feeding.

      Reviewer #2:

      This is a fascinating study that adds great resolution to the mechanisms of water flow in the mouth of fish during suction feeding. Using high-speed x-ray video (XROMM) to track food items and particles in the water, the authors show convincingly that fish have an intriguing ability to generate flows that center the food at the esophagus, and that intraoral flow differs between species. The video is impressive, showing all the particles flow into the mouth, and separation to direct food to the gullet and water to the outflow exit (gill arches).

      The methods of XROMM and particle tracking are quite well known -- there is nothing new in either approach, nor in combining them to track the prey item. However, the authors created a new kind of marker to enable tracking water flow patterns; a bead surrounded by foam, to create neutral buoyancy, that worked really well. Overall a fascinating study that adds to our understanding of suction feeding in fish.

      Thank you for your nice public review.

      We agree that our statement about the novelty of the method was confusing in the original version of the manuscript, especially in the sentence from the introduction. We rewrote it to: “Here, we develop a new technique based on biplanar high-speed X-ray videography to quantify the 3D pathlines of intraoral water and combine it with existing methods to track food and quantify 3D skeletal motions.” This should clearly separate the new contribution from the existing methods and make it clear that we see the tracking of water as a separate method in addition to the tracking of food. The latter has indeed been done previously in many 2D X-ray studies, and in more recent 3D biplanar X-ray studies. The revised paragraph in the discussion now refers back to the X-ray particle tracking protocols that have been used in industrial settings (reference Drake et al. 2011).

    1. Author Response:

      Reviewer #1 (Public Review):

      Edmondson et al. develop an efficient coding approach to study resource allocation in resource constrained sensory systems, with a particular focus on somatosensory representations. Their approach is based on a simple, yet novel insight. Namely - to achieve output decorrelation when encoding stimuli from regions with different input statistics, neurons in the sensory bottleneck should be allocated to these regions according to jointly sorted eigenvalues of the input covariance matrix. The authors demonstrate that, even in a simple scenario, this allocation scheme leads to a complex, non-monotonic relationship between the number of neurons representing each region, receptor density and input statistics. To demonstrate the utility of their approach, the authors generate predictions about cortical representations in the star-nosed mole, and observe a close match between theory and data.
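
      To make the allocation rule summarised above concrete, the following is a minimal sketch and not the authors' implementation. It assumes two independent regions, each with an exponentially decaying covariance, so that the full covariance is block-diagonal and its eigenvalues are simply the union of the per-region eigenvalues. The function names (`region_eigenvalues`, `allocate`) and every parameter value are hypothetical and chosen only for illustration.

```python
import numpy as np


def region_eigenvalues(n_receptors, length, activation, decay=1.0):
    """Descending eigenvalues of an assumed exponential covariance for one region."""
    # Hypothetical layout: receptors evenly spaced on a line,
    # so receptor density is n_receptors / length.
    positions = np.linspace(0.0, length, n_receptors)
    distances = np.abs(positions[:, None] - positions[None, :])
    covariance = activation * np.exp(-decay * distances)
    return np.linalg.eigvalsh(covariance)[::-1]


def allocate(eigs_by_region, bottleneck):
    """Count how many of the `bottleneck` largest eigenvalues belong to each region."""
    values = np.concatenate(eigs_by_region)
    labels = np.concatenate(
        [np.full(len(e), i) for i, e in enumerate(eigs_by_region)]
    )
    top = np.argsort(values)[::-1][:bottleneck]  # joint sort across regions
    return np.bincount(labels[top], minlength=len(eigs_by_region))


# Illustrative values only: a dense, less active region and a sparse, more active one.
dense = region_eigenvalues(n_receptors=60, length=10.0, activation=1.0)
sparse = region_eigenvalues(n_receptors=20, length=10.0, activation=2.0)
for m in (5, 20, 80):
    print(m, allocate([dense, sparse], m))
```

      Sweeping `bottleneck` from 1 up to the total number of receptors traces how each region's share of output neurons changes with bottleneck width, which is the allocation the review refers to.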

      Strengths:

      These results are certainly interesting and address an issue which to my knowledge has not been studied in-depth before. Touch is a sensory modality rarely mentioned in theoretical studies of sensory coding, and this work contributes to this direction of research.

      A clear strength of the paper is that it demonstrates the existence of non-trivial dependence between resource allocation, bottleneck size and input statistics. Discussion of this relationship highlights the importance of nuance and subtlety in theoretical predictions in neuroscience.

      The proposed theory can be applied to interpret experimental observations - as demonstrated with the example of the star-nosed mole. The prediction of cortical resource allocation is a close match to experimental data.

      We thank the reviewer for the feedback. Indeed, demonstrating an ‘interesting’ effect in even such a simple model was one of the main aims.

      Weaknesses:

      The central weakness of this work is its strong assumptions, which are not clearly stated. As a result, the consequences of these assumptions are not discussed in sufficient depth, which may limit the generality of the proposed approach. In particular:

      1. The paper focuses on a setting with vanishing input noise, where the efficient coding strategy is to reduce the redundancy of the output (for example through decorrelation). This is fine; however, it is not a general efficient coding solution as indicated in the introduction. It is a specific scenario with concrete assumptions, which should be clearly discussed from the beginning.

      2. The model assumes that the goal of the system is to generate outputs whose covariance structure is an identity matrix (Eq. 1). This corresponds to three assumptions: a) variances of output neurons are equalized, b) the total amount of output variance is equal to M (i.e. the number of output neurons), c) the activity of output neurons is decorrelated. The paper focuses only on assumption c) and does not discuss the consequences or biological plausibility of assumptions a) and b).

      We have clarified the assumptions in the revised version. The original version did not distinguish clearly between assumptions that were necessary to allow study of the main effect, and assumptions that were included to present a full model but that could have been chosen otherwise without affecting the results. This has now been made much clearer. Regarding the noise issue (point 1), we have clarified the main strategy pursued by the model, namely decorrelation; we acknowledge other possible strategies; and we make clear whether and how noise could be incorporated into the model. Regarding the biological plausibility of our assumptions (point 2).

      Reviewer #2 (Public Review):

      The authors propose a new way of looking at the amount of cortical resources (neurons, synapses, and surface area) allocated to process information coming from multiple sensory areas. This is the first theoretical treatment that attempts to answer this question within the framework of efficient coding, which states that information should be preserved as much as possible throughout the early sensory stages. This is especially important when there is an explicit bottleneck such that some information has to be discarded. In the current paper, the bottleneck is quantified as the number of dimensions in a continuous space. Using only the second-order statistics of the stimulus, and assuming that only the second-order statistics carry information, the authors use variance instead of Shannon's information. The result is a non-trivial analysis of ordering in the eigenvalues of the corresponding representations. Using clever mathematical approximations, the authors arrive at an analytical expression -- advantageous since numerical evaluation of this problem is tricky due to the long thin tails of the eigenvalues of the chosen covariance function (common in decaying translation-invariant covariances). By changing the relative stimulus power (activity ratio), receptor density (effectively the width of the covariance function), and the truncation of dimensions (bottleneck width), they show that the cortical allocation ratio, surprisingly, is a non-trivial function of these variables. There are a number of weaknesses in this approach; however, it produced valuable insights that have the potential to start a new field of studying such resource allocation problems across different sensory systems in different animals.

      Strengths:

      • A new application of the efficient coding framework to a neural resource allocation problem given a common bottleneck for multiple independent input regions. It's an innovation (initial results presented at NeurIPS 2019) that brings normative theory with qualitative predictions that may shed new light on seemingly disproportionate cortical allocations. This problem did not have a normative treatment prior to this paper.

      • New insights into allocation of encoding resources as a function of bottleneck, stimulus distribution, and receptor density. The cortical allocation ratios have nontrivial relations that were not shown before.

      • An analytical method for approximating ordered eigenvalues for a specific stimulus distribution.

      Weaknesses:

      The analysis is limited to noiseless systems. This may be a good approximation in the high signal-to-noise ratio regime. However, since the analysis of allocation ratio is very sensitive to the tail of eigenvalue distribution (and their relative rank order), not all conclusions from the current analysis may be robust. Supplemental figure S5 perhaps paints a better picture since it defines the bottleneck as a function of total variance explained instead of number of dimensions. The non-monotonic nonlinear effects are indeed mostly in the last 10% or so of the total variance.

      We agree that the model is most likely to apply in the low-noise regime, as stated in the Discussion. The robustness of the results is indeed a concern: we encountered some difficulties when calculating model results numerically because of the issue pointed out by the reviewer, which is what led us to focus on an analytical approach in the first place. However, to test model robustness we have now included numerical results for several other covariance functions to demonstrate that, at least qualitatively, the results presented in the paper are not simply a consequence of the particular correlation structure we investigated.

      In the case where the stimulus distribution is Gaussian, the proposed covariance implies that the stimulus distribution is limited to spatial Gaussian processes with an Ornstein-Uhlenbeck prior with two parameters: (inverse) length-scale and variance. While this special case allowed the authors to approach the problem analytically, it is not a widely used natural stimulus distribution as far as I know. This assumed covariance in the stimulus space is quite rough, i.e., each realization of the stimulus is spatially continuous but not differentiable. In terms of texture, this corresponds to rough surfaces. Of course, if the stimulus distribution is not Gaussian, this may not be the case. However, the authors only describe the distribution in terms of the covariance function, and the paper lacks the additional detail needed to fill this gap.

      We would argue that a somewhat ‘rough’ covariance structure might be relatively common: for example, in vision, objects have clear borders, leading to a power-law relation, and similarly, in touch, objects are either in contact with the skin or they are not. In any case, we have now extended the analysis to test several other covariance functions numerically. We found that, qualitatively, the main effects described in the paper were still present, though they could differ quantitatively. Interestingly, the convergence limit appeared to depend on the roughness/smoothness of the covariance function, indicating that this might be an important factor.

      The neural response model is unrealistic: Neuronal responses are assumed to be continuous with arbitrary variance. Since the signal is carried by the variance in this manuscript, the resource allocation counts the linear dimensions that this arbitrary variance can be encoded in. Suppose there are 100 neurons that encode a single external variable, for example, a uniform pressure plate stimulus that matches the full range of each sensory receptor. For these stimulus statistics, the variance of all neurons can be combined into a single cortical neuron with 100 times the variance of a single receptor neuron. In this contrived example, the problem is that the cortical neuron can't physiologically have 100 times the variance of the sensory neuron. This study lacks the power constraint that most efficient coding frameworks have (e.g. Atick & Redlich 1990).

      We agree that the response model, as presented, is very simplistic. However, the model can easily be extended to include a variety of constraints, including power constraints, without affecting the results at all. Unfortunately, we did not make this clear enough in the original version. The underlying reason is that decorrelation does not uniquely specify a linear transform and the remaining degrees of freedom can be used to enforce other constraints. As the allocation depends only on the decorrelation process (via PCA), we do not explicitly calculate receptive fields in the paper and any additional constraints (power, sparsity) would affect the receptive fields only and so were left out in the original specification. We have now added clearer pointers for how these could be included and why their inclusion would not affect the present results.
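
      The non-uniqueness argument can be checked numerically. The sketch below is an illustration under stated assumptions rather than the model's actual code: it uses a small exponentially decaying covariance, PCA whitening truncated to a hypothetical bottleneck of three outputs, and an arbitrary random orthogonal matrix. The names `pca_whitening` and `random_orthogonal` are introduced here for illustration only.

```python
import numpy as np


def pca_whitening(covariance, n_outputs):
    """Linear map W (n_outputs x n_inputs) such that W C W^T is the identity."""
    evals, evecs = np.linalg.eigh(covariance)
    top = np.argsort(evals)[::-1][:n_outputs]  # keep the largest eigenvalues
    return np.diag(evals[top] ** -0.5) @ evecs[:, top].T


def random_orthogonal(n, seed=0):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q


# Assumed toy input: 8 receptors with exponentially decaying covariance.
x = np.arange(8.0)
C = np.exp(-0.5 * np.abs(x[:, None] - x[None, :]))

W = pca_whitening(C, n_outputs=3)
W_rot = random_orthogonal(3) @ W  # same decorrelation, different receptive fields

print(np.allclose(W @ C @ W.T, np.eye(3)))          # output covariance of W
print(np.allclose(W_rot @ C @ W_rot.T, np.eye(3)))  # output covariance of W_rot
print(np.allclose(W, W_rot))                        # whether the two maps coincide
```

      Both maps retain the same eigenvalues, so the number of outputs assigned to each region would be unchanged by the extra orthogonal factor; only the receptive fields differ, which is where constraints such as power or sparsity could be imposed.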

      The star-nosed mole shows that the usage statistics (translated to activity ratio) better explains the cortical allocation than the receptor density. However, the evidence presented for the full model being better than either factor is weak.

      We agree that the results do not present definitive evidence that the model directly accounts for cortical allocations and as we state in the paper, much stronger tests would be needed. Our idea here was to test whether, in principle, the model predictions are compatible with empirical evidence and therefore whether such models could become plausible candidates for explaining neural resource allocation problems. This seems to be the case, even though the evidence in favour of the ‘full model’ versus the ‘activity only’ model is indeed not overwhelming (though this might be expected as the regional differences in activity levels are much greater than those in density). We have now added additional tests to show that the results are not trivial. We would also like to note that it is not obvious that the ‘full’ model would perform better than the ‘activity only’ model: for either we choose the best-fitting bottleneck width (as the true bottleneck width is unknown), and therefore the degrees of freedom are equal (with both activity levels and densities fixed by empirical data).

      Reviewer #3 (Public Review):

      This work follows on a large body of work on efficient coding in sensory processing, but adds a novel angle: How do non-uniform receptor densities and non-uniform stimulus statistics affect the optimal sensory representation? The authors start with the motivating example of fingers and tactile receptors, which is well chosen, as it is not overstudied in the efficient coding literature. However, the connection between their model and the example seems to break down after a few lines when the authors state that they treat individual regions as independent, and set the covariance terms to zero. For fingers, for example, that would seem highly implausible, because we typically grasp objects with more than one finger, so that they will frequently be coactivated.

      Our aim was to take a first stab at a model that could theoretically account for neural resource allocation under changes in receptor density and activity levels, and by necessity this initial model is rather simple. Choosing a monotonically decreasing covariance function along with some other simplifications allowed us to quantify the most basic effects, and do so analytically. Any future work should take more complex scenarios into account. Regarding the sense of touch, we agree that the correlational structure of the receptor inputs will be more complex than assumed here, however, whether and how this would affect the results is less clear: Across all tactile experiences (not just grasps, but also single finger activities like typing), cross-finger correlations might not be large compared to intra-finger ones. Unfortunately, there is currently relatively little empirical data on this. That said, we agree with the broader point that complex correlational structure can be found in sensory systems and would need to be taken into account when efficiently representing this information.

      The bottleneck model posited by the authors requires global connectivity as they implement the bottleneck simply by limiting the number of eigenvectors that are used. Thus, in their model, every receptor potentially needs to be connected with every bottleneck neuron. One could also imagine more localized connectivity schemes that would seem more physiologically plausible given the observed connectivity patterns between receptors and relay neurons (e.g. in LGN in the visual system). It would be very interesting to know how this affects the predictions of the theory.

      We agree that the model in its current form is not biologically plausible. While individual receptive fields can be extremely localised, the initial allocation of neurons to regions we describe in the paper relies on a global PCA, and it is not clear how this might be arrived at in practice under biological constraints. However, our aim here was to specify a normative model that generates the optimal allocation and thereby answer what the brain should be doing under ideal circumstances. Future work should definitely ask whether and how these allocations might be worked out in practice and how biological constraints would affect the solutions.

      The representation of the results in the figures is very dense and due to the complex interplay between various factors not easy to digest. This paper would benefit tremendously from an interactive component, where parameters of the model can be changed, and the resulting surfaces and curves are updated.

      We have aimed to make the figures as clear as possible, but do appreciate that the results are relatively complex as they depend on multiple parameters. The code for re-creating the figures is available on GitHub (https://github.com/lauraredmondson/expansion_contraction_sensory_bottlenecks), making it easy to explore scenarios not described in the paper.

      For parts of the manuscript, not all conclusions made by the authors seem to follow directly from the figures. For example, the authors interpret Fig. 3 as showing that the activation ratio determines more strongly than the density ratio whether a sensory representation expands or contracts. This is true for small bottlenecks, but for relatively generous ones it seems to be the other way around. The interpretation by the authors, however, fits better with the next paragraph, where they argue that the sensory resources should be relatively constant across the lifespan of an animal, and only stimulus statistics adapt. However, there are notable exceptions: in one drastic example, zebrafish completely change the sensory layout of the retina between the larval and adult stages.

      We have amended the text for this section in the paper to more closely reflect the conclusions that can be drawn from the figure. These are summarised below. The purpose of Fig. 3B is to show that knowledge of the activation ratio provides more information about the possible regime of the bottleneck allocations. We cannot tell the magnitude of the expansion or contraction from this information alone, or where in the bottleneck the expansion or contraction would occur. Typically, when we know the activation ratio only, we can tell whether regions will be expanded or contracted or whether both occur over all bottleneck sizes. For a given activation ratio (for example, a = 1:2, as shown in Fig. 3B), we know that the lower activation region can be either contracted only or both expanded and contracted over the course of the bottleneck. In this case, regardless of the density ratio, the lower activation region cannot be expanded only. Conversely, for any density ratio (see dashed horizontal line in Fig. 3B), allocations can be in any regime.

      In the final part of the manuscript, the authors apply their framework to the star-nosed mole model system, which has some interesting properties; in particular, relevant parameters seem to be known. In keeping with their interpretation of the modeling outcomes, they conclude that a model that only captures stimulus statistics suffices to model the observed cortical allocations. However, additional work is necessary to make this point convincingly.

      We have now included a further supplementary figure panel providing more details on the fitting procedure and results for each model. Given that we fit over a wide range of bottleneck sizes, where allocations for each ray can vary widely (see Figure 6, supplement 1A), we tested an additional model to confirm that the model requires accurate empirical density and/or activation values for each ray to provide a good fit to cortical data. Here we randomise the values for the density and activation of each ray within the possible range of values for each. We find that with this randomisation of the values the model performs poorly on fitting even with a range of bottleneck sizes. This suggests that the model can only be fitted to the empirical cortical data when using the empirically measured values.

    1. Author Response:

      Reviewer #2 (Public Review):

      This paper presents an intriguing pipeline that can be applied to understand and predict the mechanical and biophysical properties of intermediate filaments in a given cell type. The work is very well documented as a continuum from obtaining the imaging data to the analyses in Matlab and Fiji to the translation into virtual reality. The descriptions are concisely written so anyone can understand the essence of different parameters. The strategy forms a rather pioneering multi-dimensional visualization approach that revealed hallmark features of different keratin filaments networks in various cell types.

      • There is a good selection of cell lines to accommodate the varied presentations of keratin filaments in cell lines with different properties. The morphological representations of the figures in 3D very well illustrate the nature and organization of the cells in vitro and in vivo which is further examined in the measurements of different parameters such as curvature and orientation.

      • This pipeline introduces a fresh strategy to analyze, compare and interpret network organization of cellular filaments. The comparison of the filament orientation between MDCK and HaCaT B9 cells was intriguing, as it highlights the nature of their arrangement in in vitro monolayers and draws a parallel between how cell shape influences network arrangement aside from the assumed polarity.

      • It would be interesting to compare how these parameters differ in MDCK cells with cuboid or cylindrical geometries.

      We agree with the Reviewer that this would be an interesting property to analyse in the future, since we are especially interested in the biomechanical function of the cortical keratin cytoskeleton and its contribution to cell shape changes.

      • With regard to segmentation of images, there seems to be a difficulty in segmenting denser areas and some dim segments in light to medium intensity areas, as noticeable in Fig. 1. Any remedy for this?

      We concur with the Reviewer that segmentation is limited by the microscopic resolution in xy and especially in z. Improvement is expected by increasing the microscopic resolution and by further improvement of segmentation algorithms using, e.g. machine learning.

      • It would be informative if an expert panel would manually segment some images to compare with automatically segmented ones so that a false positive/negative ratio could be established.

      We are aware that manual segmentation has the reputation of being the gold standard but question whether even experts would fully agree on a ground truth. The technical difficulties of segmentation and annotation of 3D data, as well as human bias, make this approach quite challenging. By making the data sets publicly available, specific questions, however, may now be addressed by interested individuals.

      • In the transformation of 3D fluorescence recordings of keratin filaments into digital networks, other than whole-cell networks, it will be interesting to show a few examples of keratin structures at representative subcellular domains, such as the nucleus.

      • The authors pointed out that in MDCK cells, the basal domain has thicker bundles compared to the apical domain, while the lateral keratin network is more heterogeneous. Is it possible to statistically present this feature of keratin filaments? And what would be the case in HaCaT and REP cells?

      Exploration of subdomains is afforded by the interactive 3D renderings provided at KerNet.rwth-aachen.de. However, systematic and quantitative analysis of segment properties in subcellular domains is an obvious but quite challenging task. The main difficulty is a precise spatial definition of subdomains in 3D, which would require substantial effort.

    1. Author Response

      Reviewer #1 (Public Review):

      Cui and colleagues have performed a longitudinal analysis of blood cell counts in a cohort of ALS patients. The major findings include increases in neutrophils and monocytes that negatively correlated with ALSFRS-R score, but not disease progression rate. Increases in NK and central memory TH2 T cells correlated with a lower risk of death, while increased CD4 CD45RA effector memory and CD8 T cells were correlated with a higher risk of death.

      Strengths of the study include the sample size and effort to broadly include data.

      Thank you for the positive comment.

      Limitations of the study include indication bias, as the authors acknowledge, because the timing of the blood draws is not predefined. The specific review for possibility of infection does not, in this reviewer's opinion, sufficiently address this potential for bias. Also concerning is the fact that half the subjects have only a single measurement, and how well the findings generalize to more or later measurements is not clear. Similarly, the number of later measurements driving some of the main findings is much lower, further raising concern about the potential bias. Given these issues, one really would want to see disease controls, and how the different cell counts change in another disease. Finally, there is no discussion about how or whether treatments, or changes in treatment, could influence observed counts.

      We agree with the reviewer regarding indication bias and that is precisely why we performed the sensitivity analyses including 1) restricting the analysis to the first cell measure of each patient and 2) excluding cell measures with signs of ongoing infection at the time of blood draw. Reassuringly, both analyses provided rather similar results as those of the main analysis. We also agree with the reviewer regarding the varying numbers of measurements between patients. This is an unavoidable challenge to any longitudinal study of ALS patients, primarily due to the high mortality rate of this patient group. We have now added this limitation to the discussion:

      “First, the main cohort was heterogeneous in terms of the numbers of cell measurements and the time intervals between measurements, as the timing of blood sampling was not predefined. Indication bias due to, for example, ongoing infections might therefore be a concern. The sensitivity analysis excluding all samples taken at the time of infections provided however rather similar results. Further, the longitudinal analysis of cell counts should be interpreted with caution because not all patients contributed repeated cell measurements. This is however an unavoidable problem for any longitudinal study of ALS patients, given the high mortality rate of this patient group. Regardless, when focusing on the first cell measures, we obtained similar results as in the main analysis.”

      We further agree with the reviewer regarding the use of disease control. We have access to a cohort of patients with relapsing-remitting MS (RRMS) treated by rituximab (n=34), who had been measured with all the studied cell populations at the start of treatment and the 6-month follow-up. These cell measurements were processed during the same time-period using the identical setup at Karolinska University Hospital as the ones studied in the present study. In brief, we found different longitudinal changes of the studied immune cell populations between RRMS patients and ALS patients (please see below figure for details). The declining B cells are most likely due to rituximab treatment.

      Given the largely different disease mechanisms, phenotypes, and treatments between RRMS and ALS, we are not confident that RRMS would be a good disease control for the present study. We are certainly willing to reconsider our position if the reviewer and editors would disagree with us. We have regardless now added discussion about this in the manuscript:

      “It would therefore be interesting to compare ALS with other diseases, especially other neurodegenerative diseases, regarding the studied cell counts, in terms of both their longitudinal trajectories during disease course and their prognostic values in predicting patient outcome.”

      Finally, we agree that it is interesting to consider treatment in the analysis of cell counts. Among the ALS patients of the main cohort, the majority (89.6%) were treated with Riluzole. We have now added a supplementary figure to demonstrate the leukocyte counts before and after the start of Riluzole treatment. The corresponding analysis is, however, not possible for the FlowC cohort, as the majority of the patients started Riluzole treatment around the time of diagnosis and almost all measurements were taken after Riluzole treatment.

      [Figure panel titles: Th17 of CD4+ CM cells; CD4+ EMRA cells; CD8+ T cells; Naïve CD8+ T cells; CD8+ EM cells; CD8+ CM cells; CD8+ EMRA cells; CD4+ HLA-DR+ CD38- cells; CD4+ HLA-DR+ CD38+ cells; CD8+ HLA-DR+ CD38- cells; CD8+ HLA-DR+ CD38+ cells]

      We have now added this analysis to Methods and Results, including a new Figure 1—figure supplement 2.

      “To evaluate whether ALS treatment would influence the cell counts, we further visualized the temporal patterns of differential leukocyte counts before and after Riluzole treatment.”

      “The levels of leukocytes, neutrophils and monocytes increased, whereas the levels of lymphocytes decreased, after Riluzole treatment, compared with before such treatment (Figure 1—figure supplement 2).”

      Reviewer #2 (Public Review):

      Cui et al. investigated the correlation of immune profiles in ALS patients to functional status (by ALSFRS-R score), disease progression (rate of ALSFRS-R decline) and/or risk of death (or invasive ventilation use). The study longitudinally assessed basic immune profiles from a large cohort of ALS patients (n=288). Additionally, they deeply immunophenotyped a subset of ALS patients (n=92) to examine immune cell subtypes on ALS status, progression rate, and survival. The longitudinal design, deep immunophenotyping, and large cohort are significant strengths. Using various statistical models, the authors found leukocyte, neutrophil, and monocyte counts increased gradually over time as ALSFRS-R score declined. Within lymphocyte subpopulations, increasing natural killer cells and Th2-differentiated CD4+ central memory T cell counts correlated with a lower risk of death. Increasing CD4+ effector memory cells re-expressing CD45RA T cell and CD8+ T cell levels associated with a higher risk of death. These findings have broad implications for ALS pathogenesis and the development of immune-based ALS therapies tailored to specific immune cell populations.

      Thank you for the very positive comments.

    1. Author Response:

      Reviewer #1 (Public Review):

      This manuscript elegantly demonstrates that the degradation of PTPN14 by human papillomavirus (HPV) 16 and 18 E7 proteins previously reported by the authors is essential for E7-mediated YAP1 activation. This is important for E7-mediated maintenance of basal cell state and presumably persistence of HPV infection. The authors use a series of innovative tissue models combined with validation in clinical samples to demonstrate the importance of YAP1 activation in high-risk HPV pathogenesis.

      The data are of high quality with excellent controls. The manuscript is well-written and the rationale of each experiment is easy to follow. In general, the results support the authors' conclusions. I have the following suggestion to improve the manuscript: The enhanced nuclear expression of YAP in the basal cells of epithelia expressing HPV16/18 E7 is difficult to see in the low resolution IF images shown. The magnified images do show enhanced expression compared to HFK cultures, but to remove any bias in selection of enhanced areas, could the authors include quantification of the distribution of IF signal in the basal cells, compared to the suprabasal cells, of the epithelia shown with statistical analysis? Figure 2 would also benefit from quantification as described above.

      We appreciate the positive feedback and constructive suggestions from Reviewer #1. We used widefield images with the goal of presenting as many cells in organotypic cultures as possible, but at low magnification. We have further analyzed the imaging data and updated the manuscript as follows:

      1) We assessed YAP1 intensity in basal and suprabasal layers as suggested by the reviewer. Consistent with literature reports, YAP1 is expressed predominantly in basal cells in each of our organotypic cultures, independent of E7 status (see figure below).

      2) Because YAP1 is always more highly expressed in basal cells than in suprabasal cells and YAP1 is regulated at the level of nuclear/cytoplasmic localization, we anticipated that quantification of YAP1 nuclear localization in our organotypic cultures may be more useful to readers than basal/suprabasal quantification.

      Consequently, we conducted classification-based analyses to quantify YAP1 nuclear localization (a surrogate for YAP1 activity) in the cultures. Each image to be analyzed was deidentified and assigned a coded name. Each cell in the basal layer was then classified as having either predominantly nuclear YAP1 staining, predominantly cytoplasmic YAP1 staining, or YAP1 staining that is comparably distributed between the nucleus and cytoplasm. At least three fields were analyzed per raft. We assessed YAP1 localization in 8,323 cells (average 378.3 cells/culture shown in the text for almost all cultures). The quantifications are now included in Figure 1-figure supplement 2C-E, Figure 1-Figure supplement 5A-C, and Figure 2-figure supplement 1D-F.

      The new quantifications do not change our interpretations of the results nor our conclusion that HPV E7 degrades PTPN14 to activate YAP1 in basal cells. We noted that HPV E6 may promote YAP1 nuclear localization to some degree and have updated the text accordingly.

      Reviewer #2 (Public Review):

      Strengths: A major strength of this report is the use of several different technical approaches, the results from which converge to provide several types of data supporting their conclusions. These various techniques include genetic knockdown/overexpression in primary keratinocytes, organotypic raft cultures, laser-capture microdissection, cell fate monitoring assays, and analysis of publicly available datasets. The manuscript is well-written and the figures are well-made. Weaknesses: Overall, there are only a few minor weaknesses related to figure quality and presentation (which will be conveyed in the private recommendations to the authors).

      We appreciate the positive feedback and these thoughtful comments from reviewer #2.

      Are claims/conclusions justified by data? Overall, the authors' conclusions are adequately justified by the data. However, there were a few interpretations I felt were somewhat overstated given the experiments performed and data provided.

      1. The first issue relates to the interpretation/conclusion of the results from experiments analyzing basal cell number. In Figure 2, the basal cell number was indeed reduced in R84S compared to WT E7. However, it was not reduced to parental HFK levels, suggesting other E7 activities are involved in increasing basal cell number. A similar observation is presented in Figure 7 (E-F), where the R84S E7 mutant still had significantly higher basal cell retention than the empty vector control, albeit lower than WT E7. While their data certainly indicates that the binding and subsequent degradation of PTPN14 is an E7 function important to increasing basal cell number and retention, there are clearly other E7 functions involved. While the authors don't necessarily overinterpret these findings, the possibility that other E7 functions are involved is not explicitly acknowledged or explored in the Discussion.

      Indeed, cells expressing HPV18 E7 R84S retain some capacity to increase basal cell number (Figure 2) and promote basal cell retention (Figure 7). It is possible that an activity of HPV E7 in addition to PTPN14 degradation influences these phenotypes. HPV18 E7 R84S retains the capacity to bind and degrade RB1 (Hatterschide et al., 2020). The basal cells in the HPV18 E7 R84S cell fate experiment were predominantly found in clusters indicative of possible clonal expansion. We hypothesize that such clusters reflect proliferation induced by RB1 inactivation and cause the ratio of basal to suprabasal cells to remain high even in the R84S mutant condition. Our hypothesis is now described in further detail in the text.

      2. The second issue pertains to the findings related to the effect on differentiation upon modulation of key Hippo pathway components (Figure 4). It does not appear that the authors performed these studies in the presence of any well-known stimuli that induce the differentiation process in keratinocytes grown in 2D culture (high calcium, high serum, etc.), nor did they use these cells in organotypic rafts wherein differentiation occurs during the raft stratification process. This is particularly true in the studies exploring PTPN14 plus LATS1/2 silencing and the effect on repression of keratinocyte differentiation. Whereas it seems PTPN14 itself was serving as the differentiation stimulus in earlier experiments (Figure 4C/D), it does not appear any differentiation stimuli were provided in the experiments shown in Figures 4E-I. For these reasons, the interpretation drawn by the authors that "...inactivation of three different YAP1 inhibitors dampens differentiation gene expression" (Lines 220-221) and "inactivation of LATS1 or LATS2...also repressed differentiation genes" (Lines 349-350) seems specific to endogenous levels of differentiation genes. It seems difficult to conclude that inactivation of the Hippo pathway is actively repressing the induction of differentiation if the cells are not being treated with stimuli to induce differentiation.

      Indeed, no differentiation stimuli were used in these experiments. We previously observed that PTPN14 knockout or E7 expression reduced differentiation gene expression both in undifferentiated cells and in cells stimulated to differentiate (Hatterschide et al., 2019, 2020). We anticipate that gene expression in unstimulated cells is reflective of gene expression in cells stimulated to differentiate. We altered the results and discussion text to emphasize that the experiment measures differentiation gene expression in unstimulated cells.

    1. Author Response:

      Evaluation Summary:

      This manuscript addresses a phenomenon of great interest to researchers in cell metabolism and cancer biology: namely, why do cancer cells often secrete high levels of lactate, despite the presence of abundant oxygen to power nutrient oxidation (Warburg effect). The authors propose that lactate export and subsequent extracellular acidification provides a selective advantage and the concomitant rise in intracellular pH is sufficient to drive flux through glycolysis, thereby sustaining the Warburg effect. This is an intriguing hypothesis that ties together many published observations, but it would require further support both from the technical and conceptual side.

      The concept proposed in the evaluation summary is not quite correct. In this paper we have tried to show that it is not lactate export that drives extracellular acidification, but that cells which can increase proton export, via over-expression or increased activity of proton-exporting proteins, can subsequently drive upregulation of glycolysis and increased lactate production, likely due to increased intracellular pH (pHi) and the enhanced activity of glycolytic enzymes at slightly higher pHi. As mentioned in the summary, although some of these observations are known, they have not been directly proven by inducing acid export prior to a glycolytic phenotype. We believe that showing the causal nature of proton export on glycolysis is the novelty of this research.

      Reviewer #1 (Public Review):

      In this manuscript, the authors tackle an interesting puzzle: why do cancer cells secrete most of their glucose as lactate? The authors propose that acid export is sufficient to enhance glycolysis and provide a selective advantage to cancer cells growing in vivo. To this end, the authors show that clonal lines expressing CA-IX or PMA1, each of which will facilitate proton export, have elevated capacity to acidify extracellular medium and can drive increased migration/invasion and tumor growth or metastases. In support of the model that extracellular pH is a key driver of metastases, the effect of CA-IX expression on lung metastases is reversed following bicarbonate treatment. While many of the individual conclusions of the manuscript are not novel-for example, pH has been reported to control glycolysis and it is established that CA-IX expression modulates migration/metastases-providing a comprehensive assessment of the ability of proton export to drive the Warburg effect, and assessing the significance of metabolic rewiring driven by acid export on tumor growth, would represent an important resource for researchers intrigued by the pervasive observation that cancer cells secrete lactate despite potential bioenergetic disadvantages of discarding biomass.

      The strength of the manuscript lies therefore in tying these disparate observations together in a coherent model and testing the role of acid export per se on glycolytic flux. The technical weaknesses of the paper prevent such coherent model building. A major concern is that all cell lines appear to be generated by transient transfection followed by clonal selection, giving rise to cells with notable variability and inconsistent phenotypes. More traditional approaches to manipulate enzyme expression will provide more robust model systems to test the proposed model. Similarly, direct measures of glycolytic flux are required to make conclusions about the role of acid export in promoting glycolysis. Another strength is the use of heterologous enzyme systems to alter proton export in cancer cells, but alternative explanations for these results are not fully considered. Ultimately, to what extent acid export per se, as opposed to altered metabolism driven by acid export, drives enhanced tumor metastases is not addressed.

      We agree wholly with Reviewer 1 that although individual components of this manuscript have previously been implicated in cancer research, the novelty lies in directly assessing metabolic changes, specifically the Warburg effect, as a result of proton production to determine causality rather than correlation as previous studies have shown. The reviewer makes a valid point about our use of clones and this is something we considered at length. When originally designing these experiments, we had many conversations within our lab and with collaborators and colleagues, and the overall consensus was that bulk populations are more likely to have heterogeneous expression levels unrelated to transfection, which could result in the phenotype generated being noisy and not indicative of what occurs when proton exporters are over-expressed. We chose to isolate single clones, maintaining these in antibiotic selection media, to ensure stable over-expression. After confirming over-expression, cells were grown without antibiotics and screened regularly for maintenance of protein expression. This was also one of the reasons why we utilized over-expression of two different proton exporters in multiple different cell lines to be confident that proton export was changing the metabolic phenotype and not just due to changes in an individual isolated clonal line. We utilized bulk population for the MOCK clones, to ensure we weren’t selecting for a clone which had inherently different metabolic traits from the parental population. As described in the paper, while some of the behaviors of the different clones are indeed divergent, the impact of expression on increased glucose uptake and lactate production is wholly consistent and highly correlated to expression of PMA1 or CA-IX. Although we utilized metabolic profiling, we do not claim to infer flux from these data. Flux was assessed via lactate production and glucose consumption rates. 
      The metabolomic analyses showed that glycolytic intermediates upstream of Pyruvate Kinase (PK) were uniformly increased in transfectants. This was an unequivocal finding and, given the increased flux, we have concluded that transfection results in activating glycolytic enzymes upstream of PK. The pleiotropic nature of these effects has led us to propose that intracellular pH was increasing and likely enhancing glycolytic enzyme activity throughout the glycolytic pathway. We measured the intracellular pH and showed that it was generally elevated in the transfectants. Finally, the reviewer was concerned that we did not address the mechanism by which pH increases metastases. Such a study would be beyond the scope of this paper and, indeed, was the subject of a two-volume special issue of Cancer Mets. Rev. in 2019 (PMC6625888). Hence, in this paper, we were not trying to address the mechanism by which pH affects metastasis, but simply wanted to show additional biological relevance.

      Reviewer #2 (Public Review):

      The work by Xu et al proposes that the Warburg effect - the increase of glycolytic metabolism usually displayed by tumor cells, is driven by increased proton excretion rather than by oncogenic dysregulation of glycolytic enzyme levels. As a proof-of-principle, they engineered tumor cells to increase proton excretion. They observed an increase in glycolytic rate, pH, and malignancy in their engineered cells.

      1. My main issue with this work is that I do not agree with the authors when they say that the "canonical view" is that oncogenic mutations are thought to drive the Warburg effect. As I understand the consensus, it is fast-proliferating cells, rather than malignant cells, that display this form of metabolism. The rationale is that glycolytic metabolism allows keeping biomass by redirecting lactate and from the pentose phosphate pathway. In contrast, the end product of oxidative phosphorylation is CO2 that cannot be further utilized in cell metabolism.

      They claim that Vander Heiden et al., 2009 shows that "fermentation under aerobic conditions is energetically unfavorable and does not confer any clear evolutionary benefits." This is incorrect. While that review states that the Warburg effect has little effect on the ATP/ADP ratio, it does show that this form of metabolism has significant benefits for fast-proliferating cells. In fact, the whole review is about how the Warburg effect is a necessary metabolic adaptation for fast proliferation rather than a unique feature of malignant cells.

      2. Their main observation is not surprising. From a biochemical standpoint, protons are a final product of glycolysis (from the production of lactic acid). Thus, by mass action, any mechanism to remove protons from the cell will result in an accelerated glycolytic rate. Similarly, reducing intracellular pH will necessarily slow down LDHA's activity, which in turn will slow down pyruvate kinase and so on.

      3. Their experiments are conducted on transformed cells that, by definition, have oncogenic driver mutations. They should test the effect of the proton exporter using primary non-transformed cells (fresh MEFs, immune cells, etc.). I would expect that they will still see the increase in glycolysis in this case. And yet, I would still have the concerns I expressed in my previous point.

      4. The fact that they can accelerate the Warburg effect by increasing proton export does not mean it is the mechanism used by tumor cells in patients or "the driver" of this effect. As I mentioned, their observation is expected by mass action, but tumors that do not overexpress proton transporters may still drive their Warburg effect via oncogenic mutations. The biochemical need here is to increase the sources of biomass and redox potential, and evolution will select for more glycolytic phenotypes.

      Comment 1: We disagree with the reviewer that the energetic demands of a faster proliferating cell drive glycolysis in order to produce the biomass needed for generation of new cells. Available evidence does not support this hypothesis. As the reviewer mentioned, there is a correlation between proliferation and aerobic glycolysis (i.e. if cells are stimulated to grow they will consume more glucose), and the same can be said for motility (i.e. more motile cells have higher aerobic glycolysis). This is also true for normal cells and tissues that exhibit high levels of aerobic glycolysis. We agree that glycolytic ATP generation is more rapid than oxidative phosphorylation and that this may confer some selective advantage for transporters, as we described in PMC4060846. Nonetheless, it is clear that under conditions of similar proliferation and motility, more aggressive cancer cells ferment glucose at much higher rates. However, neither the correlation with proliferation nor the correlation with motility is the “Warburg Effect”, which is a higher rate of aerobic glycolysis in cancers, regardless of proliferation or migration. As we described in PMID 18523064, the prevailing view in the cancer literature is that the Warburg effect is driven by oncogenes (ras, myc), transcription factors (HIF) and tumor suppressors (p53/TIGAR) through increased expression of glycolytic enzymes. This assumes that expression levels drive flux, which has not been proved empirically. In biochemical pathways, it is canon that flux is regulated by demand (e.g. ATP) or through some post-transcriptional control (e.g. pH). Vander Heiden’s paper reports steady-state ATP/ADP ratios, not flux. The first paragraph of the intro has been modified to accommodate this concern.

      Comment 2: The fact that our results are not surprising is our major argument: i.e. that glycolytic flux can be enhanced by increasing the rate of H+ export. We saw an increase in intracellular pH (pHi), but our metabolomics data do not support a direct effect on LDHA or PK. Instead, we show that clones with higher pHi have a crossover point at PK, due to reduced inhibition of upstream enzymes which is not there in clones at lower pHi.

      Comment 3: We agree it would be interesting to study the effects of proton export on immune cells especially given the increase in immunotherapy use in cancer treatment. We did utilize HEK 293 cells shown in supplemental figure S6, to show this was not a cancer cell line specific phenomenon, and we saw increased aerobic glycolysis with over-expression of CA-IX.

      Comment 4: We agree that oncogenic mutations can alter glycolytic rate, but we observed that increased expression and activity of proton exporters is sufficient to drive a Warburg effect. Although the reviewer indicates that glycolysis is responsible for generating the biomass needed for these faster proliferating cells, we have shown that proton exporter driven aerobic glycolysis does not increase proliferation rates. The literature, see Vander Heiden’s paper below, suggests that amino acids, mainly glutamine, can support the majority of biomass needs of a proliferating cell. Hence, reliance on aerobic glycolysis remains energetically inefficient and inefficient in that most of the carbons are removed, and thus will not be selected by evolution.

      Hosios, A.M., Hecht, V.C., Danai, L.V., Johnson, M.O., Rathmell, J.C., Steinhauser, M.L., Manalis, S.R., & Vander Heiden, M.G. (2016). Amino Acids Rather than Glucose Account for the Majority of Cell Mass in Proliferating Mammalian Cells. Developmental Cell, 36(5), 540-549.

      Reviewer #3 (Public Review):

      The authors claim that "proton export drives the Warburg effect". For this, they expressed proton-exporting proteins in cells and measured the intracellular proton concentration and the Warburg effect. Based on their data, however, I do not see elevated Warburg effect in these cells and thus conclude that the claim is not supported.

      The authors concluded that the CA-IX or PMA1 expressing cells had increased Warburg effect. I don't think this conclusion can be made based on the data presented. For the MCF-7 cells, the glucose consumption is ~18 pmol/cell/24hr (Fig. 5E) and lactate production is ~0.6 pmol/cell/24hr (Fig. 5F), indicating that 0.6/18/2 = 1.7% of the glucose is excreted as lactate. This low percentage remains true for the PMA1 expressing cells. For example, for the PMA1-C5 cells, the percentage of glucose going to lactate is about 1.8/38/2 = 2.4% (Fig. 5EF). While indeed there was an increase of both the glucose and lactate fluxes in the PMA1 expressing cells, the vast majority of the glucose flux ends up elsewhere likely the TCA cycle. This is a very different phenotype from cancer cells that have Warburg effect. The same calculation can be done for the CA-IX cells but the data on the glucose and lactate concentration there are inconsistent and expressed in confusing units (which I will elaborate in the next paragraph). Nevertheless, as there were at most a few folds of increase in lactate production flux in the M1 and M6 cells, the glucose flux going to lactate production is likely also a few percent of the total glucose uptake flux. Again, these cells do not really have Warburg effect.
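
      The reviewer's percentages follow from glycolytic stoichiometry (one glucose yields two lactate). The arithmetic can be sketched as below, using the approximate values the reviewer read from Fig. 5E and 5F; the function name and its parameters are illustrative assumptions and do not come from the manuscript.

```python
def lactate_fraction(glucose_uptake, lactate_output, lactate_per_glucose=2):
    """Fraction of consumed glucose that is excreted as lactate.

    Both fluxes must share the same units (here pmol/cell/24 hr).
    Glycolysis yields two lactate per glucose, so the lactate flux is
    divided by 2 to express it in glucose equivalents.
    """
    if glucose_uptake <= 0:
        raise ValueError("glucose uptake must be positive")
    return lactate_output / lactate_per_glucose / glucose_uptake


# Approximate values quoted by the reviewer (hypothetical variable names)
mcf7 = lactate_fraction(glucose_uptake=18, lactate_output=0.6)
pma1_c5 = lactate_fraction(glucose_uptake=38, lactate_output=1.8)

print(f"MCF-7:   {mcf7:.1%}")     # about 1.7%
print(f"PMA1-C5: {pma1_c5:.1%}")  # about 2.4%
```

      Because the result scales inversely with the glucose flux, an error in the glucose measurement alone would change these percentages by the same factor.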

      The glucose and lactate concentration data are key to the study. The data however appear to lack consistency. The lactate concentration data in Fig. 1F shows a ~5-fold increase in the M1 and M6 cells than the controls but the same data in S. Fig. 2 shows a mere ~50% increase. The meaning of the units on these figures is not clear. While "1 ng/ug protein" means 1ng of lactate is produced by 1 ug protein of cells over a 24 hour period, I do not understand what "ng/ul/ug protein" means (Fig. 1F). Also, "g/L/cell" must be a typo (S. Fig. 2). Furthermore, regarding the important glucose consumption flux, it is not clear why the authors did not directly measure it as they did for the PMA1 cells (Fig. 5E). Instead, they showed two indirect measurements which are not consistent with each other (Fig. 1E and S. Fig. 1).

      The reviewer pointed out discrepancies in our data and, upon reviewing, we have identified a dilution error leading to miscalculation of glucose consumption in Fig. 5E. We have also repeated these experiments, and the results agree with our re-calculation. Originally, it appeared from the data we presented that there was very little lactate flux; we have re-calculated the glucose excreted as lactate (average % using data from Fig. 5E and 5F) and present it in a table below. We do believe we consistently observed a Warburg effect in our proton-exporting cells. The reviewer points out that we utilized multiple methods to measure glycolysis in these cells, leading to inconsistency. However, we felt that using multiple methods/instruments/kits to assess glucose consumption, lactate production, and glucose-induced proton production rates was a strength of our findings, as we consistently saw increased glycolysis in our proton-exporting clones, irrespective of proton exporter, cell line, or method utilized. We are also not suggesting that glucose is solely being metabolized through glycolysis, and we agree that it can be metabolized through other metabolic pathways too, such as the TCA cycle, as the reviewer stated. The units used for these graphs are described in the methods and figure legends. In some assays, such as Fig. 1F, lactate was graphed as the ng of lactate per ul of cell culture media and then normalized per ug protein, which was determined by calculating the protein concentration of cells per well of the assay. Supplementary figure 2 has been replotted per 10K cells to match other normalization values in the paper. Fig. 1E and Fig. S1 are two different time points; M6 acidified media faster than M1, and this is likely why at 1 hour we are not yet seeing a substantial increase in glucose uptake of M1.

    1. Author Response:

      Reviewer #1 (Public Review):

      This is an interesting study looking at the evolution of ageing in social insects using ants as a model. As I haven't seen the initial submission, I have looked at the manuscript and the response to reviewers and I base my suggestions on both documents.

      Evolution of ageing remains only partially understood and this field seems to be experiencing a sort of renaissance in recent years with a surge of theoretical advances and new empirical findings. Queens of social insects, and ant queens in particular, have remarkable lifespans and understanding the biology of their long life can help in understanding the biology of ageing in a more general sense.

      In this study, the authors focus on following quite a large number of ant (C. obscurior) colonies and provide intriguing data in relation to age-specific mortality and reproduction. The gist of their argument is that the mortality is decreasing with age while reproduction (production of sexuals) is increasing with age, such that there is little evidence of ageing in this species.

      Overall I think this is an interesting dataset that provides important information that will advance the field. However, I think the manuscript currently lacks clarity, structure and suffers from poor formulation of ideas in places, and is rather difficult to follow even for an expert in the field. I think that it requires quite a bit of work to sort this out. However, I also have a methodological question (#15) which could be key for the interpretation of the results.

      We hope that this manuscript is clearer now, especially with the additional data.

      My understanding is that queens live for 40-50 weeks max (Fig. S3). Fig. 4 suggests that from week 30 onwards the production of eggs, worker pupae and queen pupae decline. This suggests that while queen mortality declines in late life, so does queen reproduction. So, do queens of this species show reproductive senescence?

      Yes, they do experience reproductive senescence.

      The data do suggest that relative investment into reproduction (queen worker ratio) increases with age, but the absolute number of queens declines with age. This suggests an interesting result from the life-history theory perspective - increased investment in reproduction with reduced residual reproductive value, but not necessarily the absence of reproductive senescence. Please clarify.

      We hope this new version of the manuscript clearly conveys that ant queens do experience reproductive senescence and actuarial senescence, but only late in life (after the peak of sexual investment is reached). Therefore, we state that senescence is delayed.

      Reviewer #2 (Public Review):

      The authors investigated the evolutionary drivers of delayed senescence in ant queens by carefully observing the survival and productivity of C. obscurior colonies that were maintained at 10, 20, or 30 workers. They show that the 10 worker treatment produces fewer new queens, and lower quality workers, indicating low colony efficiency under a reduced workforce. The authors focused their conclusions on the observation of a hump-shaped relative mortality curve, with queens having a higher than average mortality around 30 weeks and then a lower than expected mortality around 40 weeks. The colonies produced more queens at the end of their lifespan, so the authors conclude high fitness gains at the end of life selects for minimal senescence in ant queens, thus generating the drop in mortality they observed at 40 weeks.

      There is a large body of research focused on the early life stage and establishment of ant colonies, but relatively little that follows their worker and reproductive trajectory to the end of life. Partially, this is because many commonly studied ant species have a lifespan too long to feasibly track, and partially because most ant species do not readily produce sexual queens or males in the lab setting. For this alone, the study provides valuable insight into the ant lifecycle and demonstrates that C. obscurior is an ideal species for future study. The experimental design and analyses are sound, and I must acknowledge the incredible amount of work that must have gone into the data collection. However, I have some serious concerns about how the results are interpreted, and what is left out of the discussion on ant colony structure and limitations that are crucial to reaching accurate conclusions.

      One issue is that the conclusions hinge on the observation that relative queen mortality decreases at the latest observational period, around 40 weeks. The authors raise this as evidence that queens are under selection for reduced senescence, as they also conclude that fitness gains (queen production) are highest late in life. The problem is that according to figure S3, only a handful of queens survive past week 40, and they all manage to hang on for another month or two before dying out. I cannot be sure how many colonies survive to this period from how the data is presented, but I worry that the authors are resting their conclusion on a low number of particularly tenacious queens. These colony numbers should be provided, and the authors should demonstrate that the drop in mortality is observable even if these outliers are excluded.

      Fitness gains are highest late in life, and this is shown for all queens, regardless of whether they are short- or long-lived. Therefore, selection is maintained until late in life. We calculate relative mortality as a function of age as in Jones et al. (2014) (Fig. 4). As suggested by the first reviewer, we now also include the age-specific mortality of the best model fitted using BaSTA, together with the estimated parameters, in the supplement (Figure 4 - Figure supplement 1, Supplementary File 8 and 9). We have also included RNAseq data of queens near and middle-aged queens. The data support our conclusion of a delayed selection shadow, as signs of aging were not obvious in the middle-aged queens. This is in line with two studies (Wyschetzki et al. MBE 2015; Harrison et al. GBE 2021), where no signs of aging were found in middle-aged queens of the same species.

      It also appears that the queen pupae production drops off precipitously during the end of the observational period, according to figure 4A, which runs counter to the argument that selection is reducing senescence in these older queens because they have high reproductive output at this stage. The authors put a lot of emphasis on the queen/worker ratio being highest at the end of the observational period, but this doesn't necessarily mean queens are receiving the highest fitness during this period. A queen would have a high queen to worker production ratio if she lays one worker and one queen, but she would have higher fitness if she lays 100 workers and 10 queens. Figure 2A indicates that the highest overall queen pupae laying occurs around 30 weeks, which actually corresponds with the highest level of relative queen mortality. The question of fitness gains at advanced queen age would be better answered by just analyzing which stage in their life they produced the most queen pupae. Does the queen laying rate reach a maximum and remain stable for the rest of a queen's life, or does it decrease along with worker production as they reach end of life? Figure 4A makes it appear that it decreases towards end of life, but I'm not sure if that is only because so few colonies lasted until the end of the observational period.
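
      The reviewer's arithmetic can be made concrete with a minimal sketch. The function name `caste_output` is hypothetical and the numbers are the reviewer's illustrative ones, not data from the study. The sketch only shows that a higher queen-to-worker ratio can coincide with a lower absolute number of queen pupae.

```python
def caste_output(workers, queens):
    """Return (queen-to-worker ratio, absolute number of queen pupae).

    Hypothetical helper for illustration; counts are pupae produced
    by one queen over some observation window.
    """
    if workers <= 0:
        raise ValueError("workers must be positive to form a ratio")
    return queens / workers, queens


# The reviewer's two illustrative cases (not measured values).
small = caste_output(workers=1, queens=1)      # high ratio, few queens
large = caste_output(workers=100, queens=10)   # low ratio, many queens

print(small, large)
```

      Under these illustrative numbers the ratio ranks the first queen higher while the absolute queen count ranks the second queen higher, which is why the reviewer asks for raw pupae numbers alongside the ratio.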

      We have included that “This caste ratio shift does not occur because a drop of pupae production at the end of life. Actually, pupae production is at its highest just before death (Figure 2 - Figure supplement 1).” We added a figure with raw numbers of pupae produced at the end of life for the 99 tracked queens.

      Another factor that should be discussed is sperm depletion. The authors state that each queen mated with a single male when they set up the colonies, so sperm depletion may be more important than senescence for determining the reproductive lifespan of these queens. I'm not sure if this species is normally single mated in the wild, or the length of their natural colony lifespan, but this is important information to provide in order to dismiss issues of sperm depletion in this study. Without this information it is impossible to determine if the decrease in egg laying towards the end of the study is due to senescence or sperm depletion.

      Taken together, it could be argued that these data better support selection on an optimal lifespan, around 30 weeks, as opposed to selection for directional extended lifespan and reduced senescence. If the reproductive benefits of an extended lifespan are capped by sperm depletion, the alternative strategy would be to produce a robust workforce as quickly and efficiently as possible, and then produce as many sexual offspring as possible with the remaining sperm. Perhaps selection has determined that the optimal length of this cycle is around 30 weeks, with variation dependent on the amount of sperm transferred during mating and the condition of the queen. This possibility should be addressed, and if possible additional data should be provided on sperm depletion in C. obscurior, and the colonies that survived to the end of the observation period. Without these additions, the conclusions on senescence and lifespan remain tenuous.

      We now discuss in the manuscript that sperm depletion is not commonly seen in this species and occurred only once in this study (in one of the 99 colonies). All colonies were tracked until death. Therefore, there is no evidence of stabilizing selection for a lifespan of 30 weeks based on sperm depletion. This manuscript addresses the “shape” of aging in this species rather than its “pace” (lifespan extension), but it gives a hint as to why extended lifespans should be favored.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this manuscript, the authors explore mechanisms in Pseudomonas aeruginosa involved in defending cells from T6SS-dependent attacks by other bacteria. Using a genome-wide Tnseq approach, the authors identify three gene clusters involved in this defensive response. They also report that these gene clusters are activated by the GacS/GacA/Rsm pathway. The authors also convincingly show that each of the three gene clusters encode proteins involved in the defence against specific toxins. Finally, one of the defence proteins is analyzed in more detail and found to prevent the accumulation of lysophospholipids generated by the Tle3 phospholipase toxin.

      I did not identify any weaknesses except that the manuscript is incredibly densely written, making it difficult to read.

      We are grateful to the reviewer for their positive assessment of our study. We regret any difficulty that our writing style may have caused and we thank the reviewer for providing this comment. We have incorporated revisions throughout the manuscript to improve readability.

  3. Jan 2022
    1. Author Response:

      Reviewer #1 (Public Review):

      This manuscript has great potential. The study is well designed, performed, and written, with good statistical analyses. On the other hand, it does not have a sufficient experimental basis. The authors investigated whole body immunoglobulin diversity in killifish and found that it decreases with age. This decrease is mostly driven by larger clones, in other words, by the expansion of B cell clones. They further analyzed immunoglobulin diversity in the intestine and found that its decrease is much more pronounced than in the whole body. It was also observed that the transfer of the young gut flora to old fish does not rejuvenate the B cell repertoire. The major novelty of this work is the model organism, killifish. Also, while this study is solid, it is descriptive, without many mechanistic insights.

      We thank the reviewer for their frank assessment of our manuscript, as well as for their helpful suggestions of possible ways to dig deeper into the phenomenon of killifish repertoire ageing. We agree with the assessment that this study is primarily descriptive in nature, and that experimental interventions – including infection challenge studies – would help establish causal mechanisms. Nevertheless, we have provided new data supporting an association between loss of repertoire diversity and immune function (see our response below), which we believe supports the biological relevance of our findings.

      While our initial submission demonstrates that the diversity of the killifish repertoire declines with age, it is true that this does not necessarily imply that this decline is linked to changes in immune functionality. To provide functional insights into the transcriptomic signature associated with different antibody diversity orders, we now include an analysis linking repertoire diversity data in our intestinal cohort to pre-existing intestinal RNA-seq data from the same individuals (Figure 6). The combination of these two data sets allows us to analyse changes in gene expression with respect to intestinal antibody diversity, controlling for age. We find that a number of immune-activity GO terms – including “B cell receptor signaling pathway”, “B cell proliferation”, and “lymphocyte activation” are significantly positively enriched with respect to repertoire diversity across multiple diversity orders. A decline in intestinal antibody diversity – as seen in ageing – is thus associated with a decline in B-cell immune activity in killifish. We acknowledge that confident demonstration of a causal link between repertoire diversity and immune state will require experimental challenge of host immunity, for example through infection experiments – something we will address in the future and is beyond the scope of this work. However, we believe these new data are sufficient to demonstrate a significant association between the two, supporting the biological relevance of the age-associated decline in diversity we observe.
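
      To illustrate what a "diversity order" measures, the following is a minimal sketch that assumes the orders refer to Hill numbers; the response does not name the index, so this is an assumption. The function name `hill_diversity` and the clone sizes are hypothetical. Higher orders weight large clones more heavily, so an expanded clone lowers the order-2 diversity far more than the order-0 diversity (clone richness).

```python
import math


def hill_diversity(clone_sizes, q):
    """Hill diversity of order q for a list of clone sizes.

    Assumption: "diversity order" means the Hill-number order q.
    q = 0 counts clones, q = 1 is the exponential of Shannon entropy,
    q = 2 is the inverse Simpson index.
    """
    total = sum(clone_sizes)
    if total <= 0:
        raise ValueError("clone_sizes must contain a positive count")
    freqs = [n / total for n in clone_sizes if n > 0]
    if math.isclose(q, 1.0):
        # Limit of the general formula as q approaches 1.
        return math.exp(-sum(p * math.log(p) for p in freqs))
    return sum(p ** q for p in freqs) ** (1.0 / (1.0 - q))


even_repertoire = [10, 10, 10, 10]     # hypothetical: four equal clones
expanded_repertoire = [97, 1, 1, 1]    # hypothetical: one expanded clone

for q in (0, 1, 2):
    print(q, hill_diversity(even_repertoire, q),
          hill_diversity(expanded_repertoire, q))
```

      In this toy case both repertoires contain four clones, so they are identical at order 0, while the expanded repertoire is much less diverse at orders 1 and 2.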

      Some of the following experiments, or other experiments, may help explore mechanisms and make the study more compelling: 1) whole genome sequencing of lymphoid tissues and brain as a control, from the same old fish to determine whether there are clonal somatic mutations. If confirmed, it may be an important finding, as it would mean that clonal expansions emerge as fast as the killifish lifespan, and it would be a great model to study mechanisms of mutation accumulation and clonal selection with age. This WGS data may be further used to reconstruct immunoglobulin repertoires to understand if the whole-body decrease is driven solely by intestine B cells, or it initiates in lymphoid tissues.

      We agree that further investigation of primary repertoire development in killifish lymphoid organs would be a valuable direction for future work, and would help disentangle whole-body from intestine-specific repertoire changes. However, we believe our current analysis is sufficient to demonstrate the presence of clonal somatic mutations in the whole-body repertoire. The pRESTO/Change-O pipeline used in our analysis can distinguish heavy-chain sequences arising from different naive ancestors, and the presence of large clones in the killifish repertoire (see e.g. Supplemental Figure 5A) necessitates rapid clonal expansion. Ongoing work in our group is indeed directed at studying somatic DNA sequence variation across tissues during aging in killifish, including alternative experimental approaches to investigating killifish repertoire aging. We have now added a sentence about these further research directions to the manuscript discussion. However, we feel these further experiments may be beyond the specific scope of the present work, which is focused on high-level changes in killifish antibody repertoire composition with age.

      2) RNA sequencing of intestine samples or spleen from young versus old killifish to obtain insights into possible molecular mechanisms of clonal expansion and diversity loss. Spleen RNA sequencing may be used to reconstruct the immunoglobulin repertoire. The authors used 750 ng of total RNA in the current study, so there should be enough material for RNA sequencing. As an alternative, single cell RNA sequencing may be performed.

      We certainly agree that investigation of repertoire aging in a wider array of immune organs, including spleen, would be highly valuable, and that killifish is a promising model organism in which to carry out these investigations. We have now included analysis of RNA-sequencing data from the killifish gut, which as discussed above supports an association between loss of repertoire diversity and immune function in that organ (see response to A.1). We hope for future work to more comprehensively explore the landscape of organ-specific repertoire ageing in the turquoise killifish; however, we feel that this would be beyond the scope of the present study.

      Reviewer #2 (Public Review):

      This study introduces the killifish as a short-lived vertebrate model for immune aging and immunosenescence and characterizes the changes in the immune-repertoire during aging. The authors convincingly show a decrease in diversity of the large expanded B-cell clones that is greater than small clones and a more pronounced change in the intestinal antibody repertoire with age. A limitation of the current study is its descriptive nature and lack of strong evidence that these animals truly experience functional immunosenescence. The impact of this work could be strengthened by functional data showing a decline in adaptive immunity that goes along with the loss of diversity in the antibody repertoire or citation and discussion of prior literature supporting this relationship. As it is, it is difficult to know the extent to which the observed changes are strongly correlated with changes in immune function, and the manuscript currently somewhat overstates the importance of the observations. It should be explicitly noted that further research is needed to determine whether the changes in immune-repertoire actually reflect immune senescence or simply changes with little or no consequence.

    1. Author Response:

      Thank you for taking the time to review the Digital Brain Bank, and for providing several suggestions to improve both the manuscript and website. We appreciate the positive comments surrounding our new resource, given the considerable effort that has been invested to date. Below, we provide a summary of the key changes that have been made to the Digital Brain Bank manuscript, reflecting the Editors’ and Reviewers’ suggestions.

      Resource Description

      We appreciate from the Reviewers’ comments that the description of the Digital Brain Bank as an “interactive data discovery and release platform” and a “cross-scale, cross-species investigation framework” does not reflect the current underlying functionality of the website. Although considerable effort has been made to enable users to visualise datasets directly on the website, the primary purpose of the Digital Brain Bank is a data release platform. We have adapted our wording to align with this, shifting the emphasis of the Digital Brain Bank as a data resource. We have additionally clarified the scope of datasets available in the resource, alongside the types of data available on the Digital Brain Bank website.

      Context of Resource

      The Reviewers noted that the original manuscript did not frame the Digital Brain Bank in the context of existing resources. In the revised manuscript, we have added a discussion of the Digital Brain Bank in terms of existing neuroimaging resources spanning multiple domains, including histology, transcriptomics, in vivo MRI & post-mortem MRI. We anticipate that the Digital Brain Bank will complement existing open-science initiatives in both human and non-human neuroimaging. We foresee the greatest overlap and integration between the Digital Brain Bank and existing in vivo and post-mortem MRI databases, where common signal-forming mechanisms facilitate comparisons.

      Web-based Image Viewer (Tview)

      Our web-based image viewer, Tview, provides visualisation of multi-scale (e.g. MRI & microscopy) data in a single 2D plane. This functionality was not readily available with existing viewers, requiring careful implementation due to the large size of the high-resolution microscopy datasets. The Reviewers note that Tview is only implemented for certain datasets in the first data release to the Digital Brain Bank. In the new manuscript we motivate this decision. Notably, several of the datasets in the first release are MRI-only. For these datasets, we found that a detailed static image was more suitable for visualisation.

      To further improve visualisation of these datasets, we are in the process of implementing a second web-based viewer to the Digital Brain Bank website, NiiVue. NiiVue is an open-source 3D volume viewer under active development. This will enable users to navigate 3D MRI datasets directly on the website, and supports overlays to localise the histology sampling location. These points are raised in the new manuscript, with an online NiiVue example available at https://niivue.github.io/niivue/features/overlay.multiplanar.html.

      Datasets

      Reviewer 1 raises that the Digital Anatomist and Pathologist categories have relatively few datasets. In the updated manuscript, we emphasise the uniqueness of the data available under these themes, and that the Digital Brain Bank represents one of the most substantial resources of its kind, providing data from 45 brains in total. We additionally provide further details of datasets which are intended for future release to the Digital Brain Bank. These are the Forget-Me-Not developing Human Connectome Project (dHCP) study - providing diffusion MRI datasets acquired in unfixed, post-mortem neonatal brains; BigMac dataset - providing in vivo MRI, post-mortem MRI, PLI and immunohistochemistry in a single, whole macaque brain; a cohort study combining multi-modal MRI and histology to investigate mouse models of ALS; and further primate species, alongside extensions into orders Carnivora and Rodentia.

      Corpus Callosum Analysis

      The corpus callosum analysis in Figure 3 has a small control cohort, and Reviewer 1 raises whether this analysis can produce meaningful results. We agree that the low number of controls and difficulty matching between groups is a major limitation of this analysis. Certainly, one would need to be cautious about interpreting any new observations based on our results. However, the purpose of this analysis was to demonstrate that we can use our data to replicate findings which have been previously reported in ALS (e.g. Chapman et al., 2014). This has been clarified in the new manuscript, alongside text to acknowledge the limitations of our analysis.

      MRI-Microscopy Registrations

      Co-registration between the MRI and microscopy data for the Human ALS MRI-Histology dataset is ongoing. As raised by Reviewer 1, coregistered MRI-microscopy datasets were previously available in only two brains. Since submission of the original preprint, we have additionally coregistered the PLP (myelin) staining data in multiple anatomical regions (5-8 regions per brain) for 13 brains in the Human ALS MRI-Histology dataset. These will now be available through the Digital Brain Bank.

      API, Metadata & Versioning

      Reviewer 3 notes that the resource is not currently designed for programmatic interactions or versioning. In the new manuscript we discuss why these are not yet implemented, given the current ad hoc nature of data access through signing MTAs via email. We have also taken the opportunity to outline our ambitions for incorporating these features in a future iteration of the Digital Brain Bank. Specifically, we intend to develop a new database to streamline data access and enable a programmatic interface. This database will perform user sign-up, authentication, and approval directly on the Digital Brain Bank website. This will enable approved users to access datasets directly on the website, which can readily incorporate stricter standards for linking data and dataset tracking.

    1. Author Response:

      Reviewer #2 (Public Review): Gaffield and Christie trained mice to an interval task of self-initiate bouts of licking to understand how the cerebellar activity relates to the organization of well-timed transitions to motor action and inaction during discontinuous periodically performed movements. Recording and optogenetically stimulating the activities of Purkinje cells, they concluded that the cerebellum encodes and influences the motor transitions, initiation and termination of discontinuous movements. The conclusion of the paper is very interesting and potentially provides insights on the neural mechanism of the previously proposed principle that the cerebellum controls the timings of discrete movements (Ivry et al. 2002). However, in the logic and interpretation to the conclusion I have concerns which they need to address. [Major comments]

      We thank the reviewer for their positive evaluation of our work and their helpful comments. We have substantially altered our manuscript to address their concerns, including an entirely new figure as well as additional supplemental figures.

      First, the activity of Purkinje cells can largely encode each bout of licking movements, in addition to initiation and termination of movements. Figure 2BCEF plays the peak of neural activity around the water time and Figure 2DG indicates the relationship between the neural activity and lick rate. The encoding of the initiation and termination alone cannot explain these observations. Related to this, none of the panels Figure 2BCEF shows a lead of the onset of neural activity to that of the lick rates (around -5 sec to water time). This looks inconsistent with the lead shown in Figure 3. The authors need to explain why such an inconsistency can happen.

      We agree that Crus I and II PCs encode parameters of licking bouts in addition to movement initiation and termination, and we apologize for not making this point more clearly. To address this concern, we have extensively edited the text in several sections and have added an additional figure to emphasize the richness of the PC representation of behavioral attributes, beyond initiation and termination alone. We disagree that there is an inconsistency in the lead-time differences in our datasets. As the reviewer points out, the water-delivery-aligned firing rate z-scores do not seem to lead the licking rate (Fig. 2B-E). However, these data are averaged across trials with a high variance in the timing of lick initiation relative to water delivery; consequently, it is not possible to assess the timing of PC activity relative to lick bout initiation from these panels. When, by contrast, data are aligned to well-defined licking bouts (i.e., bouts with no licking in the preceding 2 s), it becomes clear that PC firing ramps up in advance of the bouts (Fig. 4C-D). We have edited the text, explaining this rationale, as requested by the reviewer.
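
      The criterion for a well-defined bout (no licking in the preceding 2 s) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `bout_onsets`, the example lick times, the inclusive threshold, and the choice to count the first recorded lick as an onset are all assumptions.

```python
def bout_onsets(lick_times, quiet_period=2.0):
    """Return lick times (s) that start a bout after a lick-free interval.

    A lick counts as a bout onset if no lick occurred in the preceding
    `quiet_period` seconds. Assumptions for illustration: the threshold is
    inclusive, and the first recorded lick is treated as an onset because
    nothing is known about licking before the recording began.
    """
    onsets = []
    previous = None
    for t in sorted(lick_times):
        if previous is None or t - previous >= quiet_period:
            onsets.append(t)
        previous = t
    return onsets


# Hypothetical lick timestamps in seconds.
licks = [0.5, 0.65, 0.8, 4.0, 4.15, 5.0, 9.0]
onsets = bout_onsets(licks)
print(onsets)
```

      Aligning firing rates to these onsets, rather than to water delivery, removes the trial-to-trial jitter in lick initiation that would otherwise blur any ramp preceding the bout.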

      Second, the positive sign of neural modulation indicates biased recording sites. So far, many studies have been indicating the increasing firing modulation at the deep cerebellar nuclei in cerebellar timing tasks and motor tasks (e.g. Ten Brinke et al. 2017 eLIFE for the eyeblink conditioning; Ohmae et al. 2017 JNS for a self-initiate timing task; Becker and Person 2019 Neuron). Ramping-up modulation of Purkinje cells is not able to activate the deep cerebellar nuclei. When the motor-driving module generates negative modulation of Purkinje cells, the neighboring modules can generate positive modulation (e.g. Ten Brinke et al. 2017 eLIFE; De Zeeuw 2021 Nat Rev; Ohmae and Medina 2014 Soc. Neurosci. Abstr.). Because the neighboring modules are much wider than the motor-driving module, recording without identifying the driving modules, as in this study, will result in the recording being biased toward the adjacent modules.

      We too were surprised that we did not observe more negatively modulating PCs. However, our craniotomy was relatively large (>2 mm square), exposing an area over Crus I and II that encompassed zebrin bands 7+, 6-, and 6+. We randomly sampled PC activity within this region, so we don’t think our recordings were necessarily “biased”. We are unaware of any definitive experiments showing whether positively and negatively modulating PCs form separate, or convergent, channels of output onto their postsynaptic targets in the cerebellar nuclei. If convergent, then the response of the nuclear neurons will be determined by an ensemble of PCs with time-varying signs of activity, in addition to the integration of the activity from pontine collaterals.

      We thank the reviewer for highlighting the developing idea of motor and non-motor cerebellar modules and the loops formed by their connectivity. We have edited our text to address how our recordings could fit into such an organizational scheme and have cited their recent unpublished preprint on this topic, now available on bioRxiv (Ohmae et al. 2021). However, we believe several considerations suggest that both positive and negative modulation of Purkinje cell firing rates will impact movement. (1) Large regions of the cerebellar cortex are capable of evoking or modulating movements when microstimulation is applied. Similarly, optogenetic suppression of IntA activity increases the outward velocity of reaching movements in mice (Becker & Person 2019). (2) In contrast with delay eyeblink conditioning, in which the motor output is an impulse-like twitch, rhythmic movements of the tongue (or, similarly, the limbs) require alternating recruitment and de-recruitment of muscles. Thus, motor commands will necessarily be multiphasic in time, and will tend to be out of phase for populations controlling antagonistic muscles. (3) Excitation of the DCN by collaterals of mossy fibers will likely modulate, and perhaps override, Purkinje cell inhibition. Therefore, further work will certainly be necessary to decipher exactly how potentially antagonistic cerebellar modules participate in organizing complex motor actions.

      Third, the authors used z scores for the unit of spike rate, but it is more appropriate to use spike per second as in Figure 3CD. In particular, I do not understand the meaning of difference of spike rate in the unit of z score in Figure 3E. The spike rate modulation in Figure 4E looks small which should be evaluated in the unit of spike per second as well. For the analysis of the last lick, the spontaneous spike rates should be displayed, instead of (or in addition to) the spike rate in the middle of lick bouts which should be much higher than the spontaneous spike rate according to Figure 2.

      We appreciate the reviewer’s input regarding style, but the current standard in the neurophysiology field is to report firing rate comparisons from a neural population as z-scores. Z-scoring is particularly useful because this metric provides the probability of an individual score occurring within a normal distribution, as well as comparisons of scores from different normal distributions; it also indicates how far the raw score differs from the mean, information that isn’t available in spike rate comparisons alone. For these reasons, we elect not to change how we represent our data. However, we have modified our figures to report firing rates for traces from individual example cells, as z-scoring is not appropriate for this purpose.
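
      As a minimal sketch of the normalization being discussed, the snippet below z-scores two hypothetical firing-rate traces. The function name `zscore` and the rates are illustrative, and using each cell's own whole-trace mean and population standard deviation is an assumption; the response does not state which baseline was used.

```python
import statistics


def zscore(rates):
    """Z-score a firing-rate trace against its own mean and SD.

    Assumption for illustration: the reference distribution is the whole
    trace and the population standard deviation is used.
    """
    mean = statistics.fmean(rates)
    sd = statistics.pstdev(rates)
    if sd == 0:
        raise ValueError("cannot z-score a constant trace")
    return [(r - mean) / sd for r in rates]


# Hypothetical rates in spikes per second for two cells whose
# baselines differ fivefold but whose modulation has the same shape.
cell_a = [10.0, 20.0, 30.0]
cell_b = [50.0, 100.0, 150.0]

z_a = zscore(cell_a)
z_b = zscore(cell_b)
print(z_a, z_b)
```

      The two z-scored traces coincide even though the raw rates differ, which makes population averages comparable across cells but also removes the absolute spikes-per-second scale the reviewer asks about.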

      Fourth, I did not understand the conclusion for the optogenetic perturbation. In the result section for Figure 7, I think there is a logical gap between the last conclusion sentence and the sentences before it. The suppression of lick bouts in Figure 7D and the rebound induction in Figure 7G can be explained by the cerebellar contribution to each bout of lick movement (shown in Figure 2). I do not understand whether these observations indicate the cerebellar contribution to the initiation and termination of a sequence of lick movements. Also, I have a concern about the location of stimulation sites. The stimulation may cover both the motor-driving module and neighboring modules, which makes the observations difficult to interpret because the stimulation is not specific to the positively modulating Purkinje cells.

      A lick bout is composed of a sequence of tongue protrusions and retractions performed at a highly regular rhythm. Apart from the first lick (Bollu et al., 2021), the motor command for this behavior is under the control of central pattern generators in the brainstem. Said another way, a lick bout is a continuous movement rather than a series of discrete actions that are repeatedly started and stopped (similar to stepping during locomotion in some animals). Lick bout initiation and directional control of the bout can be commanded by the cerebral cortex. Given this organization, we do not believe our optogenetic experiment can be interpreted as an effect on the initiation and termination of individual licks, because licks are not discrete actions when performed in a consummatory bout. However, based on the reviewer’s recommendation, we investigated how PCs encode information pertinent to individual licks in a bout (Figure 3). Although there was entrainment to individual lick cycles, there were no time-locked responses apparent in their average activity. Instead, there was a continuous mapping of the lick cycle across their population. Notably, licking rhythmicity was disrupted by the optogenetic perturbation, consistent with the influence of PC output on this movement parameter. We have edited the text to address these concerns.

      Fifth, for Figure 8, I had difficulty understanding what kind of Purkinje cell activity can explain the shift of the peak timing of lick rate, because in the result sections of Figures 2-6 I could not find any activity encoding the peak timing of lick rate. For Figure 8EFG, the analysis may not be correct. Because lick onset can be delayed with the photostimulation, in Figure 8E the boundary of onset corresponding to the 1 s in control should be 1+alpha in stimulation trials to correctly pick up the corresponding trials. Because we do not know the exact value of alpha, I think this analysis is not possible.

      PC ramping activity may contribute to the vigor of the ensuing licking response which would dictate peak licking rate timing. In fact, in many individual PCs, we observed correlations between PC firing and lick rate indicating a relationship. However, this was not borne out in the population response, so we did not pursue it further.

    1. Author Response:

      Reviewer #1 (Public Review):

      The underlying data are dominated by data from the UK Biobank, which means that, in effect, only few samples for the 25-50 age group are available. This may not be a big issue in terms of estimating smooth trajectories, but may limit comparisons to the reference model in certain cases (e.g. early disease onset) where this age range may be of particular interest.

      We show per site evaluation metrics, cross validation, and additional transfer examples. These additional analyses show that the model performance is not driven solely by the UKB sample. However, we agree with this comment and have also updated the limitation section (in the Discussion) regarding the overrepresentation of UKB and included a statement regarding the known sampling bias of UKB.

      The manual QC data is somewhat limited as it is based on a predominantly younger cohort (mean age ~30yrs). Furthermore, the number of outcome measures (cortical thickness and subcortical volume) and the number of data modalities (only structural MRI) are limited. However, as the authors also state, these limitations can hopefully be addressed by incorporating new/additional data sets into the reference models as they become available.

      We have added further details regarding the quality checking procedure to the methods section and improved the clarity of directions for implementing the scripts, including an interactive link to view an example of the manual QC environment, on the QC GitHub page to enable others to reproduce our manual QC pipeline.

      Reviewer #2 (Public Review):

      1. The evidence that the model will generalize ("transfer" as per the authors) to new, unseen sites, is very limited. To robustly support the claim that the model generalizes to data from new sites, a cross-validation evaluation with a "leave-one-site-out" (or leave-K-sites-out) folding strategy seems unavoidable, so that at each cross-validation split completely unseen sites are tested (for further justification of this assertion, please refer to Esteban et al., (2017)). The "transferability" of the model is left very weakly supported by figures 3 and 4, which interpretation is very unclear. This point is further developed below, regarding the overrepresentation of the UK Biobank dataset.
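
      The "leave-one-site-out" folding the reviewer describes can be sketched as follows. This is a minimal illustration using only the standard library; the function name `leave_one_site_out` and the site labels assigned to subjects are hypothetical, and it is not the authors' evaluation code.

```python
from collections import defaultdict


def leave_one_site_out(sites):
    """Yield (held_out_site, train_indices, test_indices) per fold.

    `sites` holds one site label per subject. Each fold tests on every
    subject from one site and trains on all other sites, so the test
    site is completely unseen during training.
    """
    by_site = defaultdict(list)
    for index, site in enumerate(sites):
        by_site[site].append(index)
    for held_out in sorted(by_site):
        test = by_site[held_out]
        train = sorted(
            i for site, idxs in by_site.items() if site != held_out
            for i in idxs
        )
        yield held_out, train, test


# Hypothetical site label for each of five subjects.
subject_sites = ["UKB", "UKB", "ABCD", "OASIS", "ABCD"]
folds = list(leave_one_site_out(subject_sites))
for held_out, train, test in folds:
    print(held_out, train, test)
```

      scikit-learn's `LeaveOneGroupOut` splitter implements the same idea when passed site labels as groups. Unlike randomized train/test splits, this scheme never places subjects from the test site in the training set.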

      We thank the reviewers for this suggestion and have addressed the concern regarding generalizability in several ways. First, we ran an additional 10 randomized train/test splits of the data in the full sample. These new analyses show the stability of our models, as there is very little variation in the evaluation metrics across all 10 splits. These results are visualized in Figure 3 – Supplement 2. However, the static Figure 3 – Supplement 2 is challenging to read, simply because many brain regions are fitted into a single plot. Therefore, we also created an online interactive visualization tool that shows the image of the brain region and the explained variance when you hover over a point (see the screenshot of the online tool below). This interactive visualization was created for all supplemental tables for easier exploration and interpretation, and we now recommend this tool as the primary method to interrogate our findings interactively. Second, we updated and expanded the transfer data set to include 6 open datasets from OpenNeuro.org (N=546) and we provide this example dataset on our GitHub with the transfer code. This simultaneously provides a more comprehensive evaluation of the performance of our model on unseen data and a more comprehensive walk-through for new users applying our models to new data (sites unseen in training). Finally, we added per-site evaluation metrics (Figure 3 – Supplement 3) to demonstrate that performance is relatively stable across sites and not driven by a single large site (i.e., UKB). As site is strongly correlated with age, these visualizations can also be used to approximate model performance at different age ranges (i.e., 9–10-year-old performance can be assessed by looking at ABCD sites evaluation metrics, and 50–80-year-old performance can be assessed by looking at UKB evaluation metrics).
Moreover, we would also like to emphasize that we should not expect that all sites achieve the same performance because the sampling of the different sites is highly heterogeneous in that some sites cover a broad age range (e.g., OASIS, UKB) whereas other sites have an extremely narrow age range (e.g., ABCD).
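      The idea of a per-site evaluation metric can be illustrated with a short sketch. It assumes explained variance is computed as 1 - Var(y - ŷ) / Var(y) within each site; the function names and the toy numbers are hypothetical and are not taken from the authors' repository.

```python
from collections import defaultdict
from statistics import pvariance


def explained_variance(y_true, y_pred):
    """Return 1 - Var(residuals) / Var(y_true); NaN when y_true is constant."""
    total = pvariance(y_true)
    if total == 0:
        return float("nan")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - pvariance(residuals) / total


def per_site_explained_variance(sites, y_true, y_pred):
    """Group test-set predictions by site and score each site separately."""
    grouped = defaultdict(lambda: ([], []))
    for site, t, p in zip(sites, y_true, y_pred):
        grouped[site][0].append(t)
        grouped[site][1].append(p)
    return {site: explained_variance(t, p) for site, (t, p) in grouped.items()}


# Toy data: hypothetical observed and predicted values for two sites.
sites = ["A", "A", "A", "B", "B", "B"]
y_true = [1.0, 2.0, 3.0, 2.0, 4.0, 6.0]
y_pred = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0]
scores = per_site_explained_variance(sites, y_true, y_pred)
print(scores)
```

      Scoring each site on its own makes it visible when a pooled metric is carried by one large site, and sites with a narrow age range can be expected to score differently from sites with a broad one.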

      2. If I understand the corresponding tables correctly, it seems that UK biobank data account for roughly half of the whole dataset. If the cross-validation approach is not considered, at the very (very) least, more granular analyses of the evaluation on the test set should be provided, for example, plotting the distribution of prediction accuracy per site, to spot whether the model is just overfitted to the UKB sample. For instance, in Figure 4 it would be easy to split row 2 into UKB and "other" sites to ensure both look the same.

      We have addressed this comment in response to Reviewer 1 above.

      3. Beyond the outstanding work of visually assessing thousands of images, the Quality Control areas of the manuscript should be better executed (particularly lines 212-233):

      3.a The overall role of the mQC dataset is unclear. QC implies a destructive process in which subpar examples of a given dataset (or a product) are excluded and dropped out of the pipeline, but that doesn't seem to be the case for the mQC subset, which seems to be a dataset with binary annotations of the quality of the FreeSurfer outcomes and the image.

      We have addressed this in response to Reviewer 1 above. We included the manual QC in this work, because in prior work by our group (https://www.biorxiv.org/content/10.1101/2021.05.28.446120v1.abstract) that leveraged big data and relied on automated QC, reviewers often criticized this approach and claimed our results could be driven by poor quality data. Thus, in this work we wanted to compare the evaluation metrics of a large, automated QC data set with the manual QC dataset to show very similar performance.

      3.b The visual assessment protocol is insufficiently described for any attempt to reproduce: (i) numbers of images rated by author SR and reused from the ABCD's accept/reject ratings; (ii) of those rated by author SR, state how the images were selected (randomly, stratified, etc.) and whether site-provenance, age, etc. were blinded to the rater; (iii) protocol details such as whether the rater navigated through slices, whether that was programmatic or decided per-case by the rater, average time eyeballing an image, etc; (iv) rating range (i.e., accept/reject) and assignment criteria; (v) quality assurance decisions (i.e., how the quality annotations are further used)

      These details have been added to the methods section where we describe the manual QC process. We have also updated the QC GitHub with more detailed instructions for use and included a link to view an example of the manual QC environment.

      3.c Similarly, the integration within the model and/or the training/testing of the automated QC is unclear.

      The responses to Reviewer 1 above and our revisions to the methods section should also clarify this. In brief, QC was performed on the data prior to splitting of the data to assess generalizability.

      Additional comments

      • Repeated individuals: it seems likely that there are repeated individuals, at least within the UKB and perhaps ABCD. This could be more clearly stated, indicating whether this is something that was considered or, conversely, that shouldn't influence the analysis.

      We have clarified in the methods section that no repeated subjects were used in the dataset.

      • Figure 3 - the Y-axis of each column should have a constant range to allow the suggested direct comparison.

      We have changed Figure 3 to have a constant range across all test sets.

      • Tables 5 through 8 are hard to parse - They may be moved to CSV files available somewhere under a CC-BY or similarly open license, and better interpreted with figures that highlight the message distilled from these results.

      We agree with the reviewer about the difficulty in summarizing such a large number of results in an easily digestible manner and that tables are not the optimal format to achieve this. Therefore, we have created interactive visualizations for Tables 5-8 that make exploring the evaluation metrics much easier. All the CSV files are also hosted on our GitHub page in the metrics folder (https://github.com/predictive-clinical-neuroscience/braincharts/tree/master/metrics).

      • Lines 212-214 about the QA/QC problem in neuroimaging are susceptible to misinterpretation. That particular sentence tries to bridge between the dataset description and the justification for the mQC sample and corresponding experiments. However, it fails in that objective (as I noted as a weakness, it's unclear the connection between the model and QC), and also misrepresents the why and how of QC overall.

      We have considerably expanded upon our motivation for using a manual QC approach and the steps this entails, which should address this issue.

      • The fact that the code or data are accessible doesn't mean they are usable. Indeed, the lack of license on two of the linked repositories effectively pre-empts reuse. Please state a license on them.

      We thank the reviewer for this suggestion. We have updated both repositories to include a license file.

      • Figure 1 - caption mentions a panel E) that seems missing in the figure.

      We have corrected this mistake in the caption of Figure 1.

      • There is no comment on the adaptations taken to execute FreeSurfer on the first age range of the sample (2-7 yo.).

      We did not adapt the FreeSurfer pipeline for this age range and have added this to the limitations section.

      • Following up on weakness 3.c, while scaling and centering is a sensible thing to do, it's likely that those pruned outliers actually account for much of the information under investigation. Meaning, EC is a good proxy for manual rating - but Rosen et al. demonstrate this on human, neurotypical, adult brains. Therefore, general application must be dealt with care. For example, elderly and young populations will, on average, show substantially more images with excessive motion. These images will go through FreeSurfer, and often produce an outlier EC, while a few will yield a perfectly standard EC. Typically, these cases with standard ECs are probably less accurate on the IDPs being analyzed, for example, if prior knowledge biased more the output for the hidden properties of this subject. In other words, in these cases, a researcher would be better off actually including the outliers.

      This is an important point to raise. We agree with the reviewer that the Euler Characteristic is likely correlated with pathology in addition to data quality (e.g., due to movement artefacts), and this is important to consider when modeling clinical populations while also ensuring high-quality data. First, we point out that the inclusion threshold is mostly important for the estimation of the normative model, which in our work, as in Rosen et al., is based on healthy control data. It is easy to repeat predictions for subsequent clinical samples using a more lenient inclusion threshold (or none at all) in cases where this consideration might be operative. Second, in an effort to strike the right balance, we have chosen the EC threshold quite conservatively in that it excludes subjects that are very far into the tail of the (rescaled and centered) EC histogram. This means that we are likely dropping only subjects with true topological defects. This is also an important motivation for the careful manual QC procedures we describe above. That said, any heuristic is necessarily imperfect, which we acknowledge in the limitations section and in the methods.
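      A conservative tail threshold of this kind can be sketched as below. This is an illustration under stated assumptions, not the authors' procedure: the input is a hypothetical per-subject defect score in which larger values mean more topological defects (for example, a negated Euler number), centering uses the per-site median, any rescaling step is omitted, and the threshold value is invented for the example.

```python
from collections import defaultdict
from statistics import median


def flag_defect_outliers(sites, defect_scores, threshold):
    """Return one boolean per subject: True if the subject would be excluded.

    Assumption: larger scores mean more topological defects. Scores are
    centered on their own site's median before the threshold is applied.
    """
    by_site = defaultdict(list)
    for site, score in zip(sites, defect_scores):
        by_site[site].append(score)
    site_median = {site: median(values) for site, values in by_site.items()}
    centered = [score - site_median[site] for site, score in zip(sites, defect_scores)]
    # Only subjects far into the upper tail are dropped.
    return [value > threshold for value in centered]


# Hypothetical scores: site B has higher raw values overall but no outliers.
sites = ["A"] * 5 + ["B"] * 5
scores = [10, 12, 11, 13, 90, 40, 42, 41, 43, 44]
excluded = flag_defect_outliers(sites, scores, threshold=25)
print(excluded)
```

      Passing a larger threshold, or skipping the filter altogether, corresponds to the more lenient inclusion rule mentioned above for clinical samples.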

      • Title: "high precision" - it is unclear what precision this is qualifying as high. Is it effectively spatial granularity, given the large number of ROIs being modeled, or is it because the spread of the normative charts is narrow along the lifespan as compared to some standard of variability?

      We refer to spatial precision in terms of the granularity of the regions of interest that we estimate models for. We have revised the manuscript throughout to make this more explicit.

    1. Author Response:

      Reviewer #1:

      The authors present an interesting concept for the mechanism of rash induction in EGFR inhibitor (EGFRi) treated rats. EGFRi causes production of pro-inflammatory factors in epidermal keratinocytes which may induce dedifferentiation and reduction of the dWAT compartment, presumably mediated via PPAR. Factors produced by dedifferentiated FB then recruit monocytes thereby inducing skin inflammation. This work is aiming to improve targeted cancer therapy efficiency and is therefore of potential clinical relevance.

      However, most of the conclusions drawn by the authors are based on correlations, e.g. between the amount of dWAT and rash intensity. Mechanistic data have been mainly generated in vitro. The exact order of events to formulate a definitive mechanistic proof in vivo for this hypothesis is missing. In particular, it is not clear which cells in the skin, apart from keratinocytes, are specifically targeted by EGFR inhibitors and/or by Rosiglitazone. The authors also do not show EGFR staining in adipocytes and its inhibition by Afa. The effects of Afa and Rosi on monocytes / macrophages are completely ignored by the authors. Additionally, some of the presented results are overinterpreted and not really supporting what is claimed.

      Most importantly, the whole study is based on inhibitor treatments. Afatinib, for example, inhibits not only EGFR but all other erbB family members and as such represents a pan-ErbB inhibitor, so it is not clear whether the observed effects are induced by inhibition of EGFR or of other erbB receptors, which have also been shown to have effects in the skin. For further specification of the role of EGFR, other, more specific inhibitors should be used to confirm the basic concept, along with genetic proof either in genetically engineered mice or by CRISPR-mediated deletion.

      To further support the hypotheses of the authors, the study needs to be further substantiated by mechanistic experiments and the clinical relevance should be strengthened by performing histologic analysis of skin samples of patients treated with EGFRi and respective analysis of rash and e.g. BMI etc.

      We thank the reviewer for the positive comments on the potential impact for cancer patients suffering from EGFR inhibitor-induced skin rash. We have carefully considered all comments from the reviewer and revised our manuscript accordingly. In the following section, we summarize our responses to each comment. We believe that our responses address all of the reviewer's concerns.

      We agree with the reviewer’s comment that our research may need more direct mechanistic in vivo studies to build on our in vitro results. In our research, we have collected evidence from previous studies and used various in vitro and ex vivo experiments to investigate our findings. However, the study was still limited by currently available technologies.

      In the revised version, we supplemented the pEGFR and pERK staining of adipocytes in Figure 3-figure supplement 1C. The levels of phospho-EGFR and ERK in dWAT were significantly decreased after EGFRi treatment.

      This study was inspired by the observation of an unusual dWAT reduction during EGFRi treatment, so we focused on the investigation of dermal adipocytes. In addition, mastocytes, monocytes, and macrophages in EGFRi-induced cutaneous toxicity have been regarded as responders to increased cytokine expression. Local depletion of macrophages and degranulated mastocytes provided only partial resolution, indicating a multifactorial and complicated pathology of cutaneous toxicity induced by anti-EGFR therapy (Lichtenberger et al., 2013; Mascia et al., 2013).

      Regarding some inappropriate descriptions, we agree with the reviewer that they would be more convincing if supported by a direct assessment in genetically engineered mice. For example, we tried to establish the relationship between S. aureus infection and EGFRi-induced rash based on a well-accepted study from Lingjuan Zhang (Zhang et al., 2015). They reported that adipose precursor cells secrete the antimicrobial peptide cathelicidin during differentiation to protect against S. aureus infection. Mice with impaired adipogenesis were more susceptible to S. aureus infection. This conclusion gave us insights into the relationship between S. aureus infection and EGFRi-induced skin inflammation. Unfortunately, the anti-CAMP antibody was made by the author’s lab and there are no mature products that can recognize CAMP in rats. To provide more mechanistic evidence, we conducted qPCR experiments to study the transcriptional level of the Camp gene in both dWAT and dFB cells isolated from rat skin (Figure 3I and 3J). dWAT in the Afa group showed a lower expression level of Camp compared with the control group. In addition, at different differentiation stages of dFB in vitro, transcriptional levels of Camp were decreased by Afa treatment and increased by Rosi. In summary, the data we collected could verify the causal relationship between EGFRi-induced dWAT reduction and S. aureus infection to some extent. However, the limitations of the available technology prevent us from providing more evidence. Thus, in the revised manuscript, we have softened this statement.

      According to the clinical evidence, the rash can also be induced by many specific Erbb1 inhibitors. All three generations of EGFR inhibitors in the clinic have very high incidence rates of cutaneous toxicity (Supplementary file 1). In the revised version, we provided rash models induced by both first-generation EGFRi, Erlotinib, Gefitinib, and the third-generation EGFRi, Osimertinib. As shown in Figure 1-figure supplement 1D, the rash caused by Erlotinib, Gefitinib, and Osimertinib had the same phenotypes as Afatinib-induced rash.

      In summary, the current evidence should support our findings, although more direct mechanistic studies would be better. We are now seeking collaborations to build a dermal adipocyte knockout mouse model platform and hope to investigate the specific roles of dermal adipocytes in the future. We also plan to collaborate with hospitals to explore clinical evidence from patients receiving EGFR inhibitors.

      References:

      Lichtenberger BM, Gerber PA, Holcmann M, Buhren BA, Amberg N, Smolle V, Schrumpf H, Boelke E, Ansari P, Mackenzie C, Wollenberg A, Kislat A, Fischer JW, Röck K, Harder J, Schröder JM, Homey B, Sibilia M. 2013. Epidermal EGFR controls cutaneous host defense and prevents inflammation. Sci Transl Med 5.

      Mascia F, Lam G, Keith C, Garber C, Steinberg SM, Kohn E, Yuspa SH. 2013. Genetic ablation of epidermal EGFR reveals the dynamic origin of adverse effects of anti-EGFR therapy. Sci Transl Med 5.

      Zhang L, Guerrero-juarez CF, Hata T, Bapat SP, Ramos R, Plikus M V, Gallo RL. 2015. Dermal adipocytes protect against invasive Staphylococcus aureus skin infection. Science 347:67–72.

      Reviewer #2:

      Leying Chen et al. investigated the mechanism of EGFR inhibitor-induced rash. They find that atrophy of dermal white adipose tissue (dWAT), a highly plastic adipose tissue with various skin-specific functions, correlates with rash occurrence and exacerbation in a murine model. The data indicate that EGFR inhibition induces the dedifferentiation of dWAT and lipolysis, finally leading to dWAT reduction, which is a hallmark of the pathophysiology of rash. Notably, they demonstrate that stimulating dermal adipocyte expansion with a high-fat diet (HFD) or the pharmacological PPARγ agonist rosiglitazone (Rosi) ameliorated the severity of rash. Therefore, PPARγ agonists may represent a promising new therapeutic strategy in the treatment of EGFRI-related skin disorders, pending confirmation in further study.

      We greatly appreciate the reviewer's positive comments.

      The conclusions of this paper are mostly well supported by data, but some results need to be clarified and verified.

      1) PPAR signaling in the pathology of EGFRI-induced skin toxicity. In Figure 2, the results show that Rosi reversed the dedifferentiation of dermal adipocytes induced by Afa. This may be due to PPARγ upregulation, but this is not confirmed in the results. The relative gene expression in dWAT after treatment with Afa and Rosi was not shown in the results.

      We thank the reviewer for suggesting an additional PPARγ experiment. In the revised version, we collected attached dWAT after 5-day Afa or Rosi treatment and measured the transcriptional level of Pparg. The expression level of Pparg was downregulated by Afa treatment and upregulated by Rosi treatment (Figure 2-figure supplement 1D).

      2) The effect of PPAR signaling on the PDGFRA-PI3K-AKT pathway. The AKT pathway is a key downstream target of EGFR kinase, so it is reasonable to see that p-AKT1 and p-AKT2 levels were decreased by Afa (Figure 3C). However, addition of Rosi to Afa significantly activated both AKT1 and AKT2. What is the underlying mechanism for these results, and is it related to the PPAR signaling pathway?

      Given the importance of the PI3K/AKT pathway in regulating AP and mature adipocyte biology (Jeffery et al., 2015), we used p-AKT to characterize the activation of dFBs. The mechanism by which modulating PPARγ affects AKT is still unknown. One study found that MAPK and PI3K are upregulated and activated by rosiglitazone, which in turn might enhance adipogenesis (Fayyad et al., 2019). In skeletal muscle, PPARγ enhances insulin-stimulated PI3K and Akt activation (Marx et al., 2004). Rosiglitazone has also been reported to have a neuroprotective effect against oxidative stress. The PPARγ-rosiglitazone complex binds to the neurotrophic factor-α1 (NF-α1) promoter and activates the transcription of NF-α1 mRNA, which is then translated to the protein. NF-α1 binds to a cognate receptor and activates the AKT and ERK pathways (Thouennon et al., 2015). Thus, further studies should be carried out to investigate the effects of rosiglitazone on the PI3K/AKT pathway in adipogenesis.

      3) According to Figure 3F, 3G and 3H, the authors draw the conclusion that "a lack of APs and mature dWAT impairs the maintenance of the host defense and hair growth in the skin". In my opinion, no results directly prove this. According to Figure 3H, the impairment of hair growth may be caused by EGFR inhibition of hair follicles.

      We thank the reviewer for raising this important point. We tried to establish the relationship between S. aureus infection and EGFRI-induced rash based on a well-accepted study from Lingjuan Zhang (Zhang et al., 2015). They reported that adipose precursor cells secrete the antimicrobial peptide cathelicidin during differentiation to protect against S. aureus infection. Mice with impaired adipogenesis were more susceptible to S. aureus infection. This conclusion gave us insights into the relationship between S. aureus infection and EGFRI-induced skin inflammation. Unfortunately, the anti-CAMP antibody was made by the author’s lab and there are no mature products that can recognize CAMP in rats. To provide more mechanistic evidence, we conducted qPCR experiments to study the transcriptional level of the Camp gene in both dWAT and dFB cells isolated from rat skin (Figure 3I and 3J). dWAT in the Afa group showed a lower expression level of Camp compared with the control group. In addition, at different differentiation stages of dFB in vitro, transcriptional levels of Camp were decreased by Afa treatment and increased by Rosi. In summary, the data we collected with current technology could verify the causal relationship between EGFRI-induced dWAT reduction and S. aureus infection to some extent. However, we agree with the reviewer that this conclusion needs more direct evidence. Thus, in the revised manuscript, we have softened this statement.

      Since recent reports have shown that dermal adipocytes have the capacity to support hair regeneration, we used this conclusion to characterize the function of dWAT. However, we agree with the reviewer that more specific and direct experiments are needed to verify the causal link with dWAT. We are seeking collaborations to build a dermal adipocyte knockout mouse model platform and hope to investigate the specific roles of dermal adipocytes in the future. In the revised manuscript, we also adjusted the statements.

      4) EGFRI stimulates keratinocytes (HaCaT cells) to produce lipolytic cytokines (IL-6) (Figure 4G). IL-6 enhanced the lipolysis of differentiated dFB (Figure S4M) and C18 fatty acids were supposed to be released into the cell matrix during lipolysis. In Figure 4H, HaCaT cell supernatants and dFB supernatants were collected. IL-6 was supposed to increase in HaCaT cell supernatants and this was confirmed in Figure S4K and S4L. However, C18 fatty acids were not directly shown to be present in the dFB supernatants in the study.

      We thank the reviewer for pointing this out. We conducted additional lipidomics of dFB supernatants. However, because the differentiation medium needs to be changed every two days, it is hard to accumulate enough FFAs. We collected supernatants on Day 3, Day 6, and Day 9. They were all below the detection limit of mass spectrometry. We agree with the reviewer that more evidence is needed to prove the correlation between C18 FFAs and lipolysis. Therefore, we performed a mass spectrometry analysis of skin tissues from the Ctrl and Afa groups after 3-day treatment to confirm the release of C18 FFAs. The result showed a tendency toward increased C18:2 and other FFAs in the Afa group (Figure 1 in response letter). However, this increase was not statistically significant. This might be due to interference from sebaceous glands and dermal adipocytes. Consequently, we adjusted the descriptions in the revised manuscript to soften this statement.

      Figure 1. C18 concentrations in skin tissues from Ctrl and Afa groups after 3-day treatment. n=3.

      References:

      Fayyad AM, Khan AA, Abdallah SH, Alomran SS, Bajou K, Khattak MNK. 2019. Rosiglitazone Enhances Browning Adipocytes in Association with MAPK and PI3-K Pathways During the Differentiation of Telomerase-Transformed Mesenchymal Stromal Cells into Adipocytes. Int J Mol Sci 20.

      Jeffery E, Church CD, Holtrup B, Colman L, Rodeheffer MS. 2015. Rapid depot-specific activation of adipocyte precursor cells at the onset of obesity. Nat Cell Biol 17:376–385.

      Marx N, Duez H, Fruchart J-C, Staels B. 2004. Peroxisome proliferator-activated receptors and atherogenesis: regulators of gene expression in vascular cells. Circ Res 94:1168–1178.

      Thouennon E, Cheng Y, Falahatian V, Cawley NX, Loh YP. 2015. Rosiglitazone-activated PPARγ induces neurotrophic factor-α1 transcription contributing to neuroprotection. J Neurochem 134:463–470.

      Zhang L, Guerrero-juarez CF, Hata T, Bapat SP, Ramos R, Plikus M V, Gallo RL. 2015. Dermal adipocytes protect against invasive Staphylococcus aureus skin infection. Science 347:67–72.

    1. Author Response:

      Reviewer #2 (Public Review):

      This manuscript by Barton and colleagues explores the roles of the conserved Eco1 transacetylase in modulating cohesin function in meiosis in budding yeast. Numerous studies in mitotically dividing cells have shown that the Eco1 family of transacetylases acetylate the Smc3 subunit of cohesin and that this acetylation renders cohesin on chromosomes resistant to removal by the Wapl (Wpl1 in budding yeast) family of proteins. Cohesins play critical roles in both sister chromatid cohesion and chromatin organization (through the formation of intrachromosomal loops). How cohesins are regulated by Eco1 in meiosis to accommodate meiotic chromosome structures such as the synaptonemal complex, chromatin domains around centromeres, repair of programmed meiotic double strand DNA breaks in prophase, and sequential removal of cohesins - first at arms in meiosis I and centromeres at meiosis II - is largely unexplored. Thus, this manuscript is exploring important new areas.

      The authors show that Eco1 persists thru prophase I (longer than it does in vegetative cell cycles), that it is not necessary for cohesin loading at centromeres but is needed to counteract Wpl1 to protect centromeric cohesion, that it is critical for the establishment of chromatin loops on meiotic chromosome arms and that it is critical for protection of the arm cohesin from removal by Wpl1. The authors also provide evidence that, in meiosis, Wpl1 exhibits underappreciated functions in cohesin loading or cohesion establishment in addition to its recognized role in cohesin removal.

      The experiments demonstrate that Eco1 is necessary for sharp cohesin boundaries that flank the centromeres and suggest this might be a replication-independent function of Eco1 (the boundaries form in clb5, clb6 cells with no DNA replication phase) but it is unclear if the detectable, but diminished, boundaries in clb5, clb6 cells were formed in the replication-free meiosis or persist from the S-phase associated loading and cohesion establishment from the preceding mitotic cycle.

      Entry into meiosis occurs from G1 when there is no cohesin on the chromosomes and boundaries are not present, therefore this would only be a concern if there were persistent mitotic cells in G2 (i.e. after DNA replication). Our flow cytometry shows that the cells used in the experiment were unreplicated, so even if mitotic cells were present, they would not have been through S phase.

      Nevertheless, we addressed this point by analysis of pre-S phase meiotic cells (ime1/ime4 block) and by anchoring away Eco1 in unreplicated cells.

      Immunofluorescence imaging assays are used to observe the behavior of sister chromatids in meiosis I and meiosis II as a function of Eco1 activity. In wild-type cells sister chromatids co-orient in meiosis I and move to the same pole of the spindle. In mammalian cells and fission yeast this co-orientation requires cohesin while studies in budding yeast have suggested the co-orientation is cohesin-independent. Here, the authors show that when Eco1 is depleted, the sisters often move to opposite poles at meiosis I, and suggest that cohesin (and Eco1) is indeed required for sister co-orientation. An alternate possibility is that the sisters have lost their association in meiotic prophase (due to cohesin failures) before attaching to microtubules and segregating randomly - often to opposite poles.

      We agree with this point, but would argue that the “alternative possibility” (which our data support) still leads to the conclusion that cohesin and Eco1 are required for sister co-orientation. A prior study (Monje-Casas et al., 2007) had suggested that monopolin could link sister kinetochores even without cohesin. We now show that this is not the case, which we believe to be an important conclusion.

      Our results indicate that establishment of monoorientation requires the cohesin that is localized at centromeres. WPL1 deletion in eco1-aa rescues centromeric cohesion (Figure 2F, Figure 8E), but not chromosome arm cohesion (Figure 2H) or sister chromatid segregation in meiosis II (Figure 8F), indicating that pericentromeric cohesion must still be defective.

      For clarity, please note that the relevant data are not from immunofluorescence but from live cell imaging (now shown in Figure 8), so these conclusions are based on observation of single chromosomes in individual live cells from prophase I until anaphase II.

      In summary the authors show that Eco1 has distinct roles on chromosome arms and centromeres and probably in both replication-linked and replication-independent events, acts to modulate cohesin location and function in meiosis.

      Reviewer #3 (Public Review):

      This paper investigates the meiotic roles of two regulators of cohesin, the cohesin destabilizer Wpl1 and the cohesin acetyltransferase Eco1. The authors provide evidence that Eco1 antagonizes Wpl1 to allow stabilization of centromeric cohesin, which is important to establish meiotic chromosome segregation patterns. In addition, Eco1 regulates the stable anchoring of cohesin at boundaries to promote defined chromosome loop formation in meiotic prophase.

      The study uses a combination of calibrated ChIP-seq analysis, and chromosome conformation capture techniques to convincingly show that loop formation is altered in wpl1 depletion and eco1 depletion mutants. Well-established cytological techniques are used to demonstrate different effects on chromosome cohesion along arms and at centromeres, and to show that Eco1 is important for establishing the meiotic segregation pattern. The paper is well written and the data largely support the conclusions. As such, this paper is expected to be of substantial interest to the field.

      One notable weakness is the poor definition of the eco1 anchor-away allele (eco1-aa), on which much of the eco1 phenotypic analysis is based. The presented data indicate that addition of the FRB-GFP tag alone causes most of the phenotypes, regardless of nuclear depletion. It is well possible that the tag creates a meiosis-specific loss-of-function allele, although it is surprising that the tag does not have mitotic defects even though Eco1 presumably has the same substrate (the cohesin subunit Smc3) in both situations. Encouragingly, some of the phenotypes could be confirmed using a non-acetylatable smc3 mutant. However, the tag may also create neomorphic effects that may contribute to the Wpl1-independent effects and the apparent stronger defects of the eco1-aa allele compared to the non-acetylatable smc3 mutant.

      Available evidence suggests that eco1-aa is a loss of function allele.

    1. Author Response:

      Reviewer #3 (Public Review):

      In this study Borg et al. explore the mechanism of PcTx1 inhibition of ASIC1a using TEVC fluorometry. They detected a robust change of a fluorescence signal when PcTx1 was added and, based on this finding, propose that the toxin has three different binding modes: 'loose', 'global' and 'ECDonly'. In addition, using concatamers, they conclude that damage to a single PcTx1 binding site out of the three sites present in ASIC1a destabilizes the conformational changes, but that disruption of two or three binding sites is required to prevent PcTx1-mediated inhibition.

      The main weakness of the study is the lack of additional experiments to confirm that the proposed three PcTx1 binding modes are actually happening.

      We thank the reviewer for the constructive feedback on our work. While PcTx1 modulation of ASIC is no doubt complex and other methods might reveal additional binding modes in the future, we believe that our contribution has provided insights that go beyond the knowledge gained from electrophysiology-only functional experiments and from structural efforts. The strength of the VCF method lies in the simultaneous measurement of function and conformation. Here, we have used VCF to uncover three distinct conformational states, of which only one was previously known. We have now included several new experiments, along with changes to the text, that hopefully alleviate some of the concerns regarding the existence of the binding modes. Further, we have included additional text in the discussion to acknowledge that other methods might uncover additional or distinct binding modes in the future.

    1. Author Response:

      Reviewer #1 (Public Review):

      Wang et al. investigated the role of RNA m6A modification in intestinal epithelial cells (IECs) in the context of rotavirus infection. The authors found that mice that specifically lack METTL3 in IECs show resistance to rotavirus infection. They attributed this effect to increased IFN and ISG expression, presumably via IRF7 upregulation. Further genetic ablation of IRF7 in IECs led to sensitivity to rotavirus infection. They also found that ALKBH5 is suppressed by a rotaviral protein, although the knockout of ALKBH5 in IECs did not influence viral infection.

      Overall, although the resistance of IEC-specific METTL3-deficient mice upon rotavirus infection via the control of IRF7 is a novel and interesting finding, the proposed model is not fully supported by the findings here. Especially, the following points need to be addressed:

      We are grateful to the reviewer for the complimentary summary of our research. We also appreciate the valuable experiments suggested by the reviewer to improve our manuscript. We have added additional important controls and mechanistic data to further support our conclusions.

      1) The m6A dot blot used in Figure 1 is not a good measurement system for total m6A modification levels, because the antibody used here also detects another RNA modification, m6Am (PMID: 31676230). Therefore, it is unclear whether the increase in m6A dot blot intensity is due to an increase in m6A in RNAs mediated by METTL3 in IECs. The authors should investigate the m6A levels in IECs, not BMDMs, under METTL3 deficiency. Ideally, this analysis should be done using mass spectrometry.

      We thank the reviewer for raising a critical point. The antibody we used previously (Synaptic Systems, #202003) was reported to detect m6Am as well, so we have used several methods to avoid this potential non-specific detection.

      1. We have included dot blot data for m6A modification in Mettl3^△IEC and WT IECs during RV infection using another m6A antibody (Anti-N6-methyladenosine (m6A), Sigma-Aldrich, Cat. No. ABE572-I) (see below and also Fig. 1d, 1e).

      2. We have included mass spectrometry data for m6A modification in IECs during development (see below and also Fig. 1c) or RV infection (see below and also Fig. s3a).

      These data suggest that m6A modifications in IECs are indeed regulated during development and RV infection. We have included the descriptions in the text.

      Figure 1. Rotavirus infection increases global m6A modifications, and Mettl3 deficiency in intestinal epithelial cells results in increased resistance to rotavirus infection. (c) MS analysis of m6A levels in ileum tissue from mice of different ages (mean ± SEM). Statistical significance was determined by Student’s t-test (*P < 0.05; NS, not significant). (d) WT and Mettl3^△IEC mice were infected with rotavirus EW strain at 8 days post birth. m6A dot blot analysis of total RNA in ileum IECs at 2 dpi. Methylene blue (MB) staining was the loading control. (e) Quantitative analysis of (d) (mean ± SEM). Statistical significance was determined by Student’s t-test (*P < 0.05, ***P < 0.001; NS, not significant). The quantitative m6A signals were normalized to quantitative MB staining signals.

      Figure s3. MS analysis of total m6A level in mouse ileum. (a) WT and Mettl3^△IEC mice were infected with rotavirus EW strain at 8 days post birth. MS analysis of m6A levels in ileum tissue from mice at 2 dpi (mean ± SEM). Statistical significance was determined by Student’s t-test (**P < 0.005).

      2) The authors show that Alkbh5 expression is increased when the mice grow up to 3 weeks old. However, the Alkbh5 protein expression changes are missing.

      We thank the reviewer for raising this point. We have included the protein expression of ALKBH5 in the intestine during development (see below and Fig. s1). ALKBH5 protein levels in the intestine increase with age (Fig. s1a, s1b), which is consistent with the changes in ALKBH5 mRNA levels during development (Fig. 1d).

      Figure s1. ALKBH5 regulates total m6A level in intestine. (a) Immunoblotting with antibodies targeting ALKBH5 and TUBULIN in ileum tissues from mice of different ages. (b) Quantitative analysis of (a) (mean ± SEM). Statistical significance was determined by Student’s t-test (*P < 0.05; NS, not significant).

      3) The authors claim that m6A declined from 2 to 2 weeks post birth is caused by increased Alkbh5 (Line 110). However, it is not clear if the subtle increase in Alkbh5 mRNA leads to the change in global m6A levels. The author can use ALKBH5-deficient mouse cells to confirm this point.

      We thank the reviewer for raising an important point. We have included ALKBH5 over-expression and knockdown data from the mouse IEC cell line MODE-K to test whether the regulation of Alkbh5 mRNA in IECs leads to changes in global m6A levels.

      1. Over-expression of ALKBH5 in MODE-K cells largely reduced the global m6A level (see below and Fig. s1d).

      2. CRISPR-mediated knockdown of ALKBH5 in MODE-K cells augmented the global m6A level, whereas knockdown of another m6A eraser, FTO, in MODE-K cells did not affect the global m6A level (see below and Fig. s10b).

      Figure s1. ALKBH5 regulates total m6A level in intestine. (d) Immunoblotting with antibodies targeting ALKBH5 and TUBULIN in MODE-K cells transfected with pSIN-EV or pSIN-mAlkbh5-3xFlag for 24 h. m6A dot blot analysis of total RNA in the indicated samples. Methylene blue (MB) staining was the loading control.

      Figure s10. Alkbh5 is the dominant m6A eraser in intestine. (b) m6A dot blot analysis of total RNA in different MODE-K cells. Methylene blue (MB) staining was the loading control.

      4) The authors should describe the overall phenotype of IEC-specific METTL3-deficient mice at the steady state. It is important to clarify if the augmented expression of ISG upon METTL3 deficiency is dependent on rotavirus infection. Also, the authors should describe any detectable abnormalities or changes without stimulation.

      In collaboration with another group, we found a defect in intestinal stem cells in IEC-specific METTL3-deficient mice. However, RV normally infects IECs in the villi but not in the crypts, and stem cells are not the major producers of IFN/ISGs (Sue E. Crawford et al., Nature Reviews Disease Primers, 2017), so the defect in intestinal stem cells is less likely to affect the RV infection phenotype. As that work is a separate story that is currently under review, we prefer not to include those data in this manuscript. Moreover, we have crossed Irf7^−/− mice to Mettl3^ΔIEC mice and verified that Irf7-mediated induction of ISGs is critical for the anti-viral phenotype in Mettl3^ΔIEC mice.

      Our bulk RNA-seq data from IECs showed augmented expression of ISGs upon METTL3 deficiency at steady state (Fig. 2a). We also found augmented ISG expression in the intestine of METTL3-deficient mice at steady state or during early RV infection (2 d) by qPCR. However, because the RV loads in METTL3-deficient mice during the late infection stage are significantly lower than in WT mice, inducible ISG expression is consequently lower in the intestine of METTL3-deficient mice than in WT mice at day 4 post infection (Fig. 3f).

      5) The finding that IRF7 is targeted by METTL3 is not convincing. First, the authors performed MeRIP-seq and -qPCR experiments only using RNAs from wild-type IECs, not from METTL3-deficient cells. It is necessary to show that the modification levels on IRF7 mRNA are indeed reduced upon METTL3 deficiency. Second, it is unclear whether MeRIP-seq was properly performed, because no quality-check figure is shown. For instance, the authors can generate metagene plots or gene logos of m6A-modified sites to see if there is any consistency with previous reports. Third, in Figure 2h, the authors should show that the change in luciferase activity between wild-type and mutant Irf7-3'UTR reporters is dependent on METTL3 activity by performing METTL3 knockdown or knockout. Also, the authors should describe how they mutagenized the sequences for clarification. Fourth, in Figures 2F and 3C, they showed that IRF7 is upregulated in METTL3-deficient IECs, while in Figure 3F, IRF7 is conversely downregulated in METTL3-deficient IECs. These results apparently contradict each other.

      We appreciate the valuable suggestion provided by the reviewer to improve our manuscript.

      1. We have performed RIP-qPCR in Mettl3-knockdown and WT MODE-K cells to verify the m6A modification on IRF7 mRNA; the modification levels on IRF7 mRNA are indeed reduced upon METTL3 deficiency (see below and Fig. s5c, s5d). We have added the description of the experiment to the manuscript.

      Figure s5. Characterization of m6A modifications on Irf7 mRNA. (c) m6A-RIP-qPCR confirms Irf7 as an m6A-modified gene in IECs. Fragmented RNA of sgEV and sgMettl3 MODE-K cells was incubated with an anti-m6A antibody (Sigma-Aldrich ABE572-I). The eluted RNA and input were processed as described in the ‘RT-qPCR’ section, and the data were normalized to the input samples (n=3, mean ± SEM). Statistical significance was determined by Student’s t-test (*P < 0.05, **P < 0.005; NS, not significant). Tlr3 and Rps14 were measured with m6A site-specific qPCR primers as positive and negative controls, respectively; Irf7 was measured with qPCR primers specific to the predicted m6A sites. (d) Knockdown efficiency of METTL3 in MODE-K cells.

      2. We have generated metagene plots as suggested. As shown in Fig. s5b, the m6A peaks are enriched near the stop codon and in the 3’UTR region, which is consistent with previous studies (Xuan et al., 2018; Dominissini et al., 2012; Yang et al., 2019). We have added the description to the manuscript.

      Figure s5. Characterization of m6A modifications on Irf7 mRNA. (b) Metagene plots of m6A-modified sites.

      3. We have performed the luciferase assay in WT and METTL3-knockdown HEK293T cells and found that the increased luciferase activity of the mutant Irf7-3'UTR reporter is dependent on METTL3 activity (see below and Fig. 2h, s5e). We have added the description of the experiment to the manuscript.

      Figure 2. Mettl3 deficiency in intestinal epithelial cells results in decreased m6A deposition on Irf7, and increased interferon responses. (h) Relative luciferase activity of sgEV and sgMettl3 HEK293T cells transfected with pmirGLO-Irf7-3’UTR (Irf7-WT) or pmirGLO-Irf7-3’UTR containing mutated m6A modification sites (Irf7-MUT). The firefly luciferase activity was normalized to Renilla luciferase activity (n=3, mean ± SEM). Statistical significance was determined by Student’s t-tests between genotypes (*P < 0.05; NS, not significant).

      Figure s5. Characterization of m6A modifications on Irf7 mRNA. (e) Knockdown efficiency of METTL3 in the HEK293T cells used for the luciferase assay.

      4. IRF7 is an ISG. The expression of IRF7 is controlled both by PAMP-induced transcription (PAMPs such as viral components) and by post-transcriptional regulation such as m6A modification-mediated mRNA decay. At steady state there is no virus, and at the early stage (2 d) of rotavirus infection the viral loads are comparable in Mettl3^△IEC and WT mice; thus, IRF7 expression is mainly regulated by m6A and is higher in IECs from Mettl3^△IEC mice than in those from WT mice. However, because the RV loads in Mettl3^△IEC mice during the late infection stage are significantly lower than in WT mice, IRF7 expression at that stage is mainly regulated by viral PAMPs, and inducible IRF7 expression is consequently lower in the intestine of Mettl3^△IEC mice than in WT mice at day 4 post infection (Fig. 3f).

      6) It is unclear if the augmented expression of IRF7 per se upregulates IFN and ISG expression. Since IRF7 exerts its transcriptional activity upon phosphorylation, the authors should examine IRF7 phosphorylation and total protein levels in METTL3-deficient IECs. Also, it is interesting to see if the phosphorylation of TBK1 is augmented or not.

      We have provided the phosphorylation and total protein levels of IRF7 and TBK1 in MODE-K cells treated with poly I:C. Both total IRF7 and phosphorylated IRF7 are upregulated in Mettl3-knockdown cells compared with control cells (see below and Fig. s5f). However, both total TBK1 and phosphorylated TBK1 remain unchanged (Fig. s5f), suggesting that the augmented ISGs are less likely to be due to activation of the upstream IFN signal.

      Figure s5. Characterization of m6A modifications on Irf7 mRNA. (f) Western blot analysis of sgEV and sgMettl3 MODE-K cells transfected with 2 µg/ml poly I:C using Lipofectamine 3000, at the indicated hours post transfection. At least three replicate experiments were performed.

      7) In Figure 3, the authors utilized METTL3 and IRF7 deficient mice to show the contribution of METTL3-mediated IRF7 regulation in rotavirus infection. However, if IRF7 is totally abrogated, IFN production should be greatly impaired as shown in Figure 3A. Thus, it is not surprising to see that the IFN response is diminished. The authors can use heterozygous IRF7 deficient mice instead to check if upregulation of IRF7 under METTL3 deficiency is critical to control rotavirus infection.

      We thank the reviewer for pointing out an important issue. However, we checked the IRF7 expression levels in IECs from Irf7^+/+, Irf7^+/- and Irf7^-/- mice and found no difference between the IRF7 levels in IECs from Irf7^+/- mice and those in IECs from Irf7^+/+ mice. Thus, it is not feasible to use heterozygous IRF7-deficient mice to test the idea (Supporting Figure 1).

      Supporting Figure 1. WT and Irf7 heterozygous mice show the same IRF7 expression level in IECs. (a) IECs from 2-week-old Irf7^+/+, Irf7^+/-, and Irf7^-/- mice were isolated. Western blot analysis shows the IRF7 expression level in the different mice. (b) Quantitative analysis of (a) (mean ± SEM). Statistical significance was determined by Student’s t-test (***P < 0.001; NS, not significant).

      8) Given no effect of ALKBH5 knockout on rotavirus infection as shown in Figure 4, it is questionable if ALKBH5 has a profound role in the regulation of m6A in IECs. The authors should determine if m6A modification levels are increased in IECs under ALKBH5 deficiency.

      We performed the m6A dot blot assay to detect m6A modification levels in ALKBH5-knockdown MODE-K cells, and we did find an increase in m6A modification levels under ALKBH5 deficiency (see above and Fig. s10). The lack of effect of ALKBH5 knockout on rotavirus infection (Fig. 4c, 4d and 4e) initially puzzled us as well, until we found that RV infection down-regulates ALKBH5 expression in the intestine of WT mice (Fig. 4a).

    1. Author Response:

      Reviewer #1 (Public Review):

      In this paper, the authors describe a MRI-based functional connectivity mode for the striatum and attempt to show that it is related to dopaminergic input from the midbrain. Currently, dopaminergic input can only be assessed in humans with radionuclide imaging modalities (PET and SPECT), which have poor spatial resolution, relatively long acquisition times, and require radioactive tracers. The MRI-based method would provide higher resolution and greater accessibility, and moreover, can be applied retrospectively to data that has already been collected. The authors use multiple lines of study to build the case: comparison to DaT SPECT, which shows the distribution of dopamine transporters; alteration in Parkinson's Disease, where dopaminergic input is known to be reduced; and relation to alcohol and tobacco use in healthy volunteers, where dopamine signalling in the brain's reward processing pathway is altered. The combination of clinical, behavioral and imaging experiments to validate the MRI biomarker of dopamine input is the major strength of this study. Not only is the biomarker altered as expected in each case, but the alterations also exhibit regional specificity that is consistent with prior reports often obtained with invasive measurements. A direct validation of the biomarker would require invasive histology that is clearly impossible in healthy humans, but while any single finding from one modality would be less convincing, taken together, they provide sufficient circumstantial evidence to motivate further use and investigation of the biomarker. The authors use quantitative techniques to characterize the change in the functional connectivity mode and find truly impressive correspondence with the SPECT measurements of DaT at the group level. As expected, the correspondence is weaker at the individual level, but still respectable. 

      The authors show substantial individual data throughout the manuscript in addition to the group data, which increases confidence in both their results and the potential utility of the biomarker in the clinic. For example, the relationship between symptom severity after L-DOPA and changes in the biomarker at the individual level is very encouraging. The least convincing aspect of the manuscript was the relationship between the connectivity mode and the amount of tobacco use (Fig 6, top) where the line fit looks as if it may have been driven by two very high use points. Given the strength of the other findings, even if the relationship with tobacco use does not completely hold up, it detracts very little from the overall study. The lack of a difference in the biomarker between the left-dominant Parkinson's group and the control group is also a bit surprising. Given the discussion about flooring effects, it may be a power effect, but it definitely warrants more investigation in the future.

      We thank the reviewer for providing feedback to improve this manuscript and for acknowledging the many strengths and importance of our work. To make sure that the relationship between the connectivity mode and the amount of tobacco use (Figure 6) was not driven by outliers, we first determined whether the two high-use points of 175 and 195 cigarettes a week, with TSM (linear X) values of 1.395 and 1.440 respectively, are outliers. The median and interquartile range (IQR) of this distribution are 1.272 and 0.122, respectively. Accordingly, both high-value points fall just outside the Q1-Q3 IQR of 1.150 – 1.394, but the first data point (1.395) is still within 2 standard deviations of the mean (1.268 + 2 × 0.0803 = 1.429) and the second data point (1.440) is still within 3 standard deviations of the mean (1.509). As such, we do not consider these data points extreme outliers that need to be removed from our analysis. We nevertheless repeated the GLM analyses testing for associations between the amount of tobacco use and the second-order connectivity mode without these two subjects, and the association was still significant (χ² = 46.14, p = 0.004). It is also important to keep in mind that our sample is population-based. While the corresponding usage of cigarettes (175 and 195 cigarettes a week, corresponding to 25 and 28 cigarettes a day) is at the high end in this particular population-based sample, this amount of use is not uncommon in regular smokers. As for the lack of findings with respect to left-dominant PD patients, we agree that here we may be suffering from a lack of power. Nevertheless, we feel that this is worth reporting for the sake of completeness, if only to indicate that, as a straightforward hypothesis, it did indeed get tested.
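
      The outlier reasoning above can be retraced with a few lines of arithmetic. The sketch below is illustrative only: it uses just the summary figures quoted in this response (mean 1.268, SD 0.0803, and the two TSM values), not the underlying data, and the variable and function names are our own.

```python
# Summary values quoted in the response; nothing here is re-derived from raw data.
mean_tsm = 1.268                      # mean of the TSM (linear X) coefficient
sd_tsm = 0.0803                       # standard deviation of the same coefficient
high_use = {175: 1.395, 195: 1.440}   # cigarettes per week -> TSM (linear X) value


def z_score(value, mean, sd):
    """Distance of a value from the mean, in units of standard deviation."""
    return (value - mean) / sd


z_scores = {cigs: z_score(v, mean_tsm, sd_tsm) for cigs, v in high_use.items()}
upper_2sd = mean_tsm + 2 * sd_tsm     # threshold used for the first data point
upper_3sd = mean_tsm + 3 * sd_tsm     # threshold used for the second data point
per_day = {cigs: cigs / 7 for cigs in high_use}

for cigs, z in z_scores.items():
    print(f"{cigs} cigarettes/week ({per_day[cigs]:.1f}/day): z = {z:.2f}")
print(f"mean + 2 SD = {upper_2sd:.3f}, mean + 3 SD = {upper_3sd:.3f}")
```

      Under these figures the first point lies between 1 and 2 SD above the mean and the second between 2 and 3 SD, which matches the thresholds stated in the response.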

      Reviewer #2 (Public Review):

      This is an excellent paper with an outstanding methodology and sequence of steps, and it has many strengths:

      • First, they apply a novel fMRI resting state functional connectivity method, connectopic mapping (CM). This is validated in a large standard data set, the HCP fMRI, in around 800 healthy subjects.
      • Secondly, they use the measurement of a striatal DA transporter, DaT SPECT, in a large number of subjects (around 200) to establish spatial correlation with fMRI connectopic mapping.
      • Thirdly, they measure subjects in whom striatal function is known to be altered: Parkinson's disease (PD) with L-Dopa therapy. This serves to test the direct impact of dopamine deficiency (D2-receptors) and dopamine replacement therapy (L-Dopa) on striatal connectopic mapping.
      • Fourthly, they further support that by scanning people with daily alcohol or nicotine consumption whose degree of substance use correlates with the striatal connectopic mapping.

      We thank the reviewer for providing feedback to improve our manuscript and for acknowledging the excellence of our methodology and the strengths of our work.

      Some weaknesses should be mentioned.

      • I was wondering how their striatal DA connectopic marker stands in relation to others like melatonin-sensitive MRI (Cassidy et al., PNAS, and others). This should at least be discussed. Ideally, they would do melatonin scanning in their sample and correlate it with their striatal connectopic marker. This would provide the opportunity to more directly validate their marker.
      • Another issue is the biochemical specificity. The striatum also contains many glutamatergic (medium spiny) and GABAergic neurons, which are key in mediating DA effects as the latter (as far as I know) terminate on the former. Moreover, it is known that rsFC is related to excitation-inhibition balance and thus to glutamate-GABA. How can the authors make sure that their cortical connectopic maps are really related specifically to DA rather than glutamate and/or GABA? This is even more urgent given that we know glutamate changes to be prevalent in alcohol and/or smoking and also in PD.
      • It would be good if this issue of specificity could be addressed. Like in people who receive ketamine (anti-NMDA): if the authors' connectopic marker is specific for striatal DA, it should not be changed under NMDA treatment.
      • Another way is to conduct computational modelling: modulation of glutamate/GABA should ideally not affect the striatal cortical connectopic marker....
      • Some key literature should be cited and discussed: Conio et al. 2020 establishes a model of DA projects and their implications for psychiatric disorders
      • Yet another issue is the question of serotonin. Various papers by Martino/Magioncalda, especially in bipolar disorder, recently established modulation of nigral D2 by raphe-based serotonin. This should be discussed at least: Could the connectopic marker be related to such modulation? How could they make sure that their marker is related exclusively to cortical D2 projections rather than cortical serotonin effects? I am aware that these are tough questions but they should at least be addressed in the discussion...
      • Moreover, the striatum is a complex region with subdivisions like dorsal and ventral which again can be featured by different dopamine systems (D2 vs D1/5) - this should be probed in their data to enhance specificity for nigral-based D2 of their connectopic marker....

      The above points nearly all relate to the specificity of the second-order connectivity mode to dopaminergic projections. We therefore refer the reviewer to our response to comment 1 of the essential revisions. There we conducted additional analyses demonstrating that the mapping of the second-order connectivity mode onto the DaT SPECT scan is far superior to its mapping onto the PET tracers available for other neurotransmitter systems, such as the serotonin and GABA systems. In addition, our sensitivity analysis does not suggest a strong differentiation of the second connectivity mode relative to D1 or D2 receptor distribution, but instead segregates either of these from the distribution of the DaT. Unfortunately, we do not have access to melatonin-sensitive MRI data or high-resolution fMRI data of patients. While the reviewer has many excellent suggestions, these therefore need to remain the subject of future studies.

      Reviewer #3 (Public Review):

      The study provides an impressive breadth of analyses, including comparisons to SPECT imaging, Parkinson's patients, drug manipulation and behavior, which build to form a compelling case that the identified patterns of functional connectivity. The surface modeling approach employed provides an interesting alternative to more standard parcellation approaches, which highlights the possibility that organization with the striatum occurs along gradients, rather than within functionally or anatomically circumscribed regions. Importantly, the findings have potentially wide-ranging implications and applications, since striatal dopamine (DA) and cortico-striatal connectivity are of great interest across a wide variety of fields, including their variation across the lifespan, disruption in various clinical populations, and contribution to normative behaviors.

      We thank the reviewer for providing feedback to improve our manuscript and for commenting on the breadth of analyses and potential wide-ranging implications of our work.

      While the surface modeling approach has some appealing features, it is a rather complex approach that is hard to understand intuitively. The difficulty of grasping its nuances limited my ability to follow some of the interpretations provided. For example, an important aspect of the results is that only the second-order mode of the functional connectivity profile (and not the 0th- or 1st-order modes) is associated with dopamine measures and manipulations, but I found it difficult to assess what these different modes are capturing. Are these overlapping modes of distinct aspects of connectivity (each of which is expressed to a different extent), or different characterizations of the same pattern? Do the modes represent the extent to which different striatal regions exhibit the same pattern of cortical connectivity, or is the connectivity pattern also shifting? Some additional clarity on these patterns would have greatly helped me understand the subsequent results. Similarly, in the results of PD patients, it is stated "we can interpret the observed alteration in the connection topography as a decrease in dopaminergic projections to striatum." (l. 242). A decrease in the quadratic term of the TSM would seem to indicate less spatial variability, but not obviously an overall decrease, which would seem instead to be reflected by the 0th-order term (if I understand these modes correctly). Some clarification on this interpretation, and more description of the modes in general, would be helpful.

      We acknowledge that our connectopic mapping method and the subsequently applied trend surface modeling (TSM) approach might not be as intuitive and easy to understand as more traditional functional connectivity approaches. This is largely due to classical approaches neglecting the presence of functional multiplicity, i.e., the fact that within brain regions neural computations can contribute to multiple cognitive processes. In short, connectopic mapping yields a set of overlapping, but independent connection topographies or “connectivity modes” that together describe the functional organization of a brain region. In Haak et al. 2016, we demonstrated, for example, that in V1 we can detect separate gradients that reflect sensitivity to orientation and eccentricity, cortical organisations that can also be probed experimentally using retinotopic mapping procedures. Likewise, when applying connectopic mapping to the striatum, the obtained connection topographies indicate how the connectivity profile with the rest of the brain changes across the striatum. Voxels that have similar colours in these connectivity modes have similar connectivity patterns with the rest of the brain. Which aspects of functional connectivity these modes capture depends on the region of interest investigated and is difficult to predict beforehand, especially for the higher-order connectivity modes. Regarding the striatum, we showed in previous work (Marquand et al., 2017) that the dominant (zeroth-order) mode represents its basic anatomical subdivisions, while the first-order mode maps onto a ventromedial-to-dorsolateral gradient associated with goal-directed behaviour in cortex that has been described previously on the basis of tract-tracing work in non-human primates. In this manuscript we subsequently provide evidence that the second-order striatal connectivity mode maps onto dopaminergic projections.

      We have now clarified our approach in the legend of Figure 1: “Then similarity between voxels is computed using the η2 coefficient, resulting in matrix S. Manifold learning using Laplacian eigenmaps is then applied to this matrix, yielding a set of overlapping, but independent connection topographies or “connectivity modes” that together describe the functional organization of the striatum. These connection topographies indicate how the connectivity profile with the rest of the brain changes across striatum. Voxels that have similar colours in these connectivity modes have similar connectivity patterns with the rest of the brain.”
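
      To make the two steps named in this legend more concrete (η² similarity between voxel connectivity fingerprints, followed by Laplacian eigenmaps), here is a minimal sketch. It is not the pipeline used in the manuscript: the function names, the synthetic fingerprints, the fully connected similarity graph, and the use of SciPy's generalized eigensolver are all assumptions made for illustration, and preprocessing and dimensionality-reduction steps are omitted.

```python
import numpy as np
from scipy.linalg import eigh


def eta2_similarity(fingerprints):
    """Pairwise eta-squared similarity between rows (one fingerprint per voxel)."""
    n = fingerprints.shape[0]
    S = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            a, b = fingerprints[i], fingerprints[j]
            m = (a + b) / 2.0            # point-wise mean of the two fingerprints
            grand = m.mean()             # grand mean across both fingerprints
            ss_within = np.sum((a - m) ** 2 + (b - m) ** 2)
            ss_total = np.sum((a - grand) ** 2 + (b - grand) ** 2)
            S[i, j] = S[j, i] = 1.0 - ss_within / ss_total
    return S


def laplacian_eigenmaps(S, n_modes=2):
    """Return the first n_modes non-trivial eigenvectors of L v = lambda D v."""
    degree = S.sum(axis=1)
    D = np.diag(degree)
    L = D - S                            # unnormalized graph Laplacian
    _, eigvecs = eigh(L, D)              # eigenvalues come back in ascending order
    return eigvecs[:, 1:n_modes + 1]     # skip the constant eigenvector (lambda = 0)


# Synthetic example (assumption): 40 "voxels" whose fingerprints blend smoothly
# from pattern p to pattern q along a hypothetical spatial position.
rng = np.random.default_rng(0)
position = np.linspace(0, 1, 40)
p, q = rng.normal(size=(2, 100))
fingerprints = np.outer(1 - position, p) + np.outer(position, q)
fingerprints += 0.05 * rng.normal(size=fingerprints.shape)

S = eta2_similarity(fingerprints)
modes = laplacian_eigenmaps(S, n_modes=2)
print(modes.shape)
```

      Each column of `modes` assigns one value per voxel, so voxels with similar values have similar fingerprints; this is the sense in which similarly coloured voxels share connectivity patterns.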

      Further, we have also clarified the trend surface modeling (TSM) approach in the Materials and Methods section:

      “Finally, to enable statistical analysis over these connection topographies, we fitted spatial statistical models to obtain a small number of coefficients summarizing the second-order connectivity mode of each striatal subregion in the X, Y, and Z axes of MNI152 coordinate space. For this, we use ‘trend surface modelling’ (TSM; 27), an approach originally developed in the field of geostatistics, but that has wide ranging applications due to its ability to model the overall distribution of properties throughout space as a simplified surface. Here we use the TSM approach to predict each individual subject’s connection topography by fitting a set of polynomial basis functions defined by the coordinates of each striatal location. …. This criterion strongly favoured a polynomial of degree 2 for the putamen subregion and a polynomial of degree 4 for the caudate-NAcc subregion. This means that the connectivity mode in putamen was modelled with linear and quadratic functions in the X, Y, and Z direction of MNI152 coordinate space (6 TSM coefficients) and the connectivity mode in the caudate-NAcc region with linear, quadratic, cubic and quartic functions in the X, Y and Z direction of MNI152 coordinate space (12 TSM coefficients). The TSM coefficients of the fitted polynomial basis functions describe the rate at which the connectivity modes changes along a given spatial dimension and can be used for statistical analysis.”
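
      The coefficient counts quoted above (6 for a degree-2 fit, 12 for a degree-4 fit) follow from using one polynomial term per axis per degree, without cross terms. The sketch below illustrates that structure with an ordinary least-squares fit. It is a simplified stand-in rather than the estimation procedure used in the manuscript: the function names, the intercept, the coordinate standardization, and the synthetic data are assumptions, and the model-order selection step is not reproduced.

```python
import numpy as np


def tsm_basis(coords, degree):
    """Polynomial basis without cross terms: x, y, z, x^2, y^2, z^2, ... up to degree."""
    coords = np.asarray(coords, dtype=float)
    return np.hstack([coords ** d for d in range(1, degree + 1)])


def fit_tsm(coords, mode_values, degree):
    """Least-squares fit of a connectivity mode; returns 3 * degree TSM coefficients."""
    coords = np.asarray(coords, dtype=float)
    # Assumption: standardize coordinates so that higher powers stay well conditioned.
    z = (coords - coords.mean(axis=0)) / coords.std(axis=0)
    design = np.column_stack([np.ones(len(z)), tsm_basis(z, degree)])
    beta, *_ = np.linalg.lstsq(design, mode_values, rcond=None)
    return beta[1:]                      # drop the intercept (assumed, not a TSM term)


# Synthetic example (assumption): 500 voxel coordinates and a noise-free
# quadratic surface with known coefficients.
rng = np.random.default_rng(1)
coords = rng.uniform(-30, 30, size=(500, 3))
z = (coords - coords.mean(axis=0)) / coords.std(axis=0)
true_coefs = np.array([0.8, -0.2, 0.1, 0.5, 0.0, -0.3])   # x, y, z, x^2, y^2, z^2
mode_values = 1.0 + tsm_basis(z, 2) @ true_coefs

putamen_like = fit_tsm(coords, mode_values, degree=2)      # 6 coefficients
caudate_like = fit_tsm(coords, mode_values, degree=4)      # 12 coefficients
print(len(putamen_like), len(caudate_like))
```

      The fitted coefficients, one vector per subject and subregion, are the quantities that would then enter group-level statistics.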

      Regarding the following statement: "we can interpret the observed alteration in the connection topography as a decrease in dopaminergic projections to striatum.", we would like to point out that we first used a GLM omnibus test of all TSM coefficients modelling the second-order connectivity mode to investigate whether an association with UPDRS symptom severity was present. Post-hoc Pearson correlations then revealed that this association was driven by the quadratic TSM coefficients modelling the putamen region in the Y and Z direction of MNI space. Next, in Figure 4B, we plotted the association with UPDRS symptom severity for the quadratic Y coefficient and, for visualization and interpretation purposes, showed five of the connectivity modes at varying UPDRS symptom severity. The interpretation above is based on visual inspection of these five connectivity modes shown in Figure 4B (in light of the similarity between the second-order connectivity mode and the DaT SPECT scan shown in Figure 2). We hope that this answer sufficiently clarifies our interpretation.

      Several common confounds for rsFC analyses, especially head motion, are not sufficiently well addressed as to ensure that they do not contribute to the spatial patterns reported. Specifically, the second-order fit would seem to capture some sense of the "sharpness" of the spatial connectivity profile in the striatum. This seems like it could be driven either by neurophysiological features regarding the functional segregation of these regions, or data quality features regarding the smoothness of the data. Since one effect of head motion (in both resting state fMRI and other domains such as PET/SPECT) can be to change the spatial smoothness of data, it would be important to characterize how much of the variance in this measure can be accounted for by head motion (or other confounds). This is especially true since such confounds are known to be greater in, e.g., patient populations, which could affect the analyses performed later.

      We agree that head motion can indeed have a profound impact on resting-state functional connectivity analyses. We have now added several post-hoc sensitivity analyses to the supplementary materials strongly demonstrating that our findings are unlikely to be confounded by head motion. For a more extensive description, we refer to our detailed response to comment 2 in the essential revisions section of this document.

      Finally, the findings are at various points referred to as a potential biomarker for dopamine (dys)function. While this term has been used in a wide range of contexts, such claims generally require a greater burden of proof than the presence of statistically significant associations, e.g., including classification and/or sensitivity/specificity analyses. These assertions do not yet seem well supported by the included statistics, and may need clarification.

      Indeed, to ultimately prove that our connectivity mode can be used as a biomarker for dopamine (dys)function would require invasive histology, which is impossible in healthy humans and in the context of this study. As such, we cautiously refer to our connectivity mode as a ‘potential’ biomarker for dopaminergic (dys)function and also state in the discussion that more research and out-of-sample replication are needed. We believe that, while each of our findings in isolation would be insufficient to claim that the second-order striatal connectivity mode could be used as a potential biomarker, all our findings together provide sufficient circumstantial evidence to motivate the further use and investigation of this connectivity mode as a biomarker. In particular, the direct within-subject mapping of the connectivity mode onto the DaT SPECT scan (acknowledged as being a biomarker) and the finding that our connectivity mode is sensitive to acute dopaminergic modulation suggest specificity to dopaminergic function. Furthermore, we also conducted an additional analysis (see Figure 2–figure supplement 1) comparing the spatial mapping of DaT SPECT to the second-order striatal connectivity mode with that of various other PET-derived neurotransmitter systems. This analysis revealed that the TSM coefficients describing the DaT SPECT scan provide a much better fit to the data than TSM coefficients describing any other PET-derived neurotransmitter system.

    1. Author Response:

      Evaluation Summary:

      This study examines genetic and non-genetic factors influencing immune responses in type 1 diabetes. Key findings are: 1) age and season affect immune cell traits and cytokine production upon stimulation; 2) certain genetic variants that determine susceptibility to T1D significantly affect T cell composition, notably the CCR region that is associated with CCR5+ regulatory T cells; and 3) 15 genetic loci that influence immune responses in T1D, most of which have not been seen previously in healthy populations. The results suggest mechanisms of T1D-specific genetic regulation.

      We thank the reviewer for the appreciation of our data quality and approach. We have tried to bring more focus in the conclusions, partly by taking out the whole non-genetic section.

      Reviewer #1 (Public Review):

      Strengths of the manuscript include the important research question addressed, the robust functional genomics methodology used, the relatively large sample size, and the translational implications of the study findings that pinpointed new potential drug targets in autoimmune diabetes. Weaknesses include the analysis of immune responses at a certain time point that may not represent the dynamic immune phenotype of the disease over time, the testing of immune responses in peripheral blood mononuclear cells (PBMC's) that may not represent the islet infiltrating immune cells that cause autoimmune diabetes, using generic stimulants to activate PBMC's instead of beta-cell autoantigens, and that the QTL analysis may not be relevant to the etiology of autoimmune diabetes as it identified QTLs associated with immune cell proportion and cytokine production, but these do not necessarily influence the development of autoimmune diabetes.

      We thank the reviewer for the fair assessment of our manuscript. We fully agree with the reviewer that studying the relevant tissue at different time points could be very important for understanding type 1 diabetes. However, tissue immunity could be partly reflected by changes in the circulating levels of immune cells and in cytokine production capacity, since the islet-infiltrating immune cells do originate from circulating blood cells. We have modified the manuscript by adding more discussion of this topic.

      “The data presented in our study are generated from PBMC. While these likely reflect overall immune function, some immune cell types may not be captured and, overall, the findings refer to changes in circulating factors that may not necessarily reflect changes occurring in relevant immune organs, such as pancreatic islets, gut or lymph nodes. Still, islet infiltrating immune cells do originate from circulating blood cells, and circulating chemokines/cytokines are important in activating and recruiting immune cells. Hence, the circulating level of immune cells and cytokine production capacity is probably relevant for local tissue immunity.”

      Reviewer #2 (Public Review):

      This manuscript presents data collected from two cohorts of individuals, one including patients with type 1 diabetes, the other encompassing non-diabetic persons. Of note, the cohorts are not contemporary and samples from the two groups were collected several years apart (2013/14 for controls, 2016/17 for the diabetic group). This is not an issue for any genetic comparisons. However, comparing immune phenotypes in non-contemporary cohorts, particularly with respect to seasonal variations as the authors attempt in some of their analyses, is not useful as it lacks the rigor of collecting samples under identical conditions.

      We thank the reviewer for raising this point. To focus on the genetics, as suggested by the reviewers, we have removed the non-genetic associations, including seasonal effects and age, from the paper entirely and have rewritten the paper accordingly. As a result, one figure was removed and the others were renumbered.

      This caveat aside, the overall aim of the study was to compare the function of immune cells, with a focus on the distribution of various cell populations and their cytokine secretion, between individuals with and without type 1 diabetes. Many of the analyses are difficult to interpret because the authors use measures and correlations for which the rationale is not well explained and whose presentation in the rather busy figures lacks detailed descriptions. There is no doubt that the authors amassed a substantial amount of data in what appears to be an ambitious study of hundreds of blood samples. However, the authors do not do their data justice by failing to present it in an easily comprehensible and interpretable manner. Much of the description of the results makes the assumption that readers are familiar with the very particular way the authors analyzed the data (e.g. referring to parental and grandparental percentages, where it is entirely unclear what the authors are referring to).

      We thank the reviewer for the suggestions. We have modified the figure legends by adding more information to explain the results and help readers understand them better. We have included a supplementary table (Table S6) describing how parental and grandparental percentages were defined. The immune cell gating method can be found in our previous study (Aguirre-Gamboa et al., 2016) and in Figure 1 - Figure supplement 2.

      Many of the observations presented are trivial and could have been omitted from the manuscript, for example showing that the immune system acquires more memory lymphocytes as people age, with no apparent difference between the groups studied. The fact that our immune system gets more experienced as we age is both unsurprising and a well known phenomenon. Similarly, the correlations between immune cells and cytokine secretion compared between groups yield no discernable differences and this could have been summarized much more succinctly in the interest of clarity. The more interesting data relating to gene variations that appear to impact immune phenotypes could have been given more weight in the overall manuscript to better describe them and discuss possible implications.

      In sum, this is a manuscript with a very large data set whose presentation lacks focus on the key points that would emphasize the novelty of the findings put forward by the authors. As such, it is not very accessible to a general readership.

      We thank the reviewer for the comments and suggestions. We agree with the reviewer that the genetic regulation is probably the most interesting and novel, hence we have modified the manuscript to focus more on genetics.

    1. Author Response:

      Reviewer #1 (Public Review):

      The paper clearly indicates that by using parallel fMRI and ECoG experiments, the authors are able to detail the hierarchy of predictive coding in the cortical and higher subcortical areas of the auditory pathway. The methodology is well detailed and I didn't spot any major concerns.

      The scientific methodology detailed in this paper appears to be sound. Further, the main conclusions appear to be well argued.

      We thank the reviewer for the positive comments.

      The statistical analysis, however, is not reported clearly in the main text. For instance, I'm unsure how multiple comparison correction was addressed. A more detailed primer on the statistical methods used in the results section is warranted.

      In the fMRI analysis, to assess the novelty responses as a function of the different number of sequence types, we performed a within-subject one-way ANOVA design in SPM, where the single-session contrast images corresponding to trial types were introduced as the within-subject factor. To directly compare the responses to different novelties, we defined each type of contrast using a pair-wise t-test. We initially inspected the results with a threshold of uncorrected p < 0.001 at the voxel level, and then considered the results significant at p < 0.05 with false discovery rate (FDR) correction for multiple comparisons across the brain (Cacciaglia et al., 2019; Uhrig et al., 2014). If no voxel survived FDR correction, then a threshold of uncorrected p < 0.001 was used. In the ECoG analysis, we performed an independent-samples t-test for each comparison in TFRs. The multiple comparison problem originates from the fact that ECoG data are multidimensional. For ECoG data from a single electrode, the signals are sampled at multiple frequencies and multiple time points. Therefore, we used a nonparametric cluster-based permutation test for multiple comparisons over frequency and time (Chao et al., 2018; El Karoui et al., 2015). To report the statistical analysis more clearly, we have added the details about the statistical methods of the fMRI and ECoG analyses, and about multiple comparison correction, to the Results in the main text. Please see the section on 1st-level (local) novelty (xY sequences).
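      For readers unfamiliar with the cluster-based permutation approach mentioned above, the following is a minimal sketch for a single electrode, assuming two conditions stored as trials × frequencies × time arrays. It is a simplification and not the authors' pipeline: clusters are formed on |t| (positive and negative effects are not separated), cluster mass is the summed |t|, and all names and the simulated data are hypothetical.

```python
import numpy as np
from scipy import ndimage, stats

def max_cluster_mass(a, b, t_crit):
    """Largest summed |t| over contiguous supra-threshold (freq, time) bins."""
    t, _ = stats.ttest_ind(a, b, axis=0)          # independent-samples t-test
    labels, n_clusters = ndimage.label(np.abs(t) > t_crit)
    if n_clusters == 0:
        return 0.0
    # Sum |t| within each labelled cluster; index 0 is the background.
    masses = np.bincount(labels.ravel(), weights=np.abs(t).ravel())[1:]
    return float(masses.max())

def cluster_permutation_p(a, b, n_perm=200, alpha=0.05, seed=0):
    """Permutation p-value for the largest observed cluster."""
    rng = np.random.default_rng(seed)
    df = a.shape[0] + b.shape[0] - 2
    t_crit = stats.t.ppf(1 - alpha / 2, df)       # cluster-forming threshold
    observed = max_cluster_mass(a, b, t_crit)
    pooled = np.concatenate([a, b], axis=0)
    n_a = a.shape[0]
    null = np.empty(n_perm)
    for i in range(n_perm):
        idx = rng.permutation(pooled.shape[0])    # shuffle condition labels
        null[i] = max_cluster_mass(pooled[idx[:n_a]], pooled[idx[n_a:]], t_crit)
    p = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p

# Simulated time-frequency data: 20 trials, 10 frequencies, 30 time points.
rng = np.random.default_rng(1)
cond_a = rng.normal(size=(20, 10, 30))
cond_b = rng.normal(size=(20, 10, 30))
cond_a[:, 3:7, 10:20] += 1.5                      # injected effect
observed_mass, p_value = cluster_permutation_p(cond_a, cond_b)
```

      Because only the maximum cluster mass of each permutation enters the null distribution, a single test controls the family-wise error rate over all frequency-time bins.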

      My largest concerns are to do with communication, and language overreach. At one point the term "lower auditory pathways" is used, but the lowest portion investigated in this study is the IC, and this usage was in reference to the thalamus. There's a lot of brain between the IC and cochlea, to say nothing of the thalamus. There are also concerns about both the temporal and spatial resolution of fMRI and ECoG - the text at times implies that the resolution for these techniques is far greater than it is. However, these are communication issues that should be easily addressed.

      We are grateful for the reviewer’s suggestions. The temporal and spatial terms we used in the previous version were based on our paradigm and recording methods. With 9.4 T fMRI, the lowest area of the auditory pathway that we can assess is the midbrain. Thus, we limited the observed range of the auditory pathway to the span from midbrain to frontal cortex (as shown in Figure 1C). In the design of our paradigm, the local auditory information focuses on the millisecond timescale, and the global auditory information refers to the second timescale. Therefore, we defined the temporal range as spanning milliseconds to seconds. However, we have realized that the temporal and spatial terms we used were not rigorous and may cause ambiguity. Throughout the revised manuscript, we have rewritten them as lower- and higher-level areas, and shorter and longer timescales.

      Reviewer #2 (Public Review):

      In this study, Jiang et al. combined whole-brain 9.4 T functional magnetic resonance imaging and large-scale electrocorticography to study brain wide activation patterns in response to different pattern violations in marmosets. The authors confirm previous results of a cortical hierarchy for auditory predictive processing and expand on these results by quantifying subcortical responses in MGB and IC as well as using omission to confirm previous results obtained with mismatches. The results highlight the existence of the two levels of auditory prediction signals in the marmoset brain that can be interpreted in a hierarchical predictive processing framework.

      The paradigm used to assess the hierarchical depth of predictive auditory sequences for processing prediction errors and prediction updates at two distinct timescales is well designed, and presumably based on one of the authors' earlier studies (Chao et al., 2018). Unfortunately, the current study fails to highlight the novelty of this work (as far as we can tell, mainly the omission responses) and give adequate credit to previous work on the topic. However, this can be easily fixed by rephrasing the relevant passages of the manuscript.

      We thank the reviewer for the positive comments. We have now revised the Results and Discussion to provide more details about the omission responses and discuss the contribution and novelty of omission sequences in the hierarchical predictive coding. Please also see the reply to Q1 of the Main concerns.

      Main concerns:

      1) It would be good to clarify what the novelty of the present manuscript is (omission responses) in comparison to the previous work (Chao et al., (2018)). The authors do argue that their higher resolution fMRI, allows them to also study subcortical response - which is correct - but the authors make no use of them in any meaningful way in the manuscript. The emphasis on novelty is likely better placed on the omission responses.

      We thank the reviewer for the constructive suggestion. In the revised Discussion, in comparison to the previous work (Chao et al., 2018), we have specifically emphasized the novelties of the present study.

      -First, the model describing the 1st- and 2nd-level violations (prediction and error) in the present study is novel and more straightforward. Instead of using the partial- or full-global predictions of the Chao et al. model, which are challenging to interpret, we first introduce the xx and xY sequences as separate internal templates. Similarly, as we mention in the Discussion, although the local-global paradigm has been intensively studied in humans and macaque monkeys (Chao et al., 2018; Uhrig et al., 2014; Wacongne et al., 2011), most studies tested the global violation by combining xx|xY and xY|xx novelties, which, in fact, contain two different types of predictions. Our study is the first to separate the two novelties and search for their respective neural representations. This is important because the xx|xY novelty was only involved in the 2nd-level signal, with the xY sequence as the internal sequence template, whereas the xY|xx novelty was involved in both 1st- and 2nd-level signals (the 1st-level novelty triggers the 2nd-level novelty), with the xx sequence as the internal template (see Discussion).

      -Second, this is the first study to construct the hierarchy of predictive auditory sequences in the marmoset brain using fMRI. Our results extend the hierarchical organization of predictive coding from the cortex to the subcortical regions. To emphasize the importance of this animal model, we have added a section on marmosets as an animal model for auditory sequences to the Discussion.

      -Third, and most importantly, as suggested by the referee, the omission responses are indeed novel. To highlight them, we have expanded the omission results and provided more discussion of their contribution to hierarchical predictive coding.

      2) Figure 3C (and all similar figures). We fear this figure is not interpretable without a substantially improved explanation. Both what the arrows mean (i.e. how they are computed), and what the values indicate that are listed next to the arrows is not explained (arrows appear randomly bi- or unidirectional and the legend at the bottom of the figure is not very helpful).

      We apologize for the missing details in Figure 3C. The color dots in the brain diagrams indicate the electrodes with significant responses found in corresponding comparisons, which were subsequently used in the functional correlation test (see Materials and Methods). Lines represent significant functional correlations between signals from the paired brain diagrams. Labeled values close to lines provide the Pearson correlation coefficient (p-value) of the corresponding correlations. Unidirectional arrows indicate relative temporal orders at which the signals appear, while bidirectional arrows indicate uncertain temporal orders of the signals. Figure 3D, 5C and D, Figure 3-figure supplement 1C and D, Figure 5-figure supplement 1C and D have the same format as Figure 3C. Accordingly, we have revised the legend of Figures 3C and D, 5C and D and added more explanations in the Results.

      References:

      Cacciaglia, R., Costa-Faidella, J., Zarnowiec, K., Grimm, S., & Escera, C. (2019, Feb 1). Auditory predictions shape the neural responses to stimulus repetition and sensory change. Neuroimage, 186, 200-210. https://doi.org/10.1016/j.neuroimage.2018.11.007

      Chao, Z. C., Takaura, K., Wang, L., Fujii, N., & Dehaene, S. (2018, Dec 5). Large-Scale Cortical Networks for Hierarchical Prediction and Prediction Error in the Primate Brain. Neuron, 100(5), 1252-1266.e1253. https://doi.org/10.1016/j.neuron.2018.10.004

      El Karoui, I., King, J. R., Sitt, J., Meyniel, F., Van Gaal, S., Hasboun, D., Adam, C., Navarro, V., Baulac, M., Dehaene, S., Cohen, L., & Naccache, L. (2015, Nov). Event-Related Potential, Time-frequency, and Functional Connectivity Facets of Local and Global Auditory Novelty Processing: An Intracranial Study in Humans. Cereb Cortex, 25(11), 4203-4212. https://doi.org/10.1093/cercor/bhu143

      Uhrig, L., Dehaene, S., & Jarraya, B. (2014, Jan 22). A hierarchy of responses to auditory regularities in the macaque brain. J Neurosci, 34(4), 1127-1132. https://doi.org/10.1523/jneurosci.3165-13.2014

      Wacongne, C., Labyt, E., van Wassenhove, V., Bekinschtein, T., Naccache, L., & Dehaene, S. (2011, Dec 20). Evidence for a hierarchy of predictions and prediction errors in human cortex. Proc Natl Acad Sci U S A, 108(51), 20754-20759. https://doi.org/10.1073/pnas.1117807108

    1. Author Response:

      Reviewer #1 (Public Review):

      This is a clearly written manuscript describing an elegant study that demonstrates how microsaccades are not the triggers of attentional effects, and that attentional modulations can be observed in the absence of microsaccades. This is a very much needed work, especially in the light of the recent debate regarding whether or not microsaccades are the cause of peripheral attentional effects. By explicitly comparing and quantifying the effects of attention on neuronal responses in the presence and in the absence of microsaccades, this work provides important insights on this debate. I think the work is well conducted and the results are solid.

      We thank the reviewer for their supportive comments!

      I only have few comments/suggestions:

      1. Lines 125-126, the authors report that monkeys generated frequent microsaccades but their overall direction was not systematically biased towards the cue location. This seems to be in contrast with what was previously reported in the literature in humans and monkeys. I think this discrepancy should be discussed in the discussion. Is this simply the result of different experimental paradigms (maybe exogenous vs endogenous attention, or the presence of the cue for the entire duration of the trial, etc.)?

      As suggested, we discuss three main factors which may contribute to this discrepancy:

      The first factor is the difference in the time window used for microsaccade analyses. Previous reports focused their analyses of microsaccades on the time window immediately after cue onset. Our analyses focus instead on the ‘delay period’, which begins hundreds of milliseconds after the cue and is the time epoch used in most electrophysiology studies of attention.

      A second factor is how the spatial cues were presented. In our paradigm the cue ring appeared in the periphery and then disappeared. In contrast, previous paradigms used a cue presented near fixation that persisted throughout the trial. Our brief peripheral cue provides less of an impetus to generate small saccades directed towards the cue, compared to the case when the cue is continuously near the center of gaze.

      A third factor is that monkeys in our task were trained to release a joystick to report their detection of stimulus events, rather than make a saccade. Because human and monkey subjects tend to make microsaccades in the same direction as their upcoming saccadic choices (Yu et al., 2016), attention tasks using saccade reports will tend to introduce this direction bias on microsaccades. By using a joystick release, we minimized these lateralized effects related to saccade preparation.

      These points are now addressed in the second paragraph of the Discussion.

      1. It is very interesting that microsaccades modulate neural responses for stimuli that are much further away from their landing location. However, the stimulus used in these cueing tasks is also unnatural. Normally we are not fixating on a meaningless dot while all the interesting stimuli are presented in the periphery. In normal conditions the foveal input is rich in detail and it is generally relevant (that's why we are foveating certain stimuli in the first place). I wonder if the authors can comment on whether the modulations reported here would also occur in more natural conditions when an interesting and maybe salient/relevant stimulus is presented at the center of gaze, while subjects are also attending to a peripheral target. Will the neural response be modulated selectively for neurons for which the receptive field is on the peripheral target or will it also affect neurons where the receptive field aligns with the microsaccade target location in the fovea?

      The reviewer raises a very good point. In our study, the relationship between microsaccades and attention-related modulation was examined when monkeys selectively attended a stimulus located in the near peripheral visual field while maintaining central fixation. We agree that under more natural conditions, the monkey would just look directly at the peripheral stimulus. As in many attention studies with this type of design, our experiments hold the system in a state of sustained peripheral attention which would otherwise be much shorter.

      We believe that similar modulation at the peripheral location would be briefly observed if the monkey were allowed to satisfy the natural tendency to look at the stimulus, although this would make it more difficult to examine the relationship with microsaccades. This would be consistent with the documented pre-saccadic modulation of attention (e.g., documented by the Carrasco lab, Li, Hanning, & Carrasco, 2021).

      Once the attended stimulus is foveated, there is strong behavioral evidence from several recent studies demonstrating that attention can be selectively distributed even within the fovea (Poletti, Rucci, & Carrasco, 2017). Considering the now substantial evidence that the foveal portion of the SC map is activated when the behaviorally relevant location is at the center of the visual field (e.g., during parafoveal smooth pursuit as in Hafed & Krauzlis, 2008), we expect that SC neurons with foveal RFs would display similar attention-related modulation as we found here. However, to the best of our knowledge, there have not yet been studies documenting the attention-related modulation of neurons with foveal RFs and the possible influence of microsaccades.

      We agree with the reviewer that these are interesting points, and have now added a new paragraph in the discussion (final paragraph) to address this point.

      1. The authors do not report behavioral performance. Presumably the task is very easy, but I wonder if reaction times and performance correct was related with the attentional effects and how did it change with respect to microsaccade direction, e.g., were subjects' reaction times shorter at the cued location also when microsaccades were directed at the opposite location? I think this information would be very valuable.

      We agree it is valuable to document the behavioral performance; we had omitted this because this is the same task we have used in previous studies which do include such behavioral documentation.

      To address the reviewer’s comments, we added an analysis and plot documenting the hit and false alarm rate for each subject in each experimental session. To accommodate this new plot, we have now divided the original Figure 1 (which included task, neuronal data and microsaccades) into a new Figure 1 (task, behavior, and neuronal data) and a new Figure 2 (microsaccades). The new plot showing hit and false alarms is Figure 1b in the revised manuscript.

      The task was not especially easy – we adjusted the amplitude of the color saturation change to be just slightly above the threshold for detection; hence, the hit rates were generally between 75-90%. The performance was very consistent across sessions in our well-trained monkeys, and the low rate of false alarms for ‘foil’ changes provides behavioral confirmation that they attended to the correct stimulus location.

      To address the comments about reaction time, we have added a new plot to our new Figure 2 (Figure 2c) showing the monkeys’ hit rates (top) and joystick release times (bottom) subdivided based on whether there were no microsaccades, microsaccades towards, or microsaccades away from the cued location (-50 to 50 ms relative to cued stimulus change onset). These plots show that when there were no microsaccades, behavioral performance was at least as good as with microsaccades. When there were microsaccades, reaction times were slower when microsaccades were directed away from the cued location. As the reviewer may have anticipated, these effects again confirm that differences in attentional state, as evident in task performance, covary with the direction of microsaccades, and we thank them for the suggestions. We have now added a new paragraph in the Results to describe these findings.

      1. Another important difference in the paradigm used in Lowet et al vs the one described in this manuscript is that in Lowet et al monkeys were instructed to saccade toward the target position at some point during the trial after the cue and the target presentation. Hence, monkeys presumably prepared the saccade and held off its execution during the time the cue and the target were presented. This was not the case in the current paradigm, where the monkey is instructed to maintain fixation as in a standard spatial cueing paradigm. I wonder if this difference may explain some discrepancies in the results.

      This is a very good point. As mentioned in our reply to point #1 above, previous studies (Yu et al., 2016) have shown that human and monkey subjects tend to make microsaccades in the same direction as their upcoming saccadic choices. As pointed out by the reviewer, in the Lowet et al. study the directions of microsaccades might be related to the motor preparation of the upcoming choice saccade as well as related to the allocation of attention. In contrast, in our experiments, monkeys reported their choice by releasing the joystick and were prohibited from making larger saccades.

      We agree this can be an important factor for the differences in the results, and we now address these points in the second paragraph of the Discussion.

      Reviewer #2 (Public Review):

      This is a correlative study with the main result that microsaccades do not alter attention-related modulations of neuronal activity. This is an important question, speaking to the origin of one of the mind's most fundamental processes. The experimental manipulations and analyses are well chosen, carefully conducted and visualized. They include critical controls for alternative explanations.

      Thank you for your constructive comments.

      To ascertain their claims, however, it is important that the authors cover their ground. In pursuit of that, a few important analyses are required.

      1. Did the manipulation of attention work? In the present version of the manuscript, the authors do not report behavioral results, which is necessary to confirm that the cue was successful in manipulating attention. That is, the observed modulation in firing (in RF vs outside of RF) should be related to a behavioral advantage in sensitivity to changes at the cued location. To confirm the link of the neural results to attention (rather than, say, just the cue), the behavioral results provide opportunities for critical tests. One way to do this would be to analyze neural firing rates as a function of response rather than cue location (provided subjects made enough errors). Note: A detailed discussion of why the cue cannot be equated to attention can be found in Laubrock et al. (2010, Atten Percept Psychophys; https://doi.org/10.3758/app.72.3.683).

      Yes, the manipulation of attention worked. As suggested, we now document the effectiveness of the attention manipulation by plotting the hit and false-alarm rates for each subject in each experimental session (new Figure 1b). We also confirmed that the SC neuronal attention-related modulation depended on subjects’ behavioral response (new Figure 1d). We also note that these same attention manipulations have been used in previous studies examining the neuronal mechanisms of attention.

      2. Were all microsaccades detected? One of the main results of the study is that attention-related modulations were observed even in the absence of microsaccades. These results hinge on successful detection of all microsaccades, even at a very small scale. Given the video-based eye tracking the authors will have missed a (possibly large) number of smaller microsaccades (Poletti & Rucci, Vision Res, 2016; https://doi.org/10.1016/j.visres.2015.01.018). This concern is exacerbated by the fact that eye tracking was monocular, such that a validation of detected microsaccades based on the signal in the other eye could not be performed.

      We have performed additional microsaccade detection analyses using both more stringent and more lenient thresholds (the "lambda" value of Engbert & Kliegl, 2003). We have verified that our findings are robust over a range of detection thresholds and include a new supplemental figure to demonstrate this point (Figure 4 – figure supplement 2).
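
      To illustrate what varying the detection threshold means in practice, the following is a minimal sketch of a velocity-threshold detector in the spirit of Engbert & Kliegl (2003). It is not the authors' analysis code: the function names, the synthetic eye trace, the sampling rate, the minimum duration, and the lambda values swept are all assumptions chosen for illustration.

```python
import math
import random
from statistics import median


def velocity(pos, fs):
    """Smoothed velocity (deg/s) from a 5-sample moving window."""
    v = [0.0] * len(pos)
    for i in range(2, len(pos) - 2):
        v[i] = (pos[i + 2] + pos[i + 1] - pos[i - 1] - pos[i - 2]) * fs / 6.0
    return v


def robust_sd(v):
    """Median-based estimate of velocity spread, insensitive to rare saccades."""
    return math.sqrt(max(median(a * a for a in v) - median(v) ** 2, 0.0))


def detect_microsaccades(x, y, fs, lam=6.0, min_samples=3):
    """Return (start, end) sample indices of runs above the elliptic threshold.

    Assumes the trace contains noise, so the per-axis thresholds are non-zero.
    """
    vx, vy = velocity(x, fs), velocity(y, fs)
    eta_x, eta_y = lam * robust_sd(vx), lam * robust_sd(vy)
    above = [(a / eta_x) ** 2 + (b / eta_y) ** 2 > 1.0 for a, b in zip(vx, vy)]
    events, start = [], None
    for i, flag in enumerate(above + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_samples:
                events.append((start, i - 1))
            start = None
    return events


# Hypothetical fixation trace: position noise plus one 0.3 deg step near sample 1000.
rng = random.Random(0)
fs = 1000  # assumed sampling rate in Hz
x = [rng.gauss(0.0, 0.005) for _ in range(2000)]
y = [rng.gauss(0.0, 0.005) for _ in range(2000)]
for k in range(10):
    x[1000 + k] += 0.03 * (k + 1)
for i in range(1010, 2000):
    x[i] += 0.3

# Sweep lenient-to-stringent thresholds and count the samples flagged at each.
flagged = {}
for lam in (4.0, 6.0, 8.0):
    events = detect_microsaccades(x, y, fs, lam=lam)
    flagged[lam] = sum(e - s + 1 for s, e in events)
```

      A lower lambda can only flag more samples, never fewer, so repeating the downstream analysis across such a sweep shows whether a result depends on where the threshold is set.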

      3. Relation to previous claims of causality. Hafed (2013, Neuron) reported perceptual changes in attentional cueing that covaried with the occurrence of microsaccades. Hafed (2013) argued that microsaccades might be underlying the performance changes commonly attributed to covert shifts of attention. This point seems central to the current paper's line of argument and should thus be discussed in detail with respect to the current findings. At present, the paper by Hafed (2013) is not cited in the current manuscript when its conclusions may need reconsideration based on the current results.

      We agree, and a similar point was raised by Reviewer #1. We have expanded the main text based on your recommendations.

    1. Author Response:

      Reviewer #2 (Public Review):

      In this manuscript, Johnson Jr, et al. investigated the potency and selectivity of NBI-921352, a novel Nav1.6 blocker, on different voltage-gated sodium channel (VGSC) isoforms as well as on epileptic Nav1.6 variants. NBI-921352 exhibited exquisite selectivity against Nav1.6 channels, preferentially acting on activated channels, and inhibited tested Nav1.6 variants at similar potency except for the R1617Q, a variant that is proximal to the predicted binding site of NBI-921352. Brain slice recordings revealed that NBI-921352 effectively attenuated AP firing in excitatory pyramidal neurons, but not in inhibitory interneurons. Seizure assays in three rodent models demonstrated the protective effect of NBI-921352 on electrically induced seizures in all three models.

      Nav1.6-selective blockers have been reported before, but their relative selectivity between Nav1.6 and Nav1.2 is not great; NBI-921352 is the first blocker that shows a high Nav1.6 selectivity over Nav1.2, making it a promising candidate for the development of therapeutics for Nav1.6-related disorders including early onset encephalopathies and mental disabilities. The study on epileptic variants of Nav1.6 further supports its potential use for the treatment of SCN8A-related diseases, which was confirmed by the seizure assays. NBI-921352 will also be a valuable pharmacological tool in VGSC-related basic research.

      Despite all the wonderful work the authors have completed, there are some issues that should be addressed.

      First, different protocols were adopted to examine the selectivity of NBI-921352 on different VGSC isoforms. NBI-921352 is a state-dependent inhibitor, holding potential may alter the potency of NBI-921352 by changing channel activation/inactivation state, and therefore, difference in voltage-clamp protocols could introduce bias in the comparison of selectivity among VGSCs.

      Second, a depolarized holding potential (-45 mV) was used in the study to determine IC50 of NBI-921352 on most VGSCs, which is uncommon under physiological conditions. The selectivity of NBI-921352 on Nav1.6 vs other VGSCs under physiological conditions could be different compared to the values reported here. It is better to hold cells at physiologically-relevant membrane potentials or using action potential waveforms derived from real AP recordings in neurons. The authors should discuss these limitations, and possible impact on their assessment of selectivity against other VGSCs in their native cellular backgrounds.

      There are pros and cons to any method of determining selectivity and we acknowledge that none of them are ideal for all purposes. We chose to focus on what we refer to as “molecular selectivity,” the fundamental ability for a compound to bind to the channel and stabilize the high affinity conformation. We accomplish this by choosing voltages that promote the same fraction of channels to be in the high affinity (inactivated) state. This contrasts with “functional selectivity” that may be largely driven by the distinct state-dependence of different isoforms. Our approach avoids assumptions about what the physiologically relevant voltage is since that voltage can vary depending on the tissue or cell type. For any given isoform there may be multiple physiologically relevant voltages.

      Consistent with this philosophy, we bias all the channels to be in their highest affinity state (inactivated) and then use this maximal potency to compare selectivity. At more hyperpolarized voltages, potency for all isoforms will tend to be somewhat lower. We are adding more explanation of our rationale to the text, and we are adding supplemental data giving more insight into the impact of voltage on potency.
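
      The dependence of potency on channel state can be made concrete with the standard two-state (modulated receptor) approximation for a state-dependent blocker, in which the apparent affinity is a weighted mix of the resting-state and inactivated-state affinities. This is a textbook simplification offered as a sketch, not the model used in the manuscript; the function name and the two affinity values below are hypothetical.

```python
def apparent_ic50(frac_inactivated, k_resting, k_inactivated):
    """Apparent IC50 when a fraction of channels occupies the inactivated state.

    Two-state approximation: 1/K_app = (1 - h)/K_r + h/K_i, with h the
    inactivated fraction in the absence of drug.
    """
    if not 0.0 <= frac_inactivated <= 1.0:
        raise ValueError("frac_inactivated must lie in [0, 1]")
    inverse = (1.0 - frac_inactivated) / k_resting + frac_inactivated / k_inactivated
    return 1.0 / inverse


K_RESTING_NM = 10000.0    # hypothetical low-affinity (resting) constant, nM
K_INACTIVATED_NM = 50.0   # hypothetical high-affinity (inactivated) constant, nM

# Apparent potency at three illustrative inactivated fractions.
potency_by_state = {
    h: apparent_ic50(h, K_RESTING_NM, K_INACTIVATED_NM) for h in (0.1, 0.5, 1.0)
}
```

      Under this approximation, holding every isoform at a voltage that gives the same inactivated fraction removes the state-occupancy term from the comparison, which is the sense in which the measured ratios reflect binding rather than gating differences.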

      Figure 1-figure supplement 2 shows the potency of NBI-921352 after holding at a membrane potential nearer the physiologic range (-62 mV). Potency at this voltage (IC50 = 53 nM) was similar to that at the fully inactivating potentials evaluated in the primary potency assay shown in Figure 1. For this reason, we anticipate that the selectivity ratios described in the manuscript will be similar to those in physiologic conditions. A note to this effect has been added to the results section.

      Third, Nav1.6 is highly expressed in Purkinje neurons and motor neurons, and plays important roles in motor system. Did the authors observe any motor impairment in the behavior studies? It would be informative to examine the effect of NBI-921352 on AP firing and resurgent currents in Purkinje neurons.

      Fourth, the wrong statistical test was used in the current-clamp study, and there is no description of the statistical methods used for the seizure assays. Please add a statistical analysis section to Materials and Methods, and list the statistical analysis method used in each experiment.

      We have reanalyzed the data in Figure 4 using an AUC-based analysis; this is now described in the legend, and the p values are shown in the data transparency file. We have added statistical analysis methods to the Methods and the figure legends.
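
      As an illustration of what an AUC-based summary of current-clamp data can look like, the sketch below reduces each firing-versus-injected-current curve to a single area under the curve using the trapezoidal rule, so that one value per cell can be compared between conditions. The response does not specify the exact procedure, so the function name and all numbers here are hypothetical.

```python
def trapezoid_auc(xs, ys):
    """Area under a sampled curve by the trapezoidal rule."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    area = 0.0
    for i in range(1, len(xs)):
        area += (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2.0
    return area


# Made-up input-output curves for one cell: injected current (pA) vs spike count.
current_pa = [0, 50, 100, 150, 200]
spikes_vehicle = [0, 2, 6, 10, 12]
spikes_drug = [0, 1, 3, 5, 6]

auc_vehicle = trapezoid_auc(current_pa, spikes_vehicle)
auc_drug = trapezoid_auc(current_pa, spikes_drug)
```

      Collapsing each curve to one number avoids treating the individual current steps of a single cell as independent observations.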

    1. Author Response:

      Reviewer #3:

      Weaknesses:

      In utero electroporation as well as other in vivo gene manipulation techniques do not allow fine manipulations of expression gradients. Therefore some conclusions of the paper are not fully supported. Although the data presented in the paper clearly show that Neuropilin1 expression level is important for establishment of homotopic connections, they do not show directly that the gradient of expression indeed is in play, as suggested by the authors. Another weak point is that there is no direct evidence that the Neuropilin1 protein level follows the mRNA expression gradient.

      Therefore it remains an open question, whether it is a gradient of expression or a sharp border of cellular response to higher-lower levels of Neuropilin1 that controls area specific connections within somatosensory cortex.

      Another weak point is that the paper relies on in utero electroporation solely. This technique with all its advantages, has some disadvantages too. One of them is high variability of individual experiments. On the other hand, it targets only subsets of cells, and therefore is not the best to address cell extrinsic mechanisms, especially those that involve expression gradients.

      The reviewer raised interesting comments. We would like to clarify that we never attempted to disrupt the gradient per se but to alter Nrp1 expression in individual cells. This evaluates how their projections are affected by the Nrp1 expression imposed by their location, which helps us understand how the gradient contributes to connectivity. Nevertheless, the reviewer raised a fair point, and we realized it was important to re-examine the expression of Nrp1 to sharpen our interpretations. In this revised version we have performed in situ hybridization to investigate whether we were dealing with gradients or sharp borders. Surprisingly, we found unexpected patterns of expression, now shown in Figure 1. Interestingly, the expression pattern of Nrp1 in the postnatal brain is highly dynamic. At early stages of CC development, expression in the cortex shows a discontinuity in L2/3 neurons of the SS, rather than a gradient. Nrp1 expression is upregulated after P7 in a manner that suggests a gradual activation from lateral to medial SS cortex. At P16, few cells are positive but they are equally distributed throughout the S1 and S2 cortex. Therefore, we have modified the text and avoided referring to the gradient. This gradient was described at embryonic stages and P0 (Zhou et al., 2013). The new version of the manuscript also adds results showing that changes in Nrp1 expression do not detectably modify contralateral innervation at P10 and that the S2 column is not formed at this stage. The quantification method suggested by reviewer #2 allows us to conclude that the reduction in the S2 columns in the shNrp1 condition is statistically significant. Together, the new data provide a better understanding of our phenotypes and explain why the phenotypes of CAG-Nrp1 and shNrp1 are so similar and both block innervation in S2, since they both disrupt the normal transient expression.

    1. Author Response:

      Evaluation Summary:

      In a set of in vitro and in vivo experiments the investigators demonstrated that coating of urinary tract catheters with fibrinogen-degrading substances reduced adhesion and colonization with a broad range of bacteria relevant in the pathogenesis of CAUTI. This approach might, therefore, be interesting for prevention of CAUTI as an alternative to catheters coated with antibiotics.

      We appreciate the summary. However, this coating doesn’t aim to degrade fibrinogen; it simply reduces fibrinogen’s ability to adhere to the catheter. “Fibrinogen anti-fouling” would be a more accurate description. Additionally, this study focused not only on bacteria but also on fungi, and thus “microbial” would be a more accurate description of the scope of this study.

      Reviewer #1 (Public Review):

      The major strengths are a clear hypothesis and the consecutive description of a set of experiments, each time demonstrating the next step in the pathogenetic pathway.

      We thank the Reviewer for their supportive and enthusiastic response.

      The weakness is that the experiments stop where the clinical relevance would start. Are the in vitro and in vivo animal experiments representative of the in-human situation?

      We appreciate the insightful comments provided by the Reviewer. This work has clinical potential: our data show that both our use of urine as the medium for growing pathogens in in vitro testing and our mouse model of infection recapitulate human CAUTI. Some of our findings are shown in Flores-Mireles, mBio 2016; Flores-Mireles, J Urol 2016; Flores-Mireles, Nat Rev Microbiol 2015; and Flores-Mireles, STM 2014. To emphasize the clinical relevance of this study, we have changed the introduction and discussion.

      Moreover, it does not become clear from the discussion whether this approach of coating is technically feasible. This step towards in-human testing will determine the impact and significance of the work.

      We thank the Reviewer for this feedback. To improve the clarity of the technical feasibility of this coating, we have addressed it in the introduction, results, discussion, and methods.

      Reviewer #2 (Public Review):

      This article provides a detailed account of both in vitro and in vivo experiments that:
      • Establish the role of fibrinogen (Fg) in the etiology of catheter-associated urinary tract infections (CA-UTI)
      • Investigate the prevention of CA-UTI with the use of LIS catheters, containing anti-fouling modifications (liquid infused silicone) to prevent the interaction between Fg and common uropathogens.

      The study follows up on previous (by the investigators) research on the role of Fg on the attachment of uropathogens and the formation of biofilms. It is a comprehensive article that contains a detailed description of the following experiments:

      1. In vivo experiments demonstrate the interaction between Fg and uropathogens in the bladder and the catheter lumen.
      2. The manuscript provides in vitro evidence that Fg-coated silicone catheters enhance the binding of uropathogens, compared to uncoated or bovine serum albumin coated catheters.
      3. The manuscript describes the development of the LIS catheter, in which a catheter is drained in silicone gel. It demonstrates the effects of this process on the catheter weight, length and inner and outer membrane diameter.
      4. The manuscript provides in vitro evidence that the use of a LIS-catheter reduces Fg deposition and uropathogens binding.
      5. Using in vivo mouse experiments, the study provides evidence that when introducing a variety of uropathogens and thereby inducing CA-UTI, the use of LIS-catheters reduces:
         o Fg deposition and uropathogens binding on the catheter
         o uropathogens colonization of the kidneys, spleen and heart
      6. Finally, the manuscript demonstrates in mice that the LIS catheter reduces protein deposition on catheters in case of CA-UTI

      The study has a clear structure and there is little to criticize about the study methods. For steps 4 to 6, they used a control group of uncoated catheters, which they compared with a Mann-Whitney U test. The results, although not all statistically significant, provide convincing evidence for the efficacy of LIS catheters within this study. Another strength of the study is the simplicity of the development and (probably) the limited costs of a LIS catheter, so that it can also be applied in the future in less wealthy countries.

      I identified two potential weaknesses of this study. Addressing these would improve the replication of these findings, the set-up of follow-up studies, also outside your study group, and it would help in the translation and implementation of the LIS catheter in humans.

      First, it is insufficiently clear from the methods how the LIS catheter was developed exactly, and specifically the LIS-catheter that was used for the mice experiments. This complicates the understanding and replication of these study findings. It is not exactly clear to me whether these catheters were drained in liquid infused silicone or whether liquid infused silicone was infused into the catheter tube before insertion. For how long were the LIS-catheters that were finally used for the mice experiments incubated in silicone oil?

      We thank the Reviewer for pointing out where our explanation was lacking. This liquid infused modification was made by submerging the silicone tubing into silicone oil for at least 5 days (for in vitro assay catheter materials) or 30 min (for catheters used in in vivo assays). This information has been added to the results section as well as materials and methods section.

      Second, the article demonstrates that the drainage of a catheter in silicone gel increases the weight, length, inner and outer diameter of the mouse catheter. These results seem to stand alone and are not addressed in the discussion.

      We thank the Reviewer for pointing out this deficiency. These results now are discussed.

      What influence this could have on the urinary flow and the introduction/ascent of uropathogens?

      We are currently performing an in-depth characterization of the LIS catheter and its effect on urine flow. This evaluation is outside the scope of this study and will be part of a follow-up publication. Regarding the introduction/ascent of uropathogens, our colonization studies have shown decreased colonization of the kidneys, suggesting that ascent of the pathogen to the upper urinary tract is affected. Our data show (Fig. 4) that this modification reduces both the initial binding of pathogens and the deposition of Fg.

      Could it be that the effect of the silicone gel diminishes over time, which necessitates a catheter change? Do you have evidence on the stability of this polymer?

      We are excited that the reviewer is thinking about the follow-up steps to this study. Currently, we are investigating the long-term stability in urine conditions in vitro, in the bladder in vivo, and in prolonged CAUTI. However, these analyses are outside the scope of this study and will be part of further publications. A study by Sotiri et al. (2018) has shown that this modification has long-term stability in vitro.

      Would it be possible to infuse silicone oil when the catheter is in situ?

      We appreciate the Reviewer’s comments. Based on the time that is needed to fully infuse the catheter, it will be difficult to do it in situ. This will need further investigation under urine conditions.

    1. Author Response:

      Reviewer #1 (Public Review):

      The investigators' goals were to describe the epidemiology and kinetics of post-acute covid lung sequelae and to determine the risk factors predictive of persistent lung impairment. A major strength of the study is the longitudinal observation through 6 months with protocolized clinical assessments that included patient-reported outcomes, lung function tests, inflammatory marker testing, and computed tomography of the chest, in a reasonably sized cohort that reflects the spectrum of disease severity in the pre-vaccination era. We learn a great deal about the different patterns of recovery in this group of COVID-19 survivors. The primary epidemiologic finding is that 52% of survivors continued to have symptoms at 6 months, while up to 72% of those with severe COVID requiring ICU level care continued to have lung abnormalities by chest imaging. This confirms general observations of "long covid" which also encompasses non-lung effects. While lung disease is less common in those with milder disease, the proportion of patients who were never hospitalized but experienced persistent symptoms is striking (50%), with lung function impairment in 17% at 6 months. As expected, the patients who had the most severe disease-those who needed the ICU-had the highest degree of chest imaging abnormalities. The kinetics of recovery is a significant observation: Figure 3 shows that most of the post-acute recovery in structural lung abnormalities occurs in the first 3 months and slows down thereafter, particularly for the hospitalized non-ICU patients. The investigators then embarked on a sophisticated analysis to determine how to predict persistent lung abnormalities (as detected by chest CT) at 6 months. When analyzed individually, among 50 clinical characteristics or lab values, the strongest unfavorable risk factors were elevated IL-6 (an inflammatory cytokine that is the target of tocilizumab) and CRP (C-reactive protein). Other variables that were strongly associated with CT abnormalities included immunosuppressive therapy, ICU stay as well as pre-existing conditions. When machine learning techniques were applied, risk factors that correlated with each other could be grouped together, and the patients could be categorized as low, intermediate, and high risk for delayed pulmonary recovery. As expected, known factors for COVID-19 infection (age, male sex, medical comorbidities) and disease severity (need for oxygen therapy, ICU care and antibiotics) were more frequent in the intermediate and high risk groups. These predictive factors at acute COVID and day 60 follow-up mostly held up when tested against part of the cohort that was not used for analysis. Interestingly, lung function impairment as measured by pulmonary function tests was only weakly correlated with persistent and severe chest imaging abnormalities.

      The novelty of this study lies in taking the epidemiology a step further with a machine learning analysis to determine which clinical characteristics and chest imaging features at the onset of acute COVID-19 are predictive of later persistent disease. One limitation of this study, however, is that it was conducted on patients in the early part of the pandemic, prior to the widespread use of remdesivir and corticosteroids/anti-cytokine therapies, that are now considered standard of care. Based on these findings, we can now hypothesize that current treatments are likely to reduce the impact of long-covid.

      We would like to thank the reviewer for the careful study of the manuscript and appreciation of our work. We agree that our longitudinal cohort, and its hospitalized, severe COVID-19 subset in particular, encompasses patients for whom the therapeutic armamentarium was limited and far from the options available now. Whether novel anti-viral and anti-inflammatory medication and, in the case of vaccinated patients, immunization status may accelerate recovery or reduce pulmonary damage is a matter of current research, including at our center. We address this issue in the Discussion section to support a clear interpretation of the data by the interested reader.

      Machine learning (artificial intelligence, AI) is now being increasingly used to answer clinical questions on limited cohorts; the application of machine learning in this study contributes to our conceptual understanding of how clinical characteristics and biological factors cluster together to contribute to long-term COVID outcomes. Namely, the profound inflammation that characterizes severe acute COVID-19 pneumonia and poor early outcomes also contributes to chronic lung damage in survivors. In addition, a robust antiviral immune response (as seen with elevated anti-viral antibodies) without elevated systemic inflammatory markers was associated with less severe chest imaging patterns, also supporting the notion that an individual's immune response to the virus is responsible for the trajectory of disease. As noted, a significant proportion of non-hospitalized patients also suffered from chronic lung impairments. Taken together, the impact of prolonged convalescence on the workforce, healthcare, and individual lives should not be underestimated. These results underscore the paramount need for continued public health measures and vaccinations to prevent COVID-19, particularly for the most vulnerable individuals (older, immunocompromised, and with preexisting health problems). These observations provide additional biologic justification for the use of agents directed at reducing lung inflammation early in the course of disease, and potentially at an early post-recovery time point (i.e., 2 months). Machine learning algorithms may one day help clinicians decide which patients should be targeted for additional therapies after the acute phase. With further study, implementation of AI to real world medicine may be on the horizon.

      We agree with the Reviewer that machine learning algorithms can overcome limitations of ‘canonical’, ordinal and generalized regression methods in the multidimensional setting, i.e., when the number of available clinical parameters approaches or exceeds the number of observations/patients. Consequently, machine learning or AI allows for serial screening of medical record data at low cost and supports diagnostic and therapeutic decisions. We discuss those two aspects in the revised manuscript in the context of acute COVID-19 course prediction and long COVID prediction and phenotyping in light of the recent literature [1–4,6].

      Reviewer #2 (Public Review):

      This is a potentially valuable manuscript which links early markers of inflammation with residual abnormalities on chest CT following SARS-CoV-2 infection. Surprisingly, early surveyed symptoms do not predict long term radiologic outcomes (6 months after infection) while inflammatory markers have stronger predictive value. The cohort is well designed and the selected tools for analysis are appropriate.

      We thank the Reviewer for the careful study, critique, and appreciation of our work.

      While this finding is potentially of high importance for clinical practice, the endpoints are inconsistently defined, and certain components of the machine learning and clustering analyses are difficult to interpret as presented. It is therefore challenging to understand whether the conclusions are justified by the analysis.

      We apologize for this lack of clarity. In the revised manuscript, we precisely define the analysis endpoints (any radiological lung findings at the 6-month follow-up, radiological lung abnormalities with CT score > 5, lung function impairment and persistent symptoms at the 6-month follow-up); see Introduction and Methods/Study design. We also indicate the numbers of participants reaching those endpoints in Table 3.

      Several components of the analysis are confusing and would benefit from further elucidation:

      1) The authors do not clearly define "delayed pulmonary recovery". My sense is that they are using several radiologic based definitions rather than their functional definition (defined by FEV1, FEV1:FVC & DLCO) of lung function but this is never explicitly stated. Are the functional outcomes and symptomatic recovery considered in any of the analyses other than correlations with radiologic findings in S1?

      As described above in our previous response, the prime focus and primary endpoint of the analysis was the presence of radiological lung abnormalities at the 6-month follow-up. Our motivation to focus on radiological endpoints was to focus on the potential development of persistent structural lung abnormalities, fibrosis and interstitial lung disease following COVID-19, as observed in SARS-CoV-1 patients [7,8]. Of note, lung function parameters were only weak correlates of radiological impairment as shown in Figure 3 – figure supplement 1 – 3 and our previous work [27]. This finding is in line with numerous studies in ILD patients which demonstrate a low sensitivity of lung function testing (especially FEV1 and FVC assessment) in patients with early interstitial lung disease (ILD) [10,11]. In addition, we could not exclude a pre-existing, COVID-19-independent impairment of lung function in a subset of the study participants suffering from pulmonary diseases, obesity and/or cardiovascular diseases (Table 1). Thus, lung function parameters only partially reflect COVID-19 mediated lung injury and convalescence.

      Nevertheless, we agree that clinical and functional endpoints are of great interest for the scientific and clinical community. For this reason, we present additional results of univariable risk modeling for long-term (6-month follow-up) symptom persistence and lung function impairment (Figure 5, Appendix 1 – table 2) and the results of machine learning modeling for those outcomes (Figure 9, Appendix 1 – table 5), and we discuss the findings. We also present the prevalence of such long-term manifestations and lung function impairment in the low-, intermediate- and high-risk clusters of the study participants defined by non-CT and non-lung function clinical features (Figure 8).

      2) To this end, I was surprised that the functional definition and symptomatic recovery were not used as the primary endpoints. The functional definition and resolution of symptoms seem most important for the recovering patient so seems like the more important outcome. However, in Figures 5-7, it is often not clear whether the functional outcome is being considered at all.

      As mentioned above, the focus of the study was the assessment of structural lung impairment following COVID-19, and both lung function parameters and symptom burden correlate only moderately with structural lung damage (Figure 3 – figure supplement 1 – 3), a phenomenon observed previously in SARS-CoV-1 [7,8]. Although the symptom burden and its resolution during follow-up are of major importance for the individual patient during post-acute recovery, these parameters are not a good marker for the potential long-term pulmonary outcome. For example, younger patients with moderate to severe lung damage may demonstrate only mild pulmonary symptoms during post-acute recovery, but the structural damage may be associated with severe impairment at long-term follow-up due to progression of lung fibrosis or age-related decrease of functional pulmonary capacity [11]. Still, we agree with the reviewer that the follow-up on symptoms and lung function is of interest for the reader, and we additionally included those outcomes in the univariate and multi-parameter risk modeling. In addition, we present the frequencies of symptom persistence and lung function impairment in the low-, intermediate- and high-risk participant clusters defined solely by non-CT and non-lung function clinical parameters. See our response to the previous point for more details.

      3) For the clustering in figure 5, I am uncertain how CT severity score >5 & CT abnormalities cluster separately, when these 2 outcomes appear to logically overlap. Specifically, does the CT abnormalities outcome include patients with the high severity score outcome? In other words, are patients in the "high severity" group a subset of patients with "CT abnormality"? If not a subset, then the CT abnormality should be labeled "non-severe CT abnormality". This could all be clarified by listing the number of patients in each group and showing with a Venn diagram whether there is any overlap.

      We apologize for the lack of clarity in this matter. As pointed out by the reviewer, the patients with CT severity scores > 5 points were a subset of the participants with any CT abnormalities. The same was true for the GGO-positive subgroup. We agree that the overlap between the radiological outcomes obscures the message of the clustering and modeling results. To overcome this, we removed the GGO outcome variable from the analyses in the revised manuscript. In the revised manuscript, we clearly differentiate between mild (CT severity score ≤ 5) and moderate-to-severe radiological abnormalities (CT severity score > 5) in feature (Figure 6) and participant clustering (Figure 8). Frequencies of mild and moderate-to-severe CT abnormalities in the study collective stratified by the severity of acute COVID-19 are presented in Figure 3 – figure supplement 3B. Numbers of the study participants with any, mild or moderate-to-severe CT abnormalities at the subsequent follow-up visits are listed in Table 3.
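
      The relationship between the radiological outcome categories can be sketched as a small classification rule. The cut-off of 5 points comes from the response above; treating a score of 0 as "no abnormality" and the function and key names are assumptions made only for this illustration.

```python
def ct_category(score):
    """Map a CT severity score to a mutually exclusive category."""
    if score < 0:
        raise ValueError("CT severity score cannot be negative")
    if score == 0:  # assumption: 0 points means no radiological abnormality
        return "none"
    return "mild" if score <= 5 else "moderate-to-severe"


def outcome_flags(score):
    """Outcome variables as overlapping flags: severe cases are a subset of 'any'."""
    category = ct_category(score)
    return {
        "any_ct_abnormality": category != "none",
        "moderate_to_severe": category == "moderate-to-severe",
    }


# Hypothetical scores, used only to show the subset relationship.
example_scores = [0, 3, 5, 6, 12]
categories = [ct_category(s) for s in example_scores]
flags = [outcome_flags(s) for s in example_scores]
```

      Every participant flagged as moderate-to-severe is also flagged as having any abnormality, whereas the mild and moderate-to-severe categories never overlap, which is why the latter pair is easier to interpret in clustering.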

      4) For the same reason, figure 4 is hard to interpret. Are CT severity >5 being compared to those with normal CTs only or those with normal or mild / moderate CTs? Please provide more specific definitions of normal, "CT abnormality" and "severe CT abnormality" and provide the number of people in each category and specify the comparator groups in all analyses.

      We are sorry for the confusion. In Figure 4 of the initial manuscript, any CT abnormalities, GGO-positivity and abnormalities with CT severity score > 5 were analyzed as separate outcome variables. The baseline was specific for the given explanatory variable, e.g. for the ICU stay this was the mild COVID-19 group, and for elevated IL-6 it was normal serum IL-6 levels. In the revised manuscript we present the modeling results in an abbreviated form for the 5 strongest co-variates of any CT abnormalities, moderate-to-severe CT abnormalities (CT severity score > 5), persistent symptoms and lung function impairment each (Figures 4 – 5). We indicate the baseline and the n number in the plots. The complete summary of univariable risk modeling with the requested information is provided in Appendix 1 – table 2.

      5) Similarly, how can GGO @V3 be used a potential explanatory variable for the outcome CT abnormalities @V3 when these 2 variables are clearly non-independent. Inclusion of highly related and likely correlated variables may throw off the overall conclusions of the clustering analysis.

      We agree with the editor and the reviewer that this representation was confusing. For this reason and the reasons described in Response 4, we removed the GGO variable from the revised analysis pipeline and differentiate between mild (CT severity score ≤ 5) and moderate-to-severe (CT severity score > 5) radiological lung abnormalities in modeling and machine learning classification. In addition, we define symptom and participant clusters solely with the non-CT parameters (Figure 6 – 7). To investigate the association of mild and moderate-to-severe CT abnormalities with other non-CT variables (Figure 6, Supplementary Figure S5), the CT features are assigned to the non-CT clusters by a k-NN-based label propagation algorithm, i.e. a semi-supervised procedure [12,13,26] that was also employed in our recent paper [6].
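
      The k-NN-based label propagation step can be illustrated with a minimal sketch. It assumes Euclidean distance, k = 3 and a simple majority vote; the toy points, the cluster names and the function `knn_propagate` are hypothetical and are not taken from the analysis pipeline.

```python
from collections import Counter
import math


def knn_propagate(labeled_points, labels, query, k=3):
    """Give `query` the majority cluster label of its k nearest labeled points."""
    # Pair every labeled point's distance to the query with its label.
    dists = sorted(
        (math.dist(point, query), label)
        for point, label in zip(labeled_points, labels)
    )
    # Majority vote among the k closest neighbours.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: six features already assigned to two clusters.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
clusters = ["A", "A", "A", "B", "B", "B"]

# A new, unclustered feature inherits the label of its neighbourhood.
assigned = knn_propagate(points, clusters, (0.15, 0.15), k=3)
```

      Because the clusters are built without the new feature, the propagated label shows where that feature falls in the existing structure without letting it shape the structure.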

      6) In Figure 6, the criteria for the low, medium, and high-risk subsets are unclear. Is this high risk for persistent functional abnormality, radiologic abnormality, or both? Why were 3 sub populations selected? Was this done subjectively based on the clustering algorithm?

      This is an important issue. The study subject clusters were named according to the increasing frequency of any radiological lung abnormalities in the respective cluster (Figure 8A). We stress this more clearly in the revised manuscript. In addition, as suggested by the reviewer above, we show the frequency of functional lung impairment and persistent symptoms in the study participant clusters. There are multiple criteria for choice of the optimal clustering algorithm and the optimal number of clusters. In our cohort, two criteria for the choice of optimal clustering algorithm were applied:

      1. High fraction of the data set variance ‘explained’ by the cluster assignment (ratio of between-cluster sum-of-squares to the total sum-of-squares, Figure 6 – figure supplement 1A and Figure 7 – figure supplement 1A)
      2. The relatively highest cluster stability or reproducibility of the clustering structure in 20-fold cross-validation (Figure 6 – figure supplement 1B and Figure 7 – figure supplement 1B) [15]

      The optimal number of clusters of the study participants based on non-CT study variables was determined by the algorithm (SOM + hierarchical clustering, see Reviewer 2, Issue 4) [17,18], as is usual in the unsupervised or semi-supervised setting. The prime criterion for the optimal cluster number was the bend of the curve of within-cluster sum-of-squares versus cluster number as presented in Figure 7 – figure supplement 1D. In addition, this decision was supported by a visual analysis of the SOM node dendrogram (Figure 7 – figure supplement 1E) and the curve of the cross-validated stability statistic (classification error) vs cluster number (Figure 7 – figure supplement 1F) [15].
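
      The 'explained' variance criterion can be written out in a few lines. The sketch below computes the ratio of between-cluster to total sum-of-squares for a given cluster assignment; the four toy points and the function name are illustrative assumptions.

```python
def explained_variance(points, assignment):
    """Ratio of between-cluster sum-of-squares to total sum-of-squares."""
    dim = len(points[0])

    def sum_of_squares(rows):
        # Squared Euclidean distances of the rows to their own centroid.
        centroid = [sum(r[d] for r in rows) / len(rows) for d in range(dim)]
        return sum((r[d] - centroid[d]) ** 2 for r in rows for d in range(dim))

    clusters = {}
    for point, cluster in zip(points, assignment):
        clusters.setdefault(cluster, []).append(point)

    total_ss = sum_of_squares(points)
    within_ss = sum(sum_of_squares(members) for members in clusters.values())
    # Between-cluster SS is what remains after removing within-cluster SS.
    return (total_ss - within_ss) / total_ss


# Hypothetical toy data: two well-separated pairs of observations.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
ev_two_clusters = explained_variance(points, [0, 0, 1, 1])
ev_one_cluster = explained_variance(points, [0, 0, 0, 0])
```

      The within-cluster sum-of-squares computed inside the function is the quantity whose curve against cluster number gives the 'bend' criterion.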

      7) The accuracy and sensitivity of the machine learning approaches shown in S5 & S6 are somewhat limited. Please comment on why such highly granular data can only provide limited prediction about degree of lung damage post infection. Are there missing data types that might make the algorithm more predictive?

      This is an important issue that deserves more discussion in the revised manuscript. Each of the machine learning classifiers presented in the previous and the revised version of the manuscript was extremely sensitive and specific at predicting the outcomes in the training data encompassing the entire cohort (Supplementary Figure S11), as expected. However, their performance was considerably worse in repeated holdout (previous version) or 20-fold cross-validation (revision, Figure 9), used here as surrogate tools to check the sensitivity and specificity with ‘unseen’ test data. We believe that there are two prime sources of such suboptimal performance: the size of the training set and the choice of the classifier. To address the first limitation, the following alterations to the analysis pipeline were introduced:

      1. We do not restrict the analysis to the subset of the CovILD study with the complete set of all variables. Instead, the non-missingness criterion is applied to each outcome variable separately (any CT abnormalities: n = 109, moderate-to-severe abnormalities: n = 109, lung function impairment: n = 111, persistent symptoms: n = 133).
      2. We altered the internal validation strategy. Instead of the repeated holdout approach applied to the machine learning classification, which strongly limits the size of the training data set, we switched to 20-fold cross-validation both for the cluster algorithms (Figure 6 – figure supplement 1BD and Figure 7 – figure supplement 1BF) [15] and the machine learning models (Figure 9, Appendix 1 – table 5) [19].

      To address the second issue, the following changes were introduced:

      3. We compare the performance of a broader set of classifiers representing different classes of machine learning algorithms provided by the R package caret [19] (tree model: C5.0 [20], bagged tree model: Random Forests [21], support vector machines with radial kernel [22], shallow neural network: nnet [23], and elastic net regression: glmnet [24]) (Figure 9, Appendix 1 – table 4).
      4. We developed a model ensemble representing a linear combination of the classifiers presented above, using the elastic net regression algorithm (Figure 9, Figure 9 – figure supplement 2) and tools provided by the caretEnsemble package [25]. This model displayed better performance at predicting any CT abnormalities and persistent symptoms than the single classifiers (Figure 9, Appendix 1 – table 5).

      Finally, we agree with the Reviewer that the input variable set, despite its size, was still not complete. We believe that inclusion of other inflammatory markers recorded during acute COVID-19 and at the 60-day follow-up may additionally improve the prediction of the radiological abnormalities at the 6-month follow-up visit. Of note, our data set lacked important readouts of cellular immunity such as neutrophil levels or neutrophil:lymphocyte ratio (NLR) and blood parameters for the mild COVID-19 subset. We discuss this issue in more detail in the revised Discussion section.
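
      The effect of switching to 20-fold cross-validation on the training set size can be sketched as follows. This is a plain index-splitting illustration, not the caret implementation; the seed and the function `k_fold_indices` are assumptions, and only n = 109 is taken from the response above.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists; every observation is tested exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for held_out, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train, test


# n = 109 is the number of participants with complete data for any CT abnormalities.
splits = list(k_fold_indices(109, 20))
smallest_training_set = min(len(train) for train, _ in splits)
```

      With 20 folds, each model is fitted on at least 103 of the 109 observations, whereas a holdout split reserves a fixed share of the cohort for testing in every repetition.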

      8) The authors state that "the sole application of a lung function measurement at screening for subjects at risk of delayed lung recovery may bear insufficient sensitivity". I am not sure that I agree with this assessment. From the perspective of a patient, full recovery of lung function with limited or no residual symptoms, even in the presence of residual chest CT abnormalities, seems like a favorable outcome. I would suggest either changing this statement or providing citations that associate residual chest CT abnormalities (in the absence of residual functional lung dysfunction) with adverse long-term outcomes. Do the authors hypothesize that persistent radiologic abnormalities may predate organizing pneumonia which will ultimately become symptomatic?

      We thank the reviewer for the interesting point of discussion. We agree with the reviewer that the functional status and symptom burden are of major importance for the individual patient in the post-acute phase of COVID-19. Still, prioritizing lung function over mild structural lung abnormalities may pose two major problems. First, as previously discussed, lung function testing has a rather low sensitivity to detect early ILD [10,11], is not a good prognostic marker for long-term clinical outcomes and may not correlate well with patients' symptom burden. For instance, a patient with a normal lung function status may still be highly symptomatic (e.g. due to reduced capacity of respiratory muscle function) [7] and/or demonstrate structural lung abnormalities (e.g. it has been shown for various ILD that lung function tests such as FVC and FEV1 may be normal even in pronounced disease and that lung function testing is not sufficient to rule out ILD [10]). Second, to date, it is not known if persistent structural lung abnormalities following COVID-19 (even when mild) are at risk of progressing at long-term follow-up. In particular, sub-clinical structural changes may behave like incidentally detected interstitial lung abnormalities (ILAs) and develop into symptomatic progressive fibrotic interstitial lung disease including IPF [11]. For this reason, we think that further pulmonary follow-up is necessary for patients with structural lung abnormalities due to COVID-19 and a sole focus on lung function is not sufficient to assess pulmonary COVID-19 outcomes [9].

      9) The authors note selection bias against ordering CT and perhaps inflammatory markers early during infection as a limitation. I would suggest a sensitivity analysis to understand whether this misclassification will impact the model's predictions.

      We now address this issue in a more detailed way. As shown in Figure 1, there was indeed a significant dropout of participants during the study due to missed longitudinal visits and missingness of the longitudinal variable set. This phenomenon was most evident for the mild COVID-19 patients, who lost interest in participation, most likely because of subjective complete convalescence. This issue is now discussed as a limitation in the revised manuscript. In the revised manuscript, we investigated highly influential factors for clustering and machine learning classifiers. To determine which variables played the most important role for the clustering of the study individuals, we applied the explanatory variable ‘noising’ procedure initially described by Breiman for the random forest algorithm [21] and compared the ‘explained’ variance (ratio of between-cluster sum-of-squares to the total sum-of-squares) of the initial clustering structure with the clustering structures generated in the datasets with noised variables. Although this algorithm is not free from shortcomings such as blindness to tight correlations, it may provide a coarse measure of the variable’s impact on the cluster formation (Figure 7 – figure supplement 2). For three of the machine learning algorithms tested, importance statistics were extracted from the models: (1) for the C5.0 algorithm, the percentage of variable usage in the decision tree, (2) for the Random Forests algorithm, the delta of Gini index obtained by variable noising [21] and (3) for the elastic net/glmnet procedure, the absolute values of regression coefficients β [24] (Figure 9 – figure supplement 4 – 7). The technical details are provided in Methods, and the cluster and model importance data are discussed in the manuscript text.
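
      The variable ‘noising’ idea can be sketched as below. This is a simplified illustration under stated assumptions: each variable is noised by random permutation, the cluster assignment is kept fixed rather than refitted on the noised data, and the toy data, seed and function names are hypothetical.

```python
import random


def explained_variance(rows, labels):
    """Ratio of between-cluster to total sum-of-squares."""
    dim = len(rows[0])

    def sum_of_squares(rs):
        centroid = [sum(r[d] for r in rs) / len(rs) for d in range(dim)]
        return sum((r[d] - centroid[d]) ** 2 for r in rs for d in range(dim))

    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    total = sum_of_squares(rows)
    within = sum(sum_of_squares(g) for g in groups.values())
    return (total - within) / total


def noising_importance(rows, labels, n_repeats=20, seed=0):
    """Mean drop in explained variance after permuting each variable in turn."""
    rng = random.Random(seed)
    baseline = explained_variance(rows, labels)
    importance = []
    for d in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[d] for r in rows]
            rng.shuffle(column)  # 'noise' variable d only
            noised = [r[:d] + (v,) + r[d + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - explained_variance(noised, labels))
        importance.append(sum(drops) / n_repeats)
    return importance


# Hypothetical toy data: variable 0 separates the clusters, variable 1 does not.
rows = [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0), (0.0, 4.0),
        (10.0, 1.0), (10.0, 2.0), (10.0, 3.0), (10.0, 4.0)]
labels = ["c1"] * 4 + ["c2"] * 4
importance = noising_importance(rows, labels)
```

      Permuting the separating variable lowers the explained variance, while permuting the uninformative one does not, so the first entry of `importance` is the larger one.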

      References

      1. Gutmann C, Takov K, Burnap SA, et al. SARS-CoV-2 RNAemia and proteomic trajectories inform prognostication in COVID-19 patients admitted to intensive care. Nat Commun 2021;12. doi:10.1038/S41467-021-23494-1
      2. Benito-León J, Castillo MD Del, Estirado A, et al. Using Unsupervised Machine Learning to Identify Age- and Sex-Independent Severity Subgroups Among Patients with COVID-19: Observational Longitudinal Study. J Med Internet Res 2021;23. doi:10.2196/25988
      3. Demichev V, Tober-Lau P, Lemke O, et al. A time-resolved proteomic and prognostic map of COVID-19. Cell Syst 2021;12:780. doi:10.1016/J.CELS.2021.05.005
      4. Estiri H, Strasser ZH, Brat GA, et al. Evolving phenotypes of non-hospitalized patients that indicate long COVID. BMC Med 2021;19. doi:10.1186/S12916-021-02115-0
      5. Sudre CH, Murray B, Varsavsky T, et al. Attributes and predictors of long COVID. Nat Med 2021;27. doi:10.1038/s41591-021-01292-y
      6. Sahanic S, Tymoszuk P, Ausserhofer D, et al. Phenotyping of acute and persistent COVID-19 features in the outpatient setting: exploratory analysis of an international cross-sectional online survey. Clin Infect Dis Published Online First: 26 November 2021. doi:10.1093/CID/CIAB978
      7. Hui DS, Wong KT, Ko FW, et al. The 1-Year Impact of Severe Acute Respiratory Syndrome on Pulmonary Function, Exercise Capacity, and Quality of Life in a Cohort of Survivors. Chest 2005;128:2247–61. doi:10.1378/CHEST.128.4.2247
      8. Ng CK, Chan JWM, Kwan TL, et al. Six month radiological and physiological outcomes in severe acute respiratory syndrome (SARS) survivors. Thorax 2004;59:889–91. doi:10.1136/THX.2004.023762
      9. Raghu G, Wilson KC. COVID-19 interstitial pneumonia: monitoring the clinical course in survivors. Lancet Respir Med 2020;8:839–42. doi:10.1016/S2213-2600(20)30349-0
      10. Suliman YA, Dobrota R, Huscher D, et al. Pulmonary function tests: High rate of false-negative results in the early detection and screening of scleroderma-related interstitial lung disease. Arthritis Rheumatol 2015;67:3256–61. doi:10.1002/ART.39405/ABSTRACT
      11. Hatabu H, Hunninghake GM, Richeldi L, et al. Interstitial lung abnormalities detected incidentally on CT: a Position Paper from the Fleischner Society. Lancet Respir Med 2020;8:726. doi:10.1016/S2213-2600(20)30168-5
      12. Leng M, Wang J, Cheng J, et al. Adaptive semi-supervised clustering algorithm with label propagation. J Softw Eng 2014;8:14–22. doi:10.3923/JSE.2014.14.22
      13. Lelis L, Sander J. Semi-supervised density-based clustering. Proc - IEEE Int Conf Data Mining, ICDM 2009;:842–7. doi:10.1109/ICDM.2009.143
      14. Huang C, Huang L, Wang Y, et al. 6-month consequences of COVID-19 in patients discharged from hospital: a cohort study. Lancet 2021;397:220–32. doi:10.1016/S0140-6736(20)32656-8
      15. Lange T, Roth V, Braun ML, et al. Stability-Based Validation of Clustering Solutions. Neural Comput 2004;16:1299–323. doi:10.1162/089976604773717621
      16. Hartigan JA, Wong MA. Algorithm AS 136: A K-Means Clustering Algorithm. Appl Stat 1979;28:100. doi:10.2307/2346830
      17. Kohonen T. Self-Organizing Maps. Berlin, Heidelberg: Springer Berlin Heidelberg 1995. doi:10.1007/978-3-642-97610-0
      18. Vesanto J, Alhoniemi E. Clustering of the self-organizing map. IEEE Trans Neural Networks 2000;11:586–600. doi:10.1109/72.846731
      19. Kuhn M. Building predictive models in R using the caret package. J Stat Softw 2008;28:1–26. doi:10.18637/jss.v028.i05
      20. Quinlan JR. C4.5: Programs for Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc. 1993. doi:10.5555/152181
      21. Breiman L. Random forests. Mach Learn 2001;45:5–32. doi:10.1023/A:1010933404324
      22. Weston J, Watkins C. Multi-Class Support Vector Machines. 1998.
      23. Ripley BD. Pattern recognition and neural networks. Cambridge University Press 2014. doi:10.1017/CBO9780511812651
      24. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw 2010;33:1–22. doi:10.18637/jss.v033.i01
      25. Deane-Mayer ZA, Knowles JE. Ensembles of Caret Models [R package caretEnsemble version 2.0.1]. 2019. https://cran.r-project.org/package=caretEnsemble (accessed 13 Dec 2021).
      26. Glennan T, Leckie C, Erfani SM. Improved Classification of Known and Unknown Network Traffic Flows Using Semi-supervised Machine Learning. Lect Notes Comput Sci (including Subser Lect Notes Artif Intell Lect Notes Bioinformatics) 2016;9723:493–501. doi:10.1007/978-3-319-40367-0_33
      27. Sonnweber T, Sahanic S, Pizzini A, et al. Cardiopulmonary recovery after COVID-19 - an observational prospective multi-center trial. Eur Respir J Published Online First: 10 December 2020. doi:10.1183/13993003.03481-2020
    1. Author Response:

      Reviewer #1:

      Weaknesses:

      For me, most of the weaknesses of this manuscript are related to the cluster detection:

      1. There is no consensus on the definition of transmission clusters in the field. However, the rationale for taking the union (rather than the intersection) of two different methods (HIV-TRACE and Cluster Picker) did not become clear to me.

      2. HIV-TRACE defines clusters based on pairwise genetic distances and cluster picker identifies clusters using pairwise genetic distance with the guidance of a phylogenetic tree (and node support / bootstrap values). Given the underlying sample size and that the phylogeny was constructed already, the rationale for the purely distance related criterion of HIV-TRACE did not become clear.

      We thank the reviewer for their comments and are happy to provide additional results that motivate our decision to use the union of clusters detected with HIV-TRACE and Cluster Picker to estimate HIV transmissions within and between demographic sub-groups in the Botswana - Ya Tsie trial population. The primary motivation was that a filtering step was required to save time and computational resources from evaluating sequences that were too distantly related, before applying the “gold standard” of Phyloscanner to detect directed (when possible) transmission pairs. Accordingly, clustering algorithms plus a distance threshold helped to achieve this filtering. Because we shared what we take to be the reviewers’ concerns about either of the algorithms alone, we sought to maximize the number of transmission pairs that could be identified between participants in the Botswana – Ya Tsie trial with Phyloscanner by using the union of clusters detected with HIV-TRACE and Cluster Picker. This also served as a sensitivity analysis that allowed us to evaluate the extent to which the clustering patterns observed were specific to a single algorithm.

      Furthermore, a previous study done by Rose and colleagues (PMID: 27824249) to compare the number and size of clusters identified with HIV-TRACE and Cluster Picker clustering algorithms revealed that HIV-TRACE generally identified larger but fewer clusters, compared with clusters identified with Cluster Picker that were typically more numerous and mostly small 2-person clusters (Please see Figure 3B below extracted from Rose and colleagues (PMID: 27824249)). This suggested that HIV-TRACE would be helpful in detecting potentially larger transmission chains and Cluster Picker would be valuable in revealing potential transmission events between pairs of individuals.

      Of the 236 genetic clusters detected with the two algorithms, we identified 19 full or partial clusters (including 41 sequences) that included members that were only detected with HIV-TRACE and 122 full or partial clusters (including 242 sequences) that were unique to Cluster Picker. Moreover, of the 82 directed male-female transmission pairs inferred from the sample, 5 were from genetic clusters that were unique to HIV-TRACE compared with 27 that were from clusters unique to Cluster Picker. Of the five transmission events unique to HIV-TRACE clusters, three occurred in intervention communities originating from control communities. By contrast, four of the twenty-seven transmission events unique to Cluster Picker clusters occurred in intervention communities originating from control communities.

      In summary, estimates of HIV transmissions in the trial population based only on the overlap of clusters detected with both HIV-TRACE and Cluster Picker would have excluded 32 of the 82 male-female pairs used for the primary analysis.
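
      The difference between the union and the intersection of the two cluster sets can be made concrete with a short set-based sketch. Only the counts (82 pairs, 5 unique to HIV-TRACE, 27 unique to Cluster Picker) come from the text above; the integer pair identifiers are hypothetical placeholders.

```python
# Hypothetical identifiers for the 82 directed male-female pairs.
all_pairs = set(range(82))
hivtrace_only = set(range(0, 5))        # 5 pairs from HIV-TRACE-only clusters
clusterpicker_only = set(range(5, 32))  # 27 pairs from Cluster Picker-only clusters
shared = all_pairs - hivtrace_only - clusterpicker_only

hivtrace = shared | hivtrace_only
clusterpicker = shared | clusterpicker_only

union_pairs = hivtrace | clusterpicker          # what the analysis used
intersection_pairs = hivtrace & clusterpicker   # what an overlap-only filter keeps
excluded_by_intersection = union_pairs - intersection_pairs
```

      Under these assumptions the intersection retains 50 pairs, so an overlap-only filter would discard 32 of the 82 pairs before Phyloscanner is applied.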

      1. For a phylogeny of this size it is feasible to calculate real bootstrap values instead of using (in my experience more liberal) Shimodaira-Hasegawa support values.

      We value the reviewer's suggestion and agree that real bootstrap values would be ideal. However, the likely benefit of computing the suggested bootstrap values and thereafter repeating the entire analysis inferring transmission pairs with Phyloscanner and estimating transmission flows would be minimal. As noted above, liberality in a filtering step is a virtue (it avoids filtering out pairs of interest) as long as it does not lead to an infeasibly large computational burden, which it did not in this case.

      1. In Supplementary Note 2.5 it is described how the linkage and direction of transmission score threshold of 57% was chosen. However, the finding that almost half of the accordingly selected probable source-recipient pairs were same-sex and had to be excluded from the analysis questions the reliability of the threshold.

      We apologize for the insufficient clarity in our description and would like the reviewer to kindly note that the threshold in and of itself is insufficient to distinguish between Female-Female pairs separated by a single Male intermediate, but rather by design can distinguish between direct Male-Female pairs and Male-Female pairs separated by several intermediates. Once again, the threshold was meant to be a filter that would allow us to run Phyloscanner on a feasible number of sequences, and it should therefore let through some pairs that are rejected by later steps in the pipeline. Also, kindly note that all previous Supplementary Notes are now presented in the methods section in line with the reviewer’s suggestions.

    1. Author Response:

      Reviewer #1 (Public Review):

      The lateral entorhinal cortex (LEC) receives direct inputs from the olfactory bulb (OB) but their odor response properties have not been well characterized despite a recent increase in interest in the role of LEC in olfactory behaviors. In this study, Bitzenhofer and colleagues provide unprecedented details of odor response properties of layer 2 cells in LEC. The authors first show that LEC neurons respond to odors with a rapid burst of activity time-locked to inhalation onset, similarly to the piriform cortex (PCx), but distinct from the OB. Firing rates of LEC ensembles conveyed information about odor identity whereas the timing of spikes conveyed odor intensity. The authors then examined the difference between two major cell types in LEC layer 2 - fan cells and pyramidal neurons, and found that, on average, fan cells responded earlier than pyramidal neurons, and pyramidal neurons, but not fan cells, changed their peak timing in response to changes in concentrations, providing a basis for temporal coding of odor concentrations. Additionally, the authors show that inactivation of LEC impairs odor discrimination based on either identity or intensity, and demonstrate different cellular properties of fan cells and pyramidal neurons. Finally, the authors also examined the odor response properties of hippocampal CA1 neurons, and showed that odor identity can be decoded by firing rate responses, while decoding of odor concentration depended on spike timing.

      The authors performed a large number of experiments, and provide an impressive set of data regarding odor response properties of LEC layer 2 neurons in a cell type specific manner. The results reported are very interesting, and will be a point of reference for future studies on odor coding and processing in the LEC. The manuscript is clearly written, and data are well analyzed and presented clearly. I have only relatively minor concerns or suggestions.

      1. The authors infer the time at which "mice could discriminate odors" from the time at which d-prime becomes significantly different between baseline and odor stimulation conditions (line 111 and line 121). However, the statistical test applied to these data does not guarantee that an observer can accurately discriminate odors. For example, a small p-value can be obtained even when discrimination accuracy is only slightly above chance if there are many trials. The statement such as "mice could discriminate two odors by as early as 225 ms after inhalation onset" (line 111) can be misleading because this might sound as if mice can accurately discriminate odors at this timepoint, while this is not necessarily the case (as indicated by the d-prime value).

      We have added plots of performance accuracy over time under control conditions (LED off) to Figure 2-supplement 1. These plots of fraction of correct responses (binned every 50 ms) show that mice (n = 6) are making choices significantly different from chance within 200 ms of odor inhalation. We changed the wording in the Results to now say: “Moreover, by analyzing lick timing, we determined that the discriminability measure d’ became significantly different under control conditions as early as 225 ms after inhalation onset and performance accuracy increased within 200 ms of inhalation (Fig. 2b, Figure 2-supplement 1).”
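
      The reviewer's distinction between a statistically detectable d’ and accurate discrimination can be illustrated numerically. The sketch assumes a standard signal-detection definition, d’ = z(hit rate) − z(false-alarm rate), with a log-linear correction; the trial counts are hypothetical, and this is not necessarily how d’ was computed in the manuscript.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate), with a log-linear correction."""
    # Adding 0.5 to each count keeps the rates away from exactly 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts: 90% correct on 100 trials vs 52% correct on 20,000 trials.
d_accurate = d_prime(45, 5, 5, 45)
d_near_chance = d_prime(5200, 4800, 4800, 5200)
```

      With many trials, a d’ as small as the second value can differ significantly from baseline even though accuracy is barely above chance, which is why the accuracy plots complement the d’ analysis.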

      1. Optogenetic identification can be a little tricky when identifying excitatory neurons as in this study. Please discuss the rationale or difficulty regarding how to distinguish those that are activated directly by light from those activated indirectly (i.e. synaptically). Do the results hold if the authors use only those cells whose identification they are more confident about?

      We only used the cells that were confidently identified using a combination of two criteria. First, tagged cells had to show a significant increase in firing (p_Rate<0.01) during the 5 ms LED illumination period versus 100 randomly selected time windows before LED stimulation. Cells also had to respond with a fixed latency to reduce the chance of including cells recruited by polysynaptic excitation. Further, we used the stimulus associated spike latency test (SALT) as detailed in Kvitsiani et al., 2013. To be judged as tagged, units had to show significantly less spike jitter during the 5 ms LED illumination than 100 randomly selected time windows before LED stimulation (p_SALT<0.01). Only those cells with BOTH p_Rate<0.01 and p_SALT<0.01 were considered as tagged (both methods typically agreed for most cells). Moreover, slice work testing synaptic connections between LEC layer 2 cells found extremely low levels of connectivity between fan and pyramidal cells (Nilssen et al., J. Neuroscience, 2018). This makes it unlikely that LED-induced firing of fan or pyramidal cells would recruit indirectly (synaptically) excited cells.

      1. The authors sort odor response profiles by peak timing, and indicate that odor responses peak at different timing that tiles respiration cycles. However, this analysis does not indicate the reliability of peak timing. Sorting random activity by "peak timing" could generate similar figure. One way to show the reliability or significance of peaks is to cross-validate. For instance, one can use a half of the trials to sort, and plot the rest of the trials. If the peak timing is reliable, the original pattern will be replicated by the other half, and those neurons that are not reliable will lose their peaks. Please use such a method so that we can evaluate the reliability of peaks.

      We analyzed the data as suggested by this reviewer as shown below (Author response image 1). Plotting only the odd trials sorted by the odd trials in the dataset (top) looked identical to the data from all trials used in Figure 1g. More importantly, plotting only the even trials sorted by the odd trials (bottom), though noisier due to trial-by-trial variation, showed the same general structure of tiling throughout the respiration cycle for OB cells.

      Author response image 1
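
      The split-half check described above can be sketched as follows. The sort order is derived from the odd trials only and then applied to the held-out even trials. The toy responses (3 cells, 4 trials, 5 time bins) and the function names are hypothetical.

```python
def mean_response(trials):
    """Average a list of trials (each: cells x bins) into one cells x bins table."""
    n_cells, n_bins = len(trials[0]), len(trials[0][0])
    return [
        [sum(t[c][b] for t in trials) / len(trials) for b in range(n_bins)]
        for c in range(n_cells)
    ]


def peak_bin(row):
    """Index of the time bin with the largest mean response."""
    return max(range(len(row)), key=row.__getitem__)


def split_half_sort(trials):
    """Sort cells by peak time in odd trials; report their peaks in even trials."""
    odd, even = trials[0::2], trials[1::2]  # 1st, 3rd, ... vs 2nd, 4th, ...
    odd_mean, even_mean = mean_response(odd), mean_response(even)
    order = sorted(range(len(odd_mean)), key=lambda c: peak_bin(odd_mean[c]))
    return order, [peak_bin(even_mean[c]) for c in order]


# Hypothetical toy data: each cell has a reliable peak plus small baseline jitter.
peaks = [3, 0, 2]
trials = [
    [[5.0 if b == peaks[c] else 1.0 + 0.1 * ((t + c + b) % 3) for b in range(5)]
     for c in range(3)]
    for t in range(4)
]
order, held_out_peaks = split_half_sort(trials)
```

      For cells with reliable timing, the held-out peaks stay in ascending order; for cells whose 'peak' is noise, the ordering breaks down in the held-out half.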

      Reviewer #2 (Public Review):

      In this study, Bitzenhofer et al recorded odor-evoked activity in the LEC and examined the coding of odor identity and intensity using extracellular recordings in head-fixed mice, and used the standard suite of quantitative tools to interpret these data (decoding analyses, dimensionality reduction, etc). In addition, they performed behavioral experiments to show the necessity of LEC in odor identity and intensity discrimination, and deploy some elegant and straightforward 'circuit-busting' slice physiology experiments to characterize this circuit. Importantly, they performed some of their experiments in Ntng1-cre and Calb-cre mice, which allowed them to differentiate between the two major classes of LEC principal neurons, fan cells and pyramidal cells, respectively. Many of their results are contrasted with what has previously been observed in the piriform cortex (PCx), where odor coding has been studied much more extensively.

      Their major conclusions are:

      Cells in the LEC respond rapidly to odor stimuli. Within the first 300 ms after inhalation, odor identity is encoded by the ensemble of active neurons, while odor intensity (more specifically, responses to different concentrations) is encoded by the timing of the LEC response; specifically, the synchrony of the response. These coding strategies have been described in the PCx by Bolding & Franks. Bolding also found two populations of responses to different concentrations: one population of responses was rapid and barely changed with concentration and the second population of responses had onset latencies that decreased with increasing concentration. Roland et al also found two populations of responses using calcium imaging in anesthetized mice: one population of responses was concentration-dependent and another population was 'concentration-invariant'. However, neither Bolding nor Roland were able to determine whether these populations of responses emerged from distinct populations of cells. Here, the authors elegantly register these two response types in LEC to different cell types: fan cells respond early and stably, and pyramidal cells response latencies decrease with concentration. This is a novel and important finding. They also showed that, unlike PCx or LEC where concentration primarily affects timing rather than rate/number, odor concentration in CA1 is only reflected in the timing of responses.

      Using optogenetic suppression of LEC in a 2AFC task, the authors purport to show that LEC is required for both the discrimination of odor identity and odor intensity. If true, this is an important result, but see below.

      In slice experiments, the authors characterize the differential connectivity of fan and pyramidal cells to direct olfactory bulb input, input from PCx, and inhibitory inputs from SOM and PV cells. This work is elegant, novel, and important, although it is a little out of place in this manuscript. As such, their findings are irrelevant/orthogonal to the rest of the results in this study. But fine.

      The simultaneous recordings from three different stations along the olfactory pathway are impressive.

      Major concern

      My major concern with this manuscript regards the behavioral experiments. The authors show that blue light over the LEC in GAD2-Cre/Ai32 mice completely abolishes (i.e. to chance) the mouse's ability to perform a 2AFC task discriminating between either two different odorants or one odorant at different concentrations. Their interpretation is that LEC is required for rapid odor-driven behavior. The sensory component of the task is so easy, and the effect is so striking that I find this result surprising and almost too good to be true. The authors do control for a blue-light distraction effect by repeating the experiments in mice that don't express ChR2, but do not control for the effect of rapidly shutting down a large part of the sensory/limbic system. If they did this experiment in the bulb I would be impressed with how clean the result was but not conceptually surprised by the outcome. I think a different negative control is needed here to convince me that the LEC is necessary for this simple sensory discrimination task. For example, the authors could activate all the interneurons (i.e. use this protocol) in another part of the brain, ideally in the olfactory pathway not immediately upstream of the LEC, and show that the behavior is not affected.

      This reviewer suggests a negative control experiment for the effects we observe on behavior when optogenetically silencing LEC. However, we disagree that it would be informative to silence other olfactory pathways in search of those that do not affect behavior. Our strong effects on behavior are also in complete agreement with recent findings that muscimol inactivation of LEC abolishes discrimination of learned odor associations (Extended Data Figure 8, Lee et al., Nature, 2021).

      More specifically, both the presentation and the interpretation of the data are confusing. First, there is a lack of detail about the behavioral task. I was not sure exactly when the light comes on and goes off, when the cue was presented, and when the reward was presented. In the manuscript they say (line 108) "…used to suppress activity during odor delivery on a random subset…". There is nothing more about this in the figure legend or Methods. The only clue to this is the dotted line in the 'LED On' example at the bottom of Fig. 2a. The authors also say that (line 660) "Trials were initiated with a 50 ms tone." When exactly was the tone presented? In the absence of any other information, I assume it was presented at odor onset. When was the reward presented? Lines 106-7 say "Mice were free to report their choice (left or right lick) at any time within 2 s of odor onset." Presumably this means the reward was presented to one of the ports for 2 seconds, starting at odor onset.

      The LED is applied during odor delivery, the 50 ms tone immediately precedes odor delivery, and water reward is dispensed after the first lick at the correct lick port during the choice period. The choice period begins with the odor onset and odor delivery is terminated by the first lick at either the correct or incorrect port. If there is no lick at either port, odor delivery lasts 1 s and is followed by an extended choice period (terminated by correct or incorrect lick) lasting 1 s. To clarify the behavior protocol, we have included a schematic of the trial structure in Figure 2-supplement 1.
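
      The trial structure described above can be sketched as a small function. This is only an illustration under stated assumptions: the function name, the port labels, and the returned fields are hypothetical, times are measured from odor onset, and premature licks (before odor onset) are not modeled.

```python
from typing import Optional

ODOR_MAX_S = 1.0         # odor lasts 1 s if the mouse does not lick
EXTENDED_CHOICE_S = 1.0  # extra choice time after odor offset

def run_trial(first_lick_s: Optional[float],
              lick_port: Optional[str],
              correct_port: str) -> dict:
    """Classify one trial; times are in seconds from odor onset.

    Hypothetical helper, not the authors' acquisition code.
    """
    choice_window_s = ODOR_MAX_S + EXTENDED_CHOICE_S
    # No lick inside the choice window: the odor ran its full duration.
    if first_lick_s is None or first_lick_s > choice_window_s:
        return {"outcome": "miss", "odor_duration_s": ODOR_MAX_S,
                "rewarded": False}
    # The first lick at either port terminates odor delivery.
    odor_duration_s = min(first_lick_s, ODOR_MAX_S)
    correct = lick_port == correct_port
    # Water is dispensed only after a first lick at the correct port.
    return {"outcome": "correct" if correct else "error",
            "odor_duration_s": odor_duration_s,
            "rewarded": correct}
```

      For example, a lick at the wrong port 1.5 s after odor onset falls in the extended choice period, so the odor has already run for its full 1 s and no reward is given.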

      These details matter because the authors want to claim that "LEC is essential for rapid odor-driven behavior." The data presented in support of this claim are (1) that mice perform this task at chance levels in LED On trials, presumably based on which port the mouse licked first (this is the 'essential' part), and (2) that in control LED Off trials, d' becomes statistically different from baseline after ~200 ms (this is the 'rapid' part).

      To further support the argument that LEC is required for rapid odor-driven behavior, we now show a plot of % correct responses over time from first odor inhalation.
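
      For readers less familiar with the d' measure mentioned in the reviewer's comment, a minimal sketch follows. It assumes the conventional signal-detection definition, d' = z(hit rate) - z(false-alarm rate); how d' was actually computed in the manuscript is not stated here. The mapping of licks onto "hits" and "false alarms", and the clamping constant `eps`, are illustrative assumptions.

```python
from statistics import NormalDist

def d_prime(hit_rate: float, false_alarm_rate: float,
            eps: float = 0.01) -> float:
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    In a two-choice task one might treat left licks on left-odor
    trials as hits and left licks on right-odor trials as false
    alarms (an assumption, not taken from the manuscript).
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    # Clamp rates away from 0 and 1, where the z-transform diverges.
    h = min(max(hit_rate, eps), 1.0 - eps)
    f = min(max(false_alarm_rate, eps), 1.0 - eps)
    return z(h) - z(f)
```

      Under this definition, chance performance (equal hit and false-alarm rates) gives d' = 0, which is the baseline against which a divergence would be measured.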

      On first reading, these suggested that shutting off LEC makes odor discrimination worse and/or slower. However, the supplementary data clarifies several things. First, the mice never Miss (Fig.2S.2a & c), meaning that they always lick. Second, in LED Off trials (F2S2 & e), the mice make few mistakes, and these only occur immediately after inhalation, presumably meaning the mice occasionally guess, possibly in response to the auditory cue. Thus, the mean time to lick is much shorter for Error trials than Correct trials. To state the obvious, the mice often wait >300 ms before they lick, and when they do wait, they never make mistakes. Now, in the LED On trials, the mice almost always lick within the first 300 ms and perform at chance levels, with the distribution of lick times for Correct and Error trials almost overlapping. In fact, although the authors claim LEC is required for rapid odor discrimination, the mean time to lick on Correct trials appears to decrease in LED On trials. This makes me think that the mice are making ballistic guesses in response to the tone in LED On cases, which doesn't necessarily implicate a dependence on LEC for odor discrimination.

      We do not believe that mice are making ballistic guesses in response to the tone for LED on trials. First, although a 50 ms tone immediately precedes odor delivery, all data in Figure 2-supplement 1 shows lick times aligned to the first inhalation of odor. Thus, time 0 ms is not the tone or subsequent odor onset but rather a variable time point coinciding with the first odor inhalation (the delay from odor onset to first inhalation is ~300 ms, the average respiration interval under our conditions). In fact, we excluded trials if mice made premature licks between the time of odor onset and first odor inhalation. We re-analyzed these trials to test the reviewer’s idea that mice were more likely to make fast ballistic guesses when the LEC was silenced. However, we saw no evidence that mice made more premature licks in trials with LED on (Author response image 2).

      Author response image 2

      The authors' interpretation of their data would be more solid if, for example, there were a delay between the auditory cue and odor delivery and/or if the reward was only available with some delay after the odor offset. Here, however, it seems just as likely as not that the mice are making ballistic guesses in response to the tone in LED On cases, which doesn't necessarily involve dependence on LEC for odor discrimination. Here, the divergence of d' from baseline in the control (i.e. LED Off) condition seems mostly because mice take longer to correctly discriminate under control conditions. While this is not formally contradictory to "LEC is essential for rapid odor-driven behavior", it is nevertheless a bit contrived and misleading. An interesting (thought) experiment is what would happen if the authors presented a tone but no odor. I would guess that the mice would continue licking randomly in Light On trials.

      While a delay between odor delivery and reward would have been useful for some aspects of interpreting the behavior, we would have lost the ability to examine the role of LEC in response timing. To address this reviewer’s concern, we have added a section to the Discussion mentioning caveats related to the interpretation of experiments using acute optogenetic silencing to understand behavior.

    1. Author Response:

      Reviewer #1 (Public Review):

      This article by H. Izgi et al. describes interesting work measuring transcriptional changes through development and later aging. The authors broadly conclude that these tissue transcriptomes diverge during development, but re-converge during aging. They name this expression pattern divergence convergence, or DiCo.

      After drawing this conclusion from tissue samples drawn from 16 mice of their own, they look at published mouse and human transcriptomic data and observe similar patterns of change.

      Overall the authors emphasize that both highly mitotic and less mitotic tissues show examples of the DiCo transcriptional pattern, supporting the possibility that this may be a general phenomenon.

      In addition, the authors ask whether the tissue-specific changes they observe might depend on changes in cell composition within tissues, or cell autonomous transcriptional changes within cells, using published single-cell data. They conclude here that both play a role.

      Some of the more specific findings are not surprising and in this respect support the soundness of parts of the methodology, e.g. that shared developmentally down-regulated genes were enriched in functions such as cell cycle and cell division.

      My largest suggestion centers around an alternative hypothesis that may occur to readers; namely that the convergence or Co part of DiCo could be just regression to a mean due to heteroscedasticity with respect to time (age) caused by increased noise in expression. As the divergence could be imagined to be largely due to tissue differentiation during development, which has been studied extensively previously, the overall novelty of these findings relies much more on the later convergence that the authors have observed. The authors note: "Interestingly, we found no overlap between gene sets with the reversal pattern (up-down or down-up genes) across tissues, relative to random expectation". They also note "Intriguingly, we found that similar cell types (i.e. those with the highest correlations) among tissues become less similar with age (36/54 [67%] of pairwise comparisons, Figure 5-source data 1). On the contrary, the most distinct cell types (i.e. those with the lowest correlations) among tissues become more similar with age (45/54 [83%], Figure 5-source data 1).", which is at first glance consistent with this alternative hypothesis. The authors do directly address previous observations of increased noise with age in their Discussion (Bahar et al. 2006; Martinez-Jimenez et al. 2017; Angelidis et al. 2019; Somel et al. 2006), although I might also suggest perhaps PMID: 20832724, PMID: 8604994, and PMID: 28965763. Their acknowledgment refers to the disagreement of their own findings of inter-tissue correlation distributions being modest and comparable between aging and development in Figure 1c. Their CoV trajectory data in Figure 2, perhaps most relevant here in Figure 2c, may also speak to this issue. Nevertheless, in my opinion it would strengthen the manuscript greatly for many readers if this alternative hypothesis were more explicitly and clearly spelled out, and then perhaps more explicitly ruled out, in the manuscript.

      We thank the reviewer for pointing out this interesting possibility, i.e. that increased expression heterogeneity during ageing (heteroscedasticity) may cause the observed DiCo pattern. Heteroscedasticity can occur at two different levels: inter-individual (Somel et al., 2006) or inter-cellular (Bahar et al., 2006). Here we only have enough power to test whether inter-individual heterogeneity may contribute to DiCo. We used two heteroscedasticity tests. In both cases we compared i) genes with DiCo pattern and ii) genes with DiDi pattern (divergent throughout the lifetime). The hypothesis was that if heteroscedasticity has a role in DiCo, DiCo genes should show stronger heteroscedasticity than DiDi genes.

      In the first approach, we followed the method we used to measure heteroscedasticity in (Işıldak et al., 2020) and (Kedlian et al., 2019). We first fit a linear model between age (log2 scale) and the expression level of each gene. Then, we calculated Spearman’s correlation between the absolute residuals from this model and age. We found that DiCo and DiDi genes are not significantly different in terms of their effect sizes in heteroscedasticity in any of the tissues (two-sided KS test, p>0.05 in all tissues, Figure 2-figure supplement 15a).
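
      The first approach can be sketched for a single gene as follows. This is a minimal pure-Python illustration, not the authors' pipeline: the function names are hypothetical, ages are assumed to be positive numbers (e.g. days), and the comparison of effect sizes between DiCo and DiDi genes with a KS test is not shown.

```python
import math

def linear_residuals(x, y):
    """Residuals of the ordinary least-squares line y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def spearman(a, b):
    """Spearman's rho is Pearson's r computed on the ranks."""
    return pearson(ranks(a), ranks(b))

def heteroscedasticity_effect(ages, expression):
    """Spearman correlation between |residuals| and log2(age)."""
    log_age = [math.log2(a) for a in ages]
    resid = linear_residuals(log_age, expression)
    return spearman([abs(r) for r in resid], log_age)
```

      A positive value indicates that the spread of expression around the age trend grows with age for that gene; a value near zero indicates no such trend.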

      In the second approach, we used the ‘ncvTest’ function from the ‘car’ package, which performs the Breusch-Pagan test for heteroscedasticity in a linear model. We compared the test statistics of the Breusch-Pagan test, i.e. the measure of heteroscedasticity of each gene in each tissue, between DiCo and DiDi genes. We found that the two gene sets do not significantly differ in heteroscedasticity in three of the tissues. The only exception was muscle; here, contrary to expectation under the alternate hypothesis, DiDi genes showed slightly higher heteroscedasticity (two-sided KS test, p=0.042, Figure 2-figure supplement 15b).

      We believe that the new results strengthen our conclusions and suggest the observed DiCo pattern is not an artefact of inter-individual heteroscedasticity. We have now updated the text to include these new analysis results and figures (Figure 2-figure supplement 15).
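
      To make the second approach concrete, the following is a minimal pure-Python sketch of a Breusch-Pagan-type score statistic for a single gene with one predictor. It is written to illustrate the idea and is an assumption about the form of the statistic (scaled squared residuals regressed on the fitted values, with half the regression sum of squares as the score); it is not a port of the R function, and the function names are hypothetical.

```python
import math

def ols(x, y):
    """Fitted values and residuals of the least-squares line y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)  # must be non-zero
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    fitted = [intercept + slope * a for a in x]
    resid = [b - f for b, f in zip(y, fitted)]
    return fitted, resid

def breusch_pagan_score(x, y):
    """Score statistic for variance that changes with the fitted values.

    Assumes a non-zero fitted slope and non-zero residuals.
    """
    fitted, resid = ols(x, y)
    n = len(x)
    sigma2 = sum(e * e for e in resid) / n   # ML estimate of the variance
    u = [e * e / sigma2 for e in resid]      # scaled squared residuals
    u_fitted, _ = ols(fitted, u)             # auxiliary regression
    mean_u = sum(u) / n
    reg_ss = sum((f - mean_u) ** 2 for f in u_fitted)
    return reg_ss / 2.0

def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))
```

      Larger statistics indicate stronger evidence that the residual variance changes along the fitted trend; comparing the distributions of such statistics between two gene sets is then a separate step.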

      Meanwhile, the above analysis does not test the possible relationship between heteroscedasticity and DiCo at the cellular level. Inter-cellular expression noise, when coupled with constraints on minimum and maximum expression levels, can theoretically lead to gene expression becoming more similar to the mean levels. In other words, the most cell-type-specific genes with the highest and lowest expression may attain lower or higher expression levels during ageing, simply due to increased expression noise during ageing. Such an effect could theoretically increase correlation among cell types. This model is in essence an alternative description of our “loss of cellular identity” model and elegantly links together two observations, inter-cellular heteroscedasticity and convergence.

      We thank the reviewer also for suggesting new references for increased noise with age. We have now updated the text to add those references.

      Reviewer #2 (Public Review):

      In this manuscript, Izgi et al. investigated age-dependent gene expression pattern changes in male mice by analyzing a new bulk RNA-seq dataset from four different tissues collected at different ages covering post-natal development and aging. Gene expression patterns observed before and after sexual maturation seem to suggest inter-tissue divergence and convergence of gene expression profiles, respectively. The authors name that phenomenon Divergence-Convergence or "DiCo". Analysis of publicly available single cell RNA-seq [scRNAseq] datasets (from the Tabula Muris Senis consortium) suggests that such gene expression pattern changes may be explained by both alterations in tissue cell type composition, as well as by cell-autonomous expression changes. These observations may suggest that aging results in at least a partial loss of tissue identity acquired developmentally.

      Although the authors report an intriguing finding, there are major issues in the manuscript as it stands, notably concerning the clarity and rigor of the data analysis and manuscript. Notably, the authors compare expression levels across samples using the FPKM normalization method, which has been shown to be a problematic metric. There are also inconsistencies in statistical and methodological choices for which there is not a clear rationale explained in the manuscript. Finally, the authors use only male animals, which may not reflect age-related trajectories in female animals, but draw broad cross-species conclusions without raising sex as a caveat to the generalization of the conclusions.

      We thank the reviewer for their careful reading and we are happy to hear that they found the results intriguing. Following the reviewer’s criticism, we carefully re-wrote the Methods section. We hope that the reviewer will now agree that the problem in our first submission was a lack of textual clarity rather than a methodological flaw. With regard to the comment about normalisation: we do not only use FPKM in our analyses (which, as the reviewer suggests, is an intra-sample normalisation), but we also apply quantile normalisation, which is an inter-sample normalisation method. We have now clarified this aspect in the text. In addition, we repeated the main analyses using VST, which is another inter-sample normalisation that is implemented in the widely used DESeq2 pipeline. This confirmed our main conclusions. In the main text we retained the results based on quantile normalisation, but also report the VST-based analyses for confirmation.
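
      As a brief illustration of what an inter-sample normalisation does, here is a minimal pure-Python sketch of quantile normalisation. It is a generic textbook version, not the authors' code: the function name is hypothetical, every sample is assumed to contain the same genes in the same order, and tied values are simply ranked in their order of appearance rather than averaged.

```python
def quantile_normalise(samples):
    """Force every sample to share the same distribution of values.

    `samples` is a list of per-sample lists, one value per gene.
    """
    n_genes = len(samples[0])
    n_samples = len(samples)
    # Sort each sample, then average across samples at every rank.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[r] for s in sorted_samples) / n_samples
        for r in range(n_genes)
    ]
    normalised = []
    for s in samples:
        # Gene indices ordered from lowest to highest value in this sample.
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]  # replace value by the rank mean
        normalised.append(out)
    return normalised
```

      After this step every sample contains exactly the same set of values, so differences between samples reflect only the ranking of genes, which is what makes expression levels comparable across samples.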

      We also thank the reviewer for their comment on the potential sex-dimorphism of the observed phenomenon, which we had not considered before. Our samples are indeed all-male, whereas the additional dataset from Jonker et al. is composed of only female individuals, and notably, inter-tissue convergence during ageing was also observed in this dataset. Additionally, the GTEx data covers both female and male samples in humans and also suggests a trend towards inter-tissue convergence during ageing. While we observe DiCo in both sexes, it is still possible that the genes and functional pathways that show this pattern might be sex-specific and do not overlap between sexes. A comparison of male and female-specific convergent genes in mice (i.e. those identified in our data and that of Jonker et al.) is not possible at this point, as sex effects would be confounded with laboratory and platform effects.

      Although the human GTEx data contains both males and females, the age distributions of the female (n=11) and male (n=36) samples are quite different (and we also lack male individuals at 20-29 and 70-79 age groups, limiting our data only to 30-69 for males). Consequently, we could only test inter-tissue convergence in each sex but could not compare those gene sets. Based on the analysis in the GTEx data, we observed that convergence during ageing was marginally significant in the female sample (⍴_female= -0.58, p_female=0.059) but not in the male sample (⍴_male= -0.052, p_male=0.77) (Figure 2-figure supplement 16). The difference might be driven by missing individuals for the youngest and oldest age groups.

      We have now included these results in the main text, and discuss the importance of addressing sex-specific effects in the future.

      Reviewer #3 (Public Review):

      In this manuscript Izgi et al. analyzed gene expression time-course data in four tissues during postnatal development and ageing in mice. The authors show that the expression levels of genes often reverse with ageing compared to development. The authors further show that expression patterns diverge among the tissues during postnatal development and converge among tissues with ageing. This divergence and convergence pattern (called DiCo) is analyzed at both individual gene and genome-wide levels using multiple statistical approaches. Both cellular composition changes and cell autonomous expression changes contribute to the reversal of gene expression pattern during ageing. This study connects expression pattern during postnatal development with ageing, extending previous work on a single tissue.

      Strengths:

      -The expression convergence with age is consistently seen across multiple datasets and species indicating it can be widespread.

      -The datasets generated are unique and would be a useful resource for the ageing genomics community.

      -Authors go beyond bulk RNA-seq and also analyze available single cell RNA-seq datasets in mice to assess the contribution of cell composition changes and cell intrinsic expression changes to DiCo.

      Weaknesses:

      -Many aspects of expression convergence and DiCo pattern have low effect size and some are not significant. It also appears that this pattern is best seen at the genome-wide level.

      -Although there is statistical support for DiCo, there are no consistent functional associations discovered in Gene Ontology enrichment.

      -The mechanism for DiCo and the extent to which the same genes or pathways underlie this across species is unclear.

      We thank the reviewer for their careful reading of our manuscript and for pointing out the strengths and weaknesses in a clear manner. We hope that both the dataset and the insight we gained from this study will be useful for the community and open new directions of research in the future.

      We agree with the reviewer that although we study the convergence of expression at different levels, it is the most prominent at the genome-wide level and the effect size is small. We have now included a discussion on this aspect in our limitations paragraph. As the reviewer points out, our analysis was focused on identifying genome-wide patterns and not on particular genes and/or specific functional processes. Still, we do find certain associations between DiCo genes and GO categories related to tissue development and differentiation. In this version, we provide a more in-depth analysis of these categories, together with their profiles of gene expression during development and ageing. Unfortunately, confirmation of the functional consequences through experimental studies is outside the scope of this paper. Thus, the results should be seen as potential links that require further experimental support. We also mention this in our limitations paragraph. Lastly, to address the reviewer’s comment on the mechanisms, we tested whether the DiCo pattern is associated with certain transcription regulators, miRNAs and TFs; however, we did not find any specific regulator. If DiCo is indeed a transcriptome-wide phenomenon caused by loss of expression regulation and cellular identity during ageing, rather than the result of a controlled program, lack of significant association with specific transcriptional regulators may be expected. This new result and its discussion are also included in the new version.

    1. Author Response:

      Reviewer #1 (Public Review):

      Here, Garner and Theriot investigate the question of leading edge maintenance in migrating cells. They analyze small and dynamic fluctuations of the membrane at the cell front in order to understand how membrane stability emerges from these seemingly random and uncoordinated events. Experimental data enable description of fluctuations at different length scales and their relaxation in a visco-elastic manner.

      To gain knowledge about this system, a stochastic model of branched actin network growth against a membrane is developed, taking into account a number of molecular reactions at play. This model recapitulates correctly the cellular observations, with correct orientation of the filaments and similar membrane fluctuations. Also, addition of Latrunculin B which leads in vivo to increased amplitude of the fluctuations with decreased fluctuation rates is described in the model when nucleation and elongation rates are decreased.

      Changing the different parameters of the model reveals that two features are critically important (2): a branching reaction occurring solely in proximity to the membrane, and the possibility for filaments to spread laterally. Another important parameter is the Arp2/3 complex branching angle, where a 70-80° geometry is found to be optimal for minimizing actin density fluctuations and leading edge fluctuation amplitudes.

      This work is of excellent quality and its conclusions seem justified. However, it would be important to have more details on the limit of detection of membrane shape fluctuations and network growth by phase contrast microscopy.

      The reviewer raises an important point on the differences in spatial resolution between the experimental and theoretical aspects of our work. We appreciate this opportunity to further clarify which of our conclusions are directly demonstrated by experimental data, and which are theoretical predictions that are grounded in experimental data but not explicitly measured. Our updated manuscript includes an expanded discourse on this topic in the Results and Discussion. I outline the major points below:

      1) In lines 92-99 of the Results, we estimate our experimental spatial resolution for measuring leading edge fluctuations, emphasizing that imaging by phase-contrast is not sufficient to resolve individual filaments or polymerization events. We also clarify our hypothesis that the measured fluctuations are a micron-scale property arising from stochastic monomer addition at the molecular scale, now more directly stating that simultaneous stochastic polymerization of filaments throughout the leading edge might act collectively to generate large scale curvature.

      2) In lines 142-143, we make more clear that we developed the molecular-scale actin network growth model to explore how molecular interactions might lead to the observed larger scale fluctuation behavior.

      3) In lines 151-152, and 284-287 of the Results, we discuss the range of wavelengths over which the experiments and modeling output can be directly compared.

      4) Finally, in the Discussion (lines 317-324) we emphasize that we experimentally measured micron-scale lamellipodial shape dynamics, but inferred nanometer-scale details using a molecular-scale model that correctly predicts this emergent behavior (as well as many other experimentally-measured features of lamellipodial actin networks). We then discuss how our results might inspire new super-resolution experimental approaches to directly test molecular-level predictions of the model.

      Reviewer #2 (Public Review):

      The topic of actin driven cell motility will be of general interest. The authors provide new ideas for the field of research, the modeling methods and model design seem valid and appropriate, and the paper is well written. My main concern is whether the fluctuation spectrum derived from the model corresponds to that of the experimental images.

      Visually (and perhaps mistakenly on my part), the experimental analysis of Fig. 1b seems to show a nearly periodic red-blue curvature pattern with a scale of order 4 microns that persists over 10-15 sec, a time over which the cell advances by a distance of order the size of the lamellipodium. While such a nearly periodic pattern would be expected to lead to peaks at the corresponding periods and wavelength in Fig. 1e and 1g, no clear peaks are observed in those figures.

      However, the autocorrelation functions in Fig. 1e are not plotted over times comparable to 10-15 sec. Further, the analysis of the leading edge contour is done with a background subtraction method that removes fluctuations over 7 microns, a length scale that may be dampening a real peak at ~4 microns in Fig. 1g.

      The feature I am pointing out could be occurring at a length scale in between the shortest length scales (a pixel) and the longest ones (cell size) in the system. Instabilities, a main theme of the paper, frequently get amplified at a characteristic length scale. Here there may be a length scale that is selected by the system that may not be picked up by the analysis or the proposed model.

      We thank the reviewer for drawing our attention to an apparent discrepancy between the curvature kymograph shown in Fig. 1b and the results of the autocorrelation analysis, which we now believe we have reconciled. In our updated manuscript, we demonstrate that (1) the feature the reviewer points out in the kymograph is not indicative of a dominant mode or instability; (2) regardless, the feature in question is not removed by our pre-processing step; and (3) an extension of our analysis to longer length and time scales does not affect our results. These points are summarized in an extended description of the curvature kymograph and autocorrelation analyses in the Results (lines 120-125), Methods (lines 416-423, 443-444, 471-473, 487-503), and in three new supplemental figures (Fig. S3-5). Our argument is as follows:

      1) The apparent instability in the curvature kymograph (which the reviewer suggested our autocorrelation analysis might not be detecting) can be reproduced in a model in which there are, by definition, no instabilities, dominant wavemodes, or oscillations – that of a membrane freely fluctuating under Brownian motion (Fig. S3). This proves that one cannot interpret the appearance of such an underlying pattern in the kymograph as evidence of an instability. We note that the apparent “dominant wavemode” of ~4 µm in the curvature kymograph might simply reflect the span used to perform the curvature fitting, as it is approximately twice the size of the curve-fitting window. Overall, this control provides a case-in-point for the potential pitfalls in interpreting kymographs and the necessity of Fourier mode autocorrelation analysis as a more comprehensive approach.

      2) The reviewer raised the possibility that baseline-subtracting features above 7 µm might remove the apparent ~ 4 µm instability from our data, but these visual features remain apparent in curvature kymographs generated after the baseline-subtraction is applied (Fig. S3). Therefore our 7 µm cut-off does not remove the features in question.

      3) As suggested by the reviewer, we extended our analysis to longer length and time scales, and found that it did not affect our results. Consistent with what could be observed from the originally-plotted timescales in Fig. 1e, longer timescales show the signal decays to noise (or at least something which cannot be distinguished from noise in any straightforward way) at all length-scales (Fig. S4). Additionally, repeating the analysis using a 10 µm span for background-subtraction of the leading edge shapes (an increase of ~50% compared to the 7 µm span used in the original manuscript, and more than twice the width of the feature of concern to the reviewer), reveals no new features in the data (Fig. S5).

    1. Author Response:

      Reviewer #1 (Public Review):

      Dicks et al. in this study characterized electrophysiological properties of mutant and wild-type hiPSC-chondrocytes and the expression of chondrocyte-associated markers during chondrogenic differentiation of the cells, and analyzed the differential expression of global transcriptome between the different chondrocyte groups. They demonstrated TRPV4 mutation-induced changes in calcium signaling, mechanical property of matrix, and transcriptome of hiPSC-chondrocytes and concluded that the V620I and T89I mutations of TRPV4 in chondrocytes delay or inhibit hypertrophy, which may be a potential cause of skeletal dysplasias.

      This study applied a gene-editing tool to create mutant hiPSCs as a human cell model of the disease in culture to study TRPV4 mutation-induced alteration in cellular activities and molecular regulation. Establishing such an hiPSC model for disease study is novel and considered a major strength. Other strengths of this report include adequate background information, solid data analysis, and well-referenced discussions. The iPSC model established in this study could potentially be used to study pathogenic mechanisms of the diseases and identify molecular targets involved in regulating the mechanisms for the development of disease treatments.

      However, there are two weaknesses identified in this current report, which are described below.

      1. Through comparison, differences in biological response and activities between mutant and wild-type hiPSC-chondrocytes were shown, and molecules and mechanisms of interest were identified as potential regulators involved in the mutation-induced changes. However, critical experiments such as gain- and loss-of-function assays to determine whether and how some or all of the identified molecules or mechanisms (HOXs, TGFB, biomineralization genes …) are regulated by the mutations to alter chondrocyte activities are missing. These experiments are needed to strengthen their conclusions. The discussions about the identified molecules and mechanisms with cited references are inadequate as a support for the conclusions.

      We agree with the reviewer that gain- and loss-of-function experiments would be critical for identifying whether the proposed mechanisms are in fact responsible for the differences caused by the TRPV4 mutations and the disease phenotypes. However, these experiments are out of the scope of this study, and we plan to investigate each of these mechanisms in future studies. In the meantime, we have added additional citations to the discussion to further support these conclusions.

      2. The data currently presented in Figures 1, 5 and 6 are insufficient to justify the claims regarding mutation-induced changes of TRPV4, chondrocyte hypertrophy, and expression levels of the identified molecules.

      To further support the conclusions in Figures 1, 5, and 6, we have added additional data. As suggested, we investigated the role of TRPV4 phosphorylation on channel function and activity. We found V620I had increased expression of PRKCA, the gene encoding for protein kinase C alpha. These data indicate that TRPV4 phosphorylation may be responsible for the increased basal calcium signaling through V620I TRPV4.

      We then performed western blots to investigate production of hypertrophic proteins to validate the gene expression and support the claims that V620I and T89I had delayed hypertrophy in response to BMP4 treatment. Indeed, BMP4 treatment increased ALPL, COL10A1, IHH, and RUNX2 gene and protein expression compared to TGFβ3 controls, and this response was more prominent in WT than mutant lines. These data have been added to the paper to support our conclusions (Fig. 4 – Fig. S1B, Fig. 5, Fig. 5 – Fig. S1).

      Reviewer #2 (Public Review):

      In this manuscript, Dicks et al. generated two human iPSC lines with TRPV4 mutations (mild V620I or lethal T89I) using a CRISPR-Cas9 approach and examined their channel function and differentiation abilities into chondrocytes. While their initial goal is to elucidate the detailed molecular mechanisms underlying how these two mutations lead to strikingly distinct severities of skeletal dysplasias, most of their data found that these two mutations behave in a similar manner. The minor differences they found are: 1) increased basal currents in V620I cells; 2) reduced mechanical properties of cartilage matrix in V620I chondrocytes; 3) some differences in DEGs of RNA-seq data. They also stated that "The severe T89I mutation inhibits chondrocyte hypertrophy more than moderate V620I mutation" (page 16). However, no substantiated data were provided to support this conclusion. While a series of RNA-seq experiments were performed to explore the underlying mechanism, they were not followed by validation experiments to pinpoint the exact pathways or molecular mechanisms. Thus, although using CRISPR-Cas9 and iPSCs are novel and potentially important, this manuscript is overall descriptive with limited mechanistic information.

      We thank the reviewer for the summary of the paper. We have further investigated the differences between WT and the two mutant lines to add to the RNA-seq experiments. As suggested by another reviewer, we looked at protein kinase gene expression, which may alter TRPV4 phosphorylation and, ultimately, channel activation. These expression data are consistent with the basal calcium differences we saw, and we believe they warrant further investigation in a follow-up study of biochemical changes to the channel structure and activation.

      We also further validated the differences in BMP4-induced hypertrophy by looking at protein production. BMP4 not only increased hypertrophic proteins COL10A1, ALPL, IHH, and RUNX2, but we saw much larger increases in WT compared to mutants. Further, ALPL production was increased in the moderate V620I mutation compared to the severe T89I mutation, indicating a potential player in the differences in disease severity caused by the two mutants.

      Finally, we investigated the DEGs between V620I and T89I to highlight the differences between the two mutations. We believe this study has served as a foundation for identifying potential mechanisms leading to the disease phenotypes of moderate and severe skeletal dysplasias. In future studies, we hope to validate these mechanisms.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this manuscript, Cai and authors offer a new and important discovery demonstrating the persistence of a clade of non-caballine equids, Sussemionus, well into the later millennia of the Holocene in northern China. My expertise does not lie with the genomics analysis, so I will not offer detailed comment - but as an outsider, the arguments seemed well-supported and convincing.

      We thank the reviewer for the positive assessment.

      The primary weakness of the article lies in the omission of detailed archaeological context, and in the failure to consider implications for and from human societies. All specimens were taken directly from archaeological sites, but no information is given about the archaeological sites and cultures the specimens were derived from. In early China, ca. 3500 BP, the persistence of wild equid taxa is a very significant finding. This time period was a very dynamic period across northern East Asia, with the first introduction of domestic horses and the first spread of other livestock pastoralism (see Brunson et al, https://www.sciencedirect.com/science/article/abs/pii/S2352409X20300535). And, as summarized in Yuan and Flad (2006), many of the earliest sites speculatively linked with domestic horses that predate the final Shang Dynasty are isolated equid bones from archaeological sites, without definitive archaeological data to determine domestic or wild status. Therefore, the archaeological context of these finds is really important - how were each of the bones originally identified in archaeological reports? Is there associated evidence that the equids were hunted and eaten? The authors must add a section describing the archaeological context in greater detail, and considering the possible implications of the finds. For example, the persistence of sussemione equids through the 2nd millennium BCE implies that researchers must be exceedingly careful in zooarchaeological identifications prior to this period.

      We thank the reviewer for pointing this out. We have provided more details about the archaeological context in the revised manuscript: “Nearly 20,000 square meters of Honghe (47.20°N, 123.62°E) have been excavated, revealing a late Neolithic settlement site dated to approximately 3,400-4,400 years ago and belonging to a unique, rich fishing and hunting culture characteristic of northeastern China (Figure 1—figure supplement 1). The scale of the moated settlement indicates that there was already social management and relatively high productivity and building technology. The Muzhuzhuliang site (38.83°N, 110.50°E) belongs to the “Longshan Culture” dated to approximately 3,800-4,300 years ago. It is the most complete moated settlement hitherto excavated in the late Neolithic Age of Northern China, and showed a subsistence economy based on agriculture, animal husbandry and hunting. The Shatangbeiyuan site (35.63°N, 105.11°E) belongs to the early cultural relics of “Qijia culture” in the Neolithic Age, which is dated to approximately 3,900-4,200 years ago. Millet represented the main crop produced at that time, and stone and bone arrowheads indicate that hunting was also performed. The rise and decline of these cultures were substantially influenced by the regional environmental conditions. No traces of domestication, but traces of consumption, were found in the equine specimens from the three sites, indicating that they were hunted for food”.

      We have also added “Given the persistence of Sussemiones through the second millennium BCE, researchers must be exceedingly careful in zooarchaeological identifications prior to this period.” at the end of the article.

      Moreover, the result might also warrant a discussion about the role of pastoral cultures, or the introduction of domestic horses, in the final extinction of the sussemiones. Without such a summary, it is incomplete to suggest that their final extinction is a result of inbreeding and reduced genetic diversity.

      We agree that this is an interesting point to consider. We have added the sentence “Considering the knowledge of environmental and human archaeology, our results imply that the extinction of this lineage may be affected by the combination of climatic change and human mediation.”

      Reviewer #2 (Public Review):

      Dawei Cai and colleagues present a series of firsts and new discoveries including (1) the first high coverage genome from an equid that is unequivocally an extinct species and (2) demonstrating that Equus (Sussemionus) ovodovi survived into the late Holocene, belonged to a lineage sister to all extant non-caballine equids, and underwent extensive admixture soon after its divergence from non-caballine equids.

      The manuscript is clearly laid out and well written. The analyses are conducted logically and to a high standard, which includes testing the impacts of reference genome choice and DNA misincorporations in nearly all analyses. The conclusions are mostly supported by the data but some methodological clarifications and discussion of conflicting results are required.

      Thanks for your comments.

      Strengths/weaknesses of the five main findings:

      (1) Sussemiones survived into the late Holocene. Strengths: It is remarkable that Sussemiones survived so late into the Holocene, but the authors present radiocarbon evidence from multiple skeletal elements and sites supporting the late survival hypothesis. Combined with the genomic evidence, there is very strong support for this assertion. Weaknesses: The manuscript does not describe the radiocarbon methods, such as which laboratory these analyses were conducted in and whether samples were ultrafiltered or not. A description of the calibration methods and curve version used is also lacking.

      Thank you for this suggestion. We have provided more details about the radiocarbon methods in the revision and Supplementary Table S2. “Radiocarbon dating of the samples was performed at the Beta Analytic Radiocarbon Dating Laboratory, Miami, Florida. Bone or tooth pieces of about 2 g were sampled and sent for subsequent dating of collagen (not ultrafiltered). Calibration was carried out using OxCalOnline (https://c14.arch.ox.ac.uk/oxcal.html) and the IntCal20 calibration curve.”

      (2) Equus (Sussemionus) ovodovi is a sister lineage to all extant non-caballine equids. Strengths: The authors construct both exome and candidate neutral loci phylogenies from across the nuclear genome, including testing the impact of two different reference genomes. All analyses support the same placement of E. ovodovi with 100% bootstrap support. The assertion is therefore strongly supported. Weaknesses: No weaknesses identified.

      We thank the reviewer for the positive assessment.

      (3) The early evolution of the lineages leading to the E. ovodovi and the three main extant equid groups was characterised by extensive admixture. Strengths: The authors use three different methods to infer the presence, extent, and/or direction of admixture. Weaknesses: A major weakness here is the incongruence between the TreeMix models and the D-statistics and G-PhoCS analyses (the latter two give a coherent story). Given the large admixture events determined by G-PhoCS, it seems concerning that these events are not recovered as migration edges in the TreeMix analyses.

      We thank the reviewer for the suggestion. Two reasons may explain the incongruence that the reviewer notes between the TreeMix models and the G-PhoCS analyses. First, TreeMix models work best when gene flow between populations is restricted to a relatively short time period; situations of continuous migration violate this assumption and lead to unclear results (see Pickrell, Joseph K., & Pritchard, Jonathan K. (2012), https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002967). Second, two different datasets were used in the analyses. The tree topologies and gene flow were recovered by TreeMix using whole-genome SNPs, while the G-PhoCS analyses of selected samples were based on 15,324 candidate ‘neutral’ loci.

      (4) Population size of E. ovodovi over the past 2 Myr. Strengths: The authors correct for differences in genome coverage to allow for the PSMC profiles between four equid taxa to be comparable, allowing for comparison of population size trajectories. Weaknesses: In Figure 4, the presented PSMC profiles are a mix of those with or without transitions (comparing profiles to Figure 4 – figure supplement 1). Given that the exclusion of transitions impacts the PSMC profiles, these should be standardized in Figure 4 to give a fair comparison.

      We thank the reviewer for this suggestion as well. To account for possible misincorporation patterns and the high error rates in the four equids, we compared the PSMC analyses performed with and without transitions. A consistent pattern was observed between the two datasets except for the PSMC bootstrap pseudo-replicates for HH06D, and we therefore only presented PSMC profiles without transitions when considering the ancient HH06D specimen. Meanwhile, we applied a correction based on an empirical uniform false-negative rate for low coverage genomes (<20×). All three Eurasian equine species genomes were rescaled following the same procedure (see L. Orlando et al. (2013), https://www.nature.com/articles/nature12323).

      (5) Inbreeding was a contributing factor to the extinction of E. ovodovi. Strengths: The authors determine heterozygosity and runs-of-homozygosity in E. ovodovi and compare these to all living equids, and find that E. ovodovi had low heterozygosity although not excessive runs-of-homozygosity. Weaknesses: The authors should be more cautious with their interpretation/phrasing on L383-384, given that inbreeding and/or reduced genetic diversity has not been demonstrated as the extinction driver.

      Thanks for the suggestion, and we have now re-written this sentence: “So combined with a degree of inbreeding, the reduced genetic diversity available may have contributed to the subsequent extinction of the lineage.”

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Bishop et al. aim to quantify the ventilatory response to hypoxia and hypercapnia in the common marmoset, an increasingly common primate research model. They also present an unsupervised analysis tool to quantify ventilatory behavior, which is a potentially major contribution to the respiratory field.

      Strengths of this manuscript include the inclusion of male and female animals and the development of an analysis toolkit that may be less impacted by biases that are introduced when hand analyzing respiratory behavior, as is commonly done in the field. This tool could be of tremendous value to the respiratory community. Identification of sniffs, sighs, and apneas is often plagued by the qualitative nature of the analysis.

      We thank the reviewer for taking the time to evaluate our submission and for the overall positive assessment.

      Limitations of the study relate to the measure of the hypoxic and hypercapnic ventilatory drive. Tidal volume in whole body plethysmography is not accurate unless the plethysmograph and body temperature are taken into account. (See, https://pubmed.ncbi.nlm.nih.gov/25080926/). This is particularly important when the animal's core body temperature changes during hypoxia because of a fall in metabolic rate. The decrease in VCO2 shown here suggests that this is occurring here.

      We thank the reviewer for their comment. We applied acute hypoxia and hypercapnia to perturb breathing behaviors and used our analysis tool to evaluate said disturbed respiratory behaviors. We have addressed the limitations of our studies in the revised submission. In addition, and because of this limitation, we include an arbitrary unit (a.u.) for tidal volume (and other characteristics of breathing derived from tidal volume).

      It is worth pointing out that the fall in VCO2 is not typically observed in humans. So, while the authors conclude that minute ventilation does not increase in the marmoset, it is not necessarily a valid conclusion that the hypoxic ventilatory drive is low because VE should be expressed as a function of VCO2. If VCO2 falls but VE is constant, ventilation per unit metabolism will actually have increased. Ventilation may also be underestimated here because of the fall in core body temp that likely coincides with a lower VCO2.

      We thank the reviewer for this comment. The data on changes of metabolic rate (by measuring VCO2 or VO2) during hypoxia in human subjects are not consistent (for instance, see PMID: 2390141, where Figure 3 clearly shows a decrease of ~50% in metabolic rate during hypoxia). Therefore, we have softened the language in our submitted revision.

      In addition, in the revised manuscript, we have performed the recommended analysis to express VE as a function of VCO2. However, hypoxia did not increase the ventilation efficiency (VE/VCO2) in marmosets. We have added the new data (Figure 4H) and discussed it in the revised manuscript.
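      The reviewer's hypothetical case (VCO2 falls while VE stays constant) can be made concrete with a short calculation. The sketch below is illustrative only: the function name and all numbers are assumptions chosen to show the arithmetic, not values from the study, and the authors report that this increase was not what they observed in marmosets.

```python
def ventilatory_equivalent(ve: float, vco2: float) -> float:
    """Return VE/VCO2, i.e. minute ventilation per unit of CO2 production."""
    if vco2 <= 0:
        raise ValueError("VCO2 must be positive")
    return ve / vco2


# Hypothetical values in arbitrary units (not measurements from the study).
baseline = ventilatory_equivalent(ve=100.0, vco2=10.0)
hypoxia = ventilatory_equivalent(ve=100.0, vco2=8.0)

# VE is unchanged, yet ventilation per unit metabolism is higher in hypoxia.
relative_change = (hypoxia - baseline) / baseline
print(f"baseline={baseline}, hypoxia={hypoxia}, change={relative_change:.0%}")
```

      With these assumed inputs, a 20% fall in VCO2 at constant VE corresponds to a 25% rise in VE/VCO2, which is why the ratio and not VE alone is the relevant quantity.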

      It is also worth noting that the hypoxic ventilatory response is not necessarily linear and the full range of the response is not characterized. For example, 15% O2 in the rat elicits very little response but there is a robust response with 9% O2. It is also worth noting, relevant to the previous points, that this is not an isocapnic ventilatory response, so the hypoxic response is certainly confounded by the changing CO2 which may not mimic situations like sleep apnea.

      We thank the reviewer for this comment. In the revised manuscript, we added that we have applied ‘acute’ hypoxic/hypercapnic challenges and discussed the limitation of our study.

      Reviewer #2 (Public Review):

      I do not see any fundamental flaws in it as such. However, what really compromises the paper is the lack of a "punch line". It is highly descriptive rather than analytical, it reads like a list of mostly predictable outcomes, but what is the question, what is the novelty, why is it important... This does not come out at all. On one hand it is important to have such basic information about marmosets but is it best placed into a non-specialist journal? In addition, the whole point of getting involved with monkeys is because they are closer to humans than rodents, but authors did not fully explore these similarities/differences or focus on them or try to explain them. One would want to have a clear conclusion in the end, how closely they resemble humans, for what type of experiments they are better than rodents, because of what... But this is not evident. Neither is it clear what is the value of the novel protocol for data analysis which seems to have been a major effort. In the end we are left with the impression that the results you get with it are the same as with the old protocols... What is its value then? Something needs to be done to make this paper attract readers other than those specifically interested in this topic.

      We thank the reviewer for these comments. We acknowledge that the initial submission was not as clear as we had hoped. We have revised the manuscript and added more details about our new analysis tool and further strengthened its applicability by including new analysis from a rodent model. We believe the major contribution of this manuscript to the field is providing a new open-source tool to analyze complex breathing behavior signals in conscious, awake, and active laboratory animals. In this manuscript, we demonstrate the strength of this approach in expediting analysis of breathing behaviors, which we analyze in the common marmoset and rat, yet which could be equally applicable to other animal models.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript addresses a major issue facing consumers of structure-organism pair data: the landscape of databases is very difficult to navigate due to the way data is made available (many resources do not have structured data dumps) and the way data is standardized (many resources' structured data dumps do not standardize their nomenclature or use stable entity identifiers). The solution presented is a carefully constructed pipeline (see Figure 1) for importing data, harmonizing/cleaning it, automating decisions about exclusions, and reducing redundancy. The results are disseminated through Wikidata to enable downstream consumption via SPARQL and other standard access methods as well as through a bespoke website constructed to address the needs of the natural products community. The supplemental section of the manuscript provides a library of excellent example queries for potential users. The authors suggest that users may be motivated to make improvements through manual curations on Wikidata, through semi-automated and automated interaction with Wikidata mediated by bots, or by addition of importer modules to the LOTUS codebase itself.

      Despite the potential impact of the paper and excellent summary of the current landscape of related tools, it suffers from a few omissions and tangents:

      1. It does not cite specific examples of downstream usages of structure-organism pairs, such as an illustration on how this information in both higher quantity and quality is useful for drug discovery, agriculture, artificial intelligence, etc. These would provide a much more satisfying bookend to both the introduction and conclusion.

      Thank you for this remark. We deliberately decided not to insist too heavily on the application examples of the LOTUS outputs. Indeed, we are somewhat biased by our main investigation field, natural products chemistry, and expect that the dissemination of specialized metabolites occurrences will benefit a wide range of scientific disciplines (ecology, drug discovery, chemical ecology, ethnopharmacology, etc.).

      However, Figure 5 was established to illustrate how the information available through LOTUS is quantitatively (size) and qualitatively (color classes) superior to what is available through single natural products resources.

      As added in the introduction, one of the downstream usages of those pairs is for example to perform taxonomically informed scoring as described in https://doi.org/10.3389/fpls.2019.01329. Obtaining an open database of natural products’ occurrences to fuel such taxonomically informed metabolite annotation tools was the initial impulse for us to build LOTUS. These metabolite annotation strategies, tailored for specialized metabolites, have been shown to offer appreciable performance improvements for current state-of-the-art computational metabolite annotation tools. Since metabolite annotation has been regularly cited as “the major bottleneck” in metabolomics in the scientific literature over the last 15 years (https://europepmc.org/article/med/15663322, https://doi.org/10.1021/acs.analchem.1c00238), any tangible improvement in this field is welcome. With LOTUS we offer a reliable and reusable structures-organisms data source that can be exploited by the community to tackle such issues of importance.

      Other possible usages are suggested in the conclusion, but benchmarking or even exemplifying such uses is clearly out of the scope of this paper, each one of them being an article per se.

      The additional queries are written in our first answer (see “essential revisions”) and demonstrate the impact of LOTUS on accelerating the initial bibliographic survey of chemical structures occurrences over the tree of life.

      This query (https://w.wiki/4VGC) can be compared to a literature review work, such as https://doi.org/10.1016/j.micres.2021.126708. In seconds, it allows retrieving a table listing compounds reported in given taxa and limits the search by years.
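      To give a sense of what such a query involves, here is a minimal sketch that assembles a SPARQL string for the Wikidata Query Service. It is not the query behind the short link above, and it omits the restriction by year. It assumes the Wikidata properties P225 (taxon name), P703 (found in taxon) and P235 (InChIKey); the function name and the example taxon are hypothetical.

```python
def build_taxon_query(taxon_name: str, limit: int = 10) -> str:
    """Build a SPARQL query listing compounds reported in one named taxon."""
    # Escape backslashes and double quotes so the name is a valid SPARQL literal.
    safe_name = taxon_name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?compound ?inchikey WHERE {\n"
        f'  ?taxon wdt:P225 "{safe_name}" .\n'   # assumed: P225 = taxon name
        "  ?compound wdt:P703 ?taxon ;\n"         # assumed: P703 = found in taxon
        "            wdt:P235 ?inchikey .\n"      # assumed: P235 = InChIKey
        "}\n"
        f"LIMIT {int(limit)}"
    )


# Example taxon chosen for illustration only.
query = build_taxon_query("Arabidopsis thaliana", limit=5)
print(query)
```

      The resulting string can be pasted into the Wikidata Query Service; the sketch only builds the text and does not contact any endpoint.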

      1. The mentions of recently popular buzzwords FAIR and TRUST should be better qualified and be positioned as a motivation for the work, rather than a box to be checked in the modern publishing climate.

      It is true that the modern publishing system certainly suffers from some drawbacks (also critically mentioned within the paper). However, after consulting all authors, we believe that because LOTUS checks both boxes of FAIR and TRUST, we would rather stick to these two terms. In our view, rules 1 (Don’t reinvent the wheel) and 5 (Put yourself in your user’s shoes) of https://doi.org/10.1371/journal.pcbi.1005128 apply here. Both terms are indeed commonly (mis-)used but we felt that redefining other complicated terms would not help the reader/user.

      1. The current database landscape really is bad; and the authors should feel emboldened to emphasize this in order to accentuate the value of the work, with more specific examples on some of the unmaintained databases

      We perfectly agree with this statement and it is the central motivation of the LOTUS initiative to improve this landscape. It was a deliberate choice not to emphasize how bad the actual landscape is, but rather to focus on better habits for the future. We do not want to start devaluing other resources and elevate our initiative at the cost of others. We also believe that an attentive look at the complexity of the LOTUS gathering, harmonization, and curation speaks for itself and describes the huge efforts required to access properly formatted natural products occurrence data.

      If the reviewer and editors insist, although not in our scope, we are happy to list a series of specific (but anonymized) examples of badly formatted entries, of wrong structures-organisms associations, or poorly accessible resources.

      1. While the introduction and supplemental tables provide a thorough review of the existing databases, it eschews an important more general discussion about data stewardship and maintenance. Many databases in this list have been abandoned immediately following publication, have been discontinued after a single or limited number of updates, or have been decommissioned/taken down. This happens for a variety of reasons, from the maintainer leaving the original institution, from funding ending, from original plans to just publish then move on, etc. The authors should reflect on this and give more context for why this domain is in this situation, and if it is different from others.

      We agree with the reviewer and have added a “status” column to the table (https://github.com/lotusnprod/lotus-processor/blob/main/docs/dataset.csv). We chose four possible statuses:

      • Maintained (self-explanatory)
      • Unmaintained: the database did not see any update in the last year.
      • Retired: the authors stated they will not maintain the database anymore.
      • Defunct: the database is not accessible anymore
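      The four statuses can be read as a simple decision rule. The sketch below is one possible encoding, not the procedure used to fill the table: the function name, its parameters, the order in which the checks are applied, and the treatment of an unknown update date are all assumptions.

```python
from datetime import date, timedelta
from typing import Optional


def database_status(accessible: bool, retired_by_authors: bool,
                    last_update: Optional[date], today: date) -> str:
    """Assign one of the four statuses listed above (assumed precedence)."""
    if not accessible:
        return "Defunct"        # the database cannot be reached anymore
    if retired_by_authors:
        return "Retired"        # the authors stated they will stop maintaining it
    # Assumption: an unknown update date is treated like a stale one.
    if last_update is None or today - last_update > timedelta(days=365):
        return "Unmaintained"   # no update in the last year
    return "Maintained"


today = date(2022, 3, 1)  # arbitrary reference date for the example
print(database_status(True, False, date(2021, 12, 1), today))
print(database_status(True, False, date(2019, 5, 1), today))
```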

      As for question 3 above, we decided not to focus too heavily on the negative points and instead summarize the current situation in the table above. The reasons why database publishing is in this situation are multiple, and we think they are well summarized in https://doi.org/10.1371/journal.pcbi.1005128 (Rule 10: Maintain, update, or retire), already cited in the manuscript introduction.

      1. Related to data stewardship: the LOTUS Initiative has ingested several databases that are no longer maintained as well as several databases with either no license or a more restrictive license than the CC0 under which LOTUS and Wikidata are distributed. These facts are misrepresented in Supplementary Table 1 (Data Sources List), which links to notes in one of the version controlled LOTUS repositories that actually describes the license. For example, https://gitlab.com/lotus7/lotus-processor/-/blob/8b60015210ea476350b36a6e734ad6b66f2948bc/docs/licenses/biofacquim.md states that the dataset has no license information. First, the links should be written with exactly what the licenses are, if available, and explicitly state if no license is available. There should be a meaningful and transparent reflection in the manuscript on whether this is legally and/or scientifically okay to do - especially given the light that many of these resources are obviously abandoned.

      This point is a very important one. We did our best to be as transparent as possible in our initial table. Following the reviewer’s suggestion, we updated it to better reflect the licensing status of each resource (https://github.com/lotusnprod/lotus-processor/blob/main/docs/dataset.csv). Therefore, we removed the generic “license” header, which could indeed be misleading, and replaced it with “licensing status”, filled with the attributed license type and a hyperlink to its content. It remains challenging since some resources changed their copyright in the meantime. We remain at the editor and reviewers’ disposal for any further improvement.

      Moreover, as stated in the manuscript, we took care of collecting all licenses and contacted authors of resources whose license was not perfectly explicit to us, therefore accomplishing our due diligence. Additionally, we contacted legal offices in our University and explained our situation. We did everything that we had been advised to do.

      1) To the best of our knowledge, the dissemination of the LOTUS initiative data falls under the Right to quote for scientific articles, as we do not share the whole information, but only a very small part.

      2) We do not redistribute original content. What comes out of LOTUS has undergone several curation and validation steps, adding value to the original data. The 500 random test entries, provided in their original form for the sake of reproducibility and testing, are the only exception.

      Many scientific authors forget about the importance of proper licensing. While it might be deliberate to restrict the use, inappropriate license choice (or omission) is too often due to a lack of information on its implication.

      All authors of the utilized resources can freely benefit from our curation. We are sharing with the community the results of our work, while always citing the original reference.

      Concerning the possible evolution of licensing, it remains a real challenge. While we tried to “freeze” the license status when we accessed the data, some resources updated their licensing since then. This can be tracked in the git history of the table (https://github.com/lotusnprod/lotus-processor/blob/main/docs/dataset.csv). Discrepancies between our frozen licensing (at the time of gathering) and the actual license can therefore occur. Initiatives such as https://archive.org/web could help solve this issue, although they come with other legal challenges.

      1. The order of sections of the manuscript results in several duplicated, but not further substantiated explanations. Most importantly, the methods should be much more specific throughout and the results/discussion should more heavily cross-link to it, as a reader who examines the paper from top to bottom will be left with large holes of misunderstanding throughout.

      As our paper focuses a lot on the methods, the barrier between results & methods becomes thinner. We took into account the reviewers’ suggestions and added some additional cross-links for the reader to be able to quickly access related methods.

      1. The work presented was done in a variety of programming languages across a variety of repositories (and even version control systems), making it difficult to give a proper code review. It could be argued that the most popular language in computational science at the moment is Python, with languages like R, Bash, and in some domains, still, Java maintaining relevance. The usage of more esoteric languages (again, with respect to the domain) such as Kotlin hampers the ability for others to deeply understand the work presented. Further, as the authors suggest additional importers may be implemented in the future, this restricts what external authors may be able to contribute.

      Scientific software has indeed always been written in multiple languages. To this day, scientists have used all kinds of languages adapted both to their needs and their knowledge. Numpy uses Fortran libraries and many projects published in biology and chemistry recently are in Java, R, Python, C#, PHP, Groovy, Scala… We understand that some authors are more comfortable with one language or another. But R syntax is for example much more distant from Python's syntax than Kotlin can be. We needed a highly performant language for some parts of the pipeline and R, Bash, or Python were not sufficient. We decided to use Kotlin as it provides an easier syntax than Java while staying 100% compatible with it.

      The advantage of the way LOTUS is designed is that importers are language-agnostic. As long as the program can produce a file or write to the DB in the accepted format, it can be integrated into the pipeline. This was our goal from the beginning, to have a pipeline that can have its various parts replaced without breaking any of the processes.
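      As an illustration of what language-agnostic means in practice, the sketch below shows an importer whose only contract with the rest of a pipeline is the file it writes. The column names, the tab-separated layout, and the example row are hypothetical placeholders; the format actually accepted by the LOTUS pipeline is not described here and would need to be taken from its documentation.

```python
import csv
import io

# Hypothetical column names, used only for this illustration.
COLUMNS = ["structure", "organism", "reference"]


def write_pairs(rows, handle) -> None:
    """Write structure-organism pairs as tab-separated text with a header row."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Keep only the agreed columns so extra keys never leak into the file.
        writer.writerow({name: row[name] for name in COLUMNS})


# Placeholder entry; none of these values come from a real resource.
example_rows = [{
    "structure": "InChI=1S/CH4/h1H4",
    "organism": "Example organism",
    "reference": "doi:10.0000/example",
    "internal_note": "dropped on output",
}]

buffer = io.StringIO()
write_pairs(example_rows, buffer)
output = buffer.getvalue()
print(output)
```

      Any program, in any language, that emits the same columns in the same layout could replace this function without the downstream steps noticing, which is the design property described above.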

      1. As a follow up to the woes of point 4., 5., and 7., the manuscript fails to reflect on the longevity of the LOTUS Initiative. Like many, will the project effectively end upon publication? If not, what institutions will be maintaining it for how long, how actively, and with what funding source? If these things are not clear, it only seems fair to inform the reader and potential user.

      LOTUS is an initiative that aims to improve knowledge management and sharing in natural products research. Our first project, which is the object of the current manuscript, is to provide a free and open resource of natural products occurrences for the scientific community. Its purpose is not to be a database by itself, but instead to provide through Wikidata and associated tools a way to access natural products knowledge. The objective was not to create yet another database (https://doi.org/10.1371/journal.pcbi.1005128), but instead to remove this need and give our community the tools and the power to act on its knowledge. This way, as everything is on Wikidata, the initiative is not “like many”. This also means that this project should not be considered and evaluated exactly like a classical DB. Once the initial curation, harmonization, and dissemination jobs have been done, they should ideally not be run again. The community should switch to Wikidata as a point of access, curation, and addition of data. If viewed with such arguments in mind, yes, LOTUS can live long!

      Wikimedia is a public not-for-profit organization, whose financial development appears to indicate solid health https://en.wikipedia.org/wiki/Wikimedia_Foundation#Finances.

      In terms of funding sources, we would like to refer to https://elifesciences.org/articles/52614#sa2 , which stated the following in response to a similar question: "Wikidata is sustained by funding streams that are different from the vast majority of biomedical resources (which are mostly funded by the NIH). Insulation from the 4-5 year funding cycles that are typical of NIH-funded biomedical resources does make Wikidata quite unique." The core of the Wikidata funding streams are donations to the Wikipedia ecosystem. These donations - with a contributor base of millions of donors from almost any country in the world, chipping in at an average order of magnitude of around 10 dollars - are likely to continue as long as that ecosystem is useful to the community of its users. See https://wikimediafoundation.org/about/financial-reports for details.

      1. Overall, there were many opportunities for introspection on the shortcomings of the work (e.g., the stringent validation pipeline could use improvement). Because this work is already quite impactful, I don't think the authors will be opening themselves to unfair criticism by including more thoughtful introspection, at minimum, in the conclusions section.

      We agree with the reviewer and therefore list again the major limitations of our processing pipeline:

      First, our processing pipeline is heavy. It includes many dependencies and takes a lot of time to understand. We are aware of this issue and tried to simplify it as much as possible while keeping what we considered necessary to ensure high data quality. Second, it can sometimes induce errors. Those errors, ranging from unnecessarily discarded correct entries to more problematic ones, can be attributed to various parameters, reflecting the variety of our input. We will therefore try to list them, keeping in mind that the list won’t be exhaustive. For each detected issue, we tried to fix it as best we could, knowing that this will not lead to an ideal result but will hopefully increase data quality gradually.

      ● Compounds

      ○ Sanitization (the three steps below are performed automatically since we observed a higher ratio of incorrect salts, charged or dimerized compounds. However, this also means that true salts, charged or dimeric compounds were erroneously “sanitized”.)

      ■ Salt removals

      ■ Charged molecules

      ■ Dimers

      ○ Translation (both processes below are pretty error-prone)

      ■ Name to structure

      ■ Structure to name

      ● Biological organisms

      ○ Synonymy

      ■ Lotus (https://www.wikidata.org/wiki/Q3645698, https://www.wikidata.org/wiki/Q16528).

      This is also one of the reasons why we decided to call the resource Lotus, as it illustrates part of the problem.

      ■ Iris (https://www.wikidata.org/wiki/Q156901, https://www.wikidata.org/wiki/Q2260419)

      ■ Ficus variegata (https://www.wikidata.org/wiki/Q502030, https://www.wikidata.org/wiki/Q5446649)

      ○ External and internal dictionaries are not exhaustive, impacting translation

      ○ Some botanical names we use might not be the accepted ones anymore because of the tools we use and the pace at which taxonomy is renaming taxa.

      ● References

      ○ The tool we favored, Crossref, returns a hit whatever the input. This generates noise and incorrect translations, which is why our filtering rules focus on reference types.

      ● Filtering rules:

      ○ Limited validation set, requires manual validation

      ○ Validates some incorrect entries (False positives)

      ○ Does not validate some correct entries (False negatives)

      Again, our processing pipeline removes entries we do not yet know how to process properly.

      Our restrictive filters, combined with our substantial upload of structure-organism pairs to Wikidata, should hopefully incentivize the community to contribute by further adding its own human-validated data.

      We updated the conclusion part of the manuscript accordingly. See https://github.com/lotusnprod/lotus-manuscript/commit/a866a01bad10dfd8b3af90e2f30bb3ae51dd7b9e.

      Reviewer #2 (Public Review):

      Rutz et al. introduce a new open-source database that links natural products structures with the organisms they are present in (structure-organism pairs). LOTUS contains over 700,000 referenced structure-organism pairs, and their web portal (https://lotus.naturalproducts.net/) provides a powerful platform for mining literature for published data on structure-organism pairs. Lotus is built within the computer-readable Wikidata framework, which allows researchers to easily contribute, edit and reuse data within a clear and open CC0 license. In addition to depositing the database into Wikidata, the authors provide many domain-specific resources, including structure-based database searches and taxon-oriented searches.

      Strengths:

      The Lotus database presented in this study represents a cutting-edge resource that has a lot of potential to benefit the scientific community. Lotus contains more data than previous databases and combines multiple resources into a single resource.

      Moreover, they provide many useful tools for mining the data and visualizing it. The authors were thoughtful in thinking about the ways that researchers could/would use this resource and in generating tools to make it easy to use. For example, their inclusion of structure-based searches and multiple taxonomy classification schemes is very useful.

      Overall the authors seem conscientious in designing a resource that is updatable and that can grow as more data become available.

      Weaknesses/Questions:

      1) Overall, I would like to know to what degree LOTUS represents a comprehensive database. LOTUS is clearly the best database to date, but has it reached a point where it is truly comprehensive, and can thus be used for a meta-analysis or as a data source for research questions? Can it truly replace doing a manual literature search/review?

      As highlighted by the reviewer, even if LOTUS might be the most comprehensive natural products occurrence resource at the moment, the true comprehensiveness of such a resource will always be limited by the data available in the literature, and the community is far from having fully described the metabolome of living beings. We nevertheless hope that the LOTUS infrastructure will offer a good place to start this ambitious and systematic description process.

      1) Yes, it can serve as a data source for research questions, as exemplified in the query table.

      2) No, it cannot and must not replace manual literature search. Manual literature search is the best approach, but it comes at an enormous cost. If the outcome of such a search can be made available to the whole community (e.g., via Wikidata), its value would be even greater. However, LOTUS can expedite a decent part of a manual literature search and free up time to complement this search. See our comment to the editors: “To further showcase the possibilities opened by LOTUS, and also answer the remark on the comprehensiveness of our resource, we established an additional query (https://w.wiki/4VGC). This query is comparable to a literature review work, such as: https://doi.org/10.1016/j.micres.2021.126708. In seconds, it allows retrieving a table listing compounds reported in given taxa and limits the search by years.”

      We added these examples in the manuscript (see https://github.com/lotusnprod/lotus-manuscript/commit/a6ee135b83e56e8e2041d09d7ce2d5b913c1029d)
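
      To give a rough idea of what such a literature-review-like query can look like, here is a minimal sketch that builds a SPARQL query for the Wikidata Query Service. It is not the query behind https://w.wiki/4VGC. It assumes that occurrences are modelled with the Wikidata properties "found in taxon" (P703), "taxon name" (P225), "parent taxon" (P171), "stated in" (P248) and "publication date" (P577); the function names and the example taxon are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(taxon_name, first_year, last_year, limit=100):
    """Build SPARQL listing compounds reported in a taxon (or its child taxa)
    in references published between first_year and last_year (inclusive)."""
    if '"' in taxon_name or "\\" in taxon_name:
        raise ValueError("taxon name must not contain quotes or backslashes")
    return f"""
SELECT DISTINCT ?compound ?compoundLabel ?taxonName ?reference WHERE {{
  ?root wdt:P225 "{taxon_name}" .
  ?taxon wdt:P171* ?root ;
         wdt:P225 ?taxonName .
  ?compound p:P703 ?statement .
  ?statement ps:P703 ?taxon ;
             prov:wasDerivedFrom/pr:P248 ?reference .
  ?reference wdt:P577 ?date .
  FILTER(YEAR(?date) >= {int(first_year)} && YEAR(?date) <= {int(last_year)})
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {int(limit)}
""".strip()


def request_url(query):
    """Return a GET URL asking the query service for JSON results."""
    return f"{ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"


# Arbitrary example taxon and year range, for illustration only.
example_query = build_query("Gentiana", 2000, 2020)
example_url = request_url(example_query)
```

      The resulting text can be pasted into the Wikidata Query Service interface, or the URL can be fetched with any HTTP client; the year filter is what limits the search to a given publication window.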

      2) Data Cleaning & Validation. The manuscript could be improved by adding more details about how and why data were excluded or included in the final upload. Why did only 30% of the initial 2.5 million get uploaded? Was it mostly due to redundant data or does the data mining approach result in lots of missed data?

      The reason for this “low” yield is that we strongly favored quality over quantity (in the F-score equation, β is set to 0.5, so that precision is given more weight than recall). Of course there is redundancy, but most rejected entries were rejected because their confidence level was too low according to the rules we developed. These data are not fully discarded: we keep them for further curation (ideally involving the community) before uploading to Wikidata. We adapted the text accordingly.
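
      For readers less familiar with the F-score, the following minimal sketch shows the standard F-beta definition and why β = 0.5 favors precision. The precision and recall values used below are made-up illustrations, not figures from our validation set.

```python
def f_beta(precision, recall, beta=0.5):
    """Standard F-beta score.

    F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    With beta < 1, precision weighs more than recall.
    """
    if precision == 0 and recall == 0:
        return 0.0
    beta_sq = beta ** 2
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Two hypothetical filters with mirrored precision and recall values.
strict_filter = f_beta(precision=0.9, recall=0.5)   # precise, discards a lot
lenient_filter = f_beta(precision=0.5, recall=0.9)  # keeps a lot, less precise
```

      With β = 0.5, the strict filter scores higher than the lenient one even though both have the same F1 score, which is the trade-off we deliberately accepted.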

      3) Similarly, more information about the accuracy of the data mining is needed. The authors report that the test dataset (420 referenced structure-organisms pairs) resulted in 97% true positives, what about false negatives? Also, how do we know that 420 references are sufficiently large to build a model for 2.5M datapoints? Is the training data set sufficiently large to accurately capture the complexities of such a large dataset?

      False negatives are 3%, which is, in our opinion, a fair amount of “loss” given the quality of the data. We actually manually checked 500+ documented pairs, which is more or less the equivalent of a literature review. We were careful to sample the entries in the right proportions, but we cannot (and did not) state that they are enough. We cannot model it either, since the 2.5M+ points have very different distributions in terms of databases, quality, etc. The only “hint” is the similar behaviour among all subsets: the 420 + 100 entries were divided between 3 authors, who obtained similar results.

      4) Data Addition and Evolution: The authors have outlined several mechanisms for how the LOTUS database will evolve in the future. I would like to know if/how their scripts for data mining will be maintained if they will continue to acquire new data for the database. To what extent does the future of LOTUS depend on the larger natural products community being aware of the resource and voluntarily uploading to it? Are there mechanisms in place such as those associated with sequencing data and NCBI?

      Programs have been not only maintained but also updated with new possibilities, for example the addition of a “manual mode” allowing users to run the LOTUS processing pipeline on a set of their own entries and make them Wikidata-ready (https://github.com/lotusnprod/lotus-processor/commit/f49e4e2b3814766d5497f9380bfe141692f13f23). We will of course do our best to keep maintaining them, but no one in academia can state that he/she will maintain programs forever. However, the LOTUS initiative hopefully embraces a new way of considering database dynamics. If the repository and website of the LOTUS initiative shut down tomorrow, all the work done will still be available to anyone on Wikidata. Of course, future data addition strongly relies on community involvement. We have already started to advocate for the community to take part in it, ideally in the form of direct upload to Wikidata. At this time, there are no mechanisms in place to push the publishing of pairs on Wikidata (as exist for sequencing or mass spectrometry data), but we are committed to pushing in this direction. The initiative needs stronger involvement of the publishing sector (including reviewers) to help change those habits.

      5) Quality of chemical structure accuracy in the database. I would imagine that one of the largest sources of error in the LOTUS database would be due to variation in the quality of chemical structures available. Are all structure-organism pairs based on fully resolved NMR-based structures, or are they based on mass spectral data with no confirmational information? At what point is a structural annotation accurate enough to be included in the database? More and more metabolomics studies are coming out and many of these contain compound annotations that could be included in the database, but what level (in silico, exact mass database search, or relative to a known standard) is required?

      This is a very interesting point, and some databases have this “tag” (NMR, crystal, etc.). We basically rely on original published articles, included in specialized databases. If poorly reported structures have been accepted for publication, labelled as “identified” (and not “annotated”), and the authors publishing the specialized databases overlooked it, we might end up with such structures.

      Here, the Evidence Ontology (http://obofoundry.org/ontology/eco.html) might be a good direction to look at and further characterize the occurrences links in the LOTUS dataset.

      Reviewer #3 (Public Review):

      Due to missing or incomplete documentation of the LOTUS processes and software, a full review could not be completed.

      Some parts of LOTUS were indeed not sufficiently described and we improved both our documentation and accessibility to external users a lot. We thank the reviewer for insisting on this point as it will surely improve the adoption of our tool by the community.

    1. Author Response:

      Reviewer #2 (Public Review):

      Mattis et al. have used a hemizygous mutant of the gene Scn1a to study changes underlying the severe epilepsy disorder Dravet syndrome. They describe a change in activation of the dentate gyrus in this mouse model, due to altered excitatory synaptic input. They show that this occurs in the age range after normalization of early inhibitory interneuron dysfunction. This provides an interesting potential mechanism by which neural circuit function is altered even after deficits in inhibition are seemingly corrected. They also report that stimulation of inputs to the dentate gyrus increases seizure susceptibility when body temperature is elevated. Overall these findings indicate a new form of circuit dysfunction that may underlie the etiology of this severe genetic epilepsy disorder.

      These findings are not fully complete, and the manuscript suffers from some flaws in experimental design.

      The most pressing issue is the lack of a counter-balanced design in experiments testing the ictogenicity of DG stimulation. The authors attempt to justify this stating "there is a theoretical concern that seizure threshold on Day 2 (the second consecutive day of stimulation) could be lowered by a seizure 24 hours prior (a "kindling"-like phenomenon)". In the very next sentence, they cite a study in which this phenomenon has been shown (thus the concern is not theoretical). That said, this is not a semantic argument, but a flaw in experimental design. On day 1, the authors perform experiment A. On day 2, they perform experiment A+B. In an attempt to show that performing experiment A on day 1 does not by itself lead to changes in experiment A+B, they use a separate cohort and show that experiment A does not lead to changes in a repetition of experiment A. Unfortunately, this is not an adequate control. Experiment A+B involves a different set of stimuli, to which the response could very well be altered by the day 1 experiment, but this change would not be revealed with the described experimental design. Determining whether the effect shown in experiment A+B is genuine requires a more rigorous, counter-balanced experimental design in which one group undergoes experiment A followed by experiment A+B, and a second group undergoes experiment A+B followed by experiment A.

      Thank you for this important critique.

      → We agree with these points and have repeated this experiment using an improved experimental design (Figure 6). We now present data from three groups of mice: Scn1a-ChR2 (experimental mice), Scn1a-YFP (photostimulation control), and WT-ChR2 (genotype control), tested on a single day (obviating concerns about day 2).

      → Please note that this revised manuscript includes an additional ictogenicity experiment (Figure 7), in which we employ the proposed counter-balanced experimental design.
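
      For clarity, the logic of a counter-balanced assignment can be sketched as follows. This is a generic illustration under stated assumptions, not the protocol used for Figure 7: the animal identifiers, the fixed random seed, and the function name are hypothetical.

```python
import random


def counterbalance(animal_ids, seed=0):
    """Randomly split animals into two groups that receive the two
    experiments in opposite order (A then A+B, or A+B then A)."""
    ids = list(animal_ids)
    random.Random(seed).shuffle(ids)  # seeded for a reproducible allocation
    half = len(ids) // 2
    schedule = {}
    for animal in ids[:half]:
        schedule[animal] = ("A", "A+B")   # day 1: A, day 2: A+B
    for animal in ids[half:]:
        schedule[animal] = ("A+B", "A")   # day 1: A+B, day 2: A
    return schedule


# Hypothetical cohort of eight animals.
cohort = [f"mouse_{i}" for i in range(1, 9)]
schedule = counterbalance(cohort, seed=1)
```

      Because each order is represented equally, any carry-over effect of the first session (such as a kindling-like lowering of seizure threshold) is balanced across the two conditions rather than confounded with one of them.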

      The second major issue is a lack of wild type control groups for several experiments. The experiments presented in Figures 4, 6C and F, and 7 all lack the necessary wild type control measures. Wild type controls were done for Figure 6E, but the data are not presented in the figure.

      This is also an important point.

      → For the Hm1a experiment (Figure 4), we now present wild-type control data for both PV-IN electrophysiology and 2P circuit-level imaging (Figure 4 – figure supplement 1).

      → We have removed the optogenetic imaging data (previously Figure 6C).

      → The entorhinal cortex ictogenicity experiment (Figure 6) has been re-designed, as per above, and includes appropriate controls.

      → For the experiment demonstrating a decrease in circuit activation in response to PV-IN stimulation (now Figure 8), we were not able to perform a wild-type control due to very low levels of wild-type activation under those conditions (see Figure 2 panel A3 – response to 1 pulse in young adult wild-type mice), as noted in the comments in response to the critique of Reviewer #1. In other words, in the wild-type mice, there was essentially no signal to block. In this experiment we in fact conceptualize the SST activation as the control group (for the PV activation), which we clarify in the text.

      Some of the cell physiology experiments presented were not optimally designed to provide a relevant mechanistic follow-up to the major findings. For the first major finding of the paper, Figure 2 shows clear and interesting changes in DG activation in the mouse model, and Figure 5 reveals changes to synaptic excitation and inhibition in these neurons. Figure 3 and 4 present data showing changes to PV-interneuron intrinsic properties that only reveal themselves under very intense stimulation. While these findings are interesting and worthy of follow-up, the changes aren't relevant to the synaptic stimulation used in Figure 2.

      Thank you for this important comment. We now include additional data, as follows:

      → A parallel dataset quantifying intrinsic properties in the early postnatal timepoint (Figure 3 – figure supplement 1; Table 2). We find that the PV-INs are much more profoundly impaired at this younger timepoint, which further argues against PV-IN dysfunction as the cause of the increased DG activation seen in young adult Scn1a mice relative to wild-type; i.e., PV-IN excitability partially normalizes with development in Scn1a+/- mice, whereas the DG hyperactivation becomes more severe.

      → Synaptic data from the early postnatal timepoint (Figure 5 – figure supplement 2), in which we find no genotype difference in the E/I ratio or EPSC magnitude.

      → PPR at both timepoints, showing no genotype difference in the early postnatal mice, but a higher release probability in the young adult mice.

      Finally, Figure 2 has missing data points, seemingly due to cropping of panels. Data visualization is problematic for this vital figure. The fit lines for individual experiments overwhelm the color-filled variance of the mean. Thus, the data in this figure are very difficult to read and interpret. The figure would benefit from including all the individual data points and summary data, but removing the individual fits or putting them into a supplement.

      We appreciate this very helpful feedback. We now present a “cleaner” version of this main Figure (Figure 2), with the individual fit lines shown in a supplemental Figure (Figure 2 – figure supplement 1).

      Reviewer #3 (Public Review):

      The authors tackle an interesting question - whether the dentate gyrus is a locus of pathology in Scn1a+/- mice and uncover a strong phenotype - the granule cells of the dentate gyrus are over-activated and the EC to dentate pathway is prone to seizure genesis. In the discussion, they suggest that their results support the idea that the DG may be a common locus to several different types of epilepsy… an attractive hypothesis! There are several strengths of the paper. The team has done a nice job of presenting 'ground-truth' data that their measurements of dF/F across a large population of granule cells correlate with action potentials in these cells. As the authors point out, this is especially important when working in disease models in which the dF/F-action potential relationship may be altered. Throughout, the authors were also careful about considering the limitations of their various techniques and analyzed the data in several ways to account for possible artifacts (e.g. ensuring that differences in activation are not arising because of slicing and consideration of kindling in later in vivo seizure threshold experiments). The experiments were well designed and appropriately interpreted.

      One of the most intriguing results of the work is that PV interneurons in the DG of Scn1a+/- show only very minor impairments in young adult animals (they show more spike accommodation than in control animals). Rather, it seems that the GCs receive enhanced excitation from the entorhinal cortex. They perform a set of pharmacological experiments to prove that PV interneurons (and more generally inhibition) do not account for the difference in granule cell activation - however, here it would be useful to see the data summarized more consistently. It is difficult to interpret the pharmacological results (both of which are presented as changes in dF/F0) with respect to the initial findings of the manuscript (presented as estimated activation across the entire population).

      We appreciate this helpful suggestion. We agree that the presentation of the calcium imaging data in the initial submission made data interpretation more difficult for the reader. In this revised manuscript we have improved the consistency of presentation of the calcium imaging data. Please note, however, that we conceptualize these imaging data as fitting into two categories, which do require different graphical depiction: 1) Unpaired data in which we analyze responses across a range of stimulation conditions, shown in Figure 2 and associated Figure 2 – figure supplement 1 and Figure 2 – figure supplement 3; and 2) Paired data in which we assess the response within a given imaging field to a manipulation performed at a single stimulation condition (Hm1a data in Figure 4 and Figure 4 – figure supplement 1; PTX data in Figure 5 and Figure 5 – figure supplement 2; PV-IN data in Figure 8).

      A beautiful aspect of this work is that it goes from cells to circuits to intact brain (in vivo). They nicely show that the heightened excitation from the EC to the DG is sufficient to drive seizures in the Scn1a+/- mice, and finally that since PVs are intact, they can be harnessed to balance out the over activation of GC via optogenetic stimulation of PVs.

    1. Author Response:

      We thank the editors and reviewers for their assessment of our manuscript on the instructive role of enhancer activity in the probability of gene allele activation and random monoallelic expression, and the associated helpful comments.

      Concerning the possibility that our findings apply to gene types other than hematopoietic-related genes, we believe the answer is yes. In fact, the first documented examples of enhancers regulating the probability of target gene expression were in non-hematopoietic cells: the cell lines CV-1 and HeLa (Weintraub PNAS 1988 PMID: 3045805; Walters et al. PNAS 1995 PMID: 7624382). Furthermore, the characteristic constitutive accessibility of enhancers at RME loci regardless of expression of the gene, which is suggestive of probabilistic effects, is shared by hematopoietic and non-hematopoietic (neural lineage) cells (Xu et al. Nat. Genet. 2017 PMID: 28112738). Together with our study, the available evidence argues for a unified role of enhancer activity in determining gene expression probabilities across cell types.

      Concerning the thoughtful review from reviewer 1, these dynamics are not limited to genes encoding cell surface receptors. Recent work that we cited in our manuscript showed that a distal enhancer regulates the expression probability of the gene encoding the transcription factor Bcl11b which determines the T cell fate (Ng et al. eLife 2018 PMID: 30457103).

      In summary, while we focused on the NK cell receptor genes and genes encoding other cell surface proteins in various hematopoietic cell types due to experimental tractability, we believe it is unlikely that our findings will be restricted to specific cell types or to receptor genes.

      Reviewer #1:

      This study investigates the role of enhancer activity in the regulation of stable random monoallelic expression (RME) using the Ly49 and Nkg2 receptor gene families expressed in natural killer (NK) cells, as models of RME genes. The authors show that, unlike promoters of RME genes, enhancers are accessible on both alleles and display histone marks of active enhancers. Moreover, they show that weakening enhancer activity, via CRISPR-mediated deletion, can lower the frequency of gene expression or lead to variegated expression patterns that are reminiscent of RME. The manuscript is clearly written and the data presented are compelling. This study takes advantage of previously-characterised allele-specific antibodies for various genes expressed in NK cells, a powerful tool allowing the analysis of random monoallelic expression (RME) at the protein and single-cell level within a population. The use of these antibodies allows the investigation of in vivo cell populations and circumvents the analysis at the RNA level, which is limited by expression bursts and transcript levels. The authors also substantiate their model using examples of receptor genes expressed in other cell types from the hematopoietic lineage. One question that remains is whether this model applies to other developmentally regulated stable RME genes that are 1) not expressed at the cell surface (such as transcription factors) and 2) expressed in other cell lineages. It is also unclear what defines the strength of an enhancer upstream of the RME genes studied, e.g. what is the difference between a weak enhancer for Ly49 genes and a strong enhancer? These points should be of broad interest for the readers and could be discussed further in the discussion part of the manuscript.

      We thank Reviewer 1 for the very thoughtful comments.

      First, we would like to address the reviewer’s question concerning whether our findings apply to genes that encode transcription factors or other proteins that are not on the cell surface. This indeed IS the case based on recent work we cited showing that a distal enhancer regulates the expression probability of the gene encoding the transcription factor Bcl11b, which determines the T cell fate (Ng et al. eLife 2018 PMID: 30457103). That our findings also apply to genes expressed in non-hematopoietic cells is addressed in the response above to the evaluation summary.

      We also welcome the opportunity to elaborate on enhancer “strength”, albeit somewhat speculatively. Enhancer activity acting upon a locus varies quantitatively in a context-dependent manner. The strength of enhancer activity is likely a function of several factors including (but not limited to) A) the collective (nonredundant) effects of multiple enhancers in genes that have more than one; B) the concentration of enhancer-binding transcription factors (TFs) in the nucleus; C) the affinity of those factors for the target DNA sequences; D) interactions of the relevant transcription factors with each other and with other components of the transcriptional machinery; E) interactions of the enhancer with the specific promoters; and F) the distance between an enhancer and a promoter. Concerning A), our work suggests that where multiple enhancers are present, elimination of one of them reduces overall enhancer strength/activity, resulting in a lower frequency of gene expression. Relevant to B) is work from several groups showing that Ly49 expression frequencies change when relevant TF expression levels are experimentally altered (Held et al. Immunity 1999 PMID: 10549625; Ohno et al. Int. Immunol. 2008 PMID: 18003603; Bezman et al. J. Exp. Med 2011 PMID: 22124110); those results suggest that one means by which enhancer activity may be increased is by increasing the concentration of available TFs. Relevant to F), enhancer-promoter distance may play a role in determining enhancer “strength”, as recent work has shown a distance-dependent binary effect of enhancers on gene expression in integrated reporters (Rinzema et al. BioRxiv 2021 https://doi.org/10.1101/2021.10.05.463209). Fully fleshing out the definition of enhancer strength in the context of RME gene expression will likely accompany a better understanding of how enhancers work generally, a subject of intense current study in the field. 
      Finally, we do not exclude a role for the promoter, which may possess varying levels of intrinsic “competence” to be activated by the collective enhancer activity acting upon it.

    1. Author Response:

      Joint Public Review:

      Davis et al. parameterize a published, coarse-grained classical density functional theory (DFT) model to describe the free energy landscape of the FG-NTR system. They leverage their previously published experimental data (Zahn et al. eLife, 2016) to develop the model of inter-molecular cohesion calculations, which were tuned to reproduce their previous experimental results. The authors investigate NTR binding behavior to the planar film of FG-nups, first for single NTRs and then by combinations of NTRs. They confirm that the higher concentration of NTRs in the FG-nup films decreases their affinity to the film, which provides one rationale to explain the "transport paradox" of NTRs, which bind specifically to FG-nups but transit the NPC extremely rapidly and at high density. The second result is that increasing the concentration of one of the transport receptors in the film (by increasing its bulk concentration) reduces the adsorbed amount of the other transport receptor (whose concentration is fixed). Last, the authors suggest that within some NTR concentration regimes there emerges a phase separation of the two NTRs such that NTF2 (small NTRs) locate near the surface while importin beta (large NTRs) go to the film/solution interface, implying the existence of separate transport pathways inside the NPC, which has been reported previously in experimental findings.

      There was broad enthusiasm for the model, which was found to be interesting, relevant, and to have successfully delivered testable insights. In general, the conclusions were found to be supported by the model outcomes. The segregation of small and large NTRs to different regions of the film was found to be an interesting result. Some results were found to be less exciting, for example the effect of competition between NTRs as they possess only repulsive interactions in the model.

      While there was some disagreement about the quality of the writing, there was a consensus that the explanation of the motivation, methodology, and impact of the conclusions was not sufficient. In particular, the reviewers felt there was a lack of sufficient context related to prior work in the field in the introduction and discussion and the need to better articulate the impact of the findings in the study. Thus, although the work was found by some to be a meaningful contribution addressing two important questions in the NPC field: how different NTRs are organized within the permeability barrier and if NTR organization and dynamics contribute to the efficient rates of nucleocytoplasmic transport through the crowded environment of the NPC, this point needs to be made clearer. Moreover, more attention is needed to previous theoretical works related to protein adsorption in polymer brushes.

      There was a consensus that the authors could have increased the impact of the work by broadening the study to investigate (or at a minimum discuss) 1) how the combination of NTRs with inert molecules behave (i.e. does the addition of NTRs influence the exclusion of inert cargo?); and 2) how cargo bound to the NTRs (particularly NTF2, which has a single cargo - Ran) influences the results (e.g. would the importin-beta effect be exacerbated by its coupling to an "inert" cargo?). A related theme was concern over the potential impact that the geometry of the NPC in vivo would have on the model outcomes, which speaks to the biological relevance. While the authors mention this issue in the Discussion, more directly addressing whether they can speculate on how their results will change for a cylindrical geometry and how the calculations would compare in a system with opposing surfaces (i.e., two surfaces modified by polymer brushes) was warranted. The latter system was felt to be a good proxy to understand how the effects of nanoconfinement in a cylindrical geometry may affect the results.

      We thank the reviewers for the thorough and comprehensive scientific evaluation of our manuscript and for the constructive feedback as articulated in their comments.

      Summary of Major Changes:

      We have added 28 individual data plots, in the form of four additional supplemental figures, in response to the comments of the reviewers. Importantly, we have produced an additional main figure containing a qualitative phase diagram that concisely summarizes the essential physical picture resulting from our work. In response to criticism about the explanation of our work, we have made substantial changes to the introduction, methods, results, and discussion sections in line with the feedback from the reviewers. Overall, we believe that the manuscript is much stronger than before in terms of the science, the relation to the existing literature, and the clarity in conveying our major assumptions.

    1. Author Response:

      Reviewer #2 (Public Review):

      I think this is a very interesting and timely contribution to the literature. It combines a dynamical systems perspective and single cell data in a very neat and exciting combination in order to identify aspects of the EMT process and dynamics.

      This is an ambitious and multi-faceted study and draws on a wide range of experimental, data science, and modelling tools and techniques. Overall I really liked the scope and focus of the study. I do believe that there are a few points where the arguments can be tightened and I will focus on those aspects.

      General Comments:

      In order to capture the dynamics the authors should perhaps engage with the arguments in Crauel and Flandoli (J Dynamics Diff Equations), which prove that additive noise destroys a pitchfork bifurcation. Related to this, I think the arguments in PMC3372930 should be considered. They make a case against the pitchfork bifurcation on purely dynamical grounds. In PMID: 27616569 the arguments are not made quite as forcefully, but this is an excellent background reference. Against this background it is probably not surprising that the dynamics are best explained by saddle node bifurcations.

      One potential concern relates to the construction of the Langevin equation. Additive noise is a very specific choice and needs to be clearly justified. It is convenient, but not based on any physical reasoning in this case. We know that multiplicative noise (e.g. in the chemical Langevin equation, or geometric noise) will qualitatively alter the dynamics compared to the deterministic model. Much of the discussion in lines 250-260 is therefore limited or restricted to the case of additive noise and this needs to be made explicit. If additive noise is chosen because reaction coordinates can only be easily defined in this framework then this limitation should be specified.

      I can see that the simple additive noise makes the integrations in the calculation of the potential 486-499 easier, but again the limitations of this approach should be addressed either by pointing them out, or by considering a model with multiplicative noise.

      The most intriguing result to my mind is the existence of multiple reaction paths. I would like to see to what extent this is robust to e.g. multiplicative noise and other factors in the analysis.

      Thanks for these great points. We want to clarify one point: in our Langevin formulation, we do not assume additive noise; the corresponding diffusion constant D is position-dependent, as explained in Materials and Methods (1). In the revised manuscript we added the x-dependence of the noise terms to make this clear.
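
      To make the distinction between additive and position-dependent noise concrete, the following is a minimal sketch of an Euler-Maruyama integration of a one-dimensional Langevin equation dx = f(x) dt + sqrt(2 D(x)) dW. It is an illustration only, not the model used in the manuscript: the drift `drift`, the two diffusion functions, the step size, and the Itô interpretation of the noise term are all assumptions chosen for the example.

```python
import numpy as np


def simulate_langevin(drift, diffusion, x0, dt, n_steps, rng):
    """Euler-Maruyama integration of dx = drift(x) dt + sqrt(2 D(x)) dW.

    The noise term is interpreted in the Ito sense (an assumption of this sketch).
    """
    x = np.empty(n_steps + 1)
    x[0] = x0
    for i in range(n_steps):
        # Wiener increment over one time step: mean 0, variance dt.
        dw = rng.normal(0.0, np.sqrt(dt))
        x[i + 1] = x[i] + drift(x[i]) * dt + np.sqrt(2.0 * diffusion(x[i])) * dw
    return x


def drift(x):
    # Hypothetical bistable drift with stable states at x = -1 and x = +1.
    return x - x**3


def additive_D(x):
    # Additive noise: D does not depend on x.
    return 0.05


def position_dependent_D(x):
    # Position-dependent (multiplicative) noise: D grows with |x|. Hypothetical form.
    return 0.05 * (1.0 + x**2)


rng = np.random.default_rng(0)
traj_additive = simulate_langevin(drift, additive_D, 0.1, 1e-3, 5000, rng)
traj_position = simulate_langevin(drift, position_dependent_D, 0.1, 1e-3, 5000, rng)
```

      Swapping `additive_D` for `position_dependent_D` changes only the noise amplitude along the trajectory, which is the ingredient the reviewer asks to be stated explicitly.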

      References:

      1. Scheffer M, et al. (2009) Early-warning signals for critical transitions. Nature 461(7260):53-59.
    1. Author Response:

      Reviewer #1 (Public Review):

      [...]

      1. A notable shortcoming of the authors' interpretation is the generalization of their findings to preterm premature rupture of membranes (PPROM). As noted by the authors, term labor is considered a "sterile" process, which is particularly important in terms of the authors' findings since TLR4 in the fetal membranes may be responding to endogenous signals such as danger signals. However, a large proportion of PPROM cases are associated with microbial invasion of the amniotic cavity, and thus in this context TLR4 would be responding to bacterial products.

      To bring in some new elements and address this reviewer’s concern, along with the potential extrapolation between physiological rupture and pathological rupture in the case of PPROM, we decided first to remove Figure 3C (expression of TLR4 in the presence of LPS from bacterial origin) from the revised version of the manuscript. To address this comment, it is well known that the percentage of PPROM cases associated with microbial invasion varies with the weeks of gestation. In fact, early gestational ages are clearly linked to a high prevalence of microbial-associated intra-amniotic inflammation (64.3% when <25 WGA), whereas this percentage subsequently decreases throughout gestation (Romero et al., 2015), reaching one-third at term, which better links with the gestational stage of the current study. Such observations support the fact that the TLR4 model in physiological rupture could be transposed, at least in part, to sterile PPROM and initiated by the presence of alarmins (i.e., HMGB1) and their binding to this type of receptor. Indeed, TLR4 is now well described as being stimulated by ligands other than LPS, such as HMGB1, a member of the DAMPs (Robertson et al., 2020). Furthermore, the quantification of TLR4 mRNA and protein expression in the case of PPROM without chorioamnionitis compared with term no labor without chorioamnionitis was already carried out (Kim et al., 2004), indicating the absence of a clear link between chorioamnionitis and TLR4 expression. Finally, in an animal model of PPROM, an article underlined the importance of TLR4 in preterm labor by using TLR4 mutant mice in a sterile context (Wahid et al., 2015).

      1. It is a well-known concept that TLR4 is expressed by the fetal membranes and is responsive to LPS stimulation, and thus the confirmatory set of experiments performed by the authors do not seem to be as novel. Indeed, given that this study was focused on the "sterile" process of term labor, perhaps the utilization of danger signals that can interact with TLR4 would be more appropriate.

      The choice to use LPS (Figure 3C) was only to confirm that TLR4 leads to proinflammatory activation in the amnion and choriodecidua, demonstrating the functional pathway after TLR4 activation in the fetal membrane environment. We completely agree these are not novel data; this is why we decided to remove this part of the results in the revised version of the manuscript. Furthermore, we decided not to repeat the use of DAMPs (such as HMGB1) to stimulate the TLR4 pathway in this work because it was already published in the fetal membrane context (Bredeson et al., 2014). In accordance with your comments, we have modified the end of the results paragraph entitled ‘Combination of transcriptomic and methylomic results in the ZAM zone demonstrate that genes more expressed in the choriodecidua are linked to pregnancy pathologies’ to better justify the choice to focus on TLR4 global transcriptional regulation.

      1. The distinction between the ZAM and ZIM seems to have been lost among the TLR4-focused experiments, and thus it is unclear how these fetal membrane zones fit into the conceptual model proposed by the authors in the final figure.

      The reviewer is correct here, so to avoid confusion between the ZIM and ZAM used, we decided to do the following:

      • Read carefully all the successive paragraphs of the results to check for the presence of ‘ZAM specification’
      • Add ‘ZAM’ in the legend of Figure 4. This information was present in the related text of the article.
      • Update Figure 7 and its legend (model of regulation). We had ‘ZAM zone’ in the discussion part regarding Figure 7.
      1. The study is largely descriptive and would benefit from the addition of fetal membrane tissues from pregnancy complications such as PPROM and/or animal models in which premature rupture of the membranes has been induced.

      We agree that animal models are available. Nevertheless, we considered that such models are far from the human reality. In fact, animal models are often used for fetal membrane studies, but they differ in pregnancy physiology, structure and uterine environment, which hampers their use. We used ‘term’ fetal membranes to decipher the physiological rupture of membranes and demonstrate the importance of TLR4. To bring some elements regarding this comment and the possible extrapolation between physiological rupture and pathological rupture in the case of PPROM, we decided to remove Figure 3C (expression of TLR4 in the presence of LPS from bacterial origin) to focus more on the physiological rupture of fetal membranes without the involvement of bacterial presence. Previous bibliographic data answer the reviewer’s question: Kim et al. (2004) clearly demonstrated that TLR4 mRNA levels are higher in PPROM (31.2 weeks of gestation) fetal membranes without chorioamnionitis than in term (39.1 weeks of gestation) ones without chorioamnionitis.

      1. The study focuses on the mechanisms of rupture of membranes, but does not provide an explanation as to how the regulation of TLR4 mediates the process of membrane rupture.

      We agree with your comment; however, ‘how the regulation of TLR4 mediates the process of membrane rupture’ is not the topic of the manuscript. In addition, this has already been well established in previous publications. Nevertheless, we added a sentence in the introduction between lines 97-100: ‘The mechanisms implying TLR4 in the physiological or pathological rupture of membrane in case of PPROM are well known. Triggering TLR4 will lead to NFκB activation, leading to an increase of the release of proinflammatory cytokine, concentration of matrix metalloprotease and prostaglandin, which are well established actors of fetal membrane rupture (Robertson et al., 2020).’

      Reviewer #2 (Public Review):

      This is a well-conceived and executed paper that adds novel data to improve our understanding of rupture of the human fetal membranes. The new information presented not only addresses gaps in our understanding of normal parturition mechanisms but also the significant issue of preterm birth. The authors highlight the need to understand the understudied human fetal membranes to be able to understand its role in normal parturition but also to lower the rates of preterm birth. They not only establish the need to study this tissue but also to improve our appreciation for regional differences within it, using a comprehensive genetic approach. The authors provide data from a genome wide methylation study and cross reference this with transcriptome data. Using this new knowledge, they then zero in on a specific gene of interest TLR4. This receptor is already established as an extremely important receptor for preterm birth but little is known about its role in normal parturition. Strengths of this paper stem from the comprehensive data set provided, answering both the questions pertaining to the specific aims of this paper but also potentially future questions and providing potential focused targets of study. One example of this may be the common methylated genes that are found in both the ZIM and ZAM, illustrating not regional changes but gestational programming of this tissue.

      We thank the reviewer for the positive and constructive comments regarding the article. Following all the reviewers’ comments, we now have an improved version.

      Reviewer #3 (Public Review):

      Manuscript by Belville et al describes the significance of epigenetic and transcription associated changes to TLR4 as a mechanistic event for sterile inflammation associated with fetal membrane weakening, specifically in the zone of altered morphology. This manuscript is timely in an understudied area of research.

      The authors have taken an extensive set of experiments to derive their conclusions.

      However, it is unclear why the focus is on TLR4. Although LPS is a ligand for TLR4, Gram-negative infections are rare in PPROM; most infections involve genital mycoplasmas. The methylome and transcriptome analysis does not necessarily warrant examination of a single marker. A clear rationale would need to be included.

      We would like to thank the reviewer for their comments regarding the article. For the last part of the public review, we would like to underline the following:

      -The choice of focusing on TLR4 is explained in the article text between lines 161 and 165 by the following sentences: ‘Of all the genes classified in these processes, TLR4 was the only one represented in all these biological processes and, therefore, seems to play a central role in parturition at term. To validate this in-silico observation and pave the way for describing TLR4’s importance, immunofluorescence experiments were first conducted to confirm the protein’s presence in the amnion and choriodecidua of the ZAM (Figure 3B)’. Furthermore, this choice arises from the analysis described in Figure 3A, which underlines that the four most represented GO terms have only one gene in common: ‘TLR4’. The combination of two high-scale studies does not permit us to individually characterize how each gene is regulated. Nevertheless, the focus on TLR4 provides an original and interesting hypothesis on how layer-specific regulation between the amnion and choriodecidua could be realised at the cellular level in the weaker ZAM zone. Finally, because the high-scale study results are public, this type of analysis could be conducted on other candidate genes.

      -Throughout the text, we changed all the ‘E. Coli’ to ‘Gram-negative bacteria’. Furthermore, as found in the literature, genital mycoplasmas are considered ‘Gram-negative bacteria’. We focused on the ‘sterile inflammation phenomenon’, and to support the hypothesis concerning the importance of TLR4, we produced a supplementary transcriptome ‘ZAM heatmap’, which confirmed overexpression in the choriodecidua of DAMPs such as S100A7, A8 and A9, which are well-known ligands of TLR4 (given below as an image).

      Heatmap of genes differentially expressed in the ZAM zone in relation to the sterile inflammation phenomenon.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this interesting paper by Dady and colleagues the nature of human neural progenitor differentiation is evaluated via transplantation studies. The first part of the paper establishes the timing of neural differentiation in human IPSC model systems and in human embryonic spinal cord, showing that the relative timing of neurogenesis and gliogenesis is maintained. In the second part of the paper these human IPSC neural rosettes are transplanted into the chick spinal cord during neurogenic stages (i.e. Isochronic transplantation) and they find that neurons are generated by these transplanted populations. Analysis of transplants at later stages reveals that neurogenesis has "stalled" and is relatively reduced within the transplanted population.

      Overall, this is an interesting paper that uses classic approaches to answer potentially interesting questions; however, there are some issues that limit its potential impact. The first two figures are recursive and show that the authors can implement an existing protocol.

      Our previous work established this differentiation protocol (Verrier et al 2018 Development), but this is its first use to analyse by immunofluorescence the timing of neural differentiation and the appearance of specific neuronal and glial cell types in an in vitro neural rosette assay. None of the data presented is recursive and it provides a quantitative timeline of human dorsal spinal cord differentiation. Moreover, our comparison with the human embryonic spinal cord indicates that neural differentiation progresses more rapidly in vitro.

      The transplantation studies are intriguing but do not offer sufficient new insights. The key finding seems to be that at later stages post-transplantation neuronal differentiation is "stalled". There are many other reasons (besides "stalling") that could explain their results. Even supposing that stalling was indeed occurring, the authors offer no cellular or molecular insights into what regulates these intrinsic differences across species. In the end, it is not sufficiently clear what we have learned about the mechanisms that control the timing (pace seems to be another term for timing) of differentiation in human neural stem cells.

      Our study shows that human iPSC neural rosettes transplanted isochronically into the chick embryonic spinal cord initially differentiate at the pace of neural rosettes cultured in vitro, rather than at that of the faster-differentiating host chicken embryo. This suggests that the human cells follow an intrinsic differentiation programme. However, after a longer period of culture the transplanted rosettes lag behind their in vitro counterparts, suggesting that intrinsic cues are insufficient and that appropriate extrinsic cues are needed to promote differentiation progression. We discuss potential reasons for this stall of the differentiation programme in the Discussion and agree that some further investigation is of interest. Moreover, revision experiments now further extend our findings.

      Reviewer #2 (Public Review):

      In general, this manuscript provides new significant knowledge by comparing between neural differentiation rate within the same species (human) in vivo and in vitro and between species (human and chick). The quality of the data is excellent, and the combining of the in vivo chick model to compare between grafted and host cells is a fantastic idea, that can only be done in this experimental model. Yet, some controls and more in-depth analysis are missing and are required in my opinion before publication.

      1. In the grafting experiment, non-manipulated embryos serve as controls. Yet, the grafted rosettes are inserted near an injured area where a piece of the neural tube was removed. A better control would be to graft homologous cells from a donor chick embryo (a GFP+ chick line is available in the UK) or quail embryo (which has a similar growth rate to chick at E2) and examine whether the injured area causes the grafted cells to differentiate at a different pace compared to the human grafts. This control is necessary to rule out the possibility that the human grafts did not accelerate their differentiation rate and later stopped differentiating due to extrinsic signals/lack of signals from the manipulated environment.

      For clarity, we point out that the control is not an unmanipulated embryo, but an embryo subject to the same tissue removal experiment but lacking the graft. We found that these operated-only embryos quickly regenerated lost tissue to such an extent that the neural tube appeared similar to the unoperated contralateral side of the neural tube after 2 days. We further note that many previous studies using quail tissue in place of chick to fate map the embryo (for example the many excellent studies of Nicole Le Douarin) suggest that such manipulations result in normal differentiation of the grafted tissue, which is thus unaffected by placement into an injury site in a developing embryo. However, we appreciate that it would be informative to demonstrate this for our precise experiment and in our hands.

      1. When examining the entire results of the manuscript some important points need to be addressed: On the one hand, the rosettes correspond to their in vitro growth conditions/extrinsic cues and display an accelerated differentiation pace, when compared to their in vivo counterpart human cells. On the other hand, the rosettes do not correspond initially to the chick environment and maintain their own intrinsic tempo.

      The human rosettes were matched for differentiation state with the tissue removed from the chick embryo. Despite this, and despite the local cues that support chicken spinal cord development beyond this point, the human cells retained the differentiation timing of their in vitro counterparts rather than that of the host chick embryo. This is consistent with the slower cell cycle and cell metabolism characteristic of human cells.

      Later, they do change their developmental program and attenuate their differentiation. Therefore, the conclusion that the cells mostly obey intrinsic regulation is confusing. It would be great if the authors could provide better experimental data to confirm their conclusion. Some ideas that the authors may consider are to determine whether there is a time window that sets the tempo of the rosettes that cannot be influenced later by extrinsic cues. Will the grafted cells respond differently if they are grafted at more/less advanced stages and domains? Is there an initial mechanistic explanation for the different behavior of spinal cord progenitors in the three contexts? Is there a possibility somehow to obtain human spinal cord progenitors and grow them in the same in vitro conditions as the rosettes to compare their differentiation rate? I am aware that some of these experiments are very hard to perform and am not expecting the authors to perform all the suggested ones; yet, some more in-depth analysis would enable this article to better explain the presented observations.

      These are interesting suggestions. Heterochronic grafting of the human neural rosettes, for example into the same site in an older chicken embryo, would further test whether they continue to operate an intrinsic differentiation programme in this temporally distinct embryonic environment.

      Reviewer #3 (Public Review):

      The authors have developed dorsal spinal cord rosette assays from human pluripotent stem cells (hPSCs) and also from human induced pluripotent stem cells (hiPSCs) in a minimal culture medium containing retinoic acid. They define the dorsal spinal cord identity of these cells based on the presence of SOX2, PAX6, SNAI2 and PAX7, and absence of OLIG2 (characteristic of more ventral neural tube). Assessment of markers for migrating neural crest-like cells (HNK1, SOX10 and TFAP2alpha), immature neurons (DCX) and glial progenitors (NFIA) at different time points was used to show that the in vitro model recapitulates sequential differentiation observed in the spinal cord of avian and mouse embryos. Next, by comparing these results with neural differentiation in the human embryo, the authors show that neural differentiation occurs faster in vitro than in vivo. The authors then asked how these hiPSC-derived neural rosettes would respond to the more rapidly developing chicken embryonic environment, by grafting the rosettes into the developing chick neural tube. By assessing expression of various neural markers in the graft-derived cells, authors conclude that after two days of culture, human cells continued differentiation at the rate of the in vitro hiPSCs rather than at the rate of the host chicken cells. After longer culture (5 days), authors say that neurogenesis rate among graft-derived human cells attenuates and that the cells stall in the neural progenitor phase. Authors conclude that while initially an intrinsic differentiation programme is followed by the human cells, appropriate extrinsic inputs are required to maintain the neural differentiation trajectory of human cells.

      However, it is difficult to assess whether all conclusions by the authors for the human-into-chicken graft experiments are supported by their data, as some details of analysis are unclear (1) or experimental design was not conducive to the questions being asked (2). Some aspects of data analysis therefore need to be clarified and extended.

      1. Position of graft derived cells within the chicken host is very important when analysing presence/absence of a marker, but it is not always clear whether this has been taken into account by the authors. It appears that authors are assessing expression of markers in graft derived cells that are present outside OR inside domains in the chick host that would normally express that marker, and are not separating out such analysis. This will confuse interpretation of results and affect conclusions.

      One example where this would affect major conclusions of the manuscript is in the case of Islet-1 expression in human graft derived cells in the chicken host. Authors say that no Islet-1 was found in single graft derived cells in the chick embryo after two days of culture and use this to support their conclusion that the "pace of neural differentiation in the grafted human rosettes is unaltered in a more rapidly differentiating environment". However, Islet-1 expression in the chick is restricted to specific domains, therefore it would be important to know whether the graft-derived cells that the authors were analysing were within these Islet-1 positive host domains. Lack of Islet-1 in graft derived cells within such Islet-1 positive domains in chick would suggest that the graft derived cells have not responded to the host's timing of differentiation, and would support the authors' conclusions. However, lack of Islet-1 in graft derived cells outside of such Islet-1 positive domains could not be used to conclude the same thing as cells would be receiving different signals from the host. It appears that the graft used by the authors to show absence of Islet-1 in Fig 4G is outside of chick Islet-1 positive domains. Therefore, lack of Islet-1 in graft derived cells cannot be used to suggest that pace of human neural differentiation is initially directed by cell intrinsic factors, unless the location of the human cells in the chick is clearly shown to be within Islet-1 expressing domains in the chick.

      Human rosette cells were grafted into the chicken neural tube following removal of the dorsal half of the host neural tube at E2, and grafted cells were assessed in at least 3 sections from grafts in 3 different embryos for each marker analysed (see meta-data tables S1-7 and Methods). The reviewer is correct that in a subset of sections graft cells were not in precisely the same position as chick endogenous Islet-1 expressing cells. We can provide the data that include only those sections with cells in this precise domain (3 sections from 3 different graft embryos), but also note that none of the sections analysed included human cells expressing ISLET1. This is the only marker analysed where this is an issue; other neuronal markers, such as P27, are expressed throughout the dorsal extent of the neural tube.

      1. Size of the graft used when transplanting human iPSCs into the chick will also affect the interpretation of results, as human cells will be exposed to varying levels of host signal depending on how much of their surface is exposed to host cells. Since the authors are using this experiment to test the effects of the chicken environment on human cells, this is a crucial point. After grafting hiPSC derived neural rosettes into the chick and culturing the chick embryo, authors assess expression of various markers in the graft-derived cells and separate out their analysis of marker expression across three different categories; cells found in 'cell rosettes', 'cell groups' or as 'single cells'. However, it remains unknown for how long these groupings were true during the culture time. For example, while it is known that at the time of grafting the cells were in a rosette structure, it is unknown at what time cells detached to incorporate as single cells (it could have been directly after grafting, or just prior to analysis) and is therefore not consistent across cells being analysed.

      One way to go around this would be not to graft the entire rosettes, but rather to dissociate the rosette and graft single cells/small groups of cells into the chick. With single cells the community effect (Gurdon 1988) would be avoided and the experiment would be testing the influence of only the host environment on this cell (rather than a combined influence of host environment and environment created by neighbouring graft derived cells as is the case in the current manuscript). This is particularly important as the data presented in the manuscript appear to show a difference between marker expression in single cells versus groups of cells and rosettes (plots in Fig 4 and 5).

      Details of rosette graft preparation are provided in the paper and this includes a gentle cell dissociation step, so we grafted human rosette cells that then reformed a rosette structure (which may reflect that human cells have greater affinity for each other), and some single cells were also initially available for insertion into the chicken neuroepithelium. It is likely that this cell mixing takes place early on while the chick dorsal neural tube reforms following the operation. For this reason, we analysed cell type specific markers in the human cells in large cell groups (reformed rosettes), smaller cell groups incorporated into the chick dorsal neural tube and single cells within this chick neuroepithelium. We appreciate that without the ability to monitor single cells throughout the experiment it is not possible to account fully for the environment experienced by a grafted cell. We agree that smaller grafts or other approaches may increase the number of cases of single human cells surrounded by chick neuroepithelial cells. We note the reviewer has taken up our consideration of the Community effect in the paper, which is of course why we have analysed marker expression in the three cell configurations. We also make clear in the paper that the apparent increase in P27 expression in single cells is not statistically significant and that this reflects the small number of single/isolated human cells within the chick neuroepithelium available for analysis (see metadata provided).

    1. Author Response:

      Reviewer #1 (Public Review):

      My main concern relates to the title, which does not appear to be supported by the data. One can't conclude that the reported effects are strictly due to altered glycolysis in cholinergic neurons without directly assessing glucose metabolism in these neurons. Moreover, TIGAR functions by blocking glycolysis and directing the pathway into the pentose phosphate shunt. Therefore, deleting TIGAR in a neuronal population might have multiple effects.

      The authors show convincingly that deleting TIGAR from ChAT-expressing neurons, but not adipose or muscle cells, protects mice from cold-induced hypothermia. It is however unclear whether this leads to alteration in energy expenditure per se. This is important considering the first argument of the discussion highlighting how approaches to increase energy expenditure through the development/activation of brown/beige adipose tissue thermogenesis have failed. Moreover, it is unclear if TIGAR also affects heat dissipation considering the impact of its deletion from ChAT-expressing neurons on blood pressure and heart rate, two parameters that will likely influence the tail vasoactivity. Evaluating energy expenditure and heat loss appears to be necessary to support the conclusion that the resistance to hypothermia is exclusively dependent on shivering thermogenesis.

      One key aspect that may deserve discussion is a potential contribution of the sympathetic nervous system to the observed phenotype. The focus of the manuscript is on acetylcholine but one can't disqualify that sympathetic compensations may happen following the deletion of TIGAR in ChAT-expressing neurons.

      There are many data that are not shown but that would be worth including (lines 99, 113, 119, 159, 168, 181, 221,

      1. We have changed the title to better reflect the specific findings in this study.
      2. We now present in the Discussion section the potential roles of other mechanisms besides cholinergic signaling (sympathetic, vascular, behavioral) that could also contribute to temperature regulation in this model system.
      3. We have now included some of the data that was originally indicated as data not shown but have eliminated some of these data from the text as they are superfluous and do not provide important information for any of the conclusions drawn.

      Reviewer #3 (Public Review):

      Strengths: The study is nicely written and presented. The investigation of whole-body TIGAR knockout (TKO) clearly demonstrates resistance to cold exposure, and the authors logically follow potential sources through the obvious tissue candidates.

      Both skeletal muscle and adipose specific TIGAR knockouts were generated, neither of which recapitulated the effect of the TKO. Other obvious candidates, such as UCP1 content in adipose and basal oxidative capacity and contractility of skeletal muscle were ruled out using ex vivo techniques.

      Nevertheless, pharmacological interventions indicated that muscle contraction was necessary for protection from cold exposure and that the loss of TIGAR overcame competitive antagonism of the nicotinic acetylcholine receptor. These data were supportive of a role for skeletal muscle contraction, particularly at the level of cholinergic signaling.

      A cholinergic neuron specific TIGAR knockout was produced. Loss of TIGAR was molecularly confirmed, and this mouse recapitulated the whole-body knockout's resistance to cold exposure.

      Tracer studies are largely compelling and confirm that loss of TIGAR increases substrate dependence on glucose oxidation in a cell model.

      Weaknesses: The TKO mice were not characterized for body weight, body composition or energy expenditure, leaving some room for alternative or additive mechanisms.

      Although the tracer data demonstrate that loss of TIGAR causes the cell model to increase reliance on glycolysis compared to other unlabeled substrates, the data do not necessarily demonstrate an increase in the absolute rate of glycolysis or total acetyl-CoA production as intimated in the discussion. It is also unclear why media glutamate is examined for tracer incorporation rather than tissue glutamate.

      There are some minor weaknesses related to the description of the methods. For example, the 18O studies need clarification. It will be unclear to most readers how this method works.

      1. We now include body composition, food intake, activity and energy expenditure data in new Figures S1D-H.
      2. In these tracer analyses, we followed the stable isotope label from 1,2-13C glucose into glutamate to non-invasively assess the differences in carbon flux between pyruvate carboxylase and pyruvate dehydrogenase, allowing us to use the cells for assessment of acetyl-CoA and acetyl-carnitine in the same experiment. These media tracer data indicate an increase in PDH flux (m1) in TKO cells compared with control cells. Together with the corresponding cellular data for acetyl-CoA and acetyl-carnitine levels, all of which were elevated in the TKO SH-SY5Y cells, this is consistent with an increased rate of glycolysis (new Figure S7C and D).
      3. We have further clarified the methods for the use of 18O labeled water.
    1. Author Response:

      Reviewer #2:

      In this manuscript, Ng et al., report on a system where cardiac mesoderm and pulmonary endoderm co-develop from pluripotent stem cells. This is of potential interest, as it could provide an integrated model for the study of human cardiopulmonary development.

      The main weakness lies in the lack of thorough characterization of the resulting cells and tissues. The characterization relies almost entirely on reporter gene expression and PCR for a limited set of markers. The only indication that ATII cells are generated is expression of a SPC-dTomato reporter and SFTPC mRNA. No evidence is given of function, of expression of other markers or direct staining for SPC, or of ultrastructure. No data are provided whether the lung component contains other lung cells. Another outstanding question for the lung component is whether any pulmonary mesenchyme was generated.

      Thank you for the suggestion. In the revised manuscript, we have included further cellular characterization of the 3D µTs. We included additional characterization for alveolar type 2 (AT2) cells, including direct immunofluorescence staining of Pro-SPC and transmission electron microscopy imaging of the lamellar bodies (Fig. 6). Besides AT2 cells, we also identified the emergence of AT1-like cells via the expression of HOPX (Fig. 6b). To characterize cell types beyond the alveolar epithelium, we stained for S100A4, a marker for mesenchyme, and observed positive staining in the µTs (Figure 6-figure supplement 1a). We did not detect any proximal airway epithelial cell types, such as ciliated cells (FOXJ1), secretory cells (MUC5AC), and basal cells (p63) (Figure 6-figure supplement 1b-d).

      The same is true for the cardiac component. Which types of cardiac cells are generated: ventricular, atrial, endocardium, epicardium, conducting tissue? No benchmarking was done compared to either human tissues or similar cells generated using more focused differentiation protocols, and functional studies are very limited.

      We agree with the reviewer’s perspective that the present study was primarily focused on progenitor specification. Nonetheless, in the revised manuscript, we have provided additional characterization of the induced cardiac tissues via immunofluorescence staining of Sarcomeric Alpha Actinin. Further, we have included new data on assessing the cardiac contractile function using a calcium channel blocker (Verapamil), showing reduced contractility in response to increasing concentrations of Verapamil (Fig. 6e).

      Another weakness is that there is no characterization of early intermediate developmental stages: primitive streak, mesendoderm, definitive endoderm, cardiac mesoderm, first or second heart field. This type of analysis would be required to validate this complex model as an approach to study human cardiopulmonary development.

      Thank you for pointing this out. In the revised manuscript, we have added a new figure (Figure 1-figure supplement 1) characterizing the presence of primitive streak by staining for T (Brachyury) after 2 days of CHIR treatment. We also showed the presence of the mesendodermal marker (MIXL1), endodermal marker (SOX17) and mesodermal marker (NCAM1) during Stage-1 co-differentiation, as indicated by qPCR and immunostaining (Fig. 1).

      There is also no quantification of differentiation efficiency and yield, and neither are data shown to document absence or presence of other endodermal or mesodermal lineages. NKX2.1, for example is also expressed in the forebrain and in the thyroid.

      Thank you for the suggestion. In the revised manuscript, we have included FACS analysis of Day-15 differentiated cells to quantify the percentage of NKX2.1+ lung and NKX2.5+ cardiac progenitor cell populations. To assess the possibility of other related endodermal and neuronal cell populations, we have included new data characterizing the Day-15 differentiated cells, which showed no co-expression of NKX2.1 with TUJ1 (neuronal marker) or PAX8 (thyroid marker), further supporting that the observed NKX2.1+ cells represent the lung lineage (Figure 2-figure supplement 1).

      A final limitation is that multiple pluripotent lines should be used.

      In the revised manuscript, we have provided a comprehensive characterization of applying the co-differentiation protocol to another hiPSC line (BU1), including germ layer induction, cardio-pulmonary progenitor induction, 3D organoid formation, and alveolar maturation (Figure 4-figure supplement 5). We have included the data for mesoderm and endoderm induction during Stage-1 (Figure 4-figure supplement 5b) and cardio-pulmonary µT formation from Day-15 progenitor cells. On Day-18, BU1-derived cardio-pulmonary µTs stained positive for NKX2.1 and NKX2.5, as we observed in BU3. These µTs were also able to further mature into distal lung epithelial cells, as indicated by positive staining of SFTPC and HOPX. Meanwhile, the NKX2.5+ cardiac lineages expressed cTnT and Sarcomeric Alpha Actinin (Figure 4-figure supplement 5e).

      This type of model could be very useful, but it is not clear that the goal of integrated cardiopulmonary development was achieved.

      We thank the reviewer for the comment. The following findings from this study suggest that an in vitro hiPSC-based integration of cardio-pulmonary development is possible. First, we showed that following establishment of a mixture of endoderm and mesoderm, the same set of signaling molecules was capable of inducing endoderm-to-pulmonary and mesoderm-to-cardiac specification in parallel, echoing their close spatial coordinates with embryonic body patterning and shared requirement of paracrine signaling. Second, we showed that in the presence of cardiac accompaniment, alveolar maturation was expedited, implying inter-lineage crosstalk between the co-developing cardio-pulmonary systems. We agree with the overall suggestion from reviewers that this study focuses primarily on cardio-pulmonary progenitor specification, and future investigations are needed to further clarify the mechanism and outcome of integrated cardio-pulmonary co-development. We have added this clarification in our revised manuscript.

      Reviewer #3:

      Ng and Johnston et al. reported the successful multilineage co-differentiation of mesoderm-derived cardiac and endoderm-derived lung progenitors from human pluripotent stem cells (hPSCs). The authors achieved their goals through a stepwise strategy built on the knowledge from published cardiac and lung differentiation protocols. The authors first employed WNT activation using GSK3 inhibitor CHIR, an established WNT signaling agonist, at relatively high dosage to induce primitive streak formation from hPSCs maintained in pluripotent medium (days 1-2). This is supported by knowledge from vertebrate development that both mesodermal and endodermal germ layers are patterned by primitive streak. This is also consistent with recent findings by Martyn et al. (PMID 29795348, https://doi.org/10.1038/s41586-018-0150-y) that activation of WNT signaling is sufficient to induce primitive streak from hPSCs. In the subsequent step (days 2-4), the newly formed primitive streak provides a gradient of endogenous WNT, BMP and Nodal/Activin signaling, which allows the co-induction of both mesoderm and definitive endoderm (DE) from the remaining hPSCs in culture in a serum and morphogen free differentiation medium. Consistently, high Nodal (by exogenous Activin A) favors endodermal induction at the expense of mesodermal specification, and medium-high exogenous BMP4 is detrimental to lung endodermal specification but enhances cardiac mesodermal differentiation. The authors then demonstrated that dual TGF and WNT inhibition is efficient to pattern the mesoderm and endoderm simultaneously for future cardiac and lung induction (days 4-8). This agrees with the existing knowledge that lungs derive from anterior foregut endoderm, and cardiac progenitors, the major substance of heart, derive from cranial lateral mesoderm. Mesoderm and DE patterning was followed by lung and heart specification through the activation of WNT and RA signaling exogenously, in the presence of endogenous BMP4 signaling (days 8-15).

      The differentiation strategy developed by the authors follows the lung and cardiac developmental paradigm overall, and the protocol yields efficient lung and heart progenitor specification on the tested hiPSC line. The work provides a new insight into cardiac and lung directed differentiation, and offers a valuable platform to study human heart and lung development in vitro. For cardiac and pulmonary progenitor differentiation (days 4-15), the protocol described in this manuscript relies mainly on the exogenous application of common key developmental signal events shared by heart and lung specification from meso- and endo- derms, respectively. For progenitor maturation (post day 15), the data show an expedited alveolar maturation process in cardio-pulmonary co-differentiation culture, suggesting paracrine signal(s) from cardiac cells positively regulate alveolar maturation. The authors did not report any data on whether/how paracrine signal(s) from lung lineages may influence cardiac maturation. The authors achieved their goals, and the results support the conclusion of the paper overall.

      We agree with the reviewer’s point of view that the present study was primarily focused on progenitor cell induction and the maturation of the pulmonary lineage. In response to the reviewer’s comment, we have provided additional discussion suggesting how this model can be used to further investigate how paracrine signals from the lung lineage may influence cardiac maturation. Also, thank you for suggesting the reference (PMID 29795348); we have added it to the manuscript.

      The weaknesses of the manuscript are: 1) Lack of evidence/characterization of primitive streak formation at 48 hours of differentiation. 2) Lack of a thorough characterization of the composition of the entire differentiation culture at progenitor stage (day 15): it is very likely that there are pulmonary mesenchymal/mesodermal cells generated in the differentiation culture, besides cardiac mesoderm. The pulmonary mesenchyme may not be abundant in quantity but it plays critical roles in promoting alveolar maturation that the authors observed at day 18 of co-differentiation culture. Before drawing a conclusion, the authors must examine rigorously whether alveolar maturation was promoted by cardiac mesoderm or pulmonary mesoderm.

      We thank the reviewer for bringing this to our attention. In the revised manuscript, we have provided additional data characterizing the primitive streak at 48 hours of differentiation (Figure 1-figure supplement 1), as well as characterizing Day-15 differentiated cells using FACS analysis (Figure 2-figure supplement 1a-c). Further, as the reviewer pointed out, it is possible that the supporting function comes from additional mesodermal lineages aside from cardiac mesoderm. Peng et al. demonstrated in a mouse model that the heart served as a reservoir of cardiac and pulmonary mesenchymal cells that play a major role in lung development. In the revised manuscript, we have also added staining of S100A4, a marker for mesenchyme, in the 3D µT (Figure 6-figure supplement 1), as well as an additional discussion (line 556-562) on future studies needed to further assess the regulation of alveolar maturation by cardiac mesoderm or pulmonary mesoderm.

      3) The paper can benefit from providing mechanistic insights into whether/how alveolar maturation medium (CDCIK, days 15-18, and KDCI days 18-25) influenced the downstream cardiac lineage fate specification from the cardiac progenitors. Besides contracting/beating cardiac cells, are there any other type(s) of cardiac lineages present in d25 culture? Do the cardiac progenitors generated by this protocol mainly represent cells from primary heart field? Is there any second heart field potential?

      We thank the reviewer for the comments. We agree with the overall comments from the editor and reviewers that the present study was primarily focused on the induction of cardiac and pulmonary progenitors. We also agree with the reviewer that further investigation and understanding of the cellular composition of cardiac-related lineages is needed. Related to this comment, we found that CHIR within CKDCI inhibited cardiac contraction, which did not begin until CHIR was removed. This is consistent with prior studies showing that GSK-3β inhibition promotes expansion of cardiomyocytes but causes disorganized myofibrillar architecture. We have provided additional related discussion in the revised manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      [...] Recently, pupil dilation was linked to cholinergic and noradrenergic neuromodulation as well as cortical state dynamics in animal research. This work adds substantially to this growing research field by revealing the temporal and spatial dynamics of pupil-linked changes in cortical state in a large sample of human participants.

      The analyses are thorough and well conducted, but some questions remain, especially concerning unbiased ways to account for the temporal lag between neural and pupil changes. Moreover, it should be stressed that the provided evidence is of indirect nature (i.e., resting state pupil dilation as proxy of neuromodulation, with multiple neuromodulatory systems influencing the measure), and the behavioral relevance of the findings cannot be shown in the current study.

      Thank you for your positive feedback and constructive suggestions. We are especially grateful for the numerous pointers to other work relevant to our study.

      1. Concerning the temporal lag: The authors uniformly shift pupil data (but not pupil derivative) in time for their source-space analyses (see above). However, the evidence for the chosen temporal lags (930 ms and 0 ms) is not that firm. For instance, in the cited study by Reimer and colleagues [1], cholinergic activation shows a temporal lag of ~ 0.5 s with regard to pupil dilation - and the authors would like to relate pupil time series primarily to acetylcholine. Moreover, Joshi and colleagues [2] demonstrated that locus coeruleus spikes precede changes in the first derivative of pupil dilation by about 300 ms (and not 0 ms). Finally, in a recent study recording intracranial EEG activity in humans [3], pupil dilation lagged behind neural events with a delay between ~0.5-1.7s. Together, this questions the chosen temporal lags.

      More importantly, Figures 3 and S3 demonstrate variable lags for different frequency bands (also evident for the pupil derivative), which are disregarded in the current source-space analyses. This biases the subsequent analyses. For instance, Figure S3 B shows the strongest correlation effect (Z~5), a negative association between pupil and the alpha-beta band. However, this effect is not evident in the corresponding source analyses (Figure S5), presumably due to the chosen zero-time-lag (the negative association peaked at ~900 ms).

      As the conducted cross-correlations provided direct evidence for the lags for each frequency band, using these for subsequent analyses seems less biased.

      This is an important point and we gladly take the opportunity to clarify this in detail. In essence, choosing one particular lag over others was a decision we took to address the multi-dimensional issue of presenting our results (spectral, spatial and time dimensions) and fix one parameter for the spatial description (see e.g. Figure 4). It is worth pointing out first that our analyses were all based on spectral decompositions that necessarily have limited temporal resolutions. Therefore, any given lag represents the center of a band that we can reasonably attribute to a time range. In fact, Figure 3C shows how spread out the effects are. It also shows that the peaks (troughs) of low and high frequency ranges align with our chosen lag quite well, while effects in the mid-frequency range are not “optimally” captured.

      As picking lags based on maximum effects may be seen as double dipping, we note that we chose 0.93 sec a priori based on the existing literature, and most prominently based on the canonical impulse response of the pupil to arousing stimuli that is known to peak at that latency on average (Hoeks & Levelt, 1993; Wierda et al., 2012; also see Burlingham et al., 2021). This lag further agrees with the results of reference [3] cited by the reviewer as it falls within that time range, and with Reimer et al.’s finding (cited as [1] above), as well as Breton-Provencher et al. (2019), who report a lag of ~900 ms (see their Supplementary Figure S8) between noradrenergic LC activation and pupil dilation. Finally, note that it was not our aim to relate pupil dilations to either ACh or NE in particular as we cannot make this distinction based on our data alone. Instead, we point out and discuss the similarities of our findings with time lags that have been reported for either neurotransmitter before.

      With respect to using different lags, changing the lag to 0 or 500 msec is unlikely to alter the reported effects qualitatively for low- and high-frequency ranges (see Figure 3C), as both the pupil time series and the fluctuations in power are dominated by very slow fluctuations (<< 1 Hz). As a consequence, shifting the signal by 500 msec has very little impact. For comparison, below we provide the reviewer with the results presented in Figure 4 but computed based on zero (Figure R1) and 500-msec (Figure R2) lags. While there are small quantitative differences, qualitatively the results remain mostly identical irrespective of the chosen lag.

      Figure R1. Figure equivalent to main Figure 4, but without shifting the pupil.

      In sum, choosing one common lag a priori (as we did here) does not necessarily impose more of a bias on the presentation of the results than choosing them post-hoc based on the peaks in the cross-correlograms. However, we have taken this point as a motivation to revise the Results and Methods sections where applicable to strengthen the rationale behind our choice. Most importantly, we changed the first paragraph that mentions and justifies the shift as follows, because the original wording may have given the false impression that the cross-correlation results influenced lag choice:

      “Based on previous reports (Hoeks & Levelt, 1993; Joshi et al., 2016; Reimer et al., 2016), we shifted the pupil signal 930 ms forward (with respect to the MEG signal). We introduced this shift to compensate for the lag that had previously been observed between external manipulations of arousal (Hoeks & Levelt, 1993) as well as spontaneous noradrenergic activity (Reimer et al., 2016) and changes in pupil diameter. In our data, this shift also aligned with the lags for low- and high-frequency extrema in the cross-correlation analysis (Figure 3B).”

      Figure R2. Figure equivalent to main Figure 4, but with shifting the pupil with respect to the MEG by 500 ms.
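
      The argument that the exact lag matters little for slowly fluctuating signals can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not the analysis used in the study: the sampling rate, the Gaussian smoothing width, the noise level and all function names are hypothetical choices for illustration, and only the 0.93 s delay and the 0 s and 0.5 s comparison lags come from the text above.

```python
import numpy as np

FS = 100             # sampling rate in Hz (assumed for illustration)
TRUE_DELAY_S = 0.93  # simulated pupil delay, matching the lag discussed above
SIGMA_S = 3.0        # Gaussian smoothing width in s (assumed); keeps power well below 1 Hz


def simulate(duration_s=600, seed=0):
    """Return a slow 'power' signal and a noisy 'pupil' copy delayed by TRUE_DELAY_S."""
    rng = np.random.default_rng(seed)
    delay = int(round(TRUE_DELAY_S * FS))
    n = int(duration_s * FS)
    # Smooth white noise with a Gaussian kernel to obtain slow fluctuations.
    t = np.arange(-4 * SIGMA_S, 4 * SIGMA_S, 1 / FS)
    kernel = np.exp(-0.5 * (t / SIGMA_S) ** 2)
    white = rng.standard_normal(n + delay + kernel.size)
    slow = np.convolve(white, kernel, mode="valid")[: n + delay]
    slow /= slow.std()
    power = slow[delay:]                              # neural signal leads
    pupil = slow[:n] + 0.2 * rng.standard_normal(n)   # pupil[i] = power[i - delay] + noise
    return power, pupil


def lagged_corr(power, pupil, lag_s):
    """Pearson correlation after shifting the pupil signal forward by lag_s seconds."""
    lag = int(round(lag_s * FS))
    if lag == 0:
        return np.corrcoef(power, pupil)[0, 1]
    return np.corrcoef(power[:-lag], pupil[lag:])[0, 1]


power, pupil = simulate()
correlations = {lag: lagged_corr(power, pupil, lag) for lag in (0.0, 0.5, 0.93)}
for lag, r in correlations.items():
    print(f"lag {lag:.2f} s: r = {r:.3f}")
```

      In this toy setting the correlation is highest at the true delay but changes only slightly at 0 s or 0.5 s, because the autocorrelation of a signal dominated by frequencies far below 1 Hz decays slowly over sub-second shifts. Faster fluctuations would make the choice of lag matter more.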

      Related to this aspect: For some parts of the analyses, the pupil time series was shifted with regard to the MEG data (e.g., Figure 4). However, for subsequent analyses pupil and MEG data were analyzed in concurrent 2 s time windows (e.g., Figure 5 and 6), without a preceding shift in time. This complicates comparisons of the results across analyses and the reasoning behind this should be discussed.

      The signal has been shifted for all analyses that relate to pupil diameter (but not pupil derivative). We have added versions of the following statement in the respective Results and Methods section to clarify (example from Results section ‘Nonlinear relations between pupil-linked arousal and band-limited cortical activity’):

      “In keeping with previous analyses, we shifted the pupil time series forward by 930 msec, while applying no shift to the pupil derivative.”

      1. The authors refer to simultaneous fMRI-pupil studies in their background section. However, throughout the manuscript, they do not mention recent work linking (task-related) changes in pupil dilation and neural oscillations (e.g., [4-6]) which does seem relevant here, too. This seems especially warranted, as these findings in part appear to disagree with the here-reported observations. For instance, these studies consistently show negative pupil-alpha associations (while the authors mostly show positive associations). Moreover, one of these studies tested for links between pupil dilation and aperiodic EEG activity but did not find a reliable association (again conflicting with the here-reported data). Discussing potential differences between studies could strengthen the manuscript.

      We have added a discussion of the suggested works to our Discussion section. We point out however that a recent study (Podvalny et al., https://doi.org/10.7554/eLife.68265) corroborates our finding while measuring resting-state pupil and MEG simultaneously in a situation very similar to ours. Also, we note that Whitmarsh et al. (2021) (reference [6]) is actually in line with our findings as we find a similar negative relationship between alpha-range activity in somatomotor cortices and pupil size.

      Please also take into account that results from studies of task- or event-related changes in pupil diameter (phasic responses) cannot be straightforwardly compared with the findings reported here (focusing on fluctuations in tonic pupil size), due to the inverse relationship between tonic (or baseline) and phasic pupil response (e.g. Knapen et al., 2016). This means that on trials with larger baseline pupil diameter, phasic pupil dilation will be smaller and vice versa. Hence, a negative relation between the evoked change in pupil diameter and alpha-band power can very well be consistent with the positive correlation between tonic pupil diameter and alpha-band activity that we report here for visual cortex.
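
      The logic of this argument can be made concrete with a small simulation. The sketch below is purely illustrative: the trial count, coefficients and noise levels are arbitrary assumptions, not estimates from any dataset. It only encodes the two relations named above, a positive link between tonic pupil size and alpha power and an inverse link between tonic size and the phasic response.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials = 5000  # hypothetical number of trials

# Tonic (baseline) pupil size per trial, in arbitrary standardized units.
tonic = rng.standard_normal(n_trials)

# Assumed positive relation between tonic pupil size and alpha-band power.
alpha = 0.5 * tonic + 0.5 * rng.standard_normal(n_trials)

# Assumed inverse relation between tonic size and the evoked (phasic) response.
phasic = -0.6 * tonic + 0.5 * rng.standard_normal(n_trials)

r_alpha_tonic = np.corrcoef(alpha, tonic)[0, 1]
r_alpha_phasic = np.corrcoef(alpha, phasic)[0, 1]

print(f"alpha vs tonic pupil:  r = {r_alpha_tonic:+.2f}")
print(f"alpha vs phasic pupil: r = {r_alpha_phasic:+.2f}")
```

      Under these assumptions the same alpha-band signal correlates positively with tonic pupil size and negatively with the phasic response, so the two observations need not conflict.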

      In section ‘Arousal modulates cortical activity across space, time and frequencies’ we have added:

      “Seemingly contradicting the present findings, previous work on task-related EEG and MEG dynamics reported a negative relationship between pupil-linked arousal and alpha-range activity in occipito-parietal sensors during visual processing (Meindertsma et al., 2017) and fear conditioning (Dahl et al., 2020). Note, however, that results from task-related experiments, which focus on evoked changes in pupil diameter rather than fluctuations in tonic pupil size, cannot be directly compared with our findings. Similar to noradrenergic neurons in locus coeruleus (Aston-Jones & Cohen, 2005), phasic pupil responses exhibit an inverse relationship with tonic pupil size (Knapen et al., 2016). This means that on trials with larger baseline pupil diameter (e.g. during a pre-stimulus period), the evoked (phasic) pupil response will be smaller and vice versa. As a consequence, a negative correlation between alpha-band activity in the visual cortex and task-related phasic pupil responses does not preclude a positive correlation with tonic pupil size during baseline or rest as reported here. In line with this, Whitmarsh et al. (2021) found a negative relationship between alpha-activity and pupil size in the somatosensory cortex that agrees with our finding. Although they used an event-related design to study attention to tactile stimuli, this relationship occurred in the baseline, i.e. before observing any task-related phasic effects on pupil-linked arousal or cortical activity.”

      In section ‘Arousal modulation of cortical excitation-inhibition ratio’ we have added: “The absence of this effect in visual cortices may explain why Kosciessa et al. (2021) found no relationship between pupil-linked arousal and spectral slope when investigating phasic pupil dilation in response to a stimulus during visual task performance. However, this behavioral context, associated with different arousal levels, likely also changes E/I in the visual cortex when compared with the resting state (Pfeffer et al., 2018).”

      Finally, in the Conclusion we added (note: ‘they’ = the present results): “Further, they largely agree with similar findings of a recent independent report (Podvalny et al., 2021).”

      Related to this aspect: The authors frequently relate their findings to recent work in rodents. For this it would be good to consider species differences when comparing frequency bands across rodents and primates (cf. [7,8]).

      Throughout our Results section we have mainly remained agnostic with respect to labeling frequency ranges when drawing between-species comparisons, and have only reverted to it as a justification for a dimension reduction for some of the presented analysis. Following your comment however, we have phrased the following section in the Discussion, section ‘Arousal modulates cortical activity across space, time and frequencies’, more carefully:

      “The low-frequency regime referred to in rodent work (2–10 Hz; e.g., McGinley et al., 2015) includes activity that shares characteristics with human alpha rhythms (3–6 Hz; Nestvogel and McCormick, 2021; Senzai et al., 2019). The human equivalent however clearly separates from activity in lower frequency bands and, here, showed idiosyncratic relationships with pupil-linked arousal.”

      1. Figure 1 highlights direct neuromodulatory effects in the cortex. However, seminal [9-11] and more recent work [12,13] demonstrates that noradrenaline and acetylcholine also act in the thalamus which seems relevant concerning the interpretation of low frequency effects observed here. Moreover, neural oscillations also influence neuromodulatory activity, thus the one-headed arrows do not seem warranted (panel C) [3,14].

      This is a very good point. First, we would like to note that we have extended on acknowledging thalamic contributions to low-frequency (specifically alpha) effects in response to the Reviewer’s point 11 (‘Recommendations for authors’ section below). Also, we have added a reference to the role of potential top-down (reverse) influences to our Discussion, section ‘Arousal modulates cortical activity across space, time and frequencies’, as follows:

      “Further, we note that our analyses and interpretations focus on arousal-related neuromodulatory influences on cortical activity, whereas recent work also supports a reverse “top-down” route, at least for frontal cortex high-frequency activity on LC spiking activity (Totah et al., 2021).”

      Ultimately, however, we decided to leave the arrows in Figure 1C uni-directional to keep in line with the rationale of our research, which stems mostly from rodent work that also emphasises the indicated directionality. Also, reference [3] is highly interesting for us because it actually aligns with our data: The authors show that a spontaneous peak of high-frequency band activity (>70 Hz) in insular cortex precedes a pupil dilation peak (or plateau) in two of three participants by ~500 msec (which mimics a pattern found for task-evoked activity; see their Figure 5b/c). We find a maximum in our cross-correlation between pupil size and high frequency band activity (>64 Hz) that indicates a similar lag (see our Figure 3B). Importantly, neither result rules out a common source of neuromodulation for the effects. We have added the following to the end of the section ‘An arousal-triggered cascade of activity in the resting human brain’:

      “In fact, Kucyi & Parvizi (2020) found spontaneous peaks of high-frequency band activity (>70 Hz) in the insular cortex of three resting surgically implanted patients that preceded pupil dilation by ~500msec - a time range that is consistent with the lag of our cross-correlation between pupil size and high frequency (>64Hz) activity (see Figure 3B). Importantly, they showed that this sequence mimicked a similar but more pronounced pattern during task performance. Given the purported role of the insula (Menon & Uddin, 2015), this finding lends support to the idea that spontaneous covariations of pupil size and cortical activity signal arousal events related to intermittent 'monitoring sweeps' for behaviourally relevant information.”
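
      The lag comparison above rests on locating the peak of a cross-correlation function. As a minimal sketch of that idea (not the analysis pipeline used in the study), the following assumes two equally sampled toy series, a hypothetical sampling rate, and illustrative function names `pearson_r` and `peak_lag`:

```python
import math
import random


def pearson_r(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def peak_lag(x, y, max_lag):
    """Return the lag (in samples) that maximizes corr(x[t], y[t + lag]).

    A positive lag means that y follows x.
    """
    best_lag, best_r = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Keep only time points where both series are defined at this lag.
        pairs = [(x[t], y[t + lag]) for t in range(len(x)) if 0 <= t + lag < len(y)]
        r = pearson_r([p for p, _ in pairs], [q for _, q in pairs])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag


# Toy data (an assumption for illustration): "pupil" is "power" delayed by 5 samples.
rng = random.Random(0)
power = [rng.gauss(0.0, 1.0) for _ in range(200)]
pupil = [0.0] * 5 + power[:-5]

lag_samples = peak_lag(power, pupil, max_lag=20)
sampling_rate_hz = 10.0  # hypothetical sampling rate
lag_ms = 1000.0 * lag_samples / sampling_rate_hz
```

      In this toy case the peak lag is positive, meaning the second series follows the first. Such a lag does not by itself establish directionality, which is consistent with the caveat about a possible common source.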

      1. In their discussion, the authors propose a pupil-linked temporal cascade of cognitive processes and accompanying power changes. This argument could be strengthened by showing that earlier events in the cascade can predict subsequent ones (e.g., are the earlier low and high frequency effects predictive of the subsequent alpha-beta synchronization?)

      We added this cascade angle as one possible interpretation of the observed effects. We fully agree that this is an interesting question but would argue that it would ideally be tested in follow-up research specifically designed for that purpose. The suggested analysis would add a post-hoc aspect to our exploratory investigation in the absence of a suitable contrast, while also potentially side-tracking the main aim of the study. We have revised the language in this section and added the following changes (bold) to the last paragraph to emphasise the speculative aspect and to clarify what we think needs to be done to look into this further and with more explanatory power.

      “The three scenarios described here are not mutually exclusive and may explain one and the same phenomenon from different perspectives. Further, it remains possible that the sequence we observe comprises independent effects with specific timings. A pivotal manipulation to test these assumptions will be to contrast the observed sequence with other potential coupling patterns between pupil-linked arousal and cortical activity during different behavioural states.”

    1. Author Response

      Reviewer #1 (Public Review):

      As far as I can tell, the inputs to the model are raw diffusion data plus a couple of maps extracted from T2 and MT data. While this is ok for the kind of models used here, it means that the networks trained will not generalise to other diffusion protocols (e.g., with different bvecs). This greatly reduces the usefulness of this model and hinders transfer to e.g. human data. Why not use summary measures from the data as an input? There are a number of rotationally invariant summary measures that one can extract. I suspect that the first layers of the network may be performing operations such as averaging that are akin to calculating summary measures, so the authors should consider doing that prior to feeding the network.

      We agree with the reviewer that using summary measures will make the tool less dependent on particular imaging protocols and more translatable than using raw data as inputs. We have experimented with a set of five summary measures (T2, magnetization transfer ratio (MTR), mean diffusivity, mean kurtosis, and fractional anisotropy) as inputs. The prediction based on these summary measures, although less accurate than predictions based on raw data in terms of RMSE and SSIM (Figure 2A), still outperformed polynomial fitting up to 2nd order. The result, while promising, also highlights the need for finding a more comprehensive collection of summary measures that match the information available in the raw data. Further experiments with existing or new summary measures may lead to improved performance.
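
      For readers unfamiliar with rotationally invariant summary measures, two of those listed above (mean diffusivity and fractional anisotropy) can be computed from the three eigenvalues of a diffusion tensor using their standard definitions. The sketch below is illustrative only; the function names and example eigenvalues are assumptions, not taken from the manuscript's toolbox:

```python
import math


def mean_diffusivity(evals):
    """Mean of the three diffusion tensor eigenvalues."""
    l1, l2, l3 = evals
    return (l1 + l2 + l3) / 3.0


def fractional_anisotropy(evals):
    """Standard FA definition: 0 for isotropic diffusion, 1 for a single axis."""
    l1, l2, l3 = evals
    md = (l1 + l2 + l3) / 3.0
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    if den == 0:
        return 0.0
    return math.sqrt(1.5 * num / den)


# Hypothetical eigenvalues (mm^2/s) for an anisotropic voxel.
example_evals = (1.7e-3, 0.3e-3, 0.3e-3)
md = mean_diffusivity(example_evals)
fa = fractional_anisotropy(example_evals)
```

      Because both measures depend only on the eigenvalues, they do not change when the gradient directions or the sample orientation are rotated, which is what makes them less protocol-dependent than raw diffusion-weighted signals.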

      The noise sensitivity analysis is misleading. The authors add noise to each channel and examine the output; they do this to find which input is important. They find that T2/MT are more important for the prediction of the AF data. But the majority of the channels are diffusion data, where there is a lot of redundant information across channels. So it is not surprising that these channels are more robust to noise. In general, the authors make the point that they not only predict histology but can also interpret their model, but I am not sure what to make of either the t-SNE plots or the rose plots. I am not sure that these plots are helping with understanding the model and the contribution of the different modalities to the predictions.

      We agree that there is redundant information across channels, especially among diffusion MRI data. In the revised manuscript, we focused on using the information derived from noise-perturbation experiments to rank the inputs in order to accelerate image acquisition instead of interpreting the model. We removed the figure showing t-SNE plots with noisy inputs because it does not provide additional information.
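
      The idea of ranking inputs by noise perturbation can be sketched as follows. This is a toy illustration under stated assumptions, not the procedure implemented in the manuscript: the `predict` callable stands in for a trained network, and the noise level, sample data, and function names are hypothetical.

```python
import math
import random


def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def rank_channels_by_noise_sensitivity(predict, samples, noise_sd=0.1, seed=0):
    """Perturb one input channel at a time and rank channels by output change.

    Returns (order, scores), where order lists channel indices from the most
    to the least sensitive and scores[c] is the RMSE between perturbed and
    unperturbed predictions for channel c.
    """
    rng = random.Random(seed)
    baseline = [predict(s) for s in samples]
    n_channels = len(samples[0])
    scores = []
    for c in range(n_channels):
        perturbed = []
        for s in samples:
            noisy = list(s)
            noisy[c] += rng.gauss(0.0, noise_sd)  # noise on channel c only
            perturbed.append(predict(noisy))
        scores.append(rmse(perturbed, baseline))
    order = sorted(range(n_channels), key=lambda c: scores[c], reverse=True)
    return order, scores


# Toy stand-in for a trained model: a weighted sum of three input channels.
weights = [5.0, 0.1, 1.0]


def toy_predict(sample):
    return sum(w * v for w, v in zip(weights, sample))


data_rng = random.Random(1)
toy_samples = [[data_rng.random() for _ in weights] for _ in range(200)]
order, scores = rank_channels_by_noise_sensitivity(toy_predict, toy_samples)
```

      As the reviewer notes, redundancy among channels can make individual channels appear robust, so a ranking of this kind is better suited to choosing which acquisitions to drop than to interpreting the model.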

      Is deep learning really required here? The authors are using a super deep network, mostly doing combinations of modalities. Is the mapping really highly nonlinear? How does it compare with a linear or close to linear mapping (e.g., regression of output onto input and quadratic combinations of input)? How many neurons are actually doing any work and how many are silent (this can happen a lot with ReLU nonlinearities)? In general, not much is done to convince the reader that such a complex model is needed and whether a much simpler regression approach can do the job.

      The deep learning network used in the study is indeed quite deep, and there are two main reasons for choosing it over simpler approaches.

      The primary reason to pick the deep learning approach is to accommodate complex relationships between MRI and histology signals. In the revised Figure 2A-B, we have demonstrated that the network can produce better predictions of tissue auto-fluorescence (AF) signals than 1st and 2nd order polynomial fitting. For example, the predicted AF image based on 5 input MR parameters shared more visual resemblance with the reference AF image than images generated by 1st and 2nd order polynomial fittings, which was confirmed by RMSE and SSIM values. The training curves shown in Fig. R1 below demonstrate that, for learning the relationship between MRI and AF signals, at least 10 residual blocks (~24 layers) are needed. Later, when learning the relationship between MRI and Nissl signals, 30 residual blocks (~64 layers) were needed, as the relationship between MRI and Nissl signals appears less straightforward than the relationship between MRI and AF/MBP/NF signals, which have a strong myelin component. In the revised manuscript, we have clarified this point, and the provided toolbox allows users to select the number of residual blocks based on their applications.

      Fig. R1: Training curves of MRH-AF with number of residual blocks ranging from 1 to 30 showing decreasing RMSEs with increasing iterations. The curves in the red rectangular box on the right are enlarged to compare the RMSE values. The training curves of 10 and 30 residual blocks are comparable, both converged with lower RMSE values than the results with 1 and 5 residual blocks.

      In addition, the deep learning approach can better accommodate residual mismatches between co-registered histology and MRI than polynomial fitting. Even after careful co-registration, residual mismatches between histology and MRI data can still be found, which pose a challenge for polynomial fittings. We have tested the effect of mismatch by introducing voxel displacements to perfectly co-registered diffusion MRI datasets and demonstrated that the deep learning network used in this study can handle the mismatches (Figure 1 – figure supplement 1).

      Relatedly, the comparison between the MRH approach and some standard measures such as FA, MD, and MTR is unfair. Their network is trained to match the histology data, but the standard measures are not. How does the MRH approach compare to e.g. simply combining FA/MD/MTR to map to histology? This to me would be a more relevant comparison.

      This is a good idea. We have added maps generated by linear fitting of five MR measures (T2, MTR, FA, MD, and MK) to MBP for a proper comparison. Please see the revised Figure 3A-B. The MRH approach provided better prediction than linear fitting of the five MR measures, as shown by the ROC curves in Figure 3C.
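
      A polynomial-fitting baseline of the kind used for these comparisons can be illustrated with a single input variable. The sketch below is a simplified assumption (one input rather than five MR measures, synthetic data, hypothetical function names) that fits 1st and 2nd order polynomials by least squares and compares their RMSE:

```python
import math


def solve_linear_system(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def polyfit_rmse(xs, ys, order):
    """Least-squares polynomial fit of the given order; returns the fit RMSE."""
    feats = [[x ** k for k in range(order + 1)] for x in xs]
    size = order + 1
    # Normal equations: (A^T A) coef = A^T y
    ata = [[sum(f[i] * f[j] for f in feats) for j in range(size)] for i in range(size)]
    aty = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(size)]
    coef = solve_linear_system(ata, aty)
    pred = [sum(c * fk for c, fk in zip(coef, f)) for f in feats]
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(pred, ys)) / len(ys))


# Synthetic nonlinear relationship standing in for an MRI-to-histology mapping.
xs = [i / 10.0 for i in range(-20, 21)]
ys = [math.exp(x) for x in xs]
rmse_first = polyfit_rmse(xs, ys, order=1)
rmse_second = polyfit_rmse(xs, ys, order=2)
```

      On the training data a higher-order fit can never have a larger RMSE than a lower-order one, so a fair comparison with a network also needs held-out data; the toy example only shows how the baseline numbers are obtained.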

      • Not clear if there are 64 layers or 64 residual blocks. Also, is the convolution only doing something across channels? i.e. do we get the same performance by simply averaging the 3x3 voxels?

      We have revised the paragraph on the network architecture to clarify this point in the Figure 1 caption as well as the Methods section. We used 30 residual blocks, each consisting of 2 layers. There are 4 additional layers at the input and output ends, so we had 64 layers in total.
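
      The layer count follows directly from these numbers. A small helper (the function name and default arguments are illustrative, not part of the toolbox) makes the arithmetic explicit and also reproduces the ~24 layers quoted for 10 residual blocks:

```python
def total_layers(n_residual_blocks, layers_per_block=2, extra_layers=4):
    """Total layers = residual blocks x layers per block + input/output layers."""
    return n_residual_blocks * layers_per_block + extra_layers


layers_for_30_blocks = total_layers(30)  # 30 * 2 + 4
layers_for_10_blocks = total_layers(10)  # 10 * 2 + 4
```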

      The convolution mostly works across channels, which is what we intended as we are interested in finding the local relationship between multiple MRI contrasts and histology. With inputs from modified 3x3 patches, in which all voxels were assigned the same values as the center voxel, the predictions of MRH-AF did not show apparent loss in sensitivity and specificity, and the voxel-wise correlation with reference AF data remained strong (See Fig. R2 below). We think this is an important piece of information and added it as Figure 1 – figure supplement 3. Averaging the 3x3 voxels in each patch produced similar results.

      Fig. R2: Evaluation of MRH-AF results generated using modified 3x3 patches with 9 voxels assigned the same MR signals as the center voxel as inputs. A: Visual inspection showed no apparent differences between results generated using original patches and those using modified patches. B: ROC analysis showed a slight decrease in AUC for the MRH-AF results generated using modified patches (dashed purple curve) compared to the original (solid black curve). C: Correlation between MRH-AF using modified patches as inputs and reference AF signals (purple open circles) was slightly lower than the original (black open circles).
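
      The two patch modifications described above (filling the patch with the center voxel value, or with the patch average) can be sketched for a single-channel 3x3 patch as follows. The function names and the example patch are assumptions for illustration; in practice the same operation would be applied to every MRI channel:

```python
def center_filled_patch(patch):
    """Assign every voxel in a 3x3 patch the value of the center voxel."""
    center = patch[1][1]
    return [[center for _ in row] for row in patch]


def mean_filled_patch(patch):
    """Assign every voxel in a 3x3 patch the mean of all nine voxels."""
    values = [v for row in patch for v in row]
    mean = sum(values) / len(values)
    return [[mean for _ in row] for row in patch]


# Hypothetical single-channel patch of MR signal values.
example_patch = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]
center_version = center_filled_patch(example_patch)
mean_version = mean_filled_patch(example_patch)
```

      Both modifications remove all spatial variation within a patch, so any remaining predictive power must come from combining information across channels rather than across neighboring voxels.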

      The result in the shiverer mouse is most impressive. Were the shiverer mice data included in the training? If not, this should be mentioned/highlighted as it is very cool.

      Data from shiverer mice and littermate controls were not included in the training. We have clarified this point in the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      This work raises the question of how in plane forces generated at the apical surface of an epithelial cell sheet cause out of plane motion, an important morphogenetic motif. To address this question, a new optogenetic dominant negative rho1 tool, based on the cry2-CIBN system, is presented. The authors use this tool to analyze the well studied biophysical process of ventral furrow formation, and dissect the spatiotemporal requirement of rho1 signaling to modulate myosin accumulation. They separate the effect on morphogenesis into an early phase that becomes significantly slowed down by myosin inhibition, and a late phase where the kinetics is comparable to wild type despite treatment. For interpretation of the data, an older model of cell mechanics treating tissue as a purely elastic material is presented. It fails to reproduce the observations. As a modification, in analogy to buckling of a thin beam under load, a compressive stress exerted by the adjacent ectoderm is introduced. Further analysis of cell behaviors in response to various laser mediated tissue manipulations is presented as support of the proposed mechanism.

      Overall, the manuscript addresses an important aspect of morphogenesis. In particular the use of optogenetic tools promises new insights that might be more challenging to achieve with traditional mutant analysis. However, reservations remain with respect to (1) rigor of the analysis, and (2) interpretation and quality of the data in support of the proposed mechanism; this applies in particular to presentation of biophysical observations, including experiment and simulations.

      The manuscript adds valuable quantitative data, in particular the findings described in Fig 2ab. However, insufficient analyses are performed to fully support the claims of the manuscript by the data presented.

      (I) The manuscript proposes an elasticity based model of tissue mechanics, but provides no experimental evidence in support of this assumption. Many rheology studies performed in a wide range of specimens (including the Drosophila embryo) found a separation of time scales, which shows that elasticity is a good approximation of tissue mechanics only for time scales short compared to the process studied here.

      We agree with the reviewer that an elasticity-based model of tissue mechanics is a simplification of the actual tissue properties in real embryos. To provide justification for this simplification, in the revised manuscript, we have cited a previous biophysical study measuring tissue viscoelasticity in early Drosophila embryos (Doubrovinski et al., 2017). Using a magnetic tweezers-based approach, Doubrovinski et al. show that the lower bound of the decay time of the elastic response is four minutes (the lower limit on the timescales where tissue behaves elastically). In addition, when history dependence of the response is considered, the decay time increases to nine minutes, which is close to the duration of ventral furrow formation (~ 15 – 20 minutes). Therefore, we consider elasticity to be a reasonable approximation of tissue mechanics during ventral furrow formation. The elasticity assumption has been widely used in previously published modeling work to simulate ventral furrow formation (Allena et al., 2010; Conte et al., 2009; Gracia et al., 2019; Heer et al., 2017; Hocevar Brezavšček et al., 2012; Muñoz et al., 2007; Rauzi et al., 2015). The modeling framework used in our current study, which was initially described in Polyakov et al. 2014, successfully predicts the intermediate and final furrow morphologies with a minimal set of active and passive forces without prescribing individual cell shape changes. It is therefore advantageous to use this model to explore the main novel aspect of the folding mechanics underlying ventral furrow formation. We show that the model can recapitulate the binary tissue response to acute myosin inhibition. In addition, it accurately predicts the intermediate furrow morphology at the transitional state and several other morphological properties associated with myosin inhibition. We therefore believe that this minimalistic model captures the central aspect of the physical mechanism underlying mesoderm bistability observed in the experiments.

      (II) The manuscript uses a method of micro-dissection to soften cells, but does not provide a clear definition of the concept of softening, provides no rationale for the method's functioning, and does not provide independent validation. The described treatment might affect cells in many alternative ways to the offered interpretation. This data is the central experimental evidence given in support of the proposed ectoderm compression mechanism, and therefore it is essential to provide a precise physical explanation of the method, and validation of measurements that bolster the conclusion.

      We apologize for not explaining the meaning of “softening” clearly in our original manuscript and the rationale for using laser ablation to detect compression. By “softening”, we meant to describe the mechanical status of the cell when the subcellular structures that normally support the mechanical integrity (e.g., cortical actin) are disrupted. We reason that when such a change in mechanical properties happens in a specific region of a tissue that is under compression, the cells in this region should have an impaired ability to resist compression from outside of the region and thereby cause the region to shrink.

      Laser ablation has been widely used to measure tensile stresses in cells and tissues by disruption of cells or subcellular structures. The method we used is adapted from previously described protocols, where a femtosecond near infrared laser is used to disrupt subcellular structures for detection of tissue tension (Rauzi et al., 2015; Rauzi et al., 2008). It has been shown that when laser intensity is properly controlled, the treatment can leave the plasma membrane intact but disrupt subcellular structures associated with the plasma membrane, such as adherens junctions and the cortical actomyosin networks (Rauzi et al., 2015; Rauzi et al., 2008). Using a femtosecond near infrared laser, we were able to ablate embryonic tissues that are under tension and observe tissue recoil after laser ablation, suggesting that our approach has disrupted the cortical cytoskeleton in the laser treated region (e.g., Figure 3 and Authors’ Response Figure 1). In these experiments, the lack of damage to the plasma membrane is indicated by the ready recovery of the plasma membrane signal after laser treatment, as well as the lack of bright burn marks on the tissue.

      As we noted before, we reasoned that if the tissue is under compression, laser treatment similar to that which generates recoil in tissues under tension should result in tissue shrinking within the laser-treated region. The data presented in our original manuscript demonstrate that tissue shrinking is not a non-specific response to our laser treatment – we did not observe such a response when we treated the tissue during cellularization or within the first five minutes of gastrulation, although identical experimental conditions were used (Original Figure 4). We have also obtained additional evidence that supports the use of tissue shrinking as a readout of tissue compression. We tested our laser ablation approach in Stage 8 – 9 embryos at regions where cells are actively dividing/proliferating, which would be expected to generate compressive stresses in the tissue. When we performed laser ablation in this region, we observed shrinking of the treated region, which was distinct from the tensile tissue response (Authors’ Response Figure 1). While this preliminary evidence is encouraging, we agree with the reviewer that further independent validations are needed given that the methods for detecting tissue compression have not been well established in the field. Following the editor’s suggestion, we have removed this experiment from the current manuscript and focus on the characterization of the optogenetic tool and the binary tissue response after acute actomyosin inhibition.

      Authors’ Response Figure 1: Laser ablation in regions of tissues with active cell proliferation (a) or undergoing apical constriction (b). The movement of tissues is indicated by overlaying membrane signals (Ecadherin-GFP) at T = 0 sec and at T = 10 sec. T = 0 in the “After ablation” panels marks the time immediately after ablation. (a) Stage 8 – 9 embryos. Multiple cells are in the process of cell division, as indicated by mitotic rounding (yellow arrowheads) or the appearance of cleavage furrows (red arrowheads). Immediately after laser ablation, the surrounding cells moved towards the ablated region (cyan arrows). (b) An embryo undergoing ventral furrow formation. Ablation within the constriction domain results in recoil of the surrounding cells away from the ablated region (cyan arrows).

      (III) Mechanical isolation of the mesoderm is a very exciting approach to test the possible involvement of adjacent tissues in folding. Indeed, the authors report a delay of ventral furrow formation. However, there is no evidence provided that (a) the mesoderm is mechanically uncoupled, and (b) that the treatment did not have undesired side effects. For example, a similar procedure (so-called cauterization, see Rauzi 2015) has been used to immobilize cells in the Drosophila embryo. Such an effect could account for the observed delay in furrow formation.

      We agree with the reviewer that “mechanical uncoupling” is merely a prediction based on our observation but has not been directly demonstrated. On the other hand, since the purpose of this experiment is to ask whether the presence of the lateral ectoderm is important for the mesoderm to transition between apical constriction and invagination (and our result shows yes), whether the approach we used mechanically uncoupled mesoderm and the ectoderm is no longer an immediately relevant question. We apologize for the imprecise use of the term “mechanically uncoupling” in our original manuscript and we thank the reviewer for pointing this out.

      As for the reviewer’s point (b), we have several pieces of evidence indicating that our approach did not cause anchoring of the tissue to the vitelline membrane. The major difference between the approach we used and that used by Rauzi et al. 2015 is the location in the tissue where the laser treatment was applied. In order to anchor the tissue to the vitelline membrane, Rauzi et al. targeted the laser to the apical side of the tissue, adjacent to the vitelline membrane. The resulting cauterization of the tissue caused anchoring of the tissue to the vitelline membrane, presumably by fusion of the tissue with the vitelline membrane. In our approach, we used a similar type of laser (femtosecond near infrared laser) to perform tissue disruption, but instead of targeting the apical side of the tissue, we targeted the basal region of the invaginating cleavage furrows during cellularization, with the goal of blocking cell formation. While the laser intensity we used is high enough to cause cauterization of the tissue, as indicated by the appearance of bright autofluorescence in the laser treated region, these “burn marks” are not located at the apical side of the cells (Authors’ Response Figure 2a). The lack of “burn marks” on the vitelline membrane in our experiment is in sharp contrast to the result shown in Rauzi et al 2015 (see Authors’ Response Figure 2b for an example from Rauzi et al in comparison to our own data in 2a). Because of the difference in the location of cauterization, we do not expect that the tissue would be fused with the vitelline membrane after our treatment. This is further suggested by the observation that the burn marks can move before the onset of gastrulation, which again indicates that the tissue is not anchored to the vitelline membrane (Authors’ Response Figure 2c).

      That being said, we acknowledge that we do not fully understand the impact of the laser treatment on the embryo (e.g., what causes the reduced rate of apical constriction), and more control experiments are required in order to fully describe the tissue response we observed. As suggested by the editor, we decided to remove the ectoderm-ablation experiment from the revised manuscript and focus on the characterization of the optogenetic tool and the binary tissue response after acute actomyosin inhibition.

      Authors’ Response Figure 2: Laser disruption of cell formation in the lateral ectodermal region. (a) Cross-section and en face views showing the basal location of the “burn marks” after laser disruption in the lateral ectodermal region. No burn marks are observed at the level of the vitelline membrane. Blue and red curves in the cross-section views indicate the vitelline membrane and the position where the projections were made for the en face views. Magenta arrows: burn marks. (b) Figure 5a from Rauzi et al., 2015; clear bright burn marks can be seen in the apical surface view. (c) Overlay of the signal at T = -10 min and 0 min (onset of gastrulation) showing the movement of burn marks before gastrulation (yellow arrows).

      (IV) Some panels show two distinct molecules tagged with the same or spectrally overlapping fluorophores, that unfortunately localize in similar spatial patterns. This encumbers data validation.

      We agree with the reviewer that having two distinct proteins tagged with the same fluorophore is not ideal for understanding the behavior of the tagged proteins. However, it usually does not affect the evaluation of the cell or tissue morphology, as long as the cell membrane is explicitly labeled. For example, in our original Figure 2 (new Figure 4), although GFP is tagged on both CIBN and Sqh, and mCherry is tagged on both CRY2-Rho1DN and Sqh, the cell and tissue morphology is clearly discernable by these markers, which allowed us to evaluate the progression of ventral furrow formation. In the cases where there was a need to evaluate the behavior of a particular molecule (e.g. Sqh), we always repeated the experiments in a way such that the molecule of interest is tagged with a distinct fluorophore that does not spectrally overlap with other fluorophores – this often requires the use of a plasma membrane-anchored CIBN that is not fluorescently tagged (e.g. Figure 1, Figure 4 – figure supplement 3).

      (V) The physical model is a central part for data interpretation. In its current form it is very challenging to follow. It is also critical the system be studied with proper cell aspect ratio, as the elasticity of thin sheets has a well established non-linear thickness dependence.

      These are valid critiques of our thin layer physical model (original Figure 5). The original purpose of this model is not to recapitulate the actual furrow morphology or cell shape change observed in the actual embryo, but rather to test the possibility of recapitulating the acceleration in tissue flow during the folding process by combining local constriction and global compression in a spherical (circular in 2D) elastic shell. Developing a dynamic vertex model that contains a realistic cell aspect ratio comparable to the actual cells in the embryo while displaying realistic cellular dynamics during the folding process is nontrivial and needs substantial further development of the model. Since the manuscript is now focused on the bistable characteristics of the mesoderm during gastrulation rather than tissue dynamics during the folding process, we decided to leave the dynamic vertex model out of the revised manuscript, as suggested by the editor.

      Reviewer #2 (Public Review):

      Guo and colleagues aim to unravel the mechanisms driving the fast process of mesoderm invagination in the Drosophila early developing embryo. While cell apical constriction is known to drive ventral furrowing (1st phase), it is still not clear if apical constriction is necessary/sufficient to drive mesoderm internalization (2nd phase) and whether other mechanisms cooperate during this process. By using 1ph optogenetics, the authors cannot test specifically the role of apical constriction but can systematically affect the overall actomyosin network in ventral cells in a time specific fashion (1-minute resolution). In this way, they come to the conclusion that actomyosin contractility is necessary for the 1st phase but not for the 2nd phase of mesoderm invagination. Interestingly, they conclude that the system is bistable. In the second part of this study, the authors test the role of the coupling between mesoderm and ectoderm by using 2D computational modelling and infrared pulsed laser dissection. They propose that the ectoderm can generate compressive forces on the mesoderm facilitating mesoderm internalization (2nd phase).

      This project is of interest since it tackles a key morphogenetic process that is necessary for the development of the embryo. The conclusion of 'bistability' resulting from the RhoDN optogenetic experiments (1st part of this study) is well supported and quite interesting. The IR laser experiments used to tackle the coupling between ectoderm and mesoderm (2nd part of the study) are key to supporting the main conclusions; nevertheless, their experimental design and results are puzzling. It is not clear what the authors are actually doing to the tissues. The experiments performed in the 2nd part of this study need to be revisited and conclusions eventually softened.

      Major comments:

      1) The 920 nm laser ablation of ectoderm cells is a key experiment in this study to support the ectoderm compression hypothesis. Nevertheless, this experiment is puzzling: the rationale of the experimental design, the effect of the laser on cells and the interpretation of the results are unclear.

      The rationale for the laser ablation experiment designed to test tissue compression is analogous to the widely used laser ablation approach for detecting tissue tension (Rauzi et al., 2015; Rauzi et al., 2008). In typical experiments where laser ablation was used to measure tensile stresses in cells and tissues, ablation of cells or subcellular structures that are under tension results in recoil of surrounding cell/tissue structures. We reasoned that if the tissue is under compression, similar laser treatment should result in shrinking of the laser-treated region, as the cells in the laser-treated region are expected to have an impaired ability to resist compressive stresses from outside of the region.

      In our experiment, we used the reduction of the width of the laser treated region within the first 10 sec after laser treatment as the measure for tissue shrinking, which we considered as an indication for the presence of compressive stresses. This tissue response, albeit mild, is not a non-specific tissue response to our laser treatment – we did not observe tissue shrinking when we treated the tissue during cellularization or within the first five minutes of gastrulation, although identical experimental conditions were used. The rate and magnitude of tissue shrinking after laser treatment are determined by multiple factors, including the level of compressive stresses, the difference in cell rigidity before and after laser treatment, and the overall viscosity of the tissue. We acknowledge that knowledge of these factors is largely lacking, and therefore additional independent validations of our approach are needed to further strengthen our conclusion on the presence of tissue compression. Following the editor’s suggestion, we decided to remove the laser ablation experiment from the current manuscript and focus on the characterization of the optogenetic tool and the binary tissue response after acute actomyosin inhibition.

      2) The authors propose to use again 920 nm laser ablation but this time to "physically separate" the two ectoderms from the ventral tissue. This is again a key experiment, but it raises some concerns:

      a. "Physical separation" would need to be demonstrated (e.g., EM after laser ablation). From Fig. 6b it is clear that IR laser ablation results in prominent auto-fluorescent zones. This has been already reported in previous work (De Medeiros G. et al. Scientific Reports 2020) showing that high power and sustained IR fs laser targeting produces auto-fluorescence and highly electron-dense structures in the early developing Drosophila embryo. This process is referred to as laser cauterization and does not induce separation between tissues. These structures eventually displace together with the lateral tissue (also shown in Fig. 6b).

      b. This strong laser "treatment", that should be ectoderm specific, results in perturbation of other non-ectoderm related processes (e.g., mesoderm apical constriction as shown by the authors). This can support the idea that many other processes are affected and that in general this laser heating "treatment" has global effects. These results might invalidate the conclusion proposed by the authors.

      These are both valid critiques. As for the reviewer’s point “a”, we agree with the reviewer that a “physical separation” of the mesoderm from the ectoderm has not been rigorously demonstrated in our original manuscript. As detailed in our response to reviewer #1 comment #3, since the purpose of this experiment is to ask whether the presence of the lateral ectoderm is important for the mesoderm to transition between apical constriction and invagination (and our result shows yes), whether the approach we used physically separated the mesoderm and the ectoderm is no longer an immediately relevant question. We apologize for the vague use of “physical separation” in our original manuscript and we thank the reviewer for pointing this out.

      To address the reviewer’s point “b” and to ask whether the laser treatment used in our experiment has a global effect, we performed a control experiment where we treated the yolk region of the embryo with the identical approach. Despite the appearance of burn marks in the treated yolk region, mesoderm invagination proceeded largely normally under this condition, with a mild reduction in the rate of furrow invagination (Authors’ Response Figure 3). Therefore, the prominent delay in the transitional state we observed after disruption of lateral ectoderm (Original Figure 6) is not likely caused by non-specific laser heating effect. In addition, in both the yolk-ablation and the ectoderm-ablation experiments, cellularization occurred normally outside of the laser-treated regions, in further support of the lack of strong non-specific effect from our laser treatment. That being said, we acknowledge that we do not fully understand the impact of the laser treatment on the embryo (e.g., what causes the reduced rate of apical constriction), and more control experiments are required in order to fully describe the tissue response we observed. As suggested by the editor, we decided to remove the ectoderm-ablation experiment from the revised manuscript and focus on the characterization of the optogenetic tool and the binary tissue response after acute actomyosin inhibition.

      Authors’ Response Figure 3. Laser treatment in the yolk region of the embryo. (a) Cartoon depicting the position of laser treatment. Similar laser condition was used as described in the original Figure 6. Laser ablation was performed during cellularization and the treated embryo was imaged during gastrulation. (b) An example control embryo without laser treatment. (d-e) Two examples showing ventral furrow formation after laser treatment in the yolk region. Only a mild delay in furrow invagination was observed. Red arrowheads indicate the invagination front. Scale bar: 25μm.

      Reviewer #3 (Public Review):

      The authors address how contractile forces near the apical surface of a cell sheet drive out-of-plane bending of the sheet. To determine whether actomyosin contractility is required throughout the folding process and to identify potential actomyosin-independent contributions to invagination, they develop an optogenetic-mediated inhibition of myosin and show that myosin contractility is critical to prevent tissue relaxation during the early stage of folding but is dispensable for the deepening of the invagination. Their results support the idea that the mesoderm is mechanically bistable during gastrulation. They propose that this mechanical bistability arises from an in-plane compression from the surrounding ectoderm and that mesoderm invagination is achieved through the combination of apical constriction and tissue compression. Regarding the global message of the manuscript, I have two main criticisms. The authors consider their work as the first to prove that there is an additional mechanism to apical constriction leading to invagination. This is not true. First, the fact that the ectoderm could exert a compressive force on the invaginating mesoderm is not new and has been not only proposed, but tested previously (Rauzi and Leptin, 2015). Second, several recent publications demonstrated that on top of apical constriction, lateral forces were also required for the invagination and the authors ignore these data (Gracia et al., 2019; John et al., 2021).

      We thank the reviewer for this important comment. In the original Introduction, we have mentioned several previous studies that suggest the presence of additional mechanisms to apical constriction during ventral furrow formation. We stated: “The observation that the maximal rate of apical constriction and the maximal rate of tissue invagination occur at distinct times suggests that apical constriction does not directly cause tissue invagination (Polyakov et al., 2014; Rauzi et al., 2015). A number of computational models also predict that mesoderm invagination requires additional mechanical input, such as “pushing” forces from the surrounding ectodermal tissues, but experimental evidence for this additional mechanical input remains sparse (Munoz et al., 2007; Conte et al., 2009; Allena et al., 2010; Brodland et al., 2010).”

      To address the reviewer’s comment, in the revised manuscript, we expanded this paragraph to further elaborate the previous contributions: “However, accumulating evidence suggests that apical constriction does not directly drive invagination during the shortening phase. First, it has been observed that the maximal rate of apical constriction (or cell lengthening) and the maximal rate of tissue invagination occur at distinct times (Polyakov et al., 2014; Rauzi et al., 2015). Second, it has been previously proposed, and more recently experimentally demonstrated, that myosin accumulated at the lateral membranes of constricting cells (‘lateral myosin’) facilitates furrow invagination by exerting tension along the apical-basal axis of the cell (Brodland et al., 2010; Conte et al., 2012; Gracia et al., 2019; John and Rauzi, 2021). Finally, a number of computational models predict that mesoderm invagination requires additional mechanical input from outside of the mesoderm, such as “pushing” forces from the surrounding ectodermal tissue (Munoz et al., 2007; Conte et al., 2009; Allena et al., 2010; Brodland et al., 2010). These models are in line with the finding that blocking the movement of the lateral ectoderm by laser cauterization inhibits mesoderm invagination (Rauzi et al., 2015). A similar disruption of ventral furrow formation can also be achieved by increasing actomyosin contractility in the lateral ectoderm (Perez-Mockus et al., 2017). While these pioneer studies highlight the importance of cross-tissue coordination during mesoderm invagination, the actual mechanical mechanism that drives the folding of the mesodermal epithelium and the potential role of the surrounding ectodermal tissue remain to be elucidated.”

      One of the motivations for us to develop experimental approaches to detect compression in the ectoderm (original Figure 4) and to disrupt the ectoderm (original Figure 6) is the lack of direct evidence demonstrating the mechanical contribution of the ectoderm to mesoderm invagination. Several studies have shown that manipulations of the ectodermal tissue can impair ventral furrow formation. One study shows that preventing the movement of the lateral ectoderm, by anchoring ectodermal cell apices to the vitelline membrane, blocks ventral furrow invagination (Rauzi et al., 2015). Another study shows that upregulation of apical myosin contractility in the lateral ectodermal tissues can inhibit or even reverse the furrow invagination process (Perez-Mockus et al., 2017). These results indicate that an increase in the resistance to mesoderm movement can impair mesoderm invagination. However, this would be expected even if the ectoderm does not provide active mechanical input to facilitate mesoderm invagination. Therefore, these experiments, while very informative, did not provide direct evidence for a role of ectodermal compression in mesoderm invagination.

      Another motivation for us to examine potential mechanisms outside of the mesoderm is the observation that ventral furrow invagination continues even when both apical myosin and lateral myosin are disrupted after Ttrans (Late Group embryos). This result indicates that factors other than apical or lateral myosin must be responsible for the invagination of the furrow in Late Group embryos. In the revised manuscript, we used a modeling approach to demonstrate that lateral myosin and ectodermal compression may function in parallel to promote the invagination of the ventral furrow (Figure 7). In the revised Discussion, we propose that “ventral furrow formation is mediated through a joint action of multiple mechanical inputs. Apical constriction drives initial indentation of ventral furrow, which primes the tissue for folding, whereas the subsequent rapid folding of the furrow is promoted by bistable characteristic of the mesoderm and by lateral myosin contractions in the constricting cells.”

      They generated an optogenetic tool, "Opto-Rho1DN", to inhibit Rho1 through light-dependent plasma membrane recruitment of a dominant negative form of Rho1 (Rho1DN). The specificity of local inactivation of Myosin was tested on apical myosin before and during invagination. They observed a strong reduction of Myosin II recruitment and a phenotype that mimics Rok inhibition. They found that acute loss of myosin contractility during most of the lengthening phase results in immediate relaxation of the constricted tissue, but similar treatment near or after the lengthening-shortening transition does not impede invagination. They conclude that the second part of furrow invagination is not due to myosin activities at the apical or lateral cortices of the mesodermal cells and that actomyosin contractility is required in the early but not the late phase of furrow formation. This part regarding the temporal requirement of Myosin during invagination brings novelty to the field since it has never been tested before.

      We thank the reviewer for the comment on the novelty of our work.

      They observe that ectodermal cells shorten their apico-basal axis prior to Ttrans, and that compression from the ectoderm is independent of ventral furrow formation since it still occurs even if invagination is inhibited.

      They further develop two types of simulations to test theoretically the importance of compressive stress in the invagination process. The theoretical part would need to be further developed and discussed. They would need to integrate all the different components that have been shown to be essential for the invagination (not only apical constriction) and the dynamic aspect of the vertex model has to be clearly explained.

      We thank the reviewer for the suggestions on the modeling parts. In the energy-based vertex model (the Polyakov model, original Figure 3), two previously identified mechanisms, apical constriction and basal relaxation, have been implemented to drive the lengthening-shortening cell shape change and furrow invagination. Following the reviewer’s suggestions, we have modified the Polyakov model to include additional mechanisms that have been shown to facilitate ventral furrow invagination. In particular, we focused our analysis on the role of lateral myosin in the constricting cells in furrow invagination (Figure 7). Please refer to our response to the combined comments for details (in the section “Additional modeling analysis to test the known mechanisms for mesoderm invagination”).

      As for the dynamic vertex model presented in our original manuscript (original Figure 5), as detailed in our response to Reviewer #1’s comment #5, since the revised manuscript is focused on the bistable characteristics of the mesoderm during gastrulation rather than tissue dynamics during the folding process, we decided to leave this part out of our revised manuscript as suggested by the editor.

    1. Author Response

      Reviewer #3 (Public Review):

      The authors analyzed several models for predicting the early onset of T2D, where they trained and tested on a UKB based cohort, aged 40 - 69 and suggest two simple logistic regression models: the anthropometric and the five blood tests models in reference to FINDRISC and GDRS models. Their models achieved better auROC, APS, and decile prevalence OR, and better-calibrated predictions.

      Strengths:

      1. The authors have neatly explained their objectives and performed well-justified analyses.

      2. The authors highlight how using both features, the HbA1C% measure and reticulocyte count, may provide a better indication of the average blood sugar level during the last two to three months than using just the standard HbA1C% measure.

      3. Further verification that the proposed anthropometric-based and five-blood-test-based models can discriminate within a group of normoglycemic participants and within a group of pre-diabetic participants showed that they outperform the FINDRISC-based and GDRS-based models.

      Weaknesses:

      1. As the authors point out in the manuscript, these models are suited to the UKB cohort or populations with similar characteristics. This limits the extrapolation of these findings to cohorts from different backgrounds until the models are analyzed on a cohort from another country or continent.

      We agree with this comment, as we indeed pointed out in the paper. We recommend adjusting these models when applying them to populations with distinct characteristics.

      2. In the methods section, an additional explanation of how the T2D prevalence bins were formed would be useful to a reader.

      We thank the reviewer for this note. We added the following explanation in section 4.11: “We considered several potential risk score limits that separate T2D onset probability in each of the scores groups, and we chose boundaries that showed a separation between the risk groups on the validation datasets. Once we decided on the boundaries of the score, we report the prevalence in each risk group on the test set and we report these results.”

      3. The authors have mentioned that the prevalence of diabetes has been rising more rapidly in low- and middle-income countries (LMICs) than in high-income countries and that the objective of the present research was to develop clinically usable models which are easy to use and highly predictive of T2D onset. As lifestyle is also one of the contributory factors for T2D, an additional analysis comparing low-income and high-income subjects within the UKB-based cohort, provided such metadata are available, would help understand whether the prevalence of T2D differs between such groups.

      We thank the reviewer for this comment. Below we add an analysis of our data showing the difference in deprivation index between the sick and healthy populations. As expected, the sick population has a higher deprivation index. A Mann-Whitney U test on the full data returns a p-value that is numerically zero; repeating the test with a sample of just 1000 participants from each group gives a p-value of 2.37e-137. This indicates a significant association between deprivation index and the tendency to develop T2D. We have also added this finding, and a reference to it, to the supplementary material.

      You can also find below a SHAP diagram showing that a higher Townsend deprivation index pushes the prediction for T2D upwards.
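
      The group comparison described in this response can be sketched as follows. This is a minimal illustration and not the authors' analysis code: it implements a two-sided Mann-Whitney U test with the normal approximation and tie correction (no continuity correction) using only the Python standard library. The function name `mann_whitney_u`, the simulated group means, and the synthetic deprivation-index values are assumptions introduced here. The last lines show one plausible reason a very large sample can report a p-value of exactly zero: the tail probability underflows double precision.

```python
import math
import random


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for x, z score, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering group membership (0 = x, 1 = y).
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u1 = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        raise ValueError("all observations are identical")
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return u1, z, p


# Synthetic stand-ins for deprivation-index values (hypothetical numbers).
random.seed(0)
healthy = [random.gauss(-1.5, 3.0) for _ in range(1000)]
sick = [random.gauss(-0.5, 3.0) for _ in range(1000)]
u, z, p = mann_whitney_u(sick, healthy)
print(f"U={u:.0f}, z={z:.2f}, p={p:.3g}")

# With large, fully separated samples the tail probability underflows to 0.0.
_, _, p_big = mann_whitney_u(list(range(5000)), list(range(5000, 10000)))
print(p_big)
```

      In practice a library routine such as SciPy's `mannwhitneyu` would normally be used; the hand-written version is shown only to make the rank-sum logic explicit.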

    1. Author Response

      Reviewer #1 (Public Review):

      In this paper the authors use a conditional knockout strategy to assess the effects of deletion of the dominant oxygen-sensing hypoxia-inducible factor (HIF) hydroxylase enzyme, prolyl hydroxylase 2 (Phd2) restricted to the regulatory T cell (Treg) lineage. They use a well-established Foxp3-driven Cre recombinase allele. Phd2 is thus silenced in cells that have expressed or continue to express Foxp3 from the time this transcription factor, which is essential for Treg development and function, first occurs. They show that this approach leads to a change in Treg behaviour resulting in loss of some aspects of regulatory function and development of a Th1-like phenotype by the Foxp3 expressing cells. Effects are in general reversed when HIF-2 is silenced alongside Phd2, and may be amplified by simultaneous silencing of the HIF-1 isoform.

      The findings overlap with those reported following generalised silencing of Phd2 and following adoptive transfer of Treg in which Phd2-silencing is induced (Yamamoto et al., 2019) and are broadly compatible with those reported following a similarly Treg-restricted knockout of the von Hippel-Lindau gene (the recognition component of the E2-ubiquitin ligase that targets HIF-alpha chains that have been modified by Phd2) (Lee et al., 2015) but the results reported also differ significantly from these earlier reports in a number of intriguing respects which I feel warrant further discussion and ultimately investigation.

      The Introduction is in general informative and well written but it is a shame that it does not contain more discussion of the current state of knowledge of the interplay between HIF signalling and Treg function. This would provide a platform for a more detailed and scholarly discussion of the similarities and differences between this work and existing literature in the Discussion, where existing papers are currently described rather briefly. The introduction contains the statement 'Further complexity in this pathway has been provided by the identification of additional, non-HIF-related, PHD substrates, suggesting a role of proline hydroxylation in other settings requiring oxygen-dependent regulation', citing a single reference. This does not really represent the complex balance of arguments across the literature about non-HIF substrates for the HIF hydroxylase enzymes.

      The conclusions of this paper are mostly well supported by data, but some aspects need to be clarified and extended.

      We sincerely apologize for our apparent lack of recognition of previous work performed by other colleagues active in this field. We have now modified the Introduction section, to provide a better, yet concise, overview of the current knowledge of hypoxia signalling in regulatory T cell biology.

      A central issue for any conditional knock-out strategy is whether the intended tissue restriction is successfully achieved. The authors acknowledge that some issues have been reported with the Cre-recombinase allele they use. They, however, show the expected restriction to cells of the Treg lineage in two of the lymphoid tissues under investigation (spleen and mesenteric lymph node - Supplementary figure 1b) but do not show similar results for other tissues. Some concerns arise because in Figure 8b YFP (which is expressed alongside the Cre-recombinase) is visible in what appears to be the endothelium of the spleen. Additionally, the spleen sections illustrated show convincing splenomegaly in the Phd2-deficient Treg mice but expansion of the red pulp appears to be at least as prominent as any changes that might have occurred in the white pulp. Furthermore, the gross changes in abdominal appearances described as a 'hemorrhagic abdomen' (Figure 1c) include a more plethoric abdominal wall, prominent intestinal blood vessels and a much darker, and perhaps enlarged, liver compared with the control animal. These appearances might result from increased angiogenesis and / or erythropoiesis, neither of which would be expected to result from Treg lineage restricted Phd2 knockout but are known to occur with Phd2 ablation in other tissues. If there is convincing evidence of haemorrhage it would be nice to see this more obviously displayed macro- or, perhaps better still, microscopically.

      We thank the reviewer for this comment. We have now provided a better description of the haematological status of these mice, in which an elevated haematocrit and increased vascular permeability have been observed (now depicted in supplemental Figure 2). As suggested, we indeed found minimal, yet sizable expression of the Cre recombinase (as judged by YFP expression) in CD45-negative, non-lymphoid cells in all organs examined (as now depicted in supplemental Figure 9). Finally, none of the organs examined displayed an increased expression of erythropoietin (as judged by a sensitive qPCR assay, data not shown), a likely candidate for the haematological abnormalities observed in these mice. The mechanism underlying the apparent extramedullary erythropoiesis occurring in these mice therefore remains to be established. Notably, however, an additional experiment performed following a suggestion from one of the reviewers (see Figure 3 and our response 23) strongly suggests that PHD2 affects the Treg phenotype in a cell-autonomous fashion. We do however acknowledge that the tissue abnormalities preclude any firm conclusion related to the positioning of Tregs within the spleen and have therefore deleted this section from the manuscript and adapted our conclusion accordingly.

      Given that the Cre-recombinase allele used is expressed through the endogenous Foxp3 locus which is located on the X-chromosome and thus subject to random inactivation in the cells of females it is important that the sex of animals used in the experiments is specified.

      This has now been done in the Figure legends.

      Experiments show alterations in Phd2-deficient Treg mice compared with control mice in homeostatic proliferation in a lymphopenic environment (Figure 3), the induction of colitis by DSS colitis (Figure 4) and the response to Toxoplasma gondii infection (Figure 4). Given the time courses these effects are likely to be real but interpretation is complicated by the spontaneous effects on the colon of Phd2-deficient Treg mice reported in Figure 1d and e. Given the wide general importance of interferon-gamma in immune / inflammatory responses I am not sure how much weight to place on the observation that concurrent interferon-gamma knockout results in loss of the Phd2-deficient Treg mice pro-inflammatory phenotype (Figure S3). No differences are seen in an in vivo model in which inflammation is induced by injection of anti-CD3 antibodies (Figure S2).

      Although the point is well taken, we felt it was important to perform a few experiments to illustrate the specificity of the inflammatory syndrome observed in these mice. We acknowledge the fact that the effect of concurrent loss of interferon-gamma on the phenotype of PHD2ΔTregs could have been anticipated. Additionally, we also think that the fact that these mice retain the same sensitivity to a “Th17-dominated” inflammatory response (also leading to a loss of weight) strengthens one of the messages of the manuscript, i.e. that loss of PHD2 expression affects Treg function in a selective, Th1-oriented fashion.

      An important conceptual difference between the interpretation of results reported here and those reported by Yamamoto et al. is that the 'Phd2-deficient Treg' purified here do not show a change in regulatory function in vitro whereas those used by Yamamoto et al. failed to act normally as regulatory cells. It is unclear whether this is due to differences in the way proliferation was stimulated, the cell purification strategies used (YFP+ in the current work; CD4+;CD25+ in Yamamoto et al.), the silencing of Phd2 (by knockout throughout development here versus through an inducible-shRNA only in mature cells in Yamamoto et al.), some other feature of the experiments (e.g. the use of feeder cells) or whether a difference would be revealed by more extensive titration. The result reported here is somewhat surprising given the presence of a Th1-like immunophenotype in the cells used in these in vitro suppression assays, which at face value might mean that this immunophenotype is not responsible for changes in their regulatory capacity seen in vivo. This may be true, but it is at odds with Bayesian argumentation. It may be a coincidence, but both models in which control Treg and Phd2-deficient Treg behave similarly involve treatment with anti-CD3 antibodies, raising the possibility that these antibodies in some way nullify differences reported with other stimuli, rather than this necessarily being related to the hypothesised difference between Th1 and Th17 responses in the in vivo model.

      We fully agree with the reviewer’s comment, and we were similarly worried that the differences reported in vivo vs in vitro were due to different agonists used. We however attempted to evaluate Treg function in vitro using alternative approaches, including an assay in which allogeneic antigen-presenting cells (including T-cell depleted spleen cells or highly purified dendritic cells) were used as agonists and Interferon-gamma secretion and proliferation as readouts. In another set of experiments, we used in vitro or in vivo derived Th1 cells instead of naïve T cells as responders. In all instances examined to date, PHD2-deficient Tregs displayed an adequate suppressive function in vitro (data not shown).

      Data showing reversal of the Phd2-deficient Treg in vivo phenotype by knockout of HIF-2alpha, but not HIF-1alpha are convincing and support the data of Yamamoto et al. The observation that Treg-specific PHD2-HIF1α double knockout mice were born at sub-mendelian ratios, displayed a marked weight loss during adult life and reduced viability, indicative of a more pronounced pro-inflammatory status is reported but data is not shown. This is certainly of interest and will no doubt receive further attention. The data that Treg-selective HIF1α or HIF2α deficiency does not affect immune homeostasis in naive mice shown in Figure S4 is relevant and compelling. These results are discussed in the context of recent work published by Hsu et al., 2020 which is interesting. Taken together these data highlight the fact that results reported throughout this manuscript arise from a combination of developmental differences with those occurring in the adult animal.

      We thank the reviewer for these positive comments.

      The transcriptomic data presented has not, to date, been made available to reviewers or the public. Importantly, it is reported to show a disconnection between changes in glycolytic gene expression pattern and the immune phenotype. Specifically, whilst loss of Phd2 expression in Treg is associated with alterations in their regulatory function and with induction of glycolytic genes, the change in function, but not the change in glycolytic gene expression, is reversed by simultaneous knockout of HIF-2alpha and conversely the gene expression pattern, but not the change in function, is reversed by simultaneous knockout of HIF-1alpha. This will be of great interest to those working on the hypothesis that the switch between oxidative phosphorylation and glycolysis underlies functional changes in T cells, particularly if the changes in glycolytic gene expression actually convert into changes in glycolytic flux (as observed following HIF-induction in other cell types).

      The transcriptomic data are publicly available on GEO under accession number GSE184581.

      The authors propose that a change in CXCR3 expression resulting from a change in STAT1 phosphorylation (but not absolute levels of STAT1) consequent on Phd2- inactivation leads to mal-distribution of Treg (at least in the spleen), and that given the broadly paracrine action of Treg this feature alone might explain the loss of regulatory activity in vivo. This is an intriguing hypothesis based at least in part on associative data rather than a formal proof of causality. Changes in STAT1 phosphorylation following interferon-gamma stimulation are far from 'all-or-nothing' (at the timepoint illustrated many cells have normal pSTAT1 levels even though the mean fluorescence intensity is reduced). Results in Figure 7b show that changes in STAT1 phosphorylation are seen in conventional Foxp3 negative T cells; since Phd2 knockout is restricted to the Treg lineage this change is presumably indirect, raising the possibility that the change seen in Treg is also indirect, rather than truly cell autonomous. Changes in pSTAT1 are acknowledged to affect a huge number of genes / processes so picking any one as the total explanation for any change in behaviour may be an over simplification. The analysis of changes in Treg localisation in the spleen is potentially interesting and may reach the correct conclusion but the methodology used is not clearly explained and in particular it is not clear how splenomegaly / changes in gross splenic architecture have been taken into account.

      We fully agree with the reviewer’s comments and have now deleted the final figure of our manuscript dealing with Treg positioning in the spleen. We indeed agree that due to the morphological changes in spleen size and architecture, more detailed work would be required to confirm our initial hypothesis. Unexpectedly, and thanks to a remark from another reviewer, we found that PHD2-deficient Tregs (which are present at high frequencies in the spleen of PHD2ΔTregs mice) are largely outcompeted both in heterozygous PHD-2fl/fl Cre+/- mice (see Figure 3) and upon transfer into WT mice of a 1:1 mix of WT and PHD-2-deficient Tregs, greatly complicating the study of the relative positioning of these cells within lymphoid organs. We do however stand by our previous conclusion suggesting that STAT1 signaling appears to be affected in PHD2-deficient Tregs. This conclusion is supported not only by the reduced accumulation of pSTAT1 in these cells, as shown in Figure 8, but also by the bioinformatic analysis of transcriptomic data and the confirmation, at the protein level, of the reduced expression of CXCR3, a well-characterized STAT1-dependent chemokine receptor (as shown in Figure 8).

      Overall, this work contains many interesting datasets which need to be taken into account as we build our understanding of the intersection between HIF-signalling and regulatory T cell function, particularly as pharmacological manipulation of HIF signalling may provide a route to immunomodulation through alterations in regulatory T cell function.

      We again thank the reviewer for this positive appreciation of our work.

    1. Author Response

      Reviewer #1 (Public Review):

      The key question addressed by this MEG study is whether speech is represented singly or multiplexed in the human brain in the linguistic hierarchy. The authors used state-of-the-art analyses (multivariate Temporal Response Functions) and probabilistic information-theoretic measures (entropy, surprisal) to test distinct contextual speech processing models at three hierarchical levels. The authors report evidence for the coexistence of local and global predictive speech processing in the linguistic hierarchy.

      The work uses time resolved neuroimaging with state-of-the-art analyses and cognitive (here, linguistic) modeling. The study is very well conducted and draws from very different fields of knowledge in convincing ways. I see one limitation of the current study in that the authors focused on phase-locked responses, and I hope future work could extend to induced activity.

      Overall, the flow in the MS could be streamlined. Some smoothing in the introduction would be helpful to extract the main key messages you wish to convey.

      For instance, in the abstract:

      -Can you explain the two views in a simpler way in the abstract and to a non-linguistic audience? Do you mean to say that classic psycholinguistic models tend to follow a strictly hierarchical integration (analysis only) but an alternative model is hierarchically inferential (analysis by synthesis)?

      -Indicate early on in abstract or intro where the audience is being led with a concise message on how you address the main question. For instance:

      To contrast our working hypotheses A and B, we used a novel information-theoretic modeling approach and associated measures (entropy, surprisal), which make clear predictions on the latency of brain activity in response to speech at three hierarchical contextual levels (sublexical, word and sentence).

      We have revised the Abstract and Introduction to reduce the amount of terminology and add additional explanations. Wherever possible, we now use general terms (“bottom up”, “predictions”, “context”, …) instead of terms associated with specific theories. We hope we found a balance between improving accessibility and retaining the qualities seen by Reviewer 2, who thought the Introduction was clearly written and well connected to the psycholinguistics literature.

      All the models we compare are compatible with an analysis by synthesis approach, as long as the generative models are understood to entail making probabilistic predictions about future input. The generative models in analysis by synthesis, then, are one way in which “to organize internal representations in such a way as to minimize the processing cost of future language input“ (Introduction, first paragraph). We have clarified this in the first paragraph of the Introduction.

      • Why did the authors consider that the evoked response is the proper signal to assess as opposed to oscillatory (or non phase-locked) activity?

      The primary reason for our choice of dependent measure is the prior research we based our design on, showing that the linguistic entropy and surprisal effects are measurable in phase-locked responses (Brodbeck et al., 2018; Donhauser and Baillet, 2020). We have made this more explicit in part of the Introduction where we introduce our approach (“To achieve this, we analyzed …”).

      As to oscillatory dependent measures, we consider them an interesting but parallel research question. We are not aware of specific corresponding effects in non-phase-locked activity. Accordingly, analyzing oscillatory responses without a clear prior hypothesis would require additional decisions, such as which bands to analyze, which would entail issues of multiple comparison. An additional caveat is that the temporal resolution of oscillatory activity is often lower than that of phase-locked activity, which might make it harder to distinguish responses based on their latency as we did here, to test whether the latencies of different context models differ.

      • Parallel processing with different levels of context (hence temporal granularities) sounds compatible with temporal multiplexing of speech representation proposed by Giraud & Poeppel (2012) or do the authors consider it a separate issue?

      We consider our investigation orthogonal to the model discussed by G&P (2012). G&P’s model is about the organization of acoustic information at different time-scales, and does not discuss the influence of linguistic constructs at the word level and above. On the other hand, the information-theoretic models that form the basis of our analysis track the linguistic information that can be extracted from the acoustic signal. The temporal scales invoked by G&P’s model are also different from the ones used here, defined based on acoustic vs. linguistic units. Thus, the kind of neural entrainment as a mechanism for speech processing hypothesized by G&P is fully compatible with our account, but not at all required by it.

      Methods:

      • Figure 2: please spell out TRFs and clarify the measured response

      We have done both in the Figure legend.

      • The sample size (N=12) is very low by today's standards, but the statistical granularity is that of the full MEG recording. Can a power estimate be provided, or a clear justification of the reliability of the statistical measures be described?

      We appreciate and share the reviewers’ concern with statistical power and have made several modifications to better explain and rationalize our choices.

      First, to contextualize our study: The sample size is similar to the most comparable published study, which had 11 participants (Donhauser and Baillet, 2020). Our own previous study (Brodbeck et al., 2018) had more participants (28) but only a fraction of the data per subject (8 minutes of speech in quiet, vs. 47 minutes in the present dataset). We added this consideration to the Methods/Participants section.

      We also added a table with effect sizes for all the main predictors to make that information more accessible (Table 1). This suggests that the most relevant effects have Cohen’s d > 1. With our sample size of 12, we had 94% power to detect an effect with d = 1, and 99% power to detect an effect with d = 1.2. This post-hoc analysis suggests that our sample was adequately powered for the intended purpose.
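
      The power figures quoted above can be approximated with a short simulation. The sketch below is illustrative only: it assumes a one-sample (paired-difference) t-test against zero, one-tailed at alpha = 0.05, neither of which is stated in the response, and it uses the tabulated critical t value for 11 degrees of freedom. The function name and defaults are hypothetical.

```python
import math
import random

# Assumption: one-sample t-test against zero, one-tailed, alpha = 0.05.
# 1.7959 is the tabulated critical t value for df = 11 (n = 12) under that
# assumption; it must be replaced if n or alpha changes.
T_CRIT_DF11_ONE_TAILED = 1.7959


def simulated_power(d, n=12, t_crit=T_CRIT_DF11_ONE_TAILED, n_sim=20000, seed=0):
    """Estimate power by simulating n per-subject differences with mean d, SD 1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        sample = [rng.gauss(d, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        var = sum((x - mean) ** 2 for x in sample) / (n - 1)
        t = mean / math.sqrt(var / n)
        hits += t > t_crit  # count simulated experiments that reach significance
    return hits / n_sim


power_d1 = simulated_power(1.0)
power_d12 = simulated_power(1.2)
print(f"d = 1.0: simulated power = {power_d1:.2f}")
print(f"d = 1.2: simulated power = {power_d12:.2f}")
```

      Under these assumptions the simulated values land close to the quoted 94% and 99%; a two-tailed test would give lower power for the same effect sizes.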

      Finally, all crucial model comparisons are accompanied by swarm-plots that show each subject as a separate dot, thus showing that these comparisons are highly reproducible across participants (note that there rarely are participants with model difference below 0, indicating that the effects are all seen in most subjects).

      • The inclusion of a left-handed participant in speech studies is unusual; please comment on any difference (or lack thereof) for this participant, notably in the lateralization tests.

      We agree that this warrants further comment, in particular given our lateralization findings. We have made several changes to address this concern. At the same time we hope that the reviewers agree with us that, with proper care, inclusion of left-handed participants is desirable (Willems et al., 2014), and indeed is becoming more mainstream, at least for studies of naturalistic language processing (e.g. Shain et al., 2020). First, we now draw attention to the presence of a left-hander where we introduce our sample (first paragraph of the Results section). Second, we repeated all tests of lateralization while excluding the left-hander. Because this did not change any of the conclusions, we decided to keep reporting results for the whole sample. Third, we now mark the left-handed participant in all plots that include single-subject estimates and in the corresponding source data files. Overall, the left-hander indeed shows stronger right-lateralization than the average participant, but is by no means an outlier.

      • The authors state that eyes were kept open or closed. This is again unusual, as we know that eye closure not only affects the degree of concentration/fatigue but also directly impacts alpha activity (which in turn affects evoked responses (1-40 Hz then 20 Hz) that are being estimated here). Please explain.

      Previous comparable studies variably asked subjects to keep their eyes closed (e.g. Brodbeck et al., 2018) or open (e.g. Donhauser and Baillet, 2020). Both modes have advantages and disadvantages, none of which are prohibitive for our target analysis (ocular artifacts were removed with ICA and oscillatory alpha activity should, on average, be orthogonal to time-locked responses to the variables of interest). Importantly however, both modes have subjective disadvantages when enforced: deliberately keeping eyes open can lead to eye strain and excessive blinking, whereas closing eyes can exacerbate sleepiness. For this reason we wanted to allow subjects to self-regulate to optimize the performance on the aspects of the task that mattered – processing meaning in the audiobook. We extended the corresponding Methods section to explain this.

      • It would be helpful to clarify the final temporal granularity of analysis. The TRFs time courses are said to be resampled to 1kHz (p22) but MEG time courses are said to be resampled at 100 Hz (p18).

      Thanks for noting this. We clarified in the TRF time-course section: the deconvolution analysis was performed at 100 Hz, and TRFs were then resampled to 1 kHz for visualization and fine-grained peak analysis.

      • The % of variance explained by acoustic attributes is 15- to 20-fold larger than that explained by the linguistic models of interest. Can an SNR measure be evaluated on such observations?

      We appreciate this concern, which is indeed reasonable. In order to better clarify this issue we have added a new paragraph, right after Table 1. In brief, since the statistical analysis looks for generality across subjects, the raw % explained values do not directly speak to the SNR or effect size. Rather, the SNR concerns how much variability is in this value across subjects. The individual subject values in Figure 3-B, and effect sizes now reported in Table 1, show that even though the % variability that is uniquely attributable to information-theoretic quantities is small, it is consistently larger than 0 across subjects.
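
      The distinction between raw % explained and consistency across subjects can be made concrete with a toy calculation. The per-subject values below are hypothetical and chosen only for illustration (they are not taken from the study), and the helper name is an assumption; the point is that a very small mean gain can still yield a large standardized effect when it is consistent across subjects.

```python
import statistics

# Hypothetical per-subject gains in % variance explained when the linguistic
# predictors are added (illustrative values only, not data from the study).
gain_percent = [0.020, 0.030, 0.025, 0.015, 0.035, 0.020]


def cohens_d_one_sample(values):
    """One-sample Cohen's d against zero: mean divided by sample standard deviation."""
    return statistics.mean(values) / statistics.stdev(values)


d = cohens_d_one_sample(gain_percent)
print(f"mean gain = {statistics.mean(gain_percent):.3f}%, Cohen's d = {d:.2f}")
```

      Here every hypothetical subject gains only a few hundredths of a percent, yet the effect size is well above 1 because the between-subject spread is small relative to the mean.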

      Results and Figures:

      • The current figures do not give enough credit to the depth of analysis being presented. I understand that this is typical for such mTRF approaches, but given the level of abstraction being evaluated in the linguistic inputs, it may be helpful to show an example of what to expect for low vs. high surprisal, for instance, from the modeling perspective and over time. For instance, could Figure 1 already illustrate distinct predictions of the local vs. global models?

      Thank you for pointing out this gap. We have added two figures to make the results more approachable:

      First, in Figure 3 we now show an example stimulus excerpt with all predictors we used. This makes the complete set of predictors quickly apparent without readers having to collect the information from the different places in the manuscript. It also gives a better sense of the detail that is modeled in the different stimulus representations. Second, we added Figure 6 to show example predictions from the different context models, and explain better how the mTRF approach can decompose brain responses into components related to different stimulus properties.

      • Why are visual cortices highlighted in figures?

      Those were darkened to indicate that they are excluded from the analysis. We have added a corresponding explanation to the legend of Figure 3.

      • Figure 2 Fig 2A and B: can the authors quantitatively illustrate "5-gram generally leads to a reduction of word surprisal but its magnitude varies substantially between words" by simply showing the mean surprisal and its variance?

      Added to the Figure legend.

      Fig 2C: please explain the term "partial response"; please indicate for non M/EEGers what the arrow symbolizes.

      Added to the Figure legend.

      • Figure 3:

      p8: the authors state controlling for the "acoustic features" but do not clearly describe how in the methods and this control comes as a (positive) surprise but still a bit unexpected at first read. Perhaps include the two acoustic features in Fig2C and provide a short couple sentences on how these could impair or confound mTRF performance.

      We thank you for pointing out this lack of explanation. We have added a description of all the control predictors to the end of the Introduction, right after explaining the predictors of main interest. We have also added Figure 3 to give an example and make the nature of all the controls explicit.

      Has the same analysis been conducted on a control region a priori not implicated in linguistic processing? This would help to support the current results.

      The analysis has been performed on the whole brain (excluding the insula and the occipital lobe). Figure 4 (previously Figure 3) shows that generally only regions in the temporal lobe exhibit significant contributions from the linguistic models (allowing for some dispersion associated with MEG source localization). Although this is not shown in the figure, regions further away from the significant region generally exhibit a decrease in prediction accuracy from adding linguistic predictors, as is commonly seen with cross-validation when models overfit to irrelevant predictors.

      Fig 3B-C-E: please clearly indicate what single dot or "individual value" represents. Is this average over the full ROI? Was the orientation fixed? Can some measure of variability be provided?

      Explanation of individual dots added to Figure 4-B legend (formerly 3-B). Fixed orientation added to the methods summary in the Figure 2-C legend. To provide more detailed statistics including a measure of variability we added Table 1.

      Fig3E: make bigger / more readable (too many colors: significance bars could be black)

      We have increased the size and made the significance bars black.

      • Figure 4: having to go to the next Fig (Fig5) to understand the time windows is inconvenient and difficult to follow. Please find a workaround or combine the two figures. From which ROI are the time series extracted?

      We have combined the two figures to facilitate comparison, and have added a brief explanation of the ROI to the figure legend.

      Reviewer #3 (Public Review):

      This manuscript presents a neurophysiological investigation of the hierarchical nature of prediction in natural speech comprehension. The authors record MEG data to speech from an audiobook. And they model that MEG using a number of different speech representations in order to explore how context affects the encoding of that speech. In particular, they are interested in testing how the response to phonemes is affected by context at three different levels: sublexical - how the probability of an upcoming phoneme is constrained by previous phonemes; word - how the probability of an upcoming phoneme is affected by its being part of an individual word; sentence - how the probability of an upcoming phoneme is affected by the longer-range context of the speech content. Moreover, the authors are interested in exploring how effects at these different levels might contribute - independently - to explaining the MEG data. In doing so, they argue for parallel contributions to predictive processing from both long-range context and more local context. The authors discuss how this has important implications for how we understand the computational principles underlying natural speech perception, and how it can potentially explain a number of interesting phenomena from the literature.

      Overall, I thought this was a very well written and very interesting manuscript. I thought the authors did a really superb job, in general, of describing their questions against the previous literature, and of discussing their results in the context of that literature. I also thought, in general, that the methods and results were well explained. I have a few comments and queries for the authors too, however, most of which are relatively minor.

      Main comments: 1) One concern I had was about the fact that context effects are estimated using 5-gram models. I appreciate the computational cost involved in modeling more context. But, at the same time, I worry a little that examining the previous 4 phonemes or (especially) words is simply not enough to capture longer-term dependencies that surely exist. The reason I am concerned about this is that the sentence level context you are incorporating here is surely suboptimal. As such, could it be the case that the more local models are performing as well as they are simply because the sentence level context has not been modeled as well as it should be? I appreciate the temporal and spatial patterns appear to differ for the sentence level relative to the other two, so that is good support for the idea that they are genuinely capturing different things. However, I think some discussion of the potential shortcomings of only including 4 tokens of context is worth adding, particularly when you make strong claims like that on line 252.

      We strongly agree with the reviewer that the 5-gram model is not the ultimate model of human context representations. We have added a section to acknowledge this (Limitations of the sentence context model).

      While we see much potential for future work to investigate context processing by using more advanced language models, a preliminary investigation suggests that it might not be trivial. We compared the ability of a pre-trained LSTM (Gulordava et al., 2018) to predict the brain response to words in our dataset with that of the 5-gram model. The LSTM performed substantially worse than the 5-gram model. An important difference between the two models is that our 5-gram model was trained on the Corpus of Contemporary American English (COCA), whereas the LSTM was trained on Wikipedia. COCA provides a large and highly realistic sample of English, whereas the language in Wikipedia might be a more idiosyncratic subsample. Thus, the LSTM might be worse just because it has been trained on a less representative sample of English. As an initial step we thus ought to train the LSTM on the superior COCA database, but this simple step alone would already be associated with a substantial computational cost, given the size of COCA at more than a billion words (we estimated 3 weeks on 32 GPUs in a computing cluster). Furthermore, while we acknowledge the limitations of the 5-gram model, we consider it very unlikely that its limitations are the reason that the more local models are performing well. In general, as more context is considered, the model’s predictions should become more different from the local model, i.e., a more sophisticated model should be less correlated with the local models, and should thus allow the local models to perform even better.

      2) I found myself confused about what exactly was being modeled on my first reading of pages 4 through 7. I realized then that all of the models are based on estimating a probability distribution based on phonemes (stated on line 167). I think why I found it so confusing was that the previous section talked about using word forms and phonemes as units of representation (lines 118-119; Fig 2A), and I failed to grasp that, in fact, you were not going to be modeling surprisal or entropy at the word level, but always at the phoneme level (just with different context). Anyway, I just thought I would flag that as other readers might also find themselves thinking in one direction as they read pages 4 and 5, only to find themselves confused further down.

      Thank you for pointing out this ambiguity; we now make it explicit that “all our predictors reflect information-theoretic quantities at the rate of phonemes” early on in the Expressing the use of context through information theory section.

      3) I also thought some of the formal explanations of surprisal and entropy on lines 610-617 would be valuable if added to the first paragraph on page 6, which, at the moment, is really quite abstract and not as digestible as it could be, particularly for entropy.

      We appreciate that this needs to be much clearer for readers with different backgrounds. As suggested, we have added the formal definition to the Introduction, and we now also point readers explicitly to the Methods subsection that explains these definitions in more detail.
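
      For readers less familiar with these quantities, the following minimal sketch shows the standard information-theoretic definitions of surprisal and entropy in bits. It is not the manuscript's own implementation; the function names and the next-phoneme distribution are hypothetical, and the manuscript's Methods remain the authoritative source for how the predictors were actually computed.

```python
import math


def surprisal(p):
    """Surprisal (in bits) of an outcome that was assigned probability p."""
    return -math.log2(p)


def entropy(dist):
    """Shannon entropy (in bits) of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


# Hypothetical distribution over the next phoneme given some context.
next_phoneme = {"k": 0.5, "t": 0.25, "s": 0.25}

h = entropy(next_phoneme)              # uncertainty before the phoneme is heard
s_k = surprisal(next_phoneme["k"])     # surprise if the likely phoneme occurs
s_t = surprisal(next_phoneme["t"])     # surprise if a less likely phoneme occurs
print(f"entropy = {h} bits, surprisal(k) = {s_k} bits, surprisal(t) = {s_t} bits")
```

      Entropy summarizes the whole distribution before the next phoneme arrives, whereas surprisal depends only on the probability of the phoneme that actually occurs; changing the context model changes the distribution and therefore both quantities.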

      4) I like the analysis examining the possibility of tradeoffs between context models. I wonder might such tradeoffs exist as conversational environments vary - if the complexity of the speech varies and/or listening conditions vary might there be more reliance on local vs global context then. If that seems plausible, then it might be worth adding a caveat that you found no evidence for any tradeoff, but that your experiment was pretty homogenous in terms of speech content.

      Thank you for this suggestion. We added this idea to the Discussion in the Implications for speech processing section.

    1. Author Response

      Reviewer #2 (Public Review):

      The manuscript by Carrasquilla and colleagues applied Mendelian Randomization (MR) techniques to study causal relationship of physical activity and obesity. Their results support the causal effects of physical activity on obesity, and bi-directional causal effects of sedentary time and obesity. One strength of this work is the use of CAUSE, a recently developed MR method that is robust to common violations of MR assumptions. The conclusion reached could potentially have a large impact on an important public health problem.

      Major comments:

      (1) While the effect of physical activity on obesity is in line with earlier studies, the finding that BMI has a causal effect on sedentary time is somewhat unexpected. In particular, the authors found this effect only with CAUSE, whereas the evidence from other MR methods does not reach the statistical significance cutoff. The strength of CAUSE is more about the control of false positives than about high power. In general, the power of CAUSE is lower than that of the simple IVW method. This is also the case in this setting, of high power of exposure (BMI) but lower power of outcome (sedentary time) - see Fig. 2B of the CAUSE paper.

      It does not necessarily mean that the results are wrong. It's possible, for example, that by better modeling pleiotropic effects, CAUSE better captures the causal effects and has higher power. Nevertheless, it would be helpful to better understand why CAUSE gives high statistical significance while others do not. Two suggestions here:

      (a) It is useful to visualize the MR analysis with scatter plot of the effect sizes of variants on the exposure (BMI) and outcome (sedentary time). In the plot, the variants can be colored by their contribution to the CAUSE statistics, see Fig. 4 of the CAUSE paper. This plot would help show, for example, whether there are outlier variants; or whether the results are largely driven by just a small number of variants.

      We agree and have now added a scatter plot of the expected log pointwise posterior density (ELPD) contributions of each variant to BMI and sedentary time, and the contributions of the variants to selecting either the causal model or the shared model (Figure 2-figure supplement 1 panel A). We identified one clear outlier variant (red circle) that we thus decided to remove before re-running the CAUSE analysis (panel B). We found that the causal effect of BMI on sedentary time remained of similar magnitude before and after the removal of this outlier variant (beta=0.13, P=6×10⁻⁴ and beta=0.13, P=3×10⁻⁵, respectively) (Supplementary File 1 and 2).

      We have added a paragraph in the Results section to describe these new findings:

      Lines 204-210: “We checked for outlier variants by producing a scatter plot of expected log pointwise posterior density (ELPD) contributions of the variants to BMI and sedentary time (Supplementary File 1), identifying one clear outlier variant (rs6567160 in MC4R gene) (Figure 2, Appendix 1—figure 2). However, the causal effect of BMI on sedentary time remained consistent even after removing this outlier variant from the CAUSE analysis (Supplementary File 1 and 2).”

      (b) CAUSE is susceptible to false positives when the value of q, a measure of the proportion of shared variants, is high. The authors stated that q is about 0.2, which is pretty small. However, it is unclear if this is q under the causal model or the sharing model. If q is small under the sharing model, the result would be quite convincing. This needs to be clarified.

      We thank the reviewer for a very relevant question. We have now clarified in the manuscript that all of the reported q values (~0.2) were under the causal model (lines 202-203). We applied the strict parameters for the priors in CAUSE in all of our analyses, which leads to high shared model q values (q=0.7-0.9). To examine whether our bidirectional causal findings for BMI and sedentary time may represent false positive results, we performed a further analysis to identify and exclude outlier variants, as described in our response to Question 7. That is, we produced a scatter plot of expected log pointwise posterior density (ELPD) contributions of each variant to BMI and sedentary time, and the contributions of the variants to selecting either the causal model or the shared model (Supplementary Figure 2 panel A, shown above). We identified one clear outlier variant (red circle) that we thus removed (panel B), but the magnitude of the causal estimates was not affected by the exclusion of the variant (Supplementary File 1 and 2).

      (2) Given the concern above, it may be helpful to strengthen the results using an additional strategy. Note that the biggest worry with the BMI-sedentary time relation is that the two traits are both affected by an unobserved heritable factor. This hidden factor likely affects some behavior component, so it most likely acts through the brain. On the other hand, BMI may involve multiple tissue types, e.g. adipose. So the idea is: suppose we can partition BMI variants into different tissues, those acting via brain or via adipose, say; then we can test MR using only BMI variants in a certain tissue. If there is a causal effect of BMI on sedentary time, we expect to see similar results from MR with different tissues. If the two are affected by the hidden factor, then the MR analysis using BMI variants acting in adipose would not show significant results.

      While I think this strategy is feasible conceptually, I realize that it may be difficult to implement. BMI heritability was found to be primarily enriched in brain regulatory elements [PMID:29632380], so even if there are other tissue components, their contribution may be small. One paper does report that BMI is enriched in CD19 cells [PMID: 28892062], though. A second challenge is to figure out the tissue of origin of GWAS variants. This probably requires fine-mapping analysis to pinpoint causal variants, and overlap with tissue-specific enhancer maps, not a small task. So I'd strongly encourage the authors to pursue some analysis along this line, but it would be understandable if the results of this analysis are negative.

      We thank the reviewer for a very interesting point to address. We cannot exclude the possibility of an unobserved heritable factor acting through the brain, and tissue-specific MR analyses would be one possible way to investigate this possibility. However, we agree with the reviewer that partitioning BMI variants into different tissues is not currently feasible as the causal tissues and cell types of the GWAS variants are not known. Nevertheless, we have now implemented a new analysis where we tried to stratify genetic variants into “brain-enriched” and “adipose tissue-enriched” groups, using a simple method based on the genetic variants’ effect sizes on BMI and body fat percentage.

      Our rationale for stratifying variants by comparing their effect sizes on BMI and body fat percentage is the following:

      BMI is calculated based on body weight and height (kg/m²) and it thus does not distinguish between body fat mass and body lean mass. Body fat percentage is calculated by dividing body fat mass by body weight (fat mass / weight × 100%) and it thus distinguishes body fat mass from body lean mass. Thus, higher BMI may reflect both increased fat mass and increased lean mass, whereas higher body fat percentage reflects that fat mass has increased more than lean mass.

      If a genetic variant influences BMI through the CNS control of energy balance, its effect on body fat mass and body lean mass would be expected to follow the usual correlation between the traits in the population, where higher fat mass is strongly correlated with higher lean mass. In such a scenario, the variant would show a larger standardized effect size on BMI than on body fat percentage. If a genetic variant more specifically affects adipose tissue, the variant would be expected to have a more specific effect on fat mass and less effect on lean mass. In such a scenario, the variant would show a larger standardized effect size on body fat percentage than on BMI.

      We therefore stratified BMI variants into brain-specific and adipose tissue-specific variants by comparing their standardized effect sizes on BMI and body fat percentage. Of the 12,790 variants included in the BMI-sedentary time CAUSE analysis, 12,266 had stronger effects on BMI than on body fat percentage and were thus classified as “brain-specific”. The remaining 524 variants had stronger effects on body fat percentage than on BMI (“adipose tissue-specific”). To assess whether the stratification of the variants led to biologically meaningful groups, we performed DEPICT tissue-enrichment analyses. The analyses showed that the genes expressed near the “brain-specific” variants were enriched in the CNS (figure below, panel A), whereas the genes expressed near the “adipose tissue-specific” variants did not reach significant enrichment in any tissue, but they showed the strongest evidence of being linked to adipocytes and adipose tissue (figure below, panel B).
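
      The stratification rule described above can be sketched in a few lines. This is an illustration only, not the pipeline used for the analysis: the `Variant` class, the field names, the variant labels and the effect sizes are hypothetical, and comparing absolute standardized effects (with ties assigned to the adipose group) is an assumption about how "stronger" was operationalized.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    label: str
    beta_bmi: float  # standardized effect size on BMI
    beta_bfp: float  # standardized effect size on body fat percentage


def classify(variant):
    """Assign a variant to a group by comparing absolute standardized effects."""
    # Assumption: "stronger effect" means larger absolute standardized effect.
    if abs(variant.beta_bmi) > abs(variant.beta_bfp):
        return "brain-specific"
    return "adipose tissue-specific"


# Hypothetical variants with made-up effect sizes, for illustration only.
variants = [
    Variant("variant_a", beta_bmi=0.050, beta_bfp=0.030),
    Variant("variant_b", beta_bmi=0.010, beta_bfp=0.025),
    Variant("variant_c", beta_bmi=-0.040, beta_bfp=-0.020),
]

groups = {v.label: classify(v) for v in variants}
for label, group in groups.items():
    print(label, "->", group)
```

      Each group of variants would then be passed separately to the downstream enrichment and MR analyses.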

      Figure legend: DEPICT cell, tissue and system enrichment bar plots for BMI-sedentary time analysis.

      Having established that the two groups of genetic variants likely represent tissue-specific groups, we re-estimated the causal relationship between BMI and sedentary time using CAUSE, separately for the two groups of variants. We found that the 12,266 “brain-specific” genetic variants showed a significant causal effect on sedentary time (P=0.003), but the effect was attenuated compared to the CAUSE analysis where all 12,790 variants (i.e. also including the 524 “adipose tissue-specific” variants) were included in the analysis (P=6.3×10⁻⁴). The statistical power was much more limited for the “adipose tissue-specific” variants, and we did not find a statistically significant causal relationship between BMI and sedentary time using the 524 “adipose tissue-specific” variants only (P=0.19). However, the direction of the effect suggested the possibility of a causal effect if a stronger genetic instrument were available. Taken together, our analyses suggest that both brain-enriched and adipose tissue-enriched genetic variants are likely to show a causal relationship between BMI and sedentary time, which would suggest that the causal relationship between BMI and sedentary time is unlikely to be driven by an unobserved heritable factor.

      Minor comments

      The term "causally associated" is confusing, e.g. in l32. If it's causal, then use the term "causal".

      We have now changed the term “causally associated” to “causal” throughout the manuscript.

      Reviewer #3 (Public Review):

      Given previous reports of an observational relationship between physical inactivity and obesity, Carrasquilla and colleagues aimed to investigate the causal relationship between these traits and establish the direction of effect using Mendelian Randomization. In doing so, the authors report strong evidence of a bidirectional causal relationship between sedentary time and BMI, where genetic liability for longer sedentary time increases BMI, and genetic liability for higher BMI causally increases sedentary time. The authors also give evidence of higher moderate and vigorous physical activity causally reducing BMI. However they do note that in the reverse direction there was evidence of horizontal pleiotropy where higher BMI causally influences lower levels of physical activity through alternative pathways.

      The authors have used a number of methods to investigate and address potential limiting factors of the study. A major strength of the study is the use of the CAUSE method. This allowed the authors to investigate all exposures of interest, in spite of a low number of suitable genetic instruments (associated SNPs with P-value < 5E-08) being available, which may not have been possible with the use of the more conventional MR methods alone. The authors were also able to overcome sample overlap with this method, and hence obtain strong causal estimates for the study. The authors have compared causal estimates obtained from other MR methods including IVW, MR Egger, the weighted median and weighted mode methods. In doing so, they were able to demonstrate consistent directions of effects for most causal estimates when comparing with those obtained from the CAUSE method. This helps to increase confidence in the results obtained and supports the conclusions made. This study is limited in the fact that the findings are not generalizable across different age-groups or populations - although the authors do state that similar results have been found in childhood studies. As the authors also make reference to, due to the nature of the BMI genetic instruments used, the findings of this study can only inform on the lifetime impact of higher BMI, and not the effect of a short-term intervention.

      The findings of this study will be of interest to those in the field of public health, and support current guidelines for the management of obesity.

      We thank the Reviewer for the valuable feedback and insights. We agree that the lack of generalizability of the findings across age groups and populations is an important limitation. We have now mentioned this in lines 341-342 of the manuscript:

      “The present study is also limited in the fact that the findings are not generalizable across different age-groups or populations.”

    1. Author Response

      Reviewer #2 (Public Review):

      This paper combines neuroimaging, behavioral experiments, and computational modeling to argue that (a) there is a network of brain areas that represent physical stability, (b) these areas do so in a way that generalizes across many kinds of instability (e.g., not only a tower of blocks about to fall over, but also a person about to fall off a ladder), and (c) that this supports a simulation account of physical reasoning, rather than one based on feedforward processing; this last claim arises through a comparison of humans to CNNs, which do an OK job classifying physical instability but not in a way that transfers across these different stability classes. In my opinion, this is a lovely contribution to the literatures on both intuitive physical reasoning and (un)humanlike machine vision. At the same time, I wasn't sure that the broader conclusions followed from the data in the way the authors preferred, and I also had some concerns about some of the methodological choices made here.

      1. The following framing puzzled me a bit, and even seemed to raise an unaddressed confound in the paper: "Here we investigate how the brain makes the most basic prediction about the physical world: whether the situation in front of us is stable, and hence likely to stay the same, or unstable, and hence likely to change in the immediate future".

      Consider the following minor worry, which sets up a more major one: This framing, which connects 'stability' to 'change' and which continues throughout the paper, seems to equivocate on the notion of 'stability'. One meaning of 'stable' is, roughly, 'unchanging'. Another meaning is 'unlikely to fall over'. The above quotation, along with others like it, makes it seem like the authors are investigating the former, since that's the only meaning that makes this quotation make sense. But in fact the experiments are about the latter -- towers falling down, people falling off ladders, etc. But these aren't the same thing! So there's a bit of wordplay happening here, it seemed to me.

      This sets up the more serious worry. As this framing reveals, unstable scenes (in the likely-to-fall-over sense) are, by their nature, scenes where something is likely to change. In that case, how do we know that the brain areas this project has identified aren't representing 'likeliness to change', rather than physical stability? There are, of course, many objects and scenes that might be highly likely to change without being at all physically unstable. Even the first example in the paper ("a dog about to give chase") is about likely changes without any physical instability. But isn't this a confound? All of the examples of physical instability explored here also involve likeliness to change! So these could be 'likely to change' brain areas, not 'physically unstable' brain areas. Right? Or if not, what am I missing?

      The caption of Figure 1 seems to get at this a bit, but in a way I admit I just found a bit confusing. If authors do after all intend "physically unstable" to mean "likely to change", then many classes of scenarios that are unexplored here seem like they would be relevant: a line of sprinters about to dash off in a race, someone about to turn off all the lights in a home, a spectacular chemical reaction about to start, etc. But the authors don't intend those scenarios to fall under the current project, right?

      The reviewer is correct that "stability" has (at least) these two different meanings, and also correct that we are investigating here the situation in which a configuration is not changing now but would be likely to change with just the slightest perturbation. Our hypothesis is that the “Physics Network” will be sensitive to the likelihood that a physical configuration will change for physical (not social) reasons. That is what our data show: we do not find the same univariate and multivariate effects for situations that are likely to change because of the behavior of an animal. This indicates that what we are decoding is not general ‘likeliness to change’ but rather physical instability in particular.

      (Also: Is stability really 'the most basic prediction' we make about the world? Who is to say that stable vs. unstable is a more basic judgment than, say, present vs. absent, or expected vs. unexpected, or safe vs. unsafe, etc? I know this is mostly just trying to get the reader excited about the results, but I stumbled there.)

      We have now modified the sentence to say: “…how the brain makes a fundamental prediction about the physical world: whether the situation in front of us is stable, and hence likely to stay the same, or unstable, and hence likely to change in the immediate future.”

      2. Laying out these issues in terms of feedforward processing vs. simulation felt a bit misleading and/or unfair to those views, given the substance of what this paper is actually doing. In particular, the feedforward view ends up getting assimilated to "what CNNs do"; but these are completely different hypotheses (or at least can be). Note, for example, that many vision researchers who don't think CNNs are good models of human vision nevertheless do think that lots of what human vision does is feedforward; that view could only be coherent if there are kinds of feedforward processing that are un-CNN-like. It would be better not to conflate these two and just say that the pattern of results rules out CNN-like feedforward processing without ruling out feedforward processing in general.

      This is a fair point, and we certainly agree that we cannot rule out all feedforward models. We have tried to be clear about this claim, e.g., here (in the Discussion): “Three lines of evidence from the present study indicate that pattern recognition alone – as instantiated in feedforward CNNs and the ventral visual pathway – is unlikely to explain physical inference in humans, at least for the case of physical stability.”

      3a. I wasn't sure how impressed to be by the fact that, say, 60% classification accuracy on one class of stable/unstable scenes doesn't lead to above-chance performance on another class of stable/unstable scenes. Put differently, it seems that the CNNs simply didn't do a great job classifying physical stability in the first place; in that case, how general should we expect their representations to be anyway? Now, on one hand, I could see this worry only further supporting the authors' case, since you could think of this as all the more evidence that CNNs won't have representations of stability in them. But since (a) the claims the authors are making are about feedforward processing in principle, not just in one or two CNNs, and (b) the purpose of this paper is to explore the issue of generality per se, rather than just stability, this seems inadequate. It could be that a CNN that does achieve high accuracy on physical stability judgments (90%?) would actually show this kind of general transfer; but we don't know that from the data presented here, because it's possible that the lack of generality arises from poor performance to begin with.

      You are correct in noting that CNNs don’t do a great job in classifying physical stability, which reinforces our point that pattern recognition systems are not very good at discerning physical stability. In fact, the classification accuracy that we have reported is close to the baseline performance in the literature (Lerer et al 2016). Interestingly, training on the block tower dataset itself raised the stability classification accuracy only to 68.8% on the real-world block tower images. While this is true of the current best model of stability detection, we think that CNNs trained on large-scale datasets of stability under varying scenarios may in the future be able to generalize to other natural scenarios. However, to our knowledge no such datasets exist.

      3b. I wasn't sure how to think about whether showing CNNs stable and unstable scenes is a fair test of their ability to represent physical stability. Do we know that stability is all that these images have in common? Maybe the CNN is doing a great job learning some other representation. This sort of thing comes up in some recent discussions of 'shortcuts' and/or the 'fairness' of comparisons between human and machine vision, including some recent theoretical papers (see author recommendations for specific suggestions here).

      If our point were that CNNs do a great job at representing physical stability, we would indeed have to worry about low-level image confounds or “shortcuts” enabling this performance. But our point is that they do badly. If some of their already bad performance is due to image confounds/shortcuts then they are in fact doing even worse, and that only makes our point stronger.

      4a. I didn't really follow this passage, which is relied on to interpret greater activity for unstable vs stable scenes: "we reasoned that if the candidate physics regions are engaged automatically in simulating what will happen next, they should show a higher mean response when viewing physically unstable scenes (because there is more to simulate) than stable scenes (where nothing is predicted to happen)." It seems true enough that, once one knows that a scene is stable, one doesn't then need a dynamically updated representation of its unfolding. But the question that this paper is about is how we determine, in the first place, that a scene is stable or not. The simulations at issue are simulations one runs before one knows their outcome, and so it wasn't clear at all to me that there is always more to simulate in an unstable scene. Stable scenes may well have a lot to simulate, even if we determine after those hefty simulations that the scene is stable after all. And of course unstable scenes might well have very little to simulate, if the scene is simple and the instability is straightforwardly evident. Can the authors say more about why it's easier to determine that a stable scene is stable than that an unstable scene is unstable? They may have a good answer! It would just be better to see it in the paper.

      The idea here is that forward simulation happens in all cases but stops if no change has occurred since the last frame. That stopping both represents the stability of the configuration and produces less activity. This idea is akin to the “sleep state” used for nonmoving objects in a physics engine: they do not need to be re-simulated or re-rendered if they have not moved since the last frame (Ullman et al, 2017 TICS).
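
      The "sleep state" idea can be illustrated with a toy sketch. This is only an illustration under stated assumptions, not the model used in the study and not the implementation of any particular physics engine: the `Body` class, the constants, and the one-dimensional "block above a floor" scene are all hypothetical. A body is stepped forward frame by frame and is put to sleep as soon as a frame produces no movement, so a resting (stable) body costs far fewer simulated frames than a falling (unstable) one.

```python
from dataclasses import dataclass

# All constants are illustrative assumptions, not values from the study.
GRAVITY = -9.8     # m/s^2
DT = 0.01          # seconds per simulated frame
SLEEP_EPS = 1e-6   # displacement below which a body counts as unmoved


@dataclass
class Body:
    height: float          # height of the body's base above the floor
    velocity: float = 0.0
    asleep: bool = False


def step(body: Body) -> None:
    """Advance one frame; put the body to sleep if it did not move."""
    old_height = body.height
    body.velocity += GRAVITY * DT
    body.height += body.velocity * DT
    if body.height <= 0.0:   # floor contact: clamp position and stop
        body.height = 0.0
        body.velocity = 0.0
    if abs(body.height - old_height) < SLEEP_EPS:
        body.asleep = True


def frames_simulated(body: Body, max_frames: int = 1000) -> int:
    """Count frames simulated before the body falls asleep."""
    frames = 0
    while not body.asleep and frames < max_frames:
        step(body)
        frames += 1
    return frames


stable_frames = frames_simulated(Body(height=0.0))    # already resting
unstable_frames = frames_simulated(Body(height=1.0))  # about to fall
print(stable_frames, unstable_frames)
```

      In this sketch the number of simulated frames stands in, loosely, for the amount of activity: the resting body is put to sleep after a single frame, whereas the falling body must be simulated until it comes to rest.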

      4b. I was confused a bit by the Animals-People condition, and whether to think of it as a control condition or not. The image of it in Figure 1a makes it seem like it is meant to be interpreted along the usual "physical stability" lines, just like falling towers and people on ladders, and the caption seems to say this too; it also makes intuitive sense since the man in the boat looks like he'll fall if and when the alligator attacks. But then in the main text the authors predict that the representations of stability would not extend to the Animals-People condition, because they are just supposed to be about peril but not stability. Why not? And then the results themselves are equivocal, with some findings generalizing to Animals-People and some not. I don't have much more to say here other than that I found this hard to follow.

      We used the Animals-People condition as a control for peril/instability that is not caused by the physical situation (but rather by another agent). Our hypothesis was that the “Physics Network” would hold information about physical stability, not just any kind of propensity for change for any reason. Hence, we predicted that any brain region responding (only) to physical stability should not respond in a similar way to peril/non-peril conditions in the Animals-People scenario, as they involve an interaction driven by a biological agent. That is what we found.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Pardo et al. describes the identification and characterization of a novel subpopulation of delta cells that normally resides in the zebrafish pancreatic islet. Using two models of beta cell ablation, the authors demonstrate that this delta cell subpopulation efficiently converts into an insulin/somatostatin co-expressing cell population to restore euglycemia. The study includes robust transcriptome data to determine that this delta cell subpopulation is characterized by the expression of sst1.1 (rather than sst2) and expresses many beta cell genes. Furthermore, the resulting insulin/sst1.1 co-expressing cells represent a long-lived population that are sufficiently functional to restore euglycemia. The study goes on to suggest that inhibition of the p53 pathway compromises formation of the bihormonal population; however this data is not as convincing. Overall, this is a novel study that suggests the existence of heterogeneous delta cell populations in the zebrafish islets and supports previous findings related to adult islet cell plasticity.

      Strengths:

      1. Although several studies have identified heterogeneous populations of islet alpha and beta cells, this is one of the first studies to demonstrate two apparently distinct delta cell populations; the study provides sufficient characterization that it should be easy to test whether a similar population exists in mammals.
      2. Demonstration that the induction of Ins/SST bihormonal cells is triggered in two independent models of beta cell ablation.
      3. The use of several different transgenic fish lines to characterize the relative numbers of different islet cell populations in control and ablation conditions.
      4. High quality data, including immunofluorescence images, RNA-seq data and validation studies with appropriate controls.
      5. The extensive use of comparative transcriptome data to validate islet lineage relationships.

      Weaknesses:

      1. Although the data suggests that the newly formed bihormonal cells have sufficient function to rescue the hyperglycemic phenotype, there are no experiments to directly test the functionality of these cells.

      To address the question of the functionality of bihormonal cells, we opted for a complementary approach, i.e., a glucose tolerance test in adults. We showed that bihormonal cells represent the vast majority (95-99%) of all Insulin-expressing cells throughout the pancreas (see Figure 1 and new Figure 4), thereby minimizing the possibility that a putative population of pure beta cells (SST negative) would significantly contribute to regulating glycemia. The glucose tolerance test reveals that the regulation of blood glucose in regenerated fish after glucose injection is identical to that in CTL fish (new Figure 4). These observations strongly support that bihormonal cells are the main source of Insulin in regenerated fish and that they are responsible for blood glucose homeostasis in the near absence of beta cells.

      2. Many of the genes cited as "beta cell specific" are also expressed in delta cells in mouse and human islets - although this could relate to species differences, it causes some confusion and could affect the ultimate interpretation.

      We now include a table (Figure 3-figure supplement 2).

      3. Although it is clear from the images that are presented in the manuscript that a large number of bihormonal cells arise upon beta cell ablation, the relative numbers of bihormonal cells to monohormonal Sst and insulin cells is not clearly indicated. In some cases, it appears that a large percentage convert, while in others there are only a fraction. One can extrapolate this information from the presented data (i.e., figure D and E), but it would have been more informative if the direct analysis was provided.

      To be more explicit, we now compare the number of bihormonal cells with the number of sst1.1 delta-cells in the revised paper (Discussion lines 433-435). Also, to better represent the size of each cell population, Figure 1 and new Figure 6 now present the absolute number of cells instead of the % of islet cells shown in the first version. These quantifications reveal that the number of BH cells that are formed after ablation exceeds the number of monohormonal Sst1.1 cells. This indicates a more complex mechanism than simply direct conversion of sst1.1 cells to bihormonal cells, including neogenesis from ducts and proliferation, which we now directly address in the revised manuscript (Figure 4 and Figure 6). See also the explanation in our response to Reviewer 3.

      4. The authors only refer to the fact that Pdx1 is known to be expressed in beta and delta cells in a small paragraph in the discussion; it would have been helpful if this information were introduced in the introduction and in the relevant experimental sections.

      We think that presenting Pdx1 in the Introduction section would anticipate the results too much, so we chose to refer to Pdx1 in the Results and Discussion sections.

      5. The authors make the strong conclusion that sst1 cells directly convert into bihormonal cells based on time lapse imaging. Genetic lineage tracing would be needed to absolutely make this conclusion. The time lapse imaging can only suggest that direct conversion might be occurring.

      See our response to point 1 of the Essential revisions, which explains and experimentally explores alternative mechanisms.

      6. The inhibition of p53 appears to only cause a relatively small decrease in the number of bihormonal cells (from ~20 to ~15), somewhat undermining the conclusion that p53 promotes the formation of this cell population.

      To augment the data on p53, we now validate the activation of the p53 pathway in the islet by in situ hybridization with ccng1 and mdm2 (now shown in Figure 6G), two established p53 target genes that were identified in our transcriptomes. We also explore the cell cycle signatures.

      We decided to remove the experiment with pifithrin alpha. Indeed, using differently timed treatments with the p53 inhibitor pifithrin alpha, we obtained two opposite responses: one that confirms the results shown in the first version of the paper (a decrease of bihormonal cells that is moreover paralleled by an increase of sst1.1:GFP cells), the other showing an increase. We think that p53 acts at different levels, possibly in monohormonal sst1.1 delta cells and in bihormonal cells, and understanding these observations would be the focus of another project.

      Reviewer #2 (Public Review):

      This is an interesting and potentially exciting manuscript that reports, based on a series of zebrafish reporter lines, that there exists a subset of delta cells that can rapidly assume partial beta cell-like identity following beta cell ablation. This conversion correlates with the restoration of (near) normal glucose levels within 3 weeks. The major strengths are a series of technically well executed experiments that report an interesting observation of two discernable populations of delta cells. These populations are supported by transcriptome data, which validate the differences between these populations established using FISH or immunofluorescence. Major weaknesses are the lack of lineage tracing of delta cells and questions on the mechanisms underlying the origins of the bihormonal cells reported in this paper. The observation of the rapid appearance of bihormonal cells is potentially exciting and important. However, directionality of the conversion is insufficiently established. The conversion of delta to beta cells needs to be supported by direct lineage tracing. The alternative explanation that these cells are surviving beta cells that turn on somatostatin expression cannot be ruled out on the basis of the current experiments. The authors tend to extrapolate too much from their transcriptome data and subsequent pathway analyses to make claims that would be better supported by additional experiments, or toned down. The authors are right to point out the major differences in zebrafish beta cell regenerative potential and plasticity compared to mammalian models, but this diminishes the credibility of the claims of translational potential. There is value in conducting careful experiments into islet cell plasticity in a zebrafish without having to make a promise of direct translational relevance.

      All these points have been addressed.

      This paper suggests the presence of two sets of delta cells, marked by Sst1.1 and Sst1.2. The Sst1.1 cells are marked by GFP in a Sst1.1:GFP transgenic reporter. This reporter clearly is not selective for Sst1.1 cells only, as a majority of delta cells expresses GFP at dimmer levels and is Sst2 positive. This is in good agreement with the lower - but not absent - Sst1.2 and Sst2 mRNA profiles in Figure 4, but complicates the claim that it is specifically Sst1.1 delta cells that convert into bihormonal cells. An overlay between Sst1:1 and Sst1:2 or Sst2 mRNA to demonstrate that it is specifically the Sst1:1 expressing delta cells that become INS positive (Figure 1B) would help. Formal lineage tracing of the Sst1:1 delta cells is the accepted way to solidify support for this claim, but such data are absent from this paper.

      Unfortunately, we did not succeed in performing genetic lineage tracing of the sst1.1 delta-cells.

      However, we now explored alternative cellular origins of bihormonal cells such as the ducts and proliferation (new Figure 4 and Figure 6). We toned down our previous conclusion that ruled out beta cells as an origin of bihormonal cells (Figure 2).

      Following the suggestion of Reviewer 3, we now provide the comparative expression of sst1.1 and sst2 mRNA with insulin mRNA by double fluorescent ISH (performed in larvae, see new Figure 2C). The overlays show that insulin is coexpressed with sst1.1 specifically, but not with sst2. This demonstrates that bihormonal cells selectively express the sst1.1 somatostatin gene and supports, though still does not demonstrate, the hypothesis that it is specifically the Sst1:1 expressing delta cells that become INS positive.

      The model is presented as a 'beta cell ablation' model, but there are some concerns with the flow of islet cells between islet cell populations immediately following ablation and during recovery that require clarification. The beta cell population size measures between 25-35% of islet cells (Figure 1D/Figure 1Suppl2). If these cells are all ablated acutely, this should immediately lead to significant increase in the remaining non-beta cell populations, including Sst1:1 delta cells. However, this is not observed as Sst1:1 GFP+ cells are steady as a fraction of total islet cell number (Figure 1F). Instead, the population that is increased at 3 days following ablation is the mCherry-GFP double positive cell population, which accounts for approximately half of the loss of beta cells. The scenario that a portion of beta cells is not actually ablated but is instead converted into a bi-hormonal state is insufficiently explored as detailed below. If the rapid appearance of these cells were indeed attributable to the conversion of GFP cells into co-positive cells, this should have been reflected in the data of Figure 1F. However, the GFP population appears to be neither increasing to reflect the loss of beta cells, nor decreasing in response to the co-expression of mCherry. In Figure 5, a drop in GFPhigh cells specifically is shown, but this reflects only a potential 5% shift of islet cell numbers from GFPhigh to potentially bihormonal cells. The live imaging data in Figure 5B are not helping as there is simply not enough spatial and axial resolution to place the mCherry signal in GFP+ cells. If both processes are balancing each other out to maintain steady numbers of GFP+ delta cells, this implies rapid proliferation of GFP positive delta cells to replenish the delta cells that become bihormonal, or the rapid proliferation of bihormonal cells shortly after they arise. Either of these scenarios should be readily demonstrable.

      This ablation model has been shown to lead to a massive destruction of beta cells through apoptosis (Curado et al, 2007) (Bergemann et al, 2018). In line with the loss of beta cells, the total number of cells (new Figure 1G) shows a downward shift after ablation. We also quantified islet cells in situ on paraffin sections in Figure 6-figure supplement 1. Because INS and mCherry are difficult to detect after ablation (very low expression in bihormonal cells), we used Pdx1 as a proxy for beta, sst1.1 delta and bihormonal cells. The decrease of Pdx1+ nuclei we observed is consistent with the extent of the loss of β-cells.

      Together with the fact that we do not detect a lot of spared beta cells after ablation by lineage tracing, all these observations support that we have an efficient model of ablation. Despite this efficient ablation, we nevertheless observed some bihormonal cells derived from pre-existing beta cells (Figure 2G and close-up in E’) and now openly discuss this possible cellular source.

      We realized that our initial representation in terms of percentages (“% of cells / islet”) was misleading. For a more accurate representation of population size, we now present in this revised manuscript the absolute number of cells (instead of %) detected for each population (per fish), as this reflects the real size of the populations present in the dissected tissue, which contains all cells of the main islet, and makes comparisons between conditions and cell types easier.

      As pointed out by the Reviewer, the respective sizes of the GFP monohormonal, bihormonal and beta cell populations indicate that the flow between islet cells (and potentially with non-islet cell types) is too complex to infer directionality of conversion. While ~3300 beta cells are lost and ~1500 bihormonal cells are gained, there are only ~900 monohormonal sst1.1 delta cells before ablation (GFPhigh), which is fewer than the number of BH cells formed after ablation. This suggests multiple origins of bihormonal cells, and/or proliferation. In the revised manuscript, we consider the following scenarios: i) the contribution of non-ablated beta cells to bihormonal cells (Figure 2), ii) neogenesis from ducts (Figure 4) and iii) expansion of GFP and/or bihormonal cells by proliferation (Figure 6). We discuss these results in lines 433-447. These mechanisms are not mutually exclusive and are compatible with a “direct” conversion of sst1.1 delta cells.
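
      The cell accounting behind this argument can be written out as a short calculation. This is a minimal sketch using only the approximate per-fish counts quoted above; the variable names are ours, and the "one-to-one conversion without proliferation" bound is a simplifying assumption made for illustration.

```python
# Approximate per-fish counts quoted in the response (all values are "~").
beta_cells_lost = 3300
bihormonal_gained = 1500
sst11_delta_before = 900   # monohormonal GFPhigh cells before ablation

# Assumed upper bound: every pre-existing sst1.1 delta cell converts
# one-to-one into a bihormonal cell, with no proliferation.
max_from_direct_conversion = sst11_delta_before

# Bihormonal cells that direct conversion alone cannot account for.
shortfall = bihormonal_gained - max_from_direct_conversion
fraction_explained = max_from_direct_conversion / bihormonal_gained

# Lost beta cells not matched by newly formed bihormonal cells.
unreplaced_beta = beta_cells_lost - bihormonal_gained

print(f"shortfall: {shortfall} cells")
print(f"explained by direct conversion at most: {fraction_explained:.0%}")
print(f"beta cells not replaced: {unreplaced_beta}")
```

      Even under this generous bound, roughly 600 of the ~1500 bihormonal cells would need another source, which is why additional origins or proliferation are considered.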

      The presumption is that new beta cells are formed, and this is based in part on lineage tracing data using the zsYellow label in conjunction with an inducible beta cell specific Cre driver strain. It is not clear why this experiment was done in developing embryos instead of during the adult stage where the original observation of the appearance of bihormonal cells that is associated with normalization of glucose levels was made. It appears that in that crucial lineage tracing experiments, the authors are ambiguous about the use of mCherry to detect beta cells after ablation. They describe beta cells as mCherry+ beta cells in the text, while they indicate in the legend and figure labels to have used INS antibody staining to detect these cells. The punctate staining that is different from the mCherry staining elsewhere in the manuscript certainly is compatible with the use of an INS antibody, but raises the question why mCherry was not used to detect beta cells which is what was used throughout the rest of the paper. This is relevant as the lack of zsYellow positivity is interpreted as a sign of beta cell neogenesis. However, these cells might have lost zsYellow precisely because they were killed and have lost their fluorescence lineage markers, including mCherry, but are still detectable by INS immunofluorescence as they have not been cleared from the islet tissue.

      The genetic tracing of beta cells was performed in larvae. The experimental details are now shown in Figure 2D. CRE recombination by 4-OHT was induced at 6 dpf before ablation at 7 dpf and the larvae were analysed at 14 dpf. We opted for larval stages since bihormonal cells appear at any stage and young small animals are more amenable to fast and efficient inducible CRE recombination (Hans et al, Plos One, 2009; Mosimann et al, Development, 2011).

      We thank the reviewer for highlighting the discrepancy about the INS/mCherry antibodies. It is indeed an anti-Insulin detection with typical punctate staining that is shown in Figure 2E and quantified in Figure 2F-H, and not anti-Cherry, because of species incompatibility between antibodies in the immunodetection assay (both Cherry and zsYellow antibodies are from rabbit while INS is made in guinea pig). We have corrected this in the figure and in the corresponding text and legend.

      We think that, were the INS protein to persist in the ablated islet, its presence specifically in sst1:GFP+ cells is consistent with our transcriptomic data and with true expression of the insulin gene in bihormonal cells rather than with persistence of killed beta cells.

      However, we agree that the absence of the zsYellow lineage marker was overinterpreted as a sign of neogenesis. Indeed, we clearly detect some INS+ zsYellow+ cells (5.8 cells, 12% of all INS+ cells; Figure 2E and E’), confirming the persistence of some traced beta cells. In fact, 4 of the 5.8 cells are sst1.1GFP+, indicating that preexisting beta cells become bihormonal. For this reason, we no longer rule out a beta cell origin of bihormonal cells.
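
      The proportions implied by these lineage-tracing numbers can be checked with a short calculation. This sketch uses only the three values quoted above; the implied total is derived from a rounded percentage and is therefore approximate, and the variable names are ours.

```python
# Mean counts quoted in the response.
traced_ins_cells = 5.8      # INS+ zsYellow+ cells
traced_share_of_ins = 0.12  # stated as 12% of all INS+ cells (rounded)
traced_also_gfp = 4.0       # of those, cells that are also sst1.1GFP+

# Total INS+ cells implied by the stated percentage (approximate).
implied_total_ins = traced_ins_cells / traced_share_of_ins

# Share of the traced (pre-existing) beta cells that became bihormonal.
share_bihormonal_among_traced = traced_also_gfp / traced_ins_cells

print(f"implied INS+ cells: about {implied_total_ins:.0f}")
print(f"traced cells that are bihormonal: {share_bihormonal_among_traced:.0%}")
```

      On these figures, about two thirds of the surviving traced beta cells are bihormonal, while traced cells remain a small minority of all INS+ cells.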

      Although it is possible that the number of spared beta cells (and beta-derived bihormonal cells) is underestimated, as some beta cells could have escaped excision of the Lox cassette before ablation (surviving beta cells would therefore be zsYellow negative), we would like to stress that the ablation efficiency is very good and does not favour (but does not exclude) a large contribution of beta cells to bihormonal cells.

      In the revised paper, we tone down our conclusions and consider alternative origins and mechanisms of bihormonal cells.

      The enrichment of Sst1.1 mRNA in bihormonal cells is an important piece of data that should be included instead of 'not shown'. The same is true for the statements that ROS, lack of insulin signaling and hyperglycemia all do not drive INS expression in Sst1.1 cells, which amplifies concerns that the appearance of bihormonal cells is contingent on the administration of beta cell toxins.

      We now include the data previously cited as “data not shown”. See new Figure 1-figure supplement 1 and Figure 6-figure supplement 2.

      To relate the interesting observation on bihormonal beta cells in zebrafish to human pancreas biology, the authors point at single cell sequencing data and then claim that 'the occurrence of SST+ and INS+ beta cells in mammals remains largely undocumented'. It strikes me that there must be dozens of papers that show high quality insulin and somatostatin co-labeling in human, primate and rodent pancreas with no evidence of clear colocalization (unless following severe beta cell ablation, see Chera et al., 2014). That actually is clear documentation of their absence.

      We realize that this point was not clear. By referring to scRNAseq data, our goal was to suggest that some Ins+ Sst+ cells could be detected at the mRNA level while we admit that there is poor, if any, evidence of naturally occurring bihormonal cells at the protein level in mammals. This part was too speculative and we removed it.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors report the discovery of a new bacterium, termed HS-3, that displays a novel form of multicellularity consisting of long filamentous structures tightly packed into a two-dimensional structure with characteristics reminiscent of liquid crystals. Motivated by the occasional immersion of the bacterial structures in water due to flooding in their cave environment, laboratory immersion is found to disrupt these structures, which can transform into clusters of coccobacillus daughter cells released by contact with water.

      As a discovery, this paper will certainly trigger great interest in this bacterium for these unusual properties. In particular, biophysicists studying active matter will be fascinated by the liquid crystalline order and topological defects, which are reminiscent of those in motor/microtubule systems studied recently. The observations of filamentous forms remind me of the work of Mendelson many years ago on a mutant of B. subtilis that fails to separate daughter colonies after division, leading to growing filaments. But those were not in a colonial form seen here.

      The paper is, however, rather descriptive, without much physical quantification of the biophysical properties. More importantly, the presentation does not make contact with much recent (and not-so-recent) work on the problem of understanding evolutionary driving forces toward multicellularity, particularly as seen in green algae and choanoflagellates.

      In the Introduction, Discussion, and Figure 1, we now cite a series of works that discuss how single-celled organisms could self-organize and sustain their cells in a certain order during the evolutionary process towards multicellularity. Together with the consideration of the environmental settings in the cave as an ‘Ecological scaffolding’ and the liquid crystal-like self-organization, the finding of HS-3 is now contextualized as a new example of multicellularity. As seen in Mendelson’s pioneering work, as well as in recent work in the field of applied hydrodynamics in biology, bacteria have the potential to self-organize their cells. However, as far as we know, there is no extant species that clearly shows a relation between liquid crystal phenomena and the origin of multicellularity. We think the features of HS-3 that we report would serve as an attractive model of bacterial multicellularity useful for future studies, including physical analysis and theoretical study.

      Reviewer #2 (Public Review):

      I thought this was a very cool example of bacterial multicellularity, with the description of a newly discovered bacterium that forms a sort of simply differentiated colony- a sheet of cells which then develops to contain a large bolus of small, coccoid cells, which then release into the water column upon submergence. I wasn't totally convinced that this release was developmental, as suggested by the authors- evidence that other colonies released cells at the same time could be due to multiple colonies sharing the same biophysical basis of colony formation that is disrupted by immersion in water (diffusion of extracellular polysaccharides, or even the pressure from being underwater). However, it's notoriously difficult to rigorously test evolutionary hypotheses, and I think that the microbiology here is compelling- it's a form of bacterial multicellularity that I have never seen before.

      My largest issue with the paper is that it does a very poor job of contextualizing how the research affects our understanding of the evolution of multicellularity more broadly. This paper suggests that little is known about the ecological factors selecting for simple multicellularity, but there has actually been quite a bit of work on this topic. This list is far from exhaustive, but prior work has examined a range of selective agents that can favor simple multicellularity- these include predation (Boraas 1998, Herron 2020, Bernardes, 2021), protection from antibiotics (Smukalla 2008), cooperative metabolism (Koschwanez, 2011), dispersal (Smith 2014), syntrophy (Libby and Ratcliff, 2021), resource competition (Heaton 2020), and motility / division of labor (Solari 2006). Indeed, one of the things about the evolution of multicellularity is that there is no one 'route'- there are many different reasons different lineages evolve to be multicellular.

      The paper is focused around the idea that 'group life' is a hypothetical "missing link" to multicellularity (see Figure 1), but this is not an open hypothesis in the field. It's been a universally accepted fact for more than 50 years. Multicellular organisms had to have evolved from simpler social groups of cells- given their phylogenetic nesting in clades of unicellular organisms, there's no other way they could have come into existence. But there has also been a great deal of work examining simple multicellular relatives of complex multicellular lineages, most notably in the volvocine green algae, holozoans (e.g., choanoflagellates and ichthyosporeans), fungi, charophyte algae leading to land plants, and red algae. There is also a body of work using experimental evolution to evolve progressively more complex multicellular lineages (e.g., snowflake yeast). My central problem with this paper is that the 'group phase' they have described is far less compelling than existing work showing a 'group phase' being ancestral to more complex lineages of multicellular organisms, particularly because this multicellular lineage is not contextualized within a clade that has ultimately evolved complex multicellularity.

      In the "recommendations for authors" section, I make suggestions for how to reframe the work to better highlight its novelty, focusing it around a) the discovery of a new form of bacterial multicellularity, and b) the possibility that this reflects ecological scaffolding, a hypothesis for how multicellular organisms could have evolved by developmentally co-opting ecologically-mediated life cycles.

      The manuscript submitted to eLife was actually a different version from the preprint version in bioRxiv, but we noted that the comments were based on the preprint version. We apologize for this confusion if we have missed some submission procedure. The term ‘group life’ has been amended in the present manuscript; instead, we placed the term ‘ecological scaffolding’ at the center of Figure 1, and we think this corrects the wrong impression that the evolutionary process is ‘one-route’. We also revised the Introduction to appropriately contextualize HS-3 as a new example of multicellularity among the preceding works, together with references about physiological significance. In the Discussion, we also mention some experimental work on evolution, including ‘snowflake yeast’ (references 48 and 49).

      As for the comment about the release of coccoid cells, we agree that release in water itself is not a programmed developmental process. The “crowded-out” phenomenon was seen on a solid agar surface (not in water, Figure 4C), but if we consider the natural niche of HS-3, the significance of the formed structure is its capability to release coccoid cells upon the trigger of immersion in water.

    1. Author Response:

      Reviewer #1:

      Bandyopadhaya et al. have set out to elucidate the immunometabolic mechanisms of monocyte tolerance induced by 2-AA, a quorum-sensing signal that is produced by Pseudomonas aeruginosa. An interesting topic, since elucidating how P. aeruginosa escapes the immune system could be very relevant from a clinical perspective.

      In previous publications, they showed that 2-AA can induce immune tolerance, leading to decreased cytokine production and epigenetic changes mediated via increased HDAC activity. In this follow-up paper, they tried to elucidate what immunometabolic changes are observed in 2-AA tolerized cells (both mouse and human cell lines) and how this can explain the improved intracellular survival of P. aeruginosa.

      The authors must be praised for the effort they put in to prove their point. They have undertaken a tremendous number of experiments and measurements with so many different cell lines, stimuli, inhibitors and readouts. Unfortunately, the number of figures and the amount of data also make the paper very confusing and hard to read and, in my opinion, they draw the wrong conclusions from the results of the experiments. Therefore, I cannot agree with some of the important statements, for example that 2-AA induces a Warburg effect. In addition, the methods are written in such a limited way that it is hard to judge whether their conclusions are correct or to repeat these experiments.

      We thank the reviewer for their constructive comments and for appreciating the complexity of the study. We apologize for the brevity of the materials and methods. We hope that, in addition to the data already presented, our revised manuscript will thoroughly address this reviewer’s concern on whether the Pseudomonas aeruginosa MvfR-regulated small molecule, 2-AA, indeed promotes a “Warburg-like” metabolic reprogramming in macrophages. The additional ongoing experiments, including Seahorse studies, and more detailed information in the materials and methods section of our manuscript should ease this reviewer’s concerns.

      Reviewer #2:

      In the manuscript "Immunometabolic hijacking of immune cells by a Pseudomonas aeruginosa quorum-sensing signal" the authors studied the mechanism by which the quorum sensing signal 2-aminoacetophenone (2-AA), produced by the pathogen Pseudomonas aeruginosa, enables persistence of this pathogen in host tissues.

      Lactate, the fermentative product of glycolysis, reflects glycolytic fluxes and represses immune signaling activation, decreasing inflammation in macrophages. Therefore, lactate levels reflect the metabolic status of the cells and have consequences for the inflammatory levels of the cells.

      In this study the authors show that 2-AA can affect the metabolic state of macrophages by increasing the glycolytic flux with the consequent increase in lactate levels and decrease in TCA flux. They also show that lactate decreases inflammation by suppressing 2-AA activation of NF-κB signaling and proinflammatory cytokine production.

      Using a murine model, they show that addition of 2-AA in mice infected with Pseudomonas aeruginosa results in increased production of lactate and a decrease of ATP in mouse tissue, thus providing evidence for 2-AA-mediated metabolic changes in vivo.

      The study described here is well written and the conclusions are generally well supported by the data. While they tested the direct effect of the 2-AA signal in macrophages, this was not tested in vivo in the absence of infection, and I think it is important to address the direct impact of the signal on the host.

      The study reported here proposes that a quorum sensing signal has an impact on pathogen persistence through immunometabolic reprogramming properties, and provides evidence for a novel mechanism by which bacteria use quorum sensing signals to persist in the host.

      We thank the reviewer for appreciating our work, the experimental strategy, and the conclusions. We agree that the proposed additional 2-AA in vivo experiments in the absence of infection will further strengthen the in vitro studies. Additionally, they will corroborate our previously published in vivo studies on the immune responses triggered by 2-AA in absence of infection (Bandyopadhaya et al., PLoS Pathogens 2012 & Bandyopadhaya et al., Nat Microbiology, 2016).

      Reviewer #3:

      Tolerance in macrophages involves a global transcriptional shift from a pro-inflammatory response toward one characterized by the expression of anti-inflammatory and pro-resolution factors. In the case of TLR-mediated tolerance, pro-inflammatory cytokines are not universally suppressed in all tolerant cells, but distinct patterns of cytokine expression distinguish TLR-specific tolerance (10.3389/fimmu.2018.00933, 10.1615/critrevimmunol.2015015495). However, the authors only show differences in TNF-α. Thus, I strongly suggest that the authors measure anti-inflammatory cytokines, such as IL-10.

      We appreciate the reviewer’s comment and thank the reviewer for the suggestion to determine the levels of the anti-inflammatory cytokine IL-10. Indeed, we could not detect IL-10 in 2-AA tolerized cells; we will refer to this in the revised manuscript. This is most likely because, as we previously demonstrated, 2-AA-mediated tolerance is markedly different from LPS-mediated tolerance (Bandyopadhaya et al., PLoS Pathogens 2012 & Bandyopadhaya et al., Nat Microbiology, 2016), and TLR-regulated tolerance is primarily LPS mediated. In the previous publication (Bandyopadhaya et al. PLoS Pathogens 2012), we also reported the IFN and anti-inflammatory TGF levels in 2-AA tolerized mouse macrophages, but we could not detect IL-10.

    1. Author Response

      Reviewer #1 (Public Review):

      In Wang et al., the authors investigate issues related to the relative proportion of flux for the enzymatic decarboxylation of pyruvate between PDH (pyruvate dehydrogenase) and PFOR (pyruvate-ferredoxin oxidoreductase) in the model organism Synechocystis. The manuscript provides evidence that PDH becomes increasingly inactivated by a high ratio of NADH:NAD+ as well as evidence to suggest that PFOR is transcribed and remains intact under aerobic conditions. The authors put forward the theory that both PDH and PFOR are functionally active routes for pyruvate decarboxylation under aerobic conditions, whereas PFOR has previously been assumed to be inactive under growth conditions containing oxygen. This distinction is particularly highlighted by conditions where Synechocystis is grown photomixotrophically - and where the NADH:NAD+ pool may be relatively over-reduced because of two parallel inputs of reductant (water-splitting at PSII and catabolism of glucose). The authors examine growth under photoautotrophic and photomixotrophic conditions for a number of relevant mutants including members of the ferredoxin/flavodoxin family, PFOR, and NDH-1 complex subunits.

      The theory put forward in this manuscript is of general interest regarding electron flux through the combined electron transport chain (photosynthetic + respiratory) of cyanobacteria. The authors further broaden the potential audience for the manuscript by elaborating on the potential significance of these results in the context of a switch from PFOR (ancestral) to PDH (oxygenic/modern).

      Comments:

      Generally, theories put forward in this manuscript are intriguing and have a number of potential implications for understanding electron flux and regulation of central metabolic processes in photosynthetic microorganisms. If these theories are supported and become more generally adopted, they would have significant impact on the understanding of the regulation of central carbon metabolism in cyanobacteria. That said (due in no small part to the complexity of some of these pathways), the evidence provided to support the hypotheses is indirect in many instances. In some cases, there is a pairing of indirect data with broad statements that can come across as over-reach. These problems can be somewhat exacerbated by an unclear organization at parts of the Discussion, a lack of succinctly defined claims, and numerous typographical considerations.

      Thank you very much for this point. We have now reorganized the discussion and overhauled it completely. It starts with the aspects that are best supported by our data. We then added two sentences to stress that the following lines include hypothetical considerations that are meant as thought-provoking impulses. We hope that over-reach is thereby prevented.

      Major considerations:

      A major component of the proposed theories in this manuscript rests upon the assumption that PFOR is an active enzyme under highly aerobic conditions: this claim is never directly demonstrated.

      This is true. We could show, though, that the PFOR of Synechocystis is, in contrast to most bacterial PFORs, stable in the presence of oxygen. However, as was likewise reported for the oxygen-stable PFOR of the obligate aerobe Sulfolobus acidocaldarius (3) and for the PFOR from E. coli, which was recently shown to contribute to metabolism in the presence of oxygen in vivo (1), we also had to remove oxygen to detect enzyme activity in vitro. This point is discussed frankly.

      Indirect evidence of altered growth of pfor mutants, increased repression of PDH, and the higher NADH:NAD+ ratio under photomixotrophic conditions is in general alignment with this theory. However, while deletion of pfor does indeed result in altered growth dynamics in Synechocystis under periods of photomixotrophy, the alterations do not entirely align with the idea that this pathway is critical for rapid growth under aerobic conditions. For instance, pfor and most of the highlighted mutants (fdx 3, fdx 9, isiB) presented in Figure 3 show the greatest defects in their OD after reaching stationary phase (more rapid decline in OD on/after Day 6) relative to WT. This doesn't align as nicely with the highest NADH:NAD+ seen in Days 3-5 (which is also specifically called out: e.g., Line 146, Supplemental Figure S8).

      We are very cautious about comparing growth experiments day by day. This is due to the fact that the growth behaviour of WT and mutants differs between experiments. We therefore repeat these experiments several times independently, with at least three replicates each, and show the data of typical growth experiments. In the case of the growth behaviour of WT and pfor and the NADH/NAD+ ratios under photoautotrophic and photomixotrophic conditions shown in figure 1, NADH/NAD+ ratios were determined in exactly those cultures for which growth data are shown. It is therefore legitimate to directly compare these results day by day. However, we did not determine the NADH/NAD+ ratios of the cultures shown in Fig. 3. The rise in NADH might have started with a delay here.

      In this context, the deletion of F-GOGAT is much more convincing in its severity and timing, yet for this mutation to have a more severe phenotype is unexpected if PFOR is one of the primary/sole electron donors to the ferredoxin pool from glucose utilization as proposed (i.e., stated differently, F-GOGAT is only one of the enzymes downstream of ferredoxin and might be expected to have a more subtle phenotype in comparison to the KO of PFOR if that is a primary source for electrons to ferredoxin under photoheterotrophic conditions).

      F-GOGAT requires reduced ferredoxin which can be provided by PFOR and in addition also by PSI. As electrons from glucose oxidation can be fed via photosynthetic complex I into the PQ-pool they will eventually arrive at PSI (Fig. 3C) where ferredoxin can be reduced and transfer electrons to F-GOGAT. However, to get a truly complete picture of the situation several issues will have to be addressed in the future: we do not know which of the low abundant ferredoxins as well as high abundant ferredoxin 1 interact with PSI, F-GOGAT, PFOR and photosynthetic complex I. It would be furthermore helpful to know all midpoint potentials of the different ferredoxins. Without this information it might be too much to ask for a simple interpretation.

      A central tenet of the argument put forward on the evolutionary importance of using either PFOR vs. PDH is the conservation of extra free energy by the former reaction. However, additional information on the ferredoxin paralog(s) that accept electrons from PFOR is necessary to evaluate these claims. Based on the data within these manuscripts, Fdx3, Fdx9, and IsiB have the strongest links to PFOR: though the authors do take care to never state directly that they have evidence that these are the acceptors in vivo. Given the variability in the midpoint potentials of different ferredoxins, some ferredoxin acceptors may better conserve the free energy in pyruvate, while others may actually be more 'wasteful' than NAD+ as the acceptor through PDH. Unfortunately, the midpoint potentials for Fdx3, Fdx9, and IsiB are unknown or not stated in this manuscript. It is therefore unclear what ferredoxin is being used as the reference point for conservation of Gibbs free energy in Figure 4C and referenced multiple times in the text.

      We agree that it would be great if we already knew the redox potentials of all the ferredoxins involved. We are currently working on this issue. All that we know for now is that the redox potentials of ferredoxins lie between -240 mV and -680 mV, whereas the redox potential is around -320 mV for NAD(P)H/NAD(P)+. Unpublished data that require further validation reveal that the redox potential of Fdx9 is definitely more negative than the redox potential of Fdx1 (-412 mV) in Synechocystis and is thereby clearly more negative than -320 mV. However, as these data require further validation, we did not name numbers. In addition, interaction studies on PFOR and low abundant ferredoxins are planned and preparations are in progress.
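
      As a rough illustration of why the acceptor's midpoint potential matters, the free energy retained in the reduced carrier relative to NAD(P)H can be estimated from the potentials quoted above. This is only a sketch: it assumes standard midpoint potentials, a transfer of two electrons per pyruvate (n = 2, i.e. two one-electron ferredoxins), and the Faraday constant F ≈ 96.485 kJ V⁻¹ mol⁻¹. The symbol ΔΔG is introduced here for illustration and is not taken from the manuscript; the actual values depend on the in vivo redox state of each pool.

```latex
% Extra free energy retained in reduced ferredoxin relative to NAD(P)H
% (assumption: standard midpoint potentials, n = 2 electrons per pyruvate)
\Delta\Delta G = n F \left( E_{\mathrm{NAD(P)H}} - E_{\mathrm{Fd}} \right)

% Fdx1 (E = -0.412 V) versus NAD(P)H (E = -0.320 V):
\Delta\Delta G_{\mathrm{Fdx1}}
  = 2 \times 96.485\ \mathrm{kJ\,V^{-1}\,mol^{-1}} \times \bigl(-0.320 - (-0.412)\bigr)\ \mathrm{V}
  \approx +17.8\ \mathrm{kJ\,mol^{-1}}

% A ferredoxin at the upper end of the quoted range (E = -0.240 V):
\Delta\Delta G_{-240\,\mathrm{mV}}
  = 2 \times 96.485\ \mathrm{kJ\,V^{-1}\,mol^{-1}} \times \bigl(-0.320 - (-0.240)\bigr)\ \mathrm{V}
  \approx -15.4\ \mathrm{kJ\,mol^{-1}}
```

      Under these assumptions, an acceptor such as Fdx1 would retain roughly 18 kJ/mol more than NAD(P)H, whereas a ferredoxin at -240 mV would retain about 15 kJ/mol less, which is why the identity of the in vivo acceptor matters for the energy-conservation argument.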

      Finally, the measurements of NADH:NAD+ (most prominently used for measurements in Fig 1B) utilized kits that require multiple, long centrifugation steps in the dark prior to assaying this rapidly exchanging pool. While it appears that the authors were able to get reproducible results with these kits, it is difficult to interpret what the increase in relative NADH levels in glucose-fed cells means given that 10+ minutes of incubation in the dark and/or changing temperatures elapsed after the cyanobacteria were removed from the incubator before the NADH:NAD ratio was assessed. While it superficially makes logical sense that the cytosol would be over-reduced when illuminated and under glucose feeding relative to illumination alone, it shouldn't be assumed that these measurements are representative of this rapidly-exchanging pool under the steady-state growth conditions.

      Thank you very much for raising this important point. We are very much aware of the difficulties in determining the redox state of NADH:NAD+ using these kits. However, there is no other method available that properly distinguishes NADH and NADPH. Furthermore, the centrifugation step was done at -9°C, which should minimize metabolic reactions during this step. However, we have now added in vivo measurements using the NAD(P)H-module available for the PAM and using the Dual-KLAS/NIR to determine the redox state of ferredoxin (newly added Fig. S4). Both methods show that NAD(P)H as well as ferredoxin are more strongly reduced under photomixotrophic conditions in comparison to photoautotrophic conditions and thus support our previous data.

      Reviewer #2 (Public Review):

      The observation that cyanobacteria can use two alternative pyruvate decarboxylating enzymes using either NAD+ or ferredoxin is interesting and the work is a useful contribution. The authors very nicely characterize the enzymatic properties of the two pyruvate metabolizing enzymes and also are able to connect the ideas of redox balance with a set of ferredoxins. Even though they are not able to definitively characterize the specific ferredoxin which interacts with the enzyme, the analysis is nicely conducted and it's clear that the suggestion they're making regarding the involvement of the minor ferredoxins is compelling. However, the work could be written in a way that might be more useful.

      Specific comments:

      Overall this is an interesting study, but the arguments could be sharpened and better connected with the literature. The introduction needs to be considerably revised in my opinion. It is not obvious whether it is even appropriate to discuss the enzymes as anaerobic enzymes or aerobic enzymes, since this concept is simplistic and perhaps archaic. Indeed, placing the results of the present study in the context of "aerobic enzymes versus anaerobic enzymes" is a bit of a 'strawman' argument. For example, the counter examples of O2-tolerant enzymes cited seem to suggest that PFORs have been capable of evolving into O2-tolerant enzymes quite readily and that two types of decarboxylase have evolved for quite different reasons than simple replacement for a new environment. Instead, I think a more current and general perspective relates more to the interpretation that the authors are already putting forth. Namely, the enzymes are utilized according to redox balance considerations rather than sensitivity to oxygen.

      Therefore, I think the very long and pedantic introduction is useful for review, but only if it is shortened and also includes the alternative interpretation regarding adaptations to redox potential in the cytoplasm. My guess is that there are plenty of examples of redox balance function arguments in the literature to refer to in contrast to the evolutionary replacement argument used. Certainly, there are good examples regarding glucose toxicity in mutants of Synechocystis that can be considered.

      Thank you very much for this point. The O2-tolerant PFORs mentioned were merely shown to be stable in the presence of oxygen in vitro which means that they can be isolated under anaerobic conditions. However, all enzymatic in vitro assays required anaerobic conditions. Only one PFOR was shown to be active in the presence of oxygen in vitro. Physiological studies on the importance of these enzymes under aerobic conditions in vivo are completely missing. However, prompted by the requests of the reviewers, we searched the literature intensively again and indeed found a recent report that describes the involvement of PFOR in redox regulation in an aerobic culture of an E. coli mutant in which glucose-6P dehydrogenase (ZWF) was down-regulated (1). We included this study both in our introduction and discussion. It very much supports our own findings, as the E. coli PFOR likewise requires anoxic conditions in in vitro enzyme tests. We agree that the idea that the PDH complex and PFOR are exclusively regulated by oxygen availability might sound simplistic. However, we do not fully agree that this is a strawman argument, as both enzyme systems are still mostly discussed as counterparts for either aerobic respiration (PDH complex) or anaerobic fermentation (PFOR) (4). To the best of our knowledge, the study that has now been included and our own data are the very first ones that clearly put forward the idea that redox control governs the activity of these enzyme systems at the pyruvate node independent of oxygen. However, doubts about the rather simplistic distinction between aerobic versus anaerobic enzymes in general have indeed been expressed, even though these studies in general lack physiological in vivo experiments. We therefore included this information in the introduction as well (line 76: There are several reports on the aerobic expression of enzymes that are assigned to anaerobic metabolism in prokaryotes and eukaryotes and therefore challenge the simplistic distinction between aerobic versus anaerobic enzymes (5-7). Their physiological significance and regulation are only partly understood.) This did not result in a shortened introduction, though, as additional information was added. The new introduction thus includes alternative interpretations as requested and is therefore hopefully more balanced.

      Given the interpretation that the alternative forms of the enzyme help cells adjust their redox balance to different conditions, such as photomixotrophic growth, the very nice enzymatic analysis and growth studies of the mutants work would be significantly strengthened by more direct physiological measurements that report intracellular redox states.

      Thank you very much for this important point. Intracellular redox states were shown by measurements of the NADH/NAD+ ratio (Figure 1B) and have now been extended by new in vivo measurements that show that both the NAD(P)H and the ferredoxin pools are more reduced under photomixotrophic than under photoautotrophic conditions (new Fig. S4).

      Minor comments:

      line 211: Perhaps, "..the deleted alleles failed to segregate, keeping some wild type copies."

      This was changed to: the deleted alleles of fx2 (sll1382) and fx5 (slr0148) failed to segregate, keeping some wild type copies.

      It would be interesting to characterize whether the observed distribution of PFOR correlates with specific physiological features. In other words, PFOR seems to become important upon the addition of an external carbon source in a way that must integrate with autotrophic metabolism (i.e. mixotrophic growth), altering the balance of the oxidized and reduced forms of redox cofactors--does the observed distribution correlate at least with the metabolic characteristics of the handful that have been studied in the lab?

      Thank you very much for this suggestion. We checked the lists of cyanobacteria that either possess or do not possess a PFOR in order to search for shared known physiological features. However, the challenge is currently that the number of uncharacterized cyanobacteria in our list is too large. It is therefore impossible to find solid correlations. But we fully agree that it would be interesting to find these.

      A more detailed set of calculations that helps explain panel C in figure 4 needs to be included to support the quoted values for redox potential in free energy. I assume these are standard values, and the specific superscripts and subscripts associated with the ΔG nomenclature need to be defined.

      The calculations are shown in the materials and methods part. A respective notice (for calculations see materials and methods part) is now given in the legend of Fig. 4C. Information concerning the nomenclature is found in the cited literature in the materials and methods part as well.

      Reviewer #3 (Public Review):

      The manuscript by Wang et al. conclusively demonstrates that the cyanobacterium Synechocystis sp. PCC6803 prefers to use the ferredoxin-reducing enzyme PFOR over the NAD+-reducing PDH-pathway when grown under photomixotrophic conditions while the PDH-route is favored under photoautotrophic conditions. Both the potential physiological meaning of this switch and implications for the evolutionary history of the role of the respective enzymes and their pathways are discussed.

      The main hypothesis of this work considers that PFOR-mediated decarboxylation of pyruvate replaces the PDH-based one when cells shift from photoautotrophic to photomixotrophic growth conditions. This hypothesis is assessed via the comparison of growth curves measured on a host of deletion mutants and via direct detection of expression levels of certain enzymes. The authors' hypothesis is robustly supported by the majority of the reported experiments and the reviewer is fully convinced by these data. However, I would hold that the data shown with respect to phosphorylation of PDH (Fig. S4) are unconvincing. I can't see a clear difference in growth-curves for the incriminated mutants deltaspkB and L which would convincingly exceed the variation observed for the entire dataset.

      We agree that the data on the phosphorylation of the PDH complex including the kinase mutants are not very convincing. We were uncertain from the beginning whether it would be a good idea to include these data sets and therefore discussed them very cautiously in the manuscript. As the enzymatic tests with the E3 subunit of the PDH complex at different NADH concentrations show convincingly that high NADH levels have an inhibitory effect on the complex, we have now decided to remove both data sets from the manuscript, as they are not really required for the statement of the manuscript.

      1) S. Li et al., Dynamic control over feedback regulatory mechanisms improves NADPH flux and xylitol biosynthesis in engineered E. coli. Metab Eng 64, 26-40 (2021).

      2) T. Nakayama, S. Yonekura, S. Yonei, Q. M. Zhang-Akiyama, Escherichia coli pyruvate:flavodoxin oxidoreductase, YdbK - regulation of expression and biological roles in protection against oxidative stress. Genes Genet Syst 88, 175-188 (2013).

      3) A. Witt, R. Pozzi, S. Diesch, O. Hädicke, H. Grammel, New light on ancient enzymes – in vitro CO2 Fixation by Pyruvate Synthase of Desulfovibrio africanus and Sulfolobus acidocaldarius. The FEBS Journal 286, 4494-4508 (2019).

      4) M. Müller et al., Biochemistry and Evolution of Anaerobic Energy Metabolism in Eukaryotes. Microbiology and Molecular Biology Reviews 76, 444 (2012).

      5) S. B. Gould et al., Adaptation to life on land at high O2 via transition from ferredoxin-to NADH-dependent redox balance. Proceedings of the Royal Society B: Biological Sciences 286, 20191491 (2019).

      6) O. Schmitz, J. Gurke, H. Bothe, Molecular evidence for the aerobic expression of nifJ, encoding pyruvate : ferredoxin oxidoreductase, in cyanobacteria. FEMS Microbiol. Lett. 195, 97-102 (2001).

      7) K. Gutekunst et al., LexA regulates the bidirectional hydrogenase in the cyanobacterium Synechocystis sp. PCC 6803 as a transcription activator. Molecular Microbiology 58, 810-823 (2005).

    1. Author Response

      Reviewer #1 (Public Review):

      Kano and authors present a very interesting and unique study investigating whether the white sclera, uniquely characteristic of human eyes, contributes to better gaze detection by individuals, a key prediction of the gaze-signaling and cooperative-eye hypotheses. They test both humans and chimpanzees in a well-designed, counterbalanced experiment where they examine both within- and cross-species evaluations of gaze from static, controlled images. Overall, they provide compelling evidence that the white sclera not only contributes to better gaze discrimination by both humans and chimpanzees, but also aids gaze discrimination when visibility conditions are poor.

      I found the experiments well designed and carefully thought out. The statistical methods are also appropriately applied in my opinion, although it would be helpful to have the exact R code the authors used as an additional supplement. In general however, the authors should be commended on the transparency with which they describe both the training and testing of individuals for both species.

      One clear weakness of the paper is that the evidence for chimpanzees is limited to only 3 (sometimes 2) individuals, but one can appreciate that this kind of experimental setup and task would have been quite difficult for them. Additionally, although the authors were diligent in selecting a cross-cultural sample of human images, the test subjects were all primarily of one cultural background. Although these weaknesses mean that the generalization of their results needs to be taken with caution, I find the methods and results compelling, and they provide a significant contribution to the ongoing discussion of the importance of external eye morphology in facilitating cooperation and communication.

      Importantly, they show evidence for both white sclera and eye shape/size enhancing gaze discrimination when visibility is compromised, adding empirical evidence for a critical component to the gaze-signaling and cooperative eye hypotheses. I am confident their experimental approach will be useful to other scholars investigating this topic and will provide a comparative framework with which to test other species or test more individuals from different populations of humans and chimpanzees.

      Thank you very much for your positive evaluation. We added our R formula in Table S1. Although the majority of our participants were from similar cultural backgrounds, we believe that the results from the previous two experimental studies on the same topic (Ricciardelli et al. 2000; Yorzinski and Miller, 2020) complement our results because these previous studies tested participants from other cultures. We also added a paragraph explicitly addressing the limitations of our study, including the small number of chimpanzee participants in our test conditions.

      Reviewer #2 (Public Review):

      The proclaimed goal of Kano et al. is to provide "experimental evidence answering the question of whether the human white sclera serves any communicative function for eye-gaze signaling". This is indeed an important gap in the literature, although it has recently been addressed by, e.g., Yorzinski & Miller (PLoS ONE 15(2), e0228275 (2020)) in a set-up with human subjects. This study, however, includes the first experimental approach to this issue that is built on an interspecific comparison: The authors tested how well humans and chimpanzees can evaluate eye gaze direction in face pictures deriving from both their own and the other species. Additionally, the human and chimpanzee subjects also had to score manipulated photos, in which irido-scleral colors were inverted.

      The experimental protocol is one of the strengths of the study. The experimental stimuli were thoughtfully crafted to avoid unwanted biases and variable shading and size dimensions of stimulus pictures address relevant perceptual challenges of glance identification in the real world. Minor aspects of stimulus design (e.g. inverting pupil colors) are not justified, though. Research hypotheses are clearly stated and are relevant to the current scientific discourse on the topic. The training procedure for the chimpanzees was made fully transparent, impressively demonstrating the efforts involved in preparing them for the study.

      The results are straight-forward and I have no criticism towards analyses and data presentation in the manuscript, which I believe are all well done. Nevertheless, I want to point out that only two chimpanzee subjects participated in all tests, which limits the conclusiveness of the data. This is particularly true, because several chimpanzees that later dropped out of the training performed better when conspecific rather than human stimuli were presented. This issue should receive more attention in the manuscript.

      In general, I believe that many of the interpretations and a priori assumptions of the authors are problematic, constituting the most important weaknesses of the manuscript. Even key claims of the study are only partially supported by the collected data or by results previously reported in the literature:

      From a methodological perspective, this manuscript simply addresses the question: "Is human eye-gaze more conspicuous than that of chimpanzees?" The authors answer this question positively, which is an expected result and in line with previous research. Nevertheless, the Introduction and Discussion sections of the manuscript prominently discuss the question "Why is the human eye more conspicuous?". For this, an evolutionary perspective needs to be taken into account (see below) and, if an adaptive conjecture is adopted, potential functions need to be proposed.

      The study endorses social drivers behind the depigmentation of the human sclera. However, social functions of eye gaze were not explored in the experiments, as subjects simply needed to extract basic information on glance direction from pictures. It should be expected that increased contrast, as present in the human eye compared to the chimpanzee eye, facilitates the detection of these patterns. I therefore see no new arguments for the idea that scleral color is importantly involved in social cognition and the link between the results and the authors' interpretations remains speculative. It has been demonstrated that reflexive glance following is found in various catarrhine primates, but only humans appear to use glances as referential cues in social situations. The lack of focus on eye orientation in chimpanzee behavior has been strikingly demonstrated by the training results presented herein and strongly supports this dichotomy. At the same time, extensive scleral depigmentation is not rare among monkeys and apes, so that explanations for this phenomenon should be applicable to species other than humans (Caspar et al. Sci Rep 11, 12994 (2021), Perea-García et al. Symmetry, 13(7), 1270 (2021)).

      Thus, it is unfortunate that the very strong conclusive statement "we found that the key function of white sclera is to enhance the eye-gaze signal", is not balanced out by an exploration of alternative hypotheses or caveats to this conclusion. I would argue that such a claim is difficult to defend when a single species pair with very different expressions of eye pigmentation is studied. The authors do not discuss how their interpretations might or might not fit other primates with strongly depigmented sclerae, like Sumatran orangutans. This is an important shortcoming, because such comparisons could potentially back up or damage the hypotheses drawn from the human-chimpanzee pairing.

      Finally, the authors strongly imply that the human condition of scleral pigmentation alone is the derived one and thus requires a peculiar (functional) explanation. On the contrary, the chimpanzee phenotype is discussed as if it would represent an ancestral condition which is deemed representative for nonhuman primates as a whole. However, recent evidence suggests that both humans and chimpanzees show unusual scleral color patterns, with other great apes displaying variable pigmentation with a strong trend towards (at least localized) depigmentation in orangutans, bonobos, and gorillas (Perea García, J. O. J. Lang. Evol. 1 (2), 151-158 (2016), Caspar et al. Sci Rep 11, 12994 (2021)). This is not mentioned in the manuscript and should be added. The uniformly dark chimpanzee sclera is indeed not representative for great apes or most other groups of nonhuman primates.

      All in all, this paper represents a valuable experimental contribution to the debate on the evolution of eye pigmentation in apes. In particular, it demonstrates that eye gaze (and therefore coloration) is negligible for chimpanzee communication. However, a more inclusive and nuanced interpretation of results and a better portrayal of their relevance to hypotheses explored in the literature is required. This includes an improved discussion of the limitations of the study's approach when it comes to deducing evolutionary and socio-cognitive patterns.

      Thank you very much for your helpful comments and expertise in this topic.

      Regarding the previous experimental study on this topic, Yorzinski and Miller (2020) is indeed our predecessor, and we detailed this study in our introduction section. Moreover, we want to point out that Ricciardelli et al. 2000 is our predecessor as well, and in fact, we designed our stimulus manipulation based on this previous study. Ricciardelli et al. 2000 tested human participants in a gaze-discrimination task and found that reversing the contrast polarity of the eye regions in the human faces impairs participants' judgment of gaze direction (thus we should have described this stimulus manipulation as “the reversal of eye contrast polarity”, rather than “the inversion of eye colors”; we apologize for our error in word choice). One advantage of the method of Ricciardelli et al. is that it changes only the contrast polarity and not any color differences within each eye image, namely the color differences between the iris and sclera and also between the pupil and iris (thus this manipulation does not change the conspicuousness of the iris or pupil per se). We added Figure S4 to better explain how we made our stimuli. Please also note that the visibility of eye-gaze directions depends on the visibility of both iris and eye-outline edges, not only that of the iris (or pupil). To clarify this aspect, we added Figure S1.

      Regarding the small number of chimpanzee participants, we addressed this limitation more explicitly in our revision.

      Regarding the individual differences in training performance, although we indeed observed some differences between individuals in their performance for the chimpanzee and human stimuli during the training stage, we did not find any relation between these performances and the participants’ particular backgrounds (Figure S3 and Table S3). Most importantly, please note that the key criterion of passing the training stage was learning to reliably discriminate the eye-gaze directions of both human and chimpanzee stimuli.

      Regarding the interpretations and a priori assumptions, we now clarify them in this revision by referring to a recent morphological study on great ape eye color (Kano, F., Furuichi, T., Hashimoto, C., Krupenye, C., Leinwand, J. G., Hopper, L. M., . . . Tajima, T. (2021). What is unique about the human eye? Comparative image analysis on the external eye morphology of human and nonhuman great apes. Evolution and Human Behavior, in press, DOI: 10.1016/j.evolhumbehav.2021.12.004). Although the current experimental study was performed independently of this related study, we built our experimental designs partly based on it. The results from our experimental study and those from this related study are complementary to one another.

      Regarding the iris-sclera color contrast/difference, Kano et al. (in press) found that the iris-sclera color difference (not the contrast measure of Perea-Garcia et al., 2019 criticized by Caspar et al., 2021) did not differ between the human and chimpanzee eyes. Importantly, we confirmed this same result in our chimpanzee and human stimuli (Figure S2). More importantly, as mentioned above, we did not just swap the iris and sclera colors in our eye images, but reversed the contrast polarity of eye images, without changing any local color difference within the eye images such as the iris-sclera or the iris-pupil color contrast/differences. Thus, please note that we did not ask "Is human eye-gaze more conspicuous than that of chimpanzees?", but asked, “Does the uniformly white sclera (with a darker iris) facilitate the visibility of eye-gaze directions across species?”. We expanded our introduction section to clarify our general aims and rationales/explanations for our experimental manipulations.

      Regarding the social drivers, one sentence in the conclusion paragraph (that you pointed out) was indeed misleading, and we have thus rephrased it as “one function of the uniformly white sclera is to equip eye-gaze signal with robustness against its degradation caused by natural noises (e.g., shading, distancing)”. Indeed, our aim was to test the perceptual advantage of the uniformly white sclera, one key premise of the gaze-signaling hypothesis, but not to test the social drivers of eye-gaze signals.

      Regarding the sclera colors of other great ape species, we also recognized in Kano et al. (in press) that some nonhuman ape individuals have partly unpigmented sclera. However, this related study found that such partly unpigmented sclera is characterized as more graded or patchy color patterns compared to humans’ uniformly unpigmented sclera and that these color patterns more easily blend into adjacent skin/hair colors around the eyes, particularly in visually challenging conditions (e.g., shading, distancing). We thus predict that the same pattern of results would be obtained even when we use partly unpigmented sclera as our stimulus. However, further experimental studies are necessary to confirm this prediction. We clarified these points in both our introduction and discussion sections.

      Thank you once again for your critical but constructive comments.

    1. Author Response

      Reviewer #2 (Public Review):

      In this study, the authors developed a new expansion microscopy (ExM) method called Ten-fold Robust Expansion Microscopy (TREx). This method emphasizes one-round sample expansion of cells by systematically optimizing the monomer recipe. Compared to existing ExM methods that expand samples to a similar scale (~10-fold), TREx aims for a robust procedure that can be handled more easily. The reviewer experimentally tested the TREx protocol, and validated that the TREx 10x gel can be made robustly by researchers who have experience with standard ExM.

      We are very pleased that the reviewer tested out our new recipe!

      Specific comments:

      1) The authors claimed in the abstract that "TREx can provide ultrastructural context to subcellular protein localization by combining antibody-stained samples with off-the-shelf small molecule stains for both total protein and membranes". The authors only demonstrated one NHS ester dye, BODIPY-FL NHS dye (lines 151-159), without justifying why this dye was selected. Does BODIPY-FL NHS dye work better than other off-the-shelf NHS dyes? The reviewer recommends that the authors validate a few more widely used dyes with TREx, e.g. Cy3/Cy5, Alexa 488, Alexa 568, to guide readers in choosing the appropriate dyes.

      We have added text on this issue, "Sim et al (Sim et al., 2021) have shown that highly hydrophobic NHS ester dyes exhibit strong contrast for cytosolic organelles while highly hydrophilic NHS ester dyes strongly stain the nucleus. The moderate hydrophobicity dyes that we used (BODIPY-FL (Zanetti-Domingues, Tynan, Rolfe, Clarke, & Martin-Fernandez, 2013) and AlexaFluor594 (Hughes, Rawle, & Boxer, 2014)) exhibit both nuclear staining and contrast for cytosolic organelles."

      2) Page 8: The reviewer is happy to see the discussion on the heterogeneous local expansion factors in cells. It is critical for evaluating the expansion isotropy and avoiding pitfalls in the applications of TREx. Based on this work and previous work (e.g. U-ExM), organelles with higher protein density may have smaller local expansion factors than the macroscopic expansion factor. The authors discussed the local expansion factor of organelles with different protein density, including centrioles, NPCs, and microtubules. To evaluate the local expansion factors comprehensively, the reviewer asks the authors to add a figure or plot to compare the local expansion factors of different organelles, ideally including centrioles, NPCs, microtubules, clathrin-coated pits, mitochondria, ER, and centromeres. The authors have already measured or imaged many of these organelles. For the other organelles, good antibodies are available. Therefore, the additional experiments should be straightforward for the authors, and the comprehensive comparison will make the work much more impactful.

      We address this in our response to essential point 3 and agree that the added comparison over multiple organelles has made the work more impactful.

      3) Line 388: The authors stated "The strong overlap between NHS ester and mCLING stains was not unexpected, given the reactivity of NHS esters towards both unreacted lysines in the mCLING molecule and antibodies." Since AcX (6-((acryloyl)amino)hexanoic Acid, Succinimidyl Ester) at high concentration was added after the mCLING staining, most of the lysines in the mCLING should be reacted by the AcX. Therefore, NHS ester dye staining should not strongly overlap with the mCLING. The authors should re-evaluate and interpret the overlap. The authors can do simple experiments like increasing the concentration of AcX, or use pH 8 for AcX treatment. If the overlap is reduced, it means the overlap was caused by the unreacted lysines in mCLING, and can be reduced. If the overlap is not reduced, there are other mechanisms which need further examination or interpretation.

      The AcX concentration was selected to maximize retention of proteins without hindering gel expansion by cross-linking through multiple AcX modifications on each individual protein. Therefore, it is likely that AcX is not close to saturating the available primary amines. We have explored this further with an AcX competition assay.

    1. Author Response

      Reviewer #1 (Public Review):

      Strength: Excellent statistical methods are employed. Specimens collected from two centers are used.

      Weakness: It is not clear what new knowledge this follow-up study brings to the audience. The critical biomarker they propose for development of a biodosimetry assay, miR150, was already discovered. There are close to a dozen publications showing the dose response of miR150 in mice, rats, non-human primates and humans (including two research papers and several reviews from the authors). The dose response shown in 4b is not appreciable. The introduction and discussion talk about clinical utility for triage after a nuclear disaster. Is analysis of miRNAs purified from exosomes a viable approach for triage and clinical decision making? If so, please provide a convincing argument showing practicality.

      We appreciate that the reviewer and the editor believe that “excellent bioinformatics and biostatistical methods are employed”. We apologize for the confusion regarding miR-150 and its utility as a radiation exposure biomarker. Indeed, we and others have shown the importance of miR-150 and other miRNAs in detecting radiation exposure in mice and macaques. We had inferred that the resulting evolutionarily conserved radiation-inducible microRNAs were very likely to translate well to humans due to the high conservation of their promoter regions and transcription factor binding sites. However, in this study validating a microRNA-based test for radiation detection using actual samples, we demonstrate that while most of the predictions grounded in animal models held true, only through the analysis of human data were we able to develop a model that reached clinically useful performance. Most importantly, there are key differences in humans, suggesting that for clinical application the primary source of data has to be human. For example, a key miRNA for radiation detection noted in macaques (miR-133) was absent in human patient sera. The miR-30 family, important for dose separation in mice, was redundant in the human test. The results from animal studies of miR-150-5p are not directly translatable to humans. In animals, particularly isogenic mice, miR-150-5p kinetics enable perfect separation of the irradiated from the non-irradiated samples, even after low-dose exposure. The dose response in humans, who have different genetic and clinical backgrounds, is much less appreciable, and therefore a simple single- or two-miRNA-based test is insufficient. To overcome this, we employed artificial neural networks reliant on the expression of 8 miRNAs and 2 normalizers, which ensure robustness to differences in sample material content.

      Therefore, we are bringing significantly new knowledge to the field and providing a template for how miRNA signatures derived from animal models need robust validation in human samples before a human application can even be conceived. The analysis of miRNAs purified from exosomes constitutes an exploratory component of our work and is not part of the proposed diagnostic procedure for triage and clinical decision making. We introduced the necessary changes to make the division between the main and exploratory parts of our work more evident (lines 116-127).

      Major comments:

      1. Longitudinal evaluation of specimens from human patients who received TBI is a plus. However, baseline readings in specimens collected from leukemia patients need to be compared with those in healthy humans. Why were several specimens excluded from analysis?

      Since the irradiation of healthy humans would not be ethically acceptable, we cross-referenced the results from patients with leukemia with our earlier results on radiation-responsive miRNAs in healthy mice and non-human primates as a surrogate for healthy humans undergoing TBI. As outlined in the “Preprocessing of profiling data” section of Materials and Methods, we implemented quality control based on the number of detected miRNAs per sample. For the miRNA-seq based experiment, samples with fewer than 350 miRNAs with non-zero reads detected (4A and 7A in Figure 1 – supplementary figure 1) and their respective paired samples were removed from the analysis. Additionally, sample DFCI.13A was an outlier in hierarchical clustering and in Principal Component Analysis (Figure 1 – supplementary figure 2), and therefore this sample, together with the paired samples from other timepoints, was excluded from the analysis. We incorporated this information in the main part of the manuscript (lines 146-148).
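
      The quality-control rule described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the data layout (a mapping from sample ID to a patient ID and a per-miRNA read-count dictionary) and the function names are assumptions made for illustration; only the threshold of 350 detected miRNAs and the removal of paired samples come from the response.

```python
from typing import Dict, Tuple

# Threshold taken from the response: samples with fewer than 350 miRNAs
# with non-zero reads fail quality control.
MIN_DETECTED_MIRNAS = 350

# Hypothetical layout: sample_id -> (patient_id, {miRNA name: read count})
Samples = Dict[str, Tuple[str, Dict[str, int]]]


def count_detected(read_counts: Dict[str, int]) -> int:
    """Number of miRNAs with at least one read in a sample."""
    return sum(1 for reads in read_counts.values() if reads > 0)


def filter_samples(samples: Samples, min_detected: int = MIN_DETECTED_MIRNAS) -> Samples:
    """Drop failing samples together with all paired samples of the same patient."""
    failed_patients = {
        patient
        for patient, read_counts in samples.values()
        if count_detected(read_counts) < min_detected
    }
    return {
        sample_id: (patient, read_counts)
        for sample_id, (patient, read_counts) in samples.items()
        if patient not in failed_patients
    }
```

      Removing by patient rather than by sample keeps the pre- and post-irradiation comparisons paired, which is the point of excluding the "respective paired samples".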

      1. The dose response noted is moderate. Biodosimetry refers to retrospective evaluation of absorbed dose, and the analysis should include validation using specimens of unknown exposure.

      As outlined above, the moderate dose responsiveness of the miRNAs used in our proposed signature is the primary reason why we believe that a simple diagnostic procedure based on a single miRNA, e.g. miR-150-5p, will not be feasible for use in humans. The final model was evaluated on an independent group of 12 patients with samples drawn under the same protocol (for which exposure and dose were unknown, to validate the model's diagnostic accuracy).

      1. The authors say that 1 Gy exposure in humans can cause ARS (paragraph 1, introduction). However, their approach does not resolve doses under 4 Gy (around the LD50 value in humans).

      The TBI protocol does not allow for irradiation with doses lower than 2 Gy in a single fraction, which was the reason behind the definition of the low-dose exposure group (2 or 4 Gy) in our study. However, localized irradiation with higher doses provokes a response reflected by changes in miRNA levels in serum (Malachowska et al. Int. J Radiation Oncol Biol Phys), suggesting that the irradiation signature is likely to hold true and identify individuals exposed to smaller doses.

      Reviewer #2 (Public Review):

      The study first compared the profiles of serum miRNA in patients before and after irradiation treatment. Then they selected 8 miRNA markers that showed significant changes in levels for further analysis. Then, they showed that the analysis of these markers by real-time PCR can differentiate the pre- and post-irradiation samples in 12 additional patients. The objective of the study is unclear.

      We rephrased the appropriate sections of the manuscript accordingly to elucidate the objective of the study (lines 105-106 and 131-132).

      The study only demonstrates that the 8 miRNA markers are useful to differentiate serum samples collected before and after irradiation. This information is not useful as the blood picture would be more accurate and cheap to accomplish this task.

      The currently used diagnostic screening tests for radiation exposure, including time to onset of radiation sickness, kinetics of lymphocyte depletion and chromosomal abnormalities analysis, are time-consuming and do not allow definite conclusions, as underscored by the lack of an FDA-approved biodosimeter. The nadirs of peripheral blood cell counts may reflect high-dose exposure but do not allow for prediction of the eventual outcome. Moreover, as evidenced in our prior experimental studies, the dynamics of the blood cell counts are significantly slower than those of circulating miRNAs. For example, the difference in outcome, that is, the probability of survival of an animal after acute radiation exposure, is not evident from blood counts or other measures for weeks after radiation, yet it is predicted by a blood-based microRNA signature with ~90% accuracy assessed 24 hours after radiation exposure (Acharya et al, Science Translational Medicine, 2015). Therefore, although we acknowledge that a blood cell count would be cheaper, we respectfully disagree that it would be more accurate in rapidly providing the information necessary to implement countermeasures against the absorbed radiation dose. Furthermore, qPCR-based assays are also inexpensive and increasingly available, owing to the COVID-19 pandemic and the great need to expand PCR-based testing capabilities that it gave rise to. We acknowledge that this information was not presented in sufficient detail and we expanded the relevant sections of the manuscript (lines 64-76, 401-402).

      The authors also propose that these markers are useful for the identification of subjects exposed to irradiation. As this study has not addressed the specificity of these miRNA markers to irradiation, the claim of having a signature for radiation exposure is not justified.

      We had shown in previous experimental exposure studies performed using animal models (“Serum microRNAs are early indicators of survival after radiation-induced hematopoietic injury”, Science Translational Medicine, 2015 and “Evolutionarily conserved serum microRNAs predict radiation-induced fatality in nonhuman primates”, Science Translational Medicine, 2017) that miRNAs with radiation-dependent alterations of expression show association with bone marrow depletion and correlate with survival in an amifostine rescue experiment, and that miRNA expression changes are suppressed by the use of radiation-mitigating agents like gamma-3-tocotrienol. These arguments support specificity towards irradiation as the inciting stimulus of the expression patterns. The cross-referencing of results from animal studies and from our miRNA-seq experiment on human samples was intended to account for this issue, as similar experiments on healthy humans would not be ethical, and to identify high-confidence miRNAs from which a signature could be built. We have now added these explanations (lines 112-115, 164-167, 344-350).

      Although patients with irrevocable damage of bone marrow due to other factors would be an interesting comparative group, we struggle to find an ethically acceptable scenario that would match the TBI in terms of the timeline and repeatability of the bone marrow depletion. A feasible alternative may be high dose chemotherapy conducted in preparation for bone marrow transplant, but the dynamics of that procedure are vastly different making the group more adequate for analyses of bone marrow regeneration rather than a control for TBI-initiated damage.

      The key new experiments in this study are the profiling of the serum miRNA in the patients undergoing total body irradiation. The results on mouse model and macaques have been published previously. The consistency of the changes of the miRNA markers is not surprising.

      The consistency of the radiation-inducible miRNAs between mice, non-human primates and humans was expected, given the high conservation of their promoter regions and transcription factor binding sites, as we showed previously (Fendler et al., 2017). This step was important to ensure that the miRNA level changes observed in humans result from radiation exposure, as this could not be determined directly, as mentioned in the response to the previous remark. However, the creation of a clinically applicable test would not be possible without a true study in humans, as presented in the manuscript. Notably, a miRNA crucial for the radiation exposure models in our macaque model (miR-133b) was surprisingly absent in human sera, and the miR-30 family, important for dose separation in mice, was redundant in the human test. This serves as a cautionary tale for “translational” studies without true validation in humans and underlines the importance of our findings as the first human-specific and adequately validated diagnostic and prognostic test for radiation exposure.

      Reviewer #3 (Public Review):

      1. Appropriate bioinformatics discussions and functional pathway analysis are necessary for the key differentially expressed miRNAs that have been screened out. It is boring to only discuss the differences of miRNA data.

      We appreciate the suggestion to back the results of differential miRNA expression with a more in-depth bioinformatic discussion. We now discuss the results of the functional enrichment analysis, presented in Fig. 3C, in more detail, and have extended the bioinformatic analysis (lines 218-222, 360-364, 546-549). A graph of miRNA-gene interactions, created using miRTargetLink 2.0 for miRNAs differentially expressed in exosomes after high dose irradiation, has been added as figure supplement 1 to Figure 3.

      1. In page 5, "We used logistic regression to create such a model in the low-dose setting (N=22 sample pairs). The resulting classifier was based on the expression of miR-150-5p, miR-126-5p and miR-375" , Why the three miRNAs in the low-dose radiation group were selected for modeling instead of the seven overlapping miRNAs in the high and low dose radiation group to classificate the irradiated- and non-irradiated samples ? Please explain in detail.

      The expression of miR-150-5p, miR-126-5p and miR-375 was used in our previous animal studies to determine radiation exposure, and we used a similar approach at this stage of the project to evaluate whether their expression, measured using RNA sequencing in human sera, can reliably distinguish between the irradiated and non-irradiated samples. We acknowledge that this was not clearly stated. The primary purpose of this analysis was to visually present similarities in radiation-inducible miRNA expression changes across species, and the logistic regression model in question was not used any further. Following the Reviewer's suggestion, we built a model using the seven miRNAs overlapping in the high and low dose radiation comparisons to classify the irradiated and non-irradiated samples, obtaining an AUC of 0.95 (95%CI: 0.89-1.0); however, we believe adding this information to the main part of the manuscript is not necessary.
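
      To make the classification step described above concrete, the following is a minimal Python sketch, not the analysis code used in the study. It fits a logistic regression to three features per sample and scores it with the area under the ROC curve (AUC). The data are synthetic, and the effect size, learning rate and helper names (`fit_logistic`, `auc`) are illustrative assumptions; paired samples are also treated as independent here for simplicity.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auc(y, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Synthetic stand-in data (NOT study data): 22 pre/post sample pairs,
# three standardized expression features per sample.
random.seed(0)
n_pairs = 22
shift = 1.5  # hypothetical effect size, chosen only for illustration
X, y = [], []
for _ in range(n_pairs):
    X.append([random.gauss(0.0, 1.0) for _ in range(3)])    # non-irradiated
    y.append(0)
    X.append([random.gauss(shift, 1.0) for _ in range(3)])  # irradiated
    y.append(1)

w, b = fit_logistic(X, y)
scores = [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]
in_sample_auc = auc(y, scores)
print(f"in-sample AUC: {in_sample_auc:.2f}")
```

      Because this AUC is computed on the same samples used for fitting, it is optimistic. A resampling scheme such as cross-validation or the bootstrap is a common way to obtain a less biased estimate together with an interval.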

      1. In page 5, "Therefore, the expression of miR-126-5p, miR-150-5p and miR-375 enabled efficient classification of the irradiated- and non-irradiated samples in both settings (Fig. S6C)";

      In page 6, "Interestingly, a set of 3 miRNAs quantified by qPCR in all of our previous experiments clearly visually distinguished irradiated from non-irradiated samples in the human analysis (Fig. 5A)",

      Which three of miRNAs,miR-150-5p,miR-375,miR-126-5p mentioned before or miR-150-5p,miR-375,miR-215-5p?Please clarify clearly.

      Thank you for the suggestion. We rephrased this fragment (lines 289-290).

      1. In page 4, "Since miRNA-containing exosomes.......high dose irradiation", Do you think that the differently expression of serum miRNAs partly results from exosomes? Low dose irradiation is also able to change exosomal miRNA profile,why only high dose irradiation is taken into account in paper while low dose irradiation is not?

      We believe that serum miRNA expression results in part from exosomes and, as an exploratory component of our work, aimed to verify whether the magnitude of changes in exosomal miRNA expression exceeded that in serum, improving the potential biomarker specificity to an extent that would justify the development of an arguably more complex and labour-intensive test utilizing exosome isolation. The sequencing of exosomal miRNA content was therefore performed as an exploratory analysis only after high radiation exposure. However, the amount of exosomal miRNA was lower than that obtained through the total miRNA extraction protocol, which offsets any benefit stemming from the higher cellular specificity of the former. As the results were also comparable with those obtained from sera, we decided not to explore this concept further. We added this explanation to our manuscript as this issue was not clarified previously (lines 116-127 and 339-343).

      1. Are there any miRNAs that can clearly distinguish between high and low dose groups? If so, please clarify them in text.

      We have now clarified this issue in the Discussion (lines 415-417).

      1. In page 7,"Importantly, similarities were observed in the level of both individual miRNAs and miRNA families", What part of result Comes to this conclusion?Please explain clearly.

      When describing similarities between human and animal studies, we refer to our previous work describing radiation-responsive miRNAs in mice and non-human primates. These similarities (and differences) are described in detail in Table 1. We added relevant references to Table 1 and to the cited sentence (line 352).

      1. In page 7, "We found that the most common putative tissue sources for differentially expressed miRNAs were hematopoietic and endothelial cells", Which part of result shows this sentence? Please point it out.

      This statement is not explicitly validated in our work but is based on the results of Ludwig et al., 2016, de Rie et al., 2017 and Landgraf et al., 2007. Since these studies generated excellent data on miRNA expression across human and mouse tissues and cell types, with overlapping results for the miRNAs of interest (as detailed in Table 1), we did not perform additional confirmatory experiments.

      1. Were the patients suffering from cancer or other diseases? How to ensure that the differential expression of miRNA was caused by radiation exposure rather than their own disease? Please explain.

      As described above, initial experimental studies performed in animal models (mouse and macaque) in preparation for this study showed the specificity of the miRNAs (including those in the signature) towards radiation exposure. This was evidenced by multiple layers of validation and rescue experiments. Admittedly, testing whether other diseases with a phenotype similar to ARS affect test performance is an interesting concept, but such diseases would be extremely unlikely to impair the performance of the test in an individual after radiation exposure. Namely, even if the examined patient has a hematologic malignancy or myelofibrosis potentially affecting the performance of the test, identification of such individuals as potentially irradiated would lead to them being followed up adequately. Failure of the test to detect radiation exposure would likely not pose a severe risk, since such individuals would already be severely ill and under proper care with regular monitoring of bone marrow function. We are aware that some unforeseen clinical factors not discussed here may affect some facets of the test, but the built-in robustness derived from having multiple miRNAs mitigates the risk of non-specificity.

  4. Dec 2021
    1. Author Response

      Reviewer #1 (Public Review):

      Giove and colleagues find that a perceptual effect, namely whether a flicker is perceived or unperceived, is reflected in metabolic signals measured with functional MRS, but not in BOLD-fMRI. Specifically, perceived but not unperceived flicker led to an increase in lactate and glutamate in early visual cortex (a combination of V1, V2 and V3). BOLD-fMRI did not increase in this same region, suggesting that we are missing important neural signals by focusing on BOLD-fMRI only. The authors also provide a thorough discussion of the potential physiological mechanisms underlying these metabolic effects. I should note that I have no expertise in fMRS, and my assessment is based on knowledge of BOLD-fMRI and perception.

      Whether or not the flicker was visible was manipulated by changing the frequency of the flicker. Specifically, a low frequency flicker (7.5 Hz) was perceived, but a high frequency flicker (30 Hz) was not. Of course, this means that it is difficult to assess whether the fMRS effects are related to perception itself (visible vs. invisible) or due to the low-level features of the stimulus, e.g. the temporal filtering properties of the visual system. This limitation does not however hinder the main conclusion of the paper, which is that certain neural signals are missed by BOLD-fMRI but can be picked up by fMRS.

      We thank the referee for these constructive comments. In this revision we further stress the importance of the argument suggested by the referee that MRS, but not BOLD-fMRI, reflects differences in information processing related to perception. In other words, the metabolic response of V1 can predict whether a visual stimulus is perceived or not. This of course does not necessarily imply causality. We argue that stimulus perception is inextricably linked with low-level features of the stimulus, i.e., perception is equivalent to the filtering of the stimulus, which in turn depends on stimulus characteristics.

      In Figure 2B, it looks like BOLD dynamics may differ between the slow and fast flicker blocks, even if the mean amplitude did not. So perhaps there are some more subtle BOLD differences between conditions that the authors do not explore.

      In this revision, we test for statistical differences in the BOLD time-course, as suggested by the referee. Please see our response in the “Essential Revisions” above.

      The authors themselves also raise a potential partial voluming issue in the fMRS measurement that seems important to consider, given the differential BOLD signal in nearby regions (V2 and V3). Specifically, the volume in which fMRS is measured consists of parts of V1, V2, and V3. There are no significant differences between perceived and unperceived BOLD-fMRI in this volume as a whole, but there are in V2 and V3 in isolation. This raises the possibility that the null effect of BOLD-fMRI in the fMRS volume as a whole is due to it washing out in this larger volume. Could it be that the fMRS effects are also driven by V2 and V3, but are for some reason stronger/more robust, and therefore survive in the larger volume? In other words, I wonder if the BOLD and fMRS effects may actually co-localise, but differ in effect size.

      This is an interesting possibility to consider, but unfortunately, it cannot be really addressed without the help of a tailored study. To attempt a minimally meaningful analysis we would need (at least) to know the partial volume of each individual subject for assessing whether and to what extent the partial volume correlates with the spectroscopic results. As stated above, we did not acquire single-subject retinotopic maps. Even with this piece of information, a reliable identification of multiple and spatially distinct components summing up to the single MRS signal would be problematic. A qualitative reply to the issue raised by the referee is that the metabolic response to visual stimulation measured with FDG-PET (an index of glucose utilization, and by extension, of lactate production) has been proposed to peak in V1 (e.g., Chen et al., HBM 2018 PMID: 30076750). Therefore, it is unlikely that V2/V3 contribute much more than V1 to the stimulation-induced increase in lactate and glutamate concentration. Furthermore, to the best of our knowledge all previously reported increases in lactate concentration during photic stimulation have been assigned to V1.

      In conclusion, the authors demonstrate an intriguing dissociation between BOLD-fMRI and fMRS, which should prompt further research into this topic, and may ultimately change the way we interpret neuroimaging signals.

      The referee has wonderfully summarized our study. Thank you.

      Reviewer #2 (Public Review):

      In this paper the authors investigate differences in metabolic response in primary visual cortex (V1) to perceptible and imperceptible stimuli using proton magnetic resonance spectroscopy (1H-MRS) and fMRI.

      The main strength of this paper is it shows that perceptible stimuli trigger a different metabolic response in V1 than imperceptible stimuli, namely that lactate and glutamate levels both increase for perceptible stimuli but are unchanged for imperceptible stimuli. Weaknesses of the study are that no retinotopic mapping was performed on the subjects so the spectroscopic voxel may contain contributions from early visual cortex outside V1; the assumption that increased BOLD response in V2 is caused by perception is not convincing.

      The differences in concentration of lactate and glutamate are striking, and the only plausible explanation is differences in metabolic response in V1.

      This is the clear and main result of the paper. The argument that an increased activation in V2 is caused by perception is less interesting. More sophisticated experimentation and analysis including connectivity analysis would be required to investigate the interaction between V1 and the rest of the brain.

      This could considerably increase the importance of MRS in cognitive neuroscience. It would be fascinating to use dynamic causal modelling or a similar technique to explore connectivity between regions for perceptible/imperceptible stimuli and to combine this with proton spectroscopic imaging.

      We agree with the referee that our work cannot help in establishing a relationship between perception and activation of visual areas, and that more sophisticated investigations would be necessary for that purpose. We also acknowledge that our study suffers from the limitations mentioned by the referee. In the present revision we include, as limitations, the absence of retinotopic mapping and the lack of causality between perception and BOLD/MRS, as suggested by the referee. The idea of correlating brain connectivity with metabolic imaging (CSI or even FDG-PET) during different stimulation paradigms (or resting-state) is appealing, as recent combined PET/fMRI experiments showed that BOLD and glucose consumption (CMRglc) are dissociated in a region- and task-dependent manner (e.g., Stiernman et al., PNAS 2021).

      Reviewer #3 (Public Review):

      Di Nuzzo et al demonstrate here that perception of visual stimulation is reflected in dissociable neurometabolic -but not neurovascular- responses in human visual cortex. This work uses human neuroimaging to show the effects of perception on neuronal energy demands and is of great importance for the neuroscience community. The authors carefully designed a task that would elicit similar BOLD response in primary visual cortex (V1) for perceived or unperceived visual flickering. They combined fMRI BOLD measurements with functional MRS, to quantify the functional (BOLD) and metabolic (concentration of lactate and glutamate) responses during visual stimulation. While they found no differences in the BOLD response within V1 for perceived vs. unperceived visual flicker, the authors show increased levels of glutamate and lactate in V1 when the flicker is perceived, suggesting increased energy metabolism during perceived visual stimulation.

      We thank the referee for the careful and constructive reviews of our manuscript.

      While BOLD response within V1 does not differ between perceived and unperceived flicker (Figures 3B, 2C, 3C), the authors find enhanced BOLD in the lateral occipital cortex when the flicker is perceived (Figures 3B, 3D).

      The authors consider BOLD in secondary visual areas to be a surrogate measure of V1 output, indicating that stimulus processing during perceived stimulation results in enhanced V1 output. The spatial and temporal resolution more commonly used in human neuroimaging do not facilitate building relationships of input-output neuronal activity in a way analogous to animal neurophysiology. The assumption that BOLD activity in secondary visual areas reflects V1 output is very tightly linked to the unique architecture of the visual system; however, the paper would benefit from including the uncertainty of this assumption in the discussion.

      We agree with the referee, and we now mention the uncertainty of the assumption that the BOLD signal increase we observe in secondary visual areas does reflect a rise in the output from V1. We also mention the lack of direct measurement to support such assumption as a limitation of the study.

      The paper would further benefit from following a more standardised way of reporting preprocessing steps of the fMRI data, as well as a more detailed description of the statistical analyses on the fMRI data.

      We have carefully checked the methods section related to the fMRI data analysis (preprocessing and statistics) and modified the text where appropriate. Thank you for pointing this out.

      Finally, the authors have provided a series of well-chosen controls to ensure that their findings are not driven by differences in levels of attention between perceived and unperceived stimulation (Figure 1). The authors are commended on the quality of their figures, their choice of detailed graphs and the constructive use of additional media.

      We would like to thank the referee for the complimentary comment.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper investigates what functional properties emerge from training an anatomically-constrained neural network on a specific computational task-detection of looming visual stimuli. Several functional models are identified by optimizing a network model for this task, and one of these models matches several properties observed in the fly neurons that perform the task. The approach and results are interesting. I did feel that several aspects of the work could be described more clearly, and that the potential of the model to reveal important aspects of the computation could be probed more thoroughly.

      Inhibitory component of model. The interplay between excitatory and inhibitory components of the model could be explored in more detail. A specific aspect that is interesting is the inclusion of rectification in the inhibitory circuit. Rectification is motivated by the extra neuron in the circuit providing inhibition (lines 155-157), but it is not clear why an additional neuron would require rectification. Are there physiological measurements that indicate that the extra neuron introduces rectification, or is that a speculation? Exploring whether rectification is important would also be interesting - e.g. by removing it from the trained models, and/or training models on circuits in which rectification is absent. Lines 360-362 mention interesting response properties created by inhibition, but do not define what those are. Including some of these extensions of the basic model could highlight the potential of the model to make predictions about specific circuit features that are important for detection of looming stimuli.

      Thanks for this interesting comment. Please see the revision summary and Essential Revisions 4 for more information. In lines 360-362 of the first submission, we were referring to Fig. 10E, F, where some examples of the peripheral inhibition are shown.

      Intuition for second model class. One of the key results in the paper is the existence of two classes of solution to the optimization problem - one of which follows the expectation for a detector based on outward optical flow, and the other of which does not. It is important to explain intuitively how the “inward” model is able to detect looming stimuli, given that it seems sensitive to the wrong optical flow features. This should be early - e.g. around lines 214-216.

      We agree. In the revised manuscript, we have modified some expressions around (former) lines 214-216 to say explicitly that the inward solutions are sensitive to hit stimuli coming from the side of the receptive field rather than the center.

      In general, the results would benefit from developing some arguments in more detail. One example is the paragraph on lines 232-237. The differences in performance in Figures 8C and D stick out to me as a reader, but I am not guided through those differences in the text. Intuition for why you see the change in relative performance of the two solution (lines 266-268) would similarly be helpful. Another example is lines 290-292. These are several examples in which more explanation would be helpful, but you could look at the results in general with this in mind.

      These are good suggestions. First, we added more explanation of the differences shown in Figure 8C. Figure 8D is essentially a re-plot of the red and orange curves in Figure 8C and shows the distance dependence of the miss signals. Second, the relative changes in the performance of the two solutions arise because the ROC and PR curves are bounded from above and the loss function is bounded from below (by 0). The better-performing solution (inward in this case) in general has less room to improve than the other one. Third, we moved the comments about the angular-size encoder from the discussion section to the results section, after the sentences starting in (former) lines 290-292.

      The performance of the two classes of solution becomes more similar as the number of neurons increases. A concern is that this reflects saturation of performance rather than actual equivalence of the models. Can you make the task harder, e.g. by adding distracting optical flow? That might help separate performance of the different models and avoid saturation.

      It is correct that the tasks are relatively easy for our model, and both outward and inward models with large enough population size can almost perfectly distinguish hit cases from others. In this revision, we engineered a new set of stimuli with rotational background flows. In this case, both inward and outward solutions are found, and the outward solutions tend to perform better than the inward ones. Though this particular choice of more difficult task seems to favor outward solutions, we find it difficult to interpret, for lack of experimental comparisons. Instead, in the discussion, we interpret this result to show the potentially strong dependence of the solution on the statistics of loom stimuli, which requires characterizing. For more details, see Essential Revisions 3.

      Figure 10: how did you chose the specific outward solution used in this figure? More generally, some measure of the similarity of model components with experiment across all outward models is important. Currently the text reads as if you chose one of many models that happened to have components that looked like those measured. This comes up again on lines 310-311 and 313-315.

      We have answered this question in the section of Essential Revisions 5. With the new, simpler model, it is no longer necessary to pick from among the distribution of solutions.

      Are there animals that detect looming stimuli with fewer loom detectors? If so it would be interesting to see if they have adopted a similar or different computation.

      This is a very interesting question. However, we are not sure about the number of loom detectors in other animals, nor are we aware of inward solutions existing in either flies or other animals. One related point is that the LPLC2 neuron and its computational structure are not the only way to detect looming events: there are other loom-sensitive neurons and neural circuits that receive very different types of visual signals, such as LC4 in flies and LGMD in locusts, which do not appear to receive directional inputs.

      Reviewer #2 (Public Review):

      The manuscript from Zhou et al. investigates how certain features of looming-detecting neurons can arise from optimizing a shallow neural network to detect imminent collisions. The authors consider architectures that resemble the known anatomy of LPLC2 neurons in Drosophila, with excitatory inputs from the four layers of motion detectors in the lobula plate and inhibitory inputs from the interneurons in those layers. The authors find that some fraction of the trained networks exhibit tuning properties of LPLC2 neurons, including (a) similar response profiles to stimuli that are not present in the training data; (b) similar dependence on the angular size of the looming object as opposed to angular velocity; and (c) similar dependence between peak response time and the ratio of size to speed of the looming object. The authors also find another solution among the trained networks that is very different from the biological circuits. However, they show that this other solution becomes less common as the number of neurons grows, which is the relevant regime for the biological circuit. This paper adds to a body of work that suggests that the structural or functional properties of brain circuits are the solution to an optimization problem implied by the task that they have to perform – in this case, the ability to detect looming motion.
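
      The dependence on the ratio of size to speed mentioned above follows from simple geometry, which can be sketched in Python as below. This is an illustrative sketch, not the stimulus code from the manuscript; it assumes a flat object of half-width R approaching head-on at constant speed v, and the function name `angular_size` is hypothetical. Under that assumption the angular size at time t before collision is 2·atan(R / (v·t)), so it depends on R and v only through R/v.

```python
import math


def angular_size(half_width, speed, time_to_collision):
    """Full angle (radians) subtended by a flat object of the given half-width
    approaching head-on at constant speed; time_to_collision must be > 0."""
    distance = speed * time_to_collision
    return 2.0 * math.atan(half_width / distance)


# Two hypothetical objects with the same size-to-speed ratio (R/v = 0.5 s)
small_slow = [angular_size(1.0, 2.0, t) for t in (2.0, 1.0, 0.5)]
large_fast = [angular_size(2.0, 4.0, t) for t in (2.0, 1.0, 0.5)]
print([round(math.degrees(a), 1) for a in small_slow])
print([round(math.degrees(a), 1) for a in large_fast])
```

      The two objects produce the same angular-size time course, which is why angular size alone cannot distinguish a small, slow object from a large, fast one with the same R/v.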

      The conclusions of the paper seem well supported within the class of models that was considered. The choice of class is, however, rather narrow and could be better explained and analyzed.

      1. One potentially confusing aspect of the work is that there are in fact three major types of solutions that are found, not only two as described in the abstract: apart from “outward” (similar to LPLC2) and “inward” (dissimilar to LPLC2) there are also “unstructured” solutions that, as far as I understand, basically fail to perform the task – although their performance isn’t adequately discussed. The authors comment on this in the Discussion, suggesting that the unstructured networks are local optima where the stochastic gradient descent algorithm they use for optimization gets stuck. They argue that evolutionary processes would be unlikely to linger there, implying that it might be fine to ignore these solutions. While reasonable, this claim is difficult to assess without more discussion of these results. These solutions are not a rare occurrence: according to the Methods, over half of the trained networks end up in the “unstructured” pile.

      In our initial submission, ‘unstructured solution’ was an unfortunate name for these solutions. In this revision, we call them ‘zero solutions’, since all the elements in the filters are zero (or very close to zero). Please see Essential Revision 2 for a more detailed answer to this comment.

      1. The stimuli used in the paper are very simple: basically rigid, featureless objects moving in a straight line and at constant velocity, or rotating at constant angular velocity. Naturalistic stimuli are likely to be much more complex, which could hurt the training process. This is only briefly touched upon in the Discussion, leaving open the question of how the results of this work would change in more natural settings.

      This is an interesting point. Please see Essential Revision 3 for our responses and changes.

      1. The authors impose a 90-degree rotation symmetry as well as a reflection symmetry on the connection weights to the four layers of motion detectors that are sensitive to the four cardinal directions. Given that the training data that is used also has these symmetries, the question arises whether imposing these symmetries by hand was necessary. This is unfortunately not discussed in the paper.

      The imposed symmetries are not strictly necessary. Please see Essential Revisions 1 for details about how we have addressed this comment.
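
      For readers unfamiliar with this kind of weight tying, the following Python sketch shows one way a 90-degree rotation symmetry and a mirror symmetry can be imposed by construction: a single base filter is mirror-symmetrized and then rotated to produce the filters for the four cardinal directions. It is a sketch under stated assumptions, not the authors' implementation; the function names, the choice of mirror axis and the 3×3 base filter are hypothetical.

```python
def rotate90(matrix):
    """Rotate a square matrix 90 degrees counterclockwise."""
    n = len(matrix)
    return [[matrix[j][n - 1 - i] for j in range(n)] for i in range(n)]


def mirror_symmetrize(matrix):
    """Average a square matrix with its top-bottom mirror image."""
    n = len(matrix)
    return [[0.5 * (matrix[i][j] + matrix[n - 1 - i][j]) for j in range(n)]
            for i in range(n)]


def build_symmetric_filters(base):
    """Return four filters: the symmetrized base and its three successive
    90-degree rotations, one per cardinal direction (assumed ordering)."""
    filters = [mirror_symmetrize(base)]
    for _ in range(3):
        filters.append(rotate90(filters[-1]))
    return filters


# Hypothetical 3x3 base filter, used only to illustrate the construction
base = [[1.0, 0.0, 2.0],
        [0.0, 3.0, 0.0],
        [5.0, 0.0, 0.0]]
filters = build_symmetric_filters(base)
for f in filters:
    print(f)
```

      With this construction only the base filter carries free parameters, so the symmetry cannot be broken during training. Learning the four filters independently instead would let the data determine whether the symmetry emerges.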

      1. One highly confusing aspect is that there is, in fact, an additional symmetry: the same filters are used for all the subunits. The difference between the different subunits seems to be only in the inputs that they receive – i.e., that they are responding to different parts of the visual field. This is only really apparent from the Methods. Given again the rotational symmetry of the inputs, it would be reasonable to assume that this symmetry could be learned, but this isn’t discussed or explained properly.

      Yes, we agree that this symmetry could be learned, but this requires a lot more training data, which is not practical in terms of computational cost. In addition, this imposed across-unit symmetry makes different models with different M’s have the same number of parameters, which is a nice property to have when studying how the population size affects the model performance and trained filters.
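
      The parameter-count argument can be illustrated with a small Python sketch. This is a simplified count, not the model in the manuscript: it assumes one excitatory and one inhibitory square filter per motion direction, ignores bias terms and the additional rotation and mirror constraints, and uses a hypothetical filter size of 12.

```python
def count_parameters(num_units, filter_size, num_directions=4,
                     shared_across_units=True):
    """Count filter weights for a population of units.

    Assumes (for illustration) one excitatory and one inhibitory
    filter_size x filter_size filter per motion direction.
    """
    per_unit = 2 * num_directions * filter_size * filter_size
    return per_unit if shared_across_units else per_unit * num_units


for m in (1, 16, 256):  # population sizes M
    shared = count_parameters(m, 12)
    unshared = count_parameters(m, 12, shared_across_units=False)
    print(f"M={m}: shared={shared}, unshared={unshared}")
```

      With shared filters the count stays constant as M grows, so differences between models of different population sizes cannot be attributed to model capacity. Without sharing, the count grows linearly with M.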

      1. The authors say that the “outward” model reproduces biology but I’m not sure that the details of the lobula plate circuitry match this claim. For instance, LPi neurons typically have broad arbors, making location-specific inhibitory inputs unlikely. And is there evidence that the inhibitory inputs are limited to a small region, like in the model?

      The LPi neurons seem to be similar in size to the LPLC2 dendrites in the lobula plate (Klapoetke et al. (Nature, 2017), Figure 5K and Extended Data Figure 9). In our outward models (both linear receptive field and rectified inhibition), the inhibitory components are larger than the excitatory components when the number of units is large, which is at least consistent with potentially larger pooling of inhibitory signals than excitatory ones. Please refer to Essential Revision 4.

      1. Why not test the predictions of the model by analyzing the inputs onto the LPLC2 neurons using connectomics datasets?

      We would have loved to do this. Regrettably, the hemibrain dataset lopped off virtually all of the lobula plate. Our response to Essential Revision 1 expands a bit more on this point.

      Reviewer #3 (Public Review):

      Although collision detecting neurons have been identified across animals, the computations they perform remain unresolved. Here, Zhou et al. train artificial neural networks to predict collisions across a diverse set of visual stimuli and constrain network geometry using the known anatomy of a Drosophila looming detector cell type, LPLC2. Zhou et al. demonstrate that trained networks converge upon three solution types: an unstructured solution, a solution where inward motion is excitatory, and a solution where outward motion is excitatory. Interestingly, the solution excited by outward motion is also inhibited by inward motion as predicted for LPLC2 computations, and the output of these trained networks is highly similar to measured LPLC2 responses across stimuli.

      1. Strengths: a. The novelty of this study is that the networks are trained to solve a problem (collision detection) instead of being trained on neural data, but as a result are able to reproduce neural data. b. The authors investigate how collision detection solutions change when looming is computed by a single neuron versus a population of neurons. This is particularly interesting because looming detectors have been identified at both population and single neuron levels. These results shed light on why many different collision detection computations have been proposed across neurons and across species, as they may face different anatomical constraints. The results also provide novel computations that can be further investigated in vivo. c. The manuscript is well written, the figures are clear, and the movies are very helpful in understanding the approach and the results.

      2. Limitations: a. The findings could be strengthened by a more thorough characterization across the different solutions. For example, only two of many outward solutions are compared to actual neural data, and there is no explanation for why these two solutions were selected and whether they are representative of the entire category of outward solutions. There is also no metric for evaluating how well these solutions match the neural data.

      For a more detailed response to this comment, please see Essential Revisions 5. In particular, our focus on the linear receptive field model has eliminated this issue with the distribution of solutions in the main presentation of the results. We believe this is overall less confusing than the prior presentation of the more complicated rectified inhibition model.

      b. The inward solutions are dropped from the last section of the paper; however, it would be very interesting to see the output of example inward solutions in comparison to actual neural data.

      Please see Essential Revisions 2. We have added the inward solutions to Figure 10 in the supplemental figures.

      c. Within outward solutions, there are cases where inward inhibition is completely absent which does not follow what is known about LPLC2. The authors should mention this and also provide a comparison between outward solutions with or without inhibition.

      With the simpler, linear RF model, these are no longer the focus of the study. They do still exist in the rectified inhibition model solutions, which have substantial variability.
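
      The distinction between the two model variants named above can be sketched in a few lines. This is a simplified reading, not the authors' implementation: the function names, the two-element motion vector (outward, inward), the weights, and the output rectification are all assumptions made for illustration. In the linear variant a single signed filter is summed; in the rectified-inhibition variant the inhibitory drive is clamped at zero before it is subtracted, so it can only suppress the response.

```python
def relu(x):
    """Rectifier: pass positive values, clamp negatives to zero."""
    return max(0.0, x)


def linear_rf_response(motion, weights, bias=0.0):
    """Hypothetical linear receptive field unit.

    A single signed filter: positive weights act as excitation,
    negative weights as inhibition. Output rectification is assumed.
    """
    drive = sum(w * m for w, m in zip(weights, motion)) + bias
    return relu(drive)


def rectified_inhibition_response(motion, exc_weights, inh_weights, bias=0.0):
    """Hypothetical rectified-inhibition unit.

    Excitatory and inhibitory filters are kept separate, and the
    inhibitory drive is rectified before being subtracted.
    """
    excitation = sum(w * m for w, m in zip(exc_weights, motion))
    inhibition = relu(sum(w * m for w, m in zip(inh_weights, motion)))
    return relu(excitation - inhibition + bias)


# Illustrative inputs: motion = (outward signal, inward signal).
outward_only = [1.0, 0.0]
both = [1.0, 1.0]

# Linear unit excited by outward and inhibited by inward motion.
print(linear_rf_response(outward_only, [1.0, -1.0]))
print(linear_rf_response(both, [1.0, -1.0]))

# Rectified-inhibition unit with the same preference.
print(rectified_inhibition_response(both, [1.0, 0.0], [0.0, 1.0]))
```

      Setting the inhibitory weights to zero in the second function gives an outward-excited unit with no inward inhibition, which corresponds to the case raised in point c.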

    1. Author Response

      Reviewer #1 (Public Review):

      This group has examined basement membrane composition using sophisticated technical methods previously. Here they have methodically examined kidney organoids for their resemblance to mammalian foetal kidneys in the temporal expression of membrane proteins. They continue this through to adulthood and use peripheral blood leucocytes to demonstrate the effect of a COL4A5 mutation on the expression of basement membrane components. The manuscript's strengths are its methodical nature and the number of proteins examined, as well as building on previous work. Its weaknesses are that we do not know how good a model the organoid is for Alport syndrome and whether it results in an intact glomerular basement membrane. So far, this manuscript has demonstrated that the organoids are consistent with what we know - but can it also tell us new things? In addition, it has only examined one pathogenic Alport COL4A5 variant and this person also had a COL4A4 variant and thus complicated disease.

      Thank you for reviewing our paper and for highlighting the strengths of our work. Regarding the novelty, we consider the primary advance of our manuscript to be the focus on the assembly and remodelling of extracellular matrix in kidney development. Through this focus we demonstrate that kidney organoids are a valuable human, multicellular system correlating with the matrix changes observed in mammalian kidney development at both gene expression and protein levels. Finally, we have shown that human kidney organoids can be used to study basement membrane assembly in health and disease, using Alport patient-derived organoids. As far as we are aware, this is the first time-course study using organoids to track the intrinsic changes in basement membranes during development. As such it will facilitate further studies into developmental transitions in basement membrane components during kidney development and permit detailed evaluation of the early changes that occur in genetic conditions that affect basement membrane assembly.

      Reviewer #2 (Public Review):

      Morais et al. provide a convincing model for understanding basement membrane (BM) biology and interactions of BM components. The key findings of this paper are to establish a model that recapitulates the same biology and chemistry that is occurring during equivalent kidney development in humans, primarily. Utilizing kidney organoids, the authors characterize the spatiotemporal relationship of the proteins within kidney organoids as they form distinct basement membrane structures. The kidney is a vital system in itself for understanding basement membranes among many different organs/tissues, as the kidney has served as a genesis for discoveries over the last 60 years. Here the authors describe not only the timing of proteins in the development of kidney organoid BMs, but also the spatial relationships. Importantly, as a kidney BM model, the authors recapitulated the disease state of Alport syndrome (AS), a syndrome involving the disruption of the collagen IV α345 network in kidneys, an essential component of kidney BMs. Furthermore, they find that this model of kidney organoids derived from AS patients had the same hallmarks during development as AS in a human patient, including laminin overcompensation as a result of α345 network disruption.

      This manuscript provides an invaluable model for understanding overall BM biology and disease progression, and especially so for kidney BM biology and kidney diseases. The potential for this model to study any number of missense variations within any number of proteins within a tractable and functionally identical BM is worth noting and exploring by other researchers.

      We thank the reviewer for highlighting the strengths of our manuscript.

      In general, the weaknesses of this article are insignificant as this manuscript aims to provide functional proof of concept of kidney organoids as a model for understanding human BM disease. Important, however, is the assumption that kidney BMs might represent all BMs. The diversity of BMs across tissues within humans alone is significant. Amongst different organisms from a broader evolutionary standpoint than just fly, C. elegans, mouse, and human, BMs are very likely exceptionally diverse, from the earliest animal BMs to different human tissue BMs. While this model provides an important model for understanding BM biology, a caveat that a kidney BM will functionally differ from a lens BM should be apparent and noted. However, the open-ended question of how to create tractable models like kidney organoids in other tissue systems will be of use in stimulating the matrix, proteomic, and structural biology fields.

      Thank you for this insightful comment. We agree that BMs are diverse and dynamic both in composition and structure throughout life. We have made alterations throughout the manuscript to highlight this point and to further emphasise the focus of this manuscript on kidney development.

      Reviewer #3 (Public Review):

      The emergence of methods to convert human induced pluripotent stem cells (iPSCs) into cultured kidney organoids that phenocopy the normal progression of embryonic and fetal differentiation represents a major advance in the study of normal and defective renal morphogenesis. This progress has been enriched by the addition of temporal/cell-type specific proteomics.

      The current study largely focusses on the site-specific compositional changes that occur in basement membranes (BM) that form on different abluminal cell surfaces as differentiation advances. A general model of BM assembly from earlier studies provides a foundation upon which to interpret organoid kidney development. Laminins initiate BM assembly by binding to cognate cell-surface receptors, polymerizing, and binding to secreted nidogens, proteoglycans and collagen type IV, the last forming a second stabilizing polymer network. The iPSC differentiation system reveals the assembly and turnover of BM components consistent with the above, but now provides detailed information on the accumulation and turnover of different components in the key cell types through the different steps of differentiation with proteomic correlation. The approach also enables the analysis of the assembly defects and consequences arising from human congenital diseases as was shown with a type IV collagen alpha 5 subunit in organoids derived from Alport cells.

      In combining organoid kidney culturing with laser microdissection and proteomic analysis, the authors have advanced use of the new tool compared to a 2018 study (Hale LJ et al., Nat. Communications), pushing the model from 18 to 25 days of differentiation and focusing more on BM formation during development. Evidence is presented to show that the major cell types, importantly including vascular endothelial cells, appear in the organoids in a temporal sequence. Relevant changes in BM-associated components are also shown. BM staining patterns are shown to change with emergence of laminin alpha5, laminin beta2 and collagen-IV alpha3 (replacing laminin beta1 and collagen-IV alpha1/2) at later stages. Organoids generated from iPSC cells derived from an X-linked missense variant of COL4A5 generated glomeruli containing alpha3/4/5, but with increases in laminin beta2, a known compensatory outcome.

      The evaluation of later renal differentiation stages is particularly critical for the study of the glomerulus in which BM components undergo isoform switches that normally correlate with glomerular vascularization. A limitation of previous differentiation glomerular models has been the inability to show formation of the vascular tuft and the associated morphological changes as well as to show podocytes form inter-digitations. In that light, the current study could be strengthened by showing the ultrastructure of the day 25 glomeruli with identification of the BMs and different glomerular cell types (noting in particular if vascular endothelial cells are beginning to organize into the morphology of vascular tufts), and revealing the appearance of podocyte processes. It would also benefit the reader to enumerate the strengths as well as limitations with the culture model and how this work compares to previous studies.

      The current submission addresses temporal and tissue-specific BM changes during organoid kidney development. Day 25 kidney organoids contained tubules, stroma, and glomeruli with partial resemblance to (mouse) E19 kidney. Tubular and glomerular BMs are seen to form, the latter showing the expected switch from alpha-1/beta-1 laminins to alpha-5/beta-2 laminins, and alpha1/2 type IV collagens to alpha3-containing type IV collagens required for glomerular maturation.

      Thank you for reviewing our manuscript and for your summary of how our findings relate to other seminal studies in the field of basement membrane assembly.

    1. Author Response:

      Reviewer #2:

      The overall approach of this study is to compare gametocyte related parameters of infected blood samples from asymptomatic children, in some cases followed over time, with matched samples from uncomplicated malaria infections during the transmission season. A variety of parameters are analysed to investigate which mechanisms are used by the parasite to ensure that gametocytes are present in adequate numbers at the end of the dry season to be efficiently taken by mosquitoes reappearing at the start of the rainy season.

      Authors analyse expression levels of 333 P. falciparum gametocyte specific transcripts from a previously published transcriptome study on infected blood samples from asymptomatic children and from children with uncomplicated malaria. They conduct a Principal Component Analysis on 12 samples from each condition, revealing that PCA1 explains 65% of the seasonality variance, and identify 146 transcripts upregulated and 59 downregulated in the asymptomatic vs symptomatic carriers. This result may be expected as gametocyte physiology is likely different in the two very different conditions.

      The following step is investigating male and female gametocyte densities and relative proportions on asexual stages by Realtime analysis of specific transcripts. The main conclusions of this part of the manuscript are 1- that symptomatic malaria cases in the wet season have higher parasite levels and consequently gametocyte densities than asymptomatic cases in the dry season and 2- that the lower parasitaemia levels in the asymptomatic cases show a comparatively higher proportion of sexual stages, precisely of female and, albeit not significantly, of male gametocytes.

      Some observations and unclear points on this part of the work are the following:

      • More details are needed to clarify how density and proportion of the different parasite stages presented in Figure 1 panels C, D and E have been derived from the Realtime experiments.

      • Male and female gametocyte numbers are calculated using published calibration curves for genes pfs25 (female) and pfMGET (males). Incidentally, these are published in https://malariajournal.biomedcentral.com/articles/10.1186/s12936-018-2584-y and not in reference 32.

      This has now been corrected.

      • The calibration curves for the pfGLY and the pf17 transcripts, used to quantify total parasites and total gametocytes, respectively, and those used to quantify sexual commitment are instead missing and should be shown or referred to. This is relevant both for the determination of parasite numbers and for the use of the DeltaDeltaCt method.

      We now provide the standard curves obtained with serial dilutions of synthetic DNA of P25, PfMGET, Pfg17 and Glycine-tRNA ligase and the respective primer efficiencies in the new Fig. S3.

      • In panel C, it is intriguing that total parasite densities coincide with gametocyte densities. Comparing total gametocyte density in panel C with those of male and female gametocytes in panel D, it is puzzling that the former, quantified by the gametocyte specific marker pfg17, dramatically differs from those determined using pfs25 and pfMGET.

      We believe it is difficult to compare levels of gametocyte densities across different transcripts, because the best approximation to equal primer efficiencies and amplifications often falls short of perfect. Essuman et al. (JID 2017) also report that the gametocyte-specific marker Pfg17 clearly outperforms P25 in detecting gametocyte-positive individuals among asymptomatic carriers, potentially explaining the observed differences.

      • It would be interesting to see if the proportion of male and female gametocytes on total parasites, calculated by the DeltaDeltaCt method in panel E, is comparable with the value that can be calculated from the density data in panels C and D.

      • Finally, it would be valuable to derive from these data the gametocyte sex ratio to more synthetically describe the different infections.

      We used qPCR and made use of standard curves to quantify the density of total, and male and female gametocytes to allow comparisons between groups of samples over time and different clinical symptoms. We did not compare these attributes between or across the groups, given the difficulty in achieving perfect purity in the standards with sorted gametocytes and the limitations imposed by the imperfect match between the efficiencies of different primers. We believe, however, that our approach is reasonable and adapted to the questions addressed and proposed interpretations.
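
      The quantities discussed in this exchange (standard curves, primer efficiencies, the DeltaDeltaCt method and a sex ratio) follow standard qPCR arithmetic, which can be sketched as below. This is a minimal illustration, not the authors' analysis pipeline: all function names are hypothetical, the dilution series is invented for the example and is not data from the study, and the sex ratio is defined here as the male fraction of gametocytes, which is one of several conventions. Note that the DeltaDeltaCt formula assumes close to 100% efficiency for both primer sets, which is exactly the limitation raised above.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct against log10(template copies).

    Returns (slope, intercept) of the line Ct = slope * log10(copies) + intercept.
    """
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Efficiency as a fraction; 1.0 corresponds to perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate template copies from a Ct value."""
    return 10 ** ((ct - intercept) / slope)


def fold_change_ddct(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Relative expression by the DeltaDeltaCt method (2 ** -ddCt).

    Assumes both primer sets amplify with roughly 100% efficiency.
    """
    delta_sample = ct_target - ct_ref
    delta_calibrator = ct_target_cal - ct_ref_cal
    return 2.0 ** -(delta_sample - delta_calibrator)


def male_fraction(male_density, female_density):
    """Sex ratio expressed as the male share of all gametocytes (assumed convention)."""
    total = male_density + female_density
    if total == 0:
        return None
    return male_density / total


# Hypothetical ten-fold dilution series of a synthetic standard (illustrative only).
copies = [1e6, 1e5, 1e4, 1e3, 1e2]
cts = [15.0, 18.4, 21.8, 25.2, 28.6]
slope, intercept = fit_standard_curve(copies, cts)
print(slope, intercept, amplification_efficiency(slope))
```

      Comparing densities derived from two different transcripts is only as reliable as the agreement between their fitted slopes, which is why per-primer efficiencies matter for the comparison between panels C and D.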

      A general point on this part of the work is that the transcriptome work was conducted on 12 vs 12 samples from asymptomatic and clinical cases, whereas the subsequent experiments were conducted on 35 samples from the dry season and 27 clinical cases. This is not very clear from the text and it rather looks like that the latter experiments were designed to better characterise the samples used in the transcriptome analysis. Also, the subsequent analysis of phospholipid levels and of the sexual commitment transcript levels appear to be performed on a subset of the samples.

      We used different sets of samples for the different analyses of Malian samples and we improved the corresponding explanations within the results section. In figure 1 samples collected in 2012, the same year as the 12 + 12 samples used in the RNAseq analysis reported in Andrade et al. (presented on the PCA and heatmap) were used for both parasite transcript levels and plasma phospholipid analyses. In figures 2 and 4 we show samples collected in 2017/18 to better characterize the phenotype seen in Fig 1 with the inclusion of more time-points and longitudinal analyses of parasite transcript levels and plasma phospholipids. We revised the text to improve the presentation of the correct experimental layout throughout the results section.

      In the following section, authors describe a longitudinal study on a smaller cohort of children from the above asymptomatic cases. Analysis of parasite densities, of phospholipid levels and of transcript levels of genes involved in sexual commitment essentially confirms the single point analyses described in the first part of the manuscript. The issues raised above on the determination of male and female gametocyte numbers and relative proportions conducted on these samples apply here as well.

      As 2-4-fold reduced levels of phospholipids, described to affect the rate of gametocytogenesis, were measured in the samples from uncomplicated malaria, sera from symptomatic and asymptomatic cases were compared to measure whether those with reduced phospholipid levels induced sexual commitment of in vitro cultivated parasites. However, as the authors themselves state, the observed differences in Lyso-PC levels between the sera were predictably too small to produce an effect in these experiments, as compared with the fold difference described to see an effect on sexual commitment.

      Having concluded so far that level of early gametocyte and sexual commitment markers do not change in the wet vs the dry season and that the low parasitaemias associated with the latter condition show a higher proportion of sexual stage transcripts/cells, another mRNA expression comparison has been conducted on a set of 163 transcripts selected as some "define early stages" and others "are characteristic of late stages". Assumptions, rationale and conclusions of this analysis are not entirely clear. The predictive value of the gene sets is not clear: available gametocyte stage-specific transcriptomic data fail to identify large numbers of transcripts unambiguously and specifically associated to early vs late gametocytes, so use of the 163 genes requires details on their stage specific diagnostic power. The analysis shows that 43 out of 54 "late gametocyte" transcripts vs 42 out of 109 "early gametocyte" transcripts are upregulated in the samples from asymptomatic cases, which leads authors to propose that this condition has "an effect during the 8-12 days of gametocyte development, in both/either the sexual and/or the asexual parasite compartments". This part requires attention and a clearer formulation of rationale and conclusions.

      We now clarify how the subset of Early/Late gametocyte genes was defined and added a new column (F) in Table S1 to help the reader.
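
      The contrast the reviewer describes (43 of 54 "late gametocyte" transcripts versus 42 of 109 "early gametocyte" transcripts upregulated) can be made explicit with a short calculation. The sketch below computes the two fractions and, as one possible way to ask whether the late set is over-represented among upregulated transcripts, a one-sided hypergeometric tail probability. The test is an illustrative assumption and is not an analysis reported by the authors or requested by the reviewer; the function names are hypothetical.

```python
from math import comb

# Counts quoted in the review.
late_up, late_total = 43, 54
early_up, early_total = 42, 109


def fraction(up, total):
    """Share of a gene set that is upregulated."""
    return up / total


def hypergeom_upper_tail(k, n_draw, n_success, n_total):
    """P(X >= k) when n_draw items are drawn without replacement from
    n_total items, of which n_success are 'successes'."""
    top = min(n_draw, n_success)
    favourable = sum(
        comb(n_success, i) * comb(n_total - n_success, n_draw - i)
        for i in range(k, top + 1)
    )
    return favourable / comb(n_total, n_draw)


all_genes = late_total + early_total   # 163 transcripts in the selected set
all_up = late_up + early_up            # 85 of them upregulated

print(round(fraction(late_up, late_total), 3))    # 0.796
print(round(fraction(early_up, early_total), 3))  # 0.385
print(hypergeom_upper_tail(late_up, late_total, all_up, all_genes))
```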

      In conclusion, this work produced evidence consistent with the hypothesis that the efficient clearance of asexual stages in low parasitaemia asymptomatic infections explains the increase in the proportion of mature circulating gametocytes.

    1. Author Response

      Reviewer #1 (Public Review):

      Welzel and Schuster propose an interesting hypothesis that connexins replaced innexins in chordate gap junctions due to an evolutionary bottleneck. The majority of the animal phyla possess multiple innexins, and some of them form gap junctions while others do not. Chordates have only three innexins which are unable to form gap junctions due to glycosylation of extracellular loops; chordate gap junctions are built of connexins. The authors analysed innexin sequences from multiple chordate and non-chordate phyla and discovered that non-chordates possess both types of innexins: with and without N-glycosylation sites in their extracellular loops. Because in chordates there are only glycosylated innexins, the authors conclude that non-glycosylated innexins were lost in the last common ancestor of chordates (bottleneck effect). While in Lancelets there are no connexins, they are present in Tunicates evidencing that they evolved in the last common ancestor of Tunicates and Vertebrates.

      Strengths

      The strength of this work is that the authors analysed over 2000 innexins from multiple animal groups: seven non-chordate phyla and nine groups of chordates; both non-bilaterian and bilaterian phyla were included. This comprehensive dataset allowed the authors to propose a general scenario of gap junction evolution. In my opinion, the bioinformatics analysis reported in this work convincingly supports the bottleneck mechanism in the innexin evolution.

      Weaknesses

      The weaknesses of the work are rather minor and do not affect the main conclusions: (1) There is no experimental proof that the glycosylation occurs on the same motifs in all the animal phyla. However, such experiments would be quite challenging; therefore bioinformatics prediction is an acceptable approach. (2) There is no experimental proof that the glycosylated innexins of Lancelets and Tunicates don't form a gap junction. In some animals glycosylated innexins still do form gap junctions; therefore it is possible that it is the case in Lancelets and Tunicates too, especially considering that the glycosylation sites are not conserved between Vertebrates, Tunicates and Lancelets. In this case it would mean that after the loss of multiple innexins in the last common ancestor of chordates, innexins lost the ability to form gap junctions only in vertebrates.

      We now discuss this scenario in more detail (page 8). It is correct that the glycosylation sites are conserved across the vertebrates but not between vertebrates, tunicates and lancelets. However, the alignment of the four lancelet innexins, which we have now additionally included in Figure 2, demonstrates that the glycosylation sites are highly conserved within lancelets. We fully agree that there is no experimental proof that the lancelet innexins do not form gap junctions and include this possibility.
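
      The bioinformatics prediction discussed here rests on locating N-glycosylation consensus motifs. A minimal sketch of such a scan is given below, assuming the canonical sequon N-X-S/T with X being any residue except proline; the source does not state which prediction tool or motif definition the authors used, so the pattern, the function names, the toy sequence and the loop coordinates are all illustrative assumptions.

```python
import re

# Canonical N-glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead allows overlapping motifs to be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence):
    """Return (1-based position, motif) for every sequon in a protein sequence."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq)]


def sequons_in_region(sequence, start, end):
    """Keep only sequons whose Asn lies within a 1-based inclusive region,
    for example a predicted extracellular loop (coordinates are hypothetical)."""
    return [(pos, motif) for pos, motif in find_sequons(sequence)
            if start <= pos <= end]


# Toy sequence, not a real innexin.
toy = "MANGSANPTQNNTS"
print(find_sequons(toy))             # [(3, 'NGS'), (11, 'NNT'), (12, 'NTS')]
print(sequons_in_region(toy, 1, 5))  # [(3, 'NGS')]
```

      A motif match only predicts a potential site; as the reviewer notes, whether the residue is actually glycosylated in a given species would need experimental confirmation.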

      (3) It is not clear if the authors aimed to use only genomic data or both genomic and transcriptomic. While it is stated that "The taxonomic groups that we have analyzed in this study were constrained by the availability of publicly available genomic data", the majority of datasets available on the Neurobase (used in this study) are ctenophore transcriptomes; the only ctenophore genome dataset is from Pleurobrachia bachei. Currently there are two other ctenophore genomes available: from Mnemiopsis leidyi (Ryan JF et al, 2013) and Hormiphora californensis (Schultz D et al, 2021). Additionally, genomic data are available for more cnidarian species too (e.g. sea anemones Nematostella vectensis, Exaiptasia pallida, Actinia tenebrosa and multiple corals (Shinzato C et al, 2020)).

      Thank you for pointing this out. We added the missing information about the transcriptomic data to the materials and methods section. In addition, we also added the innexin sequences of 14 additional species to our analysis (3 ctenophores and 11 cnidarians).

      (4) Line 210: “innexins were recruited as gap junction proteins in the common cnidarian/bilaterian ancestor” - gap junctions have been reported in ctenophores as well (Satterlie RA & Case JF, 1978) therefore it probably happened much earlier (in the last common ancestor of animals).

      We agree, thank you! We clarified this. (page 8, line 204).

      Reviewer #2 (Public Review):

      1) Understanding exactly the situation in chordates and non-chordate deuterostomes is key to accurately reconstructing the evolutionary steps at the base of chordates. The authors should increase their sampling in these important groups and include hemichordates and other xenambulacrarians.

      We absolutely agree and have increased the sampling at the base of chordates including genomic as well as transcriptomic data of xenacoelomorphs, echinoderms and hemichordates into our analysis. We feel that it really was worth the effort: We identified innexins in 4 xenacoelomorphs and in 4 echinoderms (where previously no innexins were known). We also found innexin-like fragments in hemichordate transcriptomes suggesting that also hemichordates have innexins. We did not find connexins in these taxa (see Fig. 3).

      In Fig. 2. the alignments could include the non-vertebrate chordates (tunicate, lancelet) and lampreys to show whether the NGS sites are conserved in these taxa.

      Yes, this is a good point! We include innexin alignments of tunicates, lancelets and lampreys in Fig. 2.

      Tunicates have both innexins and connexins, does the NGS in innexin align to that of vertebrates?

      The NGS in tunicates are highly variable and do not align to that of vertebrates. We assume that this is due to the extremely fast evolution of the tunicate genomes and the high amino acid evolutionary rates in tunicates. We have discussed the conservation of NGS in tunicates (page 6, line 144).

      Please also show the situation with hemichordates in Fig 3.

      Yes, see above.

      2) The authors should discuss the genomic patterns also in light of the ultrastructural evidence from the literature. For example, their data suggest that cephalochordates lack gap junctions.

      "The most important finding is that the sequence of the only innexin of lancelets, which do not yet express connexins (Mikalsen et al., 2021; Slivko-Koltchik et al., 2019) (Figure 3D), contains a NGS in its extracellular loop 1. This suggests that the most basal chordates not only had a limited number of innexins but might also not be able to form functional gap junctions"

      Does this mean that lancelets have no gap junctions? The authors in particular should check and discuss these studies:

      Tissue and Cell Volume 19, Issue 3, 1987, Pages 399-411 Cell junctions in amphioxus (Cephalochordata): A thin section and freeze-fracture study https://doi.org/10.1016/0040-8166(87)90035-8

      This study finds no gap junctions in amphioxus epidermis, alimentary tract and notochord.

      Primary Sensory Cells in the Skin of Amphioxus (Branchiostoma lanceolatum (P)) Erik Baatrup, 1981 https://doi.org/10.1111/j.1463-6395.1981.tb00624.x

      In particular: "This agrees with the description (Baskin 1975) of the epidermal junctional complex of Branchiostoma californiense, but in addition this author found a membrane apposition resembling a gap junction. This was not observed in the present investigation of Branchiostoma lanceolatum."

      but some authors described gap junction like structures https://europepmc.org/article/med/2628486

      Gap junctions are common in tunicates, this should also be mentioned:

      Georges, 1979 D. Georges, Gap and tight junctions in tunicates. Study in conventional and freeze-fracture techniques Tissue & Cell, 11 (1979), pp. 781-792

      In echinoderms, there are gap junctions but no connexins: Slivko-Koltchik GA, Kuznetsov VP, Panchin YV. Are there gap junctions without connexins or pannexins? BMC Evol Biol. 2019 Feb 26;19(Suppl 1):46. PMID: 30813901; PMCID: PMC6391747; DOI: 10.1186/s12862-019-1369-4

      Thank you for pointing this out. We have discussed the question whether lancelets have gap junctions or not in the revised manuscript and added the suggested literature (page 8, line 190).

      Reviewer #3 (Public Review):

      The gap junction-forming proteins, vertebrate connexins and invertebrate innexins, are two distinct protein families with very similar structures and functions. In the process of evolution, innexins first arose in invertebrates, followed by connexins in vertebrates.

      The authors focused on the extracellular glycosylation site in innexins, that inhibit channel coupling between two cells, and analyzed available innexin sequences using the genomic database and reported sequences.

      The results showed that, as phylogenetic evolution progresses, innexins lose their diversity and converge only on innexins that undergo glycosylation. And connexins without glycosylation sites arose as new gap junction-forming proteins. The authors proposed a new evolutionary scenario in which the switching of gap junction protein from invertebrate innexins to vertebrate connexins is due to the loss of diversity (especially glycosylation) of innexins.

      Strengths:

      This study, which focuses on the molecular evolution involved in the biologically important mechanism of gap junctions, is significant, and will influence many future studies. Overall, the data were properly analyzed, and the visible diagrams have been created based on a vast amount of analysis. 

      Weaknesses:

      1) The authors discussed the decrease or appearance of specific genes based on the results obtained from comprehensive sequence analysis. However, in order to discuss the number of specific genes in each animal species, especially to prove that a particular gene does not exist, the quantity and quality of the genome database greatly affect the results. It is unlikely that no gap junction proteins are present at all in Echinoderms. For animal phyla for which accurate sequence data are scarce, an additional search that includes TSA will yield better results.

      Thank you very much for pointing this out. As suggested by you as well as by reviewer 1, we have additionally searched for innexins and connexins in the non-chordate deuterostomes (xenacoelomorphs, echinoderms, hemichordates) as well as lancelets, tunicates and lampreys by including the NCBI TSA databases. By doing this, we additionally found innexins of two lancelets, four xenacoelomorphs and four echinoderms.

      2) The authors proposed a scenario in which connexins emerged due to the loss of gap junction forming ability of innexins during evolution. However, this study focused only on the presence or absence of glycosylation modifications and did not consider the number of proteins in the innexin and connexin families per each animal species. Normally, gap junction-forming proteins have multiple family proteins in each animal species, and these proteins are combined to regulate channel function. The authors' scenario does not explain the small number of variety of innexin and connexin family proteins found in each phylum of Echinoderms and lancelets, and this needs to be discussed.

      Thank you for pointing out this key aspect! Above all, it is the loss of functional diversity of putative gap junctions that is caused by having only one type of innexin. We therefore now set up this crucial point in the introduction, before introducing the bottleneck idea. It is very important to make clear how a small number of innexins limits the number of functionally specialized combinations of heterotypic/heteromeric gap junction channels (page 2 and 8). It is also a good idea to directly display the mean number of connexins and innexins per animal species in Figure 3B. This should also help to emphasize our main point: the decrease of innexin diversity and the increase of connexin diversity during early chordate evolution.
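
      The argument that few innexin types limit the number of possible channel combinations can be illustrated with simple counting. The sketch below is a deliberate simplification and not a calculation from the manuscript: it considers only channels built from two homomeric hemichannels, ignores heteromeric hemichannels, and assumes every pairing is compatible. The function name is hypothetical.

```python
from math import comb


def channel_combinations(n_types):
    """Count channel types formed by pairing homomeric hemichannels.

    Returns (homotypic, heterotypic): channels made of two identical
    hemichannels, and channels made of two different ones.
    Simplifying assumption: every pairing is structurally compatible.
    """
    homotypic = n_types
    heterotypic = comb(n_types, 2)
    return homotypic, heterotypic


for n in (1, 3, 8):
    print(n, channel_combinations(n))
# 1 (1, 0)
# 3 (3, 3)
# 8 (8, 28)
```

      With a single innexin type no heterotypic channel is possible at all, while the count grows quadratically as types are added; allowing heteromeric hemichannels would increase the numbers further.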

    1. Author Response

      Reviewer #1 (Public Review):

      In the present work Valperga and de Bono performed a forward genetic screen to identify candidate genes that would fulfill two criteria when mutant: 1) enhance an escape response to high ambient oxygen but 2) without modifications in the respective oxygen sensing neurons. They found that qui-1 mutants meet these criteria. qui-1 is known to act in the nociceptive neurons ASH and ADL (among others). The authors show that in qui-1 mutants ADL neurons are defective in normal chemo-sensation and upregulate neuropeptide secretion. This is associated with increased gene expression of neurosecretion components in ADL, among them two GPCR receptors (npr-22 and tkr-1); mutants in these receptors partially phenocopy the neurosecretion phenotype. The authors suggest an intriguing model in which ADL, upon loss of its normal sensory properties, relays peptidergic input from oxygen sensory circuits to peptidergic output towards yet unidentified downstream circuitry. This novel mechanism of sensory cross modality expands on previous work on cross modality in C. elegans, where until now only one example has been demonstrated, and where a different mechanism than in the present study was described (Rabinowitch 2016). These findings could serve as generalizable models for other systems where cross-modal plasticity has been observed. Although many conclusions in this work are substantiated by cell specific rescue of qui-1 in ADL, others are made based on correlated observations only. The study therefore would benefit from additional experiments that demonstrate a causal link between elevated neurosecretion in ADL and the associated changes in behavior. This could be achieved by ADL cell ablation experiments and specific interference with ADL neurosecretion.

      We thank the reviewer for this analysis of our work. We sought to address points raised in this summary using her/his suggestions.

      Reviewer #2 (Public Review):

      Loss of one sensory modality is often compensated with an increase in another sensory modality. Valperga and de Bono identify a possibly conserved mechanism that appears to heighten the worm's sensitivity to O2 while dampening other sensory responses. The mechanism that they discover suggests that increased neuropeptide secretion could be responsible for the overcompensation for a loss of a sense. The combined data based on forward genetic screening and behavioral analysis, imaging and genomics are convincing and interesting.

      1. I very much enjoyed reading a manuscript that uses 'good old' forward genetics to make an interesting discovery!

      2. The paper is well written and very easy to follow. The data quality and their display in the figures are very convincing, too.

      3. The proposed mechanism of using enhanced neuropeptide secretions for compensating the loss of one sensory modality with an increase of function of another is novel and could indeed be conserved.

      We are grateful to the reviewer for the encouraging review of our work.

      Reviewer #3 (Public Review):

      The work by Valperga and de Bono aims to uncover molecular components of cross-modal plasticity, a system-wide form of neuronal remodeling that responds to sensory loss by altering the performance of remaining sensory modalities. The study focuses on the interplay between oxygen-sensing and pheromone detection in C. elegans. The data presented are mostly convincing and revealing. However, the message and the overall context within which the findings are framed are problematic.

      The authors rightly assert that the molecular processes underlying cross-modal plasticity are not fully understood. However, they emphasize that the important challenge is to reveal genetic lesions that result in sensory loss and drive cross-modal plasticity. I find this to be over-specific and imprecise. There are many possible causes for sensory loss, some are genetic, some are non-genetic (e.g., certain diseases and injuries). In any case, the causes for sensory loss are usually independent of the processes that give rise to cross-modal plasticity. The genetics behind cross-modal plasticity enables the response to sensory loss, it does not cause the sensory loss. Genetic lesions to genes involved in cross-modal plasticity disrupt cross-modal plasticity, they don't induce it. Curiously, the authors sought to find single genes whose removal is simultaneously associated with both the loss of a sensory modality and the enhancement of another. This was done using a forward genetic screen for C. elegans mutants displaying enhanced oxygen sensation.

      We thank our reviewer for her/his thoughtful comments. We have revised our introduction to take account of her/his comments, and to remove the misleading statements s/he highlights.

      The analysis was further complicated by the fact that the screen was performed on strains whose oxygen sensitivity is already modified due to dysregulated activity in the RMG hub-and-spoke neural circuit, which integrates diverse sensory signals to control locomotion. Mutagenesis was performed on either the N2 strain, exhibiting RMG suppression, and thus decreased oxygen sensitivity, or flp-21 mutants, displaying excessive RMG activation, and increased oxygen sensitivity.

      We chose two genetic backgrounds for our mutant screens that attenuate the output of the RMG hub interneurons. Both backgrounds include a gain-of-function allele of the neuropeptide receptor NPR-1 that inhibits RMG output. The NPR-1 receptor has multiple peptide ligands, so in the second screen we reduced NPR-1 inhibitory signalling by deleting one of these ligands, FLP-21. Neither of the two strains we used, N2 or flp-21, shows appreciable O2 responses on food, and neither aggregates or accumulates on thicker parts of the food lawn, facilitating our screen (See Figure 1B).

      The screen yielded a gene, qui-1, whose dysfunction led to enhanced oxygen sensing (it is unclear if this is in the N2 or flp-21 background). The authors found that increased neuropeptide release from the pheromone-sensing neuron ADL underlies the increase in oxygen sensitivity. Furthermore, the qui-1 mutation was shown to diminish ADL pheromone responses. Therefore, a very particular genetic coupling between loss of pheromone sensation and enhanced oxygen sensitivity was revealed.

      We have indicated the parental origin of the qui-1 mutant in the revised manuscript.

      To generalize this finding, several additional mutant genes (not from the screen) were examined, including genes from the BBS family as well as wrt-6 and fig-1. They too displayed enhanced oxygen sensing linked to increased ADL neuropeptide secretion. However, their effects on ADL pheromone sensation were not reported. The main conclusion I draw from these findings is that the ADL neurons are able to modulate oxygen sensitivity by relaying information about oxygen levels from the RMG circuit to locomotor circuits via neuropeptide secretion. It is not at all clear that loss of pheromone sensation in the qui-1 case is the cause for increased neuropeptide release, or whether it is just one out of the many outcomes of mutating this gene. A much cleaner and more revealing experiment could have been, for example, to examine worms lacking the functional pheromone receptor OCR-2 in ADL. In fact, unlike qui-1 mutants, which showed diminished oxygen responses in ADL, previous work from the de Bono group (Fenk and de Bono 2017) demonstrated that ADL O2 responses are normal in ocr-2 mutants, indicating a profound difference between loss of pheromone sensitivity due to receptor dysfunction (ocr-2) and the unknown and broad effects of qui-1.

      We thank the reviewer for this important suggestion. We have sought to test our model with a functional experiment that selectively disrupts sensory input into the ADL neurons. To achieve this, we decided to knock down a protein required for intraflagellar transport, OSM-6, rather than the OCR-2 TRP channel subunit. OCR-2 mediates not only pheromone responses in ADL, but also O2-escape behavior (de Bono et al., 2002). This may reflect a broader role for OCR-2 in ADL than sensory transduction. Disrupting OSM-6 truncates sensory cilia and severely compromises many chemosensory responses, but only weakly reduces aggregation and O2 responses.

      To target OSM-6 degradation specifically to the ADL neurons we knocked in DNA encoding an Auxin Inducible Degron (AID) into the osm-6 locus, and expressed TIR1 in ADL to achieve cell-specificity. TIR1 is required for AID. We have added the new data to Figure 4F–G and Figure 4 – figure supplement 2. We show that expressing TIR1 in ADL disrupts OSM-6::AID function both in the presence and absence of Auxin. This agrees with recent work that tested the efficiency and specificity of the AID system (Hills-Muckey et al., 2021). A partial OSM-6::AID reduction in ADL recapitulates many of the phenotypes of qui-1 mutants, including increased neurosecretion from ADL, heightened ADL responses to O2 inputs and a small but significant enhancement of the O2-escape response. We think these new data support our interpretation that a change in ADL’s sensory properties leads to heightened response of ADL neurons to O2 inputs, a phenotype observed in qui-1 and multiple other sensory defective mutants and a hallmark of cross-modal plasticity. However, the effects of knocking down osm-6 on ADL function also appear to be complex, as the stronger osm-6 knockdown achieved by adding auxin to the osm-6::AID knockin animals expressing TIR1 in ADL, unexpectedly gives weaker phenotypes than when auxin is absent.

      In fact, it would be interesting if the authors could explain or speculate how qui-1 eliminates ADL O2 responses, and how neuropeptide signaling from the RMG circuit via the NPR-22 neuropeptide receptor bypasses this lack of response and drives enhanced neuropeptide secretion in ADL, as they report.

      We can only speculate why O2-evoked responses in ADL disappear in qui-1 mutants. One possibility is that ADL becomes less excitable due to the reconfigured gene expression associated with loss of qui-1 in ADL. This model would predict that selectively knocking down qui-1 in ADL would confer the same Ca2+ response phenotype. Blocking ADL neurosecretion with TeTx in qui-1 mutants would test if the increased ADL neurosecretion we describe feeds back to reduce the O2-evoked Ca2+ response in ADL. An alternative hypothesis is that the effect of disrupting qui-1 is non-cell-autonomous, altering excitatory or inhibitory input to ADL from other qui-1 expressing neurons. We have not tested if neurosecretion from other qui-1-expressing neurons is altered in qui-1 mutants.

      Strikingly, while disrupting qui-1 leads to loss of a measurable O2-evoked Ca2+ response in ADL, these neurons display elevated O2-evoked neurosecretion in qui-1 mutants. This implies that some O2-evoked Ca2+ responses are retained in ADL’s axons in qui-1 mutants. It also suggests that other second messengers upregulate neurosecretion. Elevating cAMP, for example, can promote dense-core vesicle release more efficiently than increasing Ca2+ levels (Costa et al., 2017). Altered G-protein coupled receptor signalling could lead to elevated cAMP levels and increased neurosecretion in qui-1 mutants. It is worth noting that in N2 controls, ADL does not display O2-evoked neurosecretion despite showing measurable Ca2+ responses.

      The work includes a transcriptomic analysis comparing ADL-specific gene expression between wild type and the qui-1 mutant. Unlike other experiments in the study, in which the specific effects of mutations were confirmed through rescue experiments and the use of additional alleles, thus eliminating potential confounds with background mutations, the transcriptomic experiment did not apply such controls. Therefore, it is hard to conclude whether the reported changes in transcription are due solely to the qui-1 mutation or to other unrelated genetic modifications in the mutant strain.

      We worried about unspecific effects of background mutations both on the ADL transcriptome and on other qui-1 related phenotypes. We regret we did not explicitly address this point in our initial submission. To remove background mutations, mutants isolated in our screen, including qui-1, were backcrossed with the N2 laboratory strain a minimum of four times. These qui-1 animals were further crossed into a 5 times outcrossed line that expresses the fluorescent protein mKate specifically in ADL, to generate the strains from which we sorted ADL neurons by FACS. Mutant and transgenic strains were outcrossed using the N2 laboratory strain. We explain this in the Methods section of the revised manuscript.

      The extensive outcrossing makes us confident that the large majority of differentially regulated genes between wild type and qui-1 samples in ADL are due to the absence of qui-1. Supporting this, both mutations in neuropeptide receptors identified by our profiling, npr-22 and tkr-1, suppress ADL’s elevated neurosecretion. Nevertheless, we have added a note to explicitly bring up the concern raised by our reviewers, that some transcriptional differences could be the result of background mutations.

      Overall, except for where mentioned, the data presented are solid and consistent. However, the conclusion that the study reveals a molecular pathway for cross-modal plasticity is less convincing. The chain of events does not include some form of sensory loss, leading to subsequent, independent neural plasticity, as expected for cross-modal plasticity. Rather, a very broad genetic switch is described that can simultaneously change receptor abundance and neuropeptide release. Thus, an equally interesting and more coherent framing of the data could be that the study uncovered a genetic regulator, yet to be fully characterized, of oxygen-dependent behavior in a non-oxygen sensing neuron, adding to previous literature on neural circuit cross-talk.

      We are grateful to the reviewer for her/his thorough and critical analysis of our work, which has prompted us to perform additional experiments and helped us revise our manuscript. These additional data clarify our final interpretation of the data regarding cross-modal plasticity.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper provides a new method, smfATAC-Seq, which can be used to identify epigenetic alterations that occur in myofibers during muscle regeneration or muscle diseases. Through extracting single myofibers from the Extensor Digitorum Longus (EDL) muscle fiber followed by ATAC-Seq, the chromatin accessibility profile of a single myofiber can be obtained for further analysis. Using the smfATAC-Seq, sufficient reads can be obtained from one myofiber containing around 200-300 myonuclei. This method allows for a small input amount and is easy to follow. The authors show that the chromatin accessibility profile of myonuclei is different from that of muscle stem cells (MuSCs) and changes upon injury and disease. Further analysis of the smfATAC-Seq data may allow for the identification of active regulatory elements in muscle fibers.

      Although the approach does have strengths in principle, the design of the experiments and data analyses performed are superficial. Notably, the data size (i.e., the number of myofibers evaluated) is insufficient to support the claims in the manuscript.

      We thank the reviewer for their careful and insightful comments on our manuscript. The design of the experiments is centered on the validation of ATAC-seq on single EDL myofibers. We have chosen myofibers under different physiological and disease conditions. We have also taken into consideration the number of biological replicates recommended by ENCODE (https://www.encodeproject.org/atac-seq/). We believe that the number of replicates and the extensive validation of the data firmly support our conclusion that smfATAC-seq is a powerful and reproducible method to interrogate chromatin accessibility and to identify active cis-regulatory elements in a single muscle fiber under various physiological and disease conditions.

      Major points:

      1) One muscle contains hundreds of myofibers, while the authors only show 2-4 myofiber replicates for each condition. The authors claim that this method can be used to distinguish different fiber types. However, there is no evidence to support such a claim. Instead of using EDL, the approach should be applied to a muscle that contains a ratio of both fast and slow fiber types, indicating the heterogeneity among myofibers in one muscle in different conditions. In addition, the myofibers from the injured or disease muscles are highly heterogeneous in terms of their regeneration status. What is the rationale for choosing the myofibers? Were all the myofibers injured with central nuclei from end to end? Or is it partial? What is the diameter of this muscle fiber? Can the smfATAC-seq be a method to tell us about the maturity of the myofibers? Unfortunately, the design of the experiments did not provide any interesting biological insights.

      We do understand that a muscle contains hundreds of myofibers and that they are heterogeneous. The main aim of this “Tools and Resource” article is to demonstrate that ATACseq can be reproducibly applied to a single EDL muscle fiber to analyze its chromatin accessibility. We believe that this method can be used in future studies to gain detailed biological insights on the chromatin state of myofibers under various conditions.

      Furthermore, we show that our smfATAC-Seq can distinguish between different fiber types such as slow type and fast type. We show in Figure 3 – Figure Supplement 2 the presence of ATACSeq peaks in the promoter regions of genes associated with fast type fibers such as Troponin I2 (Tnni2) and Troponin T3 (Tnnt3). However, we also show the absence of peaks for genes that are associated with slow type fibers such as Troponin T1 (Tnnt1) and Myosin heavy chain 7 (Myh7). In this manuscript we introduce a new method to the field that can be utilized under various physiological and disease conditions. In the future, smfATAC-Seq can be performed on a muscle that contains both fast and slow type fibers if that is desired by other groups in order to compare the chromatin accessibility between slow and fast type myofibers.

      We agree with the reviewer that myofibers under injury are highly heterogeneous, as we have stated previously in our manuscript. That is precisely why we have optimized the method to add a step to select for the desired myofiber, since random selection from an injured muscle could result in a myofiber that is not under regeneration or only partially regenerated at the time of selection. For that reason, in our method, we have included a step where we stain the myonuclei with Hoechst and visualize the centrally located myonuclei, since that is a marker for regeneration. For the injured condition, we have specifically selected myofibers that are composed solely of centrally located myonuclei, indicating an injured or regenerating myofiber. In the comparison between MDX and WT myofibers, both conditions were uninjured and only myofibers that displayed no centrally located nuclei were selected for processing.

      2) The authors claim that their data "revealed a repertoire of active cis-regulatory elements", but no supporting evidence is provided. In the manuscript, only the smfATAC-Seq signal coverage across the genes known to be functional in muscle development was shown. Identifying the active cis-regulatory elements is essentially impossible without combinational analysis with other epigenetic profiles (e.g., H3K27ac ChIP-Seq, Hi-C). The results presented serve as validation but not an exploration of the regulatory elements for MuSCs and myofibers.

      We have now performed a comparative analysis between our smfATAC-Seq and the ChIP-Seq performed on EDL muscle for H3K27ac by Ramachandran, et al. 2019. The figure for this analysis can be found in Figure 2 – Figure Supplement 2K. We now discuss the results of this analysis in lines 184-191, which read as follows:

      “Accessible chromatin regions are associated with various histone marks such as H3K27ac and H3K4me3 (4-6). Thus, we compared the smfATAC-Seq to publicly available datasets on ChIP-Seq on H3K27ac in EDL muscle that was previously performed by Ramachandran, et al. 2019 (GSM3515022, GSM3515023) (7). The comparative analysis has revealed that there were only 97 peaks in the smfATAC-Seq that did not overlap with the H3K27ac peaks, while the majority of the peaks, 6090 peaks, were common to the H3K27ac peaks present in the entire EDL muscle (Figure 2- Figure supplement 2K). This demonstrates that the accessible regions that are assessed by smfATAC-Seq correspond to the regions of the chromatin marked by histones that are associated with open chromatin such as H3K27ac.”

      3) Muscle regeneration is a long-term process that could take a long time to complete depending on the age of the animal and the severity of the injury. The authors examined the chromatin accessibility profile of the myofibers in uninjured and 7 days post-injured muscle. This short time frame does not provide sufficient information to interpret the chromatin accessibility changes of myofibers during the whole regeneration process. It is difficult to understand the result of these experiments (comparing the uninjured fibers to injured fibers or WT fibers to MDX fibers). What does the ATAC-seq data add to our understanding that these myofibers are different? From a molecular point of view, Can this analysis provide a set of biomarkers of the myofiber cell states during regeneration and disease, for example? Again, the design of the experiments is superficial.

      We agree with the reviewer that regeneration is a long-term process and complete understanding of it requires assessment at multiple time points after injury including long term assessments. As mentioned before, smfATAC incorporates a step that can specifically select myofibers based on their regeneration status, which will be very beneficial for researchers in the future. Using smfATAC-Seq to understand the changes in chromatin state of a single myofiber during different time points after injury will advance our understanding of muscle regeneration and can lead to novel discoveries in the future. However, such experiments are beyond the scope of this “Tools and Resource” article.

      Reviewer #2 (Public Review):

      In this paper, Sahinyan and colleagues developed a method for analyzing chromatin accessibility in single murine myofibers. This goal was achieved by adapting the previously published OMNIATAC protocol to the specific properties of the myofiber environment. To demonstrate the validity of this method, they isolated myofibers from uninjured and regenerating murine EDL muscles dissected from wild type animals. In a second experiment, this method was applied to isolate myofibers from mdx mice, a model of Duchenne Muscular Dystrophy. The resulting datasets were further compared to the one generated from purified muscle stem cells.

      Strengths: In general, the authors provided robust quality controls for these datasets, which ensures the validity of their observations. Analysis of chromatin accessibility using this protocol enabled the identification of subsets of peaks specific for each experimental group, which were further analyzed to determine enriched biological processes.

      Weaknesses: While the experiments are well executed, the resulting data are descriptive and do not provide further insights into the biological processes under investigation. A more comprehensive analysis could significantly increase our knowledge of the molecular pathways controlling skeletal muscle response to acute or pathogenic injuries.

      We thank the reviewer for their comments and suggestions. As mentioned earlier, the purpose of this “Tools and Resource” article is to provide validation of the method without necessarily providing novel biological insights. However, additional comparative analysis of data in this manuscript between different physiological and disease conditions does provide some biological insights on the chromatin accessibility of myofibers.

      Reviewer #3 (Public Review):

      In this study, the authors adapt the OMNI-ATACSeq for single muscle fiber ATACSeq. This technique, dubbed "smfATAC-Seq" is used to demonstrate epigenetic changes in injury and the mdx dystrophic mouse model. Single-cell ATACSeq has been used to characterize cellular epigenetic heterogeneity of other tissues. However, the fused, multinucleated nature of muscle fibers makes skeletal muscle intractable to this method. Thus, epigenetic studies have been restricted to either whole muscle, or single cells within the tissue (eg muscle stem cells, endothelial and immune cells). While the study is primarily descriptive, the smfATAC-Seq method presented is technically thorough, and will be valuable for the muscle field. Furthermore, the data produced is a good resource that can be mined to generate testable hypotheses in the future.

      Strengths: The methods are presented clearly, and the most often studied conditions in the muscle field (injury and dystrophy) are appropriately used as examples to demonstrate the utility of smfATACSeq. In addition, the authors show that the data generated is reproducible, and can capture known aspects of myofiber heterogeneity.

      Weaknesses: Since the paper is largely focused on a new method, more emphasis should be placed on what kind of questions can be uniquely answered by smfATACSeq. The authors show selected tracks between smfATAC-Seq and DESeq, however, this is a qualitative comparison. Authors should also provide a more systematic comparison of smfATAC seq with other published epigenetic datasets in whole skeletal muscle (DNaseq, ATACSeq, etc) -- for example, compare quality metrics such as sequencing depth or quantify overlap between identified peaks across methods.

      We thank the reviewer for their comments and suggestions. We have now compared the smfATAC-Seq to other published epigenetic data sets such as whole muscle ATAC-Seq and ChIP-Seq of H3K27ac on whole muscle.

      We have now performed a comparative analysis between our smfATAC-Seq and the ChIP-Seq performed on EDL muscle for H3K27ac by Ramachandran, et al. 2019. The figure for this analysis can be found in Figure 2 – Figure Supplement 2K. We now discuss the results of this analysis in lines 184-191, which read as follows:

      “Accessible chromatin regions are associated with various histone marks such as H3K27ac and H3K4me3 (4-6). Thus, we compared the smfATAC-Seq to publicly available datasets on ChIP-Seq on H3K27ac in EDL muscle that was previously performed by Ramachandran, et al. 2019 (GSM3515022, GSM3515023) (7). The comparative analysis has revealed that there were only 97 peaks in the smfATAC-Seq that did not overlap with the H3K27ac peaks, while the majority of the peaks, 6090 peaks, were common to the H3K27ac peaks present in the entire EDL muscle (Figure 2- Figure supplement 2K). This demonstrates that the accessible regions that are assessed by smfATAC-Seq correspond to the regions of the chromatin marked by histones that are associated with open chromatin such as H3K27ac.”

      We have also compared our smfATAC-Seq on the uninjured myofiber to the whole EDL muscle ATAC-Seq that was mentioned by the reviewer (Ramachandran, et al. 2019). We have determined that 65% of the peaks in the smfATAC-Seq overlap with the whole muscle ATAC-Seq by at least 1 bp. We have included a detailed table on the overlap between the two datasets (Table 3).

      We discuss the results of this comparison in the lines 179-182: “We also analyzed the overlap between the smfATAC-Seq on single EDL myofibers with the ATAC-Seq performed on the whole EDL muscle by Ramachandran, et al. 2019 (GSM3981673) (7). This analysis revealed that 65% of the smfATAC-Seq peaks in the uninjured myofiber overlap with the whole EDL muscle ATAC-Seq (Table 3).”
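
      As an illustration only, the "overlap by at least 1 bp" criterion described above can be sketched as follows. This is a minimal sketch and not the pipeline used for the manuscript: the function name `fraction_overlapping`, the tuple layout, and the toy coordinates are all assumptions made for the example, and peaks are assumed to use BED-style half-open coordinates.

```python
from bisect import bisect_right
from collections import defaultdict


def fraction_overlapping(query_peaks, reference_peaks):
    """Fraction of query peaks sharing at least 1 bp with any reference peak.

    Peaks are (chrom, start, end) tuples with half-open coordinates
    (hypothetical layout chosen for this sketch).
    """
    # Group reference intervals by chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end in reference_peaks:
        by_chrom[chrom].append((start, end))

    # Merge overlapping or touching reference intervals so that the
    # remaining intervals are disjoint and a binary search is valid.
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for s, e in intervals[1:]:
            if s <= out[-1][1]:
                out[-1][1] = max(out[-1][1], e)
            else:
                out.append([s, e])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])

    hits = 0
    for chrom, start, end in query_peaks:
        if chrom not in merged:
            continue
        starts, ends = merged[chrom]
        # First merged interval that ends after the query starts.
        i = bisect_right(ends, start)
        # It overlaps the query if it also starts before the query ends.
        if i < len(starts) and starts[i] < end:
            hits += 1
    return hits / len(query_peaks) if query_peaks else 0.0


# Toy data, invented for illustration (not taken from the datasets above).
reference = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 60)]
query = [("chr1", 199, 250), ("chr1", 300, 400), ("chr2", 59, 70), ("chr3", 1, 10)]
print(fraction_overlapping(query, reference))
```

      In this toy example, two of the four query peaks share at least one base with a reference peak. A query peak that merely abuts a reference peak (here, the one starting at position 300) is not counted, which is the behaviour expected under half-open coordinates.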

    1. Author Response

      Reviewer #1 (Public Review):

      The study shows nicely that the calcium binding protein KChiP3 is associated with poor survival of the colorectal cancer (CRC) cohort analyzed and could be a potential prognostic marker. Decreased KChiP3 expression also reduced cell survival upon FOLFIRI treatment making its impact interesting. KChiP3 has previously been indicated by this group to regulate mucus release from the used colon carcinoma cell line and has in other studies been implicated in regulation of calnexin and potassium channels. The inhibitors Benzamil, an Amiloride derivative, and SN-6, a NCX inhibitor, both further reduced survival of colon cancer cells treated with FOLFIRI. These findings reveal an interesting potential of positive effects by manipulating calcium control in the cells to enhance the effect of FOLFIRI treatment of CRC patients.

      The mechanism is speculated to involve modulation of the protective mucus secreted by the cells. The experimental setup is largely based on two subclones of the cancer derived cell line HT29 (M6 and 18N2). These cells produce and secrete different gel-forming mucins, mainly MUC2 and MUC5AC but this could vary depending on confluency, differentiation, and polarization. Is the level affected by these features?

      These cells derived from the more heterogeneous HT29 cell line are highly comparable and grow as well-polarized monolayers that differentiate after reaching confluence (Phillips et al., 1995). Thus, all experiments were done on the same day post-confluence. We have now added a specific description of this in the methods section as stated below.

      “Cell lines

      HT29-M6

      HT29-M6 cells (obtained from ATCC) were grown in Dulbecco’s modified Eagle’s medium (Invitrogen) plus 10% foetal bovine serum (Biological Industries) and were maintained in 5% CO2 incubator at 37ºC. All experiments were performed with cells at six days postconfluency when they form a well-polarized monolayer and present higher levels of mucins.

      HT29-18N2

      HT29-18N2 cells (obtained from ATCC) (RRID:CVCL_5942) were tested for mycoplasma contamination with the Lookout mycoplasma PCR detection kit (Sigma-Aldrich, St. Louis, MO). Mycoplasma negative HT29-18N2 cells were used for the experiments presented here. HT29-18N2 cells were differentiated to goblet cells as described previously (5). Briefly, cells were seeded in complete growth medium (DMEM complemented with 10% FCS and 1% P/S), and the following day (Day zero: D-0), the cells were placed in PFHMII protein free medium (GIBCO, ThermoFisher Scientific, Waltham, MA, USA). After 3 days (D-3), medium was replaced with fresh PFHMII medium and cells grown for 3 additional days. At day 6 (D-6) cells were trypsinized and seeded for the respective experiments in complete growth medium followed by incubation in PFHMII medium at day 7 (D-7). All experimental procedures were performed at day 9 (D-9).”

      The inhibitory effect of mucus was addressed by studying the effect of FOLFIRI treatment on MUC5AC production and secretion. The initial experiments showed increased production of MUC5AC using western blot analysis with a peak intensity of the protein after three hours of FOLFIRI treatment. The band, as determined by the size, corresponds to the non-glycosylated newly produced mucin product from the endoplasmic reticulum.

      We apologize for this misunderstanding. The highest molecular weight marker (MWM) that we have runs at 450 kDa, which we erroneously labelled as 560 kDa in the figure. The molecular weight of MUC5AC is 560 kDa. The anti-MUC5AC antibody detects a smear at molecular weights higher than 450 kDa that migrates at the top of a 4% acrylamide gel. We have now indicated the size of the 450 kDa MWM (Figure 1A, Figure 4A). The detected bands therefore do not correspond to newly produced mucins. We have now confirmed the presence of mature mucins in our cells and also PDOs by Alcian Blue staining (Figure 4 – Supplement figure 1A and 1C).

      These results are supplemented with detection of increased intracellular intensity in cells stained for MUC5AC, but these findings do not seem to be reproduced in the following control experiments studying effects of Benzamil or SN-6.

      In Figure 4A we also detected an increase in MUC5AC levels at 3 hours of 5-FU+Iri. treatment. However, due to the high levels of MUC5AC in the adjacent SN-6 lanes, it is difficult to visualize this increase. We have now repeated the experiment to obtain better and clearer images (Figure 4A).

      The amounts of transcriptionally expressed or mature secreted mucus are however not determined upon FOLFIRI treatment nor in the experiments manipulating mucus secretion by altered KChiP3 expression or inhibited NCX function. In this context the extracellular mucus could be a combination of truly secreted mucus and intracellular components from detached or dead cells, but how this contributes to cell survival is not known.

      Proper controls for all experiments involving measurements in the extracellular medium were performed to ensure that the secreted medium did not contain dead or detached cells. We now include a western blot of the secreted medium against a cytosolic marker, which shows there are no dead cells in the experiments (Figure 4A). In addition, we have tested MUC5AC RNA levels in HT29 cells treated with FOLFIRI, SN-6 and SN-6+FOLFIRI (Figure 4 – Supplement figure 1B). These results confirm that there is no transcriptional effect on mucins.

      The inhibitory effect of mucus on treatment of CRC cells is speculated to be through binding of the drugs to the mucus. This is studied indirectly by binding of albumin to cells with or without secreted mucus. Mucus binding is however influenced by several factors, such as mucus composition, organization, and environmental impacts, and as the binding affinity depends on the molecule, the difference between albumin and FOLFIRI is a concern.

      We apologize for not explaining this clearly. We propose that mucins act as a physical barrier for 5-FU internalization, and we have used albumin as a means to show the effect of excessive extracellular mucins on internalization. We have not addressed the possibility that 5-FU+iri. is actively retained by binding to mucin fibers. Testing this requires tagged therapeutics for visualization by fluorescence microscopy, which are currently unavailable. However, this is an important issue and should be investigated in the future. We now include this possibility in the discussion as stated below:

      “Cancer cells produce mucins as a protective response to chemotherapy

      Our data show that CRC cells produce mucins in response to 5-Fluorouracil + Irinotecan (5-FU+iri.), a first-choice chemotherapy for most CRC (30). Besides, 5-FU+iri. not only increases mucin production, but also increases mucin secretion (70-80% after 6 hours). This increase in secretion is completely blocked by SN-6 treatment (as described previously in (6)). This is extremely important, because secreted mucins can create a physical barrier that could prevent drugs (for instance, chemotherapeutics) from reaching the tumour cells. Another possibility is that 5-FU+iri. is actively retained by mucin fibres. The availability of tagged versions of these chemicals will help to address this issue in the future.”

      The provided images also indicate mucus free areas that could allow for compound access if not covered by other unstained mucins. Organoids were used to assess the effect of SN-6 which also in this system reduce cell survival combined with FOLFIRI treatment but the link to mucus secretion was not explored.

      We now provide additional Alcian Blue staining of HT-29 and PDO cultures (Figure 4 – Supplement figure 1A and 1C).

      Taken together the basis for the arguments that mucus is influencing the accessibility of the supplemented drugs is not fully supported by the experiments provided.

      As stated above, we now provide new data to clarify the effect of mucus on chemoresistance (Figure 4A, Figure 4 – Supplement figure 1A and 1C). We also explain this more clearly in the discussion as stated below.

      “Cancer cells produce mucins as a protective response to chemotherapy

      Our data show that CRC cells produce mucins in response to 5-Fluorouracil + Irinotecan (5-FU+iri.), a first-choice chemotherapy for most CRC (30). Besides, 5-FU+iri. not only increases mucin production, but also increases mucin secretion (70-80% after 6 hours). This increase in secretion is completely blocked by SN-6 treatment (as described previously in (6)). This is extremely important, because secreted mucins can create a physical barrier that could prevent drugs (for instance, chemotherapeutics) from reaching the tumour cells. Another possibility is that 5-FU+iri. is actively retained by mucin fibres. The availability of tagged versions of these chemicals will help to address this issue in the future. Our results in PDOs and cell lines show that cancer cells respond to chemotherapeutics by secreting copious amounts of mucin to form a barrier against the treatment. We suggest that this reaction (mucin secretion) of colorectal cancer cells to chemotherapy is similar to the programmed response of epithelial cells to toxics or pathogens. In this situation, which could resemble the treatment with 5-FU+iri., mucin-producing cells would release large quantities of mucins to isolate them from the insults (toxins, allergens or pathogens) (3). An interesting question is how 5-FU+iri. triggers mucin secretion and whether there is a specific receptor involved in this response or an intracellular pathway that is activated by these chemicals. This finding is important in application of chemotherapeutics and merits further investigation. Our results show that this increase in mucin production/secretion triggered by chemotherapeutics is independent of transcriptional effects on mucins, although we cannot completely discard the possibility that the levels of components necessary for mucin modification and secretion (e.g., glycosyltransferases) are altered.”

      The CRC survival analysis shows that MUC5AC expression levels were not found to be associated with disease outcome but the influence of other mucins was not analyzed. Some previous studies suggest that MUC2 expression is associated with better survival while MUC5AC shows the opposite effect.

      As suggested by the reviewer, we have checked the effect of MUC2 on patients’ survival (Figure 2 – Supplement figure 1B). Our analysis revealed that patients with low levels of MUC2 (possibly because cells are more dedifferentiated) have worse prognosis. However, when we stratify patients with high MUC2 levels by KChIP3 expression, we found a tendency (p=0.08) of patients with low levels of KChIP3 to have lower disease-free survival (Figure 2 – Supplement figure 1C). It is interesting to note that patients that had worse prognosis with low levels of MUC5AC and KChIP3 (Figure 2C), had high levels of other secreted mucins like MUC2, MUC4 or MUC5B (Figure 2 – Supplement figure 1A), which could compensate for the lack of MUC5AC. We include this new data in the results section as stated below.

      “There is no significant effect on patients with low MUC5AC expression (Figure 2C), although after 100 months there is a decrease in DFS. Further analysis of this effect revealed that three patients (GSM358532, GSM358534 and GSM358438) were responsible of this effect. We studied these patients in more detail and interestingly, although they have low levels of MUC5AC, these patients present increased levels of other secreted mucins (e.g., MUC2) (Figure 2 – Supplement figure 1A). However, similarly to previous studies (10, 12), analysis of the effect of MUC2 levels on DFS shows that high MUC2 levels are protective while patients with low levels of MUC2, which may reflect a more dedifferentiated state of cancer cells, have a lower DFS (HR=2.54, p=0.023) (Figure 2 – Supplement figure 1B). Importantly, when we studied the effect of KChIP3 on the DFS of patients with high expression of MUC2, we found a clear tendency (p=0.08) to worse prognosis in patients with low levels of KChIP3 (Figure 2 – Supplement figure 1C). Altogether, these results suggest that low levels of KChIP3, which consequently increase the quantity of mucins secreted, protect cancer cells from chemotherapeutic drugs.”

      As the analysis does not link mucus secretion to treatment efficacy combined with analyses of a role for other regulatory proteins, it is not clear how this adds to the aim of the study.

      We now show the levels of secreted mucins (Figure 4A), which strengthens the relationship between inhibition of proteins that regulate mucin secretion and chemotherapy efficacy.

    1. Author Response

      Reviewer #1 (Public Review):

      Further manipulations of the network are performed in order to relate the learned network structure to functional properties (i.e. SWRs, replay). By shuffling the weight matrix, it is confirmed that the specific pattern of strong synaptic connections is necessary for replay (as opposed to the overall weight statistics). However, this seems unsurprising given the topological nature of the implicit cognitive map and its place cell representation.

      We agree that it is not surprising that destroying the spatial structure of the weight matrix (by shuffling the weights) eliminates sequence replay. The significant novel finding is that this manipulation (which preserves not just the mean but even the cell-by-cell statistics of the weights) also has a major effect on the population-level dynamics by eliminating sharp waves and ripple oscillations.
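
      As an illustration of this kind of control, the minimal sketch below permutes the weights of each cell independently, so that every cell keeps exactly the same set of weight values while any spatial structure is destroyed. It is not the code used for the simulations: the row-per-cell layout, the function name `shuffle_within_rows` and the toy matrix are all assumptions made for the example.

```python
import random


def shuffle_within_rows(weights, seed=0):
    """Return a copy of `weights` with each row permuted independently.

    Assumption: each row holds the synaptic weights belonging to one cell,
    so permuting within a row keeps that cell's weight distribution intact.
    """
    rng = random.Random(seed)
    shuffled = []
    for row in weights:
        new_row = list(row)   # copy so the input matrix is left untouched
        rng.shuffle(new_row)  # permute this cell's weights in place
        shuffled.append(new_row)
    return shuffled


# Toy 4x4 matrix with a banded structure (hypothetical values).
toy_weights = [
    [0.0, 0.9, 0.3, 0.1],
    [0.9, 0.0, 0.9, 0.3],
    [0.3, 0.9, 0.0, 0.9],
    [0.1, 0.3, 0.9, 0.0],
]
shuffled_weights = shuffle_within_rows(toy_weights, seed=1)
```

      Because only the order of the values within each row changes, the mean, variance and full histogram of each cell's weights are identical before and after the shuffle.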

      The authors acknowledge and discuss in detail some significant limitations of their results, most prominently the apparently unending nature of the emergent replay (i.e., replay terminates only on encountering the end of the track). Otherwise, it could be noted that the results are demonstrated exclusively in a linear track environment. It would be interesting to establish whether the model is robust to learning from trajectories in a 2D open box environment, for example.

      In principle, exploration in an open box (or any other environment) could be modeled by changing the spike statistics during the learning phase so that it mimics the corresponding experimental observations. All other properties and parameters of the model would remain unchanged. This would also allow us to test whether we can predict the replay dynamics in these novel situations, and we intend to examine this in a follow-up project.

      Overall, I find the conclusions of this paper to be supported by the results. Though this work provides a comprehensive account of several previous experimental results, it lacks specific experimental predictions thus limiting the significance of this manuscript for experimentalists who may be in a position to test any such novel hypotheses.

      Both our concrete mechanistic model and the general conceptual framework suggested by our simulation results can be used to derive novel experimental predictions. In fact, the consequences of all the manipulations that we employed to probe our in silico network can be regarded as experimental predictions. Although some of these manipulations (such as shuffling the weights) would be difficult to replicate in real experiments, others are probably feasible using modern tools (cell-type-specific targeting, optogenetics, chemogenetics, etc.). We have listed several specific predictions of our modeling work in a new section of the Discussion (“Experimental predictions”). These predictions include the following:

      • Blocking PVBC-PVBC synaptic interactions should eliminate ripples but not sharp waves or replay, while blocking recurrent excitation is expected to eliminate sharp waves, ripple oscillations and spontaneous replay as well.
      • The nature of replay (e.g., stereotypical vs. diffusive) should depend on the task and behavior during learning (e.g., 1D vs. 2D environment, goal-oriented vs. random foraging task).
      • It should be possible to influence the content of replay by providing structured input during learning or before/during SWRs.
      • Any manipulation changing the symmetry of the STDP rule should bias the direction of replay.

      Reviewer #2 (Public Review):

      The paper uses advanced methodology and the results are well presented. However, certain weaknesses remain, which are listed below.

      1. The main result lacks a mechanistic explanation. My understanding of the main result is that CA3 principal cell firing needs to be tightly controlled by plasticity (so that there are chains of strong bidirectional coupling guiding initial random activity, making it very unlikely that a cell fires outside of its place field) and strong adaptation (so that the participation of a cell in a chain is terminated after a short time, suppressing burst firing) for the model to generate replay events with physiological properties. Is it not possible at all to generate ripples without structured excitatory connectivity in the model?

      Our simulations explore the relationships among (structured) connectivity, the dynamics of population activity (sharp waves and ripples), and detailed activity dynamics (sequence replay). In essence, we find that periods of sequence reactivation (which indeed requires structured connections and adaptation) involve increased activity in all cell populations (corresponding to sharp waves), and that this high activity of PV+ interneurons is responsible for the generation of ripples (as PVBCs synchronize through mutual inhibitory connections). Manipulations of the model that eliminate sequence replay also eliminate sharp waves and ripple oscillations. However, if the excitatory drive to PVBCs is increased to a sufficiently high level by other means (e.g., by drastically increasing the excitatory recurrent weights (Figure 6 D-F), or by directly setting a high level of excitatory input to PVBCs (Figure 8 A-B)), ripple oscillations can be generated in the absence of structured recurrent excitation.

      Related to this, if the principal cells fire only once or twice during the whole replay event (as is observed in vivo and also in the present model), how can a spike-triggered adaptation current (that is required for replay, as the authors show) exert its influence on the dynamics of the network?

      First, in principle, a strong spike-triggered adaptation current could influence the dynamics of the network even after a single spike, by shifting the strongest net input (i.e., the sum of EPSCs, IPSCs, and adaptation current) to cells that have not been active recently but have nearby place fields. In addition, our place cells tend to fire more than once during individual sharp waves. We have amended the manuscript to clarify this fact in lines 188-191, and updated Supplementary Figure 4 (B1 insert and new panel C).

      2. Some of the modelling choices seem to be ad hoc and not well motivated:

      2.1 It is not clear whether the symmetric STDP rule proposed by Mishra et al. is more biologically plausible for CA3 than other rules and/or whether there are other rules at work at all. Currently, it reads as if this is the only STDP rule that is plausible for CA3 recurrent excitatory connections.

      Importantly, the plasticity rule used in our paper was not just proposed by Mishra et al. (Nature Comms., 2016); it reflects their direct measurement of STDP in pairs of CA3 pyramidal neurons in hippocampal slices. To the best of our knowledge, this is the only rule that has been experimentally measured in area CA3. On the other hand, we do investigate the effect of using a different type of STDP rule (one that has been measured in several other types of synapse), and find that this choice leads to altered dynamics and the absence of reverse replay events (Figure 5).
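
      The contrast between the two kinds of rule can be sketched as follows. This is a generic illustration rather than the rule as parameterized in the paper: the exponential form, the amplitudes, the time constants and the function names are assumptions chosen for the example, with `delta_t_ms` defined as postsynaptic minus presynaptic spike time.

```python
import math


def symmetric_stdp(delta_t_ms, amplitude=1.0, tau_ms=60.0):
    """Symmetric kernel: the weight change depends only on |delta_t|.

    All parameter values here are hypothetical placeholders.
    """
    return amplitude * math.exp(-abs(delta_t_ms) / tau_ms)


def asymmetric_stdp(delta_t_ms, a_plus=1.0, a_minus=1.0, tau_ms=20.0):
    """Classical asymmetric kernel.

    Pre-before-post (delta_t >= 0) potentiates; post-before-pre depresses.
    """
    if delta_t_ms >= 0:
        return a_plus * math.exp(-delta_t_ms / tau_ms)
    return -a_minus * math.exp(delta_t_ms / tau_ms)


# The same spike pair seen from the two directions of a reciprocal connection.
forward_change = symmetric_stdp(10.0)
reverse_change = symmetric_stdp(-10.0)
```

      Under the symmetric kernel the two directions of a reciprocal connection receive the same change, which is consistent with the bidirectional coupling discussed above; under the asymmetric kernel one direction is strengthened and the other weakened.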

      Also, the time constant of the plasticity rule was fixed. It would be good to study the impact of different time constants if this is biologically realistic. The reason for this suggestion is that it might allow for a more mechanistic understanding of the model behavior. Is it possible that there is a certain spatial range for stronger-than-baseline connectivity that is required for there to be replay and if so, does it depend on the time constant?

      In our study, the time constant of STDP was fixed to the value that Mishra et al. (Nature Comms., 2016) measured experimentally. The way in which the shape (magnitude and time constant) of the STDP rule influences the network dynamics is indeed an interesting topic for theoretical research; however, exactly this issue was recently explored by Theodoni et al. (eLife, 2017), so we decided to focus our efforts on other questions.

      2.2 Plasticity between principal cells is switched off after the initial exploration phase and the networks are static thereafter. How realistic is this? It might be possible that spike-triggered synaptic plasticity during a ripple event interferes with the bidirectional coupling established during the initial learning phase, thus rendering replay unstable and/or terminating it before the end of the track is reached.

      We agree with the reviewer that the complete absence of plasticity in the second phase may not be realistic. This is a simplifying assumption of our model (listed as Assumption 5 in Table 1), which is shared by many models of hippocampal learning. On the other hand, plasticity is known to be strongly modulated by brain state (e.g., through the action of subcortical inputs on hippocampal cells and synapses), so plasticity may be substantially weaker or different in off-line states (such as slow-wave sleep; see Hasselmo, Current Opinion in Neurobiology, (2006), and Fuenzalida et al., Neuroscience, 2021, for reviews). Finally, since replay recapitulates activity correlations in the learning phase, the main effect of plasticity during replay would be to reinforce existing patterns of weights and activity (although some kind of homeostatic mechanism may be required to prevent uncontrolled positive feedback). These arguments have now been included in the Discussion section of the paper.

      3. The authors conclude that the FINO mechanism is at the heart of ripple generation in CA3. I found it confusing that apparently, no ripple oscillations are present in the population rate of the principal cells, but only in the calculated LFP, although the Figure Supplement for Fig. 3 seems to suggest that ripple oscillations are present also in the population rate of the principal cells. Maybe calculating the ripple frequency directly from the population rate of the principal cells, but not from the LFP, would change this conclusion (presented in Fig. 8), because the former still exhibits ripple oscillations, whereas the latter doesn't?

      This appears to be some sort of misunderstanding (possibly stemming from the relatively coarse binning of PC rate in some of our figures which are intended to illustrate long-term variations); ripple oscillations are indeed present in the PC rate (in our baseline model) as shown in Supplementary Figure 5. Regarding the modified model version presented in Figure 8 C, the PC rate is devoid of ripple frequency oscillations, just like the LFP.

      4. Is it possible to determine which initial conditions cause forward and which reverse replay? Would it be possible to compare this to different behavioural states exhibited by an awake animal traversing a track and then resting? The authors mention in the Introduction that forward replay is associated with memory recall, whereas reverse replay is associated with reward-based learning. How is this reflected in the statistics of these two types of replay events in the resting phase of the model? Is there an expectation that both events should occur equally often or are there any conditions that bias the direction of the replay in the model? This is related to the unstructured input that principal cells receive in the model, so maybe there is an interplay of these two sources of excitation (feedforward via the mossy fibers and recurrent via the learned connectivity) which determines the direction and frequency of replay events. I feel that addressing at least some of these points would allow the authors to gauge the realism of their replay model more finely.

      We explored some of the mechanisms that may bias the direction of replay in Figure 2C. In particular, we show that one can evoke either forward or backward replay via external input that corresponds to neural activity at the start and the end of the track, respectively. Without any input cues, forward and backward replays occur with the same frequency in our baseline model (we have verified this by running the simulation with different random seeds, which influence the random mossy fiber input).

      In general, the direction of replay in the model can depend on systematic biases and random variations in the weight structure, but also on (random or deterministic) spatio-temporal structure in the external inputs. We believe that the cued replay scenario that we simulated is not entirely artificial – even real environments may contain choice points where the animal pauses to plan its route, or reward sites that again make the animal stop and may also activate subcortical modulatory systems that affect plasticity. At intermediate sites, systematic bias could be introduced during exploration by factors that influence behavior (e.g., running speed), neuronal activity or plasticity (e.g., attention), especially if such factors can change the fully symmetric nature of the synaptic plasticity rule. Random variations in connectivity or the weights (e.g., due to stochastic spiking during learning) will lead to a location-dependent systematic bias towards forward or reverse replay. Finally, (randomly occurring or input-driven) sequences in the external input are also expected to bias the direction of replay even at intermediate points. However, a detailed analysis of the relative impact of these factors will require further efforts. We have included some of these arguments in lines 404-427 of the Discussion.

    1. Author Response

      Reviewer #1 (Public Review):

      There are very few studies on the spatial integration of color signals of V1 receptive fields, which is a striking gap in knowledge given the importance of color to primate vision and how powerful the spatial analysis of luminance contrast integration has proven for understanding how V1 works. This paper helps fill this major gap in knowledge. The main take-home message is that double opponent cells and simple cells are more likely to be linear in how they integrate signals across their receptive fields than a sample of non-double-opponent/non-simple cells. This conclusion is consistent with the limited data presently in the literature, and I wonder if further analysis of the rich dataset could uncover some deeper insights.

      We thank the reviewer for highlighting the gap in knowledge that our study helps to fill and for the excellent suggestions for ways to improve the manuscript. In response to both reviewers, we have conducted new analyses that uncover deeper insights into signal integration in V1. These new analyses have been incorporated into the revised manuscript.

      Reviewer #2 (Public Review):

      De and Horwitz deploy a focussed technique for testing the linearity of spatial summation for V1 neurons with spatial opponency, with the emphasis being on the properties of cells that encode chromatic information in a spatially opponent manner - so called double opponent cells. The technique isolates non-linearities of summation from non-linearities that occur after summation, by using an adaptive procedure to home in on stimulus contrasts in different color directions that produce a pre-defined criterion response. The authors conclude that many (but not all) double opponent cells embody linear spatial summation, and discuss implications for our understanding of the cortical circuitry that mediates color vision. The data appear carefully collected and generally well-analyzed. There are some points, elaborated in broad strokes below, where I think the paper would benefit from further elaboration of the data and its implications, and the paper would also benefit from some revisions to improve clarity.

      How are results affected by the cell classification criteria? The authors apply criteria to sort cells into four classes: simple, double opponent, NSNDO, and those not studied further. Response properties are then studied as a function of cell class. Criteria for classification include presence/absence of spatial opponency revealed by the pixel white noise measurements and the adequacy of a linear STA to describe the hyperpixel white noise data. I think more work is needed to clarify for the reader the extent to which these criteria, in and of themselves, affect the results for each class studied. In particular, if a linear STA describes the hyperpixel white noise data, shouldn't we then expect to find linear summation in the spatial receptive field in that same hyperpixel white noise data? I understand, as the authors point out, that the Phase 3 measurements could reveal failures of spatial summation not seen in the hyperpixel white noise data. But I'm a bit perplexed by the outliers in the NLI indices in Figure 3D. What properties of these cells allow a linear 6D STA to handle the hyperpixel white noise data well, but cause them to summate over space non-linearly for that same hyperpixel white noise data? In terms of the new information provided by the Phase 3 measurements, I wasn't able to get a sense of how much harder these stimuli were driving the cells than the Phase 2 measurements. It seemed like this was the intent of Figure 2 - Figure Supplement 1 and Figure 3 - Figure Supplement 1, but those two figures in the end didn't provide this information in a manner I could digest. Absent this, it was hard to tell how much more we are learning from the Phase 3 data. Could the higher NLI's here than in Phase 2 be a consequence of some stimuli but not others driving the neuron into saturation? And although the authors write on page 15 "Nevertheless, we found that nonlinearities detected in Phase 2 of our experiment were a good indicator of nonlinearity over the greater stimulus duration and range of contrasts in Phase 3, principally for the NSNDO cells (Figure 3E)", those correlations look very weak to me. I was left hoping for a better understanding of the commonalities and differences in the data between Phases 2 and 3. I'm also not sure of the reliability of the measured NLI's for each cell with each method. Can anything more be provided about that? I note here that I did study the section of the discussion that nominally addresses some of these issues, and that my comments above remain after that study.

      The Reviewer brings up several important points that are addressed individually below.

      The revised manuscript is more explicit about the role of the cell classification criteria on the results. Particular emphasis is placed on the role of the spike-triggered covariance criterion in enriching the pools of simple cells and DO cells with neurons that are approximately linear.

      We agree with the Reviewer that, if a linear STA describes the hyperpixel white noise data well, we expect to find linear summation in the spatial receptive field in that same hyperpixel white noise data analyzed in other ways. A critical question is “does the STA describe the white noise data well?”. We address this question in two ways in this report: with an analysis of (the statistical significance of) the first principal component of the hyperpixel spike-triggering stimuli (PC1) and with a comparison of GLM and GQM fits to the hyperpixel white noise data (the white noise NLI). These analyses are related but are sensitive to different types of departure from linearity.

      Consider a neuron whose output is the product of two half-wave rectified linear subunits (see Figure 2 – Figure Supplement 5). Such a neuron would have a large white noise NLI due to the non-linear interaction between the subfields, but it would lack a significant PC1, because the nonlinearity tightens the distribution of excitatory stimuli, and the PC1 is the dimension along which the stimulus distribution is widest. In principle, such a nonlinearity would manifest in the smallest principal component, but in practice, small PCs often resemble the STA, which complicates their interpretation.

      Conversely, a neuron can have a significant PC1 but a small NLI. For example, consider a neuron that has a half-wave rectified response to modulations of one color channel but a full-wave rectified response to another. Such a neuron will have a significant PC1 due to the full-wave rectification, but an NLI near zero, because this nonlinearity is hidden once the stimuli are projected onto the STA (recall that the white noise NLI is computed from a pair of 1-D projections not the original 6-D representation). Code simulating these hypothetical neurons (used to produce Figure 2 – Figure Supplement 5) is available at GitHub (https://github.com/horwitzlab/Chromatic_spatial_contrast).
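
      The two hypothetical neurons described above can be written down in a few lines. The sketch below is not the simulation code from the linked repository: the function names are invented for the example, and combining the two color channels of the second neuron by summation is an assumption, since only the two rectification types are specified above.

```python
def half_wave(x):
    """Half-wave rectification: negative drive is clipped to zero."""
    return max(0.0, x)


def full_wave(x):
    """Full-wave rectification: both signs of drive excite the cell."""
    return abs(x)


def product_neuron(drive_a, drive_b):
    """Output is the product of two half-wave rectified subunit drives."""
    return half_wave(drive_a) * half_wave(drive_b)


def mixed_rectification_neuron(channel_1, channel_2):
    """Half-wave rectified on one channel, full-wave rectified on another.

    Summing the two terms is an assumption made for this sketch.
    """
    return half_wave(channel_1) + full_wave(channel_2)


# A linear neuron would satisfy f(a, b) == f(a, 0) + f(0, b).
joint = product_neuron(2.0, 3.0)
separate = product_neuron(2.0, 0.0) + product_neuron(0.0, 3.0)
```

      For the product neuron, `joint` and `separate` differ, which is the interaction between subfields that the white noise NLI is designed to pick up; the second neuron responds equally to positive and negative modulations of `channel_2`, the kind of symmetry that widens the spike-triggering stimulus distribution along one dimension.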

      The original submission lacked documentation of the difference in firing rates produced during Phases 2 and 3. We have added a new supplementary figure that quantifies this difference (see Figure 2 – Figure Supplement 2). Figure 2 – Figure Supplement 1 illustrates the range of inputs provided in Phases 2 & 3. This has been clarified in the revised text.

      Please note that the data shown in Figure 3D are isoresponse NLIs (that is, NLIs computed from responses recorded during Phase 3 of the experiment) not white noise NLIs (NLIs computed from the hyperpixel white noise shown during Phase 2 of the experiment). This has been clarified in the revised text.

      We agree that the correlation between the white noise NLI and isoresponse NLI measurements is weak. A full treatment of the differences in neural responses to the stimuli presented in Phases 2 and 3 is beyond the scope of this study. Nevertheless, we can think of several reasons that some neurons may have appeared more nonlinear in Phase 3 than they did in Phase 2. First, as suggested above, Phase 3 stimuli had higher contrast than Phase 2 stimuli, and are more likely to have engaged nonlinear gain control mechanisms upstream or within V1. Second, the linear and nonlinear models in Phase 2 had 3 and 6 parameters, respectively, but 2 and 5 in Phase 3, and this may affect the ratio of prediction errors. Third, nonstationary responses are expected to affect isoresponse NLIs more severely than white noise NLIs, because of the sequential way that isoresponse points were measured in Phase 3.

      Assessing the reliability of NLIs within cells is challenging because of the cross-validation that is built into the definition. To address this comment, we used a jackknife procedure that quantifies the spread of NLIs computed from each of the data partitions used in the cross-validation.
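
      The details of this procedure are not given here, so the following is only a generic sketch of a leave-one-out jackknife applied to a list of per-partition values. The function name `jackknife_se`, the choice of the median as the summary statistic and the example numbers are assumptions, not the analysis used in the manuscript.

```python
import math
from statistics import median


def jackknife_se(values, statistic=median):
    """Leave-one-out jackknife standard error of `statistic` over `values`.

    `values` is a list with one entry per data partition (hypothetical
    per-partition NLIs in the example below).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values for a jackknife estimate")
    # Recompute the statistic with each partition left out in turn.
    leave_one_out = [statistic(values[:i] + values[i + 1:]) for i in range(n)]
    loo_mean = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((x - loo_mean) ** 2 for x in leave_one_out)
    return math.sqrt(variance)


partition_nlis = [0.10, 0.12, 0.08, 0.11, 0.09]  # hypothetical values
spread = jackknife_se(partition_nlis)
```

      When the statistic is the mean, this estimate reduces to the usual standard error of the mean, which is a convenient sanity check on the implementation.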

      Implications of the results for models. As the authors summarize in their introduction, the motivation for testing the linearity of spatial summation is that the results can guide how we formulate response models for V1 chromatically sensitive cells. More discussion of this would be helpful. As an example, could cells with the nonlinear spatial filtering as shown in Figure 1C be classified as DO, making them relevant to the focussed tests applied in this paper? Or are they necessarily NSNDO? More generally, can the authors spend a little time discussing what classes of response models they would pursue for DO cells that do/don't show linear spatial summation, and for NSNDO cells that do/don't show linear spatial summation. Such discussion would tie the results of the primary data back to the motivating question in a more satisfactory manner, I think. Such discussion could also be used as a vehicle to discuss what the authors think about the DO cells that fail to show linear spatial summation and the NSNDO cells that do, something I found under-treated in the results. As with the comment above, I did read the sections of the paper that speak to this question, but still find that it would benefit from going deeper.

      Inspired by this comment, we have added a new section to the Results that considers response models for neurons that do not show linear spatial summation. Specifically, we test the model illustrated in Figure 1C and reject it for many neurons. Figure 1C depicts a neuron that integrates inputs linearly within each subfield but nonlinearly across subfields. Within each RF subfield, therefore, this neuron conforms to a linear-nonlinear cascade model. Critically, during Phase 2 of the experiment, the stimulation at one RF subfield can be considered as an additive noise with respect to the signal generated by the other RF subfield. This is because the influences of the two RF subfields combine additively (under the model) and the modulations of the two hyperpixels are independent.

      To test this model, we compared GLM and GQM fits as we did in the analysis of the white noise NLI. The regressors in this analysis were the modulations of the three color channels from a single subfield. These GLMs fit the data systematically worse than GQMs as assessed by cross-validated prediction error. This result indicates that the nonlinearity of the NSNDO cells is unlikely to be a result of nonlinear combination of inputs from two linear RF subfields, as postulated by the model in Figure 1C. Instead, for many NSNDO neurons the nonlinearity appears to arise from nonlinear combinations of signals within individual subfields. We mention in the Discussion that linear DO cells may lie on a continuum with some NSNDO cells.

      Color properties of subfields. The study measures detailed properties of cells that show at least two distinct subfields in the initial pixel white noise analysis. The paper focuses on whether signals from such subfields are combined linearly before any downstream nonlinearities. However, there is another feature of the data that seems central to understanding these cells, and that is what the chromatic properties of these subfields are, and how strongly the data constrain the chromatic properties of the two separate subfields to be complementary. It is stated in passing (page 7) that "the two sides of the hyper pixel STA were complementary or nearly so", but it would be nice to see this treated in more detail and also to understand whether there are differences in the distribution of the chromatic properties of the two sides between the DO and NSNDO cells, and between cells with low and high non-linearity indices.

      We have added a new section on the chromatic properties of the subfields of the neurons we studied (Figure 2 – Figure Supplement 3).

    1. Author Response

      Reviewer #2 (Public Review):

      In an extensive analysis of zebrafish wild-type vs. mutant RNA seq datasets, the authors find that differentially expressed genes are often enriched in the chromosomal region of the mutated gene.

      An older paper by Miller et al. (2013, Genome Research 23: 679) also analyzed RNA seq data on wild-type and mutant zebrafish (in their case with the main goal of identifying the mutated gene), and they noted only a small number of differentially expressed genes near the mutated locus. White et al. could mention this paper and consider methodological differences that may explain these seemingly different conclusions.

      We have added some sentences discussing the Miller paper (Lines 253-257).

      By genotyping and performing RNA seq on individual animals from a large cross, the authors obtain convincing evidence that genomic polymorphisms cause many genes to be differentially expressed in different wild-type zebrafish strains. They show that most of the differentially expressed genes near mutant loci are likely to be caused by allele-specific expression differences in linkage disequilibrium with the mutation rather than by any action of the mutated gene. The authors illustrate how one can determine which nearby genes may actually be regulated by the mutated gene, using polymorphisms among wild-type chromosomes and different mutant alleles of the mutated gene. Another possible, complementary approach might be to rescue the mutants with a wild-type transgene, which should rescue the mutant phenotype and genes that are regulated by the mutated gene but not affect differentially expressed genes caused simply by allelic differences. In some cases, one could also use transgenic overexpression of the gene of interest to compare loss of function and gain of function of the gene to assess possible inverse effects on target genes (e.g. a putative target gene may be reduced in the mutant and increased in transgenic overexpression animals). As the authors note, these approaches would represent significant extra effort, and they reasonably suggest that a simpler alternative is for investigators to consider the chromosomal position of differentially expressed genes when interpreting their RNA seq data from outbred strains.

      We have added a section about complementary approaches, including overexpression, that could be used to determine if differential expression is downstream of a mutation or not (Lines 303-310).

      Reviewer #3 (Public Review):

      The authors used transcriptome analyses by RNA-seq to identify differentially expressed (DE) genes in a series of previously identified forward genetic mutants emerging from outbred crosses as well as in clusters of wild type zebrafish embryos emerging from a newly generated cross of well-defined genotypes. The authors present experiments that convincingly demonstrate physical linkage of DE genes to the mutated locus and their predominant localisation on the mutation-carrying chromosome. Next, the authors demonstrate that allelic variation of expression is common in a wild type hybrid cross of SAT double haploid strains and show haplotype-dependent allelic gene expression variation. Finally, White et al. offer an example approach for distinguishing gene expression change caused by a mutation from that caused by allelic variation of expression of genes in linkage disequilibrium with the mutation by analysing segregation of alleles and their expression dynamics.

      The data from a series of mutants and well-defined wild type crosses convincingly demonstrate the impact of strain polymorphism and linkage disequilibrium on differential gene expression. The provided evidence suggests the generality of differential gene expression readouts arising independently from generated mutations in outcross experiments in zebrafish.

      These observations are potentially important for the design of transcriptomic analyses of forward genetic screens and other experiments involving RNA-seq from outcrosses such as inter and transgenerational epigenetic inheritance studies.

      Evidence on the actual impact of misinterpretation of gene expression differences on biological conclusions drawn from mutants generated in outbred crosses would strengthen the study.

      The conclusions of this manuscript are well supported by the experimental data, but some aspects would benefit from further clarification.

      1.) Figure 4 demonstrates separation of differential expression due to sox10 mutations from those arising from allele-specific variation in LD with sox10 by providing an individual example for both. In this section a global demonstration of the distinct segregation-associated expression dynamics would strengthen the claim. It is recommended that the expression variation for the full set of genes quoted in the text (10 and 15 genes respectively) is shown.

      We have included boxplots of all the genes in Figure 5C showing the variation in expression by genotype and split by those most likely downstream of sox10 and those consistent with ASE.

      2.) Demonstration of the importance of the problem of appropriately drawing conclusions from RNA-seq data may be achieved by comparing the features of mutation-dependent and mutation-independent differentially expressed genes in relation to the biological or biochemical functions of the mutated gene.

      We have looked at GO enrichments across all the experiments and the expression patterns of the two sets of genes on chromosome 3 in the sox10 experiment (Lines 188-216).

    1. Author Response

      Reviewer #1 (Public Review):

      In the present manuscript, the authors investigate regulatory roles of class IIa histone deacetylases (HDACs) in Schwann cells on developmental myelination, as well as on myelin repair after acute nerve injury. The study directly builds on previous observations (Gomis-Coloma et al., 2018) where the authors have shown that the primary HDACs of Schwann cells, HDAC4 and HDAC5, have redundant functions and cause only a mild delay in myelination in a double knock out (dKO), suggesting compensatory mechanisms by other HDACs. In the present study the authors indeed show compensatory upregulation of HDAC7 in HDAC4/5 dKO. They furthermore show by ablating all three HDACs that, next to an induction of HDAC9 expression, myelination is further delayed and the architecture of Remak bundles even permanently altered. The authors provide high quality data employing a broad spectrum of methodology, including conditional mutagenesis in mice, electrophysiology, immunofluorescence, electron microscopy, RNAseq, ChIP, cell culture, qPCR and Western blotting to justify their hypothesis of a regulatory and compensatory role of HDACs in Schwann cells during development and regeneration. The physiological relevance of this compensatory network, however, is not intuitive. Better discussion and elaboration of central findings in triple KOs in comparison to single KOs (and vice versa) would strongly improve the manuscript.

      In detail, the following points may improve the strength of the manuscript:

      “1) With regard to the triple mutants (HDAC4,5 and 7) the authors present a data set from P2 to P21 and another at P60. Here, the manuscript would benefit from more comparable data sets for the respective timeline. E.g. the authors show an increased SC number at P21. What happens to these Schwann cells? Are they still present at P60? In line, the authors show that even in the triple mutants the expression of certain genes including cJun remains upregulated. How do the authors explain this upregulation? It would be helpful to know whether these genes remain upregulated in myelinating SC or whether persisting supernumerary SC are responsible for the expression of c-Jun and others at later timepoints (e.g. by IHC)?”

      As shown in Figure 5L, the total number of Schwann cells in the uninjured (UI) tKO nerve at P60 is slightly increased, although the difference does not reach statistical significance in our analysis. Also, an increased area of p75Ngfr expression (a protein expressed by non-myelin forming Schwann cells, but downregulated in myelinating Schwann cells) in MPZ negative cells was consistently observed in the tKO nerves (Figure 3J). Together these data suggest the existence of supernumerary non-myelin forming Schwann cells in the tKO nerve. To further explore this hypothesis, we have now performed IF co-localization studies of c-Jun expressing cells with MPZ (Figure 5-figure supplement 3B of the revised version of the manuscript) and found that most of them are MPZ negative (arrowheads), supporting the idea of the existence of supernumerary Schwann cells at the Remak bundles. Nevertheless, we found that some of them are MPZ positive as well (arrows). It is worth mentioning that it has been previously shown that myelinating Schwann cells (in the c-Jun OE mice) can express moderate levels of c-Jun (doi: 10.1523/JNEUROSCI.0986-17.2017). Thus, our data suggest that although c-Jun is increased mainly by the existence of supernumerary Schwann cells at the Remak bundles, part of the increase comes from increased expression by myelinating Schwann cells.

      “2) An important point is the description of the Remak- SC phenotype, which, in contrast to the only transient myelination phenotype, seems to persist in triple mutants. The authors suggest a defect of axonal segregation independent of a sorting defect and link this to an ectopic expression of genes of the melanocytic lineage. Given the importance of the Remak phenotype, a more detailed elaboration of this aspect also in dKO and cKO would be a strong benefit for the manuscript. In addition, the proposed ectopic expression of the melanocytic lineage genes would profit from a more extensive discussion and description with regard to their potential (transient) expression in wildtype Schwann cells and their functional relevance in relation to the observed Remak SC pathology. Moreover, the EM image in figure 2E suggests not only an increased number but also size of axons in the Remak bundles of triple mutants, in contrast to the respective quantification. As this point is crucial with regard to a potential sorting defect, the authors should carefully reevaluate the discrepancy between the presented image and data.”

      We didn’t observe gross abnormalities in the structure of the Remak bundles in the single cKOs or the dKO. However, following the suggestion of the reviewer, we have now analyzed in more detail and quantified the Remak phenotype in these mice. As shown in Figure 2-figure supplement 1A of the revised manuscript, no major changes can be observed in these genotypes. Thus, the segregation defects of small size axons are found exclusively in the tKO.

      Also, we have now discussed more extensively the expression of melanocytic markers and their putative role in the Remak phenotype in the Results and Discussion sections.

      In both control and tKO, we occasionally observe axons larger than 1 µm in diameter. However, a detailed quantification showed that the axon size distribution is essentially identical in both genotypes (Figure 2E). We apologize for selecting a non-representative EM image of the Remak bundles of the tKO. In the revised manuscript we have replaced this image with one that is more representative of the quantification. We hope the reviewer will find it adequate.

      “3) Regarding the expression changes of HDAC7 and HDAC9 in mutant mice: The authors only show HDAC7 expression at P60, while the proposed role of HDAC7 concerns early postnatal development. Could the authors comment on the expression of HDAC7 at earlier timepoints?

      Furthermore, within the manuscript, the authors suggest a "de novo" expression of HDAC9 in triple mutants. However, the authors show a small, but significant upregulation of HDAC9 already in single cKO4 nerves (Fig S1A) as well as in single cKO7 mice (Fig. 9A), hence a more careful usage of the term "de novo" may be advisable.”

      Following the suggestions of the reviewer, we have now included HDAC7 expression in the nerves of the dKO mice at P2 and P8. As shown in Figure 1-figure supplement 1C of the revised manuscript, it is increased at both ages, although not as much as at P60.

      By “de novo” we mean that HDAC9 expression is new when compared with the controls (cKO5) and wild types. We apologize for not being clear enough in the original manuscript. We have now revised the text, restricted the use of the term “de novo”, and explained more carefully the meaning we intend to convey with it.

      “4) In general, the discussion of the single HDAC knockout mutants is sometimes too sparse. This applies especially to the description of the cKO4 mice, which show a number of, albeit subtle, important differences with regard e.g. to the number of unmyelinated axons at P2 and P8 as well as with regard to the number of Schwann cell nuclei. However, the authors conclude that the single KO does not show a prominent phenotype. Though, given the compensatory mechanisms between HDACs in SC and the fact that the double HDAC4 (in SC) and HDAC5 (global) knockout display a similar phenotype to single HDAC4 mutants, this point requires more discussion. This dKO dataset, however, is redundant to the previously published study by the authors (Gomis-Coloma et al., 2018)”.

      We found no changes in the cKO7 and only a slight increase in the number of unmyelinated axons in the cKO5 at P8, which disappears when it is calculated as a percentage (Figure 1-figure supplements 4 and 5 in the revised manuscript). As pointed out by the reviewer, there are subtle although consistent differences in several myelination parameters in the cKO4 (Figure 1-figure supplement 2). However, these differences are not as large as in the dKO (Figure 1-figure supplement 5). Thus, whereas the cKO4 shows a 15% decrease in myelinated axons at P2 when compared with the wild type, the dKO shows a more prominent decrease (24.5% compared to its control).

      As we have used a large number of genotypes in the manuscript, we have tried to focus on the genotypes with the most prominent phenotypes, namely the dKO and the tKO. However, we agree with the reviewer that the existence of a subtle phenotype in the cKO4 deserves to be more clearly stated in the text. For this reason, we have rewritten the Results and Discussion sections in the revised version of the manuscript to highlight this phenotype more clearly.

      On the other hand, we also agree with the reviewer that part of the dKO data presented as supplementary data overlaps with our previous data (Gomis-Coloma et al., 2018). We repeated these experiments, in parallel with the other genotypes, and performed a slightly different quantification so that we could compare the dKO phenotype more accurately with the phenotypes of the other genotypes used in the current study. We hope the reviewer will see this as a strength rather than a weakness.

      “5) The authors then tested the mutants after injury. The presentation of data from these experiments, however, is a bit confusing as it is going back and forth between nerve crush and cut, different mutants (cKO4, KO5, dKO, tKO) and time points of analysis (10dpi, 20dpi, 21dpi, 30dpi). All mutants show a decreased remyelination after crush, the dKO and tKO further present increased c-Jun mRNA and protein at 10dpi and reduction of Krox20, Mbp, Mpz, Periaxin.”

      We apologize for not being clear enough in the manuscript. We have tried to fix these problems and hope the reviewer will find the revised version of the manuscript clearer.

      As in the case of development, cKO7 did not show any significant change in remyelination after nerve crush injury (Figure 5-figure supplement 1). cKO5 showed minor and inconsistent changes in remyelination after crush injury (Figure 4-figure supplement 2). For example, it shows a decrease in MBP mRNA but no changes in Prx or MPZ (Figure 4-figure supplement 2M). Consequently, our conclusion is that these genotypes have no prominent remyelination phenotype. However, as the reviewer points out, cKO4 shows a subtle but consistent delay in different remyelination parameters (Figure 4-figure supplement 1). We have now tried to highlight this fact in the revised manuscript.

      “The sequencing results are said to be obtained after nerve injury, however, it is not clear whether this was a cut or crush.”

      In this experiment it was a crush injury of the sciatic nerve. Throughout the paper we used the crush model of injury to study the regeneration and remyelination of the nerve, as it produces axotomy but leaves the perineurium intact, allowing the growth of axons through the distal stump and the regeneration of the nerve (doi: 10.7554/eLife.62232). We have used a cut model of injury only for the studies on myelin clearance and the activation of the Schwann cell repair phenotype, as it prevents axons from the proximal stump from entering the distal stump. We have now modified the Results and Materials and methods sections to explain this in more detail. We hope the reviewer will find it clearer in the revised version of the manuscript.

      “Four days after nerve cut in tKO, the authors report increased expression of genes typical for repair Schwann cells, as well as a more rapid myelin debris clearance, although it is unclear how this was measured. Only by quantifying the number of still intact myelin profiles early after injury as in figure 5A? If the authors would like to stress the point of myelin clearance, additional information on degeneration profiles and autophagy (LC3bI-II, p62 Western blots) or data on macrophage abundance is needed and would gain meaningful insight.”

      The aim of this experiment was to determine whether the remyelination delay is caused by a problem in the activation of the repair phenotype and/or in myelin clearance. Our results clearly show that this is not the case, as both seem to work properly in the tKO. Strikingly, we observed that both are even faster than in controls.

      We apologize for not being clear enough in the text about the method used for quantifying myelin clearance. We have now included a more detailed description of the protocol used in the Materials and methods section of the revised version of the manuscript. We quantified both the number of intact myelin profiles and the amount of myelin protein zero (MPZ) to monitor the elimination of myelin debris in the distal stumps. As suggested by the reviewer, we have now also quantified autophagy by measuring LC3bI-II by WB and found no changes in the tKO. Also, no changes in other autophagy markers were found by RTqPCR (Figure 5-figure supplement 2 G and H of the revised manuscript). Following the suggestion of the reviewer, we have also quantified the number of macrophages in the distal stumps at 4d after cut and found no changes between tKO and control (Figure 5-figure supplement 2I). Thus, the consistently increased myelin clearance found in the tKO mice is not caused by accelerated autophagy/myelinophagy or by increased numbers of macrophages. Although we don’t know the mechanism, we are now entertaining the possibility that accelerated axonal degeneration induced by signaling molecules derived from tKO Schwann cells could underlie this phenomenon. Future studies are therefore necessary to address this point.

      “6) Mechanistically, the authors investigated the genes that respond to HDACs or to which HDACs bind. It is nicely shown that HDAC4 can bind the c-Jun promoter, thereby repressing its expression, but also to the TSS of Mcam, belonging to the melanocyte lineage. However, a potential role of this finding is not further clarified.”

      We have now discussed more extensively the putative role of class IIa HDACs in repressing the expression of Mcam and melanocytic lineage genes in SCP in the revised manuscript. We hope the reviewer will find it clearer.

      “In addition, the generalized conclusion that "class IIa HDACs bind to and repress the expression of melanocyte lineage genes and negative regulators of myelination allowing myelination and remyelination proceed in a timely fashion" may be revised, considering that only HDAC4 has been tested”.

      We apologize for this generalization. We have now removed this sentence and modified the text avoiding unnecessary generalizations in the revised manuscript.

      “On the other side, it is nicely shown that c-Jun can bind to the HDAC7 promoter, inducing its expression. This is well analyzed both in vitro and in vivo using conditional c-Jun gain and loss of function in SC development. Here, although ectopic c-Jun overexpression in mice artificially increases HDAC7 expression in development, adding a more (patho-)physiological relevant experiment using c-Jun cKO in a nerve injury paradigm would be an asset.”

      We agree with the reviewer that it is very interesting to explore what happens to the induction of HDAC7 in a c-Jun cKO background. To this end, we generated the dKO; c-Jun cKO genotype and measured HDAC7 gene expression in the sciatic nerves of these mice. As shown in Figure 8G of the revised manuscript, the absence of c-Jun in Schwann cells totally prevents the compensatory overexpression of HDAC7 at P8 and P60. Interestingly, we have also found an increase in the expression of HDAC9 in these mice, probably to compensate for their inability to upregulate HDAC7. We believe that this, together with our previous data, strongly supports the view that c-Jun regulates the compensatory expression of HDAC7, and hope it will fully convince the reviewer.

      “7) The final hypothesis from the authors is, that upon lack of the functionally redundant HDAC4/5 and the concomitant de-repression of c-Jun, HDAC7 is upregulated upon binding of c-Jun to compensate for the loss and ensure myelination, although delayed. If HDAC7 is also lost, Mef2d expression increases and induces "de novo" expression of HDAC9. The data presented in the manuscript indeed provide evidence of a role for HDAC4, HDAC5 and HDAC7 in developmental myelination and nerve repair with compensatory potential for each other. However, the physiological relevance of these compensatory functions is, although interesting, not quite clear and the manuscript may profit from a discussion of this point.”

      We have discussed the potential physiological role of genetic compensation in the first paragraph of the Discussion section (page 14). We apologize if we have not been clear enough. We have now included a more extensive discussion of the putative physiological relevance of gene compensation in myelination in the revised version of the manuscript. Fluctuations in gene expression (noise) are a well-known phenomenon that has been described from bacteria to mammalian cells, and may have dramatic effects on fitness if they persist long enough (DOI: 10.1126/science.1105891). Kafri et al 2006 (doi.org/10.1073/pnas.0604883103) suggested that gene redundancy has been evolutionarily selected because it can reduce the harmful effects of gene expression noise. On this basis, we speculate in the revised manuscript that genetic compensation could buffer fluctuations in the gene dose of class IIa HDACs in Schwann cells that result from gene expression noise, allowing differentiation and the proper myelination of the peripheral nervous system. We thank the reviewer for helping us to see that a better clarification of this point was needed.

      "Reviewer #2 (Public Review):

      The classIIa Histone De-Acetylases (HDAC) play important roles in the transcriptional control of differentiation of a wide range of cell types. This class of HDACs is regulated by different signalling pathways and it involves the shuttling of the protein into the nucleus. Indeed, previous work from this lab has demonstrated that increased levels of cAMP shuttles HDAC4 into the nucleus of Schwann cells where it recruits NcoR1/HDAC3 to repress c-Jun expression and allows commencement of a myelin-related gene expression program. Thus, HDAC4 links cAMP signalling to repression of a 'repressor' to stimulate cell differentiation. However, genetic deletion of HDAC4 (or HDAC5 and HDAC4/HDAC5) does not have a significant effect on Schwann cell differentiation and myelination in vivo, suggesting that other compensatory mechanisms might exist.

      Building upon their previous work, Velasco-Aviles and colleagues now demonstrate the existence of a genetic compensatory mechanism that relies on functional redundancies among the ClassIIa HDACs and the transcription factors c-Jun and Mef2d.

      Using genetic ablation of multiple HDAC genes and extensive morphological analysis of developing and regenerating nerves combined with gene expression analysis, the authors provide a description of the gene regulatory mechanisms that maintain adequate levels of ClassIIa HDACs required for peripheral nerve development and repair. Their data are of high quality and support their major finding.

      One interesting finding is that in the tKO, in which myelination eventually appears to progress normally, Remak Schwann cells are deficient in segregating lower calibre axons into cytoplasmic cuffs (Figure 2E). The authors interpret this as a segregation defect and not as a sorting defect (page 5). Now, it is difficult to see how these two cellular mechanisms can be distinguished or whether they are different mechanisms to begin with. Notably, the unsorted bundle of axons presented in Figure 2E also contains larger calibre axons that should normally be myelinated. Therefore, a simpler interpretation is that tKO Schwann cells are moderately impaired in axonal segregation, which results in the failure to sort out the occasional larger calibre axons from bundles and ensheathment of the smaller calibre axons into mature Remak bundles.”

      We apologize for selecting a non-representative image of the Remak bundles in the tKO. As explained above in the response to Reviewer #1, when the Remak bundle phenotype was quantified we observed no changes in the number of axons larger than 1 µm in diameter (see Figure 2E). We have now replaced the image with one that is more representative of the phenotype.

      “There is no justification for proposing a 'segregation' mechanism different from the 'sorting' mechanism. As the sorting process critically depends on the elaboration of a basement membrane, it would be of interest to have a closer look at the basement membrane in EM and by IF in nerve sections and maybe WB. Is there any evidence for reduced laminin/collagen (or their receptors) expression in tKO nerves?”

      In the revised version of the manuscript we have now removed the proposal of a segregation defect as distinct from a sorting defect. However, whatever the mechanism involved, it is worth mentioning that we could not observe morphological differences in the basement membranes in the tKO by EM. Also, our RNAseq data show that collagens, laminin and integrin receptors and other molecules associated with axon sorting and segregation (doi: 10.1177/1073858415572361) are not decreased in the tKO.

      “It is argued throughout the manuscript that classIIa HDACs are involved in the repression of repressors of myelination. It is stated that in injured nerves a strong upregulation of such negative regulators of developmental myelination is observed (page 17). Regulators such as c-Jun, Runx2, Sox2 etcetera. To avoid confusion, it is important to clearly distinguish between developmental and repair functions (exemplified by c-Jun) and in Schwann cells cultured in the absence of axonal contact.”

      We apologize for not being clear enough. We have now revised the text and changed it to distinguish more clearly between developmental and repair functions. We hope the reviewer will find it less confusing in the revised manuscript.

      “Confusingly and erroneously, it is also stated that the POU domain transcription factor Oct6 blocks the transition from promyelinating Schwann cell into myelinating cells. The quoted paper does not support this idea at all. On the contrary, it demonstrates that Oct6 expression is required for the progression of promyelinating cells into fully myelinating cells.”

      We agree with the reviewer that this reference elegantly demonstrates that Oct6 is required for the progression of promyelinating cells into fully myelinating cells. However, it is worth noting that it has also been demonstrated that Oct6 needs to be downregulated in a timely fashion to allow myelination (Ryu et al 2007; DOI: 10.1523/JNEUROSCI.5497-06.2007). Thus, the upregulation of Oct6 and its subsequent downregulation are both necessary to permit peripheral nerve myelination. We apologize for forgetting to include this reference in the manuscript. We have now added it in the revised version of the manuscript. We hope the reviewer will now find our arguments coherent.

    1. Author Response

      Reviewer #3 (Public Review):

      Strengths of the paper include the use of complementary techniques to characterize both peripheral and central defects in TMEM16A knockout mice. They use calcium imaging in excised cochlear preparations to demonstrate the altered patterns of calcium waves in TMEM16A knockout mice, which are far smaller and faster than in control animals, and less frequent. Tests of auditory function (ABRs) in normal animals demonstrate overall normal hearing thresholds and auditory nerve function. They use in vivo recordings from medial nucleus of the trapezoid body (MNTB) neurons identified by their characteristic pre-spike to show altered patterns of spiking in developing MNTB neurons, with a smaller coefficient of variation that indicates less bursting activity. In hearing MNTB neurons, TMEM16A mutants have a higher sound threshold, and slightly wider tuning curves. They then use in vitro recordings from neurons of the lateral superior olive (LSO) which receive tonotopically refined inhibitory synaptic inputs from MNTB neurons to compute horizontal sound localization paired with glutamate uncaging activation of pre-synaptic MNTB neurons to demonstrate that single LSO neurons receive inputs from a larger region of the MNTB in TMEM16A mutants compared to control animals. This further indicates aberrant refinement of tonotopically distributed neurons in the ascending auditory system.

      There are a few weaknesses in interpretation of the data. There is an unsupported claim that TMEM16A is upstream of ATP release from inner supporting cells, the opposite of which has been proposed by other groups.

      We would like to thank the reviewer for the appreciation of our work and the comments that helped us to further improve our manuscript.

      We agree with the reviewer that we have no clear evidence that TMEM16A is upstream of ATP release. Our data and the data from the literature support the idea that TMEM16A may amplify ATP release from ISCs via connexin hemichannels, either through changes in cell volume or through changes in membrane potential, and thus promote the propagation of Ca2+ waves. Since many epithelia that express TMEM16A exhibit ATP-dependent Ca2+ waves, this scenario may apply more generally. We changed the discussion in our revised manuscript accordingly.

      The authors did not adequately discuss whether TMEM16A may influence other parts of the sound localization circuitry besides cochlear supporting cells.

      As we show in Supplementary Figure 4, TMEM16A is not expressed in auditory pathways of the brainstem. To be more explicit, as requested by the reviewer, we modified the discussion section: “Because neurons of the auditory pathway do not express TMEM16A, less bursting activity and impaired tonotopic refinement of auditory projections in cKO mice likely stem from less synchronized prehearing IHC activity: in the absence of TMEM16A-mediated Cl- efflux, no simultaneous K+ release will be triggered, and hence, the firing patterns of neighboring IHCs will not be synchronized.”

      The alterations in frequency selectivity of MNTB neurons in TMEM16A mutants seem too subtle to account for the broader deficits in frequency-specific inputs to LSO neurons, which could be discussed.

      The alterations in frequency selectivity, though significant, indeed seem insufficient to account for the differences observed in the LSO. However, the LSO receives projections from both ears, and there are likely some changes at the level of the cochlear nucleus (CN) as well. It is possible that the deficits in frequency-specific inputs to LSO neurons converge from changes in both the CN and the MNTB.

      Finally, the authors did not adequately discuss how their present work fits in with previous work using extremely similar techniques in a different knockout animal that also implicated patterns of spontaneous activity in the cochlea in tonotopic refinement of ascending auditory projections (Clause et al 2014). The authors should compare the patterns of aberrant bursting measured in MNTB neurons in TMEM16A mutants to those measured in alpha9 AChR knockout animals used in the Clause et al 2014 paper. Then, they should clearly state what their work adds to the existing literature, namely that while other papers have linked the bursting patterns of the ascending auditory system with tonotopic refinement of MNTB projections to the LSO, this work more clearly links an earlier step in the process, the spreading calcium waves in developing supporting cells, with refinement of ascending circuits.

      Thank you for your suggestions. The discussion was revised and the following lines were added: “While otoferlin KO mice have almost no discernable bursting activity, TMEM16A cKO mice showed a drastic reduction of the number of bursts. In contrast, the number of bursts was not changed in α9 KO mice. Firing rates within bursts were 70-80% higher in α9 KO compared to WT mice but did not differ between TMEM16A cKO and WT mice. Overall firing rates, however, do not differ between both KO models. As a consequence, we infer that the overall firing rate and firing rates within bursts, as well as the number of bursts, do not influence the physiological refinement of the MNTB-LSO pathway. Notably, the duration of bursts was markedly reduced in α9 KO and TMEM16A cKO mice (50% in α9 KO, and 56% in TMEM16A cKO mice). Average ISIs, however, were significantly longer in TMEM16A cKO and shorter in α9 KO mice. Thus, it seems that both subtle and drastic changes in the temporal pattern and/or the duration of bursts can lead to a severe disruption of tonotopic maps.

      Although recent publications already established a causal connection between the bursting patterns of the ascending auditory system and the tonotopic refinement of MNTB projections to the LSO (Clause et al. and Müller et al.), our work links the TMEM16A-dependent amplification of ATP release in developing supporting cells with the refinement of ascending circuits.”

    1. Author Response

      Reviewer #1 (Public Review):

      Lopes et al present an exploration of the functional interactions between developing CD4+ T cells and mTEC in the thymus. The study is interesting both because the precise differentiation stages for mTEC development is currently in flux with substantial recent discoveries, and because the manner in which developing T cells influence this development is also currently in question. The finding that CD4 cells induce a transcriptional reprogramming in mTEClo cells to induce cells suitable for orchestrating T cell development is therefore novel and interesting. The identification of a propensity to multi-organ immunity adds further to the impact of the work.

      We thank Reviewer 1 for highlighting that this study is novel and interesting, especially because the impact of thymocytes on the development of specific mTEC subsets remains elusive.

      A weakness of the work is that, given the complexity of the two-way interaction between CD4 cells and mTEC, some of the experimental interventions are somewhat blunt, leading to conclusions that are not well supported by the results. For instance, the differences in NFkB signalling described in Fig 1 are measured on total mTEClo cells from deltaCD4 mice. Given that the premise of the study is heterogeneity in mTEClo cells, it seems important to address whether these differences relate to the differences in representation of the different mTEClo populations (which might exhibit different NFkB signalling) before inferring a direct effect on signalling. Similarly, since the knockout was directed to the mTEC, it is not clear that the phenotype relates to CD4 deficiency. Thus, the phenotype might well be influenced by subtle changes in mTEC composition rather than direct effects on signalling.

      We agree that the differences observed in IKKα and p38 signaling in mTEClo described in Figure 1 could be influenced by the composition of mTEClo subsets. Unfortunately, it was technically not possible to detect total and phosphorylated IKKα and p38 proteins in tuft-like mTEC by flow cytometry. Indeed, antibodies directed against these proteins are all produced in rabbit, similarly to the anti-DCLK1 antibody used to identify tuft-like cells, and are all detected through anti-rabbit secondary antibodies. Moreover, there are no reliable markers to date that allow the identification of post-Aire and TAC-TEC cells by flow cytometry. Therefore, we cannot exclude that an altered composition of mTEClo subsets in deltaCD4 mice could influence the level of total and phosphorylated IKKα and p38 proteins. Nevertheless, given that there is no defect in the proportion of CCL21+ cells that represent the majority of mTEClo (cf. new Figure 3D) and that there is only a slight reduction in the proportion of DCLK1+ tuft-like cells in mTEClo (cf. Figure 3E), it is unlikely that the substantial and homogeneous reduction in the levels of Phospho-IKKα and Phospho-p38 observed in deltaCD4 mice (cf. Figure 1A) could rely only on mTEClo subset composition. Overall, these observations argue in favor of defective IKKα and p38 signaling in the absence of CD4+ thymocytes. This point is discussed in lines 396-400.

      More generally, the single cell analysis that forms the major part of the manuscript is difficult to interpret given the context: dynamic changes in the differentiation state of this heterogeneous population of cells are likely to lead to differences in gene expression states, but the 'snapshot' analyses inherent to this single cell analysis do not allow for dissection of cause and effect. For instance, Figs 1-3 convincingly demonstrate that the mTEC composition is different in the different mice, and that signalling and transcription are different in the mTEClo precursors. Demonstration of a functional connection between these two observations would add substantially to these findings.

      We analyzed single-cell RNA-seq data derived from Wells KL et al. (Elife. 2020) (cf. new Figure 1-figure supplement 4) to determine the respective expression pattern in mTEC subsets (i.e. CCL21+, TAC-TEC, Aire+, Post-Aire and Tuft-like cells) of genes upregulated by self-reactive CD4+ thymocytes. Some of these genes were associated with Post-Aire and Tuft-like cells (cf. new Figures 1I, 2K and 4L) in accordance with the reduced proportions of Post-Aire and DCLK1+ Tuft-like cells observed by flow cytometry among mTEClo cells (cf. Figure 3E, Figure 5H, Figure 3-figure supplement 2A and Figure 5-figure supplement 3A). Importantly, many of these genes were highly expressed by Aire+ mTECs, indicating that self-reactive CD4+ thymocytes enhance the transcriptional activity in mTEClo accompanying the transition to Aire+ mTECs. Accordingly, several of these genes were already expressed in the precursors of Aire+ mTEChi, recently called TAC-TEC (for transit-amplifying TEC). These new results are now discussed in the text.

      Reviewer #2 (Public Review):

      The study by Lopes et al focuses on the role of thymocyte-epithelial cell cross-talk in the thymus and aims to determine the role of thymocyte-derived signals in the differentiation of thymic epithelial cells. The study uses three different knockout models in which the thymocyte-derived signals are defective and studies the resulting effect on mTEC maturation. The study suggests that these signals indeed play a crucial role in mTEC maturation and proposes a novel mechanism by which the developing T-cells direct the functionality of the thymic stromal compartment.

      The study is mostly well designed and performed, and the manuscript is well written. Although the conclusions are largely based on substantial scientific evidence, a few points should be addressed in order to make the message of the study more precise and clearer:

      We thank Reviewer 2 for highlighting the relevance of our study in providing a novel mechanism by which the developing T cells direct the functionality of the thymic stromal compartment.

      1) According to the recent scRNA sequencing studies (reviewed in Kadouri et al 2020), the mTEClow mTECs contain at least two distinct subpopulations: the functionally mature CCL21-producing mTEC I and the immature mTECs giving rise to mTEC II and III. In its current form, however, the manuscript largely ignores the presence of mTEC I cells. The authors should make effort to analyze the changes in this population in the knockout models (by sequencing and/or qPCR) and cover this population also in the introduction and discussion.

      In this revised version, we have now analyzed the CCL21-producing mTEC I subset by flow cytometry in the three distinct transgenic mouse models used to decipher the impact of CD4+ thymocyte interactions on the mTEC compartment (cf. new Figure 3D, new Figure 5G and new Figure 3-figure supplement 3B). Using single-cell RNA-seq data, we also investigated whether genes upregulated by crosstalk with CD4+ thymocytes are specifically associated or not with CCL21+ mTECs (cf. new Figures 1I, 2K and 4L). Overall, we found that self-reactive CD4+ thymocytes have a moderate or no impact on CCL21+ mTEC in the different mouse models analyzed. The mTEC I subset is now covered in the introduction and these new results are discussed.

      2) As together with the classical mTEC classification (mTEChigh etc), the new scRNAseq based classification of mTECs (mTEC I etc) is used more and more often, it would be helpful to give in parallel the names of the subpopulations according to both of these classifications, at least when different mTECs are described/introduced in the beginning of the manuscript.

      Thank you for this comment. We now mention in the introduction the new scRNAseq-based classification of mTECs (i.e. mTEC I-IV; cf. lines 71-74) when mTEC subsets are described. We also used this classification when analyzing single-cell RNA-seq data in order to deepen and strengthen our study (cf. new Figure 1-figure supplement 4 and lines 150-155).

      3) In Figures 2 and 4 the authors show data only on selected chemokines, cytokines and adhesion molecules and make a conclusion suggesting that these groups of proteins are down-regulated. To make this conclusion, however, the authors should analyze/show the whole groups i.e. all chemokines, cytokines and adhesion molecules (as they do for TRAs). Alternatively, the authors should be more careful/specific with their conclusions. The same is true for HDAC3 regulated transcriptional regulators and transcription factors.

      Heatmaps described in Figures 1, 2 and 4 show the whole set of chemokines, cytokines and adhesion molecules that was statistically differentially regulated in the mutant mice analyzed. Similarly, we only show HDAC3-regulated transcriptional regulators (cf. Figure 1G, Figure 2I and Figure 4I) and activation factors (cf. Figure 2G and Figure 4H) that were statistically differentially regulated. We clarified this point in the corresponding legends.

      Reviewer #3 (Public Review):

      The authors have performed extensive studies to analyse the role of MHCII/TCR interactions in shaping mTEC differentiation. This has been an important question in the field. There are at least two different messages in the manuscript which are related but make the authors' message less clear; -the main message appears to be that the absence of MHCII/TCR interactions between mTECs and CD4+ alters the mTEClo compartment -a secondary message is that disrupted MHCII/TCR interactions between mTECs and CD4+ thymocytes lead to an altered TCRVβ repertoire (see comment below).

      The authors conclude that their RNAseq data in figures 1 and 4 show that genes are upregulated/downregulated. However, it could also be that their differential gene/cytokine expression is due to the presence of different mTEClo subsets, and the authors show this in figure 3. This would change the conclusion to: CD4+ thymocytes alter mTEClo differentiation states, associated with differential gene expression. This is also the case in figure 2. For instance, Lopes et al state that AIRE expression is 4.5-fold higher in mTECdMHCII cells, but then they show that there are different percentages of AIRE+ cells (change in the mTEClo subsets in the ko mice).

      To clarify whether differences in gene expression observed by RNAseq could be due to the presence of different mTEClo subsets, we analyzed their respective expression pattern in mTEC (i.e. CCL21+, TAC-TEC, Aire+, Post-Aire and Tuft-like cells) by analyzing single-cell RNA-seq data recently published by Wells KL et al. (Elife. 2020) (cf. new Figure 1-figure supplement 4). Some differentially regulated genes were associated with Post-Aire and Tuft-like cells (cf. new Figure 1I, new Figure 2K and new Figure 4L). These findings are in agreement with the reduced proportions of Post-Aire and DCLK1+ Tuft-like cells observed respectively by histology and flow cytometry (cf. Figure 3D, Figure 5H, Figure 3-figure supplement 2A and Figure 5-figure supplement 3A). Importantly, many of these genes were highly expressed by Aire+ mTECs, indicating that self-reactive CD4+ thymocytes enhance the transcriptional activity in mTEClo accompanying the transition to Aire+ mTECs. Accordingly, several of these genes were already expressed in the precursors of Aire+ mTEChi, recently called TAC-TEC (for transit-amplifying TEC). These new results are now discussed in the text.

      In many instances, mTEClo subsets are shown as percentages but quantifications are presented as total numbers. This is sometimes confusing, as percentages of mTEClo cells are often not different between WT, dCD4 and mTECdMHCII mice. Are differences due to lower total levels of thymocytes/mTECs?

      As mentioned in the “Materials and methods”, mTEC subsets were analyzed on CD45-negative enriched cells purified using anti-CD45 magnetic beads by autoMACS. Then, mTEClo were identified in EpCAM+ total TEC as depicted in Figure 1-figure supplement 1. Thus, differences could not be due to lower total levels of thymocytes/mTECs.

      For better clarity, we now show both the percentages and total numbers of mTEC subsets (cf. new Figure 3A,C-E and Figure 5F-H). These quantifications show that the percentages of mTEC subsets are significantly altered, but to a lesser extent than absolute numbers. This is explained by the fact that transgenic mice with disrupted “crosstalk with CD4+ thymocytes” have reduced numbers of total mTEC (cf. new Figure 3A and 5B).

      In figure 3, the authors show mTEChi cells in dCD4 and mTECdMHCII mice. How do these cells develop?

      In agreement with our previous studies (Irla et al. Immunity 2008 and Plos One 2012), although strongly affected, the development of mTEChi cells is not fully abrogated in dCD4 and mTECdMHCII mice (cf. new Figure 3A). The residual development of these cells in these mice could rely on invariant NKT cells, which also express RANKL. In agreement with this hypothesis, invariant NKT cells have been shown to participate in Aire+ mTEChi differentiation in the post-natal thymus, but to a lesser extent than CD4+ thymocytes (White AJ et al. J Immunol 2014). We clarified this point in the discussion (lines 441-444).

      The authors state that the TCRVb repertoire is altered in autoreactive T cells developing when MHCII/TCR interactions between mTECs and CD4+ thymocytes are abrogated. This is based on percentages of T cells in different TCRVβ families. To show that TCR selection differs, shouldn't the authors sequence the different TCRs and evaluate constraints on TCR-CDR3 segments?

      We agree that the differences observed by flow cytometry in the percentages of TCRVβ usage are insufficient to conclude that the TCRVβ repertoire is altered. We modified the text accordingly in the Results section (cf. lines 97, 368) and discussed that future experiments based on TCR sequencing are expected to clarify this issue (cf. lines 482-489).

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      In this work, the authors develop a tool for personalising prostate cancer treatment using a Boolean model. The model is extremely complex and describes the regulation of invasion, migration, cell cycle, apoptosis, androgen and growth factor signalling in prostate cancer using 133 nodes (genes and other metrics) and 449 edges (regulation pathways). Using their model, they were able to grade the effect of combined treatments for each of the 488 patients for already-developed drugs and find several genes suitable for intervention in most of the 488 patients. The predictions from their model could help develop a patient-tailored treatment that could boost the success of prostate cancer treatments in clinical practice.

      Strengths:

      The authors clearly achieved their aims of predictive prostate cancer modelling and have added value to the field of prostate cancer personalisation.

      Calibrating and then validating the predictions of a model, as this work does, is a fundamental part of systems biology and mathematical modelling. Using a cell line to investigate the prediction that AKT is the top hit for prostate cancer validates the utility of their model and also shines a light on how useful models like this can be in oncology. The methodology in this paper provides a guide for future modelling work in this area.

      Providing detailed Supplementary Information and additional links to the code and fundamental modelling platform publications helps to provide readers with a tool that may be applied in other settings. However, while this is a strength of the publication, the model is extremely complex and relies heavily on readers spending time comprehending pre-published work, and doesn't provide a single contained body of work.

      The methodology they are presenting could have significant impact on the field of cancer treatment, but would need to be tested clinically to validate that personalising treatment in this manner does improve outcomes.

      We thank the reviewer for these comments.

      Weaknesses:

      While it is a strength of this work that such a detailed and complex model is developed for prostate cancer, and that the code is provided, the weakness of this work is that the model is not easily accessible, and a lot of the techniques used in model development feel brushed over. The work relies heavily on other works and does not provide detailed descriptions of the underlying algorithm, requiring readers to absorb knowledge from other places. This could be a challenge for an experimentalist wishing to implement this methodology in a different cancer treatment.

      We have summarised the main techniques on which this work relies in a dedicated section in the Supplementary Material (Appendix file), with brief introductions to Boolean modelling, the MaBoSS stochastic approach to Boolean models, and the PROFILE methods.

      We have also provided the codes to reproduce the figures and the analyses. We tried to comment on the code files (e.g., Jupyter notebook) as much as possible to facilitate their use in different contexts.

      The proteins/genes in the model are not presented in a way that can be easily validated; as such, the complexity of such a Boolean model comes into question.

      We have listed all the proteins/genes of the model in SuppFile 1 with references for all the interactions of the network.

      For transparency, we have also described in the Appendix how we used information from all the different sources to construct the model in the section "Prior knowledge network construction".

      How sure are we of the model predictions, and are there any potential weaknesses to modelling the network in such an extensive manner? For a model like this, it is crucial to demonstrate its sensitivity to initial conditions and node additions/removals, so some work could be done to demonstrate this so that the readers have an idea of how many over/under-predictions there might be in the model.

      For the sensitivity to initial conditions, we have tested some of them on the generic model in the Jupyter notebook (provided in the supplementary files) but have not done it systematically. The table of all the stable states can be computed exactly as it is done in the notebook (2460 fixpoints are found), and the simulations of MaBoSS clearly show that the proportions of some solutions (probability of model states) change depending on which input is ON. We have tested some conditions: all inputs random, all inputs at 0, growth factors ON (EGF, FGF, Androgen, and Nutrients ON), death signals ON (Carcinogen, Androgen, Acidosis, Hypoxia and TNFalpha ON) leading to very different outputs (Figure 3 for LNCaP and S22 for all 8 prostate cell lines). In fact, the MaBoSS simulation with all inputs random shows the existence of all possible, stable states as it explores the whole state transition graph: for all nodes, 50% of the trajectories will start at 0, and 50% will start at 1. Similarly, we tested the effect of some mutations on the generic model (e.g. mutation of p53, which reduces the probability to reach apoptosis). The aim of these simulations was to test the overall coherence in the model behaviour vs biological evidence as a first validation.
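
      The effect of switching inputs ON or leaving them random can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the six-node network, its rules and the node names (GF, Damage) are hypothetical and are not the published prostate model, and the simple uniform asynchronous update used here is a stand-in for MaBoSS, which to our understanding uses a continuous-time stochastic algorithm. Each trajectory starts with every node at 0 or 1 with 50% probability unless an input is fixed.

```python
import random

# Hypothetical toy rules (NOT the published model): each node maps the
# current state to its next Boolean value.
RULES = {
    "GF":            lambda s: s["GF"],                      # input, keeps its value
    "Damage":        lambda s: s["Damage"],                  # input, keeps its value
    "AKT":           lambda s: s["GF"],
    "p53":           lambda s: s["Damage"] and not s["AKT"],
    "Proliferation": lambda s: s["AKT"] and not s["p53"],
    "Apoptosis":     lambda s: s["p53"],
}


def is_fixed_point(state, rules):
    """True when no rule would change the state."""
    return all(bool(rule(state)) == state[node] for node, rule in rules.items())


def simulate(rules, fixed_inputs, rng, max_steps=10_000):
    """Random initial state, then asynchronous updates until a stable state."""
    state = {node: rng.random() < 0.5 for node in rules}
    state.update(fixed_inputs)
    nodes = list(rules)
    for _ in range(max_steps):
        if is_fixed_point(state, rules):
            break
        node = rng.choice(nodes)          # update one randomly chosen node
        state[node] = bool(rules[node](state))
    return state


def phenotype_probabilities(rules, fixed_inputs=None, runs=2000, seed=0):
    """Fraction of trajectories ending with each phenotype node ON."""
    rng = random.Random(seed)
    fixed_inputs = fixed_inputs or {}
    counts = {"Proliferation": 0, "Apoptosis": 0}
    for _ in range(runs):
        final = simulate(rules, fixed_inputs, rng)
        for name in counts:
            counts[name] += final[name]
    return {name: count / runs for name, count in counts.items()}


if __name__ == "__main__":
    print("all inputs random:", phenotype_probabilities(RULES))
    print("growth factor ON: ", phenotype_probabilities(RULES, {"GF": True, "Damage": False}))
    print("death signal ON:  ", phenotype_probabilities(RULES, {"GF": False, "Damage": True}))
```

      In this toy network the phenotype probabilities shift entirely with the inputs, which is the qualitative behaviour described above; a real model with feedback loops would also need to handle cyclic attractors, which this sketch does not.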

      As for automatic removal and addition of new nodes to assess the importance of each of them, we would recommend against it. Indeed, the model was built from the knowledge extracted from the literature, from databases (cf. Omnipath), discussions with experts, and results from data analyses. Removing nodes would mean that some nodes are considered less important, and adding new nodes would mean that some new findings were found that would justify a new addition.

      In addition, in this work, we need to balance the robustness of the model with the flexibility needed to cover the different cell line personalisations. Thus, we do not want a highly robust wild type model that has few, extremely robust stable states but is unable to capture the specifics of the different cell lines. Nevertheless, we have partially covered this with our "High-Throughput mutant analysis of the LNCaP model" section in the Appendix file (Section 6.1), where we study all the perturbations on one node and combinations of two nodes, be they knock-outs (where a node is forced to be 0 throughout the simulation) or overexpressions (forced to be 1). With this analysis, we wanted to identify the fragility points of the mutants' models; we did not perform this test as a thorough robustness analysis. In any case, we found varying effects of these perturbations on the phenotype scores, with double perturbations having a greater effect than single ones.

      Finally, we have tested the stability of the logical rules under perturbation. We have changed one and two logical gates in each logical rule of the LNCaP model and studied the effects on the phenotype scores. In short, we have changed an AND into an OR and vice versa, once in each logical rule (level 1 with 372 simulations) or twice in the same rule (level 2 with 1263 simulations).

      Overall, we see that all of the most probable phenotypes are very robust to this kind of perturbation. Even the least stable phenotype, Invasion-Migration-Proliferation, has only ~3% of either level 1 or level 2 perturbations that reduce this phenotype's probability to zero (Appendix File, Figure S30). Most of these perturbations were focused on the HIF1, AR_ERG and p53 nodes (Appendix File, Figure S31).
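
      A single- and double-perturbation scan of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the paper: the network, node names and the `phenotype_score` definition (the fraction of all initial states that end with the phenotype node ON) are hypothetical, and a deterministic synchronous update replaces the MaBoSS simulations so that the result is exact for this small example.

```python
from itertools import combinations, product

# Hypothetical toy rules (NOT the published LNCaP model).
RULES = {
    "GF":            lambda s: s["GF"],                      # input
    "Damage":        lambda s: s["Damage"],                  # input
    "AKT":           lambda s: s["GF"],
    "p53":           lambda s: s["Damage"] and not s["AKT"],
    "Proliferation": lambda s: s["AKT"] and not s["p53"],
    "Apoptosis":     lambda s: s["p53"],
}


def run_to_fixed_point(rules, state, forced, max_steps=50):
    """Synchronous updates with some nodes clamped to a forced value."""
    state = {**state, **forced}
    for _ in range(max_steps):
        nxt = {node: bool(rule(state)) for node, rule in rules.items()}
        nxt.update(forced)                # knock-out (False) or overexpression (True)
        if nxt == state:
            return state
        state = nxt
    raise RuntimeError("no fixed point reached (the network may cycle)")


def phenotype_score(rules, phenotype, forced=None):
    """Fraction of all initial states whose stable state has `phenotype` ON."""
    forced = forced or {}
    nodes = list(rules)
    finals = [
        run_to_fixed_point(rules, dict(zip(nodes, values)), forced)
        for values in product([False, True], repeat=len(nodes))
    ]
    return sum(final[phenotype] for final in finals) / len(finals)


def perturbation_scan(rules, phenotype, targets):
    """Score every single and double knock-out/overexpression of `targets`."""
    single = [{n: v} for n in targets for v in (False, True)]
    double = [
        {a: va, b: vb}
        for a, b in combinations(targets, 2)
        for va in (False, True)
        for vb in (False, True)
    ]
    return {
        tuple(sorted(p.items())): phenotype_score(rules, phenotype, p)
        for p in single + double
    }


if __name__ == "__main__":
    print("wild type:", phenotype_score(RULES, "Proliferation"))
    for perturbation, score in perturbation_scan(RULES, "Proliferation", ["AKT", "p53"]).items():
        print(perturbation, score)
```

      Comparing each perturbed score with the wild type score gives a simple readout of which nodes are fragility points for a given phenotype in the toy network.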

      We added a sentence in the Methods section to explain this: "In addition, we found that the LNCaP model is very robust against perturbations of its logical rules, by systematically changing an AND for an OR gate or vice versa in all of its logical rules (Appendix File, Section 6.2, Figure S30 and S31)." and added Section 6.2 to the Appendix file titled "Robustness analysis of the logical model".
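
      The level 1 gate perturbation described above (one AND exchanged for an OR, or vice versa, per variant) can be sketched on a toy network. The rules, node names and scoring below are hypothetical assumptions for illustration and are not the LNCaP model or the code used for the Appendix figures. Rules are stored as Python Boolean expressions so that a single operator token can be flipped; note that flipping an operator in a longer expression can also change how it groups, because `and` binds more tightly than `or`.

```python
from itertools import product

# Hypothetical toy rules written as Boolean expressions (NOT the LNCaP model).
RULES = {
    "GF":            "GF",
    "Damage":        "Damage",
    "AKT":           "GF",
    "p53":           "Damage and not AKT",
    "Proliferation": "AKT and not p53",
    "Apoptosis":     "p53 and not AKT",
}


def evaluate(expr, state):
    """Evaluate a rule with node names bound to their current values."""
    return bool(eval(expr, {"__builtins__": {}}, state))


def run_to_fixed_point(rules, state, max_steps=50):
    """Synchronous updates until the state stops changing."""
    for _ in range(max_steps):
        nxt = {node: evaluate(expr, state) for node, expr in rules.items()}
        if nxt == state:
            return state
        state = nxt
    raise RuntimeError("no fixed point reached (the network may cycle)")


def phenotype_score(rules, phenotype):
    """Fraction of all initial states whose stable state has `phenotype` ON."""
    nodes = list(rules)
    finals = [
        run_to_fixed_point(rules, dict(zip(nodes, values)))
        for values in product([False, True], repeat=len(nodes))
    ]
    return sum(final[phenotype] for final in finals) / len(finals)


def level1_variants(rules):
    """Yield (node, rules) with exactly one and/or operator flipped."""
    for node, expr in rules.items():
        tokens = expr.split()
        for i, token in enumerate(tokens):
            if token in ("and", "or"):
                flipped = tokens.copy()
                flipped[i] = "or" if token == "and" else "and"
                yield node, {**rules, node: " ".join(flipped)}


if __name__ == "__main__":
    print("unperturbed:", phenotype_score(RULES, "Proliferation"))
    for node, variant in level1_variants(RULES):
        print(node, variant[node], phenotype_score(variant, "Proliferation"))
```

      A level 2 scan would flip two operators within the same rule in the same way; it is omitted here because the toy rules contain only one operator each.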

      As they test so many drugs and combination regimes it is also hard to extract information about which key drugs should be repurposed. It could be useful to the readers to have this spotlighted more in the model so that it is easily discernable.

      The complete study on the inhibition of all nodes of the LNCaP model can be found in the supplemental information (SuppFile 6 and Appendix file, Section "High-Throughput mutant analysis of the LNCaP model").

      Because of the size of the model, we chose to filter the full list of nodes with the list of existing drugs and their targets. Thus, Table 1 gathers the drugs we discuss in this article along with the node that they target. We also studied a selection of combinations of drugs, as depicted in Section "Experimental validation of drugs in LNCaP" of Results. In that section, we focused on the combinations that reduced Proliferation and/or increased Apoptosis. For completeness' sake, we provide all the combinations of all the drugs from Table 1 in Appendix File, Figures S34 and S35, and their Bliss score in Appendix File, Figures S36 and S37. Furthermore, the code to reproduce these is available in our GitHub repository: https://github.com/ArnauMontagud/PROFILE_v2/blob/main/Gradient%20inhibition%20of%20nodes/data_analysis.R
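
      For readers unfamiliar with Bliss scores, the standard Bliss independence calculation is sketched below. This is the textbook formula and may differ in detail from the score computed in the linked R script; the function names and the example effect values are hypothetical. Effects are expressed as fractional reductions of a phenotype (for example Proliferation) between 0 and 1.

```python
def bliss_expected(effect_a, effect_b):
    """Expected combined effect if the two drugs act independently."""
    return effect_a + effect_b - effect_a * effect_b


def bliss_excess(effect_a, effect_b, effect_ab):
    """Observed minus expected effect: > 0 suggests synergy, < 0 antagonism."""
    return effect_ab - bliss_expected(effect_a, effect_b)


if __name__ == "__main__":
    # Made-up example values: drug A alone reduces the phenotype by 20%,
    # drug B alone by 50%, and the combination by 75%.
    print(bliss_expected(0.2, 0.5))
    print(bliss_excess(0.2, 0.5, 0.75))
```

      With these made-up values the expected independent effect is about 0.6, so an observed combined effect of 0.75 would count as synergistic under this definition.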

      We could have identified the nodes from Table 1 on the figure of the network (main text Figure 1), but we decided against it because the figure is already hard to read, and colours were added to specify the signalling pathways that are included.

      Suggestions:

      Another way to validate the cohort level predications could have been to examine the efficacy of the predicted personalised protocols, or sensitive parts of the Boolean network, in a new prostate cancer patient cohort. Do we see the same sensitive pathways if we examine a different cohort of prostate cancer patients?

      We thank the reviewer for this suggestion. Indeed we are working on using this pipeline in other cancers and in other studies.

      One of the topics that we think can facilitate the use of this methodology is on optimising its runtime and portability. Thus, we are currently working on having a containerised, HPC semi-automatic workflow to reduce the time and optimise the efforts to get results using (almost) any published model and (almost) any omics data.

      In terms of the reproducibility of the results and as we say in the discussion section of the main text, there is a kind of effect size on this type of study. You may find that for a specific patient, their conclusions are not in line with what is expected, but when you analyse at the level of groups of patients, these outliers dampen off.

      Reviewer #2 (Public Review):

      Montagud et al. present a very successful experiment - modeling feedback loop: the authors develop a Boolean model of the major signaling pathways deregulated in prostate cancer, use molecular data from patient samples to personalise this model, use drug response of cell lines to validate the model, predict 15 actionable interventions based on the model, and test nine of these interventions, confirming four.

      The premise of the work is well-supported by prior work by the team and the wider community. The methods are sound, well integrated and thoroughly documented, with one notable omission. The process through which the logic functions of the nodes were determined/decided is not described. The Appendix file indicates "The model is completed by logical rules (or functions), which assign a target value to each node for each regulator level combination.". The interested reader would want to know what information is used and what considerations are the basis of these assignments, and what would change if an assignment were different.

      The manuscript makes a number of testable predictions of actionable single and combinatorial therapeutic interventions for prostate cancer. Equally important, the combination of information and methodologies used in this paper offers a roadmap for future development of predictive and personalised models. Such models are much needed in precision oncology.

      We thank the reviewer for these encouraging comments.

      Reviewer #3 (Public Review):

      This paper tries to establish a model for drug (and combination) selection for individual prostate cancer patient based on a prior signal network knowledge base and genomic/transcriptomic profiling data. This is of great clinical potential. However, whether this approach could be robustly applied in clinic is not validated. Limited validation using cell line is provided. Most tumors have complex structure including tumor cells and surrounding microenvironment. The model is mainly built from onco-signaling pathways. The contribution of microenvironment including immunity is unclear.

      The focus of this model is intracellular only. We explored the interplay between signaling pathways that may be linked to tumorigenesis. We only consider the microenvironment effect as indirect and in no way comprehensive. For instance, we have not considered any immune cells or the effect of the metabolism.

      Nevertheless, we are building on top of this work a multiscale model where we can include different cell types, such as immune cells, and drug-related pharmacodynamics.

    1. Author Response

      Reviewer #1 (Public Review):

      The subject of this review can be interesting and in principal helpful to the researchers who works on germline mutations. The authors have summarised all the work done in this area over the past 10 years. However, I found parts of the review unsurprising and also I am not sure if the reviewer have convinced the readers what is the best practice for calling gremlin de novo mutations.

      We agree with the reviewer that a limitation of our study is that we did not agree on a gold standard for calling de novo mutations, but instead focused on the different parameters that should be considered, how to choose them, and how to report them in a consistent way. More studies with much larger datasets are likely to appear soon, and it will then become clearer whether one standard method could fit them. Our study carefully considers all the ingredients of such a future gold standard.

      Bergeron et al. present an interesting paper about the biases and implications of the different methods used for the identification of de novo mutations from pedigree datasets. The first part is a review of the different methods and criteria used for mutation identification (coverage, mapping quality, and measures of the callable genome, among others) in previous studies. The second part of the study is an original approach named "mutationathon", in which 5 teams received the same pedigree dataset to estimate the mutation rate. The objective here, which is very interesting for the field, is to understand how the different approaches from several teams impact the mutation rate estimate. The 5 estimates are of the same order, but with one value significantly higher than the others. Moreover, the candidate mutations identified by each team are very variable: about 20-30 candidate mutations, but with only 7 true positives in common despite strict criteria during the identification steps. This study is very interesting and shows the importance of a standard method for mutation identification, and the difficulties or biases in comparing mutation rates from different publications that use different approaches.

      We thank the reviewer for the nice summary of our study and its implications.

      Reviewer #2 (Public Review):

      I commend the authors for their extensive work on summarising a large number of germline de novo mutation (DNM) studies of human and non-human trios. They outlined all the methods used by different studies to call DNMs. They pointed out different stages that may affect DNM calling, including sample size, library preparation, alignment, variant calling, and post-filtering. Finally, by analysing DNMs in a macaque pedigree across five groups, they have demonstrated how different strategies in variant calling may lead to reporting different mutation rates.

      We thank the reviewer for their comments.

      The authors are correct that the identification of true DNMs is affected by experimental and analytical strategies. This is a long-known issue in the field. However, as the authors also mentioned, despite all the variations, the reported DNMs across different studies are very much in agreement. Indeed, in their Mutationathon exercise on calling DNMs in a pedigree of three generations of rhesus macaques, they have demonstrated that although all five groups reported different numbers of DNMs, the difference in mutation rate is insignificant. Moreover, I am not convinced that variability in calling DNMs is a major issue in this field, at least not in recent years and especially not for human germline mutations. More recent studies with large numbers of trios, such as the analysis of ~12k trios by Kaplanis et al., bioRxiv, 2021, eliminate most of the issues due to systematic noise.

      We agree with the reviewer that methodological discrepancies do not seem to be an issue in human studies, as studies over the past decade have found similar rates. The large number of sequenced human trios has helped in fine-tuning GATK or Graphtyper genotyping such that the false positive rate has become very low, and the callable part of a human genome at 30X coverage is now very well known. However, discrepancies may become an issue when comparing studies on non-human primates, which may differ in overall heterozygosity (most often higher than in humans), repeat organization, and quality of the reference genome. Furthermore, in these species mutation rates are also estimated on smaller pedigrees. This is partly why we chose to do the Mutationathon on a non-human trio, for which the rate is unknown, with each group estimating independently of the others. There was thus little prior expectation for the number of real de novo mutations and the callable genome size, but there was an opportunity to check false positives by subsequent Sanger sequencing. We have now added a sentence in the introduction to explain that the problems are more likely to appear in non-human species (page 3 lines 77-82).

      Having said the above, it is very helpful to have some guidelines that take different factors into consideration for future experiments. However, there are a few issues that I am not sure the authors have addressed in their review:

      The authors have not addressed issues with relatedness: the strategy for calling DNMs in multi-sibling families. This is also very important in non-human studies.

      We briefly mentioned multi-sibling families in the sample size section. Overall, we do not believe that multi-sibling pedigrees should be analyzed differently from pedigrees with a single offspring. Yet, we have now expanded this part to detail the opportunity that multi-sibling samples offer for distinguishing mutations that occur during the postzygotic stage from actual germline mutations (page 8 lines 158-167).

      The best practice suggested here is certainly not applicable to different species. Due to differences in selective pressures, the number of DNMs differs between species. This directly affects the detection method. Moreover, cellular processes causing germline de novo mutations may vary between species. Hence, our mutation calling strategies cannot be generalised across species.

      We agree with the reviewer that some steps of the analysis should be adjusted when working on different species. For instance, we mentioned that the alignment or the variant calling may perform differently on species that are more or less heterozygous. We also agree that different processes may be at play in various species with, for instance, more postzygotic mutations or a larger increase with age. Yet, we do not believe that this will affect the detection method since we basically recommend that each trio is independently called for de novo mutations. Indeed, there is no prior hypothesis on the number of mutations expected. We believe that different sample types or sequencing technologies or genome characteristics could require adjustment in the methods. That is why we proposed various filters and methods to estimate the number of candidate DNMs and the callable genome, as adapting those to each dataset may be necessary. For instance, a species with a highly repetitive genome could lead to difficulties in detecting DNMs, but in these cases a similar method as proposed could be applied as long as the callable genome also excludes the repetitive regions.
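
      As an illustration of how the number of candidate DNMs and the callable genome combine into a rate, the following minimal sketch uses a commonly used form of the per-site, per-generation estimator. The function name, the correction terms, and the numbers in the usage note are illustrative assumptions, not values or code from the study.

```python
def mutation_rate(n_candidates, callable_sites, fdr=0.0, fnr=0.0):
    """Per-site, per-generation mutation rate for one trio (illustrative sketch).

    n_candidates   -- number of candidate de novo mutations after filtering
    callable_sites -- number of sites where a DNM could have been called
    fdr            -- assumed false discovery rate among the candidates
    fnr            -- assumed false negative rate within the callable genome
    """
    if callable_sites <= 0:
        raise ValueError("callable_sites must be positive")
    if not (0.0 <= fdr < 1.0 and 0.0 <= fnr < 1.0):
        raise ValueError("fdr and fnr must lie in [0, 1)")
    # The factor 2 accounts for the two haplotypes of a diploid offspring.
    return n_candidates * (1.0 - fdr) / (2.0 * callable_sites * (1.0 - fnr))
```

      With hypothetical inputs of 30 candidates and 2.5 billion callable sites, `mutation_rate(30, 2.5e9)` gives 6e-9 per site per generation. Excluding repetitive regions changes `callable_sites`, and with it the denominator, which is why the callable genome must be reported alongside the candidate count.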

      Issues with somatic mutation contamination, as the authors correctly mentioned, can vary depending on the tissue of choice. However, the authors do not suggest a solution. For example, in the case of clonal hematopoiesis, what is the solution to overcome this issue and call DNMs? Perhaps the authors can explore parameters such as cell fraction or purity of the tissue, which can guide the downstream analysis for DNM calling.

      We suggested that sampling different tissues would help to differentiate somatic mutations (present only in one tissue type) from germline mutations (which should be present in all tissues in the offspring). We have now clarified this and added a sentence on the allelic fraction to avoid clonal hematopoiesis (page 9 lines 199-203).
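
      A minimal sketch of an allelic-fraction check of this kind is shown below. A heterozygous germline variant is expected to appear in roughly half of the reads, whereas a somatic or clonal variant is typically found at a lower fraction. The function names and the 0.3-0.7 window are hypothetical choices for illustration, not thresholds taken from the manuscript.

```python
def allele_balance(alt_reads, total_reads):
    """Fraction of reads at a site that support the alternative allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def looks_germline(alt_reads, total_reads, low=0.3, high=0.7):
    """Return True if the allelic fraction is compatible with a heterozygous
    germline variant. The [low, high] window is an assumed, tunable filter."""
    return low <= allele_balance(alt_reads, total_reads) <= high
```

      For example, a candidate supported by 15 of 32 reads passes this filter, while one supported by 4 of 40 reads is flagged as a possible somatic or clonal event. The appropriate window depends on coverage and should be reported with the other filters.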

      Another aspect that may affect DNM calling, is clinical history of the parents and/or child. What would be the strategy for these cases?

      In the autism study, there was no overall difference in rates between individuals with autism and their unaffected siblings (Turner et al., 2017). However, it is true that the clinical history of the parents/offspring could lead to variation in mutation rate in some cases. In non-human primates, we do not normally have any phenotypic information available and are mainly interested in the general rate and spectrum of mutations.

      How about introducing a site-specific error rate? Given the high number of trios publicly now available it would be extremely useful to compute site specific error rate per nucleic acid.

      This is a very good idea. Although it may be possible for human studies, it is difficult to apply to other species (see answer below).

      Overall, as the authors also mentioned, DNM calling requires study- or species-specific thresholds. Therefore, I am not convinced that their suggested best practice is really applicable to all types of trio studies.

      We agree with the reviewer that we mainly focus on a standardized way of choosing the filters and reporting the methods. These are essential for transparency until a gold standard method is found.

      Reviewer #3 (Public Review):

      This study is motivated by the variable germline mutation rates that are estimated from numerous genome sequencing studies of primate pedigrees. The authors argue that this variability is the result of methodological differences in both the molecular and computational strategies employed. Therefore, the authors launch the "Mutationathon" as an effort to isolate the effect of computational differences employed by different research groups that form the authorship of this manuscript. Using PCR validation, they are able to assess the specificity and sensitivity of each approach, recommend some high-level guidelines, and conclude that all future studies should provide detailed reporting of all computational details.

      Whole-genome DNA sequencing of pedigrees consisting of at least the mother, father, and one offspring has become the gold standard for estimating the rate of germline mutation in humans and other primates. While the estimated mutation rates are broadly consistent, they can vary by up to a factor of two. The authors make a strong argument that the primary reason for the variance in estimated rates is fundamental differences in the computational methods employed from study to study. Specifically, since germline mutations are rare, most studies make substantial efforts to eliminate false positive predictions. The computational "filtering" approaches differ, yielding variability in the specificity and sensitivity of each study. Furthermore, studies take different approaches to account for the specificity and sensitivity of their approach. As a result, the final estimated germline mutation rates vary.

      The authors seek to assess the impact of differing computational filtering approaches on estimated germline mutation rates by launching the Mutationathon. The study design is to provide DNA sequencing data from a single pedigree of rhesus macaques to five different research labs, who each apply their internal "best practice" computational approaches to estimate a germline mutation rate. The authors then use PCR validation of the union of all mutation predictions to directly measure specificity and sensitivity with an orthogonal molecular strategy. Finally, they provide recommendations for future studies to thoroughly document the methodologies for reproducibility and comparability.

      A key strength of this study is the fact that the authors were able to isolate the impact of computational differences on estimated rates by providing identical sequencing data to each group. Another key strength is the fact that PCR validation of all predicted mutations was performed (or at least attempted), providing an independent assessment of errors. However, these strengths are balanced by the fact that the study was conducted on a single macaque pedigree, thus preventing an assessment of the variance in rates estimated across pedigrees for the same computational approach. Similarly, neither multiple tissues nor a multi-generational pedigree was used, thereby preventing the assessment of the degree to which tissue-specific mutations or early post-zygotic mutations masquerade as germline mutations given their observed allele ratios. Lastly, the basic conclusion is that methods are sufficiently variable that providing a gold standard approach was not possible and the paper concludes that each study should simply thoroughly detail the methods so that differences in reported rates can be better understood.

      While the motivation behind the study is clear and the detailed treatment of the variability of computational approaches is fantastic, the basic conclusions of the study largely reflect the understanding of expert researchers conducting such work. However, the efforts to document the largest computational drivers of variability in germline mutation rate estimation are laudable and will likely inform future efforts in this area.

      We thank the reviewer for this summary of our study, highlighting its strengths and limitations.

    1. Author Response

      Reviewer #1 (Public Review):

      Strengths: The manuscript is very thorough and convincingly homes in on key circuit elements and mechanisms that likely underpin this unexpected linearity in the On-parasol circuit.

      Weaknesses: perhaps this is just me, but I found the MS is quite "hard work" to parse. To some extent I suspect this just is what it is, given the complexity of the circuit, but perhaps the occasional explanatory statement and further streamlining of the figures might help get the key points across a little more easily. I was also a little unsure about what exactly each figure and statistical measure depicted, something that presumably is easily fixed by expanding the legends accordingly. I offer some specific suggestions in the author section.

      Thank you - this is helpful. We have revised the text throughout with this comment in mind. This included being clearer about what is plotted in each figure, both in the main text and in the figure legends. We have also expanded (and in some cases added) the final paragraphs of each section of the Results to make sure that the main points of each section are emphasized.

      Finally, the manuscript rather glances over "the middle" of the circuit, i.e. the bipolar cells, and their computations. The circuit insights gained from the study do not require spatially nonlinear computations in bipolar cells (e.g. via amacrine cells), rather, they seem to require the bipolar cells to be linear in this regard. However bipolar cells, like cones or ganglion cells, also have many sources of nonlinearities, some well understood and probably several others less well understood. Is it then not surprising that they should not play a role in the circuit as well? It would be good to see some discussion/acknowledgement of this topic.

      We have revised the text to emphasize the importance of nonlinearities in the bipolar cells, starting with our description of Figure 1A. We previously discussed this issue somewhat abstractly - referring to “rectifying nonlinearities.” We have now clarified that those nonlinearities, which are essential for nonlinear spatial integration, largely originate at the bipolar output synapse. These changes include clarifying that what we describe in this paper is a previously unappreciated mechanism, initiated from adaptation in the cones, that controls the impact of the bipolar nonlinearities on RGC responses. We find this interplay between different nonlinear components very interesting (and have emphasized that now in the Discussion). This comment helped us describe that interaction in much more concrete terms. See page 11 (next to last paragraph), page 14 (next to last paragraph), and page 17 (last paragraph).

      Reviewer #3 (Public Review):

      This manuscript reports an interesting result regarding retinal ganglion cells in primates, although the presentation of the main findings was slightly confusing. There is a classic distinction between ganglion cells that integrate linearly vs. nonlinearly over space. The primary test has been the presentation of a high spatial frequency contrast-reversing grating, which generates no response in a linear cell - because the responses to bright and dark bars cancel each other. A nonlinear cell would instead respond at twice the frequency of reversal, because (e.g., for an ON cell) the increased response to the bright bars on each phase of the grating cannot be canceled by the decreased response to the dark bars. The explanation has been a nonlinearity at the output of bipolar cells, such that increases in glutamate release to the preferred contrast cannot be canceled by decreases in release to the non-preferred contrast, either because the release rate at baseline is very low (i.e., rectified) or else there is some difference in the dynamics to increases vs. decreases in release even when the output is not strongly rectified. This latter idea is illustrated in Borghuis et al. (2013; Fig. 7) - which the authors might consider citing.

      Thank you. We have added a citation to that paper, along with citations to two earlier papers from Jon Demb that implicate bipolar cells (page 2, top; page 11, second to last paragraph; page 17, last paragraph). We have also emphasized throughout the paper the important role that the nonlinearity at the bipolar cell output synapse plays in nonlinear spatial integration. This is highlighted in Figure 1A, which we now refer back to several times. We previously referred to the key nonlinearity abstractly as a “rectifying nonlinearity” and now more directly indicate the role of the bipolar synapse. See page 11 (next to last paragraph), page 14 (next to last paragraph), and page 17 (last paragraph).
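
      The classic argument about contrast-reversing gratings can be illustrated with a toy two-subunit model. This is a sketch under simplifying assumptions (two subunits seeing opposite-polarity bars, sinusoidal contrast reversal, and half-wave rectification standing in for the bipolar output nonlinearity); it is not the model used in the paper, and all names and parameter values are illustrative.

```python
import math


def grating_response(rectify, contrast=1.0, freq_hz=4.0,
                     n_samples=1000, duration_s=1.0):
    """Summed output of two subunits that see opposite-polarity bars of a
    contrast-reversing grating. With rectify=True each subunit output is
    half-wave rectified before summation."""
    response = []
    for i in range(n_samples):
        t = i * duration_s / n_samples
        modulation = contrast * math.sin(2.0 * math.pi * freq_hz * t)
        inputs = (modulation, -modulation)  # bright-bar and dark-bar subunits
        if rectify:
            inputs = tuple(max(0.0, x) for x in inputs)
        response.append(sum(inputs))
    return response


def fourier_amplitude(signal, freq_hz, duration_s=1.0):
    """Amplitude of the Fourier component of `signal` at `freq_hz`."""
    n = len(signal)
    re = sum(s * math.cos(2.0 * math.pi * freq_hz * i * duration_s / n)
             for i, s in enumerate(signal))
    im = sum(s * math.sin(2.0 * math.pi * freq_hz * i * duration_s / n)
             for i, s in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n
```

      In this toy model the linear sum is zero at every time point, so both the F1 (4 Hz) and F2 (8 Hz) amplitudes vanish. The rectified sum follows the absolute value of the modulation, so it has no F1 component but a clear F2 component, which is the frequency-doubled response described above.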

      The authors report that On parasol cells in primate show the nonlinear behavior to a high contrast grating. Although - and this is an important detail - there is no such response at the first half-cycle of the grating (Fig. 5C). Likewise, there is no response to a grating flashed briefly from a gray background (Fig. 1C). There could be an advantage to presenting these results together at the front of the paper - indeed they seem to show results from the same cell (at least, Fig. 1C and 5A spike traces look identical).

      Thank you - this is a helpful suggestion. We have moved full responses to contrast-reversing gratings into Figure 1, and revised the text considerably to point out the differences in spatial integration at the onset of the grating vs during later grating cycles (gray boxes in Figure 1B; see page 2, bottom and page 3, second paragraph). This included modifying the text to emphasize that our results highlight that spatial integration depends (surprisingly) on stimulus time course. Natural images fit into this stimulus dependence, but other stimuli also elicit near-linear spatial integration. Throughout the paper, starting in the abstract, we now emphasize that spatial integration is stimulus dependent. We hope that this shifts the focus away from specific stimuli towards the more general issue of how this stimulus dependence originates.

      The main section of the paper shows that natural images presented either in sequence or flashed briefly from a gray background lack nonlinear response behavior - that is, a linear receptive field seems to be encoding the image, because structure in the image can be replaced by the mean luminance across the receptive field center. This makes natural images and gratings seem like they bring out truly unique behaviors in the ganglion cell. But what could be so special about a grating?

      It seems, simply, that the natural image dataset did not match the high contrast grating in the detailed properties of the stimulus, and the proposed nonlinearity in the cone is also obviously important. For example, when the grating is first presented in the contrast-reversing stimulus sequence (i.e., the first half-cycle) or the grating is simply flashed briefly, the On parasol cell is excited by the presynaptic On bipolar cells that see a gray-to-white transition. But the release is relatively weak, because the cones are partly adapted to the gray background. The inhibition driven by 'crossover' circuitry (i.e., driven by the Off pathway) is relatively strong at the gray-to-black transition, and the responses cancel.

      The lack of response to spatial structure cannot be explained by the range of contrasts alone. To help make that point, we now show the contrast dependence of the grating responses in Figure 1. The image patches in our dataset often had contrasts exceeding 0.5. Similarly, the onset of a high contrast grating elicited little or no response in On parasol cells (Figures 1B and C). We have emphasized these points in the text on page 4, and we have added a few sentences about the image statistics to the methods (page 20, second to last paragraph). Instead, our analysis indicates that it is the periodicity of the contrast-reversing grating that is important - and specifically that this periodicity leads to periodic changes in the gain of the cone response. We emphasize this in the revised paper in several places (see page 10, next to last paragraph; page 13, first and second paragraph; page 18, second paragraph).

      What is less obvious is that for subsequent cycles of the grating, the On bipolar cells that see black-to-white transitions are now releasing very strongly (because the cone response is relatively strong coming from an un-adapted state), which can no longer be canceled by the Off-pathway inhibition. The grating is the optimal stimulus to reveal this nonlinearity because there are simultaneous full-contrast changes in luminance, in opposite directions, at different points across the image. There is no reason why a similar sequence in a natural movie would not drive a similarly strong nonlinear response. It just happens that the natural images and movies (i.e., image sequences) used here apparently never contain these patterns of light changes across the receptive field center.

      Indeed, we do not mean to imply that there is anything special about natural images, and we have revised the text so as not to draw a specific distinction between natural stimuli and others. This includes emphasizing that the periodicity of contrast-reversing gratings appears central to the strong responses that they elicit. This periodicity links the gain of the cone responses to the stimulus that the cones encounter: cones encountering an increase in light level have a high signaling gain because they start from a low light level, and conversely cones encountering a decrease in light level have a low signaling gain. This periodic change in intensity is not commonly encountered in natural stimuli (though we agree that natural stimuli exhibiting such a periodic change would likely elicit nonlinear spatial integration).

      So there is a fundamental feature of the circuitry that determines the degree of nonlinear spatial summation in On parasol cells (in an interesting way that differs from Off parasol cells). My main recommendation is that the authors present the findings with gratings in a more compact fashion rather than alternating between gratings and natural images. For example, the results in Fig. 1B and C are puzzling until one learns of the result in Fig. 5A and realizes that Fig. 1B is showing the average across many cycles, which does not capture the unusual response in the first half cycle that is completely consistent with the result in Fig. 1C.

      Good suggestion. We have included a more complete summary of the grating responses in Figure 1 - including removing the cycle average responses to contrast-reversing gratings in favor of traces that show the responses to the grating onset as well as responses to later grating cycles (Figure 1B). We have emphasized these features of the grating response in the text as well.

      There is also a clear difference in the contrast of the grating vs. the contrasts within the natural scenes (which are more difficult to define, perhaps). It could be interesting to explore the responses to lower contrast gratings to see if a single model can explain both grating and natural image responses over a range of contrasts in a satisfying way.

      See response above. We now include summary data showing F2 responses across a range of grating contrasts to demonstrate that high contrast values are not necessary to elicit nonlinear spatial integration in either On or Off cells. We note this in the text on page 4, first paragraph. We have also added a sentence about the contrasts of the natural image patches used here to the Methods (see page 20, next to last paragraph).

    1. Author Response

      Reviewer #1 (Public Review):

      Wang et al. use a multistage mathematical model to analyze the incidence of advanced colorectal adenomas and to investigate whether the protective effect of aspirin on adenoma incidence could be due to its effect on cellular fitness. The advanced adenoma incidence model is similar to the recently published Paterson et al. PNAS 2020 model, in that it includes the first three of the five steps to colorectal cancer from the Paterson et al. model. Interestingly, Wang et al. find that adding crypt competition to the previous model is needed to account for the observed advanced adenoma incidence curve. This is a nice contribution to the study of colorectal tumor incidence. The authors are also able to confirm previous findings that the order of mutations on the way to an advanced colorectal adenoma is determined by the crypt fission rates, and not the mutation rates.

      In the second part of the paper, Wang et al. use the advanced adenoma incidence model to study the effects of aspirin on reduction of adenoma incidence. This part of the paper would benefit from more precise explanation of the assumptions used, especially how exactly is the effect of aspirin implemented at the level of individual crypts and/or crypt cells. Adding these explanations to the main text would significantly increase the clarity of the second half of the manuscript.

      We agree with this comment and have now revised the main text significantly to describe in more detail how aspirin intervention was implemented in the model at the level of individual crypts (intra-crypt dynamics) and crypt fission/turnover (inter-crypt dynamics). See the heavily revised text on pages 13-15 of the main text (and also the new Appendix 1 - Section 6 for further details), as well as a new Fig 4 of the main text for the model structure. The main text now also includes Table 1 showing aspirin dosage in mice, the human equivalent, and the resulting fold differences in the division and death rates of cells that were previously measured by us and implemented in this study. Further, model assumptions and parameter definitions are now explained in a more consistent way in Appendix 1 - Section 2.

      Reviewer #2 (Public Review):

      This mathematical model of the development of advanced colorectal adenoma (polyps) stands out in terms of its biological realism and inclusion of quantitative information related to stem cell turnover and crypt biology. Early detection and prevention of colorectal cancer, be it via endoscopic screening, stool DNA tests, or use of COX2 inhibitors (Aspirin, Sulindac etc), continues to be an important public health goal, especially in view of stressed health care services under COVID and a rising incidence of CRC among younger individuals in the US. The findings are interesting and motivate further experiments to generate data that inform the model and predictions for effective chemoprevention of colorectal cancer using Aspirin.

      The authors recognize that only about ~40% of non-hypermutated CRC are KRAS+. Although KRAS onco-activation may well occur after APC-/- crypts have begun to proliferate (after a 3-6 transition in the model) forming an advanced adenoma, all non-hypermutated CRCs should be carrying detectable KRAS mutations - according to this model. However, Brenner et al did not ascertain KRAS status of their adenomas and cancers. It appears that one way of finessing this problem is to assume that the prevalences of advanced adenomas used to fit this multipathway model can be represented effectively by fractions of those observed clinically. Did the authors consider/test such an adjustment? If not, why not?

      Thank you for this comment; these important issues indeed need clarification. Instead of KRAS+, other mutations can be present in advanced adenomas, for example BRAF. In terms of the model, the identity of the mutations is not the defining issue, but the types of mutations are. One of the mutations is assumed to be the inactivation of a tumor suppressor gene (i.e. loss-of-function, most likely APC-/-), while the other mutation is assumed to be gain of function (which can be KRAS+ or alternative gain of function mutations). As long as these assumptions are fulfilled, the predicted dynamics would remain the same. It is unlikely that adenomas are characterized by the presence of two inactivated tumor suppressor genes, which would significantly reduce the rate of their emergence in the model. Further tumor suppressor gene inactivations tend to occur at more advanced stages of the disease. We have now explained this in the revised version of the manuscript by adding the following:

      “While the model assumptions about the pathways to adenoma formation are clearly defined in our model, it is important to point out that there are uncertainties in those assumptions, and that there is heterogeneity in the types of mutations that can lead to colorectal carcinogenesis. For example, it has been reported that among non-hypermutated colorectal tumors, KRAS was mutated in only about 43% of patient samples [39], indicating the importance of a variety of evolutionary pathways. Our model, however, does not depend on the identity of particular mutations, but assumes the occurrence of mutation types, which are the inactivation of a tumor suppressor gene (which is a loss-of-function mutation, e.g. APC-/-), and a gain-of-function mutation, which can be in KRAS or an alternative gene, such as BRAF [40-43]. Our model predictions hold as long as the evolutionary pathway to advanced adenomas involves these two types of mutational events, regardless of their identity.”

      To model multiple pathways efficiently the authors resort to a deterministic crypt proliferation model, forgoing a stochastic treatment. Thus, in the deterministic model, there will always be a non-zero fraction of type 6 crypts at any given time > 0. The paper does not reveal how large type 3 (APC-/-, no KRAS) crypt clones are on average when an advanced adenoma (type 6 crypt) occurs in the colon. A discussion of how exactly the advanced adenoma is defined (1 type 6 crypt or more?) and what observation thresholds were applied to model the endoscopic observations by Brenner et al, would be greatly helpful. Typically, in clinical practice, an advanced adenoma is > 1cm in caliper size, or shows signs of dysplasia.

      We thank the referee for these comments, as they allowed further refinements of the model that we found very valuable. The model that is presented in Appendix 1 (system (19-23), see also the equations in the main text) is a model that describes the expected behavior of the system. We used the mean-field approximation for the probability P(t) that at least one crypt of type 6 is present by time t. Since this is an approximation, in the new version of the paper (see new section 5.1 of Appendix 1) we showed that it works very well by checking it against fully stochastic Gillespie simulations. The ODE model (which is computationally very inexpensive) was then used to fit the advanced adenoma incidence curve.
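
      The kind of check described here can be illustrated on a deliberately simplified toy process. The sketch below is not system (19-23) of Appendix 1: it assumes a hypothetical two-step model without crypt fission, in which each of `n` wild-type crypts acquires a first hit at rate `u1` and each once-hit crypt acquires a second hit at rate `u2`. The mean-field estimate of P(t), the probability that at least one doubly hit crypt exists by time t, is compared with the fraction of Gillespie runs in which such a crypt has appeared. All names and parameter values are illustrative.

```python
import math
import random


def mean_field_p(t, n, u1, u2):
    """Mean-field P(t) = 1 - exp(-u2 * integral of a(s) ds), where
    a(s) = n * (1 - exp(-u1 * s)) is the expected number of once-hit crypts."""
    integral = n * (t - (1.0 - math.exp(-u1 * t)) / u1)
    return 1.0 - math.exp(-u2 * integral)


def gillespie_hit(t_end, n, u1, u2, rng):
    """One Gillespie run; True if a doubly hit crypt appears before t_end."""
    time, n_wild, n_once = 0.0, n, 0
    while True:
        rate_first = n_wild * u1    # total rate of first hits
        rate_second = n_once * u2   # total rate of second hits
        total = rate_first + rate_second
        if total == 0.0:
            return False
        time += rng.expovariate(total)  # waiting time to the next event
        if time > t_end:
            return False
        if rng.random() * total < rate_first:
            n_wild -= 1
            n_once += 1
        else:
            return True


def simulated_p(t_end, n, u1, u2, runs, seed=0):
    """Fraction of independent Gillespie runs that produce a doubly hit crypt."""
    rng = random.Random(seed)
    hits = sum(gillespie_hit(t_end, n, u1, u2, rng) for _ in range(runs))
    return hits / runs
```

      With the illustrative values `n=1000`, `u1=0.01`, `u2=0.002` and `t=10`, the mean-field estimate is about 0.62, and `simulated_p(10.0, 1000, 0.01, 0.002, runs=2000)` should agree with it to within sampling error. The ODE-style estimate costs one function evaluation, whereas the stochastic estimate requires many runs, which is the practical reason for fitting with the former and validating with the latter.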

      In the new version of the manuscript we have also taken fuller advantage of Gillespie simulations: once a set of the best fitting parameters was determined, we studied the effect of aspirin in reducing the adenoma risk by using stochastic simulations. To this end, we have also provided more details of the model behavior. The new figures 12 and 13 of Appendix 1 show the probability distributions of the populations of the crypts of different sizes at the time when the first crypt of type 6 is created.

      Finally, in the new version of the model we included a growth phase of type 6 crypts, which is outlined in the model description in the Materials and Methods section of the main text, and also discussed throughout the main text. This is further described in detail in the new Section 5.2 of Appendix 1, as well as the new figure 7 (Appendix 1) that introduces this aspect of modeling, and the new figures 8-18 of Appendix 1, which all include the expansion phase of type 6 crypts. The simulations presented in the main text now also include the expansion phase. Because of uncertainties existing in the literature in terms of the exact numbers of crypts at detection, we explored a large range of threshold sizes to which the population of type 6 crypts expands. It is very encouraging that the effects of aspirin are very consistent throughout this range.

      We would like to note that initially we did not realize the extent of the difference the expansion phase could make for the numerical values of the fitted parameters. After implementing this part of the model, however, we became convinced that this part of the dynamics, which is often left out of models (e.g. in [32]), has to be included explicitly for better accuracy of predictions. All of the results in the main text are now updated to include the expansion phase of the modified type 6 crypts.

      Related to the above, the assumption of zero crypt death/fusion appears unrealistic given the findings by Baker A-M, et al. Gut 2019;0:1-8. Furthermore, given the small number of SCs in human colonic crypts, the sporadic loss of crypts cannot be zero and normal crypt fissions are clearly compensated for by crypt loss (or fusion?) in normal colon. Of further note, a recent modeling paper by Birtwell et al. (Evolutionary Applications 13, 1771-1783 (2020)) also argues for crypt turnover as an important aspect of the metapopulation and stem cell dynamics in intestinal crypts and other tissue structures in multicellular bodies.

      Spontaneous crypt loss is part of the model that is developed here. It is incorporated in the crypt death rate, δ, in the system of equations presented in the main text (as well as system (19-23) in Appendix 1). All the simulations in the main text (e.g. Figs 2-4) correspond to a nonzero death rate. We have also examined the effect of the crypt death rate by performing simulations with both zero and nonzero values of δ (see Appendix 1 - Figure 6). Including a nonzero death rate was not found to result in qualitatively different conclusions when fitting the model to the advanced adenoma incidence data. Since the new version of the model also takes into account the expansion phase of type 6 crypts, a nonzero crypt death rate associated with type 6 crypts is also included. It plays a role in the stochastic model because a newly generated crypt may sometimes spontaneously disappear. It is also assumed to be affected by aspirin when we consider the inter-crypt dynamics.
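      The remark that a newly generated crypt may spontaneously disappear can be made concrete with a toy linear birth-death process. The sketch below assumes that each crypt in a lineage divides at a constant fission rate and dies at a constant death rate δ; under that assumption the lineage founded by a single crypt is eventually lost with probability δ divided by the fission rate whenever fission is faster than death, and with certainty otherwise. The function names and numerical values are hypothetical and are not taken from the manuscript.

```python
import random


def extinction_probability(fission_rate, death_rate):
    """Chance that the lineage of a single crypt eventually dies out."""
    if fission_rate <= death_rate:
        return 1.0
    return death_rate / fission_rate


def simulated_extinction_fraction(fission_rate, death_rate, runs=5000,
                                  established=50, seed=1):
    """Estimate the same quantity by following the sequence of events."""
    rng = random.Random(seed)
    # Probability that the next event in the lineage is a fission.
    p_fission = fission_rate / (fission_rate + death_rate)
    extinct = 0
    for _ in range(runs):
        crypts = 1
        # Stop once the lineage is lost or is large enough to count as established.
        while 0 < crypts < established:
            crypts += 1 if rng.random() < p_fission else -1
        extinct += crypts == 0
    return extinct / runs
```

      Only the order of events matters for extinction, so the simulation tracks the embedded jump chain rather than event times. A treatment that raises the death rate relative to the fission rate increases the chance that a new lineage is lost before it expands.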

      In the new version of the manuscript we added a discussion of these points including the published research that the referee pointed out (thank you!).

      Finally, it is not clear whether the model recapitulated the relatively short Aspirin exposures used in trials. In other words, were the equations solved for the situation encountered in trials where the interventions (Aspirin) only last for a few years (typically < 10) unless long-term users are included?

      We thank the referee for bringing up this aspect of the problem. In the new version of the manuscript we have included a more detailed study of the timing of aspirin administration (see e.g. the schematic in Fig. 4(b)). The relatively strong effect of aspirin on the advanced adenoma risk (see e.g. Fig. 4 of the main text) was obtained when the patients were assumed to receive aspirin for a 10-year period prior to the risk assessment. Three different doses were tested, which corresponded to the three doses used in our previous experiments, for which the kinetic parameters were measured. These mouse doses were converted to the human equivalent; please see the new Table 1 of the main text and the surrounding explanations (as well as new section 6 of Appendix 1), where the conversion is explained and the factors modifying cell division and death rates are calculated. The highest dose that was used for the modeling was the strongest dose in our previous experiments [19,20], which roughly translates to the second strongest dose (6-14 pills a week) in the study by [14]. The effect of aspirin decreases with a decrease in the dose and the duration of aspirin treatment. In particular, the role of the duration is examined in Appendix 1 – Figure 18. A number of different comparisons are now presented in the paper (see new Figures 15-18 of Appendix 1) that investigate different assumptions of aspirin modeling, as well as the age at which the drug was taken, the follow-up time, etc.

      Reviewer #3 (Public Review):

      The work by Wang et al. demonstrates a computational approach to modeling population-level tumor incidence using a crypt-level algorithm to predict modified incidence based on differential effects on cellular dynamics. While the manuscript makes ultimate conclusions about the impact of aspirin chemoprevention and how this can be parametrized for the model, the work demonstrates a potential in silico approach towards testing putative preventive agents as well as potential factors that may accelerate tumorigenesis by accounting for epithelial cell growth dynamics under different mutation conditions. The base model operates on many relatively basic and broad-sweeping assumptions about colorectal tumorigenesis that will need to be considered for individual downstream applications.

      An obvious limitation of chemoprevention studies in humans is the long timescale necessary to perform such experiments/trials and the relatively large subject numbers required to measure incremental effect sizes on relatively low incidence outcomes. Similarly, the availability of appropriate and broadly translatable in vitro (e.g. cell lines that recapitulate precancers or even 'normal' tissue within which to model protective vs. 'anti-cancer' effects) and preclinical models is limited. The strength of this approach towards being able to computationally model crypt dynamics using our best understanding of intestinal cellular proliferation is an important step towards a tool for identification and testing of putative chemoprevention agents which may assist basic and translational researchers as they consider interventions towards intercepting colorectal cancer. The authors specifically understand the constraints and limitations of their modeling approach and transparently discuss the assumptions that feed into the model. Of course, while the manuscript penultimately focuses on the effects of aspirin, the real strength is the identification of a computational model that may similarly behave to biologically relevant crypt dynamics that is able to not only consider different inputs for accelerating effects, but can also model different effects caused by preventive measures and begin to disentangle the underlying biology that is mathematically likely to be occurring in vivo. These assumptions however limit the generalizability of the individual findings as they pertain to our broader understanding of tumorigenesis and aspirin chemoprevention. 
As an oversimplification of this critique, the authors focus on APC/KRAS mutations, which to be fair are the most common CRC/adenoma mutations, but may not be critical to understanding aspirin's chemopreventive mechanisms, where APC/KRAS mutations have not been shown to predict the variable protective response to aspirin observed in humans. Similarly, the effect estimates on cell dynamics are derived from in vitro and preclinical work using aspirin that use high, albeit appropriately acknowledged as 'physiologically attainable', doses that do not quite represent circulating doses likely achieved through regular use of low-dose aspirin. Nonetheless, using aspirin as a 'model preventative exposure' is suitable in this setting as a way to demonstrate the adenoma models' responsiveness to this type of parameter. Appropriately, the authors do not overinterpret their findings in light of these assumptions, but readers should be similarly cautious to avoid overinterpreting the conclusions.

      Thank you very much for the positive assessment and the constructive input, which we have now incorporated into the revised version of the manuscript. As we point out in the revised text, the predicted dynamics and the effect of aspirin do not depend on the particular identity of the mutations involved. We do frame the model of advanced adenoma evolution in terms of APC-/- and KRAS+ mutations, as these are often documented in the literature. Alternative mutations, however, can characterize this process, as shown by tumors that lack KRAS+ mutations and carry alternative mutations, e.g. in BRAF. The revised manuscript now points out that instead of specific identities of mutants, the model behavior depends on “types” of mutations. Hence, the model assumes that the evolution to advanced adenoma involves the inactivation of a tumor suppressor gene (e.g. APC-/-) and the activation of an oncogene, which can be KRAS or an alternative. Aspirin is assumed to change the kinetics of these evolutionary processes, and our predictions about incidence hold as long as the assumed evolutionary process involves the inactivation of a tumor suppressor gene and the activation of an oncogene.

      We now also discuss in more detail the relationship between the in vitro and in vivo data that were collected in our experiments and the processes that occur in the colorectal tissue of humans who develop disease. We focus this discussion on the hierarchical organization of the colorectal tissue, the cell of origin of colorectal carcinogenesis, and the dynamics of crypt fission and crypt death, and we also discuss in more detail the aspirin dosing used in our experiments in relation to physiological doses in humans. We draw attention to limitations of our work, including alternative processes that can determine the response to aspirin in the colorectal tissue that have not been included in the model (due to lack of quantitative information), such as the effect of the microbiome composition. Further details are given below.

    1. Author Response:

      Reviewer #2:

      In this study, Russell et al. combined T cell receptor (TCR) repertoire sequencing data with SNP array genotype data to infer genetic polymorphisms which impact upon the process of TCR generation. Using these data, the authors looked for loci with polymorphisms which associate with different V(D)J recombination probabilities, i.e. sites in the genome which impact upon the chances of TCRs with different properties being produced when they change.

      Beyond the expected sites in the TCR and MHC loci, the authors observed strong associations with distant sites. One was with DCLRE1C, which encodes Artemis, the endonuclease responsible for cutting the TCR loci during recombination, while the second was DNTT, the site encoding the enzyme TdT, which is responsible for addition of nucleotides to cut V(D)J during recombination. To my knowledge this is the first time that such SNP associations have been described, and yet they make perfect sense: DCLRE1C variations were associated with the amount of trimming V and J genes underwent during recombination, while DNTT polymorphisms associated with the number of inserted nucleotides. The authors also report, after assigning donors an associated ancestry based on clustering of their genotype data, that certain inferred ancestries associate with different TCR repertoire properties. In this analysis 'Asian-associated' TCR repertoires had fewer non-templated nucleotide insertions, along with a corresponding greater incidence of the DNTT polymorphisms associated with differences in insertions, relative to other groups.

      Strengths:

      This manuscript is exceedingly well written. Both the TCR biology and the statistical considerations of the genetic analyses are extremely complex topics, mired in arcane terminology, which often end up somewhat impenetrable to non-expert readers. However both have been introduced and explained with admirable clarity throughout, including the caveats and implications of analyses that would not be intuitive to many readers not already expert in both fields.

      As best I can determine, the analyses themselves are also extremely rigorous, with each step carefully taken and justified in the text, involving numerous corrections at multiple scales (e.g. for TCR productivity, TCR gene usage, specific TRBD2 genotype, population substructure, and more). The major findings have also been validated in a completely separate cohort, using a different analysis pipeline. While the authors point out that such genome-wide association efforts looking at TCR gene expression have been undertaken before, the major innovation presented here lies in applying those data to investigating specific V(D)J recombination probabilities. Thus the findings are novel, and the conclusions well supported by the data.

      The data visualisations have all been plotted in a sensible and easily interpretable manner. The majority of the data are already publicly available, having been published in prior studies. The TCRseq data for the validation cohort have been assigned a BioProject accession, which I presume will go live at the time of publication. The code is also appropriately hosted on GitHub, and is for the most part commented and documented well enough to be repeatable.

      Thank you!

      Limitations:

      There are very few if any obvious technical limitations or weaknesses that I can see that are not intrinsic to the data themselves. While the authors do mention these limitations, I wonder if they should be devoted some more attention somewhere in the text of the manuscript; relatively few researchers are expert in both TCR biology and the technicalities of genome-wide association studies, so I think more explicit consideration of these issues would be helpful.

      We have expanded the section containing limitations of our approach within the discussion section. We hope this addition clarifies the intrinsic limitations of the data used here.

      In particular, I think the difficulty of studying these loci with standard techniques could be underlined, along with what implications that might have for this study. The highly repetitive nature of the TCR loci can certainly make any analysis looking at short sequences problematic, which has implications for both the TCRseq and genotyping aspects of this study. Combined with the fact that most studies focus on certain populations, polymorphisms in the TCR loci are very likely being relatively undersampled by the field (a hypothesis supported by the ongoing discovery of novel exonic polymorphisms in TCRseq data itself, e.g. as demonstrated in this pre-print by Omer et al., https://doi.org/10.1101/2021.05.17.444409). The consequences of SNP polymorphism coverage in SNP arrays have already been considered for IgH (https://doi.org/10.1038/gene.2012.12): while this is an admittedly more polymorphic locus, the underlying causes of these issues mostly hold for the TCR loci as well. Similarly, while the authors do appropriately point out that issues with V(D)J gene assignment could introduce biases, it may be worth noting that the TCRseq technology used to produce their main dataset uses relatively short-read sequencing, which is unable to distinguish a substantial fraction of even known TCR gene- and allele-level diversity (see Fig. 1C of the Omer et al. pre-print). Thus there may be a whole dimension of TCR polymorphism that is not well captured by either platform.

      This is a great suggestion and we have added a section within the discussion to mention these limitations and their implications for both the SNP array and TCR repertoire sequencing data used here.

      Overall, I think this is an extremely considered and digestible study, which will be of great interest across and beyond the field. As the wider community comes to grips with how best to incorporate TCR and BCR polymorphisms into their analyses of the adaptive immune loci themselves (and how this might impact upon recombination, expression, and downstream immune functions) this serves as a timely reminder that we should not forget the polymorphisms elsewhere in the genome that might also be relevant.

      Thank you!

      Reviewer #3:

      In this manuscript, Russell et al. propose an inference method to link genetic variations with TCR repertoire feature variations, based on observations from previous studies showing similarities at various levels of the repertoire in monozygotic twins. To that end, they used a unique publicly available dataset, which combines TCRb immunosequencing data with whole-genome SNP data. The method is elegant and sheds light on the importance of combining different types of data to better understand the complexity of TCR repertoire generation and selection. Unfortunately, while their discovery dataset provides some associations between SNPs and TCR repertoire features, they were almost unable to recapitulate the results with their validation dataset. The main reasons could be that the donor demographics are highly divergent between the two cohorts (81% Caucasian in the discovery vs. mainly Hispanic in the validation), that the immunosequencing data were generated using an RNA-based method for the validation cohort while the discovery dataset was obtained from gDNA templates, and finally that the SNP arrays were discordant between the two datasets. Nonetheless, the approach and the study deserve attention and might be improved by additional experiments or analyses and by providing additional information.

      Thank you for your review. We would like to emphasize that the validation results reported here are as good as one might expect given the small sample size of the validation cohort (94 individuals) and the discordance between the discovery and validation SNP sets. The overlap between the discovery cohort and the validation cohort SNP sets consisted of just two significant SNPs, one within the gene encoding the Artemis protein (DCLRE1C) and the other within the gene encoding the TdT protein (DNTT). This DCLRE1C SNP (rs12768894, c.728A>G) was strongly associated with the extent of V-gene and J-gene trimming in the discovery cohort, and we were able to successfully validate this finding within the validation cohort. Specifically, this DCLRE1C SNP was significantly associated with the extent of J-gene trimming in productive TCRalpha and TCRbeta chains and V-gene trimming of both productive and non-productive TCRalpha and TCRbeta chains within the validation cohort. The overlapping SNP within the DNTT locus (rs3762093) was only weakly associated with the extent of N-insertion within the discovery cohort, and as such, it was not surprising that this SNP only reached statistical significance for one of the N-insertion types (productive TCRalpha rearrangements; note that due to the lack of the D gene, N-insertion annotations are likely less noisy on the TCRalpha locus). Despite our inability to replicate all N-insertion associations, we noted that the model coefficients for rs3762093 genotype were in the same direction (i.e., the minor allele was associated with fewer N-insertions) for all N-insertion and productivity types within the TCRbeta chains for both cohorts.

    1. Author Response:

      Reviewer #1:

      The weaknesses of the manuscript include:

      (1) Among the 108 binders selected, the authors further limited the 108 binders to 33 after bioinformatics analysis among which 14 are previously known to interact or co-localize with an Ena/VASP protein. However, whether the new ones really bind to ENAH is not tested.

      After reanalysis of the literature, we realized that NHSL1 has not previously been validated to bind to Ena/VASP proteins, although it shares many common binding proteins. This has been updated in the text, Figure 1E, and Supplementary File 1. Thus, we validated seven hits previously not reported to bind to ENAH using biolayer interferometry assays for which the data are included in Supplementary File 2, and three by co-IP in mammalian cells.

      (2) The structure analysis for EVH1 of ENAH is nice, but inference to specificity was not tested by mutagenesis.

      Paralog specificity, although very interesting to us, is a peripheral point in this particular paper and we have not tested our hypotheses about ENAH vs. VASP or EVL specificity by mutational analysis. We now point this out in the text, where we report our observations and include a discussion of possible mechanisms but do not claim that we have established the origins of specificity between these paralogs. Our most interesting finding about paralog specificity of hits from this screen is elaborated in a separate manuscript that addresses the highly selective binder PCARE; this is now published.

      (3) Although the MassTitr method works nicely, it is not easily apparent how much new insight the study has gained from the MassTitr screening. Perhaps the additional proline is one new insight. The double FP4 motif seems to be already known in the literature.

      The past decade has seen a rise in screening approaches to identify consensus short linear motifs for important signaling proteins. However, our study is the first to comprehensively map local and distal sequence elements surrounding a SLiM. MassTitr was useful for this purpose because it enabled us to screen longer peptides and identify high-affinity binders from the screening output. This allowed us to interrogate how context effects modulate affinity to a given SLiM-binding domain. In this Short Report we highlight how our method makes the role of context readily apparent in the screening results. This is a general approach. In this work, it provided multiple hypotheses, several of which we followed up using biophysical studies.

      The double FP4 motif was previously uncovered in detailed NMR studies of the interaction of zyxin with VASP (Acevedo et al., 2017). Here we show that multiple human proteins that bind to the ENAH EVH1 domain contain such dual motifs, that there appears to be a preferential spacing between the motifs (Figure 3A), and that this pattern confers a binding advantage for ENAH, where the existence of a back-side site could be postulated based on study of VASP but has not been previously tested. We further illuminate a preference for C-terminal proline residues and for flanking negative charge, and we found a protein (PCARE) that uses flanking sequence to achieve unprecedented affinity for ENAH.

      Our screening also uncovered previously unknown binders of ENAH, several of which we validated biochemically and in cells. Even among previously characterized binders pulled from our screen, the exact Ena/VASP binding site was previously unknown. Our MassTitr screen identified novel ENAH EVH1 domain binding sites for 23 previously characterized and unknown binders of the Ena/VASP family. Many of these binders link Ena/VASP proteins to new and emerging biology, such as highlighting an underappreciated role for ENAH in cilia. Furthermore, although this is only a Short Report, we provide a thermodynamic and mutational analysis of dual-motif peptides, and we provide a crystal structure that illuminates the origin of the preference for flanking proline residues.

      Our paper is an example of how a proteome-wide screen of long peptides can provide hints, in the motif contexts of the hits, that enable biophysical dissection of binding determinants for domains involved in important regulatory processes.

      We have revised the abstract and provided additional comments throughout the text to highlight the logic and outcomes of this screen.

      Reviewer #2:

      Weaknesses:

      The claim that the reported studies show "that proteins with two EVH1-binding SLiMs can wrap around a single domain" is overstated. No structural data is presented to show that dual ligands "wrap around a single domain". It is a logical possibility that is consistent with the data presented, and should be described as such.

      We agree with this and we have adjusted the language in the abstract. Our other mentions of this model throughout the paper were more circumspect, emphasizing the consistency of our data with this model. The Discussion contains an expanded discussion of our observations and their possible interpretations.

      To explain why lengthening the linker between two binding motifs in a dual ligand weakens binding by only 2-fold, the authors suggest that the linker may interact favorably with the non canonical site. A likely alternative explanation is the effect of linker length on the effective concentration, Ceff, which reflects the concentration of a second binding motif after the first motif is bound.

      We expect that the effective concentration is an important factor contributing to the affinity of dual-motif peptides, as suggested by the reviewer. Our proposal that the linker may interact with the domain was based on our observation that the peptides that were truncated to include only one motif (i.e., NHSL1 FP4 1, NHSL1 FP4 2, LPP FP4 1, and LPP FP4 2) each showed at least 2-fold weaker binding to ENAH EVH1 R47A (which disrupts the back-side site) than to the WT EVH1 domain, suggesting that linker residues themselves and not just motif residues may be able to engage this site. But we agree that the linker length could also modulate affinity by changing the effective concentration for a multivalent interaction. We have elaborated our discussion of possible models to address this point and point #1 above. We suspect that there may be multiple modes in which peptides engage EVH1, influenced by the flanking sequence of the motif(s), and that these may change as truncations are made. A detailed NMR analysis would be required to provide a higher resolution model and is beyond the scope of this study.
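      The dependence of effective concentration on linker length can be sketched with a simple polymer model. The example below is an illustration only and was not part of the analysis in the manuscript: it assumes the linker behaves as a Gaussian chain, with a persistence length of 0.3 nm and a contour length of 0.38 nm per residue taken as typical literature values for disordered peptides, and it treats the distance between the two binding sites as a free parameter. The function name and all numerical values are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23   # molecules per mole
NM3_PER_LITER = 1e24       # cubic nanometres in one litre


def effective_concentration_molar(n_residues, site_distance_nm=0.0,
                                  persistence_nm=0.3, residue_nm=0.38):
    """Effective concentration (mol/L) of the second motif near its site.

    Gaussian-chain assumption: the end-to-end vector of the linker has
    mean-square length 2 * persistence_nm * residue_nm * n_residues.
    """
    mean_sq = 2.0 * persistence_nm * residue_nm * n_residues  # nm^2
    # Probability density (per nm^3) of finding the linker end at the site.
    density = ((3.0 / (2.0 * math.pi * mean_sq)) ** 1.5
               * math.exp(-3.0 * site_distance_nm ** 2 / (2.0 * mean_sq)))
    return density * NM3_PER_LITER / AVOGADRO
```

      Under these assumptions, when the two sites are close together, doubling the linker length lowers the effective concentration by a factor of 2 to the power 1.5 (about 2.8), so a modest loss of affinity on lengthening the linker does not by itself require a direct linker-domain contact. When the sites are far apart relative to the linker, a longer linker can instead raise the effective concentration.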

      Based on the materials and methods, the authors appear to have employed an "ENAH tetramer construct" (human EVH1 fused to ENAH mouse coiled coil) for all of the reported experimental interaction studies. However, the ITC methods describe sample preparation with "ENAH EVH1 domain". The monomer versus tetramer has important implications for interpretation of the ITC data, since a tetramer introduces potential inter-domain binding by dual ligands. Clarification is needed to be able to evaluate fully the implications of the data.

      Thank you for bringing to our attention this omission in the details of our description. The ITC data were collected using a monomeric EVH1 domain construct, to avoid the problems highlighted by the reviewer. We have updated the manuscript to clarify when a tetrameric vs. monomeric ENAH EVH1 domain was used for each experiment.

      Reviewer #3:

      Weaknesses:

      In several cases, comparisons are made between KD values that are relatively similar (e.g., 3-fold or less). Additional analysis is required to demonstrate that these differences fall outside the range of experimental error. In addition, the Methods appear to be missing description of the BLI assay even though it is referred to in the text of the Results.

      We now present statistical tests for quantitative comparisons. We refer to the exact biolayer interferometry protocol that we used in our recently published eLife Research Article, Hwang et al. 2021 for full experimental details, but we have also now included a shorter description of our protocol in the methods section.

      The description of the selection and screening procedure in the Results is vague and lacks critical details.

      Details have been added to the methods section, particularly regarding the binning procedure (as requested). This is also supported by Figure 1 – figure supplement 4.

    1. Author Response:

      Reviewer #1:

      The paper demonstrates the role of Pou domains for various sensory cells. Using CRISPR to delete the gene, the authors show an incomplete deletion of sensory cells. Further evidence shows problems with the formation of mechanosensory cells. Overall, the presentation is clear but can be expanded by adding the role of bHLH genes (Atoh1 is upstream of Pou4f3). If possible, I suggest expanding the role of TMC as it is the main receptor in mammalian hair cells that connects to the stereocilia. Please note that the cnidarian organization is a central kinocilium surrounded by microvilli, comparable to choanoflagellates. This paper is a great original presentation but it could provide a broader perspective by expanding on the evolution of Pou IV and by adding a discussion of the evolution of bHLH, Myc and TMC in order to provide this broader perspective.

      We thank the reviewer for the suggestions to include and expand the discussion about bHLH (and other) factors that may have roles in cnidarian hair cell development. We agree that identification of upstream factors to POU-IV is of particular importance, and that bHLH genes related to the Atonal family are reasonable candidates based on comparative data from bilaterians. In the revised version of the manuscript, we have therefore added a discussion about the evolution of bHLH factors, and the potential role of atonal-like bHLH genes in controlling POU-IV expression in the context of cnidarian hair cell development. We have also emphasized the relevance of such studies to further illuminating the evolution of animal mechanoreceptor development and its gene regulatory mechanisms.

      Reviewer #2:

      Whereas the role of POU-IV for the differentiation of cnidocytes and other neurons of Nematostella has been previously characterized (Tourniere et al., 2020), the present study extends previous reports by specifically addressing the role of POU-IV for the so-called "hair cells" of Nematostella (not to be confused with the hair cells of the vertebrate inner ear and lateral line). These presumably mechanosensory hair cells are identified here as postmitotic neurons, which are ciliated and carry a collar of stereovilli - actin-filled microvilli with a long actin-rich rootlet. Using CRISPR/Cas9-based gene editing, the study shows that transgenic animals, in which the POU-IV gene has been disrupted, become touch insensitive. While hair cells can still be identified in these POU-IV mutants, they lack the stereovillar rootlets, suggesting that POU-IV is required for proper hair cell maturation, but dispensable for early steps of hair cell specification and differentiation. The study then uses ChIP-Seq to identify direct target genes of POU-IV in Nematostella and to characterize a POU-IV binding motif, which turned out to be evolutionarily highly conserved with POU-IV binding motifs in bilaterians. Comparison of the ChIP-Seq data with published bulk and single-cell transcriptome data indicated that POU-IV activates substantially different sets of effector genes (but no regulatory genes) in hair cells and cnidocytes, and identified polycystin1 as a hair cell-specific direct target of POU-IV. Taken together, this suggests that POU-IV had an evolutionarily ancient role as a terminal selector gene for mechanosensory neurons, which predated the split between cnidarian and bilaterian lineages, but that its function diverged (e.g. by the acquisition of new target genes) during the evolution of cnidocytes as a novel cell type in cnidarians.

      Combining gene editing with sequencing and with careful morphological and behavioral characterisation of cellular phenotypes, the study provides valuable new insights into the evolution of sensory neurons. POU-IV class transcription factors have previously been implicated in the specification of mechano- and chemosensory neurons in bilaterians. The present study together with the previous study of Tourniere et al. (2020) now suggests an even deeper evolutionary origin of this cell type in the last common ancestor of eumetazoans. The paper is very well written and the results are beautifully documented. The authors are overall cautious and conservative in the conclusions drawn from their findings. However, two points deserve a more critical discussion, first, the question of which sensory modality is mediated by the hair cells (are these dedicated mechanoreceptors or possibly multimodal cells?), and second, the question whether POU-IV serves as transcriptional activator or repressor in cnidocytes.

      We thank the reviewer for an excellent summary of our work and for the questions. In response to the first question, we do not rule out the possibility that hair cells could be multimodal sensory cells, and have added a discussion that raises this possibility in the revision. In response to the second question, POU-IV should be regarded as a leaky repressor in cnidocytes. We have modified the language to clarify this point in the revised version of the manuscript.

      Reviewer #3:

      In this manuscript, Ozmet et al. investigated the developmental genetics of mechanoreceptor cells (hair cells) in the cnidarian model N. vectensis. They used CRISPR-Cas9-mediated mutagenesis to show that POU-IV homeodomain transcription factor regulates the differentiation of hair cells in this organism. The authors applied behavior assays, EM observations, and various types of fluorescence labeling to show that pou-iv -/- polyps exhibit defects in touch-sensitive behavior, likely due to the failure of forming the complete stereociliary rootlet structure near the apical side of the hair cells in those mutant polyps. The authors went on to apply ChIP-seq in N. vectensis and showed that the POU-IV-binding motifs are conserved across Cnidaria and Bilateria. They also used this ChIP-seq dataset to screen for possible POU-IV downstream targets and identified one of the candidate genes, PKD1, as a conserved effector gene that has been shown to play important functions in hair cells across different bilaterian animals. Furthermore, by cross-checking their results with the newly published single-cell transcriptome data from N. vectensis adults, the authors identified the putative cell cluster (c79) of mechanosensory hair cells and confirmed that pou-iv and PKD1 are indeed co-expressed in this cell type. This approach also enabled the identification of additional candidate POU-IV downstream targets, and based on the GO term analysis, it appears that many of these genes are involved in ion transport and sensory perception functions. In summary, the authors provide strong evidence to support that POU-IV likely functions as a terminal selector factor of hair cell development in the sea anemone N. vectensis. Comparing their findings with other animals, the authors suggested that POU-IV factor plays a conserved role in regulating mechanoreceptor differentiation across Cnidarians and Bilaterians and that this regulatory mechanism may represent an ancestral trait dating back to their common ancestor.

      This is a detailed study on the role of POU-IV factor during cnidarian mechanoreceptor cell development. In general, the manuscript is well written, most of the data presented are of great quality, and the conclusions of the paper are supported by the data. This study is a significant advancement to our understanding on the evolutionary origin of sensory neurons and the possible genetic mechanisms underlying the diversification of neuronal cell types in animals.

      Strengths: The authors applied multiple approaches to examine the developmental process of hair cells in N. vectensis and analyze the molecular genetic functions of POU-IV factor during this process. The generation of gene-specific KO animals with CRISPR-Cas9 mediated mutagenesis in N. vectensis and the characterization of the sensory ability of those mutant animals with behavior assays provide compelling data to show that POU-IV factor is involved in the final maturation of mechanoreceptor hair cells. The ChIP-seq data generated by this study further enabled the authors to analyze the POU-IV factor binding sequences across animals, and the data also help to identify candidate downstream targets of POU-IV factor in the N. vectensis system. Because POU-IV factor is likely involved in the development of multiple cell types in N. vectensis (as shown by previous publications and this study), this dataset would be highly valuable in the future for analyzing the differentiation process of different neuronal cell types in N. vectensis. In fact, by comparing with the recently available scRNA-seq resources, the authors have demonstrated the usefulness of this dataset and pointed out several interesting future research directions. Because N. vectensis is one of the few experimentally tractable systems within Cnidaria, which represents the sister group of bilaterian animals, experimental data from N. vectensis would give important mechanistic insights to infer the possible developmental characters in the common ancestor of Cnidarians and Bilaterians.

      We appreciate the reviewer’s excellent summary of our work.

      Weakness:

      1. The specificity of the POU-IV antibody staining. It appears that the signals of the POU-IV immunostaining are distributed quite extensively, especially near the basal part of the epithelial ectoderm (Figure 2A-L). And in their Western blot, the authors also noticed an extra band that might represent another protein in the N. vectensis sample that cross-reacted with their anti-POU-IV antibody. Although the authors provided a control experiment showing that the immunostaining signals disappeared after they pre-absorbed the antibody with the POU-IV antigen (Figure 2 - supplement 2), this result can only demonstrate that indeed their antibody reacts specifically with this antigen. This cannot rule out the possibility that other N. vectensis protein(s) may possess peptide motifs similar to this antigen region and can be recognized by their antibody. Therefore, it would be nice if the authors can do double staining using in situ hybridization with pou-iv anti-sense riboprobe and immunostaining with their POU-IV antibody, to examine whether these two different methods would give overlapping results, so that they can be more confident about the specificity of their POU-IV antibody staining.

      We previously carried out the suggested double labeling experiments combining in situ hybridization with the pou-iv riboprobe and immunostaining with the anti-POU-IV antibody. However, the signal-to-noise ratio of anti-POU-IV antibody staining decreased substantially when immunostaining was combined with in situ hybridization. We have therefore chosen, in this case, to compare the results of single labeling experiments for pou-iv in situ hybridization and antibody staining side-by-side (Figure 2 – Figure supplement 1) to provide evidence that the pattern of POU-IV antibody staining in developing tentacles indeed recapitulates that of pou-iv mRNA expression.

      2. The electron microscopy data (Figure 5J-L) are not as clear as one would expect showing the differences in rootlet structure between the wildtype and mutant polyps, given that the phalloidin staining results (Figure 5B, E, H) show quite noticeable reduction in the mutant polyp tip. It is very hard to see the stereovillar rootlets (rlst) in Figure 5J, and thus it is very difficult to assess whether these structures are indeed affected in Figure 5K and 5L. In addition, the rootlet structure of the apical cilium in Figure 5K and L (presumably underneath "ci") appears to be less prominent compared to that shown in Figure 1I (labeled as "rlci"). I am not sure whether this is due to differences of the section angle, or whether it really reflects some differences between the wildtype and mutant.

      We had similar concerns with respect to the clarity of stereociliary rootlets of the F2 wildtype sibling animal shown in Figure 5J, and therefore included in the original submission additional electron microscopy evidence of stereociliary rootlets in hair cells of a wildtype sibling in Figure 5 – Figure supplement 2. Regarding the ciliary rootlet structure, we assume that the apparent lack of the structure in Figure 5K and L is due to the section angle not capturing the structure, as ciliary rootlets can be observed in other TEM sections of hair-cell-like cells in pou-iv mutant polyps. For clarification, we have added a supplementary figure (Figure 5 – Figure supplement 3) of a presumptive hair cell of a pou-iv mutant showing a ciliary rootlet.

    1. Author Response:

      Reviewer #1:

      Salehinejad et al. run a battery of tests to investigate the effects of sleep deprivation on cortical excitability using TMS, LTP/LTD-like plasticity using tDCS, EEG-derived measures and behavioral task-performance. The study confirms evidence for sleep deprivation resulting in an increase in cortical excitability, diminishing LTP-like plasticity changes, increase in EEG theta band-power and worse task-performance. Additionally, a protocol usually resulting in LTD-like plasticity results in LTP-like changes in the sleep deprivation condition.

      We appreciate the reviewer's time for carefully reading our work and providing important suggestions/recommendations. In what follows, we addressed the comments one by one, revised the main text accordingly, and pasted the changes here as well.

      1) My main comment is regarding the motivation for executing this specific study setup, which did not become clear to me. It's a robust experimental design, with a general approach quite similar to the (in the current manuscript heavily cited) Kuhn et al. 2016 study (which investigates cortical excitability, EEG markers, and changes in LTP mechanisms), with additional inclusion of LTD-plasticity measures. The authors list comprehensiveness as motivation, but the power of a comprehensive study like this would lie in being able to make comparisons across measures to identify new interrelations or interesting subgroups of participants differentially affected by sleep deprivations. These comparisons are presented in l. 322 and otherwise at the end of the supplementary material and the study does not seem to be designed with these as the main motivation in mind. Can the authors comment on this and clarify their motivation? Maybe the authors can highlight in what way their study constitutes a methodological improvement and incorporates new aspects regarding hypothesis development as compared to e.g. Kuhn et al. 2016; currently, the authors highlight mainly the addition of LTD-plasticity protocols. Similarly, no motivation/context/hypotheses are given for saliva testing. There are a lot of different results, but e.g. the cortical excitability results are not discussed in depth, e.g. there is no effect on IO curve, but on other measures of excitability, the conclusion of that paragraph is only "our results demonstrate that corticocortical and corticospinal excitability are upscaled after sleep deprivation." There are some conflicting results regarding cortical excitability measures in the literature, possibly this could be discussed, so the reader can evaluate in what way the current study constitutes an improvement, for instance methodologically, over previous studies.

      Thank you for your comment/suggestion. The main motivation behind this study was to examine different physiological/behavioral/cognitive measures under sleep conditions and to provide a reasonably complete overview. This approach was not covered in detail by previous work, which is often limited to one or two pieces of behavioral and/or physiological evidence. Our study was not sufficiently powered to identify new interrelations between measures, because this was a secondary aim, although we found some relevant associations in exploratory analyses (i.e., association of motor learning with plasticity, and cortical excitability with memory and attention). Future studies, however, which are sufficiently powered for these comparisons, are needed to explore interrelations between physiological, and cognitive parameters more clearly and we stated this as a limitation (Page 22).

      That said, we agree that specific rationales of the study were not sufficiently clarified in the previous version. We rephrased and clarified respective motivations and rationales here:

      1) By comprehensive, we mean that we obtained measures from basic physiological parameters to behavior and higher-order cognition, which is not sufficiently covered so far. This includes also the exploration of expected associations between behavioral motor learning and plasticity measures, as well as excitability parameters and cognitive functions.

      2) In the Kuhn et al. (2016) study, cortical excitability was obtained by TMS intensity (single-pulse protocol) to elicit a predefined amplitude of the motor-evoked potential, which is a relatively unspecific parameter of corticospinal excitability. In the present study, cortical excitability was monitored by different TMS protocols, which cover not only corticospinal excitability, but also intracortical inhibition, facilitation, I-wave facilitation, and short-latency afferent inhibition, which allow more specific conclusions with respect to the involvement of cortical systems, neurotransmitters, and neuromodulators.

      3) Furthermore, Kuhn et al. (2016) only investigated LTP-like, but not LTD-like plasticity. LTD-like plasticity was also not investigated in previous works to the best of our knowledge. LTD-like plasticity has however relevance for cognitive processing, and furthermore, knowledge about alterations of this kind of plasticity is important for mechanistic understanding of sleep-dependent plasticity alterations: The conversion of LTD-like to LTP-like plasticity under sleep deprivation is crucial for the interpretation of the study results as likely caused by cortical hyperactivity.

      4) Finally, an important motivation was to compare how brain physiology and cognition are differently affected by sleep deprivation, as compared to chronotype-dependent brain physiology and cognitive performance, especially with respect to brain physiology and performance at non-preferred times of the day. Our findings regarding the latter were recently published (Salehinejad et al., 2021) and comparisons of the present study with the published one have novel and important implications. Specifically, the results of both studies imply that the mechanisms underlying reduced performance after sleep deprivation differ relevantly from those underlying reduced performance at a non-optimal time of day.

      We clarified these motivations in the introduction and discussion. Please see the revised text below:

      "The number of available studies about the impact of sleep deprivation on human brain physiology relevant for cognitive processes is limited, and knowledge is incomplete. With respect to cortical excitability, Kuhn et al. (2016) showed increased excitability under sleep deprivation via a global measure of corticospinal excitability, the TMS intensity needed to induce motor-evoked potentials of a specific amplitude. Specific information about the cortical systems, including neurotransmitters, and - modulators involved in these effects (e.g. glutamatergic, GABAergic, cholinergic), is however missing. The level of cortical excitability affects neuroplasticity, a relevant physiological derivate of learning, and memory formation. Kuhn and co-workers (2016) describe accordingly a sleep deprivation-dependent alteration of LTP-like plasticity in humans. The effects of sleep deprivation on LTD-like plasticity, which is required for a complete picture, have however not been explored so far. In the present study, we aimed to complete the current knowledge and explored also cognitive performance on those tasks which critically depend on cortical excitability (working memory, and attention), and neuroplasticity (motor learning) to gain mechanistic knowledge about sleep deprivation-dependent performance decline. Finally, we aimed to explore if the impact of sleep deprivation on brain physiology and cognitive performance differs from the effects of non-optimal time of day performance in different chronotypes, which we recently explored in a parallel study with an identical experimental design (Salehinejad et al., 2021). The use of measures of different modalities in this study allows us to comprehensively investigate the impact of sleep deprivation on brain and cognitive functions which is largely missing in the human literature."

      We added more details about the rationale for saliva sampling:

      "We also assessed resting-EEG theta/alpha, as an indirect measure of homeostatic sleep pressure, and examined cortisol and melatonin concentration to see how these are affected under sleep conditions, given the reported mixed effects in previous studies."

      We also rephrased the cortical excitability results. Please see the revised text below:

      "Taken together, our results demonstrate that glutamate-related intracortical excitability is upscaled after sleep deprivation. Moreover, cortical inhibition was decreased or turned into facilitation, which is indicative of enhanced cortical excitability as a result of GABAergic reduction. Corticospinal excitability did only show a trendwise upscaling, indicative for a major contribution of cortical, but not downstream excitability to this sleep deprivation-related enhancement."

      "The increase of cortical excitability parameters and the resultant synaptic saturation following sleep deprivation can explain the respective cognitive performance decline. It is, however, worth noting that our study was not powered to identify these correlations with sufficient reliability, and future studies that are powered for this aim are needed.

      Our findings have several implications. First, they show that sleep and circadian preference (i.e., chronotype) have functionally different impacts on human brain physiology and cognition. The same parameters of brain physiology and cognition were recently investigated at circadian optimal vs non-optimal time of day in two groups of early and late chronotypes (Salehinejad et al., 2021). While we found decreased cortical facilitation and lower neuroplasticity induction (same for both LTP and LTD) at the circadian nonpreferred time in that study (Salehinejad et al., 2021), in the present study we observed upscaled cortical excitability and a functionally different pattern of neuroplasticity alteration (i.e., diminished LTP-like plasticity induction and conversion of LTD- to LTP-like plasticity)."

      2) EEG-measures. In general, I find the presented evidence regarding a link between synaptic strength and human theta-power is weak. In humans, rhythmic theta activity can be found mostly in the form of midfrontal theta. Here, the largest changes seem to be in posterior electrodes (judging by the bottom row of Fig. 4), which will not capture rhythmic midfrontal theta in humans. Can the authors explain the scaling of the Fig. 4 top vs. bottom row, there seems to be a mismatch? No legend is given for the bottom row. The activity captured here is probably related to changes in nonrhythmic 1/f-type activity (which displays large changes relating to arousal, e.g. https://elifesciences.org/articles/55092). It would be of benefit to see a power spectrum for the EEG-measures to see the specific type of power changes across all frequencies & to verify that these are actually oscillatory peaks in individual subjects. As far as I understood, the referenced study Vyazovskiy et al., 2008 contains no information regarding theta as a marker for synaptic potentiation. The evidence that synaptic strength is captured by the specifically used measures needs to be strengthened or statements like "measured synaptic strength via the resting-EEG theta/alpha pattern" need to be more carefully stated.

      Thank you for this comment. We removed the Pz electrode from the figure and instead added F3 and F4 along with Fz and Cz to capture more mid-frontal regions. Please see the revised Figure 4. The top rows now include only midfrontal and midcentral areas (Fz, Cz, F3, F4), and show numerical comparisons of midfrontal theta which is significantly different across conditions (and larger after sleep deprivation). The purpose of the bottom figures, which are removed now, was just to provide an overall visual comparison of theta distribution across sleep conditions. However, we agree that the bottom-row figures are misleading because these just capture average theta band power without specifying midfrontal regions. We removed this part of the figure to prevent confusion. Please see below.

      Regarding the power spectrum, we also added new figures (4 g) showing how different frequency bands of the power spectrum are affected by sleep deprivation. Please see the revised Figure 4 below.

      Updated results, page 12-13:

      "In line with this, we investigated how sleep deprivation affects resting-state brain oscillations at the theta band (4-7 Hz), the beta band (15-30 Hz) as another marker of cortical excitability, vigilance and arousal (Eoh et al., 2005; Fischer et al., 2008) and the alpha band (8-14 Hz) which is important for cognition (e.g. memory, attention) (Klimesch, 2012). To this end, we analyzed EEG spectral power at mid-frontocentral electrodes (Fz, Cz, F3, F4) using a 4×2 mixed ANOVA. For theta activity, significant main effects of location (F1.71=18.68, p<0.001; ηp2=0.40) and sleep condition (F1=17.82, p<0.001; ηp2=0.39), but no interaction was observed, indicating that theta oscillations at frontocentral regions were similarly affected by sleep deprivation. Post hoc tests (paired, p<0.05) revealed that theta oscillations, grand averaged at mid-central electrodes, were significantly increased after sleep deprivation (p<0.001) (Fig. 4a,b). For the alpha band, the main effects of location (F1.49=12.92, p<0.001; ηp2=0.31) and sleep condition (F1=5.03, p=0.033; ηp2=0.15) and their interaction (F2.31=4.60, p=0.010; ηp2=0.14) were significant. Alpha oscillations, grand averaged at mid-frontocentral electrodes, were significantly decreased after sleep deprivation (p=0.033) (Fig. 4c,d). Finally, the analysis of beta spectral power showed significant main effects of location (F1.34=6.73, p=0.008; ηp2=0.19) and sleep condition (F1=6.98, p=0.013; ηp2=0.20) but no significant interaction. Beta oscillations, grand averaged at mid-frontocentral electrodes, were significantly increased after sleep deprivation (p=0.013) (Fig. 4e,f)."

      Fig. 4. Resting-state theta, alpha, and beta oscillations at electrodes Fz, Cz, F3 and F4. a,b Theta band activity was significantly higher after the sleep deprivation vs sufficient sleep condition (tFz=4.61, p<0.001; tCz=2.22, p=0.034; tF3=2.93, p=0.007; tF4=4.78, p<0.001). c,d, Alpha band activity was significantly lower at electrodes Fz and Cz (tFz=2.39, p=0.023; tCz=2.65, p=0.013) after the sleep deprivation vs the sufficient sleep condition. e,f, Beta band activity was significantly higher at electrodes Fz, Cz and F4 after sleep deprivation compared with the sufficient sleep condition (tFz=3.06, p=0.005; tCz=2.38, p= 0.024; tF4=2.25, p=0.032). g, Power spectrum including theta (4-7 Hz), alpha (8-14 Hz), and beta (15-30 Hz) bands at the electrodes Fz, Cz, F3 and F4 respectively. Data of one participant were excluded due to excessive noise. All pairwise comparisons for each electrode were calculated via post hoc Student’s t-tests (paired, p<0.05). n=29. Error bars represent s.e.m. ns = nonsignificant; Asterisks indicate significant differences. Boxes indicate the interquartile range that contains 50% of values (range from the 25th to the 75th percentile) and whiskers show the 1 to 99 percentiles.
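      As a rough illustration of the band-power measure described in the quoted Results text, the sketch below sums spectral power within the theta (4-7 Hz), alpha (8-14 Hz), and beta (15-30 Hz) bands. It is only a minimal sketch and not the analysis pipeline used in the study: the plain DFT, the 128 Hz sampling rate, the synthetic signal, and the function names `power_spectrum` and `band_power` are all assumptions made for illustration.

```python
import math

# Band limits (Hz) taken from the quoted Results text.
BANDS = {"theta": (4, 7), "alpha": (8, 14), "beta": (15, 30)}


def power_spectrum(signal, fs):
    """Return (freqs, power) for a real signal using a plain DFT (O(n^2))."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(signal))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(signal))
        freqs.append(k * fs / n)
        power.append((re * re + im * im) / n ** 2)
    return freqs, power


def band_power(signal, fs, bands=BANDS):
    """Sum spectral power over the frequency bins inside each band (inclusive)."""
    freqs, power = power_spectrum(signal, fs)
    return {
        name: sum(p for f, p in zip(freqs, power) if lo <= f <= hi)
        for name, (lo, hi) in bands.items()
    }


# Synthetic 2-s "recording" at a hypothetical 128 Hz sampling rate:
# a strong 6 Hz (theta) component plus a weaker 10 Hz (alpha) component.
fs = 128
signal = [
    2.0 * math.sin(2 * math.pi * 6 * t / fs) + 0.5 * math.sin(2 * math.pi * 10 * t / fs)
    for t in range(2 * fs)
]
powers = band_power(signal, fs)
```

      With this synthetic input, most of the power falls in the theta band. Real resting-state data would normally be windowed and averaged over epochs before band power is compared across conditions.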

      Regarding the reference, unfortunately, we were referring to a different work of the Vyazovskiy team. We meant Vyazovskiy et al. (2005). We removed this reference and the part that needed to be toned down from the introduction and added new relevant references while toning down the statement about synaptic strength. Please see below:

      Revised text, Results, page 12:

      "So far, we found that sleep deprivation upscales cortical excitability, prevents induction of LTP-like plasticity, presumably due to saturated synaptic potentiation, and converts LTD- into LTP-like plasticity. Previous studies in animals (Vyazovskiy and Tobler, 2005; Leemburg et al., 2010) and humans (Finelli et al., 2000) have shown that EEG theta activity is a marker for homeostatic sleep pressure and increased cortical excitability (Kuhn et al., 2016)."

      3) In general, the authors do a good job pointing out multiple comparison corrected tests. In some cases, e.g. for their correlational analyses across measures, significant results are reported, but without a clearer discussion on what other tests were computed and how correction was applied, the evidence strength of these is hard to evaluate. Please check for all presented correlations.

      Thank you for your comment. For correlational analyses, no correction for multiple comparisons was computed, because these were secondary exploratory analyses. We state this now clearly in the manuscript. For the other analyses, the description of multiple comparisons is included below:

      Methods, pages 35-37:

      "For the TMS protocols with a double-pulse condition (i.e., SICI-ICF, I-wave facilitation, SAI), the resulting mean values were normalized to the respective single-pulse condition. First, mean values were calculated individually and then inter-individual means were calculated for each condition. For the I-O curves, absolute MEP values were used. To test for statistical significance, repeated-measures ANOVAs were performed with ISIs, TMS intensity (in I-O curve only), and condition (sufficient sleep vs sleep deprivation) as within-subject factors and MEP amplitude as the dependent variable. In case of significant results of the ANOVA, post hoc comparisons were performed using Bonferroni-corrected t-tests to compare mean MEP amplitudes of each condition against the baseline MEP and to contrast sufficient sleep vs sleep deprivation conditions. To determine if individual baseline measures differed within and between sessions, SI1mV and Baseline MEP were entered as dependent variables in a mixed-model ANOVA with session (4 levels) and condition (sufficient sleep vs sleep deprivation) as within-subject factors, and group (anodal vs cathodal) as between-subject factor. The mean MEP amplitude for each measurement time-point was normalized to the session’s baseline (individual quotient of the mean from the baseline mean) resulting in values representing either increased (> 1.0) or decreased (< 1.0) excitability. Individual averages of the normalized MEP from each time-point were then calculated and entered as dependent variables in a mixed-model ANOVA with repeated measures with stimulation condition (active, sham), time-point (8 levels), and sleep condition (normal vs deprivation) as within-subject factors and group (anodal vs cathodal) as between-subject factor. 
In case of significant ANOVA results, post hoc comparisons of MEP amplitudes at each time point were performed using Bonferroni-corrected t-tests to examine if active stimulation resulted in a significant difference relative to sham (comparison 1), baseline (comparison 2), the respective stimulation condition at sufficient sleep vs sleep deprivation (comparison 3), and the between-group comparisons at respective timepoints (comparison 4).

      The mean RT, RT variability and accuracy of blocks were entered as dependent variables in repeated-measures ANOVAs with block (5 vs 6, 6 vs 7) and condition (sufficient sleep vs sleep deprivation) as within-subject factors. Because the RT differences between blocks 5 vs 6 and 6 vs 7 were those of major interest, post hoc comparisons were performed on RT differences between these blocks using paired-sample t-tests (two-tailed, p<0.05) without correction for multiple comparisons. For 3-back, Stroop and AX-CPT tasks, mean and standard deviation of RT and accuracy were calculated and entered as dependent variables in repeated-measures ANOVAs with sleep condition (sufficient sleep vs sleep deprivation) as the within-subject factor. For significant ANOVA results, post hoc comparisons of dependent variables were performed using paired-sample t-tests (two-tailed, p<0.05) without correction for multiple comparisons.

      For the resting-state data, brain oscillations at mid-central electrodes (Fz, Cz, F3, F4) were analyzed with a 4×2 ANOVA with location (Fz, Cz, F3, F4) and sleep condition (sufficient sleep vs sleep deprivation) as the within-subject factors. For all tasks, individual ERP means were grand-averaged and entered as dependent variables in repeated-measures ANOVAs with sleep condition (sufficient sleep vs sleep deprivation) as the within-subject factor. Post hoc comparisons of grand-averaged amplitudes were performed using paired-sample t-tests (two-tailed, p<0.05) without correction for multiple comparisons.

      To assess the relationship between induced neuroplasticity and motor sequence learning, and the relationship between cortical excitability and cognitive task performance, we calculated Pearson correlations. For the first correlation, we used individual grand-averaged MEP amplitudes obtained from anodal and cathodal tDCS pooled for the time-points between 0 and 20 min after interventions, and individual motor learning performance (i.e. BL6-5 and BL6-7 RT difference) across sleep conditions. For the second correlation, we used individual grand-averaged MEP amplitudes obtained from each TMS protocol and individual accuracy/RT obtained from each task across sleep conditions. No correction for multiple comparisons was done for correlational analyses as these were secondary exploratory analyses."
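      Three of the computations named in the quoted Methods text can be sketched in a few lines: normalizing MEP amplitudes to the session baseline (a quotient, so values above 1.0 indicate increased and values below 1.0 decreased excitability), Bonferroni adjustment of post hoc p-values, and a Pearson correlation. This is a minimal sketch under stated assumptions, not the authors' analysis code: the function names and all numbers are hypothetical, and the Bonferroni step is shown in its simplest form (each p-value multiplied by the number of tests and capped at 1).

```python
import math


def normalize_to_baseline(mep_by_timepoint, baseline_mep):
    """Express each mean MEP amplitude as a quotient of the baseline mean.

    Values > 1.0 indicate increased, values < 1.0 decreased excitability.
    """
    if baseline_mep <= 0:
        raise ValueError("baseline MEP amplitude must be positive")
    return [mep / baseline_mep for mep in mep_by_timepoint]


def bonferroni(p_values):
    """Bonferroni-adjust p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values, for illustration only.
normalized = normalize_to_baseline([1.2, 1.5, 0.9], baseline_mep=1.0)  # amplitudes in mV
adjusted = bonferroni([0.01, 0.02, 0.5])  # three post hoc comparisons
r = pearson_r([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])  # e.g. plasticity vs learning
```

      Because the correlational analyses in the quoted text were explicitly left uncorrected, `bonferroni` would apply only to the post hoc t-tests, not to the output of `pearson_r`.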

      There are also inconsistencies like: "The average levels of cortisol and melatonin were lower after sleep deprivation vs sufficient sleep (cortisol: 3.51±2.20 vs 4.85±3.23, p=0.05; melatonin 10.50±10.66 vs 16.07±14.94, p=0.16)"

      The p-values are not significant here?

      Thank you for your comment. The p-value was only marginally significant for the cortisol level changes. We clarified this in the revision. Please see below:

      Revised text, page 19:

      "The average levels of cortisol and melatonin were numerically lower after sleep deprivation vs sufficient sleep (cortisol: 3.51±2.20 vs 4.85±3.23, p=0.056; melatonin 10.50±10.66 vs 16.07±14.94, p=0.16), but these differences were only marginally significant for the cortisol level and showed only a trendwise reduction for melatonin."

      Reviewer #2:

      This study represents the currently most comprehensive characterization of indices of synaptic plasticity and cognition in humans in the context of sleep deprivation. It provides further support for an interplay between the time course of synaptic strength/cortical excitability (homeostatic plasticity) and the inducibility of associative synaptic LTP- and LTD-like plasticity. The study is of great interest, the translation of findings is of potential clinical relevance, the methods appear to be solid and the results are mostly convincing. I believe that the writing of the manuscript should be improved (e.g. quality of referencing, a clearer framework and hypothesis, reduction of redundancies, and more precise discussion). However, all of these points can be addressed since the overall concept, design, conduct and findings are convincing and of great interest to the field of sleep research, but also more broadly to the neurosciences, to clinicians and the public.

      We appreciate the reviewer's time for carefully reading our work and providing important suggestions/recommendations.

    1. Author Response

      Reviewer #1 (Public Review):

      The recordings done by the authors are impressive and rare, and I appreciate the efforts of the authors to bridge very different types of signals that are generally recorded in different paradigms. However, the analysis at many places is quite nuanced and high-level, making it difficult to directly compare these findings with previous results. I think several additional analyses are needed to properly place these findings with previous results.

      1. Effects of attention in V4 generally start earlier (~100 ms). It is unclear why no effect is observed during earlier time periods in these data. To make better comparison with previous studies (such as Nandy et al., 2017), the authors should show the average PSTHs in supragranular, granular and infragranular layers during both target-out versus target-in conditions. Interestingly, Nandy and colleagues found largest changes in firing rates in the granular layer. To better understand the ERP outside the cortex, the authors should also show the average LFPs in the three layers, for target-in and target-out conditions. It is surprising that MI analysis reveals no significant information about the target in granular layer - given that some attentional effects are seen in upstream areas such as V1 and V2.

      We have created a new figure showing multiunit activity and LFP across the layers in both attention conditions. It is included here for convenience. Accompanying text has been added to the Results and Discussion sections to address the reviewers’ comments.

      The timing of differentiation between attended and unattended in the population spiking activity is evident in both MUA and LFP. We note that the largest magnitude difference in population spiking between attention conditions was observed in the middle layers, consistent with Nandy et al., 2017. We wish to highlight two observations.

      First, with respect to the timing of attentional modulation, it should be noted that the attention task used in our study (pop-out visual search) is different from that used by Nandy et al., 2017, Neuron (cued change detection). The timing of “effects of attention” vary according to stimulus properties and task demands (the number of publications demonstrating this is too long to list). Hence, we do not expect equivalence between the times we measure and times Nandy et al. measure. Nonetheless we are happy to include the requested supplementary figure with that caveat in mind.

      Second, with respect to the surprising observation of a relationship between activity in the granular layer and the extracortical signal, we think it is important to remember that these information theoretic analyses are not simply correlational. That is, attentional modulation might be observed in both signals, but if the covariation of these signals trial-to-trial does not exist, then we would not expect a relationship in the mutual information analysis.

      1. Eye position analysis: my understanding is that the animals could make a saccade as soon as the arrays were displayed. Given that the main effect of attention is observed after ~150-200 ms, the potential effect of saccade preparation could be important. There could also be small eye movements before the saccade. Given that the RFs were quite foveal for one monkey and not too far from the fixation window, and the effect of attention appears to be quite late, detailed analysis of eye position and microsaccades is needed to rule out the possibility of differences in eye movements between target-in and target-out conditions influencing the results. A timeline and some analysis of eye movement patterns would be appropriate. The authors should also clearly mention the mean and SD of the saccade onset.

      The reviewer makes a valuable observation. Saccades will influence the electrical signals, something we are quite familiar with (e.g., Godlove et al., 2011, J Neurophysiol). In an effort to combat this, we have two points worth noting. First, as was the case in the initial submission (which remains the same in the revision), we have clipped signals on a trial-by-trial basis prior to eye movements. By doing so, we cannot have an influence of the motor-related polarization of the task-demanded eye movement on the data.

      Second, we have prepared a microsaccade analysis – and accompanying newly added supplementary figure included here for convenience – to determine whether they might be driving the results. To do this, we identified trials where microsaccades occurred using a well-regarded microsaccade detection algorithm (Otero-Millan et al., 2014, J Vis). We then reperformed the information theoretic analysis across sessions after removing trials where microsaccades were detected. Briefly, we found that the information theoretic relationship persists in the absence of trials where microsaccades occurred. We believe this serves as evidence that microsaccades are not responsible for the information theoretic findings.

      To address the reviewer’s last point, we have included response time data (defined as the saccade onset latency) in the Results.

      1. Attention studies typically keep the stimulus in the RF the same to tease out the effect of attention from stimulus selectivity. Ideally, the comparison should be between the two green (or red) in RF conditions as shown in Figure 4A. However, these results are shown only after pooling across all color selective columns. This comparison should be shown from Figure 2 itself (i.e., Figure 2C should have green in the RF and red target outside).

      We have clarified prior to Figure 4 that we used all trials, including both colors, in each of the attention conditions. That is, while the cartoon in Figure 2 shows only green-attended and red-unattended conditions, green-unattended and red-attended conditions were also included in this analysis. As the proportion of red-target and green-target trials was matched, this first analysis was designed in such a way that the influence of stimulus color should be minimized, yet all trials could still contribute to the calculation. We have included a new supplementary figure (included here for convenience) which is what we believe the reviewer requests. In this addition, we perform the information theoretic computation on only stimulus-matched conditions. Briefly, we find that this approach does not seem to alter the temporal profile of information theoretic findings.

      1. Information has been well characterized in a large number of previous studies (generally yielding values of a few bits/s, see for example, Reich et al., 2001, JNP). Here, the absolute value of mutual information seems rather low. This may be due to the way the information is computed. A discussion about these reasons would be useful for scientists interested in information-theoretic measures.

      We agree that the exact magnitude of our information theoretic analyses is curious. While these methods have been widely characterized, they have not been characterized, to our knowledge, in relating intracortical laminar currents to extracortical field potentials. As such, we do not have a strong prior as to what we should expect magnitude-wise. We have expanded the discussion to note this observation and provide potential reasons as to why this might be the case. The conclusion is that further application of these methods to these data types is necessary to gain a fuller sense of what should and shouldn’t be expected.

      1. Dependence on feature preference: The effect of spatial and feature attention is well studied. A multiplicative gain model of spatial attention would predict a larger increase in firing rates (and perhaps other signals such as CSD) for preferred versus non-preferred signals. Feature similarity gain model would predict the red preferring columns to increase their activity and green preferring columns to reduce their activity when the animal is attending to the feature red, irrespective of which stimulus is in the receptive field. Here, the task is a pop-out task which likely has both a spatial and feature attention component. The authors should discuss their findings in these contexts. Further, the authors should discuss whether their findings could just be a reflection of the magnitude of the change (which could be larger for preferred versus non-preferred stimulus). The information-theoretic measure should ideally not depend on the absolute magnitude, but these quantities often get biased in non-trivial ways based on the magnitude. Does information transmission depend on the magnitudes of firing rates/CSDs?

      The relationship of these findings to the specificities of attentional mechanisms and models is indeed intriguing. As the reviewer suggested, this task likely engages both spatial and feature attention – however, the design was not such that they can be disentangled wholly. We have added text to the Discussion to reflect this consideration. As for the potential influence of response magnitude changes on the information theoretic analyses – the exact parameters were chosen to mitigate concerns about magnitude. That is, we chose a uniform count binning procedure on the data which eliminates potential issues such as outliers driving relationships as well as the changes in variability associated with increases in magnitude. Moreover, the uniform count binning procedure results with states rather than magnitudes which again mitigates response-magnitude-driven effects.
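      To make the point about uniform count binning concrete, the sketch below converts each signal into equally populated rank-based states and then computes a plug-in mutual information estimate between the two state sequences. This is only an illustration under stated assumptions: the function names, the choice of four bins, and the plug-in estimator without bias correction are hypothetical and are not taken from the manuscript.

```python
import numpy as np

def uniform_count_bin(x, n_bins):
    """Map samples to n_bins states holding (near-)equal numbers of samples."""
    x = np.asarray(x, dtype=float)
    # Rank of each sample (0 = smallest); ranks ignore absolute magnitude
    ranks = np.argsort(np.argsort(x, kind="stable"), kind="stable")
    return (ranks * n_bins) // x.size

def mutual_information(a, b, n_bins=4):
    """Plug-in mutual information (bits) between two rank-binned signals."""
    sa = uniform_count_bin(a, n_bins)
    sb = uniform_count_bin(b, n_bins)
    # Joint histogram of state pairs, normalised to a probability table
    joint = np.zeros((n_bins, n_bins))
    np.add.at(joint, (sa, sb), 1.0)
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)  # marginal of a, shape (n_bins, 1)
    pb = joint.sum(axis=0, keepdims=True)  # marginal of b, shape (1, n_bins)
    nz = joint > 0                         # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])))

# Illustrative signals: an extreme outlier leaves the rank-based states unchanged
x = np.arange(100.0)
x_outlier = x.copy()
x_outlier[-1] = 1e9
mi_clean = mutual_information(x, x)
mi_outlier = mutual_information(x, x_outlier)
```

      In this toy case both estimates equal log2(4) = 2 bits up to floating-point error, because the binning depends only on the rank order of the samples and not on their magnitudes.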

      1. For columns that were not feature selective, is there an effect of attention? Does the magnitude of N2pc change depend on color selectivity? I think that should be the case based on Figure 4H and 4I, but a plot and/or some quantification would be useful.

      These questions have been addressed in a newly added supplementary figure as well as quantification in the Results. Briefly, we did find an effect of attention in non-selective columns. Also, we found the magnitude of N2pc did not depend on color-selectivity of the intracortical recording. The results were reported as:

      “We also tested whether feature selective columns, on average, transmitted more information than their non-feature-selective counterparts. We found that feature selective columns, in all laminar compartments, transmitted significantly more information (Figure 4I) (two-sample t test: L2/3, p = 0.044; L4, p = 0.023; L5/6, p = 0.009). As such, we wanted to determine if this was due to a lack of attentional modulation in the non-selective columns. This was not the case, we observed that non-selective columns were modulated with attention. Attentional modulation was observed in both the CSD in L2/3 and L5/6 (one-sample t test: L2/3: t(64) = -6.01, p = 9.8e-8; L4: t(64) = -0.18, p = 0.86; L5/6: t(64) = 5.24, p = 1.9e-6) as well as across all layers in the population spiking activity (one-sample t test: L2/3: t(64) = 8.00, p = 3.7e-11; L4: t(64) = 9.66, p = 4.1e-14; L5/6: t(64) = 7.58, p = 1.8e-10) during the N2pc interval (averaged 150-190 ms following array onset) (Figure S6).

      Importantly, we tested whether the N2pc varied across sessions with or without color-selective columns sampled. We found no difference between N2pc polarization (150-190 ms after the array) between sessions with (n = 17) or without (n = 13) sampling of color selective columns (two sample t test: t(28) = -0.75, p = 0.46). This invariance is expected because extracortical EEG spatially integrates signals from multiple cortical columns.”
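      For readers checking the reported degrees of freedom, a pooled-variance two-sample t test on groups of 17 and 13 sessions has 17 + 13 - 2 = 28 degrees of freedom, matching the t(28) quoted above. The sketch below shows that calculation; it assumes a pooled-variance (Student) test, which is inferred from the degrees of freedom rather than stated explicitly, and the function name and the random example data are hypothetical.

```python
import numpy as np

def pooled_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / df
    t = (a.mean() - b.mean()) / np.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    return float(t), df

# Hypothetical session-level values (random, not study data)
rng = np.random.default_rng(0)
with_selective = rng.normal(size=17)
without_selective = rng.normal(size=13)
t_stat, dof = pooled_t(with_selective, without_selective)
```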

      Reviewer #2 (Public Review):

      Scalp ERPs are widely used in human neuroscience research to understand basic mechanisms of neural and cognitive function and to understand the nature of neurological and psychiatric research. However, this research is hampered by a surprising lack of research in animal models exploring the neural mechanisms that produce specific ERP components.

      Previous research by this research group identified a potential monkey homologue of the N2pc component, a neural correlate of the focusing of attention onto visual objects embedded in arrays of distractors. The present study took a giant leap forward by recording extracellular potentials from densely spaced arrays of electrodes (0.1 mm spacing) on probes that extended perpendicular to the cortical surface. These electrode arrays made it possible to simultaneously record voltages throughout the different layers of a cortical column and convert these voltages into current source density (CSD, which isolates local synaptic current flow and minimizes volume-conducted activity from other brain regions). In addition, simultaneously recorded voltage from an electrode just above the cortical surface was used as a proxy for scalp potentials. Scalp ERP recordings were also obtained from separate monkeys to measure the actual scalp ERPs and verify that an N2pc-like ERP was elicited by the task (a simple visual search task in which the monkey made an eye movement to the location of a color popout item).

      Very clear CSD was observed in V4 in both supragranular and infragranular layers that was stronger when attention was directed to the contralateral visual field than when attention was directed to the ipsilateral visual field, which is the hallmark of the N2pc component. Little or no such activity was observed in the granular layer (the primary recipient of feedforward projections). In addition, the effects were observed primarily when the column was selective for the target's color. An information theory analysis showed that these intracortical current flows contained significant information about the voltage measured on the cortical surface and the location of the target object.

      All of these results were clear and convincing. Moreover, the laminar and columnar analyses provide interesting new evidence about attention-related neural activity independent of any considerations about ERPs. The most challenging aspect of the study is to provide a solid link from the intracortical activity to the voltage on the cortical surface, and then to the monkey scalp ERPs, and finally to human ERPs. Toward that end, the present study relied entirely on correlational evidence, rather than experimental manipulations. That's quite appropriate for a first step, but it must be considered an important limitation on the conclusions that can be drawn. It would be wonderful if future research took the next step of providing experimental evidence.

      We appreciate the reviewer noting that this manuscript is a valuable step in linking attention-associated electrophysiological signals across species. We also recognize that there is much work to be done in this domain. As requested, we have added to the Discussion the limitation of this type of study as well as what should be considered valuable next steps in this program of research.

      There are also some troubling aspects of the existing evidence. The scalp ERP effect in this study and the prior work from this group is a positive voltage over the contralateral hemisphere, whereas in humans the voltage is negative. This may well reflect the orientation of the relevant cortical surface in monkeys versus humans. However, the voltage on the cortical surface in the present study was negative contralateral to the target, not positive. Unless this opposite voltage on the cortical surface relative to the scalp reflects something about the reference site for the cortical surface electrode, then this makes it difficult to link the intracortical effects and cortical surface effects to the scalp ERP effects. Also, the CSD was negative in the upper layers and positive in the lower layers, again suggesting that the voltage should be negative contralateral to the target on the surface. Ironically, this polarity is what would be expected from the human brain, where a contralateral negativity is observed. The oddity seems to be the contralateral positivity in the monkey scalp data. Also, the cortical surface voltage exhibits a polarity reversal at approximately 180 ms, which is not seen in the intracortical CSD.

      One possible explanation for the discrepancy is that the scalp voltage likely comes from multiple brain areas besides V4. If, for example, areas on the ventral surface of the occipital and temporal lobes produce stronger scalp voltages than V4 under the present conditions, the opposite orientation of these areas relative to the cortical surface would be expected to produce a positive voltage at the scalp electrodes.

      The manuscript notes that multiple areas probably contribute to the scalp ERPs and argues that the pattern of intracortical CSD results obtained in V4 will likely generalize to those areas. That seems quite plausible. Moreover, the results are interesting independent of their link to scalp ERPs. Thus, the present results are important even if the scalp polarity issue cannot be definitively resolved at this time.

      We thank the reviewer for expressing that the results are important whether or not this polarity difference can be resolved. This is an interesting observation and quite important to consider carefully. First, it is worth reiterating that the referencing setup in our ‘10/20’ monkeys was different from that for the monkeys where intracranial recordings took place. Specifically, the 10/20 recordings were more similar to our previous reports of monkey EEG (e.g., Woodman et al., 2007, PNAS; Cohen et al., 2009, J Neurophysiol; Purcell et al., 2013, J Neurophysiol). Recordings from these monkeys used either a frontal EEG electrode (approximately FpFz) or linked ears for referencing. These yielded the positive-going N2pc and contrast with the negative-going N2pc found in humans. The V4 laminar recordings – and their accompanying extracortical signal – used a different referencing setup that we believe is the most likely candidate for the observed difference. Specifically, these recordings used a tied ground-reference setup which incorporated the support rod of the linear multielectrode array. This support rod extended into the brain, meaning we had a neural tissue grounded signal and that the reference spanned the neural generator. Therefore, if we are not measuring both sides of the electric field across the generator equally, we might observe an inverted signal. Unfortunately, we cannot observe the 10/20 EEG distribution with an intracranial reference. Ideally, this could be resolved by an experiment where referencing setups are tested before and after performing craniotomy with a series of reference locations used to understand where exactly this flipping of polarization takes place. We have added this consideration to the Discussion and more thoroughly detailed the referencing setups in the Methods.

      There are also some significant concerns about the filters. The high-pass cutoff was high enough that it could have produced artifactual opposite polarity deflections in the data. If causal filters were applied (e.g., in hardware during the recordings), these artifactual deflections would have been after rather than before the initial deflection, possibly explaining the polarity reversal at 180 ms. If noncausal filters were applied in software, this would be a larger problem and could produce artifacts at both the beginning and end of the waveform. Moreover, the filters were different for the CSD data and the extracortical voltages, which is somewhat problematic for the information theoretic comparisons of these two data sources (but is likely to reduce rather than inflate the effects).

      In revisiting the description of the recording system and filters, we see how some information was conveyed poorly. The language describing the recording in the original submission suggested that online filters were applied to the data as it was being recorded. This was not the case. We have changed that language so that it reads as the data was being collected at a sampling frequency sufficient to observe data between 0.1 Hz and 12 kHz rather than the data being filtered between 0.1 Hz and 12 kHz. Also, it appears that the description of the processing sequence regarding CSD was ambiguous in the original submission. The CSD underwent the same offline, bandpass filtering procedure (1-100 Hz) as the extracortical signal. We have clarified the Methods accordingly.

      Reviewer #3 (Public Review):

      In this study, Westerberg et al., investigate the cortical origins of the N2pc, an ERP for selective attention. By using a combination of indefinite inverse models of cranial EEG and translaminar electrophysiology, the authors demonstrate that dipoles in V4 are the source of the N2pc.

      The study is well conducted and the manuscript is well written.

      We are pleased that the reviewer recognized the contribution of our efforts.

      I have a few comments about the CSD, RF alignment profiles, and LFP based analyses:

      (A) The method section states correctly that "current sinks following visual stimulation first appear in the granular input layer of the cortex, then ascend and descend to extra granular compartments". However in the example CSDs shown in Fig 2, Fig 3, Fig S3 there is no visible current sink in the infra-granular layers. Instead, the identified infra-granular layers show a prolonged current source (e.g. Fig S5B,C), which is unexpected.

      We have clarified the Methods to reflect the observations of our data and why they may differ from previous reports. We believe the discrepancy is likely due to the stimulus conditions used to evoke the CSD profile. Specifically, the descending infragranular sink in visual cortical columns has most commonly been described when CSD was computed while monkeys view briefly presented flashes or stimuli (e.g., Schroeder et al., 1998, Cereb Cortex). However, our study uses task evoked CSD to perform the alignment. Importantly, this means there is a persistent stimulus in the receptive field. We believe this persistent stimulus, rather than a flashed stimulus, leads to a persistent, strong sink in the superficial layers of cortex which would mask any current sink present in the infragranular layers (Mitzdorf, 1985, Physiol Rev). This is an observation we made in previous reports (Task evoked CSD: Westerberg et al., 2019, J Neurophysiol vs. Flash evoked CSD: Maier et al., 2010, Front Syst Neurosci), albeit in V1 instead of V4. Given the latency offset between putative granular and supragranular sinks, that we observe receptive fields below the putative granular input sink, and the demonstrable multiunit activation as indicated by the newly included Figure S2, we have no reservations in our assessment of the position of the electrode relative to the layers across sessions.

      (B) The example RF profile shown in Fig S5A, although aligned, looks a little strange in that the RFs taper off rapidly in the infra-granular layer. Is this the best representative example? It will be important to see other examples of RF alignment.

      The attenuation observed in the lower layers is largely due to overall decreased gamma power in the lower layers of cortex as compared to upper and middle layers (Maier et al., 2010, Front Syst Neurosci). At the reviewer’s request, we have added an additional panel to the noted supplementary figure which shows additional laminar receptive field profiles using the evoked LFP so that they are more directly comparable to those shown in Nandy et al., 2017, Neuron.

      (C) The study used LFP power in the gamma range to compute the response ratio between red and green stimuli. LFPs measured across the cortical depth are highly correlated, and so would gamma power estimated from the LFPs. Given this, how meaningful is the laminar analysis shown in Fig 4B? How confidently can it be established that the LFP derived gamma power estimates have laminar specificity?

      An astute observation – there are two aspects to consider. The existence of color-feature columns has been well-documented in V4 (e.g., Zeki, 1973, Brain Res; Zeki, 1980, Nature; Tootell et al., 2004, Cereb Cortex; Conway and Tsao, 2009, PNAS; Kotake et al., 2009, J Neurophysiol; Westerberg et al., 2021, PNAS). This manuscript did not need the evaluation of interlaminar differences in color selectivity to address the question at hand – the top of Figure 4B only serves as a step to the bottom of Figure 4B which provides the measurements used for the subsequent analyses. Thus, the estimation of color selectivity from gamma was sufficient to capture a general sense of the color selectivity of the column. Second, we recently published a manuscript which directly addresses the laminar specificity of gamma with respect to feature selectivity. Westerberg et al., 2021, PNAS uses a spatially localized form of gamma to evaluate color-feature selectivity along V4 columns. In that manuscript, we find a high degree of consistency along the layers of cortex using the gamma signal. Notably, we compared the gamma signal to the population spiking and found a high degree of coherence between selectivity in those two measures as a function of cortical depth. Given the secondary nature of the interlaminar feature selectivity to this submitted manuscript and the detailed report of laminar feature selectivity using the same dataset in another manuscript, we are inclined to leave the analysis reported here as is with adjustments to the text that note these considerations now included in the Results.

    1. Author Response

      Reviewer #1 (Public Review):

      The effects of antimicrobial peptides (AMPs) on bacteria is a major question in biology and is of great importance in medicine. Biophysical studies, such as the one described in the present manuscript, attempt to gain physical insight into the molecular mechanisms behind such effects. These may vary from structural effects of AMPs on the bacterial envelope membranes, to direct effects on metabolism from their presence in the cytosol.

      In this manuscript the authors build upon a recently published seminal paper (Semeraro et al, Acta Crystallographica 2021), where they were able to fit x-ray (USAX and SAXS) and neutron (VSANS and SANS) scattering data from live E.coli cells over four orders of magnitude in length scale. In the present manuscript, they have employed time-resolved USAX/SAXS, using stopped-flow, coupled with contrast variation SANS, transmission electron microscopy, and activity assays, to study the interaction of AMPs with live E.coli cells.

      The shifts in the scattering curve (Figure 1) induced by the AMP appear to be quite subtle, and yet the detailed analysis described here and in the authors' previous paper is able to detect changes in a number of structural parameters (Table 1). How much confidence do the authors have in the estimated errors cited in this table?

      This is indeed a crucial point. Even if the differences in USAXS/SAXS data (reciprocal space) look small at first glance, they translate into significant effects in real space. Focusing on Figure 1 – Figure supplement 2, and specifically on panel E, the significance of these differences becomes clear even for scattering data. Here, USAXS/SAXS data are scaled to the same sample concentration, allowing a direct comparison of E. coli initial (w/o AMP) and end-states.

      The analytical model, developed in Semeraro et al., J Appl. Cryst. 2021, was fitted to experimental data using a chi-squared-minimization based on a Monte Carlo genetic-selection algorithm, which is a robust method to deal with a high number of adjustable parameters. This yielded distribution functions for all adjustable parameters. We report the mean and standard deviations of these distributions, which yields highly realistic uncertainties. We are thus highly confident in the stated errors. In the revised manuscript we now show these distribution functions (Fig. 2- supplement 5) and also correlation plots between these parameters (Fig. 2- supplement 6). We have revised the manuscript thoroughly to guide the reader better through all aspects of the analysis.
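      The reporting step described above (mean and standard deviation of each parameter distribution, plus correlations between parameters) can be sketched as follows. This is a generic illustration, not the fitting code used in the study: the function name is hypothetical, and the input is assumed to be an array with one row per accepted solution from repeated stochastic fits and one column per adjustable parameter.

```python
import numpy as np

def summarize_parameter_ensemble(samples):
    """Mean, sample SD, and correlation matrix of an ensemble of fitted parameters.

    samples: array of shape (n_solutions, n_parameters).
    """
    samples = np.asarray(samples, dtype=float)
    if samples.ndim != 2 or samples.shape[0] < 2:
        raise ValueError("need a 2-D array with at least two solutions")
    mean = samples.mean(axis=0)
    sd = samples.std(axis=0, ddof=1)            # spread reported as uncertainty
    corr = np.corrcoef(samples, rowvar=False)   # parameter-parameter correlations
    return mean, sd, corr

# Toy ensemble with two perfectly correlated parameters (illustrative only)
toy = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
toy_mean, toy_sd, toy_corr = summarize_parameter_ensemble(toy)
```

      Strong off-diagonal entries in the correlation matrix would indicate parameters that trade off against each other in the fit, which is the kind of information correlation plots are meant to expose.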

      Reviewer #2 (Public Review):

      This article presents a novel and powerful approach, based on small angle scattering, to study the effects of antimicrobial molecules on bacterial cells in real time, obtaining information at multiple spatial scales (nm-um). As such, it is highly interesting.

      The main result of the present study is that the peptides accumulate in the cytosol within a few seconds. This finding is solid and peptide accumulation inside the cell is in agreement with several previous studies. However, from this observation the authors conclude that blockage of metabolic activity and not membrane perturbation is the mechanism of bacterial killing. In my opinion, this conclusion is not adequately supported by the data, since bacterial killing over time and metabolic activity were not studied, and membrane perturbation took place essentially on the same time-scale of peptide accumulation in the cytosol. Data interpretation should be revised with increased caution.

      We thank the reviewer for the highly positive evaluation of our manuscript. We agree that based on our current data, we cannot comment on any blocking of metabolic activity. Our data clearly show, however, that the vitally detrimental mode of action of the presently studied AMPs occurs in the cytosol (without knowing any further detail). We have carefully reformulated the text accordingly.

      Reviewer #3 (Public Review):

      Semeraro et al. present a very interesting work on the impact of antimicrobial peptide Lactoferricin on the structure of bacteria. They use primarily small-angle neutron/X-ray scattering to look for structural hallmarks of the effect of the AMP. Based mainly on SANS/SAXS results they conclude that the peptide enters the cytosol "within seconds" and causes irreversible damage. The work is nicely carried out and well written, but I wonder whether it is a bit too ambitious and bold in its claims, in particular considering the shaky ground on which the SAXS/SANS fit analysis is constructed. I agree that TEM backs it up but the results seem only to clarify structural changes at larger scales / morphological features. The manuscript should be published in some form after carefully taking into account several concerns from the reviewer (listed above).

      We thank the reviewer for the positive evaluation of our manuscript. The claims put forward in our study are based on a solid and thorough data analysis, combining besides neutron/ X-ray scattering and TEM also results obtained from bioscreen, zeta potential, dynamic light scattering and fluorescence spectroscopy experiments. We understand, however, that the SAS data analysis is complex and deserves a better explanation. We originally thought that it is sufficient to refer mainly to our paper, where we give all details about the USAXS/SAXS/VSANS/SANS analysis (Semeraro et al., J. Appl. Cryst, 2021). The revised version of our manuscript now gives some additional details that will help to appreciate the robustness of our statements regarding the activity of LF11-324, LF11-215 and O-LF11-215 in live E. coli.

    1. Author Response

      Reviewer #2 (Public Review):

      Strengths

      The strongest aspect of this study is about the relationship between resting potential, action potential and calcium transient. This is the first study demonstrating such relationships and this was quite overdue.

      Weaknesses

      There is a misunderstanding about the all or none concept. The authors argued that some researchers had proposed that the potassium effect on force was an all or none effect, which is not true. Although the range of potassium concentrations and resting potentials over which twitch force decreases is very narrow, this does not imply an all or none concept, because the decreases are steep but still gradual. The all or none concept for the action potential does not imply that the action potential shape is constant in all physiological conditions; instead, for a given physiological condition the action potential shape is constant, but it changes between physiological conditions.

      We have extensively rewritten the manuscript to remove the all or none concept.

      There is a major lack of information about the calcium indicator that has been used for this study. Consequently, it is difficult to validate the relationship between peak calcium and resting potential or overshoot relationship. It is not clear whether the indicator get saturated with calcium and there is a major issue about a short twitch contraction in msec and a calcium transient lasting minutes.

      We have added more information on the Ca2+ indicator dye to the manuscript. The reviewer is correct about potential issues of interpretation, and a section on limitations has been added to the discussion. As a point of clarification, the signal from GCaMP6f lasts 1 s, not minutes.

      Finally the experimental temperature was 25°C when during exercise muscle temperature often exceeds 37°C. So, it will not be possible to use any of the relationships provided in this study to understand the role of potassium in fatigue because the potassium effects on all three parameters are very temperature sensitive.

      This is an excellent point. We hope, however, that the reviewer will agree that, given the significant number of new experiments performed, repeating all of the experiments at 37°C is beyond the scope of the current work. We plan to do studies at a more physiologic temperature in the future.

    1. Author Response:

      Reviewer #2:

      The SRF transcription factor regulates gene transcription through associating with ERK- and actin-regulated cofactors belonging to the TCF and Myocardin families. Each family has multiple members, which exhibit differing expression profiles, and which are to varying extents functionally redundant, which has complicated their functional analysis. Thus, while inactivation of SRF itself leads to failure of gastrulation, inactivation of individual TCF and myocardin-family genes results in much later developmental defects, or barely affects development. Nevertheless studies of this sort have established that myocardin is limiting for VSM development at e10.5, while MRTF-B and myocardin function become limiting in neural crest cells at e14.

      Vasudevan and Soriano previously presented evidence for a PDGF-SRF axis operating in neural crest cells during craniofacial development: NC-specific SRF inactivation caused facial clefting, and in embryonic palatal mesenchymal cells, PDGF signalling to SRF cytoskeletal target genes that are controlled through MRTF in fibroblasts was impaired. These results pointed to a role for MRTF signalling to SRF in craniofacial development.

      "Differential regulation…." by Dinsmore and Soriano revisits these findings. They take a novel approach to assessment of SRF cofactor function by analysing an SRF variant, SRF-alpha1. This SRF derivative was previously shown to be defective in recruitment of MRTF-A (and by extension other myocardin family members) but remained competent to recruit the TCFs. They show that:

      • Homozygous SRF-alpha1 mutant mouse embryos survive to e10.5, when they succumb to defects similar to the Myocd knockout.

      • Anterior mesoderm (Mesp1-cre) SRF-alpha1 embryos last to e10.5, similar to the null, and phenocopy global Myocd mutants.

      • Surprisingly, (Wnt1-cre) SRF-alpha1 embryos do not show facial clefting, and there is no genetic interaction with PDGFR, although their palatal MEPM cells are selectively defective in MRTF-SRF target gene expression

      • Instead, (Wnt1-cre) SRF-alpha1 animals survive to birth, and succumb to cyanosis resulting from highly penetrant cardiac outflow defects reminiscent of those seen in two other models: a hypomorphic MRTF-B gene trap mutation, and an NC-specific (Wnt1-cre) Myocd mutant.

      The results raise two main questions:

      1) What is the basis of the gastrulation defect seen in SRF-null embryos? The results suggest that it cannot reflect deficient MRTF signalling, but triple TCF-deficient embryos live beyond this point, so why is there a defect?

      2) What is the basis for the craniofacial defects in wnt1-Cre SRF-null embryos? The previous proposal that they result from defective PDGF-MRTF-SRF signalling was based on correlation with defective MRTF gene expression, but the SRF-alpha1 result suggests this is not in fact correct.

      The authors cannot answer these questions, but propose three possible explanations for their findings: (i) that the SRF-alpha1 allele is a hypomorph not a null for MRTF interaction; (ii) that the TCF cofactors execute some SRF pathways; and (iii) that other undefined SRF cofactors may be involved.

      I found this paper enjoyable to read, but hard to review, because it is a clever experiment that raises more questions than it answers. It is an interesting study for a specialist in the SRF field, but less conclusive in terms of clarifying SRF's biological roles.

      Unfortunately, the paper does not directly answer the questions it raises: it does not directly test the role of MRTFs (and/or myocardin) in the processes analysed, and does not assess the requirement for SRF cofactors per se using appropriate SRF mutants. For example, global MRTF-A/B double knockouts (or A/Mycd, B/Mycd), which could give direct insights into potential early myocardin-family functions, have not been reported. In the view of this reviewer, however, to do such experiments as part of this study would be inappropriate, and the paper would be of value as a spur to the field.

      However, I do have concerns as to the way the data are presented and discussed. To a more general reader in development or transcription, the discussion does not pose the issues clearly, and would benefit from reworking. It would be clearer if the alternative explanations for variance from the simple MRTF-null view of SRF-alpha1 were posed briefly, and then the basis for each of the phenotypes observed considered in turn.

      This is an excellent suggestion to better orient the non-SRF cognoscenti and we have updated the Discussion accordingly, as well as incorporating the reviewer’s extremely insightful ideas that follow. We briefly address them point by point below.

      In addition, the authors leave some issues unaddressed in their discussion. For example, they do not consider:

      1) That the gastrulation defect of SRF-nulls may reflect a cofactor-independent SRF activity. This is plausible, since SRF does have a constitutive transcription activation function. One possible way to test it would be to introduce a mutation such as SRF V194E that blocks both TCF and myocardin-family interactions with SRF (Ling et al 1998 JBiolChem). This mutant phenocopies an SRF-null in immune cells (Mylona et al, 2011 MCB), and should it bypass the gastrulation defect, TCF/MRTF-independent SRF function would be highly likely.

      This is an important possibility that we now discuss along with the possibility of alternate cofactors.

      2) That the SRF-alpha1 allele is a hypomorph for MRTF interaction but a null (or stronger hypomorph) for Myocd interaction. On this model the SRF-alpha1 phenotypes might reflect Myocd recruitment - the lack of craniofacial phenotype might reflect residual MRTF-B interaction, but the later cardiac outflow phenotype would arise from limiting Mycd interaction.

      This interpretation is totally consistent with our results and we now include it along with the discussion of tissue-specific thresholds.

      3) That for some functions the TCF and Myocardin families act through SRF in a functionally redundant manner, so inactivating one family would not impair function.

      We meant to suggest this with our first interpretation, “TCF factors … can somehow compensate for loss of SRF-MRTF activity. It is true that some SRF targets can be bound and regulated by both MRTF and TCF factors.” However, we now more strongly invoke their possible redundancy to highlight this point.

      4) That the MRTF-A and MRTF-B proteins act functionally redundantly with myocardin.

      This is an important consideration in tissues where Myocd is expressed and we now address this.

      5) Their previous paper identified Mrtfa as the only MRTF expressed above background level in MEPM cells. However, the MRTFa knockout mouse develops normally. Thus, if MRTFs are involved in the clefting phenotype, a substantial decrease in MRTF activity can be tolerated before the phenotype becomes manifest. The nonclefting phenotype of the SRF-alpha1 mutant would not be unexpected if this were the case.

      The reviewer’s interpretation is correct as stated, but we believe this is an instance where the shortcomings of cell culture complicate matters. Mrtfa is expressed roughly 5x higher than Mrtfb at the mRNA level in passage 2 MEPMs, based on RNA-Seq data from Vasudevan et al., 2014 and Vasudevan et al., 2015. However, in whole E13.5 palate (from which MEPMs are derived) the expression levels are nearly equal. We find in our bulk RNA-Seq that Mrtfa and Mrtfb are expressed at equal levels. More strikingly, in NC cells sorted from E10.5 and E11.5 facial prominences, another study (Minoux et al., 2017) found that Mrtfb is expressed roughly 3x higher than Mrtfa. Rather than being well-tolerated like the loss of Mrtfa, Mrtfb null mutant mice have much stronger developmental phenotypes and the hypomorphic Mrtfb gene trap allele has NC-specific defects. All that stated, it is true that neither Mrtfa nor Mrtfb have been associated with facial clefts in the mouse. It would be interesting to know whether Mrtfa+/-; Mrtfb-/- embryos, Mrtfa-/-; Mrtfb-/- embryos (if they survive long enough), or Mrtfa-/-; Mrtfbflox/flox; Wnt1-CreTg/+ embryos would have clefts. We now include the gene expression data as a supplemental figure (Figure 1, Supplement 2E). Given the expression data from the Minoux study, we are cautious about interpreting the lack of a phenotype in Mrtfa mutants as strong evidence that NC can tolerate substantial depletion of Mrtf genes without clefting.

      6) The TCFs and MRTFs seem to compete to some extent at most SRF targets - for example, loss of TCFs potentiates cytoskeletal contractility. Thus the effectiveness of the SRF-alpha1 mutation in blocking MRTF-SRF activation in a given setting will be dependent not only on MRTF level but also on TCF level.

      This is an interesting point that we had not considered. We mentioned TCF-MRTF competition but now add this intriguing possibility as a further interpretation.

    1. Author Response:

      Reviewer #1:

      The dependence of cell volume growth rate on cell size and cell cycle is a long-standing fundamental question that has traditionally been addressed by using unicellular model organisms with simple geometry, for which rough volume estimates can be obtained from bright field images. While it became soon apparent that the volume growth rate depends on cell volume, the experimental error associated with such measurements made it difficult to determine the exact dependencies. This challenge is even more significant for animal cells, whose complex and dynamic geometry makes accurate volume measurements extremely difficult. Other measures for cell size, including mass or fluorescent reporters for protein content, partially bypassed this problem. However, it becomes increasingly clear that cell mass and volume are not strictly coupled, making accurate volume measurements essential. In their previous work, Cadart and colleagues established a 'fluorescent exclusion method', which allows accurate volume measurements of cells with complex geometry. In the present manuscript, Cadart et al. now take the next step and measure the growth trajectories of 1700 HeLa cell cycles with further improved accuracy, providing new insights into animal cell growth.

      They convincingly demonstrate that throughout large parts of the cell cycle, individual cells exhibit exponential growth, with the volume-normalized specific growth rate moderately increasing after G1-phase. At the very early stages of the cell cycle, cells exhibit a more complex growth behavior. The authors then go on and analyze the growth rate fluctuations of individual cells, identifying a decrease of the variance of the specific growth rate with cell volume and observed time scale. The authors conclude that the observed growth fluctuations are consistent with additive noise of the absolute growth rate.

      The experiments and analysis presented by Cadart et al. are carefully and well executed, and the insights provided (as well as the method established) are an important contribution to our understanding of cell growth. My major concern is that the observed fluctuation pattern seems largely consistent with what would be expected if the fluctuations stem from experimental measurement noise. This fact is appropriately acknowledged, and the authors aim to address this issue by analyzing background noise. However, further controls may be necessary to unambiguously attribute the measured noise to biological fluctuations, rather than experimental error.

      We thank the reviewer for their positive feedback and for the appreciation of our work. We performed a series of experimental controls to address the main issue regarding the measured fluctuation pattern, which indicate that it should be of biological origin.

      1.) To address whether the observed fluctuations could be due to experimental error, the authors analyze the fluctuations recorded in a cell-sized area of the background, and find that the background fluctuations are small compared to the fluctuations of the volume measurements. I think this is a very important control that supports the interpretation of the authors. However, I am not convinced that the actual measurement error is necessarily of the same amplitude as the fluctuations of the background. The background control will control for example for variations of light intensity and fluctuations of the fluorophore intensity. But what about errors in the cell segmentation? Or movement of the cells in 3D, which could be relevant because the collected light might be dependent on the distance from the surface? Is cell autofluorescence relevant at all? I am aware that accurately estimating the experimental error is exceptionally difficult, and I am also not entirely sure what would be the perfect control (if it even exists). Nevertheless, I think more potential sources of error should be addressed before the measured noise can be confidently attributed to biological sources. Maybe the authors could measure objects with constant volume over time, for example vesicles? As long as the segmented area contains the complete cell, the measured volume should not change if the area is increased. Is this the case?

      We are grateful to the reviewer for all these useful suggestions. We performed all these controls on the sources of noise, and we discuss them in the revised manuscript.
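
      The reviewer's remark that the measured volume should not change when the segmented area is enlarged can be illustrated with a toy calculation. The sketch below assumes a simplified form of fluorescence exclusion in which each pixel contributes a height of (I_max - I) / alpha; the function name, the uniform background `I_MAX` and the calibration constant `ALPHA` are hypothetical and are not taken from the authors' pipeline.

```python
def excluded_volume(image, roi, i_max, alpha):
    """Sum the per-pixel height (i_max - I) / alpha over the pixels in roi.

    image : 2D list of pixel intensities
    roi   : iterable of (row, col) pixel coordinates
    i_max : background intensity where nothing excludes the dye (assumed uniform)
    alpha : intensity drop per unit of height (assumed calibration constant)
    """
    return sum((i_max - image[r][c]) / alpha for r, c in roi)


# Toy 6x6 image: uniform background plus a 2x2 "cell" that excludes dye.
I_MAX, ALPHA = 100.0, 10.0
image = [[I_MAX] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (2, 3):
        image[r][c] = 60.0  # height (100 - 60) / 10 = 4 units on each pixel

# A tight region that just covers the cell, and a wider one around it.
tight_roi = [(r, c) for r in range(2, 4) for c in range(2, 4)]
wide_roi = [(r, c) for r in range(1, 5) for c in range(1, 5)]

v_tight = excluded_volume(image, tight_roi, I_MAX, ALPHA)
v_wide = excluded_volume(image, wide_roi, I_MAX, ALPHA)
print(v_tight, v_wide)
```

      In this idealised case the background pixels contribute zero, so both regions give the same volume. With real images, background noise and uneven illumination would make the wider region noisier, which is the kind of effect the requested control is meant to quantify.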

      2.) I am particularly puzzled by the fact that even at the timescale of the frame rate, fluctuations seem not to be correlated between 2 consecutive time points (Fig. 5-S2b). This seems plausible for (some) sources of experimental error. Maybe an experiment with fast time resolution would reveal the timescale over which the fluctuations persist - which could then give us a hint about the source?

      We performed this analysis, finding an autocorrelation time of a few minutes, and we report our results below:

      In the main text and in the new Figure 5 – Supplement 3, we report the results of newly performed 20 sec timelapse experiments over one hour to investigate the timescale of volume fluctuations. The autocovariance function analysis on the detrended curves shows that fluctuations decay over a few minutes (Figure 5 – Supplement 3a-c), a timescale that matches the analysis of the 10 min timelapse experiments.

      Copy of Figure 5 – Supplement 3: Autocovariance analysis shows that the timescale of volume fluctuation is around 760 seconds. a) Cells measured every 20 sec (n=177) and linearly detrended reach a covariance of 0 at a lag of 760 sec. b) As a control, the background fluctuations are not autocorrelated (20 sec, n=92), providing further evidence that cell volume fluctuations likely have biological origin. c) The autocovariance analysis for cells measured every 10 min confirms that fluctuations covary for a lag of 10-20 min.
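
      The analysis described in this caption (linear detrending followed by an autocovariance function, read off at the first lag where it reaches zero) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: the function names, the biased autocovariance estimator and the AR(1) test signal are all assumptions made for the example.

```python
import random


def linear_detrend(y):
    """Subtract the least-squares line from y (equally spaced samples, len >= 2)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    return [v - (y_mean + slope * (t - t_mean)) for t, v in enumerate(y)]


def autocovariance(x, max_lag):
    """Biased autocovariance estimate of x for lags 0..max_lag."""
    n = len(x)
    m = sum(x) / n
    return [
        sum((x[i] - m) * (x[i + k] - m) for i in range(n - k)) / n
        for k in range(max_lag + 1)
    ]


def first_nonpositive_lag(acov):
    """Index of the first lag at which the autocovariance is <= 0, else None."""
    for k, c in enumerate(acov):
        if c <= 0:
            return k
    return None


# Synthetic trace (not real data): linear growth plus correlated AR(1) noise,
# sampled every 20 s for one hour.
random.seed(0)
dt, n_frames, phi = 20.0, 180, 0.9
noise, trace = 0.0, []
for i in range(n_frames):
    noise = phi * noise + random.gauss(0.0, 1.0)
    trace.append(2000.0 + 0.5 * i + noise)

acov = autocovariance(linear_detrend(trace), max_lag=60)
lag = first_nonpositive_lag(acov)
decay_time = None if lag is None else lag * dt  # seconds
print(decay_time)
```

      Applying the same functions to a background region would be expected to give an autocovariance that drops to about zero at the first lag if the background is uncorrelated, which is the contrast drawn in panel b.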

      3.) The authors use automated smoothing of the measurement and removed outliers based on an IQR-criteria. While this seems reasonable if the aim is to get a robust measurement of the average behavior, I find it questionable with respect to the noise measurements. Since no minimum time scale has been associated with the fluctuations interpreted as biological in origin, what is the justification of removing 'outliers', i.e. the feature that the authors are actually interested in? Why would the largest fluctuations be of technical origin, and the smaller fluctuations exclusively biological?

      The IQR criterion is designed to remove only rare and obvious outliers (i.e. a jump in volume of more than 15% in one frame, i.e. 10 minutes, which arguably cannot happen biologically). Fluctuations of smaller range are kept (see examples below). We looked back at the raw data and calculated that the IQR filtering removes a total of 337 measurement points out of 99935 initial points (about 0.3% of the points).

      Figure D: Three examples of single cell trajectories with raw volume measurement (red dots) and points removed with the IQR filtering (blue dots). The IQR criterion is very stringent and removes only the very large ‘bumps’ in cell volume measured (2 left plots) while it keeps fluctuations of smaller amplitude (right plot).
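
      For readers unfamiliar with this kind of filter, the sketch below shows one generic way an IQR-based outlier criterion can flag isolated bumps while keeping smaller fluctuations. It is an illustration under stated assumptions, not the filter used in the manuscript: the rolling-median residual, the window size and the fence multiplier `k` are all hypothetical choices.

```python
import statistics


def rolling_median(values, half_window=1):
    """Median of each point's neighbourhood (window truncated at the edges)."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        out.append(statistics.median(values[lo:hi]))
    return out


def iqr_outlier_mask(values, k=3.0, half_window=1):
    """Flag points whose residual from the rolling median lies outside
    [Q1 - k * IQR, Q3 + k * IQR] of all residuals (k is an assumed setting)."""
    med = rolling_median(values, half_window)
    resid = [v - m for v, m in zip(values, med)]
    q1, _, q3 = statistics.quantiles(resid, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r < low or r > high for r in resid]


# Toy trajectory (arbitrary units): a zigzag upward trend with one large bump.
volumes = [100, 104, 102, 106, 104, 108, 150, 110, 108, 112, 110, 114]
mask = iqr_outlier_mask(volumes)
kept = [v for v, bad in zip(volumes, mask) if not bad]
removed_fraction = sum(mask) / len(volumes)
print(mask.index(True), removed_fraction)
```

      In this toy trajectory only the single large bump is flagged, and the small zigzag fluctuations around the trend are retained. Note that if most residuals were identical the IQR would collapse to zero, so a practical filter would need a floor on the fence width.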

      4.) If I understood correctly, each volume trajectory spans one complete cell cycle. If this is the case, does Fig. 1e imply that many cell cycles take less than 2-3 hours? Is this really the case, and if so, what are the implications for some of the interpretations (especially the early cell cycle part)?

      In this study, we performed experiments on a time scale comparable to the cell cycle time (~24 hours) and recorded single-cell volume trajectories. Since the cells are not synchronized, we have very few complete cell cycles (~100, Fig. 1f). Fig. 1e shows the distribution of the duration of all individual curves, regardless of the fraction of the cell cycle they span, hence the very short duration for some cells.

      Reviewer #2:

      In this paper, the authors use volume exclusion-based measurements to quantify single cell trajectories of volume increase in HeLa cells. The study represents one of the most careful measurements on volume regulation in animal cells and presents evidence for feedback mechanisms that slow the growth of larger cells. This is an important demonstration of cell autonomous volume regulation.

      While the subject matter of the present study is important, the insights provided are significantly limited because the authors did not place their findings in the context of previous literature. The authors present what seem to be remarkably accurate single cell growth trajectories. In animal cells, a joint dependency of growth rate on cell size and cell cycle stage has been previously reported (see Elife 2018 PMID: 29889021 and Science 2009 PMID: 19589995). In Ginzberg et al, it is reported "Our data revealed that, twice during the cell cycle, growth rates are selectively increased in small cells and reduced in large cells". Nonetheless, these previous studies do not negate the novelty in Cadart et al. While both Cadart and Ginzberg investigate a dependency of cellular growth rate on cell size and cell cycle stage, the two studies are complementary. This is because, while Ginzberg characterise the growth in cell mass, Cadart characterise the growth in cell volume. The authors should compare the findings from these previous studies with their own and draw conclusions from the similarities and differences. Are the cell cycle stage dependent growth rates similar or different when cell size is quantified as mass or volume? Does the faster growth of smaller cells (the negative correlation of growth rate and cell size) occur in different cell cycle stages when growth is quantified by volume as compared to mass?

      We are grateful to the reviewer for their appreciation of the value of our study. Following their remarks, we have extended our Discussion section to incorporate a more careful discussion of these findings. We believe that the main contribution of our study is finding evidence of phase-dependent regulation of growth rate and identifying an additive noise on volume steps. This noise has constant amplitude; hence, fluctuations of the specific growth rate decrease with volume, whereas the specific growth rate itself (in the bulk of the cell cycle) does not decrease.
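
      The scaling argument in this response can be written out in a minimal form. The notation is assumed for this sketch and is not taken from the manuscript: $\alpha$ is the mean specific growth rate, $\sigma$ the constant noise amplitude per volume step, $\Delta t$ the frame interval, and $\eta$ a zero-mean, unit-variance random variable.

```latex
\begin{align}
\Delta V &= \alpha V \,\Delta t + \sigma\,\eta, \\
g \equiv \frac{1}{V}\,\frac{\Delta V}{\Delta t}
  &= \alpha + \frac{\sigma}{V\,\Delta t}\,\eta, \\
\langle g \rangle = \alpha, \qquad
\operatorname{sd}(g) &= \frac{\sigma}{V\,\Delta t}.
\end{align}
```

      Under these assumptions the mean of the specific growth rate $g$ does not depend on $V$, while its standard deviation falls as $1/V$, which is the qualitative pattern described above.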

    1. Author Response

      Reviewer #2 (Public Review):

      In this study, the authors identify a role for mindbomb (mib) in the trafficking of the Ryk receptor, a Receptor Like Tyrosine Kinase with roles in planar cell polarity (PCP). The authors use a combination of tools and mutants in the zebrafish to propose that the role in gastrulation is notch-independent. A strength is the generation of new genetic alleles. The authors show a genetic interaction between Mib and Ryk. An intriguing finding from the double mutant (mib;MZryk), is that the impact of Mib is mediated through Ryk. This study highlights the need to consider complex signaling networks when evaluating phenotypes, especially convergent extension.

      A concern is that most of the analyses are with morpholino knockdown. It is understood that the use of the MO allows for higher sample sizes, but the authors would need to also investigate the functional rescue in the genetic mutants.

      In the revised version of our manuscript, we have included a new series of experiments in which Ryk-GFP overexpression was used to rescue the CE defects of mib1tfi91 mutants (Fig.3L). In accordance with the data already obtained in mib1 morphants, this additional experiment confirms our previous observation that Ryk overexpression can rescue the morphogenetic defects of Mib1-depleted animals.

      Moreover, the CE defects shown in Figures 2-4 were analyzed by morphological characteristics. The conclusions can be strengthened by measurement of axial structures with molecular markers.

      To further strengthen our analysis of the impact of Mib1 / Ryk loss of function on the development of axial structures, we performed in situ hybridizations with shhb (labeling the notochord) and foxa3 (labeling notochord and prechordal plate) on mib1 morphant, mib1ΔRF123- or mib1ΔRF3-injected embryos as well as MZ ryknce4g mutant embryos. The corresponding pictures and quantifications have been included in Fig.4H,I and the newly added Figure 1-figure supplement 1. Importantly, these new experiments confirm our observations that Mib1 or Ryk loss of function both result in a decrease of axial elongation and an increase of notochord width, which are diagnostic for embryos that display an impairment of PCP-dependent CE movements.

      Ryk has been proposed to be a substrate for Mib and in the manuscript, there is a clear demonstration that Mib overexpression is sufficient to stimulate Ryk-GFP internalization. It is noteworthy that Mib over-expression does not stimulate Vangl internalization. However, the difference in the number of Ryk-GFP endosomes/cell in the mib mutant compared to wild-type is less clear. Also, the MO alone is not shown.

      The impact of the Mib MO alone on the number of Ryk-GFP labeled endosomes is now displayed in Figure 3-figure supplement 4A,B in addition to the quantifications already displayed in Fig.3E.

      Using an early endosomal marker with the Ryk-GFP would be a more effective way to evaluate endocytosis.

      The data displayed in Figure 3-figure supplement 1A show that a majority of Ryk-GFP labeled intracellular vesicles are also positive for the early endosomal marker Rab5, thereby providing evidence that the intracellular structures quantified in Fig.3E and J correspond essentially to endosomal compartments.

      In the manuscript, the authors generate a zebrafish ryk mutant. In reference to the functional analysis of the ryknce4g product, the authors would need to perform a Western blot of the injected embryos to determine how efficiently the RNA is made into protein and how stable the protein product is, compared to levels of the expressed wt RNA. This will be important when making conclusions about a lack of rescue and lack of over-expression defects.

      We do not claim that the loss of Ryk function in ryknce4g mutants is solely due to a loss of Ryk protein. Our analysis of ryk gene expression in ryk mutants (through in situ hybridization in Figure 4-figure supplement 1D and newly included qPCR analysis in Figure 4-figure supplement 1G) indeed suggests that ryknce4g transcripts are subject to nonsense-mediated degradation. It is therefore not possible to estimate the stability of the Ryk mutant protein through Western blot, as the resultant protein levels would be affected not only by premature termination of Ryk protein translation, but also by the degradation of the mutant transcript. Independent of the relative importance of these two processes, our antibody staining data presented in Fig.4C provide evidence that the ryknce4g mutation results in a complete absence of full-length protein. Taken together, our data provide collective evidence that our ryk mutants present a complete loss of Ryk activity, as further documented by the observation that ryknce4g loss of function phenotypes cannot be enhanced through ryk morpholino injection (Figure 4-figure supplement 1I).

    1. Author Response:

      Reviewer #1 (Public Review):

      The manuscript presented describes the transcriptome of primitive hematopoietic stem and progenitor cells harvested from untreated control mice or mice treated with either PGE2, G-CSF, pIpC or indomethacin. These are some of the drugs commonly used to generate experimental models of stressed hematopoiesis. Having observed some patterns of responses at the transcriptomic level, the authors ask whether these may be driven by specific chromatin accessibility patterns in stem and progenitor cell subsets. However, ATAC-seq reveals that this is not the case when directly responsive genes are analyzed, and rather differences can be found in the promoter accessibility of genes further downstream.

      Strengths. The authors analyze large and challenging datasets, where relatively minor differences in transcription and chromatin accessibility patterns are highlighted.

      Weaknesses. The choice of stimuli is somewhat arbitrary, and the description of the data presented in the figures is often hard to follow, with some contradictions present and text and figures being ordered differently.

      We thank reviewer #1 for the positive evaluation of our work highlighting our analysis of a large and challenging dataset. In our updated manuscript we have addressed all the weaknesses indicated. We have clarified the rationale for using different stimuli and reorganized our manuscript to more clearly delineate figures discussing HSCs (1 and 2) versus figures with comparative analysis of LSKs vs HSCs (3 and 4).

      Reviewer #2 (Public Review):

      Fast et al. here describe responses by hematopoietic stem and progenitor cells to niche signals using scRNA-seq and ATAC-seq. The data provide a rich resource to the research community demonstrating a number of distinct cell states, and heterogeneity between cell clusters in their responses to external stimuli. Notable observations are the continuum in cell states among HSCs and LSK cells, and the distinct clusters that are marked by interferon signaling response as opposed to AP-1 family / PGE signaling. This paper is a resource paper that will serve as a starting point for future studies - in depth studies were not undertaken to validate or understand the implications of these findings on disease states or developmental outcomes, although such studies would certainly increase the impact of the work if they were available.

      We are very pleased that Reviewer #2 feels that our data “provide a rich resource to the research community” and that our paper could “serve as a starting point for future studies”. We are fully aware of the limitations of our study especially the absence of functional validation of specific hypotheses generated from our genomics data. Carefully executed HSC reconstitution experiments take between months and years and we opted to not hold back the dataset until we would have completed these experiments. Our goal is to make our data available while it is current as a resource for the broader research community. We hope our dataset provides the opportunity for replication of existing datasets and stimulates new follow up experiments.

      Reviewer #3 (Public Review):

      Understanding the molecular determinants controlling hematopoietic stem cell (HSC) biology is critical for myriad clinically-relevant interventions; however, because HSC are rare, this information is limited. Here the authors exploit their considerable facility with HSC isolation and apply single-cell genomics to provide a profile of both normal HSC transcriptional clusters and HSC relevant perturbations (di-methyl-PGE2 vs. the Cox1/2 inhibitor indomethacin, and G-CSF stimulating mobilization, or the TLR3 ligand poly(I:C)) and identify potential underlying regulatory transcription factors based on in silico analyses. They note that they can understand the perturbations as shifts in cells within the unperturbed clusters (with modest gene expression changes in each cluster). There are some aspects of the work that could be changed to improve impact and to clarify the take-home message.

      The manuscript leaves the reader with the expectation that the work will biologically dissect the normal and perturbed cluster/populations. This is probably because the authors do not sufficiently clarify the biological impact of the manipulations, the depth of the published record on them, and then convey the expected versus observed transcriptional changes based on that prior published record. In addition, the transcriptional changes in each cluster within the heatmaps relegated to supplementary data probably provide the essential information, but they fail to represent the data across all clusters with all differentially expressed genes to demonstrate common or distinct gene expression changes. This would best be consolidated to a heatmap of differentials instead of the current method of clustering the actual expression metric. To be clear, it would significantly improve the work to show all differentially expressed genes in each HSC cluster across all perturbed clusters in a single heatmap. A viewer other than a genome browser session (which is not easily maintained) would be an essential improvement.

      The central claim is that "niche signals regulate continuous transcriptional states in hematopoietic stem cells". As an experimental paradigm, the authors inject mice with different molecules and then purify HSC two hours later to examine changes in gene expression. This experimental paradigm does not represent specific perturbations of niche signaling.

      We thank the reviewer for the critical and constructive feedback of our manuscript. We further value the assessment that ‘understanding the molecular determinants controlling hematopoietic stem cell (HSC) biology is critical for myriad clinically-relevant interventions’ which was one of the driving forces for us to undertake this investigation. We have substantially reorganized our manuscript and added additional analysis responding to the concerns raised by the reviewer.

      Specifically, we have compared our stimulant-induced gene signatures to prior publications to provide additional context for our results in light of previous findings. As suggested, we have compiled a unified heatmap (Figure 1D) showing differentially expressed genes between clusters and treatments, which provided additional insights into the crosstalk between cluster-defining and treatment-induced genes. We have chosen to display only selected genes, rather than all differentially expressed genes, in the main figure to increase readability and allow easy referencing from the main text. We have added a heatmap encompassing a larger number of genes to Figure 1 – figure supplement 2I. In addition, we have added visualizations of pairwise comparisons of cluster-defining and stimulant-induced genes (Figure 1H and 3F). Source tables contain the complete set of differentially regulated genes for treatments and clusters.

      We have deliberately chosen not to add another interactive visualization application to this manuscript. Currently, our data are hosted externally for interactive exploration on the UCSC Cell Browser website (https://cells.ucsc.edu/), which provides a free resource for scientists to make their single-cell datasets available (378 single-cell datasets as of July 2021). In addition, we plan to make all source datasets (such as differential expression analyses, cluster enrichments, cluster-treatment overlaps) available in a tabular format that ensures both long-term persistence and easy data accessibility for non-computational biologists.

      We agree that the original terminology of ‘niche stimulants regulating HSC transcriptional states’ was not fully accurate. We have revised this terminology throughout the manuscript to ‘external stimulation’ or equivalent wording. While our pharmacological perturbations certainly have limitations (discussed in detail below and in the discussion), we do believe that our results provide novel findings about the HSC response to external stressors and its relationship to baseline transcriptional heterogeneity. Because of the cost and time required for single-cell genomics studies, we believe that our work serves as an important starting point for more fine-tuned investigations of the niche-HSC interaction using genetic models.

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors report the generation of a mesoscale excitatory projectome from the ventrolateral prefrontal cortex (vlPFC) in the macaque brain by using AAV2/9-CaMKIIa-Tau-GFP labeling and imaging with high-throughput serial two-photon tomography. They present a novel data pipeline that integrates the STP data with macroscopic dMRI data from the same brain in a common 3D space, achieving a direct comparison of the two tracing methods. The analysis of the data revealed an interesting discrepancy between the high resolution STP data and the lower resolution dMRI data with respect to the extent of the frontal lobe projection through the inferior fronto-occipital fasciculus (IFOF) - the longest associative axon bundle in the human brain.

      The authors report the generation of a mesoscale excitatory projectome from the ventrolateral prefrontal cortex (vlPFC) in the macaque brain by using AAV2/9-CaMKIIa-Tau-GFP labeling and imaging with high-throughput serial two-photon tomography. They also present a novel data pipeline that integrates the STP data with macroscopic dMRI data from the same brain in a common 3D space, achieving a direct comparison of the two tracing methods. Overall the paper can serve as a how to example for analyzing large non-human primate brain data, though some parts of the paper can be improved and the interpretation of the data should also be further strengthened.

      We thank the reviewer for his positive evaluation of our manuscript.

      The methodological part should include more detail on image acquisition - speed of imaging, pixel residence time, total time for data acquisition of a single brain and data sizes. Also the time and hardware needed for the computational analysis should be included, including the registration to the common reference and the running time for the machine learning predictions - this should also include the F score for the axon detection.

      We thank the reviewer for pointing out these vital issues. We have added these technical details in the resubmitted manuscript.

      “High x-y resolution (0.95 μm/pixel) serial 2D images were acquired in the coronal plane at a z-interval of 200 μm across the entire macaque brain. The scanning time of a single field-of-view containing 1024 by 1024 pixels was 1.629 s (i.e., pixel residence time was ~1.6 μs), which resulted in ~1 month of continuous scanning and ~5 TB of STP tomography data for a single monkey brain.”
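      The quoted imaging figures can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the authors' pipeline; it assumes the whole 1.629 s frame time is spent dwelling on pixels (no flyback or stage overhead), and the variable names are invented for this example.

```python
# Back-of-the-envelope check of the imaging numbers quoted above.
# Assumption: the full frame time is divided evenly among all pixels.
PIXELS_PER_SIDE = 1024   # field of view is 1024 x 1024 pixels
FRAME_TIME_S = 1.629     # scanning time of one field of view, in seconds
PIXEL_SIZE_UM = 0.95     # x-y resolution, in micrometres per pixel

pixels_per_frame = PIXELS_PER_SIDE * PIXELS_PER_SIDE
dwell_time_us = FRAME_TIME_S / pixels_per_frame * 1e6  # seconds -> microseconds
fov_side_um = PIXELS_PER_SIDE * PIXEL_SIZE_UM          # physical width of one field of view

print(f"pixel residence time ~ {dwell_time_us:.2f} us")
print(f"field of view ~ {fov_side_um:.1f} um per side")
```

      Under that assumption the residence time comes out at roughly 1.55 μs, consistent with the ~1.6 μs stated in the quoted text.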

      “The data analysis was undertaken on a compute cluster with 248 CPU cores (3.1-3.3 GHz), 2.8 TB of RAM, and 17,472 CUDA cores.”

      “The total computational time for the machine learning predictions in one macaque brain was ~ 1.5 months.”

      “To evaluate overall classifier performance, the precision–recall F measure, also called the F-score, was computed using four additional labeled images as test sets. Higher classifier accuracy generally yields higher F-scores (94.41% ± 1.99%, mean ± S.E.M.).”
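      For readers unfamiliar with the metric, the following minimal sketch shows how a precision-recall F-score can be computed from a predicted and a hand-labeled pixel mask, and how per-image scores are summarized as mean ± S.E.M. It is not the authors' code: the function names, the toy masks, and the four example scores are all hypothetical.

```python
import math
import statistics

def f_score(predicted, truth):
    """F-score (harmonic mean of precision and recall) for two binary pixel masks."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)      # axon pixels found
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)  # false detections
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)  # missed axon pixels
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD divided by sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))

# Toy flattened masks (1 = axon pixel), invented for illustration.
predicted_mask = [1, 1, 1, 0, 0, 1, 0, 0]
labeled_mask = [1, 1, 0, 0, 0, 1, 1, 0]
print(f_score(predicted_mask, labeled_mask))

# Hypothetical F-scores for four test images (not the reported data).
print(mean_and_sem([0.90, 0.92, 0.96, 0.98]))
```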

      “Registration to the 3D common space took approximately half an hour.”

      The discrepancy between the high resolution STP data and the lower resolution dMRI data with respect to the extent of the frontal lobe projection through the inferior fronto-occipital fasciculus seems puzzling. One would expect that the STP data would reveal more detail, not less. One possibility is that the Tau-GFP does not diffuse throughout the full axon arborization of the PFC neurons, resulting in a technical artifact. Can this be excluded to support the functional significance of the current data?

      We thank the reviewer for raising this important issue. We apologize for the confusion caused by not providing sufficient detail on the IFOF debate owing to limited space. We have added the literature background of the IFOF debate to the Introduction (as also recommended by Reviewer #2). As noted in the comments by Reviewer #2, the present finding provides direct support for the speculation that the IFOF of macaque monkeys may not exist in a mono-synaptic way.

      The AAV construct encoding cytoskeletal GFP (Tau-GFP) was used here to label all processes of the infected neuron, including axons and synaptic terminals. About 3 weeks of post-surgery survival time is usually sufficient to label intracerebral circuits in rodents (Lanciego and Wouterlood, 2020). We extended the survival time to 2-3 months in order to achieve adequate labeling of axonal fibers and terminals in macaques.

      Regarding the extent of Tau-GFP diffusion, the STP images and high-resolution confocal microscopic analysis further showed differences in morphology between axon fibers along the route and the terminals of these fibers. Consistent with previous reports (Fuentes-Santamaria et al., 2009; Watakabe and Hirokawa, 2018), the axon fibers were thin and formed bouton-like varicosities in the terminal regions (MD, Figure 2—figure supplement 7D; caudate, Figure 2—figure supplement 7J; PFC, Figure 1—figure supplement 5A-D). These results indicate that the Tau-GFP has reached axonal terminals.

      References:

      Fuentes-Santamaria V, Alvarado JC, McHaffie JG, Stein BE (2009) Axon Morphologies and Convergence Patterns of Projections from Different Sensory-Specific Cortices of the Anterior Ectosylvian Sulcus onto Multisensory Neurons in the Cat Superior Colliculus. Cereb Cortex 19:2902-2915.

      Lanciego JL, Wouterlood FG (2020) Neuroanatomical tract-tracing techniques that did go viral. Brain Struct Funct 225:1193-1224.

      Watakabe A, Hirokawa J (2018) Cortical networks of the mouse brain elaborate within the gray matter. Brain Struct Funct 223:3633-3652.

      Reviewer #2 (Public Review):

      The authors utilized viral vectors as neural tracers to delineate the connectivity map of the macaque vlPFC at the axonal level. There are three main goals of this study: 1) determine an effective viral vector for tract-tracing in the macaque brain, 2) delineate the detailed map of excitatory vlPFC projections to the rest of the brain, and 3) compare vlPFC connectivity between tracing and tractography results.

      We thank the reviewer for his/her constructive comments, to which we respond below.

      Accordingly, my comments are organized around each aim:

      1) This study demonstrates the advantage of viral tracing technique in targeting neuron type-specific pathways. The authors conducted injection experiments with three types of viral vectors and found success of AAV in labeling long-distance connections without causing fatal neurotoxicity in the monkey. This success extends the application of AAV from rodents to nonhuman primates. The fact that AAV specifically targets glutamatergic neurons makes it advantageous for mapping excitatory projections.

      Although the labeling efficacy of each viral vector type is described in the text, Fig. 2 does not present a clear comparison across viral vectors, despite such comparison for a thalamic injection in Fig. 2S. Without a comparable graph to Fig. 2E, it is unclear to what extent the VSV and lentivirus failed in labeling long-distance pathways.

      We thank the reviewer for the helpful suggestion. As suggested, we have added three new figures as Supplementary materials in the revised manuscript.

      *Figure 2—figure supplement 2. Expression of GFP using VSV-ΔG injected into MD thalamus of the macaque brain. (A) GFP-labeled neurons were found in the MD thalamus ~5 days after injection of VSV-ΔG encoding Tau-GFP. (B) A magnified view illustrating the morphology of GFP-labeled neurons in the area outlined with a white box in (A). (C) Higher magnification view of GFP-positive axons.*

      *Figure 2—figure supplement 3. Expression of GFP using lentivirus injected into MD thalamus of the macaque brain. (A) Lentivirus construct was injected into the macaque thalamus and examined for transgene expression after ~9 months. (B) High power views of the dotted rectangle in panel A. (C) Magnified view of panel B. Note the presence of GFP-positive cells.*

      *Figure 2—figure supplement 4. Expression of GFP using AAV2/9 injected into MD thalamus of the macaque brain. (A) GFP-labeled axons were observed in the subcortical regions ~42 days after injection of AAV2/9 encoding Tau-GFP in MD thalamus. The inset shows the injection site in MD thalamus. Two dashed line boxes enclose the regions of interest: frontal white matter and ALIC, whose GFP signals are magnified in (B) and (C), respectively. (D) Higher magnification view of GFP-positive axons.*

      2) The authors quantified connectivity strength by the GFP signal intensity using a machine-learning algorithm. Both the quantitative approach and the resulting excitatory projection map are important contributions to advancing our knowledge of vlPFC connectivity.

      However, several issues with the analysis lead to concerns about the connectivity result. First, the strength measure is based on axonal patterns in the terminal fields (which the authors refer to as "axon clusters"), detected by a machine-learning algorithm (page 25, lines 11-13). However, the actual synaptic connections are the small dot-looking signals in the background. These "green dots" are boutons on the dendritic trees. The density of boutons rather than the passing fibers reflects the density of synapses. The brief method description does not mention how the boutons are quantified, and it is unclear whether the signal was treated as the background noise and filtered out. Second, it is difficult for the reader to assess the robustness of the vlPFC connectivity patterns, due to these issues: i) It is unclear how many injection cases were used to generate the result reported in the subsection "Brain-wide excitatory projectome of vlPFC in macaques". The text mentions a singular "injection site" (page 8, line 12) and Fig. 4 shows a single site. However, there are three cases listed in Table 1. Is the result an average of all three cases? ii) Relatedly, it is unclear in which anatomical area the injection was placed for each case. Table 1 lists the site as "vlPFC" for all three cases, while the vlPFC contains areas 44, 45 and 12l. These areas have different projection patterns documented in the tract tracing literature. If different areas were injected in the three cases, they should be reported separately. iii) It is hard to compare the projection patterns with those reported in the literature. Conventionally, tract tracing studies report terminal fields by showing original labeling patterns in both cortical and subcortical regions without averaging within divided areas (see e.g. Petrides & Pandya, 2007, J Neurosci). It is hard to compare Fig. 3 with previous tract tracing studies to assess its robustness.

      We thank the reviewer for his/her constructive comments, to which we respond below.

      1) We appreciate the reviewer’s comment and sincerely apologize for not explaining this point clearly in our previous submission. The major concern is whether the axonal varicosities might have been treated as background noise and removed by mistake. In fact, the dot-looking autofluorescence, rather than the axonal varicosities, was reduced by the machine-learning algorithm during segmentation. We have therefore provided new results and updated the “Materials and Methods” and “Discussion” sections in the revision accordingly.

      “Fluorescent images of the primate brain (Abe et al., 2017) often contain high-intensity dot-looking background signal caused by accumulation of lipofuscin. Thanks to the broad emission spectrum of lipofuscin, the dot-looking background and GFP-positive axonal varicosities are easily distinguishable from each other. For instance (Figure 1—figure supplement 4), axonal varicosities can be selectively excited in the green channel, while dot-looking background lipofuscin is usually present in both the green and red channels. During quantitative analysis, a machine learning algorithm was adopted to reliably segment the GFP-labelled axonal fibers, including axonal varicosities, and remove the lipofuscin background (Arganda-Carreras et al., 2017; Gehrlach et al., 2020).”

      “One recent study compared the results of terminal labelling using a Synaptophysin-EGFP-expressing AAV (specifically labelling synaptic endings) with those of a cytoplasmic EGFP AAV (labelling axon fibers and synaptic endings). There was high correspondence between synaptic EGFP and cytoplasmic EGFP signals in target regions (Oh et al., 2014). Thus, we relied on quantifying GFP-positive pixels (containing signals from both axonal fibers and terminals) rather than the number of synaptic terminals, as done in recent reports (Oh et al., 2014; Gehrlach et al., 2020).”

      *Figure 1—figure supplement 4. Difference between axonal varicosities and dot-looking background. STP images (A-D) and high-resolution confocal images (E-H) were acquired in green channel and the red channel. Synaptic terminals (indicated by white arrows) can be specifically excited in green channel, while dot-looking background lipofuscin (indicated by yellow arrows) can be visualized both in green channel and red channel. (C and G) No colocalization was found between axonal varicosities and dot-looking background. Axonal varicosities were easily distinguished from dot-looking background in the merged image. (D and H) The dot-looking autofluorescence rather than the axonal varicosities was reduced through a machine-learning algorithm.*

      References:

      Abe H, Tani T, Mashiko H, Kitamura N, Miyakawa N, Mimura K, Sakai K, Suzuki W, Kurotani T, Mizukami H, Watakabe A, Yamamori T, Ichinohe N (2017) 3D reconstruction of brain section images for creating axonal projection maps in marmosets. J Neurosci Methods 286:102-113.

      Arganda-Carreras I, Kaynig V, Rueden C, Eliceiri KW, Schindelin J, Cardona A, Sebastian Seung H (2017) Trainable Weka Segmentation: a machine learning tool for microscopy pixel classification. Bioinformatics 33:2424-2426.

      Gehrlach DA, Weiand C, Gaitanos TN, Cho E, Klein AS, Hennrich AA, Conzelmann KK, Gogolla N (2020) A whole-brain connectivity map of mouse insular cortex. Elife 9.

      Oh SW et al. (2014) A mesoscale connectome of the mouse brain. Nature 508:207-214.

      2.1) We apologize for the confusion caused by the insufficient description in the main text. We have now revised the “Materials and Methods” section accordingly. Furthermore, we have made both the whole-brain serial two-photon data and the high-resolution diffusion MRI data freely available to the community, which allows researchers in the field to perform further analyses that we have not done in the current study.

      “Three samples were injected with AAV in vlPFC, and two of them could be imaged with STP. Unfortunately, one sample became “loose” and fell off the agar block after several weeks of imaging, so its quantitative results are not shown in Figure 3.”

      2.2) We apologize for the insufficient description of the precise location of the injection sites. We have revised the “Materials and Methods” section and provided a new figure to clarify the exact location of the injection sites.

      “Figure 3-4 and Figure 4—figure supplement 2-4 were derived from sample #8 with infected area in 45, 12l and 44 of vlPFC. Figure 1—figure supplement 6 was derived from sample #7 with infected area in 12l and 45 of vlPFC.”

      *Figure 1—figure supplement 6. Representative fluorescent images showing injection site and major tracts of sample #7. (A) STP image of the injection site in vlPFC is shown overlaid with the monkey brain template (left hand side), mainly spanning areas 12l and 45a. (B) Confocal image of the AAV infected neurons (indicated by white arrows). (C-F) Representative confocal images of major tracts originating from vlPFC.*

      2.3) We agree with the reviewer that most tract tracing studies report terminal fields by showing original labeling patterns. Several recent studies report the total volume of segmented GFP-positive pixels (Oh et al., 2014) or the percentage of total labeled axons (Do et al., 2016; Gehrlach et al., 2020) to represent connectivity strength, and other studies provide the projection density as well (Hunnicutt et al., 2016). We have provided the percentage of total labeled axons (Figure 3C right panel), the projection density (Figure 3C left panel), and representative original fluorescent images (Figure 4, Figure 4—figure supplement 2 and Figure 4—figure supplement 4) to present our projection data from several perspectives.

      References:

      Do JP, Xu M, Lee SH, Chang WC, Zhang S, Chung S, Yung TJ, Fan JL, Miyamichi K, Luo L, Dan Y (2016) Cell type-specific long-range connections of basal forebrain circuit. Elife 5.

      Gehrlach DA, Weiand C, Gaitanos TN, Cho E, Klein AS, Hennrich AA, Conzelmann KK, Gogolla N (2020) A whole-brain connectivity map of mouse insular cortex. Elife 9.

      Hunnicutt BJ, Jongbloets BC, Birdsong WT, Gertz KJ, Zhong H, Mao T (2016) A comprehensive excitatory input map of the striatum reveals novel functional organization. Elife 5.

      Oh SW et al. (2014) A mesoscale connectome of the mouse brain. Nature 508:207-214.

      3) Using the ground-truth from tract tracing to validate tractography results is a timely problem and this study showed promising consistency and discrepancy between the two modalities. Especially, the discrepancy between tracing and tractography data on the IFOF termination brings critical insights into a potential cross-species difference. The finding that IFOF does not reach the occipital cortex provides important support for the speculation that IFOF may not exist in monkeys (for a context of the IFOF debate see Schmahmann & Pandya, 2006, pp 445-446).

      I have minor concerns regarding the statistical robustness of the tracing-tractography comparison. The authors compared the vlPFC-CC-contralateral tract instead of a global connectivity pattern without justification. Why omit other major tracts that connect with vlPFC? In addition, the results are shown for only one monkey, while two monkeys went through both tracer injection and dMRI scans. It is unclear how the results were chosen or whether the data were averaged.

      We apologize for not describing it clearly. The STP images were acquired in the coronal plane with high x-y resolution (0.95 μm/pixel), while the z resolution was relatively low (200 μm). The axonal connection information along z axis may be lost due to the present step size (relatively large) such that it is technically demanding to reconstruct the axonal density maps in sagittal or horizontal plane. Therefore, we focused on the vlPFC-CC-contralateral tract traveling along the coronal plane when quantifying the similarity coefficients along the anterior-posterior axis of the whole macaque brain, and omitted the tracts that were shown as dots in the coronal plane. We have revised it in the resubmitted manuscript.

      “GFP projection and probabilistic tract were plotted with the Dice coefficients and Pearson coefficients (R) along the anterior-posterior axis of the whole macaque brain. The Dice coefficients and Pearson coefficients were higher in dense projection regions, especially for the vlPFC-CC-contralateral tract (Figure 6A). To carry out a proof-of-principle investigation, we focused on the vlPFC-CC-contralateral tract that was reconstructed in 3D space by using STP and dMRI data, respectively.”
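      As an illustration of the slice-wise comparison described in this passage, the sketch below computes a Dice coefficient and a Pearson correlation for each coronal slice. It is a simplified stand-in, not the authors' implementation: slices are represented as flattened lists of per-voxel values, any value above zero counts as "present" when binarizing for the Dice coefficient (the actual thresholds are not stated here), and all names and numbers are hypothetical.

```python
import math

def dice(a, b):
    """Dice coefficient between two maps, binarized at > 0 (an assumed threshold)."""
    both = sum(1 for x, y in zip(a, b) if x > 0 and y > 0)
    total = sum(1 for x in a if x > 0) + sum(1 for y in b if y > 0)
    return 2 * both / total if total else 0.0

def pearson(a, b):
    """Pearson correlation coefficient between two equally sized maps."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # correlation is undefined for a constant map
    return cov / math.sqrt(var_a * var_b)

# Toy data: (axonal density, tract probability) for two coronal slices,
# ordered along the anterior-posterior axis.
slices = [
    ([0, 2, 4, 0], [0, 1, 2, 0]),  # tract follows the axons closely
    ([3, 0, 0, 1], [1, 1, 0, 0]),  # partial spatial agreement
]
profile = [(dice(axon, tract), pearson(axon, tract)) for axon, tract in slices]
for index, (d, r) in enumerate(profile):
    print(f"slice {index}: Dice = {d:.2f}, Pearson R = {r:.2f}")
```

      Plotting the two values of `profile` against slice position would give curves of the kind summarized in Figure 6A.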

      With regard to the presentation of the dMRI data, we apologize for not making it clear in the previous version. We have revised Figure 6 and Figure 7 so that dMRI scans from different macaque monkeys are shown separately.

      *Figure 6. Comparison of vlPFC connectivity profiles by STP tomography and diffusion tractography. (A) Percentage of projection, Probabilistic tracts, Dice coefficients and Pearson coefficients (R) were plotted along the anterior-posterior axis in the macaque brain. Blue and red colors indicate results of two dMRI data sets acquired from different macaque monkeys. (B, C) 3D visualization of the fiber tracts issued from the injection site in vlPFC to corpus callosum to the contralateral vlPFC by STP tomography and diffusion tractography. (D-F) Representative coronal slices of the diffusion tractography map and the axonal density map along the vlPFC-CC-contralateral tract, overlaid with the corresponding anatomical MR images. (G-J) GFP-labeled axon images as marked in Figure 6F were shown with magnified views. (H, J) correspond to high magnification images of the white boxes indicated in G and I, both of which presented a great deal of details about axonal morphology.*

      *Figure 7. Illustration of the inferior fronto-occipital fasciculus by diffusion tractography and STP. (A) The fiber tractography of IFOF (lateral view). Two inclusion ROIs at the external capsule (pink) and the anterior border of the occipital lobe (purple) were used and shown on the coronal plane. The IFOF stems from the frontal lobe, travels along the lateral border of the caudate nucleus and external/extreme capsule, forms a bowtie-like pattern and anchors into the occipital lobe. (B) The reconstructed traveling course of IFOF based on vlPFC projectome was shown in 3D space. (C) The Szymkiewicz-Simpson overlap coefficients between 2D coronal brain slices of the dMRI-derived IFOF tract and vlPFC projections were plotted along the anterior-posterior axis of the macaque brain. Blue and red colors indicate results of two dMRI data sets acquired from different macaque monkeys. Four cross-sectional slices (D-G) along the IFOF tracts were arbitrarily chosen to demonstrate the spatial correspondence between the diffusion tractography and axonal tracing of STP images. (D-G) The detected GFP signals (green) of vlPFC projectome and the IFOF tracts (red) obtained by diffusion tractography were overlaid on anatomical MRI images, with a magnified view of the box area. Evidently there was no fluorescent signal detected in the superior temporal area where the dMRI-derived IFOF tract passes through (G).*
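      The Szymkiewicz-Simpson overlap coefficient used in Figure 7C divides the size of the intersection by the size of the smaller set, so it reaches 1 whenever the smaller region lies entirely inside the larger one. The sketch below is a minimal illustration with made-up voxel coordinates; the function name and the example sets are hypothetical and do not come from the authors' analysis.

```python
def overlap_coefficient(a, b):
    """Szymkiewicz-Simpson overlap: |A intersect B| / min(|A|, |B|)."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 0.0  # convention chosen here for an empty region
    return len(a & b) / smaller

# Toy voxel coordinates (row, column) within one coronal slice.
tract_voxels = {(0, 0), (0, 1), (1, 1), (2, 2)}  # dMRI-derived tract
axon_voxels = {(0, 1), (1, 1)}                   # GFP-positive voxels inside the tract
distant_axons = {(5, 5), (5, 6)}                 # GFP-positive voxels elsewhere

print(overlap_coefficient(tract_voxels, axon_voxels))    # smaller set fully contained
print(overlap_coefficient(tract_voxels, distant_axons))  # no shared voxels
```

      A slice in which the tract passes through tissue with no detected GFP signal, as described for panel G, would score 0 on this measure.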

    1. Author Response:

      Reviewer #1:

      This manuscript by Silver, et al., details work investigating the relationship between season of conception and DNA methylation differences at sites across the genome, measured by widely-used arrays, in two cohorts of children using Fourier regression. They find that season of conception is associated with persistent methylation differences at several hundred CpG sites, and that these CpGs are enriched for properties, compared to sets of control sites, that suggest that methylation at these sites is influenced very early in development/during conception and that these sites are positioned in genomic regions relevant for gene activation and regulation. Additional analyses investigated the effects of genetic variation of these sites, and found no evidence for single nucleotide polymorphisms or child sex confounding the associations between season of conception and DNA methylation. As the sites measured by these arrays are a very small fraction of the total sites across the genome, the authors suggest that these findings indicate there may be many more sensitive methylation 'hotspots' in the genome that are not captured by these arrays but could impact on health/development.

      The key strengths of this manuscript include the use of two cohorts of children at different ages, providing evidence that these effects of season of conception appear to attenuate by 8-9 years of age; and comparison with control sites and additional analyses investigating confounding to build the evidence for these relationships reflecting true, biological associations rather than statistical artefacts or the result of confounding.

      However, the conclusions around the potential functional importance of these methylation differences are limited by a lack of evidence for a relationship between methylation of these season-of-conception-associated sites and child growth/development, so while this manuscript builds compelling evidence for the effects of season of conception on methylation, its functional relevance is unclear. Additionally, there are some choices made in the analyses where the rationale for those choices should be made more clear, such as the use of CpG sites above or below a certain estimated effect size for different analyses.

      Overall, the approach taken here to demonstrate different levels of evidence for true relationships between early development exposures and differences in DNA methylation is a compelling one, and the manuscript delivers clear evidence for its primary conclusions.

      We are currently researching links between several SoC-CpGs and health-related outcomes including measures of growth, and we have prepared/submitted other papers with different groups of authors (e.g. the EMPHASIS team) relating to other phenotypes. We consider a detailed analysis of links between SoC-CpGs and diverse outcome measures in Gambian children to be beyond the scope of the current study and would argue that such an analysis would dilute the central focus of this paper that is already long and complex. We do already refer to two existing studies linking Gambian SoC or nutrition-associated CpGs to health outcomes in non-Gambians (child & adult obesity/POMC, Kuhnen et al Cell Metab 2016; cancer/VTRNA2-1, Silver et al, Gen Biol 2015) in the current manuscript. The VTRNA2-1 locus does not overlap any SoC-CpGs and we already speculate that this may be due to SoC effect attenuation, since the previous association was observed in younger (3-9mth) infants. We have additionally referenced a recently published paper linking another SoC-associated locus to thyroid volume and function in Gambian children (Candler et al Sci Adv 2021) and highlighted that neither this nor the POMC locus overlap the array background analysed in this study. Finally we had already included an analysis of overlaps between SoC-CpGs and traits in published EWAS and GWAS catalogues.

      Regarding our use of different SoC amplitude thresholds for one analysis, our original motivation for analysing all 768 ‘SoC-associated CpGs’ with FDR<5% in the ENID 2yr analysis, including those with amplitude < 4%, was to explore the degree to which the strength / amplitude of SoC effects could be explained by proximity to ERV1 over the wider range of amplitudes represented by the larger set of loci. However, we agree that this approach is open to question and have removed this analysis (previous Fig. 6B and Supp. Fig. 11, and text in section headed ‘Enrichment of transposable elements and transcription factors associated with genomic imprinting’). We have also removed the definition of ‘SoC-associated CpGs’ (which included CpGs with SoC amplitude < 4%) from Table 2 and Methods to aid clarity and avoid confusion.

      Reviewer #2:

      This is a very interesting manuscript that will appeal to a broad readership. The authors have analysed a unique cohort, which is important for understanding the impact of environmental factors on DNA methylation.

      The performed analysis is well balanced, and the conclusions are justified by the presented data. It is a strength of this study, that results from the initial ENID study have been re-evaluated in the EMPHASIS study. Unfortunately, DNA methylation has been analysed using HM450 and EPIC arrays. Both methods are providing only a limited view on methylome-wide DNA methylation.

      Another limitation (as already addressed by the authors) is the lack of longitudinal samples. This would potentially have helped to gain further knowledge about the identified attenuation of DNA methylation levels at SoC associated CpGs.

      Finally, I am not entirely sure that one confounding factor has been completely ruled out: it is known that blood composition may cause methylation variability. In general, the authors addressed this point and analysed blood compositions (supplementary Figure 16) of both cohorts. Here, no marked seasonal differences between and within both cohorts have been identified. However, the participants of the EMPHASIS cohort have a very similar age (8-9 years). For this reason, I am wondering if methylation variability/differences and, in addition, the attenuation of methylation levels might be influenced by the younger age of ENID participants compared to EMPHASIS study individuals.

      We agree that the necessary restriction of our analysis to data derived from Illumina 450k and EPIC arrays means that we can only obtain a limited view of DNAm loci associated with Gambian season of conception. We expect that there will be many more such hotspots across the human methylome. We have commented on this in the Discussion.

      Regarding the lack of longitudinal data to confirm the potential attenuation of SoC effects with age observed between unrelated cohorts, we are pleased to report that we have now acquired an additional EPIC array dataset covering a subset of n=138 individuals from the ENID cohort included in the main analysis. This subset had methylation measured in blood at age 5-7yrs enabling us to conduct an investigation of longitudinal methylation changes in these individuals. This analysis strongly supports the circumstantial evidence of SoC effect attenuation with age suggested by our previous comparison of the independent ENID (2yr) and EMPHASIS (7-9yr) cohorts, with:

      a) strong correlation of conception date methylation maximum between age 2yr and 5-7yrs at SoC-CpGs in these 138 individuals (Figs. 3A, 4A); and

      b) evidence of SoC effect size attenuation at the majority of SoC-CpGs (Fig. 3B; Wilcoxon signed rank sum p=10^-12).

      We note that this additional longitudinal dataset has a different confounding structure with respect to biological and technical covariates (Supp Tables 15-17) and date of sample collection (Supp. Fig. 1B), lending strong support to our previous two-cohort cross-sectional analysis.

      Regarding the potential for confounding by differences in blood cell composition, we have performed an additional sensitivity analysis with Houseman estimated blood cell counts added directly to the linear regression model for the ENID cohort (see ST1s). 518 out of the 520 estimated Fourier regression coefficients from the main analysis (1 pair of sine and cosine terms for each of the 259 SoC-CpGs) fall within the 95% confidence interval obtained in the Houseman-adjusted analysis, confirming that cell composition effects did not unduly influence SoC effect estimates in the original analysis. We have added a brief note on this and the other sensitivity analyses (batch, cell composition and village effects) in Results to the manuscript, with more details in Methods.
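      To make the "one pair of sine and cosine terms" per CpG concrete, the following is a minimal sketch of a single-harmonic cosinor (Fourier) fit. It is not the authors' pipeline: the function names, the 365.25-day period, and the synthetic data are assumptions for illustration, and the real models also include batch and biological covariates.

```python
import math

def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_cosinor(days, values, period=365.25):
    """Least-squares fit of y = mesor + a*sin(w*t) + b*cos(w*t), with w = 2*pi/period."""
    w = 2.0 * math.pi / period
    rows = [(1.0, math.sin(w * t), math.cos(w * t)) for t in days]
    # Normal equations (X^T X) beta = X^T y for the three coefficients.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    mesor, a, b = solve_linear(xtx, xty)
    # a*sin(x) + b*cos(x) = A*cos(x - phi) with A = hypot(a, b), phi = atan2(a, b),
    # so the fitted curve peaks at t = phi / w (modulo the period).
    return {
        "mesor": mesor,
        "sin": a,
        "cos": b,
        "amplitude": math.hypot(a, b),
        "peak_day": (math.atan2(a, b) / w) % period,
    }

# Hypothetical example: methylation peaking around day 200 of the year.
w = 2 * math.pi / 365.25
days = list(range(0, 365, 5))
values = [0.5 + 0.04 * math.cos(w * (t - 200)) for t in days]
fit = fit_cosinor(days, values)
print(round(fit["amplitude"], 4), round(fit["peak_day"], 1))
```

      In this parameterisation the sine and cosine coefficients together determine both the seasonal amplitude and the date of the methylation maximum, which is why a sensitivity analysis can compare the coefficient pairs directly.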

      If the reviewer is referring to the possibility that the SoC effect attenuation with age could be driven by different cell composition effects in the older cohort, we think that the replication of the timing of SoC effects across the 3 datasets analysed (including the additional longitudinal data; Fig. 4A), all of which have different confounding structures with respect to season of sample collection (Fig. 2A; Supp Fig. 1B), together with additional evidence of SoC effect attenuation with age in the longitudinal analysis (Fig. 3B) support this being a genuine age attenuation effect.

      Reviewer #3:

      Silver et al. investigate the influence of seasonal variation (nutrition, infection, environment) on blood DNA methylation in two cohorts of children (233 [2y] and 289 [8y-9y]) from the same subsistence farming communities in rural Gambia. One cohort (450K, 233) was extensively studied before in multiple publications; the second dataset (850k, 289) is unpublished. Using cosinor modeling they find 768 CpGs with a significant seasonal pattern (SoC-CpG, FDR<0.05) in the probes that overlap between the 450k and 850k arrays. Look-up of these 768 SoC-CpGs in the second sample showed 61 SoC-CpGs with FDR<0.05 (no mention is made if the direction of effect is consistent, but we assume it is so).

      In fact we did report that the ‘direction’ of the effect (conception date at methylation maximum) is highly consistent with increased DNAm in conceptions at the peak of the rainy season across the two cohorts at the 61 SoC-CpGs with FDR<0.05 – see Fig. 2C.

      The authors notice that most SoCs seem to be attenuated in the 8-9y sample. Then the authors select out of the 768 SoC-CpG the FDR<0.05 and >=4% seasonal amplitude in this discovery sample: 257 which they bring further in (enrichment) analyses. It is unclear if all 257 are (nominally) significant in the replication sample.

      We did not check this because of evidence that, despite strong replication of effect direction (Fig. 4A), the amplitude of the SoC effect attenuates with age (Fig. 2E). This means that it would not be surprising if one or more SoC-CpGs failed to achieve nominal significance in the older cohort. This is now strongly supported by our additional analysis of longitudinal data confirming SoC effect attenuation with age and consistency of SoC effect direction (Figs. 3B and 4A).

      These SoC-CpGs are enriched for imprinted and oocyte germline loci. Roughly 10% of SoC-CpGs overlap with so-called meta-stable epialleles (MEs), on which the authors have published greatly. This is a large fold enrichment, and subsequently the main focus of the Results and Discussion. Indeed, it skews the Discussion heavily and one wonders what could have been found in the other 90%?

      Our strategy throughout the Results and Discussion was to focus on characteristics including metastability, parent of origin-specific methylation, histone modifications and gametic and early embryo methylation patterns that suggest a link to establishment of methylation states in the early embryo at SoC-CpGs. For these analyses all SoC-CpGs were considered at every stage and metastability was not the primary focus. However, as the reviewer suggests, we do repeatedly point out that many of the above contextual characteristics that are associated with SoC-CpGs have also been associated with metastability which we consider to be worthy of note, in part because it suggests that many SoC-CpGs may in fact be MEs, despite not having been previously identified as such. We have further cause to believe this could be the case because of i) the typically small sample size of multi-germ layer/tissue datasets used to screen for MEs, meaning that published screens for human MEs are likely to be underpowered and will hence fail to capture most MEs; and ii) the evidence that we present suggesting that environmentally-driven inter-individual variation at loci exhibiting ME-like properties may diminish with age, again suggesting that ME screens, which largely analyse adult tissues, will miss metastable loci present in infancy and early childhood.

      We had already made the point ii) above in the Discussion. However, given the reviewer’s concerns we have added an additional comment on point i).

      The Discussion is heavily geared to interpretation within their MEs focus and does little to discuss study weaknesses and strengths, to which the tail of the Results suggest there are multiple. For at the end of the Results and in the Methods we find additional sensitivity analyses and discussion points on a very strong enrichment for CpGs with a mean difference in methylation between the sexes (>1/3 of the 257), adjustments for genetic confounding and a high inflation factor in the discovery cohort.

      We have added an additional comment on the need for further functional analysis in cell and/or animal models at the end of our discussion on possible mechanisms underpinning the observed strong enrichment for sex effects at loci associated with periconceptional environment. We have performed an additional analysis of SoC effects on global methylation using predicted LINE1 and Alu element methylation to address the issue of genomic inflation in the discovery cohort (Methods ‘Inflation of test statistics’ and additional Supp. Fig. 14). We have commented on the potential for residual genetic confounding and the limitation of a lack of genetic data in the discovery cohort in the Discussion. We have also provided an additional comment on the potential influence of unmeasured inter-relatedness in our study population.

      Indeed, despite the strong and good flow of the Result section and the impressive (albeit somewhat one-side) look-up of SoC-CpGs in published datasets; the tail and Methods section leaves this reader with a strong suspicion of possible methodological issues on the measurement level already identified prior.

      The authors report that the discovery cohort is biased in the collection of conception months (figure 2A), has a strong inflation of 1.3 (no QQ-plot is shown to assess bias in addition to inflation), that no adjustment for genetic background could be made (which is false, as the 450k array contains several dedicated SNP probes, even hundreds when extracted with the omicsPrint package), and that > 1/3 of SoC-CpGs are sex CpGs. For the latter observation the authors regressed out sex and repeated the analysis, noting no difference. However, regressing out sex does not help if sex is heavily correlated with confounding biological/sampling/technical covariates.

      The authors reason that the inflation is nothing to worry about citing single cohort studies on global effects on DNAm of methyl donors. Global DNAm is indeed often associated with methyl donor intake but generally these studies investigate ALU or SINES repetitive elements and the PACE consortium reported only modest effects on select 450K array loci for prenatal folate supplementation, showing that their reasoning might hold on the ME loci (in/close to repetitive elements) but not the genome-wide analysis per se.

      The authors should convince the reader that their (discovery) data is valid. The data they do show in Supplemental tables 16 and 17 show that after functional normalization a strong effect of batches remains, while from my own experience these are normally nicely mitigated via functional normalization. Normally only strong cell type correlations remain in the first PCAs of the normalized data. But for ENID we see a remainder of sentrix row, often the strongest batch effect, and slide and plate remaining. Also, the biological, season and cohort specific variables are not noted here. We just must assume that the blank correction for the first 6 PCAs, rather than the actual adjustment for the measured batch/confounding effects, does not remove (or over adjusts) for biological/study design (village, genetic ancestry) effects. In addition to these observations figure 2C seems to indicate that the controls CpGs (elegantly selected by the authors) also show seasonal variation, just not as much as the SoC-CpGs. This leaves the reader to wonder: is there bias in their sample randomization across plates, rows and slides? This feeling is amplified by the fact that almost all SoC-CpGs seem to show an increase in DNAm in jul-aug (Suppl Fig. S5 and Figure 1B). [An observation that is not given enough prominence in the Results]. Which might or might not hint to a correlation with a batch effect (like sentrix row?).

      Our addition of a third longitudinal dataset with a very different confounding structure provides strong reassurance of the robustness of the reported SoC effects. However, we recognise many of the concerns raised by the reviewer and have therefore substantially extended our analysis of potential confounders, including additional sensitivity analyses (see Supplementary Tables ST1p-1s).

      In our extended analysis of possible confounding of technical and biological covariates by SoC, we note that the majority of batch and biological covariates are categorical so that it was not possible to report correlation rho’s. We have instead reported p-values for corresponding association tests – see Supplementary Tables for further details of tests that were carried out. Also note that for simplicity season of conception is modelled as a binary variable (Dry: Jan-Jun; Rainy: July-Dec). We consider this to be a valid approximation to the main cosinor (Fourier) regression analysis since this showed a clear relationship between DNAm and dichotomised (Dry/Rainy) season of conception (Figs 2D & 4A). Note that we have not included month of collection as this completely confounds season of conception in the main ENID (2yr) analysis and cannot confound the EMPHASIS (7-9yr) analysis, as discussed in the manuscript (Fig. 2A). This is a key reason why we compared SoC effects across these two cohorts. Note that the month of collection also cannot confound the ENID 5-7yr (longitudinal) analysis as all samples are collected in the rainy season (additional Supp. Fig. 1B).

      The covariate correlation analysis confirms:

      • No correlation between SoC and all considered batch and biological covariates including principal components across all three analysed datasets (Supp Table, ST1p-1r).

      • No correlation between sex and all considered batch and biological covariates; weak correlations with PC4 and PC3 in EMPHASIS and ENID 5-7yr datasets respectively (ST1q,1r); note also that the sex sensitivity analysis previously reported in the manuscript used methylation values that were pre-adjusted for sex using a regression model that included sex as the only adjustment covariate, alleviating concerns that there may be residual confounding due to strong correlations between technical/biological/sampling covariates and sex. We have added some additional comments on this to Results.

      • Expected strong correlations between SoC, month of conception and month of birth in all datasets (ST1p-1r).

      • Functional Normalisation (FN) removed most but not all of the effects of technical batch effects (sample plate, slide etc) from the DNAm array data used in the main ENID analysis (ST1p).

      • Samples are not perfectly randomised across 450k sample plate (month of birth [mob] and conception [moc]) and slide (mob and village) for the ENID 2yr cohort (ST1p).

      The last point raises the possibility of potential residual confounding due to array batch effects in the ENID analysis. We checked for this in two ways. First, we performed sensitivity analyses with batch and village ID variables included directly in the linear regression models, in addition to the PCs that served as proxies for batch variables in our original analysis. This suggested no residual confounding due to array batch or village ID effects (ST1s: ‘batch adjusted model’ and ‘village adjusted model’). Second, we confirmed that neither mob, moc nor village ID were associated with batch or any other covariates in the EMPHASIS or new ENID 5-7yr analyses (ST1q, ST1r). The tight correspondence of date of methylation maximum across all three datasets (cross-cohort and longitudinal analyses) (Figs. 2C, 3A and 4A) with different confounding structures (ST1p-1r) strongly suggests that the reported SoC associations are not driven by residual confounding.

      In summary, this analysis provides strong reassurance that our main analysis is not confounded by residual associations with technical and/or biological covariates considered in this analysis, and that the observed enrichment for previously identified sex-associations amongst SoC-CpGs is not driven by residual confounding due to sex.

      We have made multiple amendments to the manuscript to incorporate the longitudinal analysis; in the Introduction (lines 58-9); in the first section of Results; and we have made particular reference to the alignment of SoC effects across 3 datasets with different confounding structures. We have also amended several figure captions to distinguish the ENID 2yr and 5-7yr datasets and added the longitudinal dataset to Methods and to the study design schematic (revised Fig. 1), and visualised key results from this additional analysis in Figs. 3 and 4A. Finally we have added additional text on the sensitivity analyses in the main text and in Methods.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors used microelectrode recordings in patients with drug-resistant epilepsy, automatically detected interictal and ictal epileptiform discharges, and measured the directions of travel of both. They found that most interictal discharges are traveling waves with two opposite-facing predominant directions of travel. Furthermore, they found that the direction of travel for interictal traveling waves was similar to that for ictal discharges. They conclude that studying interictal discharge propagation can reveal information about seizure propagation. This is an elegant approach to answering the important question of whether the spatiotemporal propagation of IEDs can elucidate seizure propagation.

      The strengths of this paper are that it addresses an important question and uses elegant quantitative techniques to try to answer it in human subjects. It is relevant to epilepsy clinical care as well as our understanding of how information spreads in the brain.

      The authors' aims are largely met, but there are some questions about the methods and results that would be important to address to be sure that their conclusions are supported. These are:

      1. To be sure to demonstrate validation of their discharge detection methods. It would be important to report on positive predictive values for a random subset of detections, particularly on a testing set of data not used for training?

      This is an important validation step and we apologize for its omission. We have had two clinician co-authors (JDR and CAS) validate IED detections on a dataset of 78 random IED detections, including those that were rejected in our denoising steps (n = 9), and now report the positive predictive value and inter-rater reliability directly in the results on page 4, line 16.
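      As an illustration of the two quantities mentioned above, the sketch below computes a positive predictive value from one rater's confirmations and Cohen's kappa between two raters. The response does not state which inter-rater statistic was used, so kappa is an assumption here, and the function names and ratings are made up for the example.

```python
def positive_predictive_value(confirmed):
    """Fraction of automatic detections that a rater confirmed as true IEDs."""
    return sum(confirmed) / len(confirmed)

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters giving binary labels."""
    n = len(rater_a)
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    pa = sum(rater_a) / n  # rater A's rate of "true IED" labels
    pb = sum(rater_b) / n  # rater B's rate of "true IED" labels
    # Agreement expected if the raters labelled independently at those rates.
    p_exp = pa * pb + (1 - pa) * (1 - pb)
    if p_exp == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical ratings for ten detections (True = confirmed as an IED).
rater_a = [True] * 6 + [False] * 4
rater_b = [True] * 5 + [False] * 5
print(positive_predictive_value(rater_a), cohens_kappa(rater_a, rater_b))
```

      Reporting a chance-corrected statistic alongside raw agreement matters when most detections are true positives, because two raters would then agree often even if they labelled at random.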

      2. The authors write in the abstract and discussion that interictal discharges (IEDs) traverse the same path as ictal discharges (SDs), but the angle between the SDs and IEDs was 24 degrees, and the IEDs weren't exactly opposite the ictal wavefront (150 degrees). This raises questions as to whether these two classes of abnormal activity really follow the same path. It would be important to account for the sizable angle between their observed paths. In some ways this seems to refute more than support the main conclusions.

      We have taken steps to show that the angle discrepancies to which the reviewer refers are statistically insignificant and can be explained by expected biological variability. As discussed in response to each point below, we have added permutation tests showing that IED and SD distributions are significantly more similar than would be expected by chance, and that while we did not observe perfect antipodality in all distributions in all subjects, the amount and extent of antipodality we observed is highly improbable and supports the claims of the paper. Furthermore, that these empirical results in human participants were predicted by the computational model reported in (Liou et al., 2020) lends further contextual support for the results.
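      For readers unfamiliar with permutation tests on directional data, here is a minimal generic sketch. It shuffles group labels and asks how often the angular distance between the two circular means is at least as large as the observed one. The authors' exact test statistic and null distribution are not given in this response, so the function names, the statistic, and the example angles are assumptions.

```python
import math
import random

def circular_mean(angles):
    """Mean direction (radians) of a list of angles."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)

def angular_distance(a, b):
    """Smallest absolute angle between two directions, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def permutation_test(x, y, n_perm=2000, seed=0):
    """Label-shuffling test for a difference in mean direction between x and y."""
    rng = random.Random(seed)
    observed = angular_distance(circular_mean(x), circular_mean(y))
    pooled = list(x) + list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        d = angular_distance(circular_mean(pooled[:len(x)]),
                             circular_mean(pooled[len(x):]))
        if d >= observed:
            count += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (count + 1) / (n_perm + 1)

# Hypothetical directions: one group near 0 rad, the other near pi/2 rad.
ied = [0.01 * i for i in range(-10, 10)]
sd = [math.pi / 2 + 0.01 * i for i in range(-10, 10)]
observed, p_value = permutation_test(ied, sd)
print(round(observed, 3), p_value)
```

      A large p-value from such a test would indicate that an observed angular offset is within the range produced by random relabelling, which is the sense in which a discrepancy can be called statistically insignificant.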

      3. The influence of sampling error and method of recording are important to discuss, and how they might alter the brain and its conduction of abnormal activity. It would be important to be clear about what effects if any these factors have on the conclusions and information recorded.

      These are excellent topics of discussion related to the previous point and we have discussed them directly in the 4th paragraph of the discussion section, on page 11, line 10.

      4. It would be important to explain how seizures were defined and the rationale for this definition. This is an elusive topic and one in great debate, so this would be helpful to understand your thinking and also to assess the paper's relevance to clinical epileptic events.

      We detected seizures from the times listed in the EMU reports and examined the corresponding time periods in the microelectrode data. Beyond this utilization of the gold standard (i.e. an attending neurologist’s definition) we didn’t formally operationally define what a seizure was, but anecdotally looked for a series of LFP discharges with the typical pattern of inter-discharge intervals, as in previous studies (Schevon et al., 2012; Smith et al., 2020, 2016). Once those discharges were discovered, we looked at whether multiunit activity phase locked to the LFP discharging in order to determine whether neurons recorded by the microelectrode array were ‘recruited’ into the seizure core, or remained in the ‘penumbra’ as described in the ‘seizure characterization’ section of the methods, starting on page 16, to which we have added text clarifying these points starting on page 15, line 6.

      Overall this is a very interesting study on an important topic, and one that is relevant to both basic science research and the clinical evaluation and care of patients with epilepsy.

      We thank the reviewer and agree with this assessment.

    1. Author Response

      Reviewer #3 (Public Review):

      This paper by Goodman et al. is the latest in a series focusing on the structural determinants of clustered protocadherin (cPcdh) isoform cis- and trans-interactions. The goal of this particular paper is to garner further details in support of the "isoform-mismatch chain-termination model" of cPcdh interaction, which was developed by the group in 2015. The model is based on their landmark initial crystallographic structural analysis of particular cPcdh ectodomains, as well as on earlier work from other groups showing that (at least) some cPcdh proteins interact homophilically in trans but promiscuously in cis. The model predicts that cis-dimers of various cPcdh isoforms form via the 5th and 6th extracellular cadherin repeats (EC5/6), and that these dimers then interact in trans strictly heterophilically via EC1-4 to form "dimers of dimers" as an initial event. If cPcdh repertoire between two cells primarily matches, then a linear "zipper" of such dimers will expand, increasing interaction and presumably associated intracellular signaling. Mismatching isoforms expressed in one cell but not the other will terminate this zipper chain, and thus cPcdh repertoire matching between cells will determine self/non-self recognition. Other groups have shown that homophilic matching between neurons is, depending upon the neuronal subtype, important for driving neurite self-avoidance or growth and branching of dendritic arbors, so the mechanisms of interaction will be important to understanding events in neural development.

      The present paper builds on others by the group (e.g., Rubinstein et al., 2015, Goodman et al., 2016, 2017, Brasch et al., 2019), and primarily extends these results to more isoforms, providing also more molecular detail. There are three main findings. First, the concept that cPcdh trans-interactions are strictly homophilic is supported by many new analyses using surface plasmon resonance (SPR) assays in which an ectodomain of one isoform is coupled to a chip and those of identical vs. distinct isoforms are flowed over it to measure interactions. The data are rigorous and nicely presented and demonstrate, unsurprisingly given many prior demonstrations, that trans interactions mediated by EC1-4 are strictly homophilic. A main advance here is in the methodology, which can quantitatively and directly measure such interactions, in contrast to the qualitative cell aggregation studies that were already published. The authors also present an informative mutagenesis series identifying 5 interfacial residues that, when mutated individually or in concert to match a different highly similar intra-family isoform, quantitatively shift trans-interaction from homo- to heterophilic.

      The second main finding is the presentation of a new antiparallel trans-dimer structure of the gC4 EC1-4 interaction. While structures of other gamma Pcdhs have been published by the group before, the addition of the C4 structure is important for several reasons: 1) this isoform is the only one of all the cPcdhs that is essential for postnatal viability and normal neuronal survival in mice; 2) this isoform is the only one of the gamma Pcdh family that does not make it to the plasma membrane without dimerizing with a "carrier" cPcdh of some kind, which had cast doubt on whether it would interact in the same way as other cPcdhs; 3) A recent publication (not cited by the authors yet as it came out coincident with their submission) demonstrated that truncating or structure-disrupting mutations in the human PCDHGC4 gene result in significant neurodevelopmental disorders. The authors show that the structure of the C4 trans-dimer is similar to that found for other cPcdh isoforms, though the interaction is weaker than observed for others. They suggest that particular residues in the EC1:EC4 and EC2:EC3 trans interface may be responsible for this, though they do not follow up with mutation experiments to confirm. Doing so (mutating the identified C4 residues to those of, say gB2 or a delta2 Pcdh) would contribute to the novelty of the paper, as it is unclear as of yet how strength of cPcdh interactions might be regulated or manipulated.

      We thank the reviewer for drawing our attention to the missing citation for the recent paper on PCDHGC4 variants implicated in neurodevelopmental disorders (Iqbal et al. 2021), which we have now added. We have performed the requested experiments and, as now discussed in the manuscript, they validate the role of E78 but not D290 in significantly weakening the dimerization of γC4.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors have previously developed a powerful time-lapse recording protocol that allows them to observe in real time the formation and degradation of phagosomes in specific phagocytic cells (C1-C3) in the developing C. elegans embryo. Using this protocol in combination with specific reporters, they find that LC3-positive vesicles fuse with phagosomes and that these vesicles are double-membrane vesicles. Taking advantage of different genetic requirements for the formation of LAPs and autophagosomes, they furthermore provide evidence that these LC3-positive vesicles are autophagosomes. Having established that autophagosomes fuse with phagosomes, the authors demonstrate that preventing this fusion genetically (by blocking autophagosome biogenesis) results in a general engulfment defect and that this is due to a defect in the degradation of phagosomal content. The authors identify RAB-7 and the HOPS complex as necessary for autophagosome-phagosome fusion and the CED-1, CED-6, DYN-1 pathway as necessary for recruiting autophagosomes to phagosomes. Finally, the authors find that preventing autophagosome-phagosome fusion does not affect lysosome-phagosome fusion thereby ruling out that the effects observed are indirect and a consequence of defects in lysosome-phagosome fusion rather than autophagosome-phagosome fusion.

      This is a very rigorous and very convincing study that will have a big impact on our understanding of cell corpse engulfment and degradation. It also uncovers a novel function of autophagosomes (i.e. in cell corpse degradation). Its strength lies in the combination of time-lapse observation of specific organelles in specific cells in vivo and the use of mutations in genes that affect specific cell biological processes. There are no major weaknesses but it would have been nice to have evidence for autophagosome-phagosome fusion for example through EM images.

      We appreciate your praise of our work. We have performed the EM analysis and observed the attachment of double-membrane vesicles to the surfaces of phagosomes.

      Reviewer #2 (Public Review):

      This is an interesting manuscript from Zhou and colleagues which makes several important steps forward in understanding how cell corpses are digested by phagocytes. First, they show that autophagosomes associate with nascent/growing phagosomes. Interestingly, they find that many atg genes required to make autophagosomes are essential for efficient corpse clearance. These include homologs of atg13 and atg14, which in mammals are not required for the production of LAPs. The argument from the authors is that these non-LAP vesicles are double-membrane and fuse with phagosomes to contribute contents important for efficient corpse clearance. If correct, this would be a new way in which autophagosomes contribute to phagocytosis. This is a reasonable interpretation, but not the only interpretation of the data.

      Novel points include the recruitment of autophagosomes (non-LAPs presumably based on genetics) to phagosomes and the requirement for many atg genes in corpse elimination. That Ced-1 and components of this pathway drive recruitment of these vesicles is interesting. The genetics are very strong, convincing, and well done. It seems clear that these many ATG genes are playing a role in efficient disposal of corpses in phagosomes, but I do think it remains unclear how. It is not clear to me that a double-membrane autophagosome actually fuses with the phagosome. Possibly they are fusing to aid in degradation, but another possibility is that they are interacting in a way that contributes lipids for phagosome growth/expansion. What if this is a mechanism that allows phagosomes to grow their lipid membranes, rather than fusion to digest what's in the autophagosome? Atg2/9 drive the transfer of lipids from the ER to autophagosomes, that's how autophagosomes grow. It is possible that autophagosomes are intermediates that serve as lipid sources for phagosomes. Is there an autophagosome target (inside) that one could track to show actual degradation? It would be useful to discriminate these possibilities.

      We agree with Reviewer 2 that there are multiple possible mechanisms to support the functions of autophagosomes in facilitating phagosome maturation. The “phagosome growth/expansion” hypothesis is conceivable. One particular function of the increased phagosome membrane amount might be to support the extension of the transient phagosomal tubules that aids in the recruitment of lysosomal particles to phagosomes. We added it to the revised “Discussion”.

      Regarding whether there is an autophagosome target (inside) that one could track to show actual degradation, many cellular components are known to become cargos for autophagosomes. These include protein aggregates, RNA-protein (RNP) complexes, intracellular organelles, ribosomes, and lipid droplets. However, the cargos of individual autophagosomes differ a lot depending on whether the autophagy to be studied is triggered by stress, what kind of stress, and the condition of the cells. Therefore, it is assumed that the cargos of autophagosomes are not uniform. It is thus hard to predict whether and what substance(s) provided by the cargo would contribute to phagosome degradation once autophagosomes fuse to phagosomes. In C. elegans embryos, P granules are degraded inside autophagosomes. We considered tracking P granule components PGL-1 and/or PGL-3 as cargos of autophagosomes [6]. However, P granules are produced in the P4 lineage, not in the embryonic hypodermal cells that act as engulfing cells [6], and thus are not suitable candidate cargos. One candidate protein closely related to autophagosomal cargos is p62, a conserved cargo receptor for autophagosomes, which is encoded by the sqst-1 gene in C. elegans [6]. In the future investigation, we will track p62-labeled puncta and the relationship of these puncta with phagosomes.

      Based on the knowledge that in mammalian cells, among genes required for the biogenesis of autophagosomes, atg13, atg14, and ulk1 are not needed for the generation of the LAP vesicles, we predict that if the LGG+ puncta we are following in C. elegans embryos are LAP vesicles, the atg-13 or epg-8 (atg14) mutations are not supposed to affect the biogenesis of the LGG+ puncta. We observed that in atg-13 and epg-8 mutants, the existence and incorporation of the LGG+ puncta are severely diminished, indicating that these puncta are not LAP vesicles. This line of evidence, together with other evidence, allows us to propose that the LGG+ vesicles that fuse to phagosomes are canonical autophagosomes, not LAP vesicles.

      Reviewer #3 (Public Review):

      The manuscript by Peña-Ramos et al. describes a new role of autophagosomes in the maturation of phagosomes containing apoptotic corpses. The authors find that vesicles containing LGG-1 and or LGG-2 bind to and fuse with phagosomes and provide evidence that these structures are, ostensibly, double-membraned autophagosomes. They then proceeded to assess the role of such fusion events in the rate and extent of phagosome maturation, using mutants lacking components required for autophagosome formation. In addition, they provide evidence that fusion involves Rab-7 and the HOPS complex and is influenced by the presence of the phagocytic receptor CED-1. Lastly, they document that lysosome fusion with the phagosomes persists in the absence of autophagosomes.

      The findings are novel, generally clear and convincing. On the other hand, some of the interpretations are not unambiguous and, importantly, the ultimate mechanism underlying the defective maturation is not resolved or even addressed.

      By the authors' own admission, at least a fraction of the autophagosomes have acquired Rab-7 and likely fused with lysosomes. In that event, the fusing structures are autolysosomes and not necessarily double-membraned autophagosomes, as claimed. Delivery into the phagosome of the contents of the digested inner bilayer of the original autophagosome is likely to have occurred. If so, appearance of labeled contents inside the phagosome is being misinterpreted (or at least overinterpreted) to mean that the fusing structures are double-membraned, bona fide autophagosomes. The resolution of the images seems insufficient to distinguish these two possibilities. If autolysosomes are in fact the organelles fusing, how is this different from fusion of regular lysosomes? The conceptual novelty of the paper would be greatly diminished if what is being reported is homotypic fusion of two maturing (auto)phagolysosomes or fusion of a lysosomal organelle with phagosomes. Are the LGG-1/2-positive structures acidic and do they contain NUC-1?

      We appreciate Reviewer 3’s comments, which raise the important question of whether it is autophagosomes or autolysosomes that fuse to phagosomes. To address the series of questions raised by Reviewer 3, we first need to clarify that the LGG+ RAB-7+ puncta are not necessarily autolysosomes. Recent studies have shown that autophagosomes directly acquire the Rab7 small GTPase to their surfaces. An autophagosome that is labeled with Rab7 should not be assumed to be an autophagolysosome (autolysosome). Below are the reports and our own evidence that demonstrate this point:

      Rab7 is independently recruited to the surfaces of each of the following organelles from the cytoplasm: late endosomes, lysosomes, phagosomes, and autophagosomes. Specific to autophagosomes, Gao et al (2018) [12] have shown that yeast Atg8 (LC3) recruits Rab7/Ypt7 to the pre-autophagosomal structure by directly binding to the Mon1-Ccz1 complex, the guanine nucleotide exchange factor (GEF) for Rab7. Furthermore, the recruitment of Rab7 to the autophagosomes is necessary for the autophagosome-lysosome fusion in yeast [12]. In starved Drosophila fat cells, PtdIns(3)P and the Mon1-Ccz1 complex on the surfaces of autophagosomes act together to recruit Rab7 to autophagosomes [13]. Again, this recruitment step is essential for the subsequent autophagosome-lysosome fusion [13]. A conserved LC3-Mon1-Ccz1-mediated recruitment of Rab7 to autophagosomes has also been reported for mammalian cells [14].

      Like in other organisms mentioned above, C. elegans RAB-7 also plays an essential role in the fusion between autophagosomes and lysosomes [1]. During the revision period, our quantitative analysis shows that in engulfing cells for C1, C2, and C3, on average 66.2% and 63.5% of LGG-1+ and LGG-2+ puncta are RAB-7+ (Fig 8 E). Remarkably, 100% of the LGG+ puncta observed on the surfaces of phagosomes prior to the fusion event are RAB-7+ (Fig 8 E), suggesting that RAB-7 plays a role in the incorporation of LGG+ puncta to phagosomes. This suggestion was confirmed by the observation that the deletion of rab-7 results in the blockage of the fusion of LGG+ puncta to phagosomes (Fig 8 G-J).

      Since RAB-7 is not a suitable lysosomal marker, to identify autolysosomes in the population of LGG+ puncta, as Reviewer 3 requested, we followed the subcellular localization of NUC-1, a lysosomal luminal DNase. In engulfing cells, on average 40.7% of the LGG-1+ and 36.5% of the LGG-2+ puncta that are observed on phagosomal surfaces and subsequently fuse to phagosomes are NUC-1+ (Fig 12 C), indicating that those puncta are likely autolysosomes. Considering that 100% of the LGG+ puncta found on phagosomal surfaces are RAB-7+, these results clearly indicate that an LGG+ RAB-7+ particle is not necessarily an autolysosome. Also, 59.7% and 63.5% of the LGG-1+ or LGG-2+ puncta that fuse to phagosomes, respectively, are NUC-1- and presumed to be autophagosomes that have not fused to lysosomes. The non-autolysosomal LGG+ puncta thus are likely to fuse with phagosomal membranes as double-membrane vesicles. Furthermore, we have observed double-membrane vesicles in close contact with phagosomal surfaces using EM analysis (Fig 4).

      Regarding the acidification status of the LGG+ structure in C. elegans embryos, Manil-Segalen et al [1] used a GFP::mCherry::LGG-1 reporter to detect the acidification of LGG-1-labeled autophagosomal structures (the color turning from yellow (green + red) to red (quenching of the GFP signal in acidic pH)). They found that in 1-cell stage embryos, all LGG-1+ puncta were non-acidic, whereas later in embryonic development, some LGG-1+ puncta that moved away from the allophagic cluster became acidic. Overall, both non-acidic and acidic LGG-1+ puncta exist in the cytoplasm.

      In the original version of “Results”, we did not do a good job of emphasizing that recruiting Rab7 to autophagosomes from the cytoplasm does not necessarily involve lysosomes; rather, it is a prerequisite for the fusion between autophagosomes and lysosomes. We thank Reviewer 3 for requesting that we quantify the percentage of LGG+ puncta that are RAB-7+ or positive for NUC-1, a lysosomal marker; the results help clarify the issue. We revised the section “The small GTPase RAB-7 is enriched on the surfaces of autophagosomes” in “Results” and added Fig 12C to report our quantitation results.

      In summary, as Reviewer 3 pointed out, if all the LGG+ puncta were autolysosomes, the novelty of our finding that LGG+ puncta fuse to phagosomes would be greatly diminished. However, our experimental results demonstrate that this is not the case. More than 59% of the fusion events between LGG+ puncta and phagosomes are actually fusion events between double-membrane, NUC-1- autophagosomes and phagosomes. This conclusion is further supported by the observation of double-membrane autophagosome-like structures on the surfaces of phagosomes inside gonadal sheath cells in rab-7 mutant worms (Fig 4). Another important and relevant finding is that in rab-7 null mutant embryos, numerous LGG+ puncta were observed on phagosomal surfaces (Fig 8 G-H). It was reported that the inactivation of rab-7 in C. elegans embryos blocks the formation of autolysosomes [1]. Therefore, our result indicates that in rab-7 mutants, non-autolysosomal LGG+ organelles, most likely autophagosomes, are indeed recruited to phagosomes. Together, these lines of evidence indicate that in wild-type embryos, both autophagosomes and autolysosomes fuse to phagosomes, and that LGG+ NUC-1- autophagosomes account for more than 59% of the LGG+ population. Given that the membrane and lumen of autophagosomes have different components from those of lysosomes, we propose that whereas the autolysosomes could contribute lysosomal contents to phagosomes, autophagosomes and autolysosomes also deliver to phagosomes the substance(s) that are unique to the autophagosomal origin. This is one of the important novel points of this report. We further speculate on a few possible substances delivered to phagosomes from autophagosomes in the revised Discussion. Please also see our response to Reviewer #2’s “Recommendations for the authors, point 6”.

      The conclusion that lysosomal fusion with phagosomes is normal is based on the quantitation of NUC-1 fluorescence acquired by the phagosomes. However, the quantitation was made relative to the fluorescence of phagosomes at time 0, when no NUC-1 is expected to be present. The validity of these measurements and comparisons between wildtypes and mutants is therefore questionable.

      The preceding comment is critical because it is generally believed that degradation of phagosomal contents is solely dependent on fusion of lysosomes that deliver degradative enzymes and make the phagosomal lumen suitably acidic for optimal function of the hydrolases. If these parameters are normal, as argued by the authors, what is preventing normal degradation from occurring? The authors invoke a mysterious molecule(s) delivered by the autophagosome as an essential component for optimal degradation. While provocative, this notion seems unfounded unless it is shown that the delivery of hydrolases and the luminal pH of the phagosome are normal, yet degradation of proteins, lipids and nucleic acids and/or their resorption is affected by the absence of the mysterious molecule(s). It is unclear whether the authors propose that the contents of autophagosomes are somehow required for hydrolase activity.

      By measuring the fold increase of the NUC-1::mCherry signal inside the lumen of phagosomes at 60 min after the completion of engulfment over the 0 min time point (Fig 11F), we have obtained the rate of accumulation of NUC-1 inside the phagosomal lumen, not an absolute value that could represent the amount of NUC-1::mCherry molecules per unit area inside the wild-type, atg-7, and lgg mutant phagosomes. We agree that this assay has the caveat pointed out by Reviewer 3. However, given the possible variation of the expression level of NUC-1::mCherry from embryo to embryo, our concern is that the absolute photon numbers per unit area, if not normalized, are not informative values to be compared between strains. That is why we normalized the signal intensity at Tn (n: minutes post engulfment) to that at T0min. Considering all the results reported in Fig 11 together, which include the time-lapse images (A-D), the quantification of the increase of mCherry signal inside the phagosomal lumen over time (E), the fold increase of the mCherry signal at T60min over T0min in multiple samples (F), and the time points when the fusion between lysosomes and phagosomes starts (G) in multiple samples of four different strains, we conclude that mutations in atg-7, lgg-1, or lgg-2 do not significantly alter the temporal pattern or the scale of the lysosome-phagosome fusion, although these results are not sufficient to prove that the absolute amounts of NUC-1::mCherry inside the lumen of phagosomes are the same in the mutant and the wild-type embryos.
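
      The normalization described above can be illustrated with a minimal sketch. The intensity values, sampling interval, and function name below are hypothetical and are not taken from the manuscript; the sketch only shows why dividing by the T0min signal removes embryo-to-embryo differences in reporter expression, and why it cannot report absolute amounts.

```python
def fold_increase(trace, baseline_index=0):
    """Normalize a fluorescence time series to its value at the baseline time point."""
    baseline = trace[baseline_index]
    if baseline <= 0:
        raise ValueError("baseline intensity must be positive")
    return [value / baseline for value in trace]


# Hypothetical phagosomal mCherry intensities (arbitrary units) at 0, 20, 40, 60 min.
# The two embryos differ fourfold in overall reporter expression.
bright_embryo = [200.0, 400.0, 700.0, 1000.0]
dim_embryo = [50.0, 100.0, 175.0, 250.0]

bright_fold = fold_increase(bright_embryo)
dim_fold = fold_increase(dim_embryo)

# Both embryos give the same fold increase at the last time point,
# even though their absolute intensities differ.
print(bright_fold[-1], dim_fold[-1])
```

      In this toy case the two normalized curves are identical, which is the intended benefit of the normalization and also the caveat acknowledged above: the absolute amounts differ between the two embryos, and the fold-increase values do not reveal it.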

      Given that autolysosomes are part of the NUC-1+ vesicle population that displays lysosomal features, why, then, in atg-7, lgg-1, and lgg-2 mutants, which suffer severe defects in the biogenesis of autophagosomes and hence autolysosomes, does the accumulation of NUC-1::mCherry signal inside the phagosomal lumen appear not to be significantly affected? This result could arise because (1) there is a large pool of lysosomes in the cell that provides a sufficient number of lysosomes to fuse to phagosomes despite the lack of autolysosomes, or (2) the autolysosome population is a very small portion compared to the lysosomal pool, and our assay is not capable of detecting a small reduction in the lysosomal flow into phagosomes. More fundamentally, we suspect that in a cell that bears a stable lysosome pool, whether autophagosomes exist or not might not affect the overall lysosomal activity. In a cell in which autophagosomes are lacking, the same number of NUC-1+ vesicles is still expected to fuse to phagosomes, although within the NUC-1+ vesicle population the portion of autolysosomes is now greatly diminished. Taking this assumption into consideration, the observation that the lack of autophagosome-phagosome fusion causes a phagosome degradation defect appears to suggest that autophagosomes might provide unique materials/activities to phagosomes.

    1. Author Response

      Reviewer #2 (Public Review):

      The authors describe how modulating the levels of beta-catenin in TECs affects thymic organization and thymopoiesis. They use a b5t-Cre to specifically stabilize or ablate beta-catenin in TECs. While stabilization of beta-catenin induces thymic dysplasia and significantly impedes development of thymocytes beyond the DN1 stage, loss of beta-catenin has a milder outcome limited to significantly reduced thymic weight starting and an overall reduction in thymocyte number but no significant effects on thymocyte subset distribution at 2 weeks of age. This reduction of thymic weight is associated with a significant and selective reduction in the number of cTECs. On the basis of these findings the authors conclude that fine tuning of beta-catenin levels is essential for postnatal T cell development.

      Overall this is an interesting but descriptive study that does not address the physiological and molecular effects of stabilizing or ablating beta-catenin in TECs. The authors suggest that stabilization of beta-catenin function in TECs results in their terminal differentiation to keratinocytes on the basis of increased expression of only two markers involucrin and loricrin, which is a limited definition and further analysis of these cells would be needed for this conclusion. On the other hand the interesting selective effect of loss of beta-catenin function on cTECs versus mTECs has not been analyzed. Are cTECs reduced due to loss of survival/proliferation/differentiation? How is their molecular profile affected by the loss of beta-catenin? Does the selective reduction of the cTEC compartment affect survival/proliferation/differentiation of the individual DN subsets? The paper would benefit from more in depth mechanistic analysis in these directions.

      Thank you so much for stating that this is an interesting study. Thank you very much also for asking about the mechanism underlying the selective reduction in cTECs by LOF of B-catenin. According to the reviewer’s suggestion, we performed RNA sequencing analysis of cTECs and mTECs in B-cat LOF mice. We detected 92 genes that were significantly altered in B-cat LOF cTECs with an FDR-corrected p-value of less than 0.05; there were 11 downregulated genes and 81 upregulated genes compared with control cTECs. We also detected 91 genes that were significantly altered in B-cat LOF mTECs; there were 53 downregulated genes and 38 upregulated genes compared with control mTECs. Thus, fewer than 100 genes were significantly altered in each of cTECs and mTECs due to LOF of B-catenin, in agreement with the finding of no remarkable alteration in the expression of many functionally relevant molecules in cTECs and mTECs by quantitative RT-PCR analysis, as shown in the original manuscript. Nonetheless, we found that the vast majority of the genes that were significantly affected by LOF of B-catenin differed between cTECs and mTECs, with the exception of Adh1 and Pglyrp1. Adh1 was upregulated in B-cat LOF cTECs and downregulated in B-cat LOF mTECs, whereas Pglyrp1 was upregulated by B-cat LOF in both cTECs and mTECs. Notably, Cdkn1a was elevated specifically in cTECs but not in mTECs in B-cat LOF mice. The expression of Cdkn1a, which encodes cyclin-dependent kinase (CDK) inhibitor p21, is linked to B-catenin activity. The upregulation of Cdkn1a in B-cat LOF cTECs compared with control cTECs was confirmed by quantitative RT-PCR analysis. We noticed no remarkable difference in other CDK family genes, including Cnnb1, Cnnb2, and Cnnd1, in cTECs and mTECs from B-cat LOF mice. These results suggest that the upregulation of Cdkn1a may contribute to the selective reduction in the number of cTECs by LOF of B-catenin. These new results are shown in new Figure 8 and are explained and discussed in the revised manuscript.
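
      For readers unfamiliar with FDR correction, the following is a minimal sketch of one widely used procedure, Benjamini-Hochberg. The response does not state which FDR method was applied, so the choice of this procedure is an assumption, and the p-values and function name are hypothetical illustrations rather than data from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping the adjusted
    # values monotone by carrying the running minimum downward.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes
raw_p = [0.001, 0.008, 0.039, 0.041, 0.6]
adj_p = benjamini_hochberg(raw_p)

# Genes called significant at an FDR-corrected p-value below 0.05
significant = [p < 0.05 for p in adj_p]
print(adj_p, significant)
```

      In this toy example, two genes with raw p-values below 0.05 (0.039 and 0.041) no longer pass after correction, which is why the corrected threshold yields a shorter gene list than an uncorrected one would.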

      According to your additional suggestion, we also examined the DN subsets in B-cat LOF mice and found that the frequency of DN subsets defined by CD44 and CD25 was unchanged, although the number of all four DN subsets was significantly reduced in B-cat LOF mice. These new results are shown in new Figure 9 and explained in the revised manuscript.

      Thank you so much again for your comments to help improve the manuscript.

      Reviewer #3 (Public Review):

      The authors effectively utilized Beta5T-iCre to specifically manipulate beta-catenin expression in TECs and definitively showed that careful control of beta-catenin within TECs is needed for the proper development of TEC microenvironments critical for T cell development.

      Strengths: 1) The methods used allowed the authors to effectively target TECs while avoiding extrathymic side effects of manipulating beta-catenin in ways that impacted skin or other tissues leading to abortive development or improper separation of the thymus and parathyroid. 2) The results showed that beta-catenin GOF specifically in TECs results in thymic dysplasia and loss of thymic T cell development. 3) The results from the analysis of beta-catenin LOF indicate that beta-catenin in TECs is not essential for the generation of functional TECs that support T cell development but the loss of beta-catenin in TECs results in the reduction in the number of cTECs, which leads to the reduction in the number of thymocytes during the postnatal period. 4) The results demonstrated that GOF of beta-catenin in TECs results in trans-differentiation of TECs into terminally differentiated keratinocytes.

      Thank you so much for highlighting the strengths of this manuscript.

      Weakness: The fact that beta5T expression is restricted primarily to cTECs suggests that the models used may not accurately capture the impacts of gain of function and loss of function of beta-catenin to mTECs and the maintenance of the medulla in postnatal mice. Given that beta5T-expressing cells have been shown to give rise to both cTECs and mTECs during fetal development the models may more closely demonstrate the importance of fine-tuning beta-catenin expression during fetal development while missing impacts on postnatal mTECs.

      Regarding the specificity of B5t expression, it is true that B5t is abundant in cTECs but not detectable in other cells including mTECs. However, all mTECs, including mTECs in postnatal and aged mice, are derived from bipotent B5t-expressing TEC progenitors, as has been reported previously (Ohigashi et al., Cell Reports, 2015). Therefore, B5t-iCre enables efficient and conditional genetic manipulation of both cTECs and mTECs across a wide range of ontogeny, including postnatal and aged mice. A conceptually similar gene targeting strategy has been widely employed to study the two major T cell lineages, CD4 T cells and CD8 T cells, as CD4-Cre is useful to delete floxed sequences in both CD4 T cells and CD8 T cells. In agreement with a recent study using the same β5t-iCre (Barthlott et al. Nature Communications, 2021), our study demonstrated the highly efficient deletion of the Ctnnb1 floxed sequence in both cTECs and mTECs in postnatal mice. The revised manuscript includes this explanation in the Discussion. Thank you very much for helping us improve the manuscript.

      The authors achieved their aims and the results strongly support their conclusions. The work clearly demonstrates the importance of proper regulation of Wnt/beta-catenin signaling in the development and maintenance of TEC microenvironments and should lead to more interest in defining the specific Wnts and Frizzleds that are important in the development and postnatal maintenance of specific TEC subsets. This work will be important in identifying clinical strategies to counteract thymic involution and the subsequent loss of T cell function.

      Thank you so much for the supportive comments.

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors demonstrate that selective inactivation of carnitine acetyltransferase (Crat) – a key metabolic enzyme – in AgRP neurons attenuates the response of AgRP neurons to peanut butter (PB) chips, the release of dopamine in the nucleus accumbens, and the motivation to work for food when mice are fasted. The strength of this study is the demonstration that metabolic sensing by AgRP neurons is somehow linked to dopamine release in the nucleus accumbens, but a weakness is that it is unclear how the lack of Crat in AgRP neurons affects their responsiveness to PB chips or how AgRP neurons regulate dopamine release. The authors' use of contemporary methods to monitor the kinetics of AgRP neuron activity (fiber photometry) and dopamine release (GRAB-DA) in response to feeding fed or fasted mice with PB chips is commendable. The authors acknowledge that the neural circuits linking changes in AgRP neuron activity to release of dopamine are indirect because AgRP neurons do not directly synapse onto dopamine neurons; thus, their findings provide intriguing correlations without a clear understanding of the circuit(s) involved. The authors clearly demonstrate that the Crat knockout (KO) mice do not respond to PB chips the same way as WT mice; the KO mice respond to the first chip normally but responses to additional chips are blunted. The mechanisms underlying the blunted response are assumed to be due to a failure of metabolic sensing, but the mechanisms involved are not explored.

      1.1 We would like to thank the reviewer for reviewing our manuscript. Our studies use AgRP crat deletion as a model of impaired metabolic sensing to examine how metabolic sensing in AgRP neurons controls dopamine signaling, as this question has not been previously addressed. Our previous proteomic analysis of AgRP neurons in WT and KO mice highlighted numerous differences in metabolic pathways, mitochondrial function and synaptic control mechanisms (1). Although defining the mechanisms responsible for impaired metabolic sensing will be important for future studies, this does not affect the conclusions in this current manuscript – that metabolic sensing in AgRP neurons is required for normal dopamine signaling in the nucleus accumbens and dorsal striatum in response to caloric foods. Similar studies have employed genetic approaches to modify insulin signaling in AgRP neurons by deleting phosphatases regulating insulin signaling, not to define the mechanism of phosphatase function, but to assess the impact of altered insulin signaling in AgRP neurons (2).

      Reviewer #2 (Public Review):

      In their manuscript, Reichenbach et al perform experiments to demonstrate that knocking the metabolic enzyme carnitine acetyltransferase (Crat) out of hunger-promoting AgRP neurons impairs an animal's ability to accurately sense its nutritional state and thus decreases motivation to work for food and attenuates neural response to palatable food rewards. They accomplish this using in vitro and in vivo neural recording techniques, and thoughtful behavioral approaches. Specifically they show 1) impaired responses of AgRP neurons to glucose (in vitro) and sensory cues predicting food (in vivo), 2) impaired striatal dopamine release in response to palatable food presentation, and 3) decreased motivation to work for palatable rewards.

      Their work largely substantiates their conclusions. Their experiments are well described, and the phenotypes observed are generally clear. The data partially explain prior behavioral studies performed on these conditional knockout mice. Moreover, their data are consistent with and a valuable addition to previously published data showing how dorsal and ventral striatum differentially respond to nutrient intake with ventral striatum dopamine release increasing in response to sweet taste and dorsal striatum dopamine increasing in response to rewarding post-ingestive effects of nutrients. This study will be of interest to a fairly broad community of feeding, hypothalamus, and dopamine researchers.

      A limitation of this study is that it does not adequately address the possibility that decreased AgRP neuron responses to food presentation may be related to altered in vivo baseline activity or attenuated fasting-induced hyperactivity of these neurons.

      2.1 In these current studies, we have shown under fasted conditions that WT AgRP and KO AgRP neurons have similar electrophysiological properties ex vivo (Fig 1). For example, resting membrane potential, input resistance and spontaneous firing frequency are not different between WT and KO AgRP neurons. We also see that ghrelin-induced food intake and AgRP activity, as measured by population calcium activity, do not differ between genotypes, further supporting the idea that AgRP neurons respond normally to pharmacologic challenges. Finally, we have now added new data showing impaired in vivo AgRP activity in response to ip glucose (Figure 1-figure supplement 1J-M), which is consistent with the altered glucose responsiveness seen in ex vivo slice recordings (Fig 1). Based on this in vivo and ex vivo evidence, we assume that baseline activity is not different in vivo but rather that responses to glucose are impaired.

      While their slice studies show mostly normal ex vivo electrophysiologic properties of these neurons, in other models ex vivo and in vivo measurements of AgRP neuron activity are not directly correlated. Specifically Kristen O'Connell's group has shown increased baseline AgRP neuron activity in diet-induced obese (DIO) mice that is not further increased by fasting in slice (Baver et al, 2014).

      2.2 We appreciate the reviewer's comment and knowledge of the literature but respectfully point out that our studies have been conducted on mice fed a chow diet.

      By contrast, Michael Krashes's group recently showed that DIO mice have reduced baseline AgRP neuron activity using a fiber photometry approach in vivo (Mazzone et al, 2020). Of note, decreased baseline or fasting-induced AgRP neuron activity would not necessarily diminish the impact of the rest of the results presented. Moreover, it is not necessarily a question that must be answered by this study, but it should be acknowledged as a possibility that is important to test.

      2.3 We thank the reviewer for these comments, but we would also like to point out that the paper by Mazzone et al. (2020) (7) does not show a difference in baseline AgRP neural activity with photometry in HFD- vs chow-fed animals; rather, it shows that HFD feeding affects the change in activity to food presentation (either chow or HFD). A similar effect was recently reported by Beutler and colleagues (8).

      This is also a good opportunity to emphasise that Mazzone and colleagues use the same approach to quantify changes in population calcium activity as in our study. In their study, they used df/f (%), in which the df/f after food presentation is compared to a baseline df/f period, hence the designation df/f (%). This normalised approach is also regularly used by Knight and colleagues, the pioneers in the use of GCaMP to measure population activity of AgRP neurons (8-10).

      This is the same approach as we have used, except that we call this a z-score, which is the statistical convention for this normalisation approach. Moreover, this normalisation approach is commonly used for statistical analysis with fibre photometry and miniscope approaches to measure calcium dynamics as an index of neural activity (4,5,7-9,11-19). Normalisation is essential since df/f depends on numerous factors including 1) GCaMP expression, 2) illumination wavelength, 3) light intensity, 4) cell type, 5) quality of surgery (ie gliosis), and 6) position of the fibre optic implant. Because normalisation is required for in vivo calcium imaging, all studies are likely subject to a similar experimental limitation regarding potential differences in baseline cell firing frequency. For the reasons highlighted above, calcium imaging is not typically used to estimate baseline differences in activity; rather, it is most useful for examining neural responses to different stimuli, with the magnitude of change in neural activity to a given stimulus encoding meaningful information. This is why df/f (%) or z-score normalisation is important and standard across most studies (4,5,7-9,11-19).
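
      The normalisation argument can be sketched in a few lines. This is a minimal illustration, assuming the z-score is computed against the mean and standard deviation of a pre-stimulus baseline window; the traces, window length, gain, offset, and function name are hypothetical and do not reproduce the analysis pipeline used in the study.

```python
from statistics import mean, stdev


def baseline_zscore(trace, n_baseline):
    """Z-score a photometry trace against its own pre-stimulus baseline window."""
    baseline = trace[:n_baseline]
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation of the baseline
    if sigma == 0:
        raise ValueError("baseline has no variance; z-score is undefined")
    return [(x - mu) / sigma for x in trace]


# Hypothetical signal: 4 baseline samples, then 2 samples after food presentation
mouse_a = [1.0, 1.2, 0.8, 1.0, 0.4, 0.2]
# Same underlying response recorded with a different gain and offset
# (standing in for differences in expression, light intensity, or fibre placement)
mouse_b = [3.0 * x + 10.0 for x in mouse_a]

z_a = baseline_zscore(mouse_a, n_baseline=4)
z_b = baseline_zscore(mouse_b, n_baseline=4)
print(z_a[-1], z_b[-1])
```

      The two recordings yield the same z-scored response despite very different raw values, which is the purpose of the normalisation. By construction the baseline window is centred on zero in both, so any true difference in baseline activity is removed, which is the limitation discussed above.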

      Additional minor concerns do not significantly dampen my generally positive opinion of the study. These include: 1) the lack of feeding data associated with AgRP neuron fiber photometry responses,

      2.4 We designed these experiments so that mice were given single PB pellets, weighing approximately 70 mg, and each pellet was fully consumed during exposure; therefore, all mice ate approximately the same amount. We have now described this in the methods.

      “Peanut butter chips were measured to ~70mg per pellet and one pellet was given per trial. Mice consumed all of this peanut butter chip during each trial such that no differences in consumption were observed.”

      We have also added additional data. During each trial, the time to peanut butter consumption was not different between genotypes (new data - Figure 1-figure supplement 1G,H,I).

      2) analysis of operant GRAB-DA data by pellet retrieval event rather than by mouse, and

      2.5 We also recognised this as a weakness and have now repeated these experiments using multiple additional animals. All operant GRAB-DA data in the accumbens and dorsal striatum are now presented as the averaged nose poke or pellet responses recorded for each mouse. Data points in Fig 4 and Fig 6 now reflect analysis by mouse.

      We have also updated this in the methods section “For DA photometry during PR session, data are presented as the averaged dopamine response for each animal. On average mice collected ~3.5 pellets during the PR in ad libitum-fed conditions and ~6 pellets when fasted (Fig 4O).”
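
      The difference between analysis by event and analysis by mouse can be shown with a minimal sketch. The mouse identifiers, response values, and function name are hypothetical and serve only to illustrate the averaging step.

```python
from collections import defaultdict
from statistics import mean


def average_by_mouse(events):
    """Collapse (mouse_id, response) pairs into one mean response per mouse."""
    grouped = defaultdict(list)
    for mouse_id, response in events:
        grouped[mouse_id].append(response)
    return {mouse_id: mean(values) for mouse_id, values in grouped.items()}


# Hypothetical dopamine responses (z-score) at each pellet retrieval.
# Mouse m2 retrieved more pellets, so it dominates a pooled, per-event analysis.
events = [
    ("m1", 2.0), ("m1", 4.0),
    ("m2", 1.0), ("m2", 1.0), ("m2", 1.0), ("m2", 1.0),
]

per_mouse = average_by_mouse(events)
mean_by_mouse = mean(per_mouse.values())           # each mouse counts once
mean_by_event = mean(r for _, r in events)         # each retrieval counts once
print(per_mouse, mean_by_mouse, mean_by_event)
```

      Averaging within each animal first gives every mouse equal weight and makes the animal, not the retrieval event, the unit of replication.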

      3) incompletely described inclusion criteria for mice in photometry studies.

      2.6 – For AgRP photometry studies, only mice with a maximal z-score response to IP ghrelin greater than 4 were included for analysis, ensuring differences in AgRP neural activity to food cues were not related to differences in GCaMP expression or illumination rates.

      We have added the following text to the manuscript. “For AgRP GCaMP6 photometry studies, an increase in activity in response to ghrelin was used as an index of correct viral expression and fibre optic placement. Only mice with a maximal peak z-score of >4 were included for analysis in the experimental group. Using this criterion, 5/20 mice, across both WT and KO mice, were excluded from experimentation (Figure 1-figure supplement 1N-O)”.
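
      As a minimal sketch, the inclusion criterion amounts to a simple threshold filter on the peak ghrelin response. The mouse labels and peak values below are hypothetical; only the threshold of 4 comes from the text above.

```python
def included_mice(peak_z_by_mouse, threshold=4.0):
    """Return the mice whose peak z-score to ghrelin exceeds the threshold."""
    return sorted(
        mouse for mouse, peak_z in peak_z_by_mouse.items() if peak_z > threshold
    )


# Hypothetical peak z-scores after IP ghrelin
peaks = {"WT-1": 6.2, "WT-2": 3.1, "KO-1": 5.4, "KO-2": 4.0}

kept = included_mice(peaks)
print(kept)
```

      Because the criterion is a strict inequality, a mouse with a peak of exactly 4 is excluded in this sketch, and the same threshold is applied to both genotypes.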

      In addition, increases in dopamine release before contact with PB in the NAc or dorsal striatum (Fig 2L-P; Fig 5L,M) and before pellet retrieval (Fig 4K-N; Fig 6 J, K) are similar between genotypes, suggesting equivalent capacity to increase dopamine release under stimuli not affected by AgRP input. Thus, genotype differences in response to palatable food or sucrose pellets could not be due to differences in GRAB-DA expression in the NAc. Moreover, a postmortem analysis was conducted to identify the localization of GFP expression (Figure 7).

      We have added the following text. “GRAB-DA responses in WT and KO mice were similar on approach to PB and prior to pellet retrieval in both the NAc and dorsal striatum showing that genotype differences in response to palatable food or sucrose pellets could not be due to differences in GRAB-DA expression in the NAc. Moreover, a post mortem analysis was conducted to identify the localization of GFP expression (Figure 7)”.

      References

      1. Reichenbach A, Stark R, Mequinion M, Denis RRG, Goularte JF, Clarke RE, Lockie SH, Lemus MB, Kowalski GM, Bruce CR, Huang C, Schittenhelm RB, Mynatt RL, Oldfield BJ, Watt MJ, Luquet S, Andrews ZB. AgRP Neurons Require Carnitine Acetyltransferase to Regulate Metabolic Flexibility and Peripheral Nutrient Partitioning. Cell reports. 2018;22(7):1745-1759.
      2. Dodd GT, Kim SJ, Mequinion M, Xirouchaki CE, Bruning JC, Andrews ZB, Tiganis T. Insulin signaling in AgRP neurons regulates meal size to limit glucose excursions and insulin resistance. Science advances. 2021;7(9).
      3. Goldstein N, McKnight AD, Carty JRE, Arnold M, Betley JN, Alhadeff AL. Hypothalamic detection of macronutrients via multiple gut-brain pathways. Cell Metabolism. 2021;33:1-12.
      4. Garfield AS, Shah BP, Burgess CR, Li MM, Li C, Steger JS, Madara JC, Campbell JN, Kroeger D, Scammell TE, Tannous BA, Myers MG, Andermann ML, Krashes MJ, Lowell BB. Dynamic GABAergic afferent modulation of AgRP neurons. Nature Neuroscience. 2016;19(12):1628-1635.
      5. Berrios J, Li C, Madara JC, Garfield AS, Steger JS, Krashes MJ, Lowell BB. Food cue regulation of AGRP hunger neurons guides learning. Nature. 2021;595(7869):695-700.
      6. Cavalcanti-de-Albuquerque JP, de-Souza-Ferreira E, de Carvalho DP, Galina A. Coupling of GABA Metabolism to Mitochondrial Glucose Phosphorylation. Neurochem Res. 2021.
      7. Mazzone CM, Liang-Guallpa J, Li C, Wolcott NS, Boone MH, Southern M, Kobzar NP, Salgado ID, Reddy DM, Sun FM, Zhang YJ, Li YL, Cui GH, Krashes MJ. High-fat food biases hypothalamic and mesolimbic expression of consummatory drives. Nature Neuroscience. 2020;23(10):1253-+.
      8. Beutler LR, Corpuz TV, Ahn JS, Kosar S, Song WM, Chen YM, Knight ZA. Obesity causes selective and long-lasting desensitization of AgRP neurons to dietary fat. eLife. 2020;9.
      9. Beutler LR, Chen YM, Ahn JS, Lin YC, Essner RA, Knight ZA. Dynamics of Gut-Brain Communication Underlying Hunger. Neuron. 2017;96(2):461-+.
      10. Chen Y, Lin YC, Kuo TW, Knight ZA. Sensory detection of food rapidly modulates arcuate feeding circuits. Cell. 2015;160(5):829-841.
      11. Betley JN, Xu S, Cao ZF, Gong R, Magnus CJ, Yu Y, Sternson SM. Neurons for hunger and thirst transmit a negative-valence teaching signal. Nature. 2015;521(7551):180-185.
      12. Chen JY, Campos CA, Jarvie BC, Palmiter RD. Parabrachial CGRP Neurons Establish and Sustain Aversive Taste Memories. Neuron. 2018;100(4):891-899 e895.
      13. Daviu N, Fuzesi T, Rosenegger DG, Rasiah NP, Sterley TL, Peringod G, Bains JS. Paraventricular nucleus CRH neurons encode stress controllability and regulate defensive behavior selection. Nat Neurosci. 2020;23(3):398-410.
      14. Jennings JH, Ung RL, Resendez SL, Stamatakis AM, Taylor JG, Huang J, Veleta K, Kantak PA, Aita M, Shilling-Scrivo K, Ramakrishnan C, Deisseroth K, Otte S, Stuber GD. Visualizing hypothalamic network dynamics for appetitive and consummatory behaviors. Cell. 2015;160(3):516-527.
      15. Lerner TN, Shilyansky C, Davidson TJ, Evans KE, Beier KT, Zalocusky KA, Crow AK, Malenka RC, Luo L, Tomer R, Deisseroth K. Intact-Brain Analyses Reveal Distinct Information Carried by SNc Dopamine Subcircuits. Cell. 2015;162(3):635-647.
      16. Livneh Y, Ramesh RN, Burgess CR, Levandowski KM, Madara JC, Fenselau H, Goldey GJ, Diaz VE, Jikomes N, Resch JM, Lowell BB, Andermann ML. Homeostatic circuits selectively gate food cue responses in insular cortex. Nature. 2017;546(7660):611-+.
      17. Miletta MC, Iyilikci O, Shanabrough M, Sestan-Pesa M, Cammisa A, Zeiss CJ, Dietrich MO, Horvath TL. AgRP neurons control compulsive exercise and survival in an activity-based anorexia model. Nat Metab. 2020;2(11):1204-1211.
      18. Muir J, Lorsch ZS, Ramakrishnan C, Deisseroth K, Nestler EJ, Calipari ES, Bagot RC. In Vivo Fiber Photometry Reveals Signature of Future Stress Susceptibility in Nucleus Accumbens. Neuropsychopharmacology. 2018;43(2):255-263.
      19. Steinberg EE, Gore F, Heifets BD, Taylor MD, Norville ZC, Beier KT, Foldy C, Lerner TN, Luo L, Deisseroth K, Malenka RC. Amygdala-Midbrain Connections Modulate Appetitive and Aversive Learning. Neuron. 2020;106(6):1026-1043 e1029.
    1. Author response

      Reviewer #2 (Public Review):

      This work addresses the developmental origins of functionally distinct neuronal populations in the arcuate nucleus of the hypothalamus (ARH). During gestation, immature Pomc-expressing neurons differentiate into at least 3 subpopulations of mature POMC neurons, as well as non-POMC neuronal sub-types (eg., AgRP and KNDy neurons). The authors set out to address the issue of whether these diverse populations arise from a common progenitor or from multiple, molecularly distinct progenitor populations.

      They performed single cell RNA-seq on Pomc-expressing neurons (FACS-purified on the basis of expression of a Pomc-driven reporter transgene) across embryonic and early postnatal stages (E11.5 to P12). They also compared these transcriptional profiles to translational profiles of Pomc-expressing neurons at P5 and P12 generated with the TRAP-Seq approach. Clustering and developmental trajectory analyses confirm reports by other groups that immature Pomc-expressing neurons give rise to non-POMC cell fates (including AgRP/NPY and KNDy neurons) and that terminal differentiation of POMC neurons is achieved after P12. While the data generated here will be a useful resource for the field, there are weaknesses in the analyses used to support the central claim that POMC neurons arise from heterogenous progenitor populations.

      Strengths

      • This is an interesting topic that would be of interest to scientists studying neural circuits regulating energy balance.

      • The expression databases provide a valuable resource for the community. They can be mined to identify genes that can be used as markers and as the foundation for functional studies. They can also inform efforts to generate and stage specific ARH cell types using induced pluripotent stem cell technology.

      Weaknesses

      • My main concerns stem from the fact that all Pomc-expressing neurons in the developing ARH are considered as a single category of "progenitors" in these analyses. While they meet this strict definition because they do not express markers of terminally differentiated POMC neurons, the failure to distinguish between early and late progenitors limits the conclusions that can be drawn.

      We agree with the reviewer’s concern and in this revised version of the manuscript we have avoided the use of the terms “progenitors” or “precursors” while focusing on the embryonic and postnatal ages of the collected hypothalamic neurons.

      o The earliest stage analyzed here is E11.5, which represents the peak of POMC neuronal differentiation. To capture the precursors of these neurons (1.Pomchigh/Prdm12 at E11.5), it is necessary to perform transcriptomic analyses at earlier stages.

      This is answered in the Essential Revision 1.

      o EBFs have been shown to regulate neuronal differentiation and migration out of the ventricular layer into the mantle layer. It is critical to determine whether EBF-expressing neurons in the ARH similarly represent an "early" progenitor stage that follows cell cycle exit and migration out of the ventricular zone and precedes the expression of transcription factors that specify a particular cell fate. If so, EBF-expressing neurons in 2.Pomc-med-Ebf1 could represent progenitors of 1.Pomc-hi-Prdm12 neurons. In support of this idea, the transcriptome of 2.Pomc-med-Ebf1 subcluster 1 neurons maps onto the two major subpopulations of POMC neurons (Supplemental Figure 15).

      This matter has also been answered in the Essential Revision 1.

      • Because key terminal markers of POMC neurons are not expressed at P12 (i.e. Ttr, Anxa2), it is hard to precisely map progenitor populations onto neuronal subpopulations in the adult.

      We agree with the reviewer’s concern and have decided to remove this level of comparison from the revised manuscript. Therefore, the paragraphs from lines 453 to 497 in the Results section of the original manuscript are no longer present. Similarly, we have deleted from the Discussion the paragraphs referring to the comparison of the two transcriptome data sets.

      • While the data support the idea that there are several molecularly distinct subpopulations of POMC progenitors, these analyses do not provide clear answers to the following key questions: 1) Do AgRP/NPY and KNDy neurons arise from molecularly distinct populations of Pomc-expressing progenitors? 2) At what point in the developmental trajectory are molecularly distinct subpopulations of POMC neurons specified?

      One of the most intriguing findings of this study is that most Pomc-expressing neuronal clusters are already present in E11.5 embryos, showing their distinctive feature genes and characteristic transcript levels from this early time point. For example, cells from cluster 4.Pomclow/Otp, which give rise to Npy/Agrp neurons, already express the feature gene Otp at E11.5. At this time point, cells from cluster 5.Pomclow/Tac2, which give rise to the KNDy neurons, express feature genes similar to those of cluster 1, except that the level of Pomc transcripts in cluster 5 is much lower than in cluster 1. In addition, cells from cluster 5 never express Otp. Thus, clusters 4 and 5 do not share their repertoire of feature transcripts at E11.5, suggesting that they arise from molecularly distinct populations of Pomc-expressing neurons.

    1. Author Response:

      Reviewer #3 (Public Review):

      In their manuscript entitled "Evolution of binding preferences among whole-genome duplicated transcription factors" the authors investigate the evolution of transcription factors following genome duplication including the implication for the evolution of their binding sites.

      The hypothesis tested in this study is very much of interest to a large readership, furthermore the authors have carefully selected approaches to allow them to assess the different aspects of the implications of gene duplication. This includes reciprocal knockouts of paralogues, DNA binding domains swapping.

      One of my major criticisms concerns the writing and the figure preparation. The writing is extremely dense, making it difficult to understand the arguments and results. Similarly, the figures have many panels at the cost of readability and ease of comprehension. Some of the supplementary figures are better suited for the main manuscript, for instance.

      We improved the figures and presentation. In particular, we subdivided three figures (1, 2 and 4) and improved the labeling and captions of each figure.

      The methods need attention; for instance, the parameters used to identify cut sites and avoid noise are not provided or explained (NGS data processing).

      We expanded the method section to provide more details on our pre-processing procedures and other aspects of the analysis.

      "Promoter binding quantification" The authors summed the normalized genome coverage over the promoter region of each gene, by doing so the authors remove potential inherent variability between replicates, the authors need to keep replicate separate and thereby gain confidence in their results.

      For all correlations, we now show the mean and standard deviation of the correlations between individual repeats in the main figures. Moreover, we added the individual repeats of pbs z-scores and motif signals in Figures 5 and 6 (new Figures 8 and 9). Of note, the supplementary figures already showed the individual repeats for the critical analyses (i.e. correlation matrices).
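
      As an illustration of reporting a mean and standard deviation over individual repeats, the sketch below computes the Pearson correlation for every pair of repeats and summarizes them. It assumes each repeat is a vector of per-promoter binding values; the function names are hypothetical and this is not the authors' pipeline.

```python
from itertools import combinations
import numpy as np

def repeat_correlations(repeats):
    """Pearson r for every pair of repeats (rows = repeats, columns = promoters)."""
    repeats = np.asarray(repeats, dtype=float)
    return np.array([
        np.corrcoef(repeats[i], repeats[j])[0, 1]
        for i, j in combinations(range(len(repeats)), 2)
    ])

def summarize(repeats):
    """Mean and sample standard deviation of the pairwise correlations."""
    r = repeat_correlations(repeats)
    sd = float(r.std(ddof=1)) if len(r) > 1 else 0.0
    return float(r.mean()), sd
```

      Keeping the pairwise values, rather than correlating summed profiles only, preserves the replicate-to-replicate variability that the reviewer asked to see.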

      "Relative, gene-specific binding changes upon paralog deletion or DBD swapping" the authors select signal from the 100 strongest bound promoters, why choosing those top 100 and not all promoters?

      This is because we do not expect TFs to bind all promoters but only tens to hundreds of them. In the majority of promoters, the binding signal is weak, likely reflecting noise, and changes are irrelevant. Of note, and for better consistency, we now define the targets as we did in the other figures (i.e. promoters with a z-score above 3.5 or the 50 top promoters), without any significant changes in the results. Please note that for this analysis we also consider the top 40 targets of the paralog pair, even if they are not all among a TF's targets. Finally, most of our conclusions are based on correlations, which do account for all promoters.
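
      The two target definitions mentioned above (a z-score cutoff of 3.5, or the 50 strongest promoters) can be sketched as below. The response does not state how the two rules are combined, so the sketch exposes them as separate options; it also assumes the z-score is taken across all promoters of one sample. The function name and the `rule` argument are hypothetical.

```python
import numpy as np

def select_targets(promoter_signal, z_cut=3.5, top_n=50, rule="zscore"):
    """Return sorted indices of target promoters under one of two rules.

    rule="zscore": promoters whose z-score across all promoters exceeds z_cut.
    rule="top":    the top_n promoters with the strongest signal.
    """
    signal = np.asarray(promoter_signal, dtype=float)
    if rule == "zscore":
        z = (signal - signal.mean()) / signal.std()
        return np.flatnonzero(z > z_cut)
    if rule == "top":
        strongest_first = np.argsort(signal)[::-1]
        return np.sort(strongest_first[:top_n])
    raise ValueError("rule must be 'zscore' or 'top'")
```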

      Need more info on 2A the authors need to indicate conservative and radical replacement. It is hard to see the impact of the changes highlighted

      We updated the figure to better distinguish between AA changes that do not affect functional properties (charge, hydrophobicity), and those that do.

      "Consistent with our sequence analysis, DBD swapping perturbed binding for three of the four zinc-cluster TFs tested, although in none of these cases was DBD swapping sufficient for swapping promoter preferences (Fig. 2D,E). The fourth TF (Yrr1) remained largely invariant (Pearson's r>0.8) to DBD swapping, as did the eleven additionally tested TFs, taken from six different families. Of note, this invariance to DBD swapping characterizing most TFs was observed not only when comparing promoter preferences, but also when comparing in-vivo preferences to DNA 7-mers (Fig. 2D and fig. S2). We conclude that, for most duplicate pairs, the variations driving divergence". This section is very difficult to read

      We updated and simplified this section: “Consistent with their strong DBD sequence divergence, DBD swapping perturbed promoter binding for three of the four zinc cluster TFs tested. However, in none of these was DBD swapping sufficient for switching promoter preferences to those of the paralog from which the DBD was taken (Figure 4B,C). Further, in all other twelve cases studied, binding preferences remained largely invariant to the swapping of the DBD (Pearson's r>0.8). Of note, this invariance to DBD swapping was also observed when comparing in-vivo 7-mer DNA sequence preferences (Figure 4B and Figure 4-figure supplement 1). We conclude that, for most paralog pairs, the variations driving divergence in promoter binding preferences are located outside the DBDs.”

      Page 6 "The fourth TF (Yrr1)", this transcription factor is not shown on the figures

      We do not refer to this TF explicitly in the text now, and to simplify the figure we only annotate the exemplary DBD-swaps from Figure 2D (new Figure 4B).

      Page 7: "Two TFs completely lost binding signals (Pip2, Hms2)," how much is this impacted by the number of binding sites

      We believe this result is independent of the number of binding sites, since we measure correlation between all promoters. In general, we do not detect a strong correlation (r=0.13) between the number of targets, i.e. promoters with a z-score>=3.5, and the effect of paralog deletion.

      Page 7: "Cooperative interactions were generally minor (e.g. Stp2), as were compensatory interactions (e.g. Pdr3 or Ecm22; Fig. 3C,D). Therefore, strong interactions between TFs paralogs are rare and existing ones tend to increase mutation fragility." Figure 3D highlights gene specific variation upon domain swapping, yet the authors do not explore this

      We agree. This is beyond the scope of the present MS and will be studied in future projects.

      Page 11: "Second, mutations within Rph1's DBD prevented its binding to Gis1-specialized sites, thereby reducing paralog interference (Fig. 6E)." Figure 6D suggest Rph1 to bind upon Gis1 ko, the difference in binding affinity upon swapping is not massive, more variants should also intervene

      Please note that we base our statement not on a comparison of the DBD swap against the paralog-deletion strain, but rather on a comparison of the DBD swap against the wild-type background. We believe this is the right comparison since the DBD swap was done within the wild-type background. In this comparison, the effect of the DBD swap is comparable to the effect of the paralog deletion. We now emphasize the actual backgrounds in the figure caption.

    1. Author Response:

      Reviewer #1 (Public Review):

      This study fused images from CMR and T1 mapping to reconstruct 3D anatomical models of the heart for HCM patients. Using the model, they investigated potential contributions of diffusive fibrosis to arrhythmogenesis of the heart model in response to focal stimulus. They found that the diffusive fibrosis contributed to increased incidence of ventricular arrhythmias.

      The study is of some interest. However, there are some concerns regarding its publication in its present form.

      1) Details are unclear about how the image segmentation and alignment were conducted. In particular, when the CMR and T1-mapping data were fused, it is unclear how the slice images were aligned, as mismatch is a challenge and can affect the simulation results and conclusions.

      The short-axis LGE-CMR and post-contrast T1 acquisitions for each patient were performed consecutively during a single scan as described by Chu et al., 2017. In our study, we used the z-axis coordinate of the post-contrast T1 map (a short-axis, single slice) to select the corresponding short-axis LGE-MRI slice with the same position and orientation of the ventricle. Each set of a post-contrast T1 map and a corresponding short-axis LGE-MRI slice was visually inspected for differences in anatomy, cardiac phase, and distribution of enhancement, and only images found to be in agreement by the radiologists were used in this study. Information along these lines is now provided in the manuscript (Geometrical Reconstruction).

      The short-axis LGE-CMR stack was used to reconstruct the LV geometry (3D volume). The LV myocardium was segmented in the CardioViz3D software using a validated semi-automatic landmark-based method used in previous studies by our team. This has been clarified with additional detail and citations (Arevalo et al., 2016; Shade et al., 2020b; Cartoski et al., 2019) in the revised submission (Geometrical Reconstruction).

      The LV myocardium of the single post-contrast T1 map was segmented using the same method as above. The signal intensity profile of the myocardium from the corresponding LGE-CMR slice was normalized to the intensity profile (relaxation times) of the T1 map myocardium. Using thresholds of 350 and 450 ms, as described in the manuscript, we calculated the resulting standard deviations from the mean of the low signal intensity region of the LGE-CMR slice. These new, personalized standard deviation thresholds were then applied to each LGE-CMR slice of the short-axis stack to produce the regions of focal scar and diffuse fibrosis in the virtual heart model. Clarification has been added (Geometrical Reconstruction) in the revised submission.

      2) It is unclear what is the spatial resolution of the CMR, and how the spatial resolution of about 330 micrometre was achieved for the finite element model.

      The LGE-CMR resolution is 2x2x8 mm and the post-contrast T1 map resolution is 1.5x1.5x8 mm. In the revised manuscript, we have added the spatial resolution of the CMR images (Imaging Data).

      As to the reviewer’s second question, the equations of action potential propagation in the heart (a partial differential equation describing current flow in electrically interconnected cells, coupled to a set of ordinary differential and algebraic equations describing transmembrane currents) are solved on a left ventricular finite element mesh constructed from the segmented images. The finite element tetrahedral mesh needs to have a resolution of 300-400 µm (average resolution of 355 µm in our study) to achieve a stable, converging solution of the equations; numerous studies have established and validated this spatial resolution value.
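
      One way to check a mesh against such a resolution target is to measure its edge lengths directly. The sketch below is illustrative only and assumes a mesh given as an array of node coordinates and an array of tetrahedra (four node indices each); it is not tied to any particular meshing software, and the function name is hypothetical.

```python
from itertools import combinations
import numpy as np

def edge_lengths(nodes, tets):
    """Lengths of all unique edges in a tetrahedral mesh.

    nodes: (n_nodes, 3) coordinates; tets: (n_tets, 4) node indices.
    Lengths are returned in the same unit as the node coordinates.
    """
    nodes = np.asarray(nodes, dtype=float)
    tets = np.asarray(tets, dtype=int)
    edges = set()
    for tet in tets:
        # Each tetrahedron contributes six edges; the set removes shared ones.
        for a, b in combinations(sorted(tet), 2):
            edges.add((int(a), int(b)))
    edges = np.array(sorted(edges))
    return np.linalg.norm(nodes[edges[:, 0]] - nodes[edges[:, 1]], axis=1)
```

      With coordinates in micrometres, `edge_lengths(nodes, tets).mean()` gives the average resolution, and a histogram of the returned values shows how tightly the edge lengths cluster around the target.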

      To construct the finite element tetrahedral mesh from the segmented images at the needed spatial resolution, we used the Mimics Innovation Suite from Materialise. The software uses an input target finite element edge length and generates a computational mesh with a tight edge length distribution around the input value. We have provided information regarding how the spatial resolution of the finite element mesh was achieved (including references supporting the requirement for mesh resolution); (Geometrical Reconstruction).

      3) It is unclear how the incorporation of fibre structures was done and validated. Fibre structures differ between subjects, owing to different stages of HCM and individual differences. Without consideration of this, conclusions based on the diffusive fibrosis are non-conclusive.

      We do not assume that the fiber orientations are the same in each patient’s heart. Only the very general rules that fiber tracts follow in the left ventricle are the same among subjects, but the fiber orientations in each left ventricle are specific to the geometry of that ventricle.

      Fiber orientations were assigned in each model on the basis of the individual geometry of the ventricles in the following manner: Fiber orientations were assigned to each individual ventricular computational mesh on a per-element basis using an efficient rule-based approach that we developed and extensively validated (see reference Bayer et al., 2012 in this manuscript); the approach is now a staple in our field and is used widely in patient-specific ventricular simulation studies. The fiber orientation methodology uses the Laplace–Dirichlet method to define transmural and apicobasal directions at every point in the patient-specific ventricular mesh. It then employs bi-directional spherical linear interpolation to assign fiber orientations based on a general set of fiber orientation properties (rules) derived from a large amount of histological and diffusion tensor MRI data. We have provided more detail in the revised manuscript (Geometrical Reconstruction).

      4) It is also unclear how the physiological model for the HCM was developed and validated for the patient-specific model.

      The HCM-specific cell model used to represent regions of diffuse fibrosis in this study is a modification of the ten Tusscher human ventricular model (see citation in manuscript). Modifications were made to the ion channel kinetics in this model based on experimental data from human HCM cardiomyocytes reported by Coppini et al. 2013 (cited in the manuscript). In that study, measurements were made via whole-cell voltage and current clamp in cardiomyocytes collected during myectomy from patients with HCM from regions shown to contain substantial amounts of diffuse fibrosis. Specific changes included 107% increase of INaL maximal conductance, 19% increase of ICaL maximal conductance, 34% decrease of IKr maximal conductance, 27% decrease of IKs maximal conductance, 85% decrease of Ito maximal conductance, 15% decrease of IK1 maximal conductance, 34% increase of sodium-calcium exchanger (NCX) activity, and 43% reduction of Sarcoplasmic/Endoplasmic Reticulum Calcium ATPase (SERCA) activity. The net results of the changes to the cell model include increased action potential duration at 90% repolarization from 280 to 330 ms (+18%) and diminution of the notch after depolarization. We have provided more detail and citations in the revised manuscript (Electrophysiological Properties).
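
      The percentage changes listed above translate into multiplicative scaling factors on the baseline parameters. The sketch below encodes them as such and checks the reported change in action potential duration. The dictionary keys and function names are hypothetical labels, and the baseline values passed in are placeholders rather than the parameters of the ten Tusscher model.

```python
# Percentage changes as quoted in the response (positive = increase).
HCM_PERCENT_CHANGE = {
    "INaL": +107,
    "ICaL": +19,
    "IKr": -34,
    "IKs": -27,
    "Ito": -85,
    "IK1": -15,
    "NCX": +34,
    "SERCA": -43,
}

def apply_remodeling(baseline):
    """Scale each baseline parameter by (1 + percent change / 100)."""
    return {
        name: value * (1 + HCM_PERCENT_CHANGE[name] / 100)
        for name, value in baseline.items()
    }

def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100
```

      For example, a 107% increase corresponds to a factor of 2.07 and an 85% decrease to a factor of 0.15; `percent_change(280, 330)` evaluates to roughly 17.9, consistent with the +18% stated for the action potential duration.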

      Reviewer #2 (Public Review):

      The overall aims of this work are to use computer models of electrical activation to (i) understand how remodelling of structure and function in hypertrophic cardiomyopathy promotes ventricular arrhythmias, and (ii) to assess whether a model-based approach could be used to predict the risk of arrhythmias in specific patients.

      The approach taken by the authors builds on previous work by this group, where a personalized mesh representing the ventricles is constructed from automated analysis of cardiac MRI. Models of human electrophysiology are then solved on this mesh with simulated pacing, to identify vulnerability to arrhythmias.

      The major strength of this approach is that it presents an environment within which an investigation that may be technically difficult, time-consuming, or unethical in a patient can be undertaken to guide treatment or assess risk. It is very promising.

      However, although the methodology used is sound, there are important assumptions that underpin this approach and limit the extent to which the outcomes are trustworthy. These include:

      1. MRI physics. The MR signal is produced from a finite volume of tissue, which is about an order of magnitude larger than the size of finite elements used in the computer model. Thus, the personalised mesh may not capture small scale features that could be important for initiation of arrhythmias.

      The reviewer is correct – the MRI scan and thus the computational mesh constructed from the segmented images does not capture small scale features. It indeed is possible that small scale features could lead to arrhythmogenesis in these patients, but it is unknown and highly unlikely that they would dominate arrhythmogenesis in the HCM-remodeled substrate. Since we stratify each personalized HCM substrate as arrhythmogenic or not by inducibility of arrhythmia following pacing, we capture all the contributions of MRI-visualized structural remodeling to arrhythmogenesis. While there may be additional contributions from small scale heterogeneities to arrhythmogenesis, this would not change the patient’s positive stratification result. If small-scale heterogeneities not “seen” by the MRI are the main mechanism for arrhythmogenesis in these HCM patients, then there will be a large discrepancy between our results of simulations and the clinical outcome. As the reviewer can see from the results presented in the paper, this is not the case. Thus, while small heterogeneities might additionally contribute to arrhythmogenesis, they do not alter the predominant mechanism on which stratification is based. We have added new text in the Study Limitations.

      1. Cardiac mechanics. The mesh used to solve the computer model is static, whereas the heart contracts with every beat. Mechanical contraction not only changes the shape of the heart and the thickness of the ventricular wall, but also feeds back into electrical activity.

      Again, the reviewer is correct in that assessment. However, it is important to emphasize that in our translational effort to bring computational modeling to the point-of-care and clinical decision making, there are simplifications that need to be made to personalized models to make the models computationally tractable while being clinically useful. No model will ever be correct in every respect – but we strive to develop clinically-useful models, that make better predictions than those made by the current clinical criteria. While there is certainly mechanoelectrical feedback, the arrhythmogenic propensity imparted by HCM remodeling on the ventricular substrate is the main factor that leads to a stratification of high risk. Again, if that was not the case, and mechanoelectrical feedback played a major role, there would be a large discrepancy between results of our simulations and the clinical outcome, which is clearly not the case in the paper.

      Importantly, recently we published a paper in which we conducted a detailed simulation of ventricular arrhythmia in a patient-specific model with structural remodeling that incorporated full mechanics, hemodynamics and feedbacks (Salvador, M., Fedele, M., Africa, P. C., Sung, E., Dede, L., Prakosa, A., Chrispin, J., Trayanova, N. and Quarteroni, A. (2021) 'Electromechanical modeling of human ventricles with ischemic cardiomyopathy: numerical simulations in sinus rhythm and under arrhythmia', Computers in Biology and Medicine, 136, pp. 104674.). The model was very complex and took days to execute for one pacing site. Nonetheless, the simulations demonstrated that when a purely electrophysiological model was compared to this complex multi-physics model, in all cases where arrhythmia was inducible in the electrophysiological model, it was also inducible in the multi-physics model and vice versa. The only difference between the models was that in the case of the multi-physics model, the arrhythmia was unstable, while it was stable in the electrophysiological model. This indicates that including additional complexities in our patient-specific models (such as mechanoelectrical feedback) would not change the stratification of whether an arrhythmia will occur or not.

      We have added text and citations in the Study Limitations along these lines.

      1. Population variability. The electrical model used in this study is a standard representation for the human ventricles. This is adjusted to capture some features of electrical activity in fibrotic regions, but these are not well characterised so assumptions are made. The patterns of electrical activation and recovery in the human heart vary from place to place within the human ventricles, with time within the same patient in response to external effects including autonomic activity, and from one patient to another. Hypertrophic cardiomyopathy is usually a progressive disease, so patterns of fibrosis may change over time.

      The reviewer is correct -- we are making a number of assumptions in our HCM model. Until there is a way to characterize personalized electrophysiology non-invasively, the models must make these assumptions. Despite these assumptions, this model is able to make correct predictions and to stratify patients better than the current clinical criteria. Of course, there is always room for improvement, like in any model.

      We do not state that assessment with our risk predictor is a one-time effort. Since HCM is a progressive disease, our risk predictor should be applied again when new patient imaging is acquired during follow up visits. These are already recommended for patients with HCM to monitor the changes in fibrotic remodeling. We already had text in the Discussion of the original submission stating this.

      Nevertheless, this study has found evidence that diffuse fibrosis plays a role in the vulnerability to arrhythmias in hypertrophic cardiomyopathy, and found that, in this group of 26 patients, a model-based approach can provide a more accurate risk stratification than other methods based on patient clinical data.

      Thank you very much for the comment.

    1. Author Response:

      Reviewer #2 (Public Review):

      The authors try to identify ATR-mediated phosphorylation sites in male meiosis of mice and performed phosphoproteomics using two distinct mouse models. The paper focuses on important topics in the field. Since ATR has key functions in meiosis, successful identification of ATR-mediated phosphorylation sites would have a profound impact.

      The study has certain technical issues in experimental design and data interpretations.

      The rationale as to why they used Rad1-cKO was not well described. According to the co-submitted manuscript, Rad1-cKO spermatocytes experience meiotic arrest, and the cellular composition is totally different between controls and Rad1-cKO testes. The "RAD1-dependent" phenotype may simply reflect the difference in cellular composition in testis. With this criterion, any phosphorylation sites present after the mid-pachytene stage in normal spermatogenesis can be categorized as "RAD1-dependent".

      We have altered the figure and text in the manuscript to more clearly explain the rationale for using Rad1-cKO and combining the generated data with the data from the rapid 4-hour ATRi treatment. Importantly, we now consider the phosphorylation sites impaired after a quick 4-hour treatment with ATRi (New Supplementary File 1), which is expected to be too quick to induce an appreciable pachytene arrest. Therefore, the final ATR-dependent and RAD1-dependent dataset is unlikely to include phosphorylation sites that appear depleted only because of a persistent mid-pachytene arrest (these sites should appear as RAD1-dependent and ATR-independent).
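
      The filtering logic described above amounts to intersecting two sets of phosphorylation sites. The sketch below illustrates it with hypothetical site identifiers; the function name and category labels are assumptions made for illustration and do not come from the manuscript.

```python
def classify_sites(atr_dependent, rad1_dependent):
    """Split phosphosites by their dependence on ATR (4-hour ATRi) and RAD1 (cKO).

    Sites depleted only in Rad1-cKO are the ones that could reflect the
    mid-pachytene arrest rather than loss of ATR signaling.
    """
    atr, rad1 = set(atr_dependent), set(rad1_dependent)
    return {
        "ATR_and_RAD1_dependent": atr & rad1,
        "RAD1_dependent_ATR_independent": rad1 - atr,
        "ATR_dependent_only": atr - rad1,
    }

# Hypothetical identifiers, for illustration only.
groups = classify_sites(
    atr_dependent={"siteA", "siteB", "siteC"},
    rad1_dependent={"siteB", "siteC", "siteD"},
)
```

      Here `groups["ATR_and_RAD1_dependent"]` contains `siteB` and `siteC`, while `siteD` falls into the RAD1-dependent, ATR-independent group that would be set aside.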

      There are two different experiments for ATR inhibitor (ATRi)-treated mice (2 pairs after 2.5-3 days of treatment, and 2 pairs 4 hours after a single dose). However, these results are not distinguished in the analysis, and there is no evaluation of testicular morphology after ATRi treatment.

      We have addressed the point about separating the data from the 4-hour and 2-3 day treatments. We have also examined testicular morphology after the 4-hour ATRi treatment and did not observe any defect (new Figure 5-figure supplement 3A-B).

      Finally, the authors showed ATR-dependent localization of SETX and RANBP3 and discussed interesting data. However, it has not been determined whether these localization changes were due to the functions of identified phosphorylation sites or some other mechanisms.

      We agree with the reviewer that it would be very interesting to address the role of specific phosphorylation sites in SETX and RANBP3. However, we feel this would require significant additional effort and time, which would not be realistic for the current manuscript and is beyond the scope of this resource paper.

      Reviewer #3 (Public Review):

      In this study, Sims et al. perform a phosphoproteomic analysis of the ATR signaling pathway in mouse testis. By studying the different phosphorylated peptides found in testis samples from ATR inhibited mice and from mutant mice for the member of the ATR-activating 9-1-1 complex, RAD1, authors defined a comprehensive map of the ATR signaling pathway in the mouse testis. In general, the methodological approach performed is appropriate to accomplish the desired goal and the results obtained are well explained and properly discussed. The conclusions raised by the authors are supported by the results obtained and the manuscript reads easily. Thus, overall the manuscript is of high quality. Furthermore, the information provided in this study is novel since to my knowledge this is the first attempt to characterize the ATR signaling pathway in the testis. In my opinion, these data will be very relevant to better understand the role of the ATR in mouse spermatogenesis, and in meiosis in particular, in the future.

      Thank you, we appreciate the positive remarks.

      Nonetheless, I have a few major concerns about this manuscript. Firstly, I think an important part of the description of the results is placed in a related preprint by the authors (Pereira et al. https://www.biorxiv.org/content/10.1101/2021.04.09.439198v1). In my opinion, this manuscript lacks a more detailed analysis of the ATR signaling on DNA repair and chromosome axis structure, which are fundamental to understanding the meiotic prophase. Secondly, the manuscript falls short of providing novel insights about ATR roles during the meiotic prophase. As ATR function in the meiotic prophase has been extensively studied, the ATR phosphoproteome should provide either some clues about possible novel functions ATR may perform during the meiotic prophase or spermatogenesis, or a mechanistic explanation of how ATR performs its meiotic functions (e.g., meiotic sex chromosome inactivation or meiotic recombination). The final section of the results is an attempt at doing so, but to me, the data provided represent only a small incremental advance in our knowledge of how ATR promotes MSCI. I would have liked the authors to expand this section to prove the utility of the data.

      We agree with the reviewer that it would be very interesting to address more details of the roles of ATR in meiosis and the underlying molecular mechanisms. However, we feel this would require significant additional effort and time, which would not be realistic for the current manuscript and is beyond the scope of this resource paper. We note that the revised version of the manuscript now reports the exciting finding that ATR is important for the proper localization of CDK2 in meiotic spreads. While the details and mechanisms remain unknown, we believe this finding, together with other reported findings in this resource paper, opens new directions to study meiotic ATR signaling.

    1. Author Response:

      Evaluation Summary:

      Are enzymes found in organisms that optimally grow at colder temperatures more active than the same enzymes found in organisms that optimally grow at warmer temperatures? Here, an assessment of the catalytic constants for approximately 2200 enzymes (obtained from the BRENDA database) showed no correlation between the relative catalytic activity and the optimum growth temperature. Further support for this conclusion was obtained from the measurement of the catalytic constant from a selection of ketosteroid isomerases from organisms that optimally grow between 15 and 46 degrees centigrade. These are interesting results, although the significance with respect to earlier studies has not been clearly explained.

      We have made the relationship between previous work and our work more explicit. Earlier studies have used a limited number of specific cases to compare enzyme rates from different organisms (for example, n = 28, Figure 1C, Figure 1D). In this work, we performed a systematic analysis of 2223 enzyme reactions, reducing confirmation bias, and we have clarified this point. Prior work developed physical models of enzyme catalysis, but these models were based on data that do not appear to be representative.

      Reviewer #2 (Public Review):

      The authors are trying to understand how enzymes evolve to best enable organisms to adjust to changes in the temperature of their environment. The paper reports an analysis of 2223 values of kcat from the BRENDA database, for 815 organisms with known optimal growth temperatures, and for which there are at least two variants per reaction. This analysis fails to show the expected preference for values of [(kcat)cold/(kcat)warm] > 1 observed in earlier studies.

      This is a useful attempt to use one large database to gain insight into how enzymes evolve to enable organisms to adapt to changes in temperature. They have done a good job in curating the BRENDA database to identify data that meet their criteria for analysis.

      There are deficiencies that should be corrected.

      (1) The first concerns the reported values of [(kcat)cold/(kcat)warm]. Figure 1D shows "Rate comparisons of warm-adapted and cold-adapted enzyme variants made at identical temperatures." I think that it is important that these kinetic parameters be reported for catalysis at a common temperature, but it is not clear to me that this is the case for the authors' analysis. For example, they write beginning on line 234 that "The rate ratio kcold/kwarm per reaction was determined by dividing rate of the enzyme from the organism with the minimum TGrowth by the rate of the enzyme from organism with the maximum TGrowth." My reading of this sentence is that these rate constants kcat [not rates] were determined individually at the organisms' optimal growth temperatures, and not at identical temperatures as reported in Figure 1D. This will complicate the authors' interpretation of the two sets of results.

      Analysis of kinetic parameters at a common temperature supports the conclusions of this work.
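
      The per-reaction ratio described in the sentence quoted from line 234 can be sketched as follows. This is only an illustration of the stated procedure, not the authors' analysis code: the record layout, the names `records` and `rate_ratios`, and all numerical values are hypothetical, and the kcat values are assumed to have been measured at a common temperature.

```python
from collections import defaultdict

# Hypothetical records: (reaction id, organism, TGrowth in deg C, kcat in 1/s).
# The values are made up for illustration and are not taken from BRENDA.
records = [
    ("rxn_A", "org_cold", 15.0, 12.0),
    ("rxn_A", "org_mid", 30.0, 20.0),
    ("rxn_A", "org_warm", 46.0, 6.0),
    ("rxn_B", "org_cold", 20.0, 3.0),
    ("rxn_B", "org_warm", 37.0, 9.0),
    ("rxn_C", "org_only", 25.0, 5.0),  # single variant, so no ratio is formed
]


def rate_ratios(rows):
    """Return kcold/kwarm per reaction.

    kcold is the kcat of the variant from the organism with the minimum
    TGrowth, kwarm the kcat of the variant from the organism with the
    maximum TGrowth. Reactions with fewer than two variants are skipped.
    """
    by_reaction = defaultdict(list)
    for reaction, _organism, t_growth, kcat in rows:
        by_reaction[reaction].append((t_growth, kcat))

    ratios = {}
    for reaction, variants in by_reaction.items():
        if len(variants) < 2:
            continue
        _, k_cold = min(variants, key=lambda v: v[0])
        _, k_warm = max(variants, key=lambda v: v[0])
        ratios[reaction] = k_cold / k_warm
    return ratios


ratios = rate_ratios(records)
for reaction, ratio in sorted(ratios.items()):
    print(reaction, round(ratio, 3))
```

      A ratio above 1 means the variant from the colder-growing organism is faster, which is the outcome the rate compensation model predicts.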

      (2) The authors fail to present a clear physical model to use in analyzing these results.

      For example, they write on line 35 that: "According to the rate compensation model of temperature adaptation, this challenge is met by cold-adapted enzyme variants providing more rate enhancement than the corresponding warm-adapted variants (Figure 1A)"

      I cannot recall hearing the term rate compensation model, but am familiar with discussions on the differences in properties of enzymes isolated from organisms that have adapted to warm and cold environments. The term cold adapted enzymes is not appropriate, because it is the organism, not the enzyme, that adapts to the change to a cold environment. This is accomplished through the natural selection of enzymes with kinetic parameters, stability, etc. that optimize the organism's chances of survival in a cold climate. The kinetic parameters for essentially all enzymes will decrease with decreasing temperature. The most highly evolved metabolic enzymes have kinetic parameters kcat/Km close to the diffusion controlled limit, because this optimizes energy production from metabolism. A decrease in temperature will cause the values of kcat and therefore kcat/Km for these enzymes to decrease, to the detriment of the organism. This may be overcome by selection of enzymes with values of kcat/Km close to that observed for the parent [unevolved] organism. The result is that larger kinetic parameters kcat, for catalysis at a common temperature, will be observed for enzymes isolated from the cold-adapted, compared to the unevolved parent organism. This simple application of Darwin's principles of natural selection is strongly supported by the data reported in Figure 1D.

      The reviewer presents a model that presumes that there would be greater selection to optimize energy production. This is also the model supported by the prior data (Figure 1D).

      However, the more extensive data in our work do not support the model that the reviewer notes and that has been widely accepted in the literature –this is the central conclusion of this work and we have attempted to clarify this, as noted above. The strict Darwinian interpretation for our observations is that there is not a strong selection for enzyme rates to be maximized, as described in the Discussion.

      An alternative model, consistent with the data we present, is that there are different selective pressures on enzymes than rate maximization. We note that it is possible that different metabolic strategies may be more advantageous at different life stages or in different communities (see Wortel et al., 2018, now cited in our main text). These models can be tested experimentally –e.g., by examining how variations of a weak-link enzyme fare over time under different growth conditions. There is much more to be learned from linking the properties of enzymes to evolution, and we expect the relationship between fundamental rate constants and selection to be complex, fascinating and important.

      We use the term rate compensation to refer to the phenomenon and not the physical explanation; there is no need for a physical explanation of a phenomenon in the absence of evidence for the phenomenon itself. We have clarified that we have introduced this term in the Introduction: According to what we term the rate compensation model of temperature adaptation, this challenge has been suggested to be met by cold-adapted enzyme variants providing more rate enhancement than the corresponding warm-adapted variants (Figure 1A).

      We use the term “cold-adapted” in agreement with literature usage: from an organism that is cold adapted. We have clarified this language usage: We use the term “cold-adapted variant” to refer to an enzyme from an organism annotated with lower TGrowth values.

      Finally, “cold-adapted” is not synonymous with “having faster enzymes”, which is often how it is used in the literature and how it is implied in the reviewer’s model.

      (3) The paper alludes to, but does not clearly explain, extensions of these ideas that are based on one model for how enzymes work. Enzymes often undergo large conformational changes during their catalytic cycle, and so must have sufficient flexibility for these changes to occur with rate constants that support catalysis. This predicts that the enhancement for catalysis observed for enzymes from cold-adapted organisms might best be achieved through mutations that favor an increase in protein flexibility. There will also be natural selection of enzymes for thermophilic organisms that optimize the organism's chances of survival in a hot climate, where heat denaturation of the protein catalyst is minimized through the selection of stiffer protein catalysts. This analysis predicts a decrease in enzyme flexibility with increasing preferred growth temperature, which might give rise to an increase in protein stability with increasing optimal growth temperature.

      We agree that there are many fascinating aspects of temperature adaptation at the level of individual enzymes, their mechanisms, and their particular rate-limiting steps that remain to be explored. These were not the subject of our study. The goal of our manuscript was to test the previously presented rate compensation model of enzyme cold adaptation.

      (4) The authors should consider the possibility that the pressure to compensate for the cold-induced decrease in kcat for enzymes from cold-adapted organisms will be strongest for highly evolved metabolic enzymes with values of kcat/Km close to the diffusion controlled limit. In cases where the enzyme starts out as less than perfect, an organism adapting to the cold might derive smaller, or even negligible, advantages from natural selection of enzymes with enhanced kinetic parameters. For example, the organism might also minimize the effect of this change in kinetic parameter by an adjustment or diversion of flux through the networks of metabolic pathways in which the enzyme functions. One possible explanation for the weak correlation observed between kcat and Tgrowth for ketosteroid isomerase is that the organisms studied gain little from optimization of the activity of this enzyme in cold-adapted organisms. One risk in the use of the larger BRENDA database may be the failure to account for differences in the pressure for enzymes to evolve to enable organisms to adapt to cold environments.

      We considered these and additional models. For example, interestingly, the opposite of what the reviewer proposed has been suggested in the literature: that the slowest enzymes (“least perfect”) are under the heaviest selection pressure for optimization (see Noda-Garcia et al., 2018). Although our data indicate that temperature exerts a weaker force on enzyme activity than previously proposed, it is indeed possible that subgroups of enzymes adapt to temperature through changes in activity. Deciphering this and other pressures is an important future challenge. We did not parse the data in this report out of concerns for “p-hacking” or multiple hypothesis testing.
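
      The concern about multiple hypothesis testing can be made concrete with a short calculation. The sketch below is illustrative only and is not part of the authors' analysis: it assumes the subgroup tests are independent, the function names are hypothetical, and the Bonferroni correction is shown simply as one standard remedy.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among n_tests
    independent tests, each run at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error
    rate at or below alpha (Bonferroni correction)."""
    return alpha / n_tests


# Splitting one data set into more and more subgroups, each tested at 0.05.
for n in (1, 5, 20):
    fwer = familywise_error_rate(0.05, n)
    print(n, round(fwer, 3), bonferroni_threshold(0.05, n))
```

      With 20 independent subgroup tests at the 0.05 level, the chance of at least one spurious "significant" subgroup is roughly 64%.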

      Reviewer #3 (Public Review):

      Enzyme catalysis underlies all living processes. Understanding the effects of temperature on enzymes is important in understanding how they are adapted to particular environmental conditions, and also relates to the response of organisms and even ecosystems to changes in temperature. The essential question is: what determines optimal growth rates of organisms, and the optimal temperature of other biological processes? Two potentially important factors are enzyme stability and catalytic activity.

      This manuscript collates data from previous investigations and presents new results on KSI variants, aiming to look at the interesting question of what factors are important in relating enzyme activity and stability to optimum growth temperatures of organisms. It presents a useful survey of published data, particularly focusing on the enzyme ketosteroid isomerase (KSI), for which new results for a number of variants are presented, building on nice recent work by this group. The main finding in this manuscript is that enzyme optimum temperatures do not correlate well with enzyme activity. This has also been found previously. The manuscript provides quite an extensive analysis and is consistent with previous results and findings. There is useful information in this manuscript, and the compilation of data will be useful to the community, but some crucial aspects and recent relevant work are not covered, and the discussion is limited. The analysis does not identify any relevant determinant of optimum temperature, and the focus on a single temperature in each case may be misleading.

      We do not agree that our analysis is “misleading.” We would characterize the prior analysis based on a small number of examples that were not randomly selected as potentially misleading. In contrast, we tested the prior conclusions with all relevant data that are available. We also highlight the power of collecting more data by further reporting the rate enhancement of 20 enzyme variants in depth. Temperature compensation through activity may still occur in specific settings, as we have noted in the Discussion.

      We agree with Reviewer #3 about the vast potential to use temperature dependencies to relate to evolutionary pressures and adaptations from molecules to organisms. This is a prime area for future investigation.

      Previous analyses have shown that optimum rates of enzymes do not correlate with optimal growth temperatures (e.g. Elias et al (2014) Trends in Biochemical Sciences 39, 299; Peterson (2004) Journal of Biological Chemistry 279, 20717; Thomas & Scopes (1998) Biochemical Journal 330, 1087; Lee et al (2007) FASEB Journal 21, 1934). This is particularly notable for psychrophilic (cold adapted) enzymes, but is also apparent from the fact that enzymes from the same organism often have quite different optimum temperatures. The data collected in the current manuscript are consistent with previous analyses and so are usefully confirming of this. The authors note that optimal growth temperatures may not correlate with activity for a number of reasons, including that the individual enzyme rate may not be under evolutionary pressure. Also, obviously, as noted by the authors, factors other than temperature are also important in enzyme evolution.

      We agree that it is obvious that factors other than temperature are important in evolution, but here we address whether the adaptation to temperature is accompanied by a common response. As noted, more catalysis for organisms at lower temperature was concluded previously and (as noted by Reviewer #2) is expected. However, this conclusion, upon further analysis (carried out herein) appears not to hold. Thus, even when organisms are adapting to temperature, other factors appear to be dominant. This was not previously known. The analyses the reviewer notes refer to thermal parameters derived from the temperature dependence of the rate constant for a given enzyme as a function of temperature, rather than what is addressed herein –the relative rate constant for enzymes from organisms with different growth temperatures.

      There is somewhat better correlation of enzyme stability with optimum growth temperatures, but it is not strong. Therefore, other factors must be important in determining optimum growth temperatures. The authors briefly mention some possibilities. One factor is that a given enzyme may not be a bottleneck in a metabolic pathway. It is not clear that KSI is in fact a metabolic limiter. Also, for many metabolic pathways, it may be essential to consider the kinetics of the pathway as a whole, which may not be determined by a single enzyme. Directly relevant here is the recent proposal of the 'inflection point hypothesis', which provides an explanation of these observations (Prentice et al. Biochemistry (2020) 59, 3562), which the authors do not mention, and may not be aware of. This hypothesis proposes that, rather than optimum temperatures or stabilities being aligned, the inflection points of the enzymes in a metabolic pathway are aligned at the mean environmental temperature for the organism. This has the effect of coordinating relative enzyme rates and preventing metabolic disruption as temperature fluctuates. Also relevant here is that the response of metabolic pathways in general is not determined solely by a single enzyme. Prentice et al. show that, in general, the temperature-dependent properties of each enzyme in the pathway are important in determining the temperature dependence of the whole pathway.

      We thank Reviewer #3 for bringing this work to our attention and we have included it in the revised manuscript. This paper points out additional complexities regarding metabolic coordination of relative enzyme rates, enhancing points made in the Discussion.

      It is certainly important to understand what molecular features determine the temperature dependence of enzyme activity and its relationship to stability. Some previous proposals are mentioned in the manuscript. One important factor at the molecular level, mentioned by the authors, is the work of Åqvist, Brandsdal and coworkers, who have convincingly shown that activation entropy and enthalpy differ significantly between psychrophilic enzymes and their mesophilic and thermophilic counterparts. For small soluble enzymes, this is particularly due to changes at the enzyme surface, which may also affect stability. As mentioned by the authors, there have been many proposals over the years that suggest a relationship between stability and activity, though there is not a simple general relationship.

      The cited study is based on molecular dynamics simulations and underlying potentials which can provide models to be tested via experiment. Our analyses relate to this model in that they suggest that rate compensation (to temperature) is not general and so a universal linkage of temperature, flexibility and catalysis is not expected.

      Also directly relevant for the discussion here is what factors limit enzyme activity as temperature increases. The traditional view is that loss of activity is due to protein unfolding at high temperatures (the poor correlation of stability with growth temperatures found here indicates that this cannot be a general explanation). There is increasing evidence that this simple picture is wrong (see e.g. Daniel & Danson. (2010) Trends in Biochemical Sciences 35, 584). This behavior may be accounted for by conformational (e.g. two state) effects as proposed by Danson et al, distinct from the 'flexibility' proposals mentioned in the supporting information here. The introduction of the manuscript here states that "reaction rates are reduced at lower temperatures", which might naively seem obvious but is actually not universally true; many reactions do not display simple Arrhenius-type behavior (see e.g. Kohen and Truhlar PNAS 2001 98 848). Many enzymes show a temperature of optimum activity, i.e. activity drops above the optimum temperature but before unfolding occurs. As the authors note, Arcus et al. show that this can be accounted for by an activation heat capacity, significantly larger in psychrophiles. Signatures of this behavior are apparent at the large scale (e.g. Schipper et al Global Change Biol. 2014 20 3578; Alster et al (2016) Front. Microbiol. 7:1821) and it appears to be generally important.

      We also are enthralled by the many proposals put forward for the physical and thermodynamic behavior of enzymes and we look forward to rigorous tests of the predictions of these models. Like Reviewer #3, we expect that there are many different features and properties of enzymes to discover!
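
      As a rough illustration of the activation heat capacity account raised in the reviewer's comment, the sketch below evaluates a transition-state-theory rate constant whose activation enthalpy and entropy vary with temperature through a constant activation heat capacity. The functional form is the commonly used heat-capacity extension of the Eyring equation; every parameter value is hypothetical and chosen only so that the curvature is visible, and nothing here is fitted to data from the manuscript.

```python
import math

R = 8.314462618      # gas constant, J/(mol K)
K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J s


def ln_rate(temp_k, dh0, ds0, dcp, t0=298.15):
    """ln(k) from the Eyring equation with a constant activation heat
    capacity dcp (J/(mol K)). dh0 (J/mol) and ds0 (J/(mol K)) are the
    activation enthalpy and entropy at the reference temperature t0 (K)."""
    dh = dh0 + dcp * (temp_k - t0)
    ds = ds0 + dcp * math.log(temp_k / t0)
    return math.log(K_B * temp_k / H) - dh / (R * temp_k) + ds / R


# Hypothetical parameters: same dh0 and ds0, with and without heat capacity.
DH0, DS0 = 50_000.0, -50.0
temps = range(280, 341)

simple = {t: ln_rate(t, DH0, DS0, 0.0) for t in temps}      # no curvature
curved = {t: ln_rate(t, DH0, DS0, -3000.0) for t in temps}  # negative dcp

t_opt = max(curved, key=curved.get)
print("optimum with negative activation heat capacity (K):", t_opt)
print("dcp = 0 rate still rising at 340 K:", simple[340] > simple[339])
```

      With a zero activation heat capacity and a positive activation enthalpy the rate rises monotonically with temperature, whereas a negative activation heat capacity produces a rate optimum without invoking unfolding.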

    1. Author Response:

      Reviewer #1 (Public Review):

      The manuscript: "Cavefish adapt to hypoxia by constitutive overexpression of hypoxia-inducible factor genes and increased erythrocyte development" by Corine van der Weele and William Jeffery addresses an interesting topic. The authors find that cavefish of the species Astyanax mexicanus develop more red blood cells than surface fish of the same species when raised in the laboratory under normoxic conditions. The authors perform a detailed analysis of the developmental origins of hematopoiesis and find an expansion of the hematopoietic domains in cavefish embryos. Further, the authors test how cavefish respond to chemically induced hemolytic anemia and how growth is affected under artificial hypoxic conditions. Finally, they look at transcriptional regulation of known hypoxia response genes.

      Major comments:

      While the authors are performing a careful and conclusive developmental analysis of the hematopoiesis in these fish, there are a few inconsistencies. It would be nice to see comparable timepoints in both the in situ and the qPCR analysis. For example, the authors show in Figure 1G/H 36 and 84 hpf timepoints while the qPCR is performed at different stages. This is especially relevant as the authors make quantitative statements from whole mount in situ analysis, which is not necessarily suited to quantitative comparisons. Furthermore, they are not using the same genes as readout, as they use hbb2 in the in situ and hbbe2 in the qPCR analysis.

      We thank the reviewer for this comment. We now report gene expression experiments done at the same developmental stages and with the same genes in revised Figure 2. More specifically: (1) the hbb2, hbbe2, and gfi1aa qPCR results are shown at 24 and 60 hpf, (2) the hbb2 in situ hybridization results are shown at 24 and 60 hpf in parallel with the qPCR results for this gene, (3) in situ hybridization results for hbb2 are still included at 36 hpf because this stage most obviously illustrates expression enhancement in the cavefish yolk mass, (4) in situ results for hbbe2 are not presented because, after many attempts using probes made from the coding or UTR regions of this gene, we were unable to obtain a reactive RNA probe, and (5) gfi1aa in situ is shown only for the 24 hpf stage because this was the stage in which this gene was strongly expressed according to qPCR data. The overall conclusion that hemoglobin and gfi1aa transcription factor genes are upregulated in cavefish compared to surface fish remains the same.

      While the growth study is conceptually interesting, it is unclear why the authors average all 12 larvae even though they keep them individually. They mention that this is to avoid "pseudoreplication", however I am not sure why that would be the case. It would be important to see all the data. Also, in Figure 4B the statistics for the comparison between surface normoxic and hypoxic are not shown, even though the text mentions it as significant.

      Because the growth study has now been repeated using a hypoxia chamber to better control oxygen levels, these comments are no longer specifically relevant. New data are presented in revised Figure 6. In this figure, data changes for individual larvae and their statistical significance are included.

      Reviewer #3 (Public Review):

      In this manuscript van der Weele and Jeffery propose two evolutionary mechanisms, via which cavefish (Mexican tetra Astyanax mexicanus) have adapted to a hypoxic environment based on direct comparison with hypoxia-sensitive surface dwellers of the same species. These adaptations are increased erythrocyte production and heightened transcription of hypoxia-inducible factor 1 (hif1). Using a combination of time lapse imaging, bright field microscopy and in situ hybridization with a battery of hematopoietic markers, they provide clear and compelling evidence that cavefish have a larger number of erythrocytes than surface dwellers and that this increase stems from an expansion of two erythropoiesis domains during early development. To address the functional relevance of increased erythrocytes they induced hemolytic anemia using the drug phenylhydrazine (PHT) and showed that cavefish are less sensitive to this disorder than surface fish, possibly as a result of the increase in erythrocytes. Nevertheless, both types of fish exhibited developmental anomalies post-treatment, including reduced tail length. They further propose that the larger number of erythrocytes may promote adaptation to hypoxia by countering the negative impact of this environmental stressor on growth. Indeed, surface fish appear to be more susceptible to hypoxia-induced stunted growth of the post-anal tail than cavefish, although the numbers are quite variable. Lastly, the authors go on to show that cave dwellers express constitutively high transcript levels of hif1 genes and downstream targets of Hif1. One of these targets is the growth suppressor IGFBP1a, which could explain growth restriction of surface fish under hypoxia. Overall, even though increased erythrocyte production is a well-documented response to life in hypoxic environments, this study provides an interesting perspective on this adaptation seen through the lens of evolutionary biology.

      The data support the main conclusions that erythropoiesis and hif1 transcription are enhanced in cavefish but do not convincingly identify the functional relevance of these traits to hypoxia adaptation in cavefish. In this regard:

      1) The induction of hemolytic lysis using PHT and associated developmental defects raises concerns about lack of drug specificity.

      We appreciate this concern and cannot entirely exclude non-specific effects of PHZ. However, the PHZ approach has been used successfully in other systems, including zebrafish, to “ablate” erythrocytes and check development. In the zebrafish case (see Pelster and Burggren, 1996 in the reference list), similar concentrations of PHZ resulted in no effects on development (blood cells were not determined), which seems inconsistent with any non-specific effects. In addition, we have now performed a new experiment (see revised Figure 5D) in which treatment with a very high level of PHZ, resulting in hemolysis of all erythrocytes, reduced tail growth without producing notochord abnormalities, suggesting that non-specific effects on the latter cannot explain PHZ effects on growth.

      2) The connection between erythrocyte number and organismal growth is not clearly established. The reduced tail length defect that is observed following PHT treatment, even if specific, is likely to be an indirect consequence of abnormal notochord morphology rather than arrested tail growth.

      As also described immediately above, to address this concern, we have performed a new experiment in which PHZ effects on tail length have been measured for a shorter period and at higher PHZ concentration resulting in the virtual absence of red blood cells (revised Figure 5D). The results showed significant reduction in tail growth compared to controls but no effects on notochord morphology, providing evidence that effects on growth are not related to notochord defects.

      3) Because the above connection was not clearly established, the lack of growth inhibition in cavefish following exposure to hypoxia (thought to be offset by the elevated number of erythrocytes in cavefish) is not readily explainable. The data itself showing a difference in tail growth between cavefish and surface dwellers exposed to hypoxia is not strong due to high variability and low numbers.

      New results using a hypoxia chamber have now more clearly established a differential relationship between hypoxia and growth in surface fish and cavefish.

      4) Increased hif1 transcript levels may contribute to enhanced hypoxia adaptation in cavefish, however Hif1 is known to be primarily regulated at the post-translational level, resulting in its enhanced stability and activity. It is unclear whether the activity of Hif1 is elevated in cavefish relative to surface fish as several known Hif1 targets are not up-regulated in cavefish relative to surface fish.

      Thank you for reminding us about the known regulation of Hif1 at the post-translational level. We do not address this point in the revised manuscript because all of our work has been carried out at the transcriptional level. In this, we show significant upregulation of all of the hif1 genes, and two downstream target genes. Three downstream target genes are not upregulated by hypoxia, but we do not currently understand why. Nothing is known about Hif1 regulation of downstream targets in Astyanax.

      In addition, there are a few technical concerns

      5) The manner in which hypoxic water was generated (bubbling of nitrogen gas) is unlikely to maintain a constant value over time. Furthermore, covering the tank with foil will not prevent gas exchange. Hence there is a concern that the hypoxic values may be variable between trials and even within the same trial.

      Thank you for this suggestion. To address the concerns about stability of oxygen depletion within and between trials, we have now repeated all of the hypoxia experiments using a hypoxia chamber.

      6) It is important that transcript levels for the q-PCR reference gene be stable across normoxic and hypoxic conditions. A discussion about the considerations that went into selecting RPL13a as a reference gene was not provided.

      We have provided discussions justifying the use of rpl13a and other genes as references for qPCR analysis.

    1. Author Response:

      Reviewer #1:

      Lee et al. identify miR-20b as a molecular regulator of hepatic lipid metabolism through the post-transcriptional regulation of the nuclear receptor PPAR alpha. Through mechanistic studies the authors identified the 3'UTR of PPARa as a direct target for miR-20b regulation of expression. The experiments are well controlled and the study provides deep mechanistic insight into the miR-20b/PPARa circuit in modulating hepatic lipid metabolism. Furthermore, the authors provide evidence that targeting the miR-20b pathway to enhance PPARa activation via synthetic ligand fenofibrate. The studies provide much needed mechanistic insight into molecular regulators of hepatic lipid metabolism in response to nutrient stress such as high fat diet. While this is a detailed and thorough assessment of this pathway, there are several issues that were identified in the review of this article outlined below:

      1) The authors state there is no off target expression of miR-20b in adipose tissue in their over expression experiments. However, per figure 4 supplement 1, EpiWAT has increased expression over controls in HFD fed conditions. Furthermore, figure 4 supplement 2 shows a functional difference in EpiWAT weight in HFD where miR-20b treated mice have higher fat weight. The authors need at the least to discuss the potential role of adipose tissue in promoting their observed phenotype.

      This is a good point. We increased the number of samples and carefully analyzed the changes in both the expression of Mir20b and the weight of epididymal adipose tissue. We observed a slight increase in Mir20b expression in the epididymal adipose tissue of AAV-Mir20b HFD-fed mice compared to AAV-control NCD-fed mice, but not compared to AAV-control HFD-fed mice. The expression of Mir20b in adipose tissue did not differ significantly between AAV-control HFD and AAV-Mir20b HFD mice (Figure 5-figure supplement 1).

      We have revised the text and added a discussion about the potential role of adipose tissue (page 25-26, line 582-594). Hepatic steatosis could be affected by adipose tissue through free fatty acid (FFA) release and hepatic uptake of circulating FFAs (Rasineni et al., 2021). Our results showed that the epididymal adipose tissue of HFD-fed mice was enlarged upon AAV-Mir20b treatment; however, the serum FFA levels in these mice were comparable to those in mice treated with the AAV-Control (Figure 5-figure supplement 4B). Of note, the expression of genes related to lipolysis did not change in adipose tissues, and that of the hepatic FA transporter CD36 was decreased by AAV-Mir20b treatment (Figure 5Q and Figure 5-figure supplement 4A). In addition, excess hepatic triglycerides (TGs) are secreted as very low density lipoproteins (VLDLs), and the secretion rate increases with the TG level (Fabbrini et al., 2008). VLDLs deliver TGs from the liver to adipose tissue and contribute to the expansion of adipose tissue (Chiba et al., 2003). Together, these reports suggest that adipose tissue is also remodeled by the liver in HFD-fed mice and non-alcoholic fatty liver disease (NAFLD) patients. Therefore, the levels of hepatic TGs are unlikely to be affected by epididymal adipose tissue, and the increase in fat content (Figure 5-figure supplement 3) may be a consequence of increased hepatic TG levels.

      2) Figure 5 shows anti-miR-20b essentially restores PPARa expression. However, the rescue effects in terms of body weight, liver triglycerides and liver damage are only modestly improved. The authors need to discuss this modest effect and potentially offer alternative mechanisms aside from PPARa as the physiological target.

      Previously, we introduced AAV treatment after four weeks of high fat diet (HFD) feeding. Anti-Mir20b treatment significantly changed the expression of PPARA; however, the effect on the pathophysiological properties of the liver was significant but modest. We thought that this was because there was not enough time to make a proper impact on the liver. Thus, to maximize the effect of anti-Mir20b, the AAV was administered when the HFD was started. The new results showed more significant effects of anti-Mir20b (Figure 6).

      We also observed that other nuclear receptors, such as RORA, RORC, and THRB, could be potential targets of MIR20B (Figure 2H and Figure 2-figure supplement 3). However, in the patient data, there was no significant correlation between the expression of those nuclear receptors and that of MIR20B. In addition, among the candidate targets, only PPARA was selected as an overlapped predicted target of MIR20B by various miRNA target prediction programs, including miRDB, picTAR, TargetSCAN, and miRmap (Figure 2J, Figure 2-figure supplement 2). Consistent with these results, we observed that Ppara, not other nuclear receptors, is the target gene of Mir20b in both AAV-Mir20b and AAV-anti-Mir20b mice (Figure 5-figure supplement 2, Figure 6-figure supplement 2). Thus, we focused on PPARA as a MIR20B target in NAFLD.

      3) The authors performed experiments with mutated 3'UTR of PPARa and show mutated PPARa is refractory to regulation by miR-20b. However, the authors provide no functional evidence that mutating the 3'UTR of PPARa elicits changes in hepatic lipid metabolism. Discussion of this point is needed at the minimal.

      Thank you for your comment. To provide functional evidence, we tried to establish the PPARA 3’UTR mutation knock-in (KI) system in cells. However, we could not succeed because of technical difficulties and time constraints. Alternatively, we introduced the wild type PPARA open reading frame (ORF) followed by either the wild type (WT) or mutant (Mut) 3’UTR of PPARA in HepG2 cells, and analyzed the importance of the 3’UTR of PPARA. As shown in Figure 2-figure supplement 5C, MIR20B significantly suppressed the expression of PPARA and its target genes in PPARA-3’UTR WT expressing cells. Furthermore, Oil Red O staining showed that MIR20B expression increased the intracellular lipid content in these cells (Figure 2-figure supplement 5B). However, MIR20B did not have an effect on either the expression of PPARA and its target genes or intracellular lipid content in PPARA-3’UTR Mut expressing cells (Figure 2-figure supplement 5C, D). We have added the new results in page 17-18, line 350-359 and Figure 2-figure supplement 5.

      Reviewer #2:

      1) In the experiments depicted in Figures 1D and E, did OA treatment of HepG2 and/or Huh-7 cells produce a reduction in the levels of mRNA encoding PPARalpha (or PPARalpha protein levels) in concordance with the shown rise in mRNA for miR-20b?

      Thank you for your question. The samples used in Figure 1C–E were also analyzed to observe the changes in the expression of PPARA (Figure 2-figure supplement 4A–C). In each sample, the increase in MIR20B expression resulting from oleic acid (OA) treatment and HFD was accompanied by a reduction in the levels of PPARA mRNA.

      2) Moreover, Figure 1 shows a fuller landscape of the transcriptional impact of microRNAs in context of obese livers in mice and human. Given this, what made miR20-b more interesting than, for example, miR106a, miR-17, or others that also appear to be robustly regulated? Why focus on miR20b?

      This is a very good point. In the analysis of the regulatory network, other miRNAs including MIR129 and MIR106A appeared to possibly regulate nuclear receptors in NAFLD. We further confirmed the relationship between candidate miRNAs and NAFLD progression in patient samples. As shown in the revised Figure 1B, we observed that the expression of MIR20B was more robustly and significantly changed with NAFLD progression than that of MIR129 and MIR106A. This tendency was also confirmed in other experiments using OA-treated HepG2 and Huh7 cells or HFD-fed mice (Figure 1-figure supplement 4). Thus, we focused on the role of MIR20B in NAFLD. Nevertheless, we do not rule out the possibility that other miRNAs may be involved in NAFLD progression. Subsequent studies may uncover the roles of other miRNAs in liver physiology.

      3) What does the rank and p-value exactly represent in tabular part of Figure 1A? This is very unclear as shown, including the figure legend.

      The p-values in the table of Figure 1A were obtained from the hypergeometric distribution used for testing the enrichment of downregulated nuclear receptors among the targets of a miRNA. In other words, they indicate the probability of having downregulated nuclear receptors among the miRNA targets. They were calculated by the following equation:

      where N is the total number of genes analyzed, M is the number of candidate target genes of the miRNA, D is the number of downregulated NR genes, and O is the observed overlap between miRNA targets and the downregulated NR genes as described in the Materials and Methods (page 9, line 155-157). The ranks in the table were determined according to the p-value. The legend of Figure 1A has been modified as follows:

      “Figure 1. MIR20B expression is significantly increased in the livers of dietary and genetic obese mice and humans. (A) The miRNA regulatory networks for NR genes downregulated in the transcriptome of NAFLD patients. The adjusted p-values in the table represent the enrichment of miRNA targets in the downregulated NR genes (hypergeometric distribution).”
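
      For readers who want to reproduce this kind of enrichment p-value, the following is a minimal sketch of a standard upper-tail hypergeometric test using the symbols N, M, D, and O defined above. It assumes the p-value is the probability of observing an overlap of at least O; the exact form used in the manuscript is not reproduced here, and the function name and the example numbers are illustrative only.

```python
from math import comb


def hypergeom_enrichment_p(N: int, M: int, D: int, O: int) -> float:
    """Upper-tail hypergeometric probability P(X >= O).

    N: total number of genes analyzed
    M: number of candidate target genes of the miRNA
    D: number of downregulated NR genes
    O: observed overlap between miRNA targets and downregulated NR genes

    Assumption: enrichment is scored as the chance of seeing an overlap
    of O or more when D genes are drawn at random from N genes, of which
    M are miRNA targets.
    """
    lower = max(O, 0)
    upper = min(M, D)  # the overlap cannot exceed either set size
    total = comb(N, D)
    tail = sum(comb(M, i) * comb(N - M, D - i) for i in range(lower, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the call signature.
    print(hypergeom_enrichment_p(N=10, M=4, D=3, O=1))
```

      A smaller value indicates that the overlap between the miRNA's predicted targets and the downregulated NR genes is less likely to arise by chance; ranking miRNAs by this value gives an ordering of the kind shown in the table.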

      4) Figure 1, supplement 1 shows characteristics of patients involved in data for Figure 1, etc. This shows that the normal patients are younger than the other two groups, the M-F ratio is not identical (more female in the normal group), and the total cholesterol levels are not well matched either. What other parameters are available? Hemoglobin A1c? Fasting glucose? In the end, we need to know that the groups, apart from the severity of NAFLD and NASH, were well matched. Given the small size of each group (n = only 4-5), this matching is critical to avoid confounding of the relationship between miR-20b, PPARalpha, and NAFLD/NASH progression.

      Thank you for your comment. Accordingly, we have included the patient information in a table (Figure 1-figure supplement 1A, B). To increase the statistical power and prevent confounding effects, we increased the number of samples and tried to match them to compare age, weight, and male/female ratio between the groups. Due to the limited number of patient samples, the cohorts could not be perfectly matched. Nevertheless, there were no significant differences in age and male/female ratio among the three groups. Specifically, serum AST, ALT, and fasting glucose levels were significantly increased with progression from normal to non-alcoholic steatohepatitis (NASH), but total cholesterol was comparable as previously reported (Chung et al., 2020). We have revised the text in pages 7-8, lines 118-130.

      5) The title of Figure 2 relates to PPARalpha. However, in Figure 2G, it is clear that several NRs are downregulated by miR20b overexpression in cells. Although the paper focuses on PPARalpha, should the authors not explore at least some of the other hits to ensure that the impact of PPARalpha is of particular importance vs. others?

      This is a good point. We also observed that other nuclear receptors, such as RORA, RORC, and THRB, could be potential targets of MIR20B (Figure 2H and Figure 2-figure supplement 3). However, in the patient data, there was no significant correlation between the expression of those nuclear receptors and that of MIR20B. In addition, among the candidate targets, only PPARA was selected as an overlapped predicted target of MIR20B by various miRNA target prediction programs, including miRDB, picTAR, TargetSCAN, and miRmap (Figure 2J, Figure 2-figure supplement 2). Consistent with these results, we observed that PPARA, not other nuclear receptors, is the target gene of MIR20B in both AAV-Mir20b and AAV-anti-Mir20b mice (Figure 5-figure supplement 2, Figure 6-figure supplement 2). Thus, we focused on PPARA as a MIR20B target in NAFLD.

      6) In Figure 3, the data show, presumably, that OA induces miR20b, which then represses PPARalpha and, in turn, CD36 downstream of PPARalpha. If this is the case, then how does OA continue to get into the cells? Once CD36 expression falls dramatically, doesn't the key OA uptake mechanism fall with it? Then, does the induction of miR20b abate? Or, does FATP6 or another uptake mechanism account for OA entry into these cells?

      This is a good point. FA uptake was decreased by overexpression of MIR20B, and was accompanied by a considerable decrease in CD36 expression (Figure 4B, J). However, other lipid transporters such as FATPs were not significantly altered (Figure 4-figure supplement 5), suggesting that FA uptake is continued by these transporters. The expression of CD36 is relatively low in normal hepatocytes, and the molecule may not be the primary fatty acid transporter in these cells (Wilson et al., 2016). Furthermore, the decrease in FA uptake upon CD36 KO is modest even during a HFD (Wilson et al., 2016). In addition, we observed that the expression of MIR20B is induced and increased for up to 24 h by OA treatment. This is followed by a slight decrease, after which expression remains at a constant elevated level (Figure 4-figure supplement 6). Together, the findings indicated that other fatty acid transporters contributing to FA uptake account for the entry of OA into cells. We have added this discussion in page 25, line 571-581.

      7) Similarly, what happens to AGPAT, GPAT, and DGAT expression in context of OA treatment and modulation of miR20b? Does the capacity of the cell to store OA in the form of triglyceride inside of lipid droplets change, so that the amount of free OA or oleyl-CoA inside the cell rises? Could this impact the transcriptional phenotype?

      This is a very good point. Accordingly, we analyzed the transcriptional phenotype in the context of OA treatment and modulation of MIR20B. The expression of glycerolipid synthetic genes, including AGPATs, GPATs, and DGATs, was increased by OA treatment, but MIR20B overexpression did not influence the expression of lipogenic genes except for that of DGAT1. However, treatment with anti-MIR20B significantly reduced the expression of glycerolipid synthetic genes, including GPATs and DGATs, under OA treatment (Figure 4C, N). These results suggested that MIR20B is necessary but not sufficient to induce the expression of glycerolipid synthetic genes under OA treatment. We have shown that OA induces the expression of MIR20B (Figure 1C), which can explain why MIR20B overexpression did not show an additional enhancement under OA treatment. The increase in DGAT1 expression induced by MIR20B might contribute to the increase in TG formation and capacity to store OA. This could change the flux of oleyl-CoA to TG synthesis, not β-oxidation with reduced expression of lipid oxidation-associated genes (Figure 4B). Thus, we can expect that the decrease in OA uptake and increase in TG formation induced by MIR20B resulted in reduced amounts of OA or oleyl-CoA inside the cell. However, as lipid consumption through FA oxidation is decreased by MIR20B, free OA or oleyl-CoA might be maintained at a stably increased level compared to that of OA-untreated MIR NC or MIR20B condition, and the impact of the changes in OA or oleyl-CoA levels on the transcriptional phenotype might not be significant as found in a constant elevated level of MIR20B by OA (Figure 4-figure supplement 6). We have added these results in Figure 4C and the Discussion (page 26, line 595-610). Due to technical constraints, we could not measure the amounts of free OA and oleyl-CoA.

      8) In Figure 3P, would the impact of anti-miR on the effect of OA on FASN be lost in PPARalpha KO cells? This would really test the functional relevance of the purported transcriptional hierarchy.

      Thank you for your valuable comment. We tested the impact of anti-MIR20B treatment on OA-treated PPARA knock-down (KD) cells, not KO cells, due to technical constraints. PPARA KD cells showed a significant decrease in PPARA expression. As shown in Figure 4-figure supplement 4I, anti-MIR20B treatment enhanced the expression of PPARA but did not have a significant effect on fatty acid synthase (FASN) expression in both control and PPARA KD cells. In addition, PPARA KD did not affect FASN expression. The expression patterns of PPARα target genes differ between mice and humans. FASN is regulated by PPARα in mice, but this is unclear in humans (Rakhshandehroo, Hooiveld, Muller, & Kersten, 2009; Rakhshandehroo, Knoch, Muller, & Kersten, 2010). Moreover, fenofibrate, a PPARα agonist, reduces the expression of FASN in methionine choline-deficient (MCD)-fed mice (Cui et al., 2021). Here, we used human HepG2 cells to investigate the effect of OA and MIR20B. It is plausible that FASN might not be regulated by PPARα in our system. We have added these results in Figure 4-figure supplement 4I.

      9) The authors should really at least perform a bulk RNAseq analysis to confirm the similarity of the effect of miR20b or anti-miR seen in cells, at the mouse or human liver tissue level. As it is, they only look at 3 FAOX genes, 2 FA uptake associated genes, and 2 FA synthesis genes. This is not very comprehensive as a validation of the in vitro data, although it is intriguing. Or, at the very least, look at a large validated set of PPARalpha target genes in vivo.

      Thank you for your comment. Accordingly, we selected PPARα target genes altered by MIR20B in OA-treated cells (Figure 4-figure supplement 1A, B), and then examined the hepatic expression of PPARα target genes in HFD-fed mice treated with MIR20B or anti-MIR20B (Figure 5R and 6R). The expression of most PPARα target genes was decreased by OA treatment and the HFD, and MIR20B treatment further reduced their expression. In contrast, anti-MIR20B treatment rescued the reduced expression of PPARα target genes under OA treatment and the HFD. These results suggested that MIR20B suppresses PPARA in vivo, which is consistent with the results from cells. We have added these results in Figure 4-figure supplement 1A, B, Figure 5R, and Figure 6R.

      10) Notably, the figures in general do NOT show individual data points. This is the standard for visual display, rather than bar graphs with simple SEM bars.

      Thank you for your comment. We have revised the graphs to include individual data points.

      11) The in vivo data (e.g. Figure 4) are very low n values. Augmenting this would add confidence to the data. As an example, of inconsistencies potentially stemming from very low n, the liver weights (Figure 4F) are not very different across groups, although the triglyceride levels in the livers (Figure 4H) are more than twice as high. The images of liver specimens shown as examples (Figure 4F) are also more dramatic than the weights would indicate. Note also that the body weights of the mice (Figure 4C) are different as well, and this alone could explain the livers being modestly heavier. Indeed, the extent of body weight excess mirrors the extent of liver weight excess, suggesting that the entire animal may be larger across multiple metabolic tissues including adipose. This is proven in Figure 4D, where the fat mass looks to be larger as well. To this end, Figure 4 supplement 2 shows multiple tissue weights to be increased in this model, suggesting that specificity for hepatic steatosis may be low.

      Thank you for your comment. Accordingly, we conducted additional in vivo experiments with larger n values (n = 10). Then, we replaced the liver images with more representative ones. AAV-Mir20b robustly induced the hepatic expression of Mir20b and significantly increased the liver weight and hepatic TG levels (Figure 5F, 5I). In normal human liver, intrahepatic TGs do not exceed 5% of the liver weight (Fabbrini & Magkos, 2015). In our results, TG levels were increased more than three times by the HFD, but the impact on liver weight was limited, as TGs did not account for more than 10% of the liver weight (Figure 5I). Excess hepatic TGs are secreted as very low density lipoproteins (VLDLs), and the secretion rate increases with the TG level (Fabbrini et al., 2008). VLDLs deliver TGs from the liver to adipose tissue and other metabolic tissues (Heeren & Scheja, 2021). The excess hepatic TGs induced by Mir20b were presumably transferred to epididymal adipose tissue, contributing to the increase in adipose tissue weight, while inguinal and brown adipose tissues were not significantly affected by Mir20b (Figure 5-figure supplement 3). Together, the fat mass measured by EchoMRI included intrahepatic and adipose TGs, and mirrored the increases shown in Figure 5D. In addition, Mir20b induced the expression of hepatic DGAT1, which could explain increased TG secretion through VLDLs (Figure 4C) (Alves-Bezerra & Cohen, 2017; Liang et al., 2004).

      Conversely, the supply of FFAs from adipose tissue might have contributed to hepatic steatosis. However, we observed that there were no significant changes in the expression of Mir20b and lipolytic genes in adipose tissue (Figure 5-figure supplement 4A). Furthermore, the serum FFA levels in the AAV-Control and AAV-Mir20b groups under the HFD were comparable (Figure 5-figure supplement 4B). These findings suggested that increased intrahepatic TG levels constituted the specific and primary effect of AAV-Mir20b.

      12) In figure 5 S1, the anti-miR20b substantially reduces the weights of multiple tissues in mice fed a HFD, given this, why does overall body weight (figure 5c) show such a modest difference. Figure 5 E and F also suggest that the overall weights would have been lower than shown in Figure 5C. In the end, instead of bar graphs of the final weights, the entire weight curve for the mice fed the HFD should have been shown.

      Thank you for your comment. To make our results more robust, we increased the sample size (n = 10). Moreover, we provided the entire weight curve and revised the results (Figure 6C). AAV-anti-Mir20b treatment significantly reduced the liver weight (Figure 6F). The weight of adipose tissue, including epididymal white adipose tissue (EpiWAT), tended to decrease; however, the difference was not significant (Figure 6-figure supplement 3). As indicated in a previous question (#11), the change in hepatic TG levels could affect the weight of other tissues. In our revised Figure 6C, we show that the overall weight change might be higher than the sum of weight change of specific metabolic tissues, such as the liver and adipose tissues.

      13) How well were the NAFLD vs. normal GSE individuals matched? This is very important, since PPARalpha emerges from comparing these data sets. Matching is very important to make sure that the differences in NR expression do not stem from a confound that went along in parallel with the NAFLD cohort vs. the normal GSE cohort.

      This is a very good point. PPARA emerged from regulatory network analysis (Figure 1A) and was selected as a target of MIR20B through the analysis of RNA-seq data from MIR20B-overexpressing HepG2 cells (Figure 2). By constructing a regulatory network in NAFLD patients, we determined that MIR20B is responsible for NR regulation in NAFLD. As shown in Figure 1A, we analyzed the differential expression of NR in NAFLD using public GSE data (GSE130970) consisting of patients with NAFLD and age- and weight-matched normal controls (Hoang et al., 2019). To verify the expression of MIR20B, we assessed the miRNA levels in another non-coding RNA GSE dataset (GSE40744) in the original manuscript (previous Figure 1B). However, in the process of reviewing GSE40744 patients’ information with physicians, we found that some of the patients were virus-infected. Thus, we removed the data from GSE40744 and truly apologize for the confusion.

      In the revised manuscript (page 16 line 303-304), we examined the expression of MIR20B and other candidate miRNAs such as MIR129 and MIR219A in patient samples from the Asan Medical Center (Seoul, Republic of Korea), who were diagnosed by pathologists and age- and weight-matched. As shown in Figure 1B, MIR20B is one of the main miRNAs involved in NAFLD progression. In addition, the expression of PPARA was significantly negatively correlated with that of MIR20B (Figure 2-figure supplement 3).

      Reviewer #3:

      In this manuscript, Lee et al. use an elegant combination of cultured cells, patient samples, and mouse models to show that miR-20b promotes non-alcoholic fatty liver disease (NAFLD) by suppressing PPAR-alpha. The authors show that miR-20b inhibits PPAR-alpha expression, resulting in reduced fatty acid oxidation, decreased mitochondrial biogenesis, and increased hepatocyte lipid accumulation both in vitro and in vivo. Inhibition of miR-20b in mouse NAFLD models leads to increased PPAR-alpha, reduced hepatic lipid accumulation, decreased inflammation, and improved glucose tolerance. Overall, the data are well-controlled and support the authors' conclusions.

      Strengths:

      1) In Figure 1, the authors show miR-20b is increased in NAFLD patients, mouse obesity/NAFLD models, and cultured liver cancer cells treated with oleic acid (OA). The use of multiple complementary approaches is very powerful, although more information regarding the diagnoses in the 13 patient samples would be helpful (see below).

      Thank you for your comment. Accordingly, we have included the patient information in a table (Figure 1-figure supplement 1A, B). To increase the statistical power and prevent confounding effects, we increased the number of samples and tried to match them to compare age, weight, and male/female ratio between the groups. Due to the limited number of patient samples, the cohorts could not be perfectly matched. Nevertheless, there were no significant differences in age and male/female ratio among the three groups. Specifically, serum AST, ALT, and fasting glucose levels were significantly increased with progression from normal to non-alcoholic steatohepatitis (NASH), but total cholesterol was comparable as previously reported (Chung et al., 2020). We have revised the text in pages 7-8, lines 118-130.

      2) In Figure 2, the authors show that PPAR-alpha is a direct target of miR-20b. These data include a luciferase reporter assay regulated by the 3'UTR of PPAR-alpha. Importantly, when the 3'UTR is mutated, suppression of luciferase expression by miR-20b is no longer observed. The authors use multiple different algorithms to predict miR-20b targets, look for overlap, and then confirm PPAR-alpha as the most important "hit" in vitro.

      3) Figure 3 highlights changes in fatty acid metabolism in HepG2 cells transfected with miR-20b, miR-NC, or anti-miR-20b and treated with oleic acid. Figure 3, supplement 4 shows that anti-miR-20b can alleviate OA-induced hepatic steatosis in both HepG2 cells and primary hepatocytes. The use of another (primary) cell line here is important, because HepG2 is a liver cancer cell line, and metabolic changes in HepG2 cells might not be representative of non-neoplastic hepatocytes.

      4) In Figure 4, the authors show that miR-20b promotes hepatic steatosis, increases liver weight, increases liver injury markers, and impairs glucose tolerance and insulin sensitivity in HFD-fed mice. Conversely, anti-miR-20b inhibits hepatic steatosis, decreases liver weight and liver injury markers, and improves glucose tolerance and insulin sensitivity in HFD-fed mice (Figure 5). Anti-miR-20b also inhibits hepatic steatosis and fibrosis and decreases liver injury markers in MCD-fed mice (Figure 8). These in vivo studies provide excellent support for the authors' hypothesis regarding the role of miR-20b in promoting fatty liver disease. The liver readily takes up small nucleic acids, including miRs and anti-miRs. Thus, the possibility of using anti-miR-20b as a therapeutic for fatty liver disease is intriguing, and supported by these experiments.

      5) In Figure 6, in HepG2 cells, the authors demonstrate that PPAR-alpha overexpression (or to a lesser extent fenofibrate treatment) is able to rescue the transcriptional effects of miR-20b overexpression. Conversely, siPPAR-alpha can rescue the transcriptional effects of anti-miR-20b. Similar results are shown in Figure 7: fenofibrate is able to at least partially suppress some of the metabolic phenotypes that are exacerbated by miR-20b overexpression in HFD-fed mice (the decreased lean/BW ratio, elevated fasting glucose, some transcriptional changes). Again, it is nice to see that the in vitro data is supported by in vivo results.

      Thank you for your comments.

      Weaknesses:

      1) In Figure 3, figure supplement 2, it seems the effects of miR-20b overexpression in primary hepatocytes may be a bit overstated. While it does seem that miR-20b enhances the accumulation of fat in primary hepatocytes upon OA treatment, miR-20b overexpression alone does not seem to have significant effects on steatosis (A), cholesterol (B), or triglycerides (C).

      Thank you for your comment. We have revised the text; “Unlike in HepG2 cells (Figure 2A-C), MIR20B alone did not induce lipid accumulation in primary hepatocytes without OA treatment, but MIR20B significantly increased lipid accumulation in the presence of OA (Figure 4-figure supplement 2)” (page 19, line 383-385). “Figure 4-figure supplement 2. MIR20B enhances lipid accumulation in primary hepatocytes under OA-treatment” (the title of Figure 4-figure supplement 2)

      2) Histologic analysis of mouse liver samples by a pathologist is lacking. In Figure 4, is there increased inflammation and/or fibrosis with miR-20b overexpression, or just increased steatosis? In Figure 4 and Figure 8, it would be helpful if steatosis, fibrosis, and inflammation were quantified/scored histologically.

      Thank you for your comment. Accordingly, we have conducted histological analysis, and the NAFLD activity score (NAS) and fibrosis score were assessed by a pathologist. We have added the scoring graphs in Figure 5H, 6H, 7H, 8I, 8J, 9G, and 9H. In Figure 5G and 5H, AAV-Mir20b significantly increased steatosis, but the increase in inflammation was not significant under the HFD; however, AAV-anti-Mir20b significantly decreased steatosis, inflammation, and fibrosis under the MCD (Figure 8H-J). In addition, the combination of AAV-anti-Mir20b with fenofibrate significantly alleviated steatosis, inflammation, and fibrosis compared to AAV-Control under the MCD (Figure 9F-H).

      3) The effects of anti-miR-20b on hepatic triglycerides and inflammatory markers in vivo are modest (Figures 5 and 8). Perhaps an enhancement could be seen by combining anti-miR-20b with fenofibrate. While the authors show that fenofibrate's effects are suppressed with miR-20b overexpression, they don't examine what happens when fenofibrate is combined with anti-miR-20b. To me, this experiment is critical to determine if PPAR-alpha activity could be further maximized to combat NAFLD (beyond what is seen with fenofibrate alone).

      This is a very good point. Accordingly, we performed a new experiment in which fenofibrate was combined with anti-Mir20b to treat MCD-fed mice. The combination showed further improvements compared with those obtained by fenofibrate treatment alone. The results have been described in page 23-24, line 518-536.

      “Recently, drug development strategies for NAFLD/NASH are moving toward combination therapies (Dufour, Caussy, & Loomba, 2020). However, the efficacy of developing drugs, including fenofibrate, against NAFLD/NASH is limited (Fernandez-Miranda et al., 2008). Thus, we tested whether the combination of anti-Mir20b and fenofibrate would improve NAFLD in MCD-fed mice. The levels of hepatic Mir20b were reduced after administration of AAV-anti-Mir20b in MCD-fed mice compared to those in mice administered with AAV-Control, and this reduction was also observed after fenofibrate treatment (Figure 9A). Interestingly, the combination of AAV-anti-Mir20b and fenofibrate increased the levels of PPARα to a greater extent than AAV-Mir20b alone (Figure 9B, C). AAV-anti-Mir20b or fenofibrate administration significantly reduced the liver weight and hepatic TG levels, and co-administration further reduced hepatic steatosis (Figure 9D, E). Histological sections showed that the combination of AAV-anti-Mir20b and fenofibrate improved NAFLD, as evidenced by the effects on both lipid accumulation and fibrosis in the liver (Figure 9F-H). Consistently, the levels of AST and ALT were significantly lower after combined treatment with AAV-anti-Mir20b and fenofibrate than after a single treatment (Figure 9I, J). In addition, the expression of genes related to hepatic inflammation, such as Tnf and Il6 (Figure 9K), and fibrosis, such as Acta2, Col1a1, Fn, and Timp1 (Figure 9L), was further decreased by the combination of AAV-anti-Mir20b and fenofibrate. These results suggest that AAV-anti-Mir20b may increase the efficacy of fenofibrate, especially its effect on fibrosis, and provide a more effective option for improving NAFLD/NASH.”

    1. Author Response:

      Reviewer #1 (Public Review):

      Cell surface proteins are of vital interest in the functions and interactions of cells and their neighbors. In addition, cells manufacture and secrete small membrane vesicles that appear to represent a subset of the cell surface protein composition.

      Various techniques have been developed to allow the molecular definition of many cell surface proteins, but most rely on the special chemistry of amino acid residues on the parts of membrane proteins exposed to the cell exterior.

      In this report Kirkemo et al. have devised a method that more comprehensively samples the cell surface protein composition by relying on the membrane insertion or protein glycan adhesion of an enzyme that attaches a biotin group to a nearest neighbor cellular protein. The result is a more complex set of proteins and distinctive differences between normal cells and myc oncogene tumor cells and their secreted extracellular vesicle counterparts. These results may be applied to the identification of unique cell surface determinants in tumor cells that could be targets for immune or drug therapy. The results may be strengthened by a more thorough evaluation of the different EV membrane species represented in the broad collection of EVs used in this investigation.

      We thank the reviewer for recognizing the importance of the work outlined in the manuscript. We have addressed the necessary improvements in the essential revisions section above.

      Reviewer #2 (Public Review):

      This paper describes two methods for labeling cell-surface proteins. Both methods involve tethering an enzyme to the membrane surface to probe the proteins present on cells and exosomes. Two different enzyme constructs are used: a single strand lipidated DNA inserted into the membrane that enables binding of an enzyme conjugated to a complementary DNA strand (DNA-APEX2) or a glycan-targeting binding group conjugated to horseradish peroxidase (WGA-HRP). Both tethered enzymes label proteins on the cell surface using a biotin substrate via a radical mechanism. The method provides significantly enhanced labeling efficiency and is much faster than traditional chemical labeling methods and methods that employ soluble enzymes. The authors comprehensively analyze the labeled proteins using mass spectrometry and find multiple proteins that were previously undetectable with chemical methods and soluble enzymes. Furthermore, they compare the labeling of both cells and the exosomes that are formed from the cells and characterize both up- and down-regulated proteins related to cancer development that may provide a mechanistic underpinning.

      Overall, the method is novel and should enable the discovery of many low-abundance cell-surface proteins through more efficient labeling. The DNA-APEX2 method will only be accessible to more sophisticated laboratories that can carry out the protocols, but the WGA-HRP method employs a readily available commercial product and gives equivalent, perhaps even better, results. In addition, the method cannot discriminate between proteins that are genuinely expressed on the cell from those that are non-specifically bound to the cell surface.

      The authors describe the approach and identify two unique proteins on the surface of prostate cell lines.

      Strengths:

      Good introduction with appropriate citations of relevant literature. Much higher labeling efficiency and faster than chemical methods and soluble enzyme methods. Ability to detect low-abundance proteins not accessible with previous labeling methods.

      Weaknesses: The DNA-APEX2 method requires specialized reagents and protocols that are much more challenging for a typical laboratory to carry out than conventional chemical labeling methods.

      The claims and findings are sound. The finding of novel proteins and the quantitative measurement of protein up- and down-regulation are important. The concern about non-specifically bound proteins could be addressed by looking at whether the detected proteins have a transmembrane region that would enable them to localize in the cell membrane.

      We thank the reviewer for recognizing the strengths and importance of this work. We also thank the reviewer for mentioning the issue of non-specifically bound proteins. As addressed above in the essential revisions sections, we believe that any low affinity, non-specific binding proteins are likely removed in the multiple wash/centrifugation steps on cells or the multiple centrifugation steps and sucrose gradient purification on EVs. Given the likelihood for removal of non-specific binders, we believe that the secreted proteins identified are likely high affinity interactions and that their differential expression on either cells or EVs plays an important part in the downstream biology of both sample types. However, the previous data presentation did not clarify which proteins pertained to the transmembrane plasma membrane proteome versus secreted protein forms. For further clarity in the data presentation (Figure 3D, 4D, 5D), we have bolded proteins that are also found in the SURFY database, which only includes surface annotated proteins with a predicted transmembrane domain (Bausch-Fluck et al., The in silico human surfaceome. PNAS. 2018). We have also italicized proteins that are annotated to be secreted from the cell to the extracellular space (Uniprot classification). We have updated the text and caption as shown below:

      New Figure 3:

      Figure 3. WGA-HRP identifies a number of enriched markers on Myc-driven prostate cancer cells. (A) Overall scheme for biotin labeling, and label-free quantification (LFQ) by LC-MS/MS for RWPE-1 Control and Myc over-expression cells. (B) Microscopy image depicting morphological differences between RWPE-1 Control and RWPE-1 Myc cells after 3 days in culture. (C) Volcano plot depicting the LFQ comparison of RWPE-1 Control and Myc labeled cells. Red labels indicate upregulation in the RWPE-1 Control cells over Myc cells and green labels indicate upregulation in the RWPE-1 Myc cells over Control cells. All colored proteins are 2-fold enriched in either dataset between four replicates (two technical, two biological, p<0.05). (D) Heatmap of the 15 most upregulated transmembrane (bold) or secreted (italics) proteins in RWPE-1 Control and Myc cells. Scale indicates intensity, defined as (LFQ Area - Mean LFQ Area)/standard deviation. Extracellular proteins with annotated transmembrane domains are bolded and annotated secreted proteins are italicized. (E) Table indicating fold-change of most differentially regulated proteins by LC-MS/MS for RWPE-1 Control and Myc cells. (F) Upregulated proteins in RWPE-1 Myc cells (Myc, ANPEP, Vimentin, and FN1) are confirmed by western blot. (G) Upregulated surface proteins in RWPE-1 Myc cells (Vimentin, ANPEP, FN1) are detected by immunofluorescence microscopy. The downregulated protein HLA-B by Myc over-expression was also detected by immunofluorescence microscopy. All western blot images and microscopy images are representative of two biological replicates. Mass spectrometry data is based on two biological and two technical replicates (N = 4).
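      The heatmap scale defined in panel (D), (LFQ Area - Mean LFQ Area)/standard deviation, is a per-protein z-score. The following is a minimal sketch of that scaling, not the authors' analysis code. The function name is hypothetical, and the use of the sample standard deviation is an assumption, since the caption does not say which estimator was used.

```python
from statistics import mean, stdev


def row_z_scores(lfq_areas):
    """Scale one protein's LFQ areas as (area - mean) / standard deviation.

    lfq_areas: LFQ areas for a single protein across samples (two or more values).
    Assumption: sample standard deviation (n - 1 denominator).
    """
    mu = mean(lfq_areas)
    sd = stdev(lfq_areas)
    if sd == 0:
        # A protein with identical areas in every sample carries no contrast.
        return [0.0 for _ in lfq_areas]
    return [(area - mu) / sd for area in lfq_areas]


# Illustrative values only, not data from the study.
print(row_z_scores([1.0, 2.0, 3.0]))
```

      Scaling each protein separately puts high- and low-abundance proteins on a common color scale, so the heatmap reflects relative differences between samples rather than absolute signal.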

      New Figure 4:

      Figure 4. WGA-HRP identifies a number of enriched markers on Myc-driven prostate cancer EVs. (A) Workflow for small EV isolation from cultured cells. (B) Labeled proteins indicating canonical exosome markers (ExoCarta Top 100 List) detected after performing label-free quantification (LFQ) from whole EV lysate. The proteins are graphed from least abundant to most abundant. (C) Workflow of exosome labeling and preparation for mass spectrometry. (D) Heatmap of the 15 most upregulated proteins in RWPE-1 Control or Myc EVs. Scale indicates intensity, defined as (LFQ Area - Mean LFQ Area)/SD. Extracellular proteins with annotated transmembrane domains are bolded and annotated secreted proteins are italicized. (E) Table indicating fold-change of most differentially regulated proteins by LC-MS/MS for RWPE-1 Control and Myc cells. (F) Upregulated proteins in RWPE-1 Myc EVs (ANPEP and FN1) are confirmed by western blot. Mass spectrometry data is based on two biological and two technical replicates (N = 4). Due to limited sample yield, one replicate was performed for the EV western blot.

      New Figure 5:

      Figure 5. WGA-HRP identifies a number of EV-specific markers that are present regardless of oncogene status. (A) Matrix depicting samples analyzed during LFQ comparison--Control and Myc cells, as well as Control and Myc EVs. (B) Principal component analysis (PCA) of all four groups analyzed by LFQ. Component 1 (50.4%) and component 2 (15.8%) are depicted. (C) Functional annotation clustering was performed using DAVID Bioinformatics Resource 6.8 to classify the major constituents of component 1 in PCA analysis. (D) Heatmap of the 25 most upregulated proteins in RWPE-1 cells or EVs. Proteins are listed in decreasing order of expression with the most highly expressed proteins in EVs on the far left and the most highly expressed proteins in cells on the far right. Scale indicates intensity, defined as (LFQ Area - Mean LFQ Area)/SD. Extracellular proteins with annotated transmembrane domains are bolded and annotated secreted proteins are italicized. (E) Table indicating fold-change of most differentially regulated proteins by LC-MS/MS for RWPE-1 EVs compared to parent cells. (F) Western blot showing the EV-specific markers ITIH4, IGSF8, and MFGE8. Mass spectrometry data is based on two biological and two technical replicates (N = 4). Due to limited sample yield, one replicate was performed for the EV western blot.

      Authors mention time-sensitive changes but it is unclear how this method would enable one to obtain this kind of data. How would this be accomplished? The statement "Due to the rapid nature of peroxidase enzymes (1-2 min), our approaches enable kinetic experiments to capture rapid changes, such as binding, internalization, and shuttling events." Yes, it is faster, but not sure I can think of an experiment that would enable one to capture such events.

      We thank the reviewer for this comment and giving us an opportunity to elaborate on the types of experiments enabled by this new method. A previous study (Y, Li et al. Rapid Enzyme-Mediated Biotinylation for Cell Surface Proteome Profiling. Anal. Chem. 2021) showed that labeling the cell surface with soluble HRP allowed the researchers to detect immediate surface protein changes in response to insulin treatment. They demonstrated differential surfaceome profiling changes at 5 minutes vs 2 hours following treatment with insulin. Only methods utilizing these rapid labeling enzymes could allow for this type of resolution. A few other biological settings that experience rapid cell surface changes are: response to drug treatment, T-cell activation and synapse formation (S, Valitutti, et al. The space and time frames of T cell activation at the immunological synapse. FEBS Letters. 2010) and GPCR activation (T, Gupte et al. Minute-scale persistence of a GPCR conformation state triggered by non-cognate G protein interactions primes signaling. Nat. Commun. 2019). We also believe the method would be useful for post-translational processes where proteins are rapidly shuttling to the cell surface. We have updated the discussion to elaborate on these types of experiments.

      "Due to the fast kinetics of peroxidase enzymes (1-2 min), our approaches could enable kinetic experiments to capture rapid post-translational trafficking of surface proteins, such as response to insulin, certain drug treatments, T-cell activation and synapse formation, and GPCR activation."

      The authors do not have any way to differentiate between proteins expressed by cells and presented on their membranes from proteins that non-specifically bind to the membrane surface. Non-specific binding (NSB) is not addressed. Proteins can non-specifically bind to the cell or EV surface. The results are obtained by comparisons (cells vs exosomes, controls vs cancer cells), which is fine because it means that what is being measured is differentially expressed, so even NSB proteins may be up- and down-regulated. But the proteins identified need to be confirmed. For example, are all the proteins being detected transmembrane proteins that are known to be associated with the membrane?

      As mentioned above, we utilized the most rigorous informatics analysis available (Uniprot and SURFY) to annotate the proteins we find as having a signal sequence and/or TM domain. Data shown in heatmaps are based off of significance (p < 0.05) across all four replicates, which supports that any secreted proteins present are likely due to actual biological differences between oncogenic status and/or sample origin (i.e. EV vs cell). We have addressed this point in a previous comment above.

      The term "extracellular vesicles" (EVs) might be more appropriate than "exosomes" to describe the studied preparation.

      As we describe above in response to earlier comments, we have systematically changed from using exosomes to small extracellular vesicles and better defined the isolation procedure that we used in the methods section.

      Reviewer #3 (Public Review):

      The article by Kirkemo et al explores approaches to analyse the surface proteome of cells or cell-derived extracellular vesicles (EVs, called here exosomes, but the more generic term "extracellular vesicles" would be more appropriate because the used procedure leads to co-isolation of vesicles of different origin), using tools to tether proximity-biotinylation enzymes to membranes. The authors determine the best conditions for surface labeling of cells, and demonstrate that tethering the enzymes (APEX or HRP) increases the number of proteins detected by mass-spectrometry. They further use one of the two approaches (where HRP binds to glycans), to analyse the biotinylated proteome of two variants of a prostate cancer cell line, and the corresponding EVs. The approaches are interesting, but their benefit for analysis of cells or EVs is not very strongly supported by the data.

      First, the authors honestly show (fig2-suppl figures) that only 35% of the proteins identified after biotinylation with their preferred tool actually correspond to annotated surface proteins. This is only slightly better than results obtained with a non-tethered sulfo-NHS-approach (30%).

      We thank the reviewer for this comment. The reason we utilize membrane protein enrichment methods is that membrane protein abundance is low compared to cytosolic proteins and their identification can be overwhelmed by cytosolic contaminants. Nonetheless, despite our best efforts to limit labeling to the membrane proteins, cytosolic proteins can carry over. Thus, we utilize informatics methods to identify the proteins that are annotated to be membrane associated. The Uniprot GOCC (Gene Ontology Cellular Component) Plasma Membrane database is the most inclusive of membrane proteins, only requiring that they contain either a signal sequence, transmembrane domain, GPI anchor, or other membrane-associated motifs, yielding a total of 5,746 proteins. This will include organelle membrane proteins. It is known that proteins can traffic from the internal organelles to the cell surface, so these can be bona fide cell surface proteins too. To increase the informatics stringency for membrane proteins, we have now applied a new database aggregated from work by the Wollscheid lab, called SURFY (Bausch-Fluck et al., The in silico human surfaceome. PNAS. 2018). This is a machine learning method trained on 735 high confidence membrane proteins from the Cell Surface Protein Atlas (CSPA). SURFY predicts a total of 2,886 cell surface proteins. When we filter our data using SURFY for proteins, peptides and label free quantitation (LFQ) area for three methods, we find that the difference between NHS-Biotin and WGA-HRP expands considerably (see new Figure 3-Supplemental Figure 1 below). We observe these differences when the datasets are searched with either the GOCC Plasma Membrane database or the entire human Uniprot database. The difference is especially large for LFQ analysis, which quantitatively scores peptide intensity as opposed to simply counting the number of hits as for protein and peptide analysis. Cytosolic carryover is the major disadvantage of NHS-Biotin, which suppresses signal strength and is reflected in the lower LFQ values (24% for NHS-biotin compared to 40% for WGA-HRP). We have updated the main text and supplemental figure below:

      "Both WGA-HRP and biocytin hydrazide had similar levels of cell surface enrichment on the peptide and protein level when cross-referenced with the SURFY curated database for extracellular surface proteins with a predicted transmembrane domain (Figure 3 - Figure supplement 1A). Sulfo-NHS-LC-LC-biotin and whole cell lysis returned the lowest percentage of cell surface enrichment, suggesting a larger portion of the total sulfo-NHS-LC-LC-biotin protein identifications were of intracellular origin, despite the use of the cell-impermeable format. These same enrichment levels were seen when the datasets were searched with the curated GOCC-PM database, as well as the Uniprot entire human proteome database (Figure 3 - Figure supplement 1B). Importantly, of the proteins quantified across all four conditions, biocytin hydrazide and WGA-HRP returned higher overall intensity values for SURFY-specified proteins than either sulfo-NHS-LC-LC-biotin or whole cell lysis. Importantly, although biocytin hydrazide shows slightly higher cell surface enrichment compared to WGA-HRP, we were unable to perform the comparative analysis at 500,000 cells--instead requiring 1.5 million--as the protocol yielded too few cells for analysis."

      Figure 3-Figure Supplement 1. Comparison of surface enrichment between replicates for different mass spectrometry methods. (A) The top three methods (NHS-Biotin, Biocytin Hydrazide, and WGA-HRP) were compared for their ability to enrich cell surface proteins on 1.5 M RWPE-1 Control cells by LC-MS/MS after being searched with the Uniprot GOCC Plasma Membrane database. Shown are enrichment levels on the protein, peptide, and average MS1 intensity of top three peptides (LFQ area) levels. (B) The top three methods (NHS-Biotin, Biocytin Hydrazide, and WGA-HRP) were compared for their ability to enrich cell surface proteins on 1.5 M RWPE-1 Control cells by LC-MS/MS after being searched with the entire human Uniprot database. Shown are enrichment levels on the protein, peptide, and average MS1 intensity of top three peptides (LFQ area) levels. Proteins or peptides detected from cell surface annotated proteins (determined by the SURFY database) were divided by the total number of proteins or peptides detected. LFQ areas corresponding to cell surface annotated proteins (SURFY) were divided by the total area sum intensity for each sample. The corresponding percentages for two biological replicates were plotted.
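      The three enrichment percentages described in this legend (protein, peptide, and LFQ area) can be sketched as follows. This is an illustrative reading of the legend rather than the authors' pipeline: the function name, the input layout (protein ID mapped to a peptide count and an LFQ area), and the protein IDs in the usage example are all hypothetical.

```python
def surface_enrichment(sample, surface_ids):
    """Percent of a sample attributable to surface-annotated proteins.

    sample: dict mapping protein ID -> (peptide_count, lfq_area).
    surface_ids: set of IDs annotated as cell surface (e.g. a SURFY-style list).
    Returns percentages on the protein, peptide, and LFQ area levels.
    """
    total_peptides = sum(n for n, _ in sample.values())
    total_area = sum(area for _, area in sample.values())
    if not sample or total_peptides == 0 or total_area == 0:
        raise ValueError("sample has no quantified proteins")

    surface = [v for pid, v in sample.items() if pid in surface_ids]
    return {
        # Surface proteins divided by all proteins detected.
        "protein_pct": 100.0 * len(surface) / len(sample),
        # Peptides from surface proteins divided by all peptides detected.
        "peptide_pct": 100.0 * sum(n for n, _ in surface) / total_peptides,
        # Surface LFQ area divided by the total area sum intensity.
        "lfq_area_pct": 100.0 * sum(a for _, a in surface) / total_area,
    }


# Hypothetical IDs and values, for illustration only.
toy_sample = {"P1": (4, 60.0), "P2": (2, 20.0), "P3": (2, 20.0), "P4": (2, 100.0)}
print(surface_enrichment(toy_sample, {"P1", "P2"}))
```

      In this toy sample the protein-level and area-level percentages differ (50% versus 40%), which illustrates why an intensity-weighted measure can separate methods that look similar by identification counts alone.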

      There are additional advantages to WGA-HRP over NHS-biotin. These include: (i) labeling time is 2 min versus 30 min, which would afford higher kinetic resolution as needed, and (ii) the NHS-biotin labels lysines, which hinders tryptic cleavage and downstream peptide analysis, whereas the WGA-HRP labels tyrosines, eliminating impacts on tryptic patterns. WGA-HRP is slightly below biocytin hydrazide in peptide and protein ID and somewhat more by LFQ. However, there are significant advantages over biocytin hydrazide: (i) sample size for WGA-HRP can be reduced by a factor of 3-5 because of cell loss during the multiple washing steps after periodate oxidation and hydrazide labeling, (ii) the time of labeling is dramatically reduced from 3 hr for hydrazide to 2 min for WGA-HRP, and (iii) the HRP enzyme has a large labeling diameter (20-40 nm, but also reported up to 200 nm) and can label non-glycosylated membrane proteins as opposed to biocytin hydrazide that only labels glycosylated proteins. The hydrazide method is the current standard for membrane protein enrichment, and we feel that the WGA-HRP will compete especially when cell sample size is limited or requires special handling. In the case of EVs, we were not able to perform hydrazide labeling due to the two-step process and small sample size.

      Indeed the list of identified proteins in figures 4 and 5 include several proteins whose expected subcellular location is internal, not surface exposed, and whose location in EVs should also be inside (non-exhaustively: SDCBP = syntenin, PDCD6IP = Alix, ARRDC1, VPS37B, NUP35 = nucleopore protein)…

      We thank the reviewer for this comment. We have elaborated on this point in a number of response paragraphs above. The proteins that the reviewer points out are annotated as “plasma membrane” in the very inclusive GOCC plasma membrane database. However, this means that they may also spend time in other locations in the cell or reside on organelle membranes. We have done further analysis to remove any intracellular membrane residing proteins that are included in the GOCC plasma membrane database, including the five proteins mentioned above. We also have further highlighted proteins that appear in the SURFY database, as discussed above and in our response to Reviewer 2’s comment. To increase stringency, we have bolded proteins that are found in the more selective SURFY database and italicized secreted proteins. Due to our new analysis and data presentation, it is more clear which markers are bona fide extracellular resident membrane proteins. We have updated the Figures and Figure legends as mentioned above, as well as added this statement in the Data Processing and Analysis methods:

      "Additionally, to not miss any key surface markers such as secreted proteins or anchored proteins without a transmembrane domain, we chose to initially avoid searching with a more stringent protein list, such as the curated SURFY database. However, following the analysis, we bolded proteins found in the SURFY database and italicized proteins known to be secreted (Uniprot)."

      The membrane proteins identified as different between the control and Myc-overexpressing cells or their EVs, would have been identified as well by a regular proteomic analysis.

      To directly compare surfaceomes of EVs to cells, we are compelled to use the same proteomic method. For parental cell surfaceomic analysis, a membrane enrichment method is required due to the high levels of cytosolic proteins that swamp out signal from membrane proteins. Although EVs have a higher proportion of membrane to cytosol, whole EV proteomics would still have significant cytosolic contamination.

      Second, the title highlights the benefit of the technique for small-scale samples: this is demonstrated for cells (figures 1-2), but not for EVs: no clear quantitative indication of amount of material used is provided for EV samples. Furthermore, no comparison with other biotinylation technics such as sulfo-NHS is provided for EVs/exosomes. Therefore, it is difficult to infer the benefit of this technic applied to the analysis of EVs/exosomes.

      We appreciate the reviewer for this comment. We have updated the methods as mentioned above in our response to the Essential Revisions. In brief, the yield of EVs post-sucrose gradient isolation was 3-5 µg of protein from 16x15 cm2 plates of cells, totaling 240 mL of media. Since we had previously demonstrated that our method was superior to sulfo-NHS for enriching surface proteins on cells, we proceeded to use the WGA-HRP for the EV labeling experiments.

      In addition, the WGA-based tethering approach, which is the only one used for the comparative analysis of figures 4 and 5, possibly induces a bias towards identification of proteins with a particular glycan signature: a novelty would possibly have come from a comparison of this approach with the other initially evaluated, the DNA-APEX one, where tethering is induced by lipid moieties, thus should not depend on glycans. The authors may have then identified by LC-MS/MS specific glycan-associated versus non-glycan-associated proteins in the cells or EVs membranes. Also ideally, the authors should have compared the 4 combinations of the 2 enzymes (APEX and HRP) and 2 tethers (lipid-bound DNA and WGA) to identify the bias introduced by each one.

      We thank the reviewer for this comment. We performed analysis to determine whether there was a bias towards Uniprot annotated “Glyco” vs “Non-Glyco” surface proteins within the SURFY database identified across the WGA-HRP, APEX2-DNA, APEX2, and HRP labeling methods. We performed this analysis by measuring the total LFQ area detected for each category (glycoprotein vs non-glycoprotein) and dividing that by the total LFQ area found across all proteins detected in the sample. We found similar normalized areas of non-glyco surface proteins between WGA-HRP and APEX2-DNA suggesting there is not a bias against non-glycosylated proteins in the WGA-HRP sample. There were slightly elevated levels of Glycoproteins in the WGA-HRP sample over APEX2-DNA. It is not surprising to us that there is little bias because the free-radicals generated by biotin-tyramide can label over tens of nanometers and thus can label not just the protein they are attached to, but neighbors also, regardless of glycosylation status. We have added this as Figure 2-Supplement 3, and amended the text in the manuscript below in purple.

      Figure 2 – Figure Supplement 3: Comparison of enrichment of Glyco- vs Non-Glyco-proteins. (A) TIC area of Uniprot annotated Glycoproteins compared to Non-Glycoproteins in the SURFY database for each labeling method compared to total TIC area. There was not a significant difference in detection of Non-Glycoproteins detected between WGA-HRP and APEX2-DNA and only a slightly higher detection of Glycoproteins in the WGA-HRP sample over APEX2-DNA.

      "As the mode of tethering WGA-HRP involves GlcNAc and sialic acid glycans, we wanted to determine whether there was a bias towards Uniprot annotated 'Glycoprotein' vs 'Non-Glycoprotein' surface proteins identified across the WGA-HRP, APEX2-DNA, APEX2, and HRP labeling methods. We looked specifically at surface proteins found in the SURFY database, which is the most restrictive surface database and requires that proteins have a predicted transmembrane domain (Bausch-Fluck et al., The in silico human surfaceome. PNAS. 2018). We performed this analysis by measuring the average MS1 intensity across the top three peptides (area) for SURFY glycoproteins and non-glycoproteins for each sample and dividing that by the total LFQ area found across all GOCC annotated membrane proteins detected in each sample. We found similar normalized areas of non-glyco surface proteins across all samples (Figure 2 - Figure supplement 4). If a bias existed towards glycosylated proteins in WGA-HRP compared to the glycan agnostic APEX2-DNA sample, then we would have seen a larger percentage of non-glycosylated surface proteins identified in APEX2-DNA over WGA-HRP. Due to the large labeling radius of the HRP enzyme, we find it unsurprising that the WGA-HRP method is able to capture non-glycosylated proteins on the surface to the same degree (Rees et al. Selective Proteomic Proximity Labeling Assay SPPLAT. Current Protocols in Protein Science. 2015). There is a slight increase in the area percentage of glycoproteins detected in the WGA-HRP compared to the APEX2-DNA sample but this is likely due to the fact that a greater number of surface proteins in general are detected with WGA-HRP."

      As presented the article is thus an interesting technical description, which does not convince the reader of its benefit to use for further proteomic analyses of EVs or cells. Such info is of course interesting to share with other scientists as a sort of "negative" or "neutral" result. Maybe a novelty of the presented work is the differential proteome analysis of surface enriched EV/cell proteins in control versus myc-expressing cells. Such analyses of EVs from different derivatives of a tumor cell line have been performed before, for instance comparing cells with different K-Ras mutations (Demory-Beckler, Mol Cell proteomics 2013 # 23161513). However, here the authors compare also cells and EVs, and find possibly interesting discrepancies in the upregulated proteins. These results could probably be exploited more extensively. For instance, authors could give clearer info (lists) on the proteins differentially regulated in the different comparisons: in EVs from both cells, in EVs vs cells, in both cells.

      We appreciate the reviewer for this critique and have updated the manuscript accordingly. We have changed the title to “Cell surface tethered promiscuous biotinylators enable small-scale comparative surface proteomic analysis of human extracellular vesicles and cells” to more accurately depict the focus of our manuscript which, as the reviewer highlighted, is that this technology allows for comparative analysis between the surfaceomes of cells vs EVs. We appreciate the fine work from the Coffey lab on whole EV analysis of KRAS transformed cells. They identified a mix of surface and cytosolic proteins that change in EVs from the transformed cells, whereas our data focuses specifically on the surfaceome differences in Myc transformed and non-transformed cells and corresponding small EVs. We believe this makes important contributions to the field as well.

      To further address the reviewer’s suggestions, we additionally have significantly reorganized the figures to better display the differentially regulated proteins. We have removed the volcano plots and instead included heatmaps with the top 30 (Figure 3 and Figure 4) and top 50 (Figure 5) differentially regulated proteins across cells and EVs. We have also updated the lists of proteins in the supplemental source tables section. See responses to Reviewer 2 above for the updates to Figures 3-5. We have additionally included supplemental figures with lists of differentially upregulated proteins in the EV and Cell samples, which are shown below:

      Figure 3 – Supplement 3: List of proteins comparing enriched targets (>2-fold) in Myc cells versus Control cells. Targets that were found enriched (Myc/Control) in the Control cells (left) and Myc cells (right). The fold-change between Myc cells and Control cells is listed in the column to the right of the gene name.

      Figure 4 – Supplement 1: List of proteins comparing enriched targets (>1.5-fold) in Myc EVs versus Control EVs. Targets that were found enriched (Myc/Control) in the Control EVs (left) and Myc EVs (right). The fold-change between Myc EVs and Control EVs is listed in the column to the right of the gene name.

      Figure 4 – Figure Supplement 2: Venn diagram comparing enriched targets (>2-fold) in Cells and EVs. (A) Targets that were found enriched in the Control EVs (purple) and Control cells (blue) when each is separately compared to Myc EVs and Myc cells, respectively. The 5 overlapping enriched targets in common between Control cells and Control EVs are listed in the center. (B) Targets that were found enriched in the Myc EVs (purple) and Myc cells (blue) when each is separately compared to Control EVs and Control cells, respectively. The 12 overlapping enriched targets in common between Myc cells and Myc EVs are listed in the center.

      Figure 5 – Supplement 1: List of proteins comparing enriched targets (>2-fold) in Control EVs versus Control cells and Myc EVs versus Myc cells. (A) Targets that were found enriched (EV/cell) in the Control samples are listed. The fold-change values between Control EVs and Control cells are listed in the column to the right of the gene name. (B) Targets that were found enriched (EV/cell) in the Myc samples are listed. The fold-change values between Myc EVs and Myc cells are listed in the column to the right of the gene name.
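
      The thresholded lists and overlaps described in these legends can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `enriched`, the protein labels, and all abundance values are hypothetical, and the only thing taken from the legends is the idea of keeping targets above a fold-change cutoff and then intersecting the cell and EV lists.

```python
def enriched(abundance_a, abundance_b, min_fold=2.0):
    """Return {protein: fold_change} for proteins where a / b exceeds min_fold.

    Only proteins quantified in both samples, with a positive denominator,
    are compared. This is an illustrative helper, not the authors' code.
    """
    hits = {}
    for protein, a in abundance_a.items():
        b = abundance_b.get(protein)
        if b is None or b <= 0:
            continue
        fold = a / b
        if fold > min_fold:
            hits[protein] = fold
    return hits


# Hypothetical abundances in arbitrary units (not data from the manuscript).
myc_cells = {"P1": 8.0, "P2": 3.0, "P3": 1.0, "P4": 10.0}
ctrl_cells = {"P1": 2.0, "P2": 2.0, "P3": 4.0, "P4": 4.0}
myc_evs = {"P1": 9.0, "P2": 1.0, "P3": 2.0, "P4": 3.0}
ctrl_evs = {"P1": 3.0, "P2": 1.0, "P3": 2.0, "P4": 3.0}

# Targets enriched more than 2-fold in Myc versus Control, per sample type.
up_in_myc_cells = enriched(myc_cells, ctrl_cells)
up_in_myc_evs = enriched(myc_evs, ctrl_evs)

# Venn-style overlap: targets enriched in both Myc cells and Myc EVs.
shared = sorted(set(up_in_myc_cells) & set(up_in_myc_evs))
```

      Swapping the two arguments of `enriched` gives the Control-enriched lists, and a cutoff of 1.5 instead of 2 would correspond to the threshold quoted for the EV comparison.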

    1. Author Response:

      Reviewer #1 (Public Review):

      Roy et al. investigated glucose-induced changes of selected neutrophil functions using neutrophils from diabetic mice and glucose-exposed murine and human neutrophils. They reconfirm earlier findings that glucose renders neutrophils less responsive to fMLF-mediated chemotaxis and show that expression and surface presentation of the corresponding receptor FPR1, a chemotactic receptor that is high in the signaling hierarchy, is downregulated within the first hour of glucose treatment. Similarly, other elements of neutrophil chemotactic responses including the phospholipase PLC and the cytokine MIP-1/CCL3 are also affected, while the expression of the chemokine receptor CCR1 remains unaltered. Interestingly, supplementing the CCR1-targeting cytokine CCL3 could restore neutrophil chemotactic fitness and wound healing and thus, might be beneficial for diabetic wound management.

      Conclusions are supported by the data but the study, in its current stage, needs further analysis. The findings suggest a more general effect on the neutrophil expression pattern induced by glucose and unfortunately, this is not addressed and mechanistic insights to explain the observed effects are entirely missing. The finding that CCL3 levels are reduced and that external addition brings neutrophil chemotactic response back to normal is of high translational potential.

      We thank the reviewer for their thorough evaluation of our manuscript. As the reviewer appreciates, no single manuscript can possibly address all questions, but impactful manuscripts, as we believe ours is, often open new areas for follow-up research. We acknowledge that our manuscript does not address all the interesting questions that were raised by the reviewers and the editor, but our data do reveal important information regarding the culprit responsible for the impaired chemotaxis responses in diabetic neutrophils, which had been known but ignored for decades (10), and how this impairment adversely impacts the dynamics of neutrophil trafficking into diabetic wounds during the acute phase of healing, early after injury. Our data establish a new paradigm that blames an inadequate neutrophil response early after injury (not excess neutrophils in chronic diabetic ulcers) for rendering diabetic wounds vulnerable to infection, which in turn contributes to setting the stage for the sustained and non-resolving inflammatory environment as diabetic wounds age and become chronic. We further show that neutrophil depletion in diabetic animals renders diabetic wounds significantly more vulnerable to infection, indicating that, as impaired as diabetic neutrophils may be with respect to their bactericidal functions, as has been reported (11, 12), they still maintain some antimicrobial functions. Finally, we show that by jumpstarting the neutrophil influx into diabetic wounds through the use of CCL3, which engages the CCR1, CCR4, and CCR5 auxiliary receptors (13-15), we can reduce infection levels in diabetic wounds by ~2 log orders in a manner that is completely dependent on neutrophil influx into the wound, and significantly stimulate healing.

      As the reviewer is aware, there is only one FDA-approved therapy (Becaplermin) showing modest effectiveness in stimulating wound healing in diabetic wounds (16-21), and there are no treatments to address infection in diabetic wounds, except for the use of systemic antibiotics, which are routinely included in the management of diabetic patients with chronic ulcers (22, 23). Antibiotic overuse can have disastrous consequences, leading to the emergence and spread of antibiotic resistance, cytotoxicity, allergic reactions, and immunological and neurological diseases (24-29). Our data reveal therapeutic potential for CCL3 topical treatment to enhance infection control and stimulate healing in diabetic wounds.

      Reviewer #2 (Public Review):

      Diabetic wound closure remains a major clinical problem, in the sense that in diabetes, skin wounds do not repair on time and may get infected, installing a feed-forward cycle of inflammatory and tissue damage outcomes. This study chiefly demonstrates that this vicious cycle can be broken by ensuring an exuberant and effective neutrophilic response takes place.

      As such the data presented herein are exemplar in demonstrating: i) the substantial non-redundant role of physiological inflammation (that is, a good inflammatory response when needed is a good thing); ii) the fact that tissue inflammation resolves after a proper and effective neutrophil response and iii) a specific receptor target (FPR1) is affected by diabetes (high glucose) and is central to a defective inflammatory response to infective agents.

      All in all, I found the experimental model used very helpful in demonstrating this important physio-pathological link between: inflammation onset -> inflammation resolution.

      We appreciate the reviewer’s thorough evaluation of our manuscript and their recognition of the importance of our findings. We are also in complete agreement with the reviewer that proper inflammatory responses are absolutely a good thing when needed (i.e., early after injury or in response to infection) and that neutrophils also play an important role in initiating the resolution of inflammatory responses.

      Reviewer #3 (Public Review):

      In this study, Roy et al. have focussed on investigating what happens in infections associated with impaired healing in diabetic wounds. Specifically, they have identified that a certain type of white blood cell (i.e. a neutrophil) is dysfunctional, leading to a delay in its ability to help fight off the infections. The findings of the study are interesting, and the therapeutic possibility of treating diabetic wounds with CCL3 is novel and is likely to be of interest to the field. However, the very low n numbers used in the study call into question the validity and robustness of the data.

      We apologize to the reviewer for the confusion caused by the figure legends. Nowhere in this manuscript did we rely on N=2 for our analyses, and we completely agree with the reviewers that N=2 is not statistically robust. We indicated N=2 for experiments involving RT-PCR, but we had repeated these experiments at least two independent times, so the actual number is N≥4. We have corrected the figure legends accordingly to address the reviewer’s concern.

    1. Author Response:

      Reviewer #2 (Public Review):

      Oberle et al. provide a detailed analysis of how descending projections from the auditory cortex interact with ascending auditory projections on neurons in the shell region of the inferior colliculus on a cellular basis. Using optogenetic activation of auditory cortical neurons or projections and electrical stimulation of fibres in combination with whole-cell patch clamp recordings in vivo and in vitro, they show that most neurons in the shell region of the inferior colliculus receive several monosynaptic cortical inputs. In vitro, these descending synapses show sublinear summation with a major tonic component for prolonged stimuli. Both in vivo and in vitro experiments support the idea that descending cortical inputs and ascending inputs from the central inferior colliculus temporally overlap and both activate NMDA and non-NMDA receptors. This cooperativity of inputs leads to supra-linear summation and boosting of the response.

      Strengths:

      • The manuscript provides a first detailed analysis of a loop between the cortex and midbrain. It elegantly combines in vivo and in vitro electrophysiological techniques to study this network on a cellular/synaptic level.

      • These experiments thoroughly characterize the nature of cortical and midbrain excitatory inputs onto shell IC neurons and elucidate how they integrate the ascending and descending inputs on a cellular level.

      Weaknesses:

      • A major weakness of this study is that they do not directly show that ascending and descending inputs to the IC shell neurons actually coincide, but only imply that this should be the case, considering different latency measurements. Latencies that are measured in the anesthetized preparation may change in the awake behaving animals which may change the timing of the respective inputs.

      We address this issue in our revision with new data showing that the latency of sound-evoked activity in the superficial IC is similar in anesthetized and awake mice. We acknowledge that the conduction velocity of descending axons may differ between anesthetized and awake states. However, existing data show that conduction velocities of cortical axons increase in the alert brain compared to non-alert conditions (Stoelzel et al., 2017). Taken together, we would expect an increased temporal coincidence of ascending and descending signals in awake compared to anesthetized animals, which all available evidence suggests would enhance NMDAR-dependent non-linearities such as those we described (Gasparini et al., 2004; Gasparini and Magee, 2006; Losonczy and Magee, 2006; Takahashi and Magee, 2009; Branco et al., 2010; Branco and Häusser, 2011). We have revised our Results to highlight that our latency measurements in anesthetized mice represent the upper bound for the arrival of auditory cortical EPSPs.

      In addition, the authors do not show to what extent coincidence of ascending and descending inputs to shell IC neurons is maintained for longer and more complex sounds as compared to click stimuli.

      Previous work shows that auditory cortico-collicular neurons sustain firing during long, complex sounds (Williamson and Polley, 2019), and our data show that descending transmission is maintained for extended periods of corticofugal activity both in vitro and in vivo (Figure 4E-H). Thus, we would expect temporal overlap of ascending and descending inputs to occur under these conditions as well. We agree that Reviewer #2 touches upon an important knowledge gap. However, we believe that a full investigation of which sounds do and do not engage descending modulation merits a separate, in-depth study.

      • The manuscript does not address the question of whether the different neuron types that they encounter in the shell region based on the firing pattern to current injections, vary in their input latencies, their number and distribution of NMDA receptors or their integrative properties. This may have some additional effect on how these neurons process ascending and descending information.

      We agree that correlating intrinsic and synaptic properties could reveal something interesting. However, our initial analyses (Figure 3) did not show any striking correlation between membrane biophysics and the half-width or amplitude of descending EPSPs. As such, we had no a priori basis to hypothesize that synaptic integration differs systematically with measurable membrane properties, and the low throughput of dual-pathway stimulation experiments (Figures 6 and 8) precluded collecting the large dataset needed to convincingly determine whether any synaptic non-linearity meaningfully correlates with cellular biophysics.

      We acknowledge this limitation of our study in our revised Discussion. Future studies, perhaps leveraging cell-type specific markers for different IC neurons (Goyer et al., 2019; Naumov et al., 2019; Silveira et al., 2020; Kreeger et al., 2021) will be required to clarify this issue.

      • The authors have not demonstrated that silencing of descending inputs from the AC affects IC shell activity.

      We did not initially perform this experiment given the extensive literature establishing that silencing auditory cortex modifies the magnitude, timing, and/or selectivity of IC neuron sound responses (Yan and Suga, 1999; Nwabueze-Ogbo et al., 2002; Popelár et al., 2003; Nakamoto et al., 2008, 2010; Anderson and Malmierca, 2013; Popelář et al., 2016; Weible et al., 2020). Indeed, these classic results were a major motivation for us to focus on the cellular mechanisms that support corticofugal transmission. We thus reasoned that a cortical inactivation experiment would be largely confirmatory of prior knowledge, and limited in its potential for mechanistic interpretation given the known caveats of cortical loss-of-function manipulations (Li et al., 2019; Andrei et al., 2021; Slonina et al., 2021). However, we acknowledge that such an experiment is useful to frame our cellular-level findings in a broader, systems-level context. As such, we address Reviewer #2’s concern in our revision with a new experiment demonstrating that auditory cortical silencing indeed affects sound-evoked activity in the IC of awake mice.

      Reviewer #3 (Public Review):

      Overall, this manuscript is generally nicely written and well-illustrated. I don't really have any major issues. I like the manuscript but I have a few comments and some issues that need to be addressed.

      My main concern is that the authors claim several times that the projections to the central nucleus of IC are weak and they neglect their potential functional role. I think this is a little bit unfortunate. It is true that the large AC projection primarily targets the cortical regions or shell of IC, but it is beyond doubt that it also targets the central nucleus (e.g. Saldaña's studies). We cannot know whether it is a weak projection or not without central nucleus recordings. Admittedly, these experiments would be challenging, so I would ask the authors to tone down a bit these comments throughout the ms. Also, the reason for the 'weak' projection to the central nucleus may be due to the size and location of the injections made in the auditory cortex. Thus, I would like to see the injection sites of Chronos if possible. Likewise, fig 1B is too small and of low quality (at least in my pdf file for review) to appreciate details of labeling. I would suggest that the authors make a separate figure showing the injection site in the AC and larger and clearer labeling in the IC.

      We agree that in vitro recordings from the central IC in adult mice are quite challenging. As suggested, we have toned down claims of the “weak” projection to central IC and provide micrographs of Chronos injection sites. We also concur that this is an important point. Thus, we include a new transsynaptic tracing experiment showing the somata of presumptive postsynaptic targets of auditory cortex neurons in the IC. Although the data show that the majority of cortico-recipient IC neurons are located in the shell regions, a few central IC neurons are indeed clearly labeled. Future studies will be required to test the extent and potency of this direct auditory cortex->central IC projection, and to compare the synaptic properties with our results in the shell IC.

      Also I wonder if the title of the manuscript should refer to the non-lemniscal IC as most of the data is related to this area.

      We have changed the title of the paper to Synaptic Mechanisms of Top-Down Control in the Non-Lemniscal Inferior Colliculus.

      While the dogma is that the descending projections are glutamatergic, the authors may care to consider a recently published paper https://www.frontiersin.org/articles/10.3389/fncir.2021.714780/full, which challenges this view by showing that inhibitory long-range VIP-GABAergic neurons target the IC. It would be interesting if the authors could comment on how this projection may have influenced the results of the present study.

      We thank Reviewer #3 for pointing out this new study, which does indeed relate to our work. However, we don’t think direct GABAergic projections contributed much, if at all, to our results. Indeed, the experiments of Figure 5A did not reveal any inhibitory postsynaptic potentials following bath application of NBQX, as one might expect from direct stimulation of VIP-GABA axons (these experiments were performed without SR95531 in the bath). Rather, it may be that the VIP-GABA synapses have low release probability, transmit mainly via non-synaptic diffusion (e.g., spillover), or primarily release the neuropeptide VIP, which would be difficult to detect via whole-cell patch-clamp electrophysiology. We now address the work of Bertero et al. in the Discussion section.

      References

      Anderson LA, Malmierca MS (2013) The effect of auditory cortex deactivation on stimulus-specific adaptation in the inferior colliculus of the rat. Eur J Neurosci 37:52–62.

      Andrei AR, Debes S, Chelaru M, Liu X, Rodarte E, Spudich JL, Janz R, Dragoi V (2021) Heterogeneous side effects of cortical inactivation in behaving animals. eLife 10:e66400.

      Branco T, Clark BA, Häusser M (2010) Dendritic discrimination of temporal input sequences in cortical neurons. Science 329:1671–1675.

      Branco T, Häusser M (2011) Synaptic integration gradients in single cortical pyramidal cell dendrites. Neuron 69:885–892.

      Gasparini S, Magee JC (2006) State-dependent dendritic computation in hippocampal CA1 pyramidal neurons. J Neurosci Off J Soc Neurosci 26:2088–2100.

      Gasparini S, Migliore M, Magee JC (2004) On the initiation and propagation of dendritic spikes in CA1 pyramidal neurons. J Neurosci Off J Soc Neurosci 24:11046–11056.

      Goyer D, Silveira MA, George AP, Beebe NL, Edelbrock RM, Malinski PT, Schofield BR, Roberts MT (2019) A novel class of inferior colliculus principal neurons labeled in vasoactive intestinal peptide-Cre mice. eLife 8:e43770.

      Kreeger LJ, Connelly CJ, Mehta P, Zemelman BV, Golding NL (2021) Excitatory cholecystokinin neurons of the midbrain integrate diverse temporal responses and drive auditory thalamic subdomains. Proc Natl Acad Sci U S A 118:e2007724118.

      Li N, Chen S, Guo ZV, Chen H, Huo Y, Inagaki HK, Chen G, Davis C, Hansel D, Guo C, Svoboda K (2019) Spatiotemporal constraints on optogenetic inactivation in cortical circuits. eLife 8:e48622.

      Losonczy A, Magee JC (2006) Integrative properties of radial oblique dendrites in hippocampal CA1 pyramidal neurons. Neuron 50:291–307.

      Nakamoto KT, Jones SJ, Palmer AR (2008) Descending projections from auditory cortex modulate sensitivity in the midbrain to cues for spatial position. J Neurophysiol 99:2347–2356.

      Nakamoto KT, Shackleton TM, Palmer AR (2010) Responses in the inferior colliculus of the guinea pig to concurrent harmonic series and the effect of inactivation of descending controls. J Neurophysiol 103:2050–2061.

      Naumov V, Heyd J, de Arnal F, Koch U (2019) Analysis of excitatory and inhibitory neuron types in the inferior colliculus based on Ih properties. J Neurophysiol 121:2126–2139.

      Nwabueze-Ogbo FC, Popelár J, Syka J (2002) Changes in the acoustically evoked activity in the inferior colliculus of the rat after functional ablation of the auditory cortex. Physiol Res 51 Suppl 1:S95–S104.

      Popelár J, Nwabueze-Ogbo FC, Syka J (2003) Changes in neuronal activity of the inferior colliculus in rat after temporal inactivation of the auditory cortex. Physiol Res 52:615–628.

      Popelář J, Šuta D, Lindovský J, Bureš Z, Pysanenko K, Chumak T, Syka J (2016) Cooling of the auditory cortex modifies neuronal activity in the inferior colliculus in rats. Hear Res 332:7–16.

      Silveira MA, Anair JD, Beebe NL, Mirjalili P, Schofield BR, Roberts MT (2020) Neuropeptide Y Expression Defines a Novel Class of GABAergic Projection Neuron in the Inferior Colliculus. J Neurosci 40:4685–4699.

      Slonina ZA, Poole KC, Bizley JK (2021) What can we learn from inactivation studies? Lessons from auditory cortex. Trends Neurosci:S0166-2236(21)00203-4.

      Stoelzel CR, Bereshpolova Y, Alonso J-M, Swadlow HA (2017) Axonal Conduction Delays, Brain State, and Corticogeniculate Communication. J Neurosci Off J Soc Neurosci 37:6342–6358.

      Takahashi H, Magee JC (2009) Pathway interactions and synaptic plasticity in the dendritic tuft regions of CA1 pyramidal neurons. Neuron 62:102–111.

      Weible AP, Yavorska I, Wehr M (2020) A Cortico-Collicular Amplification Mechanism for Gap Detection. Cereb Cortex N Y N 1991 30:3590–3607.

      Williamson RS, Polley DB (2019) Parallel pathways for sound processing and functional connectivity among layer 5 and 6 auditory corticofugal neurons. eLife 8:e42974.

      Yan J, Suga N (1999) Corticofugal Amplification of Facilitative Auditory Responses of Subcortical Combination-Sensitive Neurons in the Mustached Bat. J Neurophysiol 81:817–824.

    1. Author Response:

      Reviewer #1:

      Regression models are a widespread statistical technique used in epidemiological studies. Most commonly used regression models do not explicitly parameterize the relationship between the independent variables and the variance or skewness of the dependent variable. Generalized Additive Models for Location, Scale and Shape (GAMLSS) is a regression technique that provides the flexibility to estimate parameters of the dependent variable distribution (mean, median, variance, etc.) as a function of independent variables. This manuscript uses data from the 1970 British birth cohort study to showcase the use of GAMLSS in epidemiological studies and further compares the results to quantile regressions.
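
      The idea of letting a covariate act on the spread as well as the mean can be sketched in its simplest special case: a normal outcome with a single binary risk factor, where mu = b0 + b1*x and log(sigma) = g0 + g1*x. Because both parts of this model are saturated, the maximum-likelihood estimates reduce to group means and group standard deviations. This is an illustration only, not the GAMLSS software or the manuscript's analysis; the function names and the data are hypothetical.

```python
import math


def fit_group(values):
    """Maximum-likelihood normal fit for one group: mean and SD (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(var)


def location_scale_fit(outcome, exposed):
    """Fit y ~ Normal(mu, sigma) with mu = b0 + b1*x and log(sigma) = g0 + g1*x.

    x must be binary (0/1). With one binary covariate the model is saturated,
    so the estimates follow directly from the per-group means and SDs.
    """
    y0 = [y for y, x in zip(outcome, exposed) if x == 0]
    y1 = [y for y, x in zip(outcome, exposed) if x == 1]
    mu0, sd0 = fit_group(y0)
    mu1, sd1 = fit_group(y1)
    return {
        "b0": mu0,                  # mean in the unexposed group
        "b1": mu1 - mu0,            # difference in means (location effect)
        "g0": math.log(sd0),        # log SD in the unexposed group
        "g1": math.log(sd1 / sd0),  # log SD ratio (scale effect)
    }


# Hypothetical outcome values and a hypothetical binary risk factor.
bmi = [22.0, 24.0, 26.0, 20.0, 26.0, 32.0]
inactive = [0, 0, 0, 1, 1, 1]
fit = location_scale_fit(bmi, inactive)
sd_ratio = math.exp(fit["g1"])  # how many times larger the SD is when exposed
```

      In this toy data the exposed group has a mean 2 units higher and a standard deviation 3 times larger, which is the kind of joint statement about location and scale that a mean-only regression cannot make. Continuous covariates, smooth terms, and non-normal distributions require iterative fitting rather than this closed form.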

      The primary concern with this manuscript is its overall goal. In its current form, it is hard to assess whether the manuscript is meant to be a tutorial on how to fit GAMLSS and interpret its output from an epidemiological context, or it is meant to be a research report investigating the association between three risk factors (sex, social class, physical activity) with two outcomes (BMI and mental wellbeing).

      We have edited the manuscript to form a tutorial. We have also provided additional detail to rationalise the risk factors and outcomes used. That these associations are of substantive interest helps to motivate the use of GAMLSS.

      The modelling choices in the manuscript are only suited if it is aimed to be a tutorial. For example, the rationale for the choice of the outcomes (BMI and mental wellbeing) is reported to be the fact that they are often measured on a continuous scale. Similarly, authors only interpret the unadjusted estimates because they were similar to those from an adjusted model. Although these are acceptable choices for a tutorial, if the manuscript's goal was to estimate the true association between these variables, it has several shortcomings: i) the disadvantages of dichotomizing a continuous independent variable are well known(1); ii) it is recommended to choose potential confounders based on a Directed Acyclic Graph (DAG) to ensure the estimates are unbiased(2); iii) a clear rationale for estimating this effect and what is already known in the literature about the association should be mentioned in the introduction.

      Yet, interpretations provided in the results section and parts of the discussion imply that they are to be taken as estimates of a true association. For example, i) estimates for the variable sex are contrasted with those of social class (Page 7 Line 204), ii) argument comparing results of previous studies on BMI and use of a nationally representative sample (Page 9 line 252 to 258), iii) using GAMLSS and British Birth Cohort data are reported as strengths of the manuscript (Page 10 Line 309), iv) arguments about limitations to make causal claims for the estimates and other data complexities (Page 11 Line 319 to 334).

      We agree that understanding causality is challenging. Binary risk factors were used to aid interpretation of the potentially complex GAMLSS results; findings do not differ when using categorical form (see appendix tables). We have now discussed the potential for confounding and reverse causality in the discussion.

      “The study also has limitations. As in all observational studies, causal inference is challenging despite the use of longitudinal data. Associations of social class at birth with outcomes, for example, could be explained by unmeasured confounding; this may include factors such as parental mental health. This is challenging to falsify empirically owing to a lack of such data collected before birth. In contrast, sex is randomly assigned at conception, and thus its associations with outcomes are unlikely to be confounded. However, sex differences in reporting may bias associations with mental wellbeing. Physical activity and mental wellbeing were ascertained at broadly the same age, so that associations between the two could be explained by reverse causality; existing evidence appears to suggest bi-directionality of links between physical activity and both outcomes.32 51 Finally, attrition led to lower power to precisely estimate smaller effect sizes (e.g., gender differences in mental wellbeing) or confirm null effects. Such attrition could potentially bias associations: those in worse health and adverse socioeconomic circumstances are disproportionately lost to follow-up.52 53 The focus of principled approaches to handle missing data in epidemiology has been on the main parameter of interest (typically beta coefficients in linear regression models), and further empirical work is required to investigate the potential implications of (non-random) missingness for the variability and other moments of the outcome distribution.”

    1. Author Response:

      Reviewer #2:

      In Zhang et al.'s paper, with 7T fMRI, they used different face parts as stimuli to explore the functional organization within the face specific areas, and found consistent patterns between different subjects in rFFA and rOFA. In these areas, the posterior region was biased to eye, and the anterior region was biased to mouth. To exclude potential confounds, they also ran several control experiments to show that the preference to eyes and mouth is not due to the eccentricity or upper-lower visual field preference. Based on what they found, they claim that there exists a finer scale functional organization within the face areas.

      In general, I think the whole study is carefully designed, and the results are solid and interesting. However, I am not very comfortable about the claim about the organization of the face areas. Typically, when we talk about the organization, it either has more than 2 subdivisions or it has a continuous representation of certain features. In this paper, the results are mainly about the comparison between two face parts, and they failed to find other distinctive subareas showing preference to other face parts. Therefore, I would suggest that the authors could tune down their claim from functional organization to functional preference.

      We have followed the advice from the reviewer to tune down the claim of functional organization in our manuscript. To emphasize both the functional preferences to different face parts within face-selective regions and the consistent spatial profile across different individuals, we now use “spatial tuning of face parts” in the manuscript.

      Reviewer #3:

      Zhang and colleagues investigated the spatial distribution of feature tuning for different face-parts within face-selective regions of human visual cortex using ultra-high resolution 7.0 T fMRI. By comparing the response patterns elicited by images of face-parts (hair, eyes, nose, mouth and chin) with whole faces, they report a spatial pattern of tuning for eyes and mouth along the posterior-anterior axis of both the pFFA and OFA. Within the pFFA this pattern of spatial tuning appeared to track the orientation of the mid fusiform sulcus - an anatomical landmark for face-processing in ventral temporal cortex. Two additional control experiments are conducted to examine the robustness of the original findings and to rule out potentially confounding variables. These data are consistent with recent evidence for similar face-part tuning in the OFA and add to the growing body of work showing the topographical mapping of feature-based tuning within visual cortex.

      The conclusions of this paper are mostly supported by the data, but some aspects of the data acquisition, analysis and interpretation require further clarification/consideration.

      1) It is currently unclear whether the current data are in full agreement with recent work (de Haas et al., 2021) showing similar face-part tuning within the OFA (or IOG) bilaterally. The current data suggest that feature tuning for eye and mouth parts progresses along the posterior-anterior axis within the right pFFA and right OFA. In this regard, the data are consistent. But de Haas and colleagues also demonstrated tuning for visual space that was spatially correlated (i.e. upper visual field representations overlapped upper face-part preferences and vice-versa). The current manuscript found little evidence for this correspondence within pFFA but does not report the data for OFA. For completeness this should be reported and any discrepancies with either the prior, or between OFA and pFFA discussed.

      In the current study, three participants had data from both retinotopic mapping and face part mapping experiments. Consistent and robust part clustering was found in the right pFFA and right OFA. Following the reviewer’s suggestion, we analyzed these data for the right OFA and found that the spatial patterns of eyes vs. mouths are similar to the patterns of visual field sensitivity along the vertical direction (i.e., upper to lower visual field), which is consistent with de Haas and colleagues’ findings. Note that we used a more precise functional localization of OFA, while de Haas et al.’s analysis was based on the anatomically defined IOG, of which OFA is a part. We have added this result in the Results section (Page 16), and also added a supplemental figure (Figure 4-figure supplement 1).

      2) It is somewhat challenging to fully interpret the responses to face-parts when they were presented at fixation and not in the typical visual field locations during real-world perception. For instance, we typically fixate faces either on or just below the eyes (Peterson et al., 2012) and so in the current experiment the eyes are in the typical viewing position, but the remainder of the face-parts are not (e.g. when fixating the eyes, the nose mouth and chin all fall in the lower visual field but in the current experimental paradigm they appear at fixation). Consideration of whether the reported face-part tuning would hold (or even be enhanced) if face-parts were presented in their typical locations should be included.

      Early visual cortex and some of the object-selective visual areas are sensitive to visual field location. To dissociate visual field tuning from face part tuning in face processing regions, the face part stimuli in the main experiment of the current study were presented at fixation to avoid a potential confounding contribution from visual field location. The spatial correlation between face part tuning and visual field tuning has been observed in the posterior part of the face network. It is unlikely that presenting the face parts at fixation was responsible for the observed face part tuning. To directly test the role of stimulus location, we reanalyzed the data from control experiment 2, in which face parts were presented at their typical locations. Contrasting eyes above fixation vs. nose & mouth below fixation revealed a similar anterior-posterior bias in the right pFFA, showing that the face part tuning in the right pFFA is invariant to the visual field location of the stimuli. See the comparison in the figure below; note that the maps of eyes on top vs. nose & mouth on bottom are unsmoothed:

      3) Although several experiments (including two controls) have been conducted, each one runs the risk of being underpowered (n ranges 3-10). One way to add reassurance when sample sizes are small is to include analyses of the reliability and replicability of the data within subjects through a split-half, or other cross-validation procedure. The main experiment here consisted of eight functional runs, which is more than sufficient for these types of analyses to be performed.

      Following the reviewer’s suggestion, we split the eight runs of data from each participant in the main experiment into two data sets (odd runs and even runs) and estimated the eyes-mouth biases within each data set. We then calculated the correlation coefficient of these biases across voxels between the two data sets to estimate the reliability of the results in the right pFFA. The results demonstrate strong reliability of the data within participants. We have added these results to the Results section (Page 7 and Figure 2-figure supplement 1).
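
      The odd/even split-half procedure described in this response can be sketched as follows. This is a minimal illustration, assuming that a per-run eyes-minus-mouth bias estimate is available for every voxel as a runs × voxels array; the function name, array layout, and synthetic data are assumptions made for the example and are not the authors' analysis pipeline.

```python
import numpy as np

def split_half_reliability(bias_by_run):
    """Correlate voxel-wise eyes-vs-mouth bias between odd and even runs.

    bias_by_run: array of shape (n_runs, n_voxels); each row is a
    hypothetical per-run bias estimate (e.g. eyes minus mouth response).
    """
    bias_by_run = np.asarray(bias_by_run, dtype=float)
    odd = bias_by_run[0::2].mean(axis=0)   # runs 1, 3, 5, 7
    even = bias_by_run[1::2].mean(axis=0)  # runs 2, 4, 6, 8
    # Pearson correlation across voxels between the two half data sets
    return np.corrcoef(odd, even)[0, 1]

# Synthetic illustration only: 8 runs, 200 voxels, shared signal plus run noise
rng = np.random.default_rng(0)
true_bias = rng.normal(size=200)
runs = true_bias + rng.normal(scale=1.0, size=(8, 200))
r = split_half_reliability(runs)
print(f"split-half correlation across voxels: {r:.2f}")
```

      A high correlation between the two halves indicates that the voxel-wise bias pattern is reproducible within a participant; a Spearman-Brown correction could be applied if an estimate for the full data set is wanted.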

      4) The current findings were only present within the right pFFA and right OFA. Although right lateralisation of face-processing is mentioned in the discussion, this is only cursory. A more expansive discussion of what such a face-part tuning might mean for our understanding of face-processing is warranted, particularly given that the recent work by de Haas and colleagues was bilateral.

      Right lateralization of face processing has been observed in the face-selective network. Both the neural selectivity to faces (Kanwisher et al., 1997) and the decodable neural information about faces (Zhang et al., 2015) are higher in the right than in the left hemisphere. The neural clustering of face part tuning, and its consistent spatial pattern across individuals, in the right rather than the left face-selective regions provides a potential computational advantage for right lateralization of face processing. The clustering of neurons with similar feature tuning has been found extensively in the ventral pathway, and may help to support more efficient neural processing. Therefore, one of the neural mechanisms underlying the functional lateralization of face processing could be the spatial clustering of face part tuning in the right hemisphere. We have added more discussion of the relevance of our results to the lateralization of face processing.

    1. Author Response:

      Reviewer #3 (Public Review):

      In this manuscript the authors make several conclusions, according to the abstract:

      1 - LTG activity is essential by contributing to a process independent of PG recycling.

      2 - LTGs are important because of their catalytic activity rather than because of a protein-protein interaction.

      3 - LTG mutants are hypersusceptible to production of periplasmic polymers.

      4 - LTGs prevent toxic periplasmic crowding and their function is temporally separate from PG synthesis.

      The authors perform a series of genetic experiments that lead to their conclusions. Their first conclusion is well supported by data showing that a PG recycling mutant does not have the same defects as their LTG mutant.

      Their second conclusion needs more justification/explanation. They show a catalytic mutant of RlpA is unable to sustain growth as the only LTG in the cell. However, I am confused by their wording around RlpA in general. In the text they note that their delta_7 mutant, which encodes RlpA, 'has no highly active LTGs' (lines 130-131). Does that imply that RlpA is not an LTG? In the discussion they note that E.coli RlpA has no LTG activity. Is this enzyme known to have LTG activity in V.cholerae? One important control would be to show that the catalytically inactive protein is stable (i.e. that the defect is not due to protein misfolding). This could be supported by looking at protein stability via Western or even quantifying the fluorescence data in Figure S3b.

      Alignment of VcRlpA with P. aeruginosa RlpA, which has been demonstrated in vivo and in vitro to be an active LTG, suggests VcRlpA retains the active site residues required for PG cleavage. This, as well as the inability of a VcRlpA^D145A mutant (based on the alignment with catalytically inactive EcRlpA) to rescue native RlpA depletion from the ∆LTG mutants, suggests that VcRlpA is an active LTG and that this activity is required in the absence of all other annotated V. cholerae LTGs. We agree that “no highly active LTGs” is confusing and we have changed the text to simply describe the ∆7 LTG mutant as being significantly depleted in LTG activity as measured by anhMurNAc abundance in the sacculus. Lastly, we have conducted Western blots, included in the revised manuscript, demonstrating that our catalytic site mutant is indeed produced and stable (Figure S3).

      Their third conclusion also needs more support. The authors do a series of experiments showing that delta7 is more susceptible to SacB. What are the data that show sacB produces large polysaccharides molecules in the periplasm rather than (or in addition to) the cytoplasm? This would be important to show as these data are the main test of the authors model.

      In native B. subtilis as well as in E. coli, SacB has a canonical Sec signal peptide which is annotated as being cleaved after residue Ala29 (Uniprot G3CAF6_BACIU) to be released extracellularly. A reference (Pereira, et al, 2001) has been added in support of SacB functioning extracellularly and not in the cytoplasm of its native host, B. subtilis.

      The authors have other data that all argue for their model that LTG deficient strains have an excess of periplasmic crowding. The suppressor of delta_opgH is intriguing, but does not restore the morphological defects in delta_7, suggesting that the increase in length during prolonged growth may not be caused by periplasmic crowding, or at least is not alleviated by deletion of OpgH. What then does the deletion of OpgH suppress? Here, I was confused by the experiments in low salt. The authors write that the cells lyse (line 222) but this is not shown anywhere. Growing the cells continually in low salt may not be the hypoosmotic challenge the authors presume. A challenge typically implies an acute change in osmolarity, rather than a prolonged exposure, which may allow cells to adapt.

      We do not fully understand the role of OpgH, but here is our working model: LTGs have at least two essential functions – 1) PG release and 2) mitigating periplasmic crowding, either or both of which can become more important based on osmotic conditions. Since MltG seems to be the main PG release factor (at least based on E. coli), which can be partially supplanted by collective action of other LTGs, the ∆7 suffers from both PG release defects and periplasmic crowding defects, perhaps more so in an osmotically challenging low salt medium. The evidence for lysis is that at high inoculum (10^-2) the ∆7 LTG mutant does grow for a short time, but then we observe a drop in OD_600, indicative of lysis. According to our model, ∆6, on the other hand, which still has MltG, likely suffers only (or mostly) from a periplasmic crowding defect. Deleting periplasmic glucans only mitigates periplasmic crowding (and probably only partially), which does not help the more defective ∆7, which additionally suffers from lack of the postulated second activity.

      The reviewers raise an interesting point regarding the word “challenge”. We indeed specifically make the point that this is not an acute challenge, but rather accumulating damage during prolonged growth, even in salt-free LB. We have thus removed the word “challenge” from the revised manuscript. Importantly, we only use the ∆opgH suppression phenotype as one of many puzzle pieces for our conclusion. The key assay is the direct demonstration of periplasmic soluble PG strands accumulating in both WT and, to a higher degree, the ∆6 LTG mutant (Fig. 6).

      I was also highly confused by the antibiotic + BADA staining experiments. Do the authors stain the cells, treat, and then visualize? Are they then studying the fate of old PG? How does BADA get incorporated into PG in V.cholerae? Is it through LDT activity or some other way? Without more explanation, it is hard to interpret the results.

      BADA does get incorporated through either LDT or PG synthesis activity in V. cholerae, but for these experiments, the specific incorporation pathway is inconsequential, since we only focus on the end product (stained PG). We think that what we visualize is not the fate of old PG (otherwise we would see similarly strong staining with fosfomycin, which inhibits cell wall synthesis upstream of PG strand generation by PBPs/SEDS), but rather the generation of long, uncrosslinked PG strands due to the inhibition of PBP transpeptidase activity. We have added more explanation of this assay to the revised manuscript.

      The last conclusion is not supported by data. There are no data showing that LTG activity is temporally separate from PG synthesis.

      We would like to point out that this is not framed as a conclusion per se, but rather as a plausible speculation. Our data showing soluble strand accumulation in the WT strongly suggest that LTGs do not work in perfect harmony with synthesis, but rather degrade strands AFTER they accumulate (i.e., temporally separate). We further believe that complementation with a heterologous enzyme (MltE), which does not have a homolog in V. cholerae, strongly argues that LTGs and PG synthesis do not have to associate through protein-protein interactions. All this adds to an emerging model that PG synthesis and LTG-mediated degradation are not as tightly co-ordinated as one might assume.

    1. Author Response:

      Reviewer #1:

      The paper by Sun et al uses a previously published computational model of the insect central complex and expands the applicability of this model. While the original model was developed to generate a biologically plausible neural circuit for producing visually guided navigation behavior (integrating three distinct navigation strategies), the new paper shows that the same model can be used to produce navigation behavior in response to multimodal sensory information. In particular, the authors show that olfactory navigation as well as wind-guided navigation can be seamlessly integrated with visual behaviors.

      The authors link the computational model to postulate neural mechanisms that are inspired by known features of the insect central complex. Using the model, behavioral observations, in particular from ants, can be readily reproduced, including tasks in which the animals had to switch between guidance cues, e.g. from visually driven path integration to odor based location of a nest entrance, or were blown off course by wind.

      The manuscript clearly requires that the first paper by the same group is read first, as many core concepts of the computational model are introduced in that paper. When viewed as such an extension (as intended by the 'Research Advances' article type), the paper adds valuable insights and stimulates thought and hypothesis development regarding concepts of multimodal integration. In light of the increasing amount of data on insect brain connectomics, hypotheses based on biologically inspired computational models are highly useful.

      While the authors refer to their computational model as 'biologically realistic', several features (e.g. a ring attractor circuit in the fan shaped body) are speculative and, to date, unconfirmed in any insect, despite the existence of the fruit fly connectome. This does not mean that the model is conceptually wrong, especially as it allows to faithfully reproduce complex behavioral data and shifting of activity 'bumps' across the width of the central complex (as is key for the model) is likely one of the principal functions of the fan-shaped body circuit. Yet the exact nature of the neural implementation might have to be adjusted once relevant data on ants becomes available. The model, as it stands, should hence be seen as 'biologically plausible" rather than 'realistic'.

      That said, the addition of the new aspects of the model shows how flexible the proposed circuit is for coordinated navigational control in insects, and, interestingly, highlights analogous concepts found in the basal ganglia of mammals - a thought-provoking parallel that is in line with ideas of deep homology between these distant brain regions.

      Thanks for your comments. We have carefully checked through the paper and updated our statements: we now use 'bio-plausible' where there is no direct biological evidence supporting a computation.

      Reviewer #2:

      In their previous manuscript, Sun et al. combined existing and hypothesized circuit motifs within the insect central complex (CX) to propose an integrated model for how the region might enable visual route following and homing. In their original framework, circuit motifs within the fan-shaped body allowed for appropriate context-switching. They now show how roughly the same motifs could also allow the model to (optimally) incorporate other sensory inputs, such as odor concentration gradients and wind direction cues, thereby enabling an insect to use the CX for additional behaviors in a context-dependent manner. The model's performance is evaluated in comparison to the behavior of larval and adult insect behavior (flies and ants, for example). This study represents a useful extension of the model's scope, but it would benefit from some additional computational exploration and explanation. As it stands, the figures and figure legends are not self-contained enough to be clearly understandable to the average reader. This new piece would also benefit from a greater focus on alternative models and alternative neural pathways that also subserve at least some of the additional navigational behaviors. The existence of direct olfactory-motor pathways is mentioned in Discussion for example, but deserves to be explored in Results as well. Otherwise, the significance of the authors' model reproducing Drosophila larval chemotaxis is not clear: note that larvae do not have CX circuits of the sort that the model proposes.

      Thanks for the considered feedback. 1) To make the paper more self-contained, we added a new Figure 1 that introduces the key features of the previous model. We also updated the legends and captions throughout to make them clearer. 2) To justify our simulation of olfactory navigation in Drosophila larvae, which do not possess a CX, we added a specific discussion to the text (Results and Discussion) and added a panel to Figure 2 (revised version) showing the different brain structures of adults and larvae. We hope that the important points are now clear.

      Reviewer #3:

      Sun et al. propose an excellent study on multi-sensory fusion in ant when the animal is confronted to both wind and odour source or when conflict exists between chemotaxis and path integration. The paper is very well written and the figures are very clearly designed. The list of references is complete. On my opinion, this paper might be considered as a companion paper of a previous paper published in eLife (Sun et al 2020) featuring a strong impact on the plausible strategy that can be used by ant to integrate various cues (odour, wind, proprioception) but a weaker impact (because already published in the first paper) on the neuronal model of the various structures involved in the central complex of the ant's brain: protocerebrum bridge (PB), fan-shape body (FB) and ellipsoid body (EB). An interesting copy and shift function already described in (Sun et al 2020, eLife) seems to be well suited to generate the appropriate motor commands of the heading in response to a difference between the current heading and the measured heading (sensory feedback control). To summarize, this copy and shift function tends to minimize the heading error by making ant turn left or right. It is worth noting that the simulated responses of the ant have been compared to the real data published in previous papers by others.

      I have the following main concerns about this work:

      • First, the steering function as regard of the shift and copy mechanism should be recalled and carefully explained (figure 2B of the previous paper published in eLife)

      We have added a new Figure 1 that introduces the previous model (including the crucial copy-and-shift mechanism) to the reader.

      • About figure 1C and the on-off response of the ant: authors argue that their model replicates faithfully the ant's response: however to be absolutely convinced by this statement, authors must take into account in their simulation the following parameters used by Álvarez-Salvado et al.: ground speed, angular velocity, curvature and turn probability. If the simulated and real responses are similar, we should observe an ON response consisting of upwind orientation coupled with faster and straighter trajectories, and an OFF response consisting of slower and more curved trajectories. It is not clearly the case in the current version of the paper due to a lack of thorough analysis between the parameters listed previously. I cannot see more curved trajectories in the OFF response.

      Thanks for these comments. We have included a more detailed analysis of the model's paths in the supplementary material of the revised paper (Figure 2-Figure supplement 3 and Figure 3-Figure supplement 1), while keeping a clear behavioural example in the main figures (Figure 2C and Figure 3C). Specifically, the supplementary analysis includes angular velocity (as requested), upwind speed (a function of path directness rather than animal velocity), and the perceived odour concentrations. These were chosen to match the data presented by Álvarez-Salvado et al. (2018) and allow direct comparison with their results.

      Indeed, the model generates higher 'upwind speed' and smaller 'angular velocity' (i.e., straighter paths) during ON-response phases than during OFF-response phases, as reported in real animals.

      Regarding plotting ground speed, in the current model the speed of motion is constant which we have clarified in the Methods section.

      Regarding 'curvature' and 'turn probability', given our simulation settings we do not think these two parameters are informative, for the following reasons. The ground speed is constant within each simulation, and thus the 'curvature', which is calculated by dividing angular velocity by ground speed, provides the same information as the 'angular velocity'. Also, the model alters the heading angle at each step, so there is no notion of turning vs. not turning against which to assign a probability value.
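
      The argument about curvature can be made concrete with a short sketch. It assumes only the relationship stated in this response (curvature equals angular velocity divided by ground speed); the numerical values and variable names are hypothetical and chosen purely for illustration.

```python
import numpy as np

def curvature(angular_velocity, ground_speed):
    """Path curvature (rad per unit length) = angular velocity / ground speed."""
    return np.asarray(angular_velocity, dtype=float) / ground_speed

# Hypothetical angular velocities (rad/s) along a simulated path
omega = np.array([0.1, -0.4, 0.25, 0.0, 0.8])
v = 2.0  # constant ground speed in arbitrary units (assumed value)

kappa = curvature(omega, v)

# With constant v, curvature is a rescaled copy of angular velocity,
# so the two measures rank every time step identically.
same_ranking = np.array_equal(np.argsort(np.abs(kappa)), np.argsort(np.abs(omega)))
print(kappa, same_ranking)
```

      If ground speed varied over time, the two quantities would diverge and curvature would carry additional information.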

      Given the above, we have softened the claims in the Results section from 'closely matches the behavioural data' to 'similar to the behavioural data' which we believe more closely aligns with the reviewer’s perspective.

      • About the simulated behaviour shown in figure 2C: this is a very interesting and critical point because here a conflict is produced between two sensory modalities: path integration and chemotaxis. Authors must clarify how is this conflict processed/managed by the ring attractor? Is it due to changes in the dynamics of the measurements (odor and path integration) or due to changes in the ring attractor itself? By the way, I strongly encourage authors to provide temporal simulation of the ring attractor state: plot the input and ring's output signals at different time steps to see clearly a shift in the bump output.

      Thank you. We have considered this comment in detail and agree that this could be clearer. The resultant paths are driven by the ring attractor, which integrates the dynamic PI and olfactory signals according to their dynamic certainties. This capability is inherent in ring attractor networks and is something that we want to highlight. To make this point clearer, we have added a new video (Figure 3-video 1) to demonstrate the dynamics of the process.

      • About the angular resolution of the PB, FB and EB: how many different angular directions are coded? How many neurons are simulated in each structure? In MM section (equation 10 page 13) it is indicated that the angular resolution (shifting accuracy) has been improved by a factor 10 (from 45° to 4.5°) to achieve better performance. This point must be indicated and discussed in the main text because it is related to the accuracy of the heading measurement and thus to the behaviour of the simulated ant. How can the ant improve its heading accuracy despite a coarse resolution of central complex in the heading measurement?

      Apologies for the confusion here. The accuracy of the heading direction system is not coarse, owing to the population encoding across the 8 neural bins of the PB, in much the same way that many colours can be encoded using just red, green and blue values combined appropriately.

      When we said that we improved the shifting 'accuracy', we were referring to the resolution with which the 'activity bump' could be shifted across the population. In our previous model, 'shifts' of 45° (i.e., one column each step) were sufficient for accurate visual guidance, but insufficient for accurate olfactory navigation (from experiments). Thus, we improved the resolution with which the activity bump could be shifted to 4.5°. To clarify this point, we have changed the term 'accuracy' to 'resolution' in the main text.
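
      The point that 8 bins can carry a fine-grained heading, and that a bump can be shifted by less than one column, can be illustrated with a generic population-vector sketch. It assumes idealised cosine-tuned units with evenly spaced preferred directions; the function names and the tuning shape are assumptions for illustration and are not taken from the authors' model code.

```python
import numpy as np

N_BINS = 8
# Preferred directions of the 8 units: 0°, 45°, ..., 315°
PREFERRED = np.arange(N_BINS) * (2 * np.pi / N_BINS)

def encode(heading_rad, amplitude=1.0):
    """Cosine-tuned activity of the 8 units (negative values allowed for simplicity)."""
    return amplitude * np.cos(heading_rad - PREFERRED)

def _vector_sum(activity):
    """Components of the activity-weighted sum of preferred-direction vectors."""
    x = np.sum(activity * np.cos(PREFERRED))
    y = np.sum(activity * np.sin(PREFERRED))
    return x, y

def decode(activity):
    """Population-vector readout of the encoded heading, in [0, 2*pi)."""
    x, y = _vector_sum(activity)
    return np.arctan2(y, x) % (2 * np.pi)

def shift(activity, delta_rad):
    """Shift a sinusoidal bump by an arbitrary angle, not only whole columns."""
    x, y = _vector_sum(activity)
    amplitude = 2.0 / N_BINS * np.hypot(x, y)
    return encode(np.arctan2(y, x) + delta_rad, amplitude)

heading = np.deg2rad(123.4)           # lies between two preferred directions
recovered = decode(encode(heading))   # recovers 123.4° up to floating-point error
shifted = shift(encode(np.deg2rad(100.0)), np.deg2rad(4.5))
print(np.rad2deg(recovered), np.rad2deg(decode(shifted)))
```

      The readout is continuous even though there are only 8 units, which is the sense in which the resolution of a shift (45° vs. 4.5°) is separate from the accuracy of the encoded heading.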

      An interesting research question for future work could be an investigation of the mechanisms of shifting and their impact on performance in various guidance tasks.

    1. Author Response:

      Reviewer #1 (Public Review):

      The paper has a number of strengths: The basic question of whether individual ORNs drive behavior is an important one, and the authors test ~90% of the different ORN classes which is very extensive. The presentation of the data is beautiful and well-conceived. And the paper is extremely readable. The core observation that only 10/45 tested ORNs can drive locomotor behavior on their own is an important result, as it addresses some labelled-line ideas that have been prevalent in the field.

      There are a few results which are rather unexpected, and one may wonder whether they are somewhat unique to the behavioral apparatus the authors use:

      The absence of an influence of wind on odor responses may not be general (i.e. the authors show it doesn't change single-ORN-elicited behavior, that is likely not true with odor-guided behavior where wind direction is an important cue for the animals to localize the odor source). It is possible that the narrow corridors in the assay used here promotes locomotor behavior in flies (as opposed to in an open arena where flies locomote much less, and may need wind stimulation to promote movement).

      We thank the Reviewer for raising this issue. We agree that experimental results should be interpreted in view of the natural condition. Please note that Bell and Wilson used narrow 50 × 5 × 1.2-mm chambers, similar in size to the 50 × 4 × 3 mm chambers used in the WALISAR assay. Since both widths are narrow, we consider that this minor difference alone cannot easily account for the discrepancy between the two studies. There are other parameters that we consider more likely to account for this, for example, the use of blind flies.

      Action taken: We have added a sentence on this to the new Limitations section in the Discussion, and added a row to Supplementary File 2 - Sheet 3.

      The counter-intuitive effect of starvation on behavior driven by food-sensing ORNs may reflect the fact the optogenetic ORN stimulation is very strong, as the authors discuss. This result can only be interpreted if the spike rates elicited with their optogenetic stimulation were known.

      This result echoes previous results showing that the relationship between hunger and ORN responses can be complicated (Root et al., 2011). We request that electrophysiology not be required for publication.

      Action taken: We have added a row to the Supplementary Table 4 showing that no electrophysiology was done, and have added a sentence about this to the new Limitations section in the Discussion.

      Knowing where in the dynamic range of ORN firing the optogenetic stimulation lies will also be important for interpreting the pairwise interactions between ORNs. For example summation may be more apparent when lower ORN firing rates are being combined. While analyzing ORN pairs, it would also be more informative to examine each pair individually across a range of stimulation intensities, since some pairs may summate and others may max pool etc.

      Individual pairs were examined in Figure 5. Our systematic analyses contradict the earlier published conclusions, suggesting a greater range of combination rules, consistent with complex interactions. Also, these analyses found shifts in weightings suggesting that, rather than fixed, the combination rules are dynamic.

      Action taken: We request that further combination analyses not be made a requirement of publication.

      Also in Fig 5G-I the authors analyze all ORN pairs together to look for summation etc. It is more informative to examine each pair individually across a range of stimulation intensities, since some pairs may summate and others may max pool etc. So this data, currently in SuppFig8. should be moved to the main text.

      Action taken: We have moved that Supplementary Figure to the main text.

    1. Author Response:

      Reviewer #1:

      Authors introduce a deep learning-based toolbox (ELEPHANT) to provide ease in annotation and tracking for 3D cells across time. The study takes two datasets (CE and PH) to demonstrate the performance of their method and compare it with two existing 3D cell tracking methods on segmentation and accuracy metrics. 3D U-Nets are shown to be performing well in segmentation tasks in recent years, authors also utilize 3D U-Net for segmenting cells as well as linking the nuclei across time through optical flow. The variation in selected datasets is shown to be in the shape, size and intensity of cells. Beyond segmentation, authors also demonstrate the performance of ELEPHANT in exploring the tracking results with and without optical flow and regenerating their fate maps. A complete server-based implementation is provided with detailed codebase and docker images to implement and utilize ELEPHANT.

      Strengths:

      The paper is technically sound with detailed explanation of each methodological step and results. 3D U-Nets are optimized for the segmentation task in hand with large training sessions, efficiency of the pipeline is nicely demonstrated which serves this as a useful toolbox for real-time annotation and prediction of cell structures. The detailed implementation on a local and remote server is presented which is a need while handling and analyzing large scale bio-imaging datasets. Beyond smoothing, SSIM-based loss is effectively applied to make the model robust against intensity and structural variations which definitely helps in generalized performance of the segmentation and tracking pipeline.

      Segmentation results are validated on a large set of nuclei and links which is helpful to understand the limitation of the models. The advantage of using optical flow-based linking is clearly shown on top of using nearest neighbors. Spatio-temporal distribution of cells on a given data guides the users in using the framework for several biological applications such as tracking the lineage of newly born cells - a hard task in stem cell engineering.

      A detailed implementation on both remote and server as well as open-source codebase on Github is well provided for the scientific community which will help the users to easily use ELEPHANT for specific datasets. Although CE and PH datasets are used to demonstrate the performance, however, similar implementation can also be performed on neuronal datasets that would be of much use in exploring neurogenesis.

      Weaknesses:

      Authors use ellipse-like shapes to annotate the data; however, many cells are not elliptic or circular in shape but have varying morphology. If the annotation module were equipped with free-form drawing, it would be better able to capture the diverse shapes of cells in both training and validation. This also limits the scope of the study to cell datasets that are circular/elliptical in shape.

      ELEPHANT can be used to track nuclei or cells of diverse shapes. Tracking is based on reliable detection of nuclei/cells but does not require precise segmentation of their shapes. We have now added results showing that ellipsoid approximations are sufficient for detection and cell tracking, even when tracking cells with complex and variable shapes (figure 3).

      As we now explain in the manuscript (page 4), we use ellipsoids for annotation because they are essential for rapid and efficient training and predictions, which are the backbone of interactive deep learning. In practice, using ellipsoids also reduces the amount of work required for annotating the data compared with precise drawing of cell outlines. Post-processing can be appended to our workflow if a user needs to extract the precise morphology of cells.

      Authors use 3D U-net for segmentation which is a semantic segmenter, perhaps, an instance-based 3D segmenter could be a better choice to track the identity of the cells across time and space. However, an instance-based segmenter may not be ideal for segmenting the cells boundaries but a comparison between a 3D U-Net and an instance-based 3D segmenter on the same datasets will be helpful to evaluate.

      Although the original 3D U-Net is a semantic segmenter, we use its architecture to estimate the center region of cells, which works as an instance-wise detector. A similar strategy was followed by recent techniques (Kok et al. 2020, PLoS One doi:10.1101/2020.03.18.996421, Scherr et al. 2020 PLoS One doi:10.1371/journal.pone.0243219) to identify cell instances. Instance-based segmenters (e.g. StarDist, Mask R-CNN) are particularly useful for precise segmentation but our primary focus here is detection and tracking, which can be done most efficiently with the current architecture. Because StarDist or Mask R-CNN do not support sparse annotations, a direct comparison of these methods is difficult at the moment.

      The selected datasets seem to be capturing the diversity in shape and intensity, however, the biological imaging datasets in practice often have low signal to noise ratio, cell density variation and overlapping, etc. It seems like the selected datasets lack these diversities and a performance on any other data of such kind would be useful for performance evaluation as well as providing a pre-trained model for the community usage. Moreover, it would also be useful to demonstrate the performance of the framework in segmenting+tracking any 3D neuronal nuclei dataset which will broaden the scope of the study.

      The PH dataset that we used for testing ELEPHANT presents many challenges, such as variations in intensity, areas of low signal to noise ratio, densely packed and overlapping nuclei (see manuscript page 7, Suppl. Figure 5). To add to this analysis, we have now applied our method to additional datasets that show diverse characteristics – including datasets with elongated/irregular-shaped cells from the Cell Tracking Challenge (Figure 3E) and organoids imaged by light and confocal microscopy (Figure 3C,D) – demonstrating the versatility of our method. We do not think that neuronal nuclei present a particular challenge for ELEPHANT (the PH dataset includes neurons).

      We now also provide a pre-trained model, trained with diverse image datasets, which can be applied by users as a starting point for tracking on new image data.

      The 3D U-Nets are used for linking by using the difference between two consecutive images (across time) as labels. However, this technique helps to track the cell in theory but may also result in losing cell identity when cells are overlapping or when boundary features are less prominent, etc. Perhaps, a specialized deep neural network such as FlowNet3D could be a better choice here.

      Our 3D U-Net does not directly generate links across consecutive images. Instead it produces voxel-wise optical flow maps for each of the three dimensions, which are then combined with detection results to predict the position for each object (see manuscript page 6 and Methods). This is then used for linking. The identity of the tracked objects is defined during detection.

      In the end, our approach is similar to FlowNet3D in that both estimate optical flow for each detected object, although we use two consecutive images as input instead of the sets of detected objects. FlowNet3D operates only on object coordinates, without taking into account image features that could be important cues for cell tracking (e.g. fluorescence intensity of nuclei during cell division).
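
      The flow-assisted linking described above can be sketched roughly as follows. This is a minimal illustration, not the ELEPHANT implementation: the function names, the `flow_at` lookup, the direction of the flow (from time t back to t-1) and the distance cutoff are all assumptions made for the example.

```python
import math


def predict_previous_positions(detections, flow_at):
    """Shift each detected center at time t by the flow sampled at that center.

    `flow_at` is a hypothetical callable returning a (dx, dy, dz) displacement
    for a position; here the flow is assumed to point from t back to t-1.
    """
    return [tuple(c + d for c, d in zip(pos, flow_at(pos))) for pos in detections]


def link_nearest(predicted, previous, max_dist):
    """Link each predicted position to the closest detection at t-1.

    Returns {index at t: index at t-1}. Objects with no neighbor within
    `max_dist` stay unlinked. Two objects at t may share a parent, which is
    how a division would appear in this simplified scheme.
    """
    links = {}
    for i, p in enumerate(predicted):
        best_j, best_d = None, max_dist
        for j, q in enumerate(previous):
            d = math.dist(p, q)
            if d <= best_d:
                best_j, best_d = j, d
        if best_j is not None:
            links[i] = best_j
    return links


# Toy data (made up): two nuclei drifting by about +1 voxel in x per frame.
detections_t = [(10.0, 10.0, 5.0), (20.0, 20.0, 5.0)]
detections_prev = [(19.2, 20.0, 5.0), (9.1, 10.0, 5.0)]
uniform_flow = lambda pos: (-1.0, 0.0, 0.0)

predicted = predict_previous_positions(detections_t, uniform_flow)
links = link_nearest(predicted, detections_prev, max_dist=2.0)
```

      In this sketch the identity of each object comes from the detection step, and the flow only improves the position estimate used for matching, which mirrors the division of labor described in the response.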

      Reviewer #2:

      The authors created a cell tracking tool, which they claimed was user-friendly and achieved state-of-the-art performance.

      Would a user, particularly a biologist, be able to run the code from a set of instructions clearly defined on the readme? This was not possible for me. I am not familiar with Java or Mastodon, but I'm not sure we can expect the average biologist to be familiar with these tools either. I was very impressed by the interface provided though.

      We have updated the user manual and software interface to make the software more accessible for users. Moreover, ELEPHANT is now available as an extension on Fiji, which will greatly facilitate its adoption by non-expert users.

      Did the authors achieve state-of-the-art performance? It is unclear from the paper. It would be helpful to see comparisons of this tool with modern deep learning approaches such as Stardist. Stardist for instance reports performance on the parhyale dataset in their paper. Many people in the field are combining tools like Stardist with cell tracking tools like trackmate (e.g. see https://www.biorxiv.org/content/10.1101/2020.09.22.306233v1). It would be important to know whether one can get performance comparable to Stardist (at e.g. a 0.5 IoU threshold) on a single 3D with this sparse labelling and interactive approach. I still think this approach of using sparse labelling could be very useful for transferring to novel datasets, but it is difficult to justify the framework if there is a large drop in performance compared to a fully supervised algorithm.

      The novelty in ELEPHANT is making deep learning available for cell tracking and lineaging by users who do not have extensive annotated datasets for training. Existing deep learning applications (including StarDist) do not fulfill this purpose.

      The detection and tracking scores of ELEPHANT in the Cell Tracking Challenge (identified as IGFL-FR) were the best when applied to cell lineaging on C. elegans test datasets, compared to a large number of other tracking applications (http://celltrackingchallenge.net/latest-ctb-results/). This comparison includes methods that employ deep-learning.

      ELEPHANT models trained with sparse annotation perform similarly well to trained StarDist3D models for nuclear detection in single 3D stacks (see Supplementary Figure 8). For cell tracking over time, StarDist and Trackmate have so far only been implemented in 2D.

      Reviewer #3:

      This work describes a new open source tool (ELEPHANT, https://elephant-track.github.io/) for efficient and interactive training of a deep learning based cell detection and tracking model. It uses the existing Fiji plugin Mastodon as an interactive front end (https://github.com/mastodon-sc/mastodon). Mastodon is a large-scale tracking and track-editing framework for large, multi-view images. The authors' contribution is an extension of Mastodon, adding automated deep learning based cell detection and tracking. Technically, this is achieved by connecting Mastodon as a client (written in Java) to a deep learning server (written in Python). The server can run on a different dedicated computer, capable of the GPU based computations that are needed for deep learning. This framework makes possible the detection and tracking of cells in very large volumetric data sets, within a user friendly graphical user interface.

      Strengths:

      1) It is great to reuse an existing front-end framework like Mastodon and plug in a deep learning back-end! Such software design avoids reinvention of the wheel and avoids that users need to learn too many tools.

      2) The idea to use sparse ellipsoids as annotations for cell detection is in my view fantastic as it allows very efficient annotation. This is much faster than having to paint dense 3D ground truth as is required for most deep learning algorithms.

      3) It is great that the learning is so fast that it is essentially interactive!

      Opportunities for improvements:

      The software in its current form had a few issues that made it a little hard to use. It would be great if those could be addressed in future versions.

      1) There are several options for how to set up the ELEPHANT server. In any case this requires quite some technical knowledge that may prevent adoption by a broader user base. It would thus be great if this could be further streamlined.

      We thank reviewer 3 for the very useful and detailed suggestions on improving the user interface of ELEPHANT. We have implemented most of these suggestions and we plan to pursue additional ones in future versions of the software. In brief:

      • To facilitate the setting up of the ELEPHANT server, we have implemented a control panel that allows users to monitor the process and provides links to the relevant section of the user manual and to Google Colab.
      • ELEPHANT is now available as an extension on Fiji, which will greatly facilitate its use by non-expert users.
      • Pre-trained detection and linking models, trained on diverse image datasets, are now available on the ELEPHANT github.
      • Image data can be uploaded and converted automatically via the Fiji/Mastodon interface when the image data files are missing on the server.

      2) For a GUI based software it is becoming state-of-the-art to provide recorded videos that demonstrate how to use the software. This is much more telling than written text. The authors added very nice short videos to the documentation, but I think it would be essential to also provide a longer video (ideally with voice over) where the authors demonstrate the whole workflow in one go.

      We are preparing a demo video on YouTube, which will be embedded in the user manual.

      3) As a user one interacts with the Mastodon software which sends requests to the ELEPHANT client. It would be great if the feedback for what is going on server side could be improved. For example adding progress bars and metrics for the process of the deep learning training that are visualized within Mastodon would be, in my view, very important for the usability.

      We added a log window in which users can monitor the processes that are running on the server.

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] While the study is addressing an interesting topic, I also felt this manuscript was limited in novel findings to take away. Certainly the study clearly shows that substitution saturation is achieved at synonymous CpG sites. However, subsequent main analyses do not really show anything new: the depletion of segregating sites in functional versus neutral categories (Fig 2) has been extensively shown in the literature and polymorphism saturation is not a necessary condition for observing this pattern.

      We agree with the reviewer that many of the points raised were appreciated previously and did not mean to convey another impression. Our aim was instead to highlight some unique opportunities provided by being at or very near saturation for mCpG transitions. In that regard, we note that although depletion of variation in functional categories is to be expected at any sample size, the selection strength that this depletion reflects is very different in samples that are far from saturated, where invariant sites span the entire spectrum from neutral to lethal. Consider the depletion per functional category relative to synonymous sites in the adjoining plot in a sample of 100k: ~40% of mCpG LOF sites do not have T mutations. From our Fig. 4a and b, it can be seen that these sites are associated with a much broader range of hs values than sites invariant at 780k, so that information about selection at an individual site is quite limited (indeed, in our p-value formulation, these sites would be assigned p≤0.35, see Fig. 1). Thus, only now can we really start to tease apart weakly deleterious mutations from strongly deleterious or even embryonic lethal mutations. This allows us to identify individual sites that are most likely to underlie pathogenic mutations and functional categories that harbor deleterious variation at the extreme end of the spectrum of possible selection coefficients. More generally, saturation is useful because it allows one to learn about selection with many fewer untested assumptions than previously feasible.

      Similarly, the diminishing returns on sampling new variable sites has been shown in previous studies, for example the first "large" human datasets ca. 2012 (e.g. Fig 2 in Nelson et al. 2012, Science) have similar depictions as Figure 3B although with smaller sample sizes and different approaches (projection vs simulation in this study).

      We agree completely: diminishing returns is expected on first principles from coalescent theory, which is why we cited a classic theory paper when making that point in the previous version of the manuscript. Nonetheless, the degree of saturation is an empirical question, since it depends on the unknown underlying demography of the recent past. In that regard, we note that Nelson et al. predict that at sample sizes of 400K chromosomes in Europeans, approximately 20% of all synonymous sites will be segregating at least one of three possible alleles, when the observed number is 29%. Regardless, not citing Nelson et al. 2012 was a clear oversight on our part, for which we apologize; we now cite it in that context and in mentioning the multiple merger coalescent.

      There are some simulations presented in Fig 4, but this is more of a hypothetical representation of the site-specific DFE under simulation conditions roughly approximating human demography than formal inference on single sites. Again, these all describe the state of the field quite well, but I was disappointed by the lack of a novel finding derived from exploiting the mutation saturation properties at methylated CpG sites.

      As noted above, in our view, the novelty of our results lies in their leveraging saturation in order to identify sites under extremely strong selection and make inferences about selection without the need to rely on strong, untested assumptions.

      However, we note that Fig 4 is not simply a hypothetical representation, in that it shows the inferred DFE for single mCpG sites for a fixed mutation rate and given a plausible demographic model, given data summarized in terms of three ranges of allele frequency (i.e., = 0, between 1 and 10 copies, or above 10 copies). One could estimate a DFE across all sites from those summaries of the data (i.e., from the proportion of mCpG sites in each of the three frequency categories), by weighting the three densities in Fig 4 by those proportions. That is, in fact, what is done in a recent preprint by Dukler et al. (2021, BioRxiv): they infer the DFE from two summaries of the allele frequency spectrum (in bins of sites), the proportion of invariant sites and the proportion of alleles at 1-70 copies, in a sample of 70K chromosomes.

      To illustrate how something similar could be done with Fig. 4 based on individual sites, we obtain an estimate of the DFE for LOF mutations (shown in Panel B and D for two different prior distributions on hs) by weighting the posterior densities in Panel A by the fraction of LOF mutations that are segregating (73% at 780K; 9% at 15K) and invariant (27% and 91% respectively); in panel C, we show the same for a different choice of prior. For the smaller sample size considered, the posterior distribution recapitulates the prior, because there is little information about selection in whether a site is observed to be segregating or invariant, and particularly about strong selection. In the sample of 780K, there is much more information about selection in a site being invariant and therefore, there is a shift towards stronger selection coefficients for LOF mutations regardless of the prior.
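
      The weighting step described above amounts to a mixture of the per-category posterior densities. The sketch below illustrates only that arithmetic: the segregating and invariant fractions (73% and 27% at 780K) are taken from the text, whereas the three hs bins and the posterior masses assigned to them are made-up placeholders, not values from Fig. 4.

```python
def mixture_density(posteriors, weights):
    """Combine per-category posteriors on a shared hs grid into one DFE estimate.

    posteriors: {category: list of probability masses, one per hs bin}
    weights:    {category: fraction of sites observed in that category}
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("category fractions must sum to 1")
    n_bins = len(next(iter(posteriors.values())))
    return [
        sum(weights[cat] * posteriors[cat][i] for cat in weights)
        for i in range(n_bins)
    ]


# Placeholder posteriors over three hs bins (weak, intermediate, strong).
# These numbers are hypothetical and only serve to show the calculation.
posteriors_780k = {
    "segregating": [0.6, 0.3, 0.1],
    "invariant": [0.1, 0.2, 0.7],
}
# Fractions of LOF mCpG sites in each category at 780K, as quoted in the text.
fractions_780k = {"segregating": 0.73, "invariant": 0.27}

dfe_lof = mixture_density(posteriors_780k, fractions_780k)
```

      With more frequency categories (for example the three ranges used in Fig. 4), the same function applies unchanged, since it only requires that the category fractions sum to one.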

      Our goal was to highlight these points rather than infer a DFE using these two summaries, which throw out much of the information in the data (i.e., the allele frequency differences among segregating sites). In that regard, we note that the DFE inference would be improved by using the allele frequency at each of 1.1 million individual mCpG sites in the exome. We outline this next step in the Discussion but believe it is beyond the scope of our paper, as it is a project in itself – in particular it would require careful attention to robustness with regard to both the demographic model (and its impact on multiple hits), biased gene conversion and variability in mutation rates among mCpG sites. We now make these points explicitly in the Outlook.

      Similarly, I felt the authors posed a very important point about limitations of DFE inference methods in the Introduction but ended up not really providing any new insights into this problem. The authors argue (rightly so) that currently available DFE estimates are limited by both the sparsity of polymorphisms and limited flexibility in parametric forms of the DFE. However, the nonsynonymous human DFE estimates in the literature appear to be surprisingly robust to sample size: older estimates (Eyre-Walker et al. 2006 Genetics, Boyko et al. 2008 PLOS Genetics) seem to at least be somewhat consistent with newer estimates (assuming the same mutation rate) from samples that are orders of magnitude larger (Kim et al. 2017 Genetics).

      We are not quite sure what the reviewer has in mind by “somewhat consistent,” as Boyko et al. estimate that 35% of non-synonymous mutations have s>10^-2 while Kim et al. find that proportion to be “0.38–0.84 fold lower” than the Boyko et al. estimate (see, e.g., Fig. 4 in Kim et al., 2017). Moreover, the preprint by Dukler et al. mentioned above, which infers the DFE based on ~70K chromosomes, finds estimates inconsistent with those of Kim et al. (see SOM Table 2 and SOM Figure S5 in Dukler et al., 2021).

      More generally, given that even 70K chromosomes carry little information about much of the distribution of selection coefficients (see our Fig. 4), we expect that studies based on relatively small sample sizes will basically recover something close to their prior; therefore, they should agree when they use the same or similar parametric forms for the distribution of selection coefficients and disagree otherwise. The dependence on that choice is nicely illustrated in Kim et al., who consider different choices and then perform inference on the same data set and with the same fixed mutation rate for exomes; depending on their choice anywhere between 5%-28% of non-synonymous changes are inferred to be under strong selection with s>=10^-2 (see their Table S4).

      Whether a DFE inferred under polymorphism saturation conditions with different methods is different, and how it is different, is an issue of broad and immediate relevance to all those conducting population genomic simulations involving purifying selection. The analyses presented as Fig 4A and 4B kind of show this, but they are more a demonstration of what information one might have at 1M+ sample sizes rather than an analysis of whether genome-wide nonsynonymous DFE estimates are accurate. In other words, this manuscript makes it clear that a problem exists, that it is a fundamental and important problem in population genetics, and that with modern datasets we are now poised to start addressing this problem with some types of sites, but all of this is already very well-appreciated except for perhaps the last point.

      At least a crude analysis to directly compare the nonsynonymous genome-wide DFE from smaller samples to the 780K sample would be helpful, but it should be noted that these kinds of analyses could be well beyond the scope of the current manuscript. For example, if methylated nonsynonymous CpG sites are under a different level of constraint than other nonsynonymous sites (Fig. S14) then comparing results to a genome-wide nonsynonymous DFE might not make sense and any new analysis would have to try and infer a DFE independently from synonymous/nonsynonymous methylated CpG sites.

      We are not sure what would be learned from this comparison, given that Figure 4 shows that, at least with an uninformative prior, there is little information about the true DFE in samples, even of tens of thousands of individuals. Thus, if some of the genome-wide nonsynonymous DFE estimates based on small sample sizes turn out to be accurate, it will be because the guess about the parametric shape of the DFE was an inspired one. In our view, that is certainly possible but not likely, given that the shape of the DFE is precisely what the field has been aiming to learn and, we would argue, what we are now finally in a position to do for CpG mutations in humans.

      Reviewer #2 (Public Review):

      This manuscript presents a simple and elegant argument that neutrally evolving CpG sites are now mutationally saturated, with each having a 99% probability of containing variation in modern datasets containing hundreds of thousands of exomes. The authors make a compelling argument that for CpG sites where mutations would create genic stop codons or impair DNA binding, about 20% of such mutations are strongly deleterious (likely impairing fitness by 5% or more). Although it is not especially novel to make such statements about the selective constraint acting on large classes of sites, the more novel aspect of this work is the strong site-by-site prediction it makes that most individual sites without variation in UK Biobank are likely to be under strong selection.

      The authors rightly point out that since 99% of neutrally evolving CpG sites contain variation in the data they are looking at, a CpG site without variation is likely evolving under constraint with a p value significance of 0.01. However, a weakness of their argument is that they do not discuss the associated multiple testing problem: in other words, how likely is it that a given nonsynonymous CpG site is devoid of variation but actually not under strong selection? Since one of the most novel and useful deliverables of this paper is single-base-pair-resolution predictions about which sites are under selection, such a multiple testing correction would provide important "error bars" for evaluating how likely it is that an individual CpG site is actually constrained, not just the proportion of constrained sites within a particular functional category.

      We thank the reviewer for pointing this out. One way to think about this problem might be in terms of false discovery rates, in which case the FDR would be 16% across all non-synonymous mCpG sites that are invariant in current samples, and ~4% for the subset of those sites where mutations lead to loss-of-function of genes.

      Another way to address this issue, which we had included but not emphasized previously, is by examining how one’s beliefs about selection should be updated after observing a site to be invariant (i.e., using Bayes odds). At current sample sizes and assuming our uninformative prior, for a non-synonymous mCpG site that does not have a C>T mutation, the Bayes odds are 15:1 in favor of hs>0.5x10^-3; thus the chance that such a site is not under strong selection is 1/16, given our prior and demographic model. These two approaches (FDR and Bayes odds) are based on somewhat distinct assumptions.
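
      For readers who want the arithmetic behind the 1/16 figure, the conversion from odds to probability is spelled out below. It assumes that the quoted 15:1 is read as the posterior odds $O$ in favor of strong selection ($hs > 0.5 \times 10^{-3}$) given that the site is invariant, under the prior and demographic model stated above.

```latex
\begin{align*}
O &= \frac{P(\text{strong selection} \mid \text{invariant})}
          {P(\text{not strong selection} \mid \text{invariant})} = 15, \\
P(\text{strong selection} \mid \text{invariant}) &= \frac{O}{1 + O} = \frac{15}{16} \approx 0.94, \\
P(\text{not strong selection} \mid \text{invariant}) &= \frac{1}{1 + O} = \frac{1}{16} \approx 0.06 .
\end{align*}
```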

      We have now added and/or emphasized these two points in the main text.

      The paper provides a comparison of their functional predictions to CADD scores, an older machine-learning-based attempt at identifying site by site constraint at single base pair resolution. While this section is useful and informative, I would have liked to see a discussion of the degree to which the comparison might be circular due to CADD's reliance on information about which sites are and are not variable. I had trouble assessing this for myself given that CADD appears to have used genetic variation data available a few years ago, but obviously did not use the biobank scale datasets that were not available when that work was published.

      We apologize for the lack of clarity in the presentation. We meant to emphasize that de novo mutation rates vary across CADD deciles when considering all CpG sites (Fig. 2-figure supplement 5c), which confounds CADD precisely because it is based in part on which sites are variable. We have edited the manuscript to clarify this.

      Reading this paper left me excited about the possibility of examining individual invariant CpG sites and deducing how many of them are already associated with known disease phenotypes. I believe the paper does not mention how many of these invariant sites appear in Clinvar or in databases of patients with known developmental disorders, and I wondered how close to saturation disease gene databases might be given that individuals with developmental disorders are much more likely to have their exomes sequenced compared to healthy individuals. One could imagine some such analyses being relatively low hanging fruit that could strengthen the current paper, but the authors also make several references to a companion paper in preparation that deals more directly with the problem of assessing clinical variant significance. This is a reasonable strategy, but it does give the discussion section of the paper somewhat of a "to be continued" feel.

      We apologize for the confusion that arose from our references to a second manuscript in prep. The companion paper is not a continuation of the current manuscript: it contains an analysis of fitness and pathogenic effects of loss-of-function variation in human exomes.

      Following the reviewer’s suggestion to address the clinical significance of our results, we have now examined the relationship of mCpG sites invariant in current samples with Clinvar variants. We find that of the approximately 59,000 non-synonymous mCpG sites that are invariant, only ~3.6% overlap with C>T variants associated with at least one disease and classified as likely pathogenic in Clinvar (~5.8% if we include those classified as uncertain or with conflicting evidence as pathogenic). Approximately 2% of invariant mCpGs have C>T mutations in what is, to our knowledge, the largest collection of de novo variants ascertained in ~35,000 individuals with developmental disorders (DDD, Kaplanis et al. 2020). At the level of genes, of the 10k genes that have at least one invariant non-synonymous mCpG, only 8% (11% including uncertain variants) have any non-synonymous hits in Clinvar, and ~8% in DDD. We think it highly unlikely that the large number of remaining invariant sites are not seen with mutations in these databases because such mutations are lethal; rather it seems to us to be the case that these disease databases are far from saturation as they contain variants from a relatively small number of individuals, are subject to various ascertainment biases both at the variant level and at the individual level, and only contain data for a small subset of existing severe diseases.

      With a view to assessing clinical relevance however, we can ask a related question, namely how informative being invariant in a sample of 780k is about pathogenicity in Clinvar. Although the relationship between selection and pathogenicity is far from straightforward, being an invariant non-synonymous mCpG in current samples not only substantially increases (15-10fold) the odds of hs > 0.5x10^-3 (see Fig. 4b), it also increases the odds of being classified as pathogenic vs. benign in Clinvar 8-51 fold. In the DDD sample, we don’t know which variants are pathogenic; however, if we consider non-synonymous mutations that occur in consensus DDD genes as pathogenic (a standard diagnostic criterion), being invariant increases the odds of being classified as pathogenic 6-fold. We caution that both Clinvar classifications and the identification of consensus genes in DDD rely in part on whether a site is segregating in datasets like ExAC, so this exercise is somewhat circular. Nonetheless it illustrates that there is some information about clinical importance in mCpG sites that are invariant in current samples, and that the degree of enrichment (6 to 51-fold) is very roughly on par with the Bayes odds that we estimate of strong selection conditional on a site being invariant. We have added these findings to the main text and added the plot as Supplementary Figure 13.

      Reviewer #3 (Public Review):

      [...] The authors emphasize several times how important an accurate demographic model is. While we may be close to a solid demographic model for humans, this is certainly not the case for many other organisms. Yet we are not far off from sufficient sample sizes in a number of species to begin to reach saturation. I found myself wondering how different the results/inference would be under a different model of human demographic history. Though likely the results would be supplemental, it would be nice in the main text to be able to say something about whether results are qualitatively different under a somewhat different published model.

      We had previously examined the effect of a few demographic scenarios with large increases in population size towards the present on the average length of the genealogy of a sample (and hence the expected number of mutations at a site) in Figure 3-figure supplement 1b, but without quantifying the effect on our selection inference. Following this suggestion, we now consider a widely used model of human demography inferred from a relatively small sample, and therefore not powered to detect the huge increase in population size towards the present (Tennessen et al. 2012). Using this model, we find a poor fit to the proportion of segregating CpG sites (the observed fraction is 99% in 780k exomes, when the model predicts 49%). Also, as expected, inferences about selection depend on the accuracy of the demographic model (as can be seen by comparing panel B to Fig 4B in the main text).

      On a similar note, while a fixed hs simplifies much of the analysis, I wondered how results would differ for 1) completely recessive mutations and 2) under a distribution of dominance coefficients, especially one in which the most deleterious alleles were more recessive. Again, though I think it would strengthen the manuscript by no means do I feel this is a necessary addition, though some discussion of variation in dominance would be an easy and helpful add.

      There's some discussion of population structure, but I also found myself wondering about GxE. That is, another reason a variant might be segregating is that it's conditionally neutral in some populations and only deleterious in a subset. I think no analysis to be done here, but perhaps some discussion?

      We agree that our analysis ignores the possibilities of complete recessivity in fitness (h=0) as well as more complicated selection scenarios, such as spatially-varying selection (of the type that might be induced by GxE). We note however that so long as there are any fitness effects in heterozygotes, the allele dynamics will be primarily governed by hs; one might also imagine that under some conditions, the mean selection effect across environments would predict allele dynamics reasonably well even in the presence of GxE. Also worth exploring in our view is the standard assumption that hs remains fixed even as Ne changes dramatically. We now mention these points in the Outlook.

      Maybe I missed it, but I don't think the acronym DNM is explained anywhere. While it was fairly self-explanatory, I did have a moment of wondering whether it was methylation or mutation and can't hurt to be explicit.

      We apologize for the oversight and have updated the text accordingly.

    1. Author Response:

      Reviewer #2 (Public Review):

      Tissue microarrays have become a mainstay in clinical and basic research, for both discovery and validation of biomarkers. The authors approach the possible sampling variation in a thoughtful way, not only quantifying the issue systematically, but working towards a solution.

      Major Comments:

      o The authors split the variation into two co-existing explanations, either intratumoral heterogeneity or batch effect (likely a degree of both play a role). Batch correction inherently reduces noise (the latter) at the cost of reducing signal (the former). It would be useful to know what approaches have been employed to test for overfitting. The authors claim in the introduction the use of different methods for maintaining "biological" variation, but that analysis seems limited.

      We agree that overfitting is a potential concern for any model. The large number of tumor cores per batch is less likely to give rise to overfitting if few parameters per batch are estimated. We consider overfitting of the adjustment models a separate problem from overadjustment, which would remove biological variation and which depends on balancing of batches with respect to biological factors. The results from our simulations (Fig. 5, Fig. 5–figure supplement 1) address the latter. “Biological variation” between TMAs was maintained in each simulated data set (Fig. 5–figure supplement 1). All mitigation approaches are more successful in recovering the true association (Fig. 5) compared to not addressing batch effects.

      o Were there considerations for the variability in Gleason scoring between members of the study team?

      We agree that this is an important consideration. Gleason scores in our study are from a centralized, standardized re-review of full tissue sections performed before constructing the TMAs. These use cores from the highest-density tumor regions. See Stark et al. (JCO 2009, referenced) on how variability was removed.

      o The manuscript involves the processing of a number of different cohorts in the field of prostate cancer. It would be important to know how the performance of the batchtma approach would change in tumors with greater heterogeneity.

      We do not have additional empirical data. We would like to emphasize that there is substantial heterogeneity within the large prostate cancer case series that we analyzed, which was sampled from population-based cohorts. Moreover, in the last paragraph of the section, “Validation batch effect mitigation in plasmode simulation,” we tested the methods implemented in the batchtma package in simulations that involved scenarios with far greater heterogeneity than empirically observed (Figure 5–figure supplement 3; the actual data on biomarkers with high between-TMA ICCs corresponds to the setting “some confounding”).

    1. Author Response:

      Reviewer #2 (Public Review):

      In all vertebrate species investigated, neurons contacting the cerebrospinal fluid express the channel PKD2L1 (in macaques, mice and zebrafish: Djenoune et al., Frontiers in Neuroanatomy 2014; in lamprey: Jalalvand et al., Current Biology 2016b; Jalalvand et al J Neurosci 2018). However, in all species investigated these cells fall into two functional types based on their axial sensitivity to detect spinal curvature (in vivo for zebrafish: Bohm et al., Nature Communications 2016; Hubbard et al., Current Biology 2016), expression of neuropeptides and neuromodulators (in lamprey: Christenson et al., Neurosci Letter 1991; Schotland et al., JCN 1996; in zebrafish: Djenoune et al., Scientific Reports 2017) or their firing patterns (in mouse: Petracca et al J Neurosci 2016; Di Bella et al., Cell Reports 2019).

      While the microscopy techniques used here are outstanding and without a doubt bring important evidence on the location and density of DSVs, there are concerns to address regarding the consolidation and interpretation of the physiological recordings of the ciliated neurons and pharmacology, based on evidence that only the ASIC1 channel is expressed in lamprey (see phylogenetic analysis: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3047259/), and that the lamprey ASIC1a is proton insensitive (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1464184/).

      In their 2005 article, Coric et al. identified one cDNA clone that corresponded to ASIC1 and, when it was expressed in oocytes, found no pH sensitivity under these conditions. They do not comment on the possible presence of ASIC3. The review of Grunder and Chen (2010) focuses on ASIC1a and bases its comment on lamprey on Coric et al. (2005). Our evidence for the presence of ASIC3 in lamprey is that both the mechanical and the pH response are blocked by APETx2, a selective antagonist of ASIC3 (Jalalvand et al. 2016, Nature Communications), which strongly suggests the presence of ASIC3 in the lamprey. ASIC3 is present in both the peripheral and central nervous system in mammals.

      Reviewer #3 (Public Review):

      This manuscript uses a variety of optical superresolution techniques to explore the structure and function of different cerebrospinal fluid contacting (CSF-c) neurons. First, Expansion Microscopy and Lightsheet Microscopy are combined to image large volumes of tissue from lamprey and to demonstrate the known organization of somatostatin-expressing and dopaminergic CSF-c neurons (Fig. 1). The authors then used STED to explore the subcellular location of somatostatin and dopamine in CSF-c neurons and demonstrate their presence in vesicle-like structures ranging between 60 and 200 nm (Fig. 2). Subsequently, the relation between GABA and somatostatin is examined in somatostatin CSF-c neurons. The authors show that there is no obvious colocalization between these molecules and that only somatostatin levels are altered in response to changes in the extracellular pH (Fig. 3). The authors furthermore demonstrate that dopamine levels within dopaminergic CSF-c neurons are also insensitive to pH changes (Fig. 4).

      The authors have previously shown that somatostatin CSF-c neurons are mechanosensitive and now also demonstrate this for dopaminergic CSF-c neurons (Ext Data Fig. 1). They also show that their mechanosensitivity is mediated differently (i.e., not through ASIC3). To further explore this difference in mechanosensitivity, the authors set out to explore the ciliary structure of these ciliated neurons. By combining expansion and STED, the authors succeed in resolving ciliary ultrastructure and demonstrate that they can distinguish between motile (9+2) and primary cilia (9+0) (Fig. 5). They find that the majority of cilia on somatostatin-expressing CSF-c neurons are primary, whereas all cilia on dopaminergic CSF-c neurons were motile (Fig. 6).

      Overall, this is an interesting imaging study that reports a number of technical steps that enable tissue-imaging with exquisite detail, such as discriminating between motile and primary cilia. It also nicely demonstrates what sort of new data can now be obtained in tissue, e.g. changes in vesicle numbers upon certain stimuli. However, as explained below, both the composition of the main text and the reproduction quality of the figures make it hard to judge the biological significance of this work.

      Comments:

      1/ Overall, I feel the paper needs a thorough rewrite. The introduction should give more insights into the underlying biology and also clarify which questions are being asked and why these are important. Currently the introduction is mostly a long summary of all results, but it doesn't help to understand the biology that underlies this work. Because of that the different experimental pieces currently feel a bit random and disconnected.

      We have rewritten part of the Introduction to better expose the underlying biological questions.

      2/ The reproduction quality of Figures 1, 2, 3 and 6 in the merged PDF is not great. In many cases, I cannot read the annotations or appreciate the content of the images. This makes it nearly impossible to judge the quality of the work. The data shown in Figure 5 are very impressive and I am sure the raw data for the other figures are equally great, but I can only judge what I see for myself.

      We provide Figures at high resolution.

      3/ Page 8: the conclusion that PKD2L1 is the mechanosensitive receptor for dopaminergic CSF-c neurons is only based on its presence in these cells. To really demonstrate this, loss-of-function experiments would be needed.

      PKD2L1 has been found to be responsible for mechanosensitivity in zebrafish (Böhm et al. 2016).

      We agree that to demonstrate a loss of function in lamprey a knock-out experiment would be needed. Transgenic techniques unfortunately cannot be used in lamprey since each generation lasts around 7 years, and there is no specific blocker for PKD2L1 to apply either. Therefore, we have modified the sentences.

    1. Author Response:

      Reviewer #2 (Public Review):

      The CRISPR-Cas9 complex has revolutionized genomic editing techniques. The widespread application of this new molecular tool enables precise and accurate DNA cleavage that was previously impossible to achieve. Yet, in some cases, the system suffers from a lack of specificity. In this paper, the authors present a new study on the characterization of the allosteric communication within the CRISPR-Cas9 complex. They identified three different mutations that disrupt the complex's internal allosteric communication, affecting the cleavage reaction's specificity to different extents. The authors argue that the various degrees of perturbation are correlated with the Cas9 specificity. Given the size of the complex, the authors utilize a divide and conquer approach to studying the structural-dynamic changes of the isolated HNH endonuclease using NMR spectroscopy. Then they used molecular dynamics simulations to relate the changes in the isolated enzyme to the entire complex. As marked by the authors, the effects of the selected mutations (K855A, K810A, and K848A) are minimal. The HSQC spectrum in Figure 2B shows only marginal chemical shift changes in the protein fingerprint. The latter is supported by the CD spectra that show no significant perturbations in the dichroic profiles. However, the lineshapes reveal substantial changes in the enzyme dynamics apparent from the broadening of several signals. The chemical shift perturbations, although small, show that K855A has the most pronounced spectroscopic changes followed by K810A and K848A. As expected, the most significant differences are revealed by relaxation studies. The authors performed T1, T2, and heteronuclear NOE experiments to characterize the fast dynamics of the protein on the NMR time scale, revealing the most significant differences in the K855A mutant.

      Additionally, they used CPMG dispersion experiments to analyze the dynamics in the micro-to-millisecond time scale. From these measurements, the authors conclude that the relaxation characteristics of the mutants do not change significantly, i.e., the mutants possess conformational flexibility similar to the wild type. To interpret the dynamic behaviors of the different HNH variants, the authors performed MD simulations and analyzed the allosteric network using community analysis. The computational work revealed the connections between the communities and how the mutants affect interdomain communication (figure 5).

      Overall the paper is exciting and shows how NMR and MD simulations can be used synergistically to dissect the intra- and inter-molecular allosteric communication in highly complex systems. However, there are a few shortcomings that the authors need to address. One significant concern is the lack of a direct comparison between the NMR studies and the MD simulations. Additionally, it is unclear how these dynamics or structural perturbations caused by these selected mutants are converted into the enzyme's increased or decreased specificity.

      Other technical concerns:

      A) The authors performed relaxation measurements for fast dynamics. However, they did not calculate the order parameters for the protein backbone. Usually, the order parameters from the protein backbone can be directly compared to the calculated values from MD trajectories. How do the S2 values from the two techniques compare?

      The reviewer is absolutely correct and we have now included S² parameters for each K-to-A mutant and determined the difference from WT HNH (new Figure Supplement S7). We also added discussion of these data on page 9, lines 185-189. Briefly, S² parameters for each mutant are quite similar to those of WT HNH, evidenced by ΔS² values (greater or equal to) 0.1 for the majority of residues. These data also mirror ΔS² values determined from MD simulations. Further, we note agreement between S² and the ¹H-[¹⁵N] NOE, which show depressed values sporadically between residues 800-825, surrounding residue 850, and at the C-terminus.

      B) The authors state that the differences in the relaxation dispersion profiles are less than 1.5 Hz, indicating small changes in dynamics. Did the authors compare all residues or a subset of residues?

      C) In the discussion, the authors refer to the synchronous motions that may be responsible for specificity. How did they deduce that the motions are synchronous? From MD simulations or the global fitting of the CPMG curves? Do motions need to be synchronous for effective allosteric communications?

      In the manuscript, we used “synchronous” when describing community network analysis (CNA), where groups of residues displaying highly synchronized dynamics are gathered in communities. This wording is indeed employed in several computational studies harnessing CNA (PNAS, 2017, 114, E3414-E3423). We therefore did not use the word “synchronous” in a mechanistic sense. We have now changed our phrasing in the manuscript to avoid any confusion.

      D) Finally, the authors claim that mutations can target sites identified in this study (hotspots) to improve CRISPR-Cas9 function. Can the authors elaborate more on this point? How do they envision mutations to tune the function of the complex?

      We thank the reviewers for this insightful comment, which gives us the opportunity to suggest critical hotspots for mutational studies. Our computational analysis indicated that the three K–to–A mutations mainly disrupt the cross-talk between the A1 and A2 communities (Figure 4). This effect is observed for all mutants, and is confirmed by the analysis of the NMR relaxation data (Figure 5), suggesting that the A1-A2 communities are critical hotspots for the signal transmission. Building on this observation, mutational studies targeting residues of the A1-A2 communities could impact the allosteric communication and, in turn, modulate the function and specificity of the system. We have now included this discussion in the main text (page 16, lines 332-341 and page 18, lines 393-395) adding Figure 6. The Abstract was also amended including this information.

    1. Author Response:

      Reviewer #2 (Public Review):

      Campylobacter jejuni is a serious food-borne pathogen and understanding how the various products necessary for pathogenesis are regulated is a key step in preventing its growth and/or treating disease. Here, Sharma and coworkers demonstrate the complex pathway that leads to the maturation of two complementary regulatory RNAs and how one of the RNAs antagonizes the other to relieve repression of a virulence-related gene. The work is detailed and convincing, and provides a reference point for the roles of regulatory RNAs in C. jejuni as well as other bacteria. Future work will be needed to better understand when each of these RNAs is best expressed and processed into active form, and to fully support the idea that one RNA acts as an antagonist for the other.

      We thank the reviewer for their positive feedback on our work. Further experiments (Figure 7B) provide additional evidence that CJnc180 is an antagonist of CJnc190 and affects ptmG.

      Reviewer #3 (Public Review):

      In this manuscript the authors describe the biogenesis and the mechanism of action of a pair of cis-encoded sRNAs: CJnc190 and CJnc180. Both RNAs are processed by RNase III. Mapping of 5' and 3' ends, together with in vitro and in vivo experiments using purified RNase III and an rnc deletion mutant, demonstrated that the processing of CJnc190 sRNA depended on the formation of an intramolecular duplex, while CJnc180 sRNA processing required the presence of the antisense CJnc190 sRNA. The mature CJnc190 and CJnc180 sRNA species are 69 and 88 nt long, respectively. They also show that mature CJnc190 sRNA represses translation of ptmG via base-pairing and that CJnc180 sRNA antagonizes CJnc190 repression by acting as a sponge, scavenging CJnc190 sRNA. In addition, they find that two promoters are responsible for the synthesis of CJnc190 sRNA and both transcripts are subject to RNase III processing.

      The study represents an enormous amount of work. The data are solid and generally support the overall conclusions. Having said that, the manuscript is overwhelming and loaded with too many details, which makes the reading difficult and, in the absence of a bigger picture, often uninspiring.

      We thank this reviewer for the overall positive feedback. We agree that this is a very complex story. We have made several revisions to the text and Figures and have moved details to the Supplementary Information. We hope this facilitates reading of our manuscript.

    1. Author Response:

      Reviewer #1 (Public Review):

      Rinkenberger et al. take a forward genetics ORF overexpression approach to identify human interferon (IFN)-inducible gene (ISG) products driving host defense to the protozoan pathogen Toxoplasma. The screen, encompassing approximately 500 ISGs, identifies 3 ISG candidates and is able to validate 2 of these 3, namely the transcription factor IRF1 and the retinoic acid receptor responsive gene RARRES3, which becomes the focus of the study. Using gain- and loss-of-function approaches the study demonstrates that RARRES3 promotes the reduction of parasitic burden in human cell lines. Importantly, the study provides evidence linking RARRES3 functionally to the previously reported interferon-inducible defense mechanism of host-mediated parasite extrusion. Overall, the discovery of RARRES3 as an anti-parasitic factor is potentially of broad interest to the field of innate immunity, parasitology and, more generally, microbial pathogenesis, although its physiological importance or its role in host defense to pathogens other than Toxoplasma was not explored in this study.

      Strengths:

      The paper takes an unbiased genetics approach to identify novel human genes that execute cell-autonomous host defense against the parasite Toxoplasma.

      The study is well controlled and convincingly demonstrates that RARRES3 limits parasitic burden in human cell lines, using both gain- and loss-of-function approaches.

      The study provides indirect evidence that RARRES3 mediates the expulsion of parasites from infected cells.

      The study shows that some clonal lineages of Toxoplasma are resistant to RARRES3-mediated immunity suggesting that some Toxoplasma strains may have evolved mechanisms to counteract the host defense pathway(s) regulated by RARRES3.

      Weaknesses:

      The physiological relevance of RARRES3-mediated parasite egress during the course of Toxoplasma infections is unclear and not discussed.

      We added a section in the Discussion covering the potential physiological relevance of RARRES3.

      Regarding the failure to see an IDO phenotype (Fig. 1F), the authors may consider that their standard media and serum contain relatively high concentrations of tryptophan (Materials and Methods doesn't provide any information on the exact trp concentration used) and that IDO cannot catabolize the excess amount of tryptophan present in media + serum to achieve tryptophan starvation conditions. I believe previous studies demonstrating IDO-mediated nutritional immunity in cell culture used trp-limited culture conditions. Without any careful experiments using titrated concentrations of trp, the conclusion that IDO cannot restrict Toxo in A549 cells does not seem justified.

      A brief discussion of this point has been added to the Results section discussing Figure 1F, and the conclusions have been toned down. We have specified in the text that our culture medium (DMEM containing 10% FBS) contains 16 µg/ml tryptophan. Although this might mask the effects of IDO, we still observe inhibition of parasite growth in response to IFN-γ, suggesting this pathway is not the most important in A549 cells. We agree that tryptophan is important and required for parasite growth; we simply did not observe the involvement of IDO1 in our experimental setup.

      The authors state that RARRES3 deficiency was complemented with RARRES3 ectopic expression. However, it is unclear from the data presentation whether complemented KOs are statistically different from controls (KO + FLUC) under IFNgamma primed conditions (Fig. 5B) and thus whether complementation was actually achieved.

      The complemented KO is not significantly different from WT, demonstrating complementation. An “ns” comparison between these bars has been added to the figure for clarity.

      The paper lacks any direct evidence for RARRES3-mediated parasite egress.

      We conducted a live imaging experiment to directly observe parasite egress in A549 cells ectopically expressing RARRES3 or FLUC. The data are presented in Figure 6C and Videos 1-2.

    1. Author Response:

      Reviewer #1 (Public Review):

      In their manuscript entitled "PBN-PVT projection modulates negative emotions in mice", Zhu et al. combine circuit mapping techniques with behavioral manipulations to interrogate the function of anatomical projections from the parabrachial nucleus (PBN) to the paraventricular nucleus of the thalamus (PVT). The study addresses an important scientific question, since the PVT and particularly the posterior PVT is known to be mostly sensitive to aversive signals, but the neural circuit mechanisms underlying this process remain unknown. Here the authors contribute important evidence that PBN inputs to the PVT may be critical for this process. Specifically, the authors identify that the PVT receives glutamatergic projections from the PBN that promote aversive behavioral responses but do not modulate nociception. The latter finding is intriguing considering that the PBN is an important node in pain processing and that the PVT has recently emerged as a modulator of pain. Overall, the study includes an impressive array of techniques and manipulations and offers insight to an important scientific question. The authors' conclusions will be significantly strengthened by the inclusion of some additional experiments and controls.

      It is in my view problematic that the authors used different genetic strategies to target the PBN-PVT pathway. For example, in Figure 1 the authors used Vglut2-cre mice for the anterograde tracings but later on in the same figure used constitutively expressed ChR2 in the PBN to assess functional connectivity with the PVT using ex-vivo patch-clamp electrophysiology. In Figure 2 the authors once again employed Vglut2-Cre mice to target PBN projections to the PVT and manipulate these projections optogenetically during behavioral tests. However, in the following figure (Fig. 3) the authors then use a retro-Cre approach and chemogenetics. The interchangeable use of these different manipulations is not warranted by data presented by the authors. For example, it is unclear whether all PBN neurons projecting to the PVT are glutamatergic and express VGLUT2. When using the constitutively expressed ChR2 in the PBN to demonstrate glutamatergic projections to the PVT, the authors may be faced with potential contamination from adjacent brain stem structures like the LC and DRN, which project to the PVT and are known to contain glutamatergic neurons (vglut1 and vglut3, respectively). As another example, for Figure 4, why did the authors not use Vglut2-cre mice and inhibit PBN terminals in the PVT as in Figure 2?

      We agree with the reviewer and have reframed the manuscript. We first present the slice recording results from wild-type mice (Figure 1). We recorded both EPSCs and IPSCs and found light-induced EPSCs in 34 of 52 neurons and light-induced IPSCs in 4 of 52 neurons. Please see Page 5 Line 119 to Line 121. We carefully examined the ChR2 virus infection area (see Figure R1 below). We found dense ChR2-mCherry+ neurons in the PBN. We also observed ChR2-mCherry+ neurons in the nearby ventrolateral periaqueductal gray (VLPAG), locus coeruleus (LC), cuneiform nucleus (CnF), and laterodorsal tegmental nucleus (LDTg). The dorsal raphe nucleus (DR) was not infected. We agree with the reviewer that there could be potential contamination from the LC, which releases dopamine and norepinephrine in the PVT via the LC-PVT projection. We have discussed this on Page 13 Line 375 to Line 380.

      Figure R1. AAV-hSyn-ChR2-mCherry virus infection showcase. LPBN, lateral parabrachial nucleus; MPBN, medial parabrachial nucleus; VLPAG, ventrolateral periaqueductal gray; LC, locus coeruleus; CnF, cuneiform nucleus; LDTg, laterodorsal tegmental nucleus; DR, dorsal raphe nucleus; scp, superior cerebellar peduncle. Scale bar: 200 μm.

      We performed tdTomato staining with VgluT2 mRNA in situ hybridization and found that about 94.4% of tdTomato+ neurons express VgluT2 mRNA. These results indicate that the majority of PVT-projecting PBN neurons are glutamatergic. These new results have been included in Figure 1R−U.

      Then we used VgluT2-ires-Cre mice to perform tracing (Figure 1−figure supplement 2) and behavioral tests (optogenetic activation in Figure 2, optogenetic inhibition in Figure 4). We also performed pharmacogenetic activation of PVT-projecting PBN neurons in wild-type mice (Figure 3). We observed that pharmacogenetic activation of the PVT-projecting PBN neurons reduced the center duration in the OFT, similar to the result of optogenetic activation in the OFT. We also observed that pharmacogenetic activation of the PVT-projecting PBN neurons induced freezing behaviors. Our pharmacogenetic activation experiment supported the hypothesis that PBN-PVT projections modulate negative affective states.

      We have now performed optogenetic inhibition of the PBN-PVT projections using VgluT2-ires-Cre mice. We found that inhibition of PBN-PVT projections reduces 2-MT-induced aversion-like behaviors and footshock-induced freezing behaviors. These new results have been included in Figure 4, Figure 4−figure supplement 1 and 2, and are described in the text. Please see the text Page 9 Line 254 to Page 10 Line 274.

      Related to the previous point, in the retrograde labeling experiment (Fig. 1) it would be useful if the authors determined what fraction of retrogradely labeled cells are indeed VGLUT2+. For behavioral experiments employing the retro-Cre approach the authors may be manipulating a heterogeneous population of PBN neurons, which could be influencing their behavioral observations. In general, the authors should ensure that a similar population of PBN-PVT neurons is being assessed throughout the study.

      We have now performed tdTomato staining with VgluT2 mRNA in situ hybridization and found that approximately 94.4% of tdTomato+ neurons expressed VgluT2 mRNA. These results indicated that the majority of PVT-projecting PBN neurons are glutamatergic. These new results have been included in Figure 1R−U and were described in the text. Please see Page 5 Line 129 to Line 132.

      The authors' grouping of the behavioral data into the first vs the last four minutes of light stimulation in the OF does not seem to be properly justified and appears rather arbitrary. Also related to data analysis, the unpaired t-test analysis in the fear conditioning experiment in Figure 4J seems inappropriate. ANOVA with group comparisons is more appropriate here.

      To provide a more detailed profile of the behaviors in the OFT, we further divided the laser ON period (5−10 minutes) into five one-minute periods and analyzed the velocity, non-moving time, travel distance, center time, and jumping. We found that the velocity and non-moving time were increased, and the center time was decreased in the ChR2 mice during most periods. Furthermore, we observed that the travel distance and jumping behaviors were increased only in the first one-minute period in ChR2 mice. These new results have been included in Figure 2−figure supplement 2 and were described in the text. Please see Page 7 Line 179 to Line 189. We also discussed this on Page 14 Line 396 to Line 403.
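
      The one-minute binning of the laser ON period described above can be sketched as follows. This is only an illustration: the sample format (time in seconds, value), the function name, and all numbers are hypothetical and are not taken from the study's analysis code or data.

```python
from statistics import mean

def bin_means(samples, start_s=300, end_s=600, bin_s=60):
    """Average a tracked measure within consecutive bins of a time window.

    samples: iterable of (time_in_seconds, value) pairs.
    Returns one mean per bin (None for bins that contain no samples).
    """
    n_bins = (end_s - start_s) // bin_s
    bins = [[] for _ in range(n_bins)]
    for t, value in samples:
        # Keep only samples inside the half-open window [start_s, end_s).
        if start_s <= t < end_s:
            bins[int((t - start_s) // bin_s)].append(value)
    return [mean(b) if b else None for b in bins]

# Hypothetical velocity samples (time in s, velocity in cm/s); only samples
# falling in the 5-10 minute laser ON window (300-600 s) are used.
samples = [(290, 9.0), (310, 8.0), (350, 6.0), (370, 4.0), (430, 5.0),
           (490, 3.0), (550, 2.0), (590, 4.0), (600, 7.0)]
per_minute = bin_means(samples)
print(per_minute)
```

      The same helper could be applied to any per-sample measure (for example non-moving time or center time), with each one-minute mean then compared between groups.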

      We have now performed optogenetic inhibition of PBN-PVT projections during footshock-induced freezing behavior in Vglut2-ires-Cre mice (Figure 4J−K). We also revised the statistics (unpaired Student's t-test) and calculated the percentage of freezing behaviors over 10 minutes, which matched the constant optogenetic inhibition. Similar changes have been made in Figure 4−figure supplement 3K.

      Considering the persistence of the effect in the OF following optogenetic stimulation of PBN-PVT afferents, the lack of such a persistent effect in the RTPA is hard to reconcile. By performing additional experiments the authors attempt to settle this discrepancy by proposing that the PBN-PVT pathway promotes aversion but does not facilitate negative associations. I find this conclusion to be problematic. If the pathway is critical for conveying aversive signals to the PVT, one expects that at the very least it would be required for the formation of associative memories involving aversive stimuli. However, the authors do not show data to this effect. Instead they show that animals decrease their acute defensive reactions to aversive stimuli (2-MT and fear conditioning), but do not show whether associative memory related to this experience (e.g. fear memory retrieval) is impacted by manipulations of the PBN-PVT pathway.

      We have now performed several experiments to examine the effects of the PBN-PVT projections on aversion formation and memory retrieval.

      We first performed a prolonged conditioned place aversion that mimics drug-induced place aversion. And we found that optogenetic activation of PBN-PVT projections did not induce aversion in the postconditioning test on Day 4. These new results have been included in Figure 2−figure supplement 2H−I and described in the text. Please see Page 7 Line 196 to Line 199.

      Then, we performed the classical auditory fear conditioning test and found that optogenetic inhibition of PBN-PVT projections during footshock in the conditioning period did not affect freezing levels in the contextual test or the cue test (Laser OFF trials). Inhibition of PBN-PVT projections during the contextual test or the cue test (Laser ON trials) did not affect freezing levels either. These data suggest that PBN-PVT projections are not crucial for associative fear memory formation or retrieval. These new results have been included in Figure 4−figure supplement 2 and described in the text. Please see Page 10 Line 268 to Page Line 274. We also discussed this on Page 15 Line 430 to Page 16 Line 473.

      A similar lack of connection between aversive signals within the PVT and the PBN pathway is found in the photometry data presented in Figure 5. While importantly the authors' observation of aversive modulation of the pPVT reproduces data from other recent studies, the question here is whether the increased activity of PVT neurons is mediated by input from the PBN. The cFos experiment included in this figure attempts to draw this connection, but empirical evidence is required.

      We have now performed the dual Fos staining experiment and the optoelectrode experiment.

      In the dual Fos staining experiment, we found that there was a broad overlap between optogenetic stimulation-activated neurons (expressing the Fos protein) and footshock-activated neurons (expressing the fos mRNA) (Figure 6−figure supplement 1B−E).

      In the optoelectrode experiment, there was also a broad overlap between laser-activated and footshock-activated neurons. This result was consistent with the dual Fos staining result, suggesting that PVTPBN neurons were activated by aversive stimulation. Next, we analyzed the firing rates of PVT neurons during footshock sweeps with and without laser. We found that footshock with laser activated 30 of 40 neurons and increased the overall firing rate of the 40 neurons compared with footshock without laser (Figure 6I). These results indicated that activation of PBN-PVT projections could enhance PVT neuronal responses to aversive stimulation.

      These new results have been included in Figure 6, Figure 6−figure supplement 1, and described in the text. Please see Page 10 Line 295 to Page 11 Line 317. We also discussed these results on Page 15 Line 422 to Line 429.

      Reviewer #2 (Public Review):

      Zhu et al. investigated the connectivity and functional role of the projections from the parabrachial nucleus (PBN) to the paraventricular nucleus of the thalamus (PVT). Using neural tracers and in vitro electrophysiological recordings, the authors showed the existence of monosynaptic glutamatergic connections between the PBN and PVT. Further behavioral tests using optogenetic and chemogenetic approaches demonstrated that activation of the PVT-PBN circuit induces aversive and anxiety-like behaviors, whereas optogenetic inhibition of PVT-projecting PBN neurons reduces fear and aversive responses elicited by footshock or the synthetic predator odor 2MT. Next, they characterized the anatomical targets of PVT neurons that receive direct innervation from the PBN (PVTPBN). The authors also showed that PVTPBN neurons are activated by aversive stimuli and chemogenetically exciting these cells is sufficient to induce anxiety-like behaviors. While the data mostly support their conclusions, alternative interpretations and potential caveats should be addressed in the discussion.

      Strength:

      The authors used different behavioral tests that collectively support a role for PBN-PVT projections in promoting fear- and anxiety-like behaviors, but not nociceptive or depressive-like responses. They also provided insights into the temporal participation of the PBN-PVT circuit by showing that this pathway regulates the expression of affective states without contributing to the formation of fear-associated memories. Because previous studies have shown that activation of projection-defined PVT neurons is sufficient to induce the formation of aversive memories, the differences between the present study and previous findings reinforce the idea of functional heterogeneity within the PVT. The authors further explored this functional heterogeneity in PVT by using an anterograde viral construct to selectively label PVT neurons that are targeted by PBN inputs. Together, these results connect two important brain regions (i.e., PBN and PVT) that were known to be involved in fear and aversive responses, and provide new information to help the field elucidate the complex networks that control emotional behaviors.

      Weakness:

      The authors should avoid anthropomorphizing the behavioral interpretation of the findings and generalizing their conclusions. In addition, there is a series of potential caveats that could interfere with the interpretation of the results, all of which must be discussed in the article. Examples include the long duration of the laser stimulation protocol, the possibility of antidromic effects following photoactivation of PBN terminals in PVT, and the existence of collateral PBN projections that could also be contributing to the observed behavioral changes. Additional clarification about the exclusive glutamatergic nature of the PBN-PVT projection should be provided and the present findings should be reconciled with prior studies showing the existence of GABAergic PBN-PVT projections.

      We agree with the reviewer. We have now revised the text carefully to avoid using subjective terms. We showed the light-induced EPSCs and IPSCs results in Figure 1, and we performed RNAscope experiments to clarify the glutamatergic nature of the PVT-projecting PBN neurons (Figure 1 and Figure 1−figure supplement 1). We also added discussion about the laser stimulation protocol, the possibility of antidromic effects, and collateral projections. Please see Page 14 Line 413 to Page 15 Line 418, and Page 16 Line 449 to Line 457.

      We also added several experiments to dissect the effect of manipulation of the PBN-PVT projection in fear memory acquisition and retrieval. These new results have been included in Figure 4−figure supplement 2 and described in the text. Please see Page 10 Line 268 to Line 274. We also discussed this on Page 15 Line 430 to Page 16 Line 473.

      Reviewer #3 (Public Review):

      Zhu YB et al. investigated the functional role of the projection from the parabrachial nucleus (PBN) to the thalamic paraventricular nucleus (PVT) in processing negative emotions. They found that the PBN sends excitatory projections to the PVT. The activation of the PBN-PVT projection induces anxiety-like and fear-like behaviors, while inhibition of this projection relieves fear and aversion.

      Strengths:

      The authors dissected the anatomical and functional connections between the PBN and the PVT by using comprehensive modern neuroscience techniques including viral tracing, electrophysiology, optogenetics and pharmacogenetics. They clearly demonstrated the significant role of the PBN-PVT projection in modulating negative emotions.

      Weaknesses:

      The PBN contains a variety of neuronal subtypes that express distinct molecular markers such as CGRP, Tac1, Pdyn, and Nts, among others. The PBN also sends projections to multiple targets, including VMH, PAG, BNST, CEA and ILN, that could mediate distinct functions. What is the neuronal identity of PVT-projecting PBN neurons, how are the PVT projection and other projections organized, and are they overlapping or relatively independent pathways? Those important questions were not examined in this study, which makes it hard to relate this finding to other existing literature.

      We have now performed the RNAscope experiments detecting VgluT2, Tac1, Tacr1, Pdyn mRNA, and fluorescent immunostaining detecting CGRP protein in the PBN. We found that about 94.4% of tdTomato+ neurons express VgluT2 mRNA. We also found that tdTomato+ neurons were only partially co-labeled with Tacr1, Tac1, or Pdyn mRNA, but not with CGRP. These results indicate that the majority of PVT-projecting PBN neurons are glutamatergic. These new results have been included in Figure 1, Figure 1−figure supplement 1, and were described in the text. Please see Page 5 Line 129 to Line 140.

      We also show the collateral projections of PVT-projecting PBN neurons in Figure 1−figure supplement 3, describe them on Page 6 Line 148 to Line 151, and discuss them on Page 16 Line 449 to Line 457.

    1. Author Response:

      Reviewer #2 (Public Review):

      1. The novelty of the current observation of two types of links is overstated, for example, in the abstract: "Our data reveal the existence of two molecular connectors/spacers which likely contribute to the nanometer scale precise stacking of the ROS disks" (Line 25). In fact, both of these links have been shown before (Usukura and Yamada, 1981; Roof and Heuser, 1982; Corless and Schneider, 1987; Corless et al., 1987; Kajimura et al., 2000). These previous studies deserve to be recognized. Of special note is the paper by Usukura and Yamada whose images of the disc rim connectors are by no means less convincing than shown in the current manuscript. On the other hand, the novelty and impact of the data related to peripherin appears to be understated, particularly in the abstract.

      We changed the abstract line 27 to: “Our data confirm the existence of two previously observed molecular connectors …”, and cited the recommended references in the introduction (lines 54-55), the results (lines 131-132), and the discussion (lines 282/285). To highlight the previous reports, we rephrased the sentence in lines 132-133, “In agreement with these previous findings, we observed structures that connect membranes of two adjacent disks …”; the discussion is rephrased in lines 280-281, “Similar connectors have been observed previously ...” and “… and their statistical analysis confirmed the existence of two distinct connector species.”, and in lines 291-292, “Based on previous studies combined with our quantitative analysis, we put forward a hypothesis for the molecular identity of the disk rim connector which agrees in part with recent models”.

      2. Notably, ROM-1 has not been found in peripherin oligomers larger than octamers (e.g. Loewen and Molday, 2000 and subsequent studies by Naash and colleagues). This should be discussed in the context of the current model.

      We agree that this is an important aspect. We pick subvolumes along all disk rims, and on average we obtain the ordered scaffold as shown in the manuscript. We expect heterogeneity in the data because of the different degrees of oligomerization and the exclusion of ROM1 from higher oligomers. Our analysis required substantial classification to achieve convergence to a stable average, indeed indicating heterogeneity in the rim structure. However, we could not resolve additional structures to sufficient quality. It might be that this heterogeneity is what ultimately limits our achievable resolution. We added these thoughts in the discussion starting in lines 377-378, “PRPH2-Rom1 oligomers isolated from native sources exhibit varying degrees of polymerization (Loewen and Molday, 2000), and ROM1 is excluded from larger oligomers (Milstein et al., 2020). We could not resolve this heterogeneity as additional structures to sufficient quality by subvolume averaging, but in combination with the inherent flexibility of the disk rim, this heterogeneity might be the reason for the restricted resolution of our averages.”

      3. The following statement should be reconsidered given the established role of cysteine-150 in peripherin oligomerization: "We hypothesize that the necessary cysteine residues are located in the head domain of the tetramers (Figure 5B), ..." It has been firmly established that only one cysteine (C150) located in the intradiscal loop is not engaged in intramolecular interactions and is essential for peripherin oligomerization.

      Thank you for this advice. We agree and rephrased our discussion in lines 368-371, “The intermolecular disulfide bridges are exclusively formed by the PRPH2-C150 and ROM1-C153 cysteine residues, which are located in the luminal domain (Zulliger et al., 2018). We hypothesize that these disulfide bonds (Figure 5B) are responsible for the contacts across rows (Figure 3) ...”

      4. Line 340: "A model involving V-shaped tetramers for membrane curvature formation was proposed recently (Milstein et al., 2020), but it comprises two rows of tetramers which are linked in a head-to-head manner. Our analysis instead resolves three rows organized side-by-side in situ (Figure 5A)." I am confused by this statement: doesn't your model also show long rows connected head-to-head? The real difference is that Milstein and colleagues proposed four tetramers per rim whereas the current data reveal three.

      Thank you for pointing out this imprecise description. The model proposed by Milstein and the model in the old version of our manuscript both propose linkage between tetramers via their disk luminal domains. In our manuscript, we refer to the luminal domain as the head domain. However, to our understanding, the Milstein model suggests two rows of tetramers, where one tetramer in the first row is rotated 180° with respect to a tetramer in the second row (therefore head-to-head), while our data indicate that the V-shaped repeats, which we originally hypothesized to be tetramers, are only rotated ~63° with respect to one another and are therefore oriented side-by-side rather than head-to-head:

      Fig. 2: Comparison of models for the organization of the ROS disk rim as proposed in Milstein et al., 2020 (top panel) and in our work (lower panel).

      We have now rephrased lines 383-385, “Instead, our analysis in situ resolves three rows of repeats which are also linked by the luminal domain but are rather organized side-by-side (Figure 5A).”

      5. Line 347: "Our data indicate that the luminal domains of tetramers hold the disk rim scaffold together (Figure 3C), which is supported by the fact that most pathological mutations of PRPH2 affect its luminal domain (Boon et al., 2008; Goldberg et al., 2001). It is possible that these mutations impair the formation of tetramers, rows of tetramers, and their disulfide bond-stabilized oligomerization. These alterations could impede or completely prevent disk morphogenesis which, in turn, would disrupt the structural integrity of ROS, compromise the viability of the retina and ultimately lead to blindness." This is not an original idea, as many studies showed that disruptions in peripherin oligomerization lead to anatomical defects in disc formation and subsequent photoreceptor cell death.

      Thank you for pointing this out. Our data are indeed in good agreement with the results obtained by many groups and further expand on them. We rephrased the manuscript in several places to clarify this relationship: in the abstract lines 32-34, “Our Cryo-ET data provide novel quantitative and structural information on the molecular architecture in ROS and substantiate previous results on proposed mechanisms underlying pathologies of certain PRPH2 mutations leading to blindness.”; in the introduction lines 78-79, “… allowed us to obtain 3D molecular-resolution images of vitrified ROS in a close-to-native state providing further evidence for previously suggested mechanisms leading to ROS dysfunction”; and in the discussion lines 393-397, “In good agreement with previous work, it is possible that these mutations impair the formation of complexes, and their disulfide bond-stabilized oligomerization (Chang et al., 2002; Conley et al., 2019; Zulliger et al., 2018). Hence, these alterations could impede or completely prevent disk morphogenesis …”. Also, additional relevant publications are cited in line 395.

      6. In regards to the distance between disc rims and plasma membrane, the authors cite the data obtained with frogs (10 nm) but not a more relevant, previously reported measurement in mice (Gilliam et al., 2012). The value of 18 nm reported in that study is much closer to the currently reported value.

      We appreciate the reference to this excellent paper. We added it in lines 335-337, “This value was derived from amphibians (Roof and Heuser, 1982) and deviates considerably from recent results (18 nm, (Gilliam et al., 2012)) and from our current measurements in mice (~25 nm).” Our aim was to point out that a model for ROS organization that is often cited and is otherwise well-founded (Batra-Safferling et al., 2006) makes a wrong assumption about distance in the context of the mammalian systems.

      7. The authors are (correctly) being very careful in assigning the molecular identity of disc interior connectors to PDE6. However, they are more confident in assigning the disc rim connectors to GARP2, which is reflected in the labeling of these links in figure 5. Their arguments are valid, but these links are not attached to peripherin (a protein considered to be the membrane binding partner for GARPs), which is not immediately consistent with this hypothesis. Perhaps it would be fair to re-label the corresponding links in figure 5 as "disc rim connectors".

      That is an excellent and fair suggestion. We changed Figure 5 accordingly.

      8. On a similar note, the disc rim connectors seem to be located where ABCA4 is presumed to be localized within the rim, which may not be just a coincidence. The authors already have tomograms obtained from ABCA4 knockout animals. Is it possible to analyze whether these links are preserved in these tomograms?

      We agree, this is an important question to address. Unfortunately, neither the biological preparation nor the tomograms of the ABCA4 knockout were as good in quality as for the WT. Still, we frequently see connectors at the disk rim, especially after denoising of the tomograms.

      Fig. 3: connectors at disk rims in WT (left) and ABCA4 knockout mice (right).

      Sometimes it appears that the connectors between adjacent disks are linked via intradisk densities, which was already observed in Corless et al., 1987. We thought that these densities could be ABCA4 and tried to find them with two approaches in our WT tomograms (data not shown). In the first approach, using a segmentation similar to what we did for the connectors between disks, we found an order of magnitude fewer intradisk connectors than (inter)disk rim connectors. In the second approach, we used the positions of segmented (inter)disk rim connectors and classified rotational averages which focused on the disk luminal space next to the contact point of a connector with the disk membrane. Again, less than 10% of the disk rim connector subvolumes were assigned to classes with an additional luminal density. Both experiments indicate that disk rim connectors sometimes occur with an additional luminal density. In total, we found fewer than 100 of these intradisk densities, an observation which seems to be preserved in WT and ABCA4 KO. Based on this small number of positions/locations, however, we cannot draw any conclusion. Therefore, we did not add this point to the manuscript.

    1. Author Response:

      Reviewer #1 (Public Review):

      The introduction felt a bit short. I was hoping early on I think for a hint at what biotic and abiotic factors UV could be important for and how this might be important for adaptation. A bit more on previous work on the genetics of UV pigmentation could be added too. I think a bit more on sunflowers more generally (what petiolaris is, where natural pops are distributed, etc.) would be helpful. This seems more relevant than its status as an emoji, for example.

      We had opted to provide some of the relevant background in the corresponding sections of the manuscript, but agree that it would be beneficial to expand the introduction. In the revised version of the manuscript, we have modified the introduction and the first section of Results and Discussion to include more information about wild sunflowers, possible adaptive functions of floral UV patterns, and previous work on the genetic basis of floral UV patterning. More generally, we have strived to provide more background information throughout the manuscript.

      The authors present the % of Vp explained by the Chr15 SNP. Perhaps I missed it, but it might be nice to also present the narrow sense heritability and how much of Va is explained.

      Narrow-sense heritability for LUVp is extremely high in our H. annuus GWAS population; four different software packages [EMMAX (Kang et al., Nat Genet 2010), GEMMA (Zhou and Stephens, Nat Genet 2012), GCTA (Yang et al., Am J Hum Genet 2011) and BOLT_LMM (Loh et al., Nat Genet 2015)] provided h2 estimates of ~1. While it is possible that these estimates are somewhat inflated by the presence of a single locus of extremely large effect, all individuals in this population were grown at the same time under the same conditions, and limited environmental effects would therefore be expected. The percentage of additive variance explained by HaMYB111 therefore appears to be equal to the percentage of phenotypic variance (~62%).

      We have included details in the Methods section – Genome-wide association mapping, and added this information to the relevant section of the main text:

      “The chromosome 15 SNP with the strongest association with ligule UV pigmentation patterns in H. annuus (henceforth “Chr15_LUVp SNP”) explained 62% of the observed phenotypic and additive variation (narrow-sense heritability for LUVp in this dataset is ~1).”
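
      The step from "h2 is ~1" to "the SNP explains the same share of additive and phenotypic variance" can be written out as a short worked equation. This is only a sketch of the reasoning stated above: the symbols V_A (additive variance), V_P (phenotypic variance) and V_SNP (variance attributed to the Chr15_LUVp SNP) are notation assumed here for illustration, not taken from the manuscript.

```latex
% Narrow-sense heritability is the additive share of phenotypic variance.
\[
  h^2 = \frac{V_A}{V_P} \approx 1
  \quad\Longrightarrow\quad
  V_A \approx V_P
\]
% With V_A and V_P approximately equal, the fraction of variance
% attributed to the SNP is approximately the same on either scale.
\[
  \frac{V_{\mathrm{SNP}}}{V_A}
  = \frac{V_{\mathrm{SNP}}}{h^2 \, V_P}
  \approx \frac{V_{\mathrm{SNP}}}{V_P}
  \approx 0.62
\]
```

      If the h2 estimates are inflated, as the response allows, the true V_A would be smaller than V_P and the share of additive variance explained would be correspondingly larger than 62%.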

      A few lines of discussion about why the Chr15 allele might be observed at only low frequencies in petiolaris I think would be of interest - the authors appear to argue that the same abiotic factors may be at play in petiolaris, so why don't we see this allele at frequencies higher than 2%? Is it recent? Geographically localized?

      That is a very interesting observation, and we currently do not have enough data to provide a definitive answer to why that is. From GWAS, HaMYB111 does not seem to play a measurable role in controlling variation for LUVp in H. petiolaris; even when we repeat the GWAS with MAF > 1%, so that the Chr15_LUVp SNP would be included in the analysis, there is no significant association between that SNP and LUVp (the significant association on chr. 15 seen in the Manhattan plot for H. petiolaris is ~20 Mbp downstream of HaMYB111). The rarity of the L allele in H. petiolaris could complicate detection of a GWAS signal; on the other hand, the few H. petiolaris individuals carrying the L allele have, on average, only marginally larger LUVp than the rest of the population (LL = 0.32 allele).

      The two most likely explanations for the low frequencies of the L allele in H. petiolaris are differences in alleles, or their effect, between H. annuus and H. petiolaris; or, as suggested by the reviewer, a recent introgression. In H. annuus, the Chr15_LUVp SNP is likely not the actual causal polymorphism affecting HaMYB111 activity, but is only in LD with it (or them); this association might be absent in H. petiolaris alleles. An alternative possibility is that downstream differences in the genetic network regulating flavonol glycosides biosynthesis mask the effect of different HaMYB111 alleles.

      H. annuus and H. petiolaris hybridize frequently across their range, so this could be a recent introgression that has not established itself; alternatively, physiological differences in H. petiolaris could make the L allele less advantageous, so the introgressed allele is simply being maintained by drift (or recurring hybridization). Further analysis of genetic and functional diversity at HaMYB111 in H. petiolaris will be required to differentiate between these possibilities.

      We have added a few sentences highlighting some of these possible explanations at the end of the main text of the manuscript, which now reads:

      “Despite a more limited range of variation for LUVp, a similar trend (larger UV patterns in drier, colder environments) is present also in H. petiolaris (Figure 4 – figure supplement 4). Interestingly, while the L allele at the Chr15_LUVp SNP is present in H. petiolaris (Figure 1 – figure supplement 2), it is found only at a very low frequency, and does not seem to significantly affect floral UV patterns in this species (Figure 2a). This could represent a recent introgression, since H. annuus and H. petiolaris are known to hybridize in nature (Heiser, 1947, Yatabe et al., 2007). Alternatively, the Chr15_LUVp SNP might not be associated with functional differences in HaMYB111 in H. petiolaris, or differences in genetic networks or physiology between H. annuus and H. petiolaris could mask the effect of this allele, or limit its adaptive advantage, in the latter species.”

      Page 14: It's unclear to me why there is any need to discretize the LUVp values for the analyses presented here. Seems like it makes sense to either 1) analyze by genotype of plant at the Chr15 SNP, if known, or 2) treat it as a continuous variable and analyze accordingly.

      We designed our experiment to be a comparison between three well-defined phenotypic classes, to reduce the experimental noise inherent to pollinator visitation trials. As a consequence, intermediate phenotypic classes (0.3 < LUVp < 0.5 and 0.8 < LUVp < 0.95) are not represented in the experiment, and therefore we believe that analyzing LUVp as a continuous variable would be less appropriate in this case. In the revised manuscript, we have provided a modified Figure 4 – figure supplement 1 in which individual data points are shown (colour-coded by pollinator type), as well as fitted lines showing the general trend across the data.

      The individuals in pollinator visitation experiments were not genotyped for the Chr15_LUVp SNP; while having that information might provide a more direct link between HaMYB111 and pollinator visitation rates, our main interest in this experiment was to test the possible adaptive effects of variation in floral UV pigmentation.

      Page 14: I'm not sure you can infer selection from the % of plants grown in the experiment unless the experiment was a true random sample from a larger metapopulation that is homogenous for pollinator preference. In addition, I thought one of the Ashman papers had actually argued for intermediate level UV abundance in the presence of UV?

      We have removed mentions of selection from the sentence - while the 110 populations included in our 2019 common garden experiment were selected to represent the whole range of H. annuus, we agree that the pattern we observe is at best suggestive. We have, however, kept a modified version of the sentence in the revised version of the manuscript, since we believe that is an interesting observation. The sentence now reads:

      “Pollination rates are known to be yield-limiting in sunflower (Greenleaf and Kremen, 2006), and a strong reduction in pollination could therefore have a negative effect on fitness; consistent with this, plants with very small LUVp values were rare (~1.5% of individuals) in our common garden experiment, which was designed to provide a balanced representation of the natural range of H. annuus.” (new lines 373-378)

      It is correct that Koski et al., Nature Plants 2015 found intermediate UV patterns to increase pollen viability in excised flowers of Argentina anserina exposed to artificial UV radiation. However, the authors also remark that larger UV patterns would probably be favoured in natural environments, in which UV radiation would be more than two times higher than in their experimental setting. Additionally, when using artificial flowers, they found that pollen viability increased linearly with the size of floral UV pattern.

      More generally, as we discuss later on in the manuscript, the pollen protection mechanism proposed in Koski et al., Nature Plants 2015 is unlikely to be as important in sunflower inflorescences, which are much flatter than the bowl- shaped flowers of A. anserina; consistent with this, and contrary to what was observed for A. anserina, we found no correlation between UV radiation and floral UV patterns in wild sunflowers (Figure 4c).

      I would reduce or remove the text around L316-321. If there's good a priori reason to believe flower heat isn't a big deal (L. 323) and the experimental data back that up, why add 5 lines talking up the hypothesis?

      We had fairly strong reasons to believe temperature might play an important role in floral UV pattern diversity: a link between flower temperature and UV patterns has been proposed before (Koski et al., Current Biol 2020); a very strong correlation exists between temperature and LUVp in our dataset; and, perhaps more importantly, inflorescence temperature is known to have a major effect on pollinator attraction (Atamian et al., Science 2016; Creux et al., New Phytol 2021). While it is known that UV radiation is not particularly energetic, we didn’t mean line 323 to imply that we were sure a priori that there wouldn’t be any effect of UV patterns on inflorescence temperature.

      In the revised manuscript, we have re-organized that section and provided the information reported in line 323 (UV radiation accounts for only 3-7% of the total radiation at earth level) before the experimental results, to clarify what our thought process was in designing those experiments. The paragraph now reads:

      “By absorbing more radiation, larger UV bullseyes could therefore contribute to increasing temperature of the sunflower inflorescences, and their attractiveness to pollinators, in cold climates. However, UV wavelengths represent only a small fraction (3-7%) of the solar radiation reaching the Earth's surface (compared to >50% for visible wavelengths), and might therefore not provide sufficient energy to significantly warm up the ligules (Nunez et al., 1994). In line with this observation, different levels of UV pigmentation had no effect on the temperature of inflorescences or individual ligules exposed to sunlight (Figure 4e-g; Figure 4 – figure supplement 3).”

      Page 17: The discussion of flower size is interesting. Is there any phenotypic or genetic correlation between LUVP and flower size?

      This is a really interesting question! There is no obvious genetic correlation between LUVp and flower size – in GWAS, HaMYB111 is not associated with any of the floral characteristics we measured (flowerhead diameter; disk diameter; ligule length; ligule width; relative ligule size; see Todesco et al., Nature 2020). There is also no significant association between ligule length and LUVp (R^2 = 0.0024, P = 0.1282), and only a very weak positive association between inflorescence size and LUVp (R^2 = 0.0243, P = 0.00013; see attached figure). There is, however, a stronger positive correlation between LUVp and disk size (the disk being the central part of the sunflower inflorescence, composed of the fertile florets; R^2 = 0.1478, P = 2.78 × 10^-21), and as a consequence a negative correlation between LUVp and relative ligule size (that is, the length of the ligule relative to the diameter of the whole inflorescence; R^2 = 0.1216, P = 1.46 × 10^-17). This means that, given an inflorescence of the same size, plants with large LUVp values will tend to have smaller ligules and larger discs. Since the disk of sunflower inflorescences is uniformly UV-absorbing, this would further increase the size of the UV-absorbing region in these inflorescences.

      While it is tempting to speculate that this might be connected with regulation of transpiration (meaning that plants with larger LUVp further reduce transpiration from ligules by having smaller ligules - relative ligule size is also positively correlated with summer humidity; R^2 = 0.2536, P = 2.86 × 10^-5), there are many other fitness-related factors that could determine inflorescence size, and disk size in particular (seed size, florets/seed number...). Additionally, in common garden experiments, flowerhead size (and plant size in general) is affected by flowering time, which is also one of the reasons why we use LUVp to measure floral UV patterns instead of absolute measurements of bullseye size; in a previous work from our group in Helianthus argophyllus, size measurements for inflorescence and UV bullseye mapped to the same locus as flowering time, while genetic regulation of LUVp was independent of flowering time (Moyers et al., Ann Bot 2017). Flowering time in H. annuus is known to be strongly affected by photoperiod (Blackman et al., Mol Ecol 2011), meaning that the flowering time we measured in Vancouver might not reflect the exact flowering time in the populations of origin of those plants – with consequences for inflorescence size.

      In summary, there is an interesting pattern of concordance between floral UV pattern and some aspects of inflorescence morphology, but we think it would be premature to draw any inference from it. Measurements of inflorescence parameters in natural populations would be much more informative in this respect.
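
      The response above reports a very small R^2 (0.0243) alongside a small P value, which can look contradictory. The sketch below illustrates, under stated assumptions, how R^2 is computed for a simple linear regression and why the test statistic for a fixed R^2 grows with sample size. The function names and the sample sizes are hypothetical and chosen for illustration only; they are not the authors' data or analysis code.

```python
from math import sqrt


def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length (at least 3)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return r * r


def t_statistic(r2, n):
    """t statistic (n - 2 degrees of freedom) for the null hypothesis slope = 0."""
    return sqrt(r2 * (n - 2) / (1.0 - r2))


if __name__ == "__main__":
    # Hypothetical sample sizes: the same weak R^2 gives a larger t
    # (and hence a smaller P value) as the number of plants increases.
    for n in (50, 500, 5000):
        print(n, round(t_statistic(0.0243, n), 2))
```

      Converting the t statistic to a P value requires the t distribution (for example from a statistics library), which is left out here to keep the sketch dependency-free.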

      Reviewer #2 (Public Review):

      The genetic analysis is rigorously conducted with multiple Helianthus species and accessions of H. annuus. The same QTL was inputed in two Helianthus species, and fine mapped to promoter regions of HaMyb111.

      While there is a significant association at the beginning of chr. 15 in the GWAS for H. petiolaris petiolaris, we should clarify that that peak is unfortunately ~20 Mbp away from HaMYB111. While it is not impossible that the difference is due to reference biases in mapping H. petiolaris reads to the cultivated H. annuus genome, the most conservative explanation is that those two QTL are unrelated. We have clarified this in the legend to Fig. 2 in the revised manuscript.

      The allelic variation of the TF was carefully mapped in many populations and accessions. Flavonol glycosides were found to correlate spatially and developmentally in ligules and correlate with Myb111 transcript abundances, and a downstream flavonoid biosynthetic gene. Heterologous expression in Arabidopsis Atmyb12 mutants showed HaMyb111 to be able to regulate flavonol glycoside accumulations, albeit with different molecules than those that accumulate in Helianthus. Several lines of evidence are consistent with transcriptional regulation of myb111 accounting for the variation in bullseye size.

      Functional analysis examined three possible functional roles, in pollinator attraction, thermal regulation of flowers, and water loss in excised flowers (ligules?), providing support for the first and last, but not the second possible functions, confirming the results of previous studies on the pollinator attraction and water loss functions for flavonol glycosides. The thermal imaging work of dawn-exposed flower heads provided an elegant falsification of the temperature regulation hypothesis. Biogeographic clines in bullseye size correlated with temperature and humidity clines, providing a confirmation of the hypothesis posed by Koski and Ashman about the patterns being consistent with Gloger's rule, and historical trends from herbaria collections over climate change and ozone depletion scenarios. The work hence represents a major advance from Moyers et al. 2017's genetic analysis of bullseyes in sunflowers, and confirms the role established in Petunia for this Myb TF for flavonoid glycoside accumulations, in a new tissue, the ligule.

      Thank you. We have specified in the legend of Fig. 4i of the revised manuscript that desiccation was measured in individual detached ligules, and added further details about the experiment in the Methods section.

      While there is a correlation between pigmentation and temperature/humidity in our dataset, it goes in the opposite direction to what would be expected under Gloger’s rule – that is, we see stronger pigmentation in drier/colder environments, contrary to what is generally observed in animals. This is also contrary to what was observed in Koski and Ashman, Nature Plants 2015, where the authors found that floral UV pigmentation increased at lower latitudes and higher levels of UV radiation. While possibly rarer, such “anti-Gloger” patterns have been observed in plants before (Lev-Yadun, Plant Signal Behav 2016).

      Weakness: The authors were not able to confirm their inferences about myb111 function through direct manipulations of the locus in sunflower.

      That is unfortunately correct. Reliable and efficient transformation of cultivated sunflower (much less of wild sunflower species) has eluded the sunflower community (including our laboratories) so far – see for example discussion on the topic in Lewi et al. Agrobacterium protocols 2016, and Sujatha et al. PCTOC 2012. We therefore had to rely on heterologous complementation in Arabidopsis; while this approach has limitations, we believe that its results, given also the similarity in expression patterns between HaMYB111 and AtMYB111, and in combination with the other experiments reported in our manuscript, make a convincing case that HaMYB111 regulates flavonol glycoside accumulation in sunflower ligules.

      Given that the flavonol glycosides that accumulate in Helianthus are different from those regulated when the gene is heterologously expressed in Arabidopsis, the biochemical function of Hamyb111, while quite reasonable, is not completely watertight. The flavonol glycosides are not fully characterized (only MS/MS data are provided) and named only with cryptic abbreviations in the main figures.

      We believe that the fact that expression of HaMYB111 in the Arabidopsis myb111 mutant reproduces the very same pattern of flavonol glycoside accumulation found in wild type Col-0 is proof that its biochemical function is the same as that of the endogenous AtMYB111 gene – that is, HaMYB111 induces expression of the same genes involved in flavonol glycoside biosynthesis in Arabidopsis. Differences in function between HaMYB111 and AtMYB111 would have resulted in different flavonol profiles between wild type Col-0 and 35S::HaMYB111 myb111 lines. It should be noted that the known direct targets of AtMYB111 in Arabidopsis are genes involved in the production of the basic flavonol aglycone (Stracke et al., Plant J 2007). Differences in flavonol glycoside profiles between the two species are likely due to broader differences between the genetic networks regulating flavonol biosynthesis: additional layers of regulation of the genes targeted by MYB111, or differential regulation (or presence/absence variation) of genes controlling downstream flavonol glycosylation and conversion between different flavonols.

      In the revised manuscript, we have added the full names of all identified peaks to the legend of Figures 3a,b,e.

      This and the differences in metabolite accumulations between Arabidopsis and Helianthus become a bit problematic for the functional interpretations. And here the authors may want to re-read Gronquist et al. 2002: PNAS as a cautionary tale about inferring function from the spatial location of metabolites. In this study, the Eisner/Meinwald team discovered that embedded in the UV-absorbing floral nectar guides, amongst the expected array of flavonoid glycosides, were isoprenylated phloroglucinols, which have both UV-absorbing and herbivore defensive properties. Hence the authors may want to re-examine some of the other unidentified metabolites in the tissues of the bullseyes, including the caffeoyl quinic acids, for alternative functional hypotheses for their observed variation in bullseye size (e.g. herbivore defense of ligules).

      This is a good point. We have included a more explicit mention of the possible role of caffeoyl quinic acid (CQA) as a UV pigment in the main text, and highlighted at the end of the manuscript other possible factors that could contribute to variation for floral UV patterns in wild sunflowers.

      We should note, however, that CQA plays a considerably smaller role than flavonols in explaining UV absorbance in UV-absorbing (parts of) sunflower ligules, and the difference in abundance with respect to UV-reflecting (parts of) ligules is much less obvious than for flavonols (height of the absorbance peak is reduced only 2-3 times in UV-reflecting tissues for CQA, vs. 7-70 fold reductions for individual quercetin glycosides). Therefore, flavonols are clearly the main pigment responsible for UV patterning in ligules. This is in contrast with the situation for Hypericum calycinum reported in Gronquist et al., PNAS 2002, where dearomatized isoprenylated phloroglucinols (DIPs) are much more abundant than flavonols in most floral tissues, including petals. The localization of DIPs accumulation, in reproductive organs and on the abaxial (“lower”) side of the petals (so that they would be exposed when the flower is closed), is also more consistent with a role in prevention of herbivory; no UV pigmentation is found on the adaxial (“upper”) part of petals in this species, which would be consistent with a role in pollinator attraction.

      The hypotheses regarding a role for the flavonoid glycosides regulated by Myb111 expression in transpirational mitigation and hence conferring a selective advantage under high temperatures and low and high humidities, are not strongly supported by the data provided. The water loss data from excised flowers (or ligules-can't tell from the methods descriptions) is not equivalent to measures of transpiration rates (the stomatal controlled release of water), which are better performed with intact flowers by porometry or other forms of gas-exchange measures. Excised tissues tend to have uncontrolled stomatal function, and elevated cuticular water loss at damaged sites. The putative fitness benefits of variable bullseye size under different humidity regimes, proposed to explain the observed geographical clines in bullseye size remain untested.

      We have clarified in the text and Methods section that the desiccation experiments were performed on detached ligules. We agree that the results of these experiments do not constitute direct proof that UV patterns/flavonol levels have an impact on plant fitness under different humidities in the wild – our aim was simply to provide a plausible physiological explanation for the correlation we observe between floral UV patterns and relative humidity. However, we do believe they are strongly suggestive of a role for floral flavonol/UV patterns in regulating transpiration, which is consistent with previous observations that flowers are a major source of transpiration in plants (Galen et al., Am Nat 2000, and other references in the manuscript). As suggested also by other reviewers, we have softened our interpretation of these results to clarify that they are suggestive, but not proof, of a connection between floral UV patterns, ligule transpiration and environmental humidity levels.

      “While desiccation rates are only a proxy for transpiration in field conditions (Duursma et al. 2019, Hygen et al. 1951), and other factors might affect ligule transpiration in this set of lines, this evidence (strong correlation between LUVp and summer relative humidity; known role of flavonol glycosides in regulating transpiration; and correlation between extent of ligule UV pigmentation and desiccation rates) suggests that variation in floral UV pigmentation in sunflowers is driven by the role of flavonol glycosides in reducing water loss from ligules, with larger floral UV patterns helping prevent drought stress in drier environments.” (new lines 462-469)

      Detached ligules were chosen to avoid confounding the results should differences in the physiology of the rest of the inflorescence/plant between lines also affect rates of water loss. Desiccation/water loss measurements were performed for consistency with the experiments reported in Nakabayashi et al Plant J. 2014, in which the effects of flavonol accumulation (through overexpression of AtMYB12) on water loss/drought resistance were first reported. It should also be noted that the use of detached organs to study the effect of desiccation on transpiration, water loss and drought responses is common in the literature (see for example Hygen, Physiol Plant 1951; Aguilar et al., J Exp Bot 2000; Chen et al., PNAS 2011; Egea et al., Sci Rep 2018; Duursma et al., New Phytol 2019, among others). While removing the ligules creates a more stressful/artificial situation, mechanical factors are likely to affect all ligules and leaves in the same way, and we can see no obvious reason why that would affect the small LUVp group more than the large LUVp group (individuals in the two groups were selected to represent several geographically unrelated populations).

      We have added some of the aforementioned references to the main text and Methods sections in the revised manuscript to support our use of this experimental setup.

      Alternative functional hypotheses for the observed variation in bullseye size in herbivore resistance or floral volatile release could also be mentioned in the Discussion. Are the large ligules involved in floral scent release?

      We have added sentences in the Results and Discussion, and Conclusions sections in the revised manuscript to explore possible additional factors that could influence patterns of UV pigmentation across sunflower populations, including resistance to herbivory and floral volatiles. While some work has been done to characterize floral volatiles in sunflower (e.g. Etievant et al. J. Agric. Food Chem; Pham-Delegue et al. J. Chem. Ecol. 1989), to our knowledge the role of ligules in their production has not been investigated.

      In the revised manuscript, the section “A dual role for floral UV pigmentation” now includes the sentences:

      “Although pollinator preferences in this experiment could still be affected by other unmeasured factors (nectar content, floral volatiles), these results are consistent with previous results showing that floral UV patterns play a major role in pollinator attraction (Horth et al., 2014, Koski and Ashman, 2014, Rae and Vamosi, 2013, Sheehan et al., 2016).” (new lines 378-381)

      And the Conclusions sections includes the sentence:

      “It should be noted that, while we have examined some of the most likely factors explaining the distribution of variation for floral UV patterns in wild H. annuus across North America, other abiotic factors could play a role, as well as biotic ones (e.g. the aforementioned differences in pollinator assemblages, or a role of UV pigments in protection from herbivory (Gronquist et al., 2001)).” (new lines 540-544)

      Reviewer #3 (Public Review):

      Todesco et al undertake an ambitious study to understand UV-absorbing variation in sunflower inflorescences, which often, but not always display a "bullseye" pattern of UV-absorbance generated by ligules of the ray flowers. [...] I think this manuscript has high potential impact on science on both of these fronts.

      Thank you! We are aware that our experiments do not provide a direct link between UV patterns and fitness in natural populations (although we think they are strongly suggestive) and that, as pointed out also by other reviewers, there are other possible (unmeasured) factors that could explain or contribute to explain the patterns we observed. In the revised manuscript we have better characterized the aims and interpretation of our desiccation experiment, and modified the main text to acknowledge other possible factors affecting pollination preferences (nectar production, floral volatiles) and variation for floral UV patterns in H. annuus (pollinator assemblages, resistance to herbivory).

    1. Author Response:

      Reviewer #1 (Public Review):

      Kunze et al. provide an interesting experiment aimed to understand the effects of variable temperature regimes in host-pathogen interactions. This is one of the most complete experiments to date, that goes beyond exploring increasing but constant temperature regimes. The experimental setup is strong, exposing Daphnia magna to the natural range of temperature variability and realistic fluctuating (±3°C) and extreme (6°C pulse) regimes. Daphnia exposure to Ordospora colligata pathogens was also rightly tested against a placebo control. Aided by their experimental approach Kunze et al. explore their results with clear figures and fine text, getting deep into our understanding of the thermal performance of important host and pathogen life history traits (such as reproductive output) and setting them in the larger picture of global warming. In short, I am impressed by the quality of the new information provided by this ms.

      Thank you for these positive comments on our manuscript.

      Reviewer #2 (Public Review):

      The manuscript of Kunze et al. aimed at finding how different kinds of fluctuations in temperature affect the disease outcome. The authors used Daphnia magna - Ordospora colligata host-parasite system exposed to a range of temperatures which were either stable, regularly fluctuating, or included a single heat wave, and measured fitness of the host (as reproductive output) and the parasite (infection rate and spore burden). The experiment is very well designed, and the methods of data analysis are sound and well suited to address the questions stated by the authors. The authors found that the unstable thermal conditions change the fitness of the host and the parasite. Temperature fluctuations narrowed thermal breadth for infection and spore burden of the parasite, whereas the heat wave caused shift in thermal optimum and a strong increase of maximal spore burden of the parasite. Both thermal variation treatments resulted in shifts in thermal optimum and maximal performance of the host. The most interesting (and surprising) result was the spectacular increase in spore burden of the parasite exposed to heat wave in comparison to fluctuating temperature treatment and stable temperature treatment, obtained in 16°C. Authors rightfully conclude that the outcome of infection could be strongly altered by variations in thermal regime. This context dependency might to some extent explain the limited accuracy of disease spread models. This is critical especially in the face of climate change, which is expected to result in more frequent and more rapid thermal variation events. Moreover, the narrowed thermal performance curve of the parasite (especially in the high temperatures range) under fluctuating temperature regime indicates, that the thermal tolerance of some organisms to warming might be overestimated, when tested under (less realistic) stable thermal conditions. I think the paper of Kunze et al. is a very strong contribution to the field of disease ecology, and I find no major weaknesses. The Introduction and Discussion sections are well written and provide some extensive overview of the relevant literature. The study design and results are described clearly and the conclusions are well supported. I have no major criticism to this manuscript.

      We thank the reviewer for these positive comments on our manuscript.

    1. Author Response:

      Reviewer #3 (Public Review):

      The Schepartz lab have previously shown that the binding of growth factors results in the formation of two distinct coiled coil dimers within the juxtamembrane (JM) segment. These two isomeric coiled coil structures are also allosterically preferred by point mutations within transmembrane (TM) helix. In this manuscript, authors demonstrate that the JM coiled coil is a binary switch, governing the trafficking status of EGFR, either towards degradative or recycling pathway.

      They design novel variants of EGFR (E661R and KRAA) that mimic the two distinct coiled coil types, EGF-type and TGF-α-type. These variants are further validated using bipartite tetracysteine- ReAsH system. In order to assess the trafficking of these variants, authors use confocal imaging to measure colocalization with respective organelle markers. In addition, authors also use variants with point mutations at TM segment that controls the JM coiled coil state to demonstrate that the trafficking is dependent on JM segment and not growth factor identity. EGFR signaling is of prime importance in cancer biology and trafficking plays a major role, where the degradative pathway decreases the signaling, in contrast to recycling pathway that sustains the signaling. The authors clearly demonstrate this switch in EGFR lifetime using relevant variants and show how well-known tyrosine kinase inhibitors regulate this in a drug resistant non-small cell lung cancer model.

      The model proposed by the authors is mostly well supported by data, but a few points require clarification.

      i) The authors need to address why the switch is incomplete when JM mutants are used but appears complete with TM mutants. A) Does this mean recycling requires other criteria in addition to JM segment? B) Is it possible that TM mutants cause other changes in addition to controlling JM segment? C) Would it be better if organelle transmembrane markers were used (Tf, Lamp1, NPC1 etc.).

      The revised manuscript now includes a discussion of why the localization switch is less complete for the JM mutants than for the TM mutants. Whether these differences mean that the direction of trafficking requires direct interactions with the JM segment, or alternatively that the TM mutants cause other relevant changes in EGFR is currently under investigation.

      ii) It would be helpful to represent data as a distribution or scatter points instead of bar plot. Did authors observe any expression level dependence on their colocalization and lifetime assays?

      Figures 2 and 3 have been changed to illustrate both bars and individual points. We did not evaluate the effect of expression level on the extent of colocalization or EGFR lifetime.

      iii) Did authors investigate the lifetime of JM variants? Like it was shown with TM variants in Fig 4.

    1. Author Response:

      Reviewer #1 (Public Review):

      1) It seems like this model treats chromosome gains and losses equivalently. Is this appropriate? Chromosome loss events are much more toxic than chromosome gain events - as evidenced by the fact that haploinsufficiency is widespread, and all autosomal monosomies are embryonically-lethal while many trisomies are compatible with birth and development. Can the authors consider a model in which losses exert a more significant fitness penalty than chromosome gains?

      While we agree that monosomies are more detrimental than trisomies in non-cancerous tissue, this is not necessarily the case in tumors in which monosomy is often observed (see PMID: 32054838). Nevertheless, to address this critique we have now added a model variant with an additional condition in which cells experience extreme fitness penalties (90% reduction) if any chromosome is haploid. We apply this condition to all selection models and find this attenuates a ploidy increase over time in diploid cells in most selection models (see Figure 3 ‘haploid penalty’).
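
      The haploid penalty described above can be illustrated with a minimal sketch. It assumes a karyotype is a plain list of per-chromosome copy numbers and that "haploid" means a copy number of exactly 1; the function name `apply_haploid_penalty` and the toy karyotypes are hypothetical and are not taken from the authors' code. Nullisomy (copy number 0) is not handled here.

```python
def apply_haploid_penalty(fitness, copy_numbers, penalty=0.9):
    """Return fitness reduced by `penalty` (90% by default) if any chromosome has one copy.

    Illustrative assumption: `copy_numbers` is a list of integer copy numbers,
    one per chromosome, and the penalty is applied once regardless of how many
    chromosomes are haploid.
    """
    if any(cn == 1 for cn in copy_numbers):
        return fitness * (1.0 - penalty)
    return fitness


# Hypothetical toy karyotypes with 22 autosomes each.
diploid = [2] * 22
monosomic = [2] * 21 + [1]
trisomic = [2] * 21 + [3]

print(apply_haploid_penalty(1.0, diploid))     # unchanged
print(apply_haploid_penalty(0.98, monosomic))  # reduced by 90%
print(apply_haploid_penalty(0.98, trisomic))   # unchanged: gains are not penalized here
```

      Under these assumptions a gain leaves the base fitness untouched, while a single monosomy cuts it to a tenth, which is the asymmetry the reviewer asked for.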

      2) Chromosomes do not missegregate at the same rate (PMID: 29898405). This point would need to be discussed, and, if feasible, incorporated into the authors' models.

      While this may be true in some contexts, the limited data on this topic (namely Worrall et al. Cell Rep. 2018 and Dumont et al. EMBO J. 2020) do not agree on which chromosomes are mis-segregated more often. Worrall suggested chromosomes 1-2 are particularly mis-segregated, whereas Dumont finds chromosomes 3, 6, and X are the highest. These differences may be explained by context-dependent effects that depend on the model and mechanism of mis-segregation. Worrall uses nocodazole washout to generate merotelics whereas Dumont gets mis-segregation through depleting CENP-A. It is unknown which of these mechanisms, if either, is representative of the mechanisms at play in human tumors, so we decided to take a general approach assuming equivalent mis-segregation rates. However, we appreciate that this will be a question for other readers and we have now added this to the discussion.

      3) It would be helpful if the authors could clarify their use of live cell imaging (e.g., in Fig 6G). Certain apparent errors that are visible by live-cell imaging (like a lagging chromosome) can be resolved correctly and result in proper segregation. It is not clear whether it is appropriate to directly infer missegregation rates as is done in this paper.

      We did not perform this live cell imaging experiment. We cite these data as being kindly offered by the Kops laboratory and they correspond to the scDNAseq data for normal colon and CRC organoids from Bolhaqueiro et al. Nat Gen. 2019. We agree that chromosome mis-segregation rates cannot be directly inferred by imaging. As you say, lagging chromosomes may resolve and segregate to the correct daughter cell. The fundamental assumption is that, although not all lagging chromosomes mis-segregate, specimens with higher rates of lagging chromosomes have higher rates of mis-segregation. Because there is no gold-standard measure of CIN in the literature to date, we feel it is necessary to show the correlation between the two and how the data from that study relate to the inferred rates in this study. We have made this clearer in the text.

      4) The authors would need to discuss in greater detail earlier mathematical models of CIN, including PMID: 26212324, 30204765, and 12446840 and explain how their approach improves on this prior work.

      We now provide a more detailed discussion on prior mathematical models, incorporating these and others.

      Reviewer #2 (Public Review):

      Weakness of the framework include: (1) Most notably, the presented framework is lacking expanded characterization and validation of selection models that are biologically relevant.

      We have taken this critique to heart. To address this, we have greatly expanded the models and their characterization. We now explicitly include a neutral model throughout, tested various modifications of the model (Figure 3C-E), and use ABC to enable model selection (see Table 3).

      The current framework simply applies a scalar exponent to already published fitness models for selection. It is unclear what this exponent mirrors biologically, beyond amplifying the selection pressures already explored in existing gene abundance and driver density models.

      We implemented cellular fitness as the sum of normalized chromosome scores such that the fitness of euploid cells is 1 and the probability of division = 0.5. In this framework, within the ‘abundance’ model, a cell with triploidy of chromosome arm 1p would have a fitness of 0.98. With no additional selection, the probability that this cell divides is 0.98 x 0.5 = 0.49. The published fitness models for karyotype selection do not experimentally determine how fitness relates to the probability of division within a given time. For example, there is no clear reason why (or evidence indicating) an extra copy of chromosome arm 1p would reduce the probability of division from 0.5 to precisely 0.49 for a given period. The proposed model of karyotype selection that our ‘abundance’ model is based on only stipulates that aneuploidy of larger chromosomes is more detrimental than small chromosomes. Thus, these fitness values behave as arbitrary units and, therefore, we believe that adjusting and fitting an arbitrary scaling factor to the biological data is appropriate. For example, with an additional selection of S=10, the same cell with trisomy of chromosome arm 1p would divide with a probability of F^S x 0.5 = 0.98^10 x 0.5 = 0.41.
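
      The arithmetic in this paragraph can be checked with a short sketch. It assumes only what the response states: the probability of division is F^S x 0.5, with F = 1 for euploid cells. The function name `division_probability` and its parameter names are illustrative.

```python
def division_probability(fitness, selection=1.0, base_probability=0.5):
    """Probability that a cell divides in one time step: F**S * base probability."""
    return (fitness ** selection) * base_probability


# Euploid cell (F = 1), no additional selection.
print(round(division_probability(1.0), 2))

# Cell with an extra copy of chromosome arm 1p under the 'abundance' model (F = 0.98).
print(round(division_probability(0.98), 2))                # S = 1
print(round(division_probability(0.98, selection=10), 2))  # S = 10
```

      With S = 1 the trisomic cell divides with probability 0.49, and with S = 10 with probability of about 0.41, matching the values quoted above.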

      We could have implemented a multiplicative framework where the deviation from euploid fitness (1) is multiplied by a scaling factor S (F_mult = 1 - S(1 - F)). For the trisomy 1p example, the same fitness value (F^S = 0.98^10) can be achieved multiplicatively as exponentially via 1 – (9.14 x (1 - 0.98)) ~ 0.98^10. Thus, the same fitness values can be achieved through arbitrary scaling. We regret that this may have been misinterpreted because it was implemented exponentially vs multiplicatively.
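
      To make the equivalence concrete, the sketch below solves for the multiplicative factor that reproduces a given exponential fitness, following the form of the worked example, 1 - S_mult x (1 - F) = F^S. The function name `equivalent_multiplicative_factor` is hypothetical, and the expression is undefined for F = 1 (euploid cells, where no scaling is needed).

```python
def equivalent_multiplicative_factor(fitness, selection):
    """Solve 1 - S_mult * (1 - F) == F**S for S_mult. Requires F != 1."""
    return (1.0 - fitness ** selection) / (1.0 - fitness)


f, s = 0.98, 10
s_mult = equivalent_multiplicative_factor(f, s)

print(round(s_mult, 2))                   # multiplicative factor
print(round(1.0 - s_mult * (1.0 - f), 4)) # multiplicative fitness
print(round(f ** s, 4))                   # exponential fitness
```

      For F = 0.98 and S = 10 this gives a factor of roughly 9.15, in line with the 9.14 used in the example above. Note that the equivalent factor depends on F, so a single multiplicative constant matches the exponential form exactly only for one fitness value at a time.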

      To further address this critique, we have now better fitted the S values with a flat prior probability across all values, shown how it relates to P_misseg in posterior probabilities (e.g., Figure 6C, Table 3) and performed the separate analysis requested in critique #5 below.

      (2) Towards this, how is the CIN ON-OFF model in which CIN is turned off after so many cell divisions relevant biologically? Typically CIN is considered a trait that evolves later in cancer progression, that once tolerated, is ongoing and facilitates development of metastasis and drug resistance. A more relevant model to explore would be that of the effect of a whole genome duplication (WGD) event on population evolution, which is thought to facilitate tolerance of ensuing missegregation events (because it reduces the risk of nullisomy).

      We agree that the CIN ON-OFF model had limited biologic relevance and removed this. To improve on this, we have changed our approach to use constant CIN for a much longer period of time (3000 time steps). We agree that WGD is a relevant phenomenon. However, others have already explicitly modeled this (see PMID: 26212324 and 32139907), so we avoid doing the same. Instead, we show that tetraploid founding cells tolerate high mis-segregation rates better than diploid founding cells.

      (3) The authors utilize two models of karyotype fitness - a gene abundance model and driver density model - to evaluate impact of specific karyotypes on cellular fitness. They also include a hybrid model whose fitness effects are simply the average of these two models, which adds little value as only a weighted average.

      To date we do not have an experimentally-defined human selection model. The gene abundance model is limited in that it considers all genes equally, which inadequately considers disease function and essentiality. By contrast, the driver density model weights tumor suppressors and oncogenes, which may not operate in all contexts, and ignores the essential functions of most other protein-coding genes. We believe the hybrid model can compensate for these mutual defects, but acknowledge the importance of experimentally derived models to adjudicate which is best.

      In silico results show inferred missegregation rates are extremely disparate across the two primary models. And while a description of these differences is provided, the presented analyses do not make clear the most important question - which of these models is more clinically relevant? Toward this, in Figure 2F, the authors claim the three models approach a triploid state - which is unsupported by the in silico results. Clearly the driver model approaches a triploid state, as previously reported. But the abundance model does not and hybrid only slightly so, given that it is simply a weighted average of these two approaches. Because the authors have developed a Bayesian strategy for inferring which model parameters best fit observed data, it would be very useful to see which model best recapitulates karyotypes observed in cancer cell lines or patient materials.

      We agree that the abundance and hybrid models do not truly approach a triploid state as the driver model does; we have made that clearer in the text and improved the figure panel in question for clarity. To address your latter point on which model best fits observed data, we have implemented a model selection scheme to do this (see Table 3). This indicates the gene abundance model as the most biologically relevant and provides evidence for stabilizing selection as the primary mode of selection occurring in the organoid and biopsy data we analyzed.

      (4) Topological features of phylogenetic trees, while discriminatory, are largely dependent on accurate phylogenetic tree reconstruction. The latter requires more careful consideration of cell linkages beyond computing pairwise Euclidean distances and performing complete-linkage clustering. For example, a WGD event, would appear very far from its nearest cell ancestor in Euclidean space.

      While more granular cell linkage data would certainly improve phylogenetic reconstruction, low-coverage scDNA-sequencing (0.01-0.05x) is unable to reliably recover SNPs that would enable this approach. Clustering on copy number similarity remains the standard approach at this point (see PMID: 33762732). We have added this to the discussion.

      (5) Finally, experimental validation of the added selection exponential factor is imperative. Works have already shown models of karyotypic evolution without additional selection exponential coefficient can accurately recover rates of missegregation observed in human cell lines and cancers by fluorescent microscopy. Incorporation of this additional weight on selection pressure has not been demonstrated or validated experimentally. This would require experimental sampling of karyotypes longitudinally and is a critical piece of this manuscript's novelty.

      As described in #1 above, the selection values of F are in arbitrary units and so we believe a selection scaling factor is important to include in the model. For example, without additional selection, a hypothetical aneuploid cell with a trisomy resulting in F = 0.95 would be 5% less likely to divide than a euploid cell with F = 1. The exponent scales the selection such that when S = 2, the fitness of the trisomic cell is F ~ 0.9, or 10% less likely to divide. This scaling is necessary to enable both positive and negative selection in a system where fitness is determined as the sum of chromosome scores. To further validate the additional weight on selection pressure we did the following:

      1. We constrained the prior distribution of simulated data for our model selection to S=1 giving only the base fitness values without additional scaling. We, again, performed model selection on the data from Bolhaqueiro et al., 2019 and Navin et al., 2011 and found that, with this constrained prior dataset, we inferred mis-segregation rates (see Table 4) that were far below rates seen in cancer cell lines (see Figure 6E).

      2. Given the initial clarification that reviewers were looking for longitudinal analysis, we leveraged data provided by the authors of Bolhaqueiro et al., 2019 where they sequenced single cells from 3 clones from organoid line 16T at 3 weeks and 21 weeks after seeding. We inferred mis-segregation rates and selective pressures in these clones at the 3-week timepoint. We did so under the Abundance model using the same prior distribution of steps given that the diversity of populations under the Abundance model rapidly reaches a steady state. When we simulated additional populations using these inferred characteristics we found that the karyotype composition of the simulated populations more closely resembled the biological population than did populations simulated with the unmodified selection values (see Figure 6 — figure supplement 4). This lends credence to the biological relevance of scaled selective pressure vs. unmodified selective pressure.

      Reviewer #3 (Public Review):

      1) Given the importance of the selection paradigm in determining the observed karyotypic heterogeneity, a significant weakness of the work is that there is no attempt to learn the selection paradigm from the observed data. This is important because there is an interrelationship between selection, the chromosomal alteration rate, and the observed data and so the accuracy of the inferred alteration rate is likely to be compromised if an inappropriate model of selection is used.

      We have implemented a model selection strategy to address this critique. Accordingly, we infer mis-segregation rate under each model and take the result of the best-fit model to be the inferred rate. In this case, stabilizing selection under

      2) Somewhat relatedly, how the population of cells grows (e.g. exponential growth vs constant population size) also affects the observed karyotype heterogeneity, but the modelling only allows for exponential growth, which may be inappropriate for the public datasets analysed.

      We have now concurrently modeled chromosomal instability with a constant population size by approximating constant-population Wright-Fisher dynamics (see Materials and Methods). We find these models produce similar results at the karyotype level, addressing concerns about the effects of growth patterns on karyotype evolution in this model.
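
      For readers unfamiliar with the term, a generic constant-population Wright-Fisher step can be sketched as follows. This is a textbook-style illustration, not the implementation described in the Materials and Methods: the two karyotype labels, their fitness values, and the function name are all hypothetical.

```python
import random


def wright_fisher_step(population, fitness_of, rng):
    """Draw a new generation of the same size as the current one.

    Each offspring picks a parent with probability proportional to the
    parent's fitness, so the population size stays constant.
    """
    weights = [fitness_of(individual) for individual in population]
    return rng.choices(population, weights=weights, k=len(population))


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical starting population and fitness values.
population = ["euploid"] * 50 + ["trisomic"] * 50
toy_fitness = {"euploid": 1.0, "trisomic": 0.9}

for _ in range(20):
    population = wright_fisher_step(population, toy_fitness.__getitem__, rng)

print(len(population), population.count("euploid"), population.count("trisomic"))
```

      The population size is fixed by construction, so any change in karyotype composition comes from selection and drift rather than from growth.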

      3) There are some technical concerns about the approximate Bayesian computation analysis (choice of prior distributions, testing for convergence, matching of the growth model to cell growth patterns in the data, and temporal effects) which need to be addressed to ensure this part of the analysis is robust.

      To address these concerns, we improved and more clearly detailed the prior distributions for each inference within the figure legends, we tested for karyotype convergence in each model (see Figure 3), and we demonstrate that inference under the Abundance model is robust to changes in the number of time steps included in the prior data (see Figure 6 — figure supplement 1).

    1. Author Response:

      Reviewer #3 (Public Review):

      1) The two algorithms presented are essentially a low-pass and high-pass filter on binarized odor. As such, it may not be so surprising that there is a tradeoff between which algorithm works better depending on the frequency content of different environments. The low-pass filter (algorithm 1) works better in environments with mostly low-frequency fluctuations (boundary layer plume, low wind-speed, high diffusivity) while the high-pass filter (algorithm 2) works better in environments with mostly high-frequency fluctuations (high windspeed, low diffusivity). To understand what is essential in these algorithms I think it would be useful to (1) compare the two algorithms to a "null" algorithm that drives upwind orientation whenever odor is present (i.e. include thresholding and binarization but no filtering), (2) compare navigation success metrics directly to the frequency content of different environments, (3) examine how navigation success depends on the filtering cutoff of the two algorithms (tau_on and tau_w). Comparing to the null algorithm with no filtering I think is important to determine whether there is actually a tradeoff to be made, or whether a system that can approximate a flat transfer function (or at least capture all relevant frequencies in the environment) is ideal and must be approximated with biological parts.

      For (1) and (3), we have now added simulations of the models for a range of different timescales, including an integrator with an infinitely fast timescale corresponding to the “null” model the reviewer describes (Results lines 376-380, Figure 4—figure supplement 2 and Materials and methods lines 1008-1025). We find that changing the timescale of the intermittency filter largely leaves performance unchanged whereas changing the timescale of the frequency filter is akin to changing the gain on the frequency filter, as predicted by Equations 24 and 29. Since we do find a local maximum in the frequency filter timescale, we conclude that there are benefits to filtering in time. For (2), many plumes we simulate in Fig. 5 span a wide range of frequencies and intermittencies; we chose to plot performance as a function of diffusivity / windspeed to emphasize how performance depends on environment parameters that shape the statistics of the plume (flow and odor dynamics). Note that we renamed 𝜏! to 𝜏".
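
      The reviewer's framing of the two algorithms as low-pass and high-pass filters on a binarized odor signal can be made concrete with a small sketch. It is only an illustration of that framing: the first-order filters, the threshold, the timescale, and the toy signal below are assumptions and do not reproduce the model's Equations 24 and 29.

```python
def binarize(odor, threshold):
    """Return 1.0 where odor concentration exceeds the threshold, else 0.0."""
    return [1.0 if concentration > threshold else 0.0 for concentration in odor]


def low_pass(signal, tau, dt):
    """First-order low-pass filter, integrated with forward Euler (needs dt < tau)."""
    filtered, state = [], 0.0
    alpha = dt / tau
    for value in signal:
        state += alpha * (value - state)
        filtered.append(state)
    return filtered


def high_pass(signal, tau, dt):
    """High-pass filter built as the signal minus its low-passed version."""
    smoothed = low_pass(signal, tau, dt)
    return [value - smooth for value, smooth in zip(signal, smoothed)]


# Toy stimulus: no odor for 50 samples, then sustained odor for 200 samples.
odor = [0.0] * 50 + [2.0] * 200
binary = binarize(odor, threshold=1.0)

lp = low_pass(binary, tau=0.1, dt=0.01)
hp = high_pass(binary, tau=0.1, dt=0.01)

print(f"low-pass at end of sustained odor:  {lp[-1]:.3f}")
print(f"high-pass at odor onset:            {hp[50]:.3f}")
print(f"high-pass at end of sustained odor: {hp[-1]:.3f}")
```

      In this toy case the low-pass output stays high for as long as odor persists, whereas the high-pass output responds at onset and then decays, which is the qualitative distinction the reviewer draws between the two algorithms.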

      2) While the two algorithms presented here present a nice conceptual division, biological filtering algorithms are likely to incorporate elements of both. For example, the adaptive compression algorithm of Alvarez-Salvado (which is eliminated in the simplification used here) provides some sensitivity to odor onsets and is based on well-described adaptation at the olfactory periphery. Synaptic depression algorithms likewise provide sensitivity to derivatives as well as integration over time, and synaptic depression with multiple timescales has been described in detail at various stages of the olfactory system. A productive extension of the work done here would be to explore the utility of biophysically-motivated filtering algorithms for navigation in different environments.

      Thank you for this suggestion, which led us to extend our work in that interesting direction. We have now generalized our model to respond to odor intensity (rather than its binarized version) by implementing an adaptive compression taken from prior modeling efforts (Alvarez-Salvado et al, eLife 2018) (added to Fig. 3; also see additional Fig. 3 Supplement 1). Moreover, we now also consider navigators that respond to odor signals using a biophysical model of odor transduction, ORN firing, and PN firing, in addition to synaptic depression within the ORN-PN synapse, which combines modeling efforts from prior works (Gorur-Shandilya, Demir, et al, eLife 2017; Nagel & Wilson, Nat. Neurosci. 2015; Fox & Nagel, “Synaptic control of temporal processing in the Drosophila olfactory system” arXiv 2021). This realistic circuit model produced exciting results that indicate that the natural ORN-PN circuitry can, to some degree, satisfy the dual demands of intermittency and frequency sensing. These results are shown in the new Fig. 6.

      3) It would be helpful in the Discussion to present a clearer picture of what the frequency content of natural environments is likely to be. For example, flies stop walking at windspeeds above ~70cm/s (Yorozu 2009). In contrast, flies in flight are likely to encounter much sparser and high frequency plume encounters, as they are moving through the air at much faster speeds and because odors encountered here would be away from the boundary layer. Therefore the best test of the tradeoff hypothesis would likely be to compare temporal filtering of odor plumes by neural circuitry in flying vs walking flies. This would connect to the literature in motion detection as well, where octopamine release during flight causes a speeding of the motion detection algorithm.

      We have added lines 47-48 to the introduction describing the natural frequency content of plumes and lines 574-578 discussing how one might see evidence of this tradeoff when comparing between walking and flying flies.

    1. Author Response:

      Reviewer #2:

      What the authors attempt to achieve, and their approaches:

      The authors attempt to establish by which mechanisms cholesterol influences the function of the GPCR A_{2A}R, an adenosine receptor. The role of cholesterol on GPCRs has been reported in a number of studies, primarily in cellular experiments, and the authors set out here to clarify the molecular mechanisms.

      To this end, they build upon their recent achievements to produce this protein and reconstitute it in nanodiscs, i.e. discoidal objects comprised of the membrane protein (here: A_{2A}R), lipids (here: POPC, POPG and cholesterol) and a membrane-scaffold protein (MSP) which wraps around this disc of protein+lipid. Nanodiscs allow studying proteins in solution, and are thought to be much more native-like than e.g. detergent micelles.

      The authors first use GTP hydrolysis experiments to quantify the basal activity and agonist potency at cholesterol concentrations from 0 to 13%. The cholesterol effects are weak but detectable. Then they use a single 19F label that reports on the protein's conformation (active, inactive) to show that the protein populates slightly more active states with cholesterol (again, weak effects). Then they investigate G-protein binding to A_{2A}R in the nanodisc, and find (very!) weak enhancement at 13% cholesterol. These data point to weak positive allosteric modulation by cholesterol. They then use molecular dynamics simulations to probe the allosteric communication, using a recently proposed framework (Rigidity-transmission allostery). They run these simulations in the presence of cholesterol (positions of cholesterol taken from the X-ray structure) and in its absence. This analysis shows again only very weak effects of cholesterol, and this time the effect is opposite, i.e. negative allosteric modulation by cholesterol. Then they use 19F-labeled cholesterol analogues to probe by NMR the state of cholesterol (bound to protein?). Lastly, they use Laurdan fluorescence experiments and pressure NMR to establish that (i) the lipids become more ordered when cholesterol is present, and (ii) if one achieves such ordering even without cholesterol - namely by pressure - one may achieve similar effects as those that cholesterol has.

      Collectively, these data lead them to conclude that cholesterol has a (weak) positive allosteric effect on this receptor, and this effect is not a direct one, but goes via modulation of the membrane properties.

      We thank the reviewer for his comments and critique. Many of his comments concern the nanodisc as a model system. We have therefore included an additional paragraph as discussed above, highlighting the advantages and disadvantages of the nanodisc. We’ve also included references to papers that have characterized nanodiscs or membrane proteins in nanodiscs. In our hands, the 31P NMR spectra of POPC/POPG nanodiscs and their melt behavior are very similar to those of liposomes. We’ve tried to add to the discussion on nanodiscs without distracting too much from the focus of the paper.

      Major strengths and weaknesses of methods and results:

      The study addresses an important question, which is inherently difficult to answer: the effect of cholesterol is poorly understood and such studies require working in an actual membrane. The authors carefully combine different methods to achieve their goal of identifying the mechanisms.

      Despite combining several methods, several of them have their inherent problems:

      (i) the nanodisc is too small to properly mimic the membrane environment, and it does not allow reaching relevant cholesterol concentrations. Moreover, it is not clear (to me) if one can exclude e.g. interactions of the protein with the surrounding MSP, or of cholesterol with MSP (see (iii) below).

      We agree. In principle, we should worry about MSP. On the other hand, this is a constant in all of the samples and we focus instead on the cholesterol-dependent effects. These nanodiscs are unarguably small. We’ve commented on this in the paper now. However, we’d expect that the confinement would if anything emphasize the cholesterol bound state. Yet, the NMR studies of F-cholesterol interactions at best identified transient bound states.

      (ii) the state of the protein (inactive, active) is probed with a single NMR-active site. The effects are small and I am not convinced that one shall interpret changes as small as the ones in Figures 3 and 4. In particular, how does this single probe behave at high pressure? Does it reflect an active state at 2000 bar pressure - where possibly other effects (unfolding?) may occur?

      Here we can be quite confident. The spectra are predicated on a recent paper (Huang, et al, 2021) published in Cell in the spring of this year. Each state was carefully correlated with specific functional assays and conditions in a self-consistent way. The labeling site used on TM6 was strategically chosen based on earlier crystallographic studies of inactive and active A2AR. We have other labeling sites (TM7 and TM5) but the point was to use the chemical shift signatures to talk about cholesterol-induced changes to the conformational ensemble assigned in the Cell paper. The differences are small, but the fact that PAM effects are observed across conditions (apo, inverse agonist-bound, agonist-bound, and G protein-bound) reassures us that the spectral differences between low and high cholesterol samples are real. Unfolding by 19F NMR is in this case easy to see – the effects become irreversible and independent of ligand and the chemical shift ends up as one upfield peak. We also see a stabilization of the A1 (active) state, and a slight downfield shift of the active ensemble with increased pressure, consistent with reduced exchange dynamics (and coalescence) associated with the active state. We’ve commented on this in the revised version while trying not to distract from the flow of the paper.

      (iii) the data in Figure 6 (19F of cholesterol analogs) are hard to interpret. Is cholesterol bound to the protein? Does the 19F shift reflect binding to the protein? or interactions within the confined space of the disc? or with MSP? The two analogs do not tell a coherent story.

      It is confusing. We agree. We were fully expecting to see a clear A2AR bound state of cholesterol either through a concentration-dependent shift or a new peak. We also looked for “hidden” bound states through 19F NMR CEST experiments. We never identified a bound state in the presence of a range of cholesterol concentrations, as a function of receptor drug. We did observe small shifts although often these effects were as prominent with inverse agonist as agonist, possibly pointing to the existence of multiple weak binding sites. We’ve added some of this to the conversation. It’s also certainly possible that cholesterol exhibits some interaction with MSP, although again MSP is a constant presence in all the samples while we are focusing on cholesterol-dependent effects. In any case, we never detected a bound signature characteristic of slow exchange. That’s significant to the study despite the ambiguity of the measurements.

      (iv) the pressure NMR study (Fig 7D) has weaknesses. The authors implicitly assume that pressure acts on the membrane, leading to more ordering. (They do recognize the possibility that pressure may have an effect on the protein directly, but consider that this direct effect on the protein is minor.) I think that their arguments are possibly incorrect: they apply here pressure onto a sample of nanodiscs, but all studies they cite to justify the use of pressure on membranes dealt with extended lipid bilayers (liposomes). To me it is not clear what is the lateral effect of pressure onto a nanodisc. Can water laterally enter into the bilayer and thus modify the lipid structure? I also note that previous pressure-NMR studies on a GPCR in micelles (rather than nanodiscs) showed a shift toward the active state. As a micelle is a very different thing than a nanodisc, this suggests that the pressure effect is, at least in part or predominantly, on the protein itself.

      On top of the weakness of the pressure NMR experiment to identify what actually happens to the disc, it is not clear either how to interpret the 19F shift at very high pressure (Fig 7D). Given that there is only a single NMR probe, far out in an artificial side chain, it is difficult to assess the state of the protein.

      These are good questions. Firstly, lipid bilayers (be they in liposomes, bicelles, or nanodiscs) are super soft and compressible systems, all known to change in hydrophobic thickness in response to pressure much more readily than proteins, be they membrane embedded or soluble. Secondly, the 19F NMR spectra are well known to be representative of fully functional receptor, as discussed above. Thirdly, even detergent micelles are susceptible to pressure (much more so than the receptor itself); see J. Phys. Chem. B 2014, 118, 5698−5706 (now referenced in the paper). Pressure will enhance hydrophobic thickness, even in a detergent host, by ordering the acyl chains. The lower specific volume states, selected by higher pressure, have a larger hydrophobic dimension. Thus, the effects seen earlier are equally an effect of environment. In the revised version, we simply make the point that the protein isn’t unfolded and that both cholesterol and pressure give rise to enhanced hydrophobic thickness and corresponding shifts in equilibria to the active states.

    1. Author Response:

      Reviewer #1:

      The paper details a whole genome re-sequencing of 310 accessions of quinoa. This provides a good glimpse of diversity in this orphan crop, plus the GWAS studies are able to help provide the foundations for identifying key genes in quinoa variation. This will certainly advance our knowledge of this increasingly important orphan crop.

      1) One issue that permeates the entire paper is that the analysis is fairly basic and the authors do not make full use of the data. The analysis of population diversity is restricted to PCA, ADMIXTURE and phylogenetic analysis. It would probably broaden the impact of the paper if they can do deeper analysis of quinoa diversity, maybe looking at demographic history, looking at selection of highland vs. lowland, etc.

      Thanks for this suggestion. We performed a local PCA analysis by dividing the genome into 50 kb windows, and the results of the analysis are presented in Fig. S9. The results are added to the text, lines 189-209 and 556-562. Moreover, for a better understanding of the demographic history of quinoa, another study is underway with a very large set of additional genome sequences and additional outgroups.
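
      The windowing step that precedes a local PCA can be sketched as below. This shows only how SNPs might be grouped into non-overlapping 50 kb windows; it does not perform the PCA and is not the authors' pipeline. The 1-based coordinate convention, the function name, and the toy positions are assumptions (the chromosome label Cq1B is hypothetical).

```python
from collections import defaultdict

WINDOW_SIZE_BP = 50_000  # 50 kb windows, as stated in the response


def assign_to_windows(snp_positions, window_size=WINDOW_SIZE_BP):
    """Group (chromosome, 1-based position) SNPs by non-overlapping window index."""
    windows = defaultdict(list)
    for chromosome, position in snp_positions:
        window_index = (position - 1) // window_size
        windows[(chromosome, window_index)].append(position)
    return dict(windows)


# Toy SNP coordinates for illustration only.
toy_snps = [
    ("Cq2A", 8_050_001),
    ("Cq2A", 8_074_500),
    ("Cq2A", 8_100_001),
    ("Cq1B", 12),
]

windows = assign_to_windows(toy_snps)
for (chromosome, window_index), positions in sorted(windows.items()):
    start = window_index * WINDOW_SIZE_BP + 1
    end = (window_index + 1) * WINDOW_SIZE_BP
    print(f"{chromosome}:{start}-{end} contains {len(positions)} SNP(s)")
```

      A separate PCA would then be run on the genotypes within each window, so that population structure can be compared along the genome.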

      2) There is a focus on the rapid LD decay, which the authors attribute to the short breeding history and low selection. That seems like a stretch to make this conclusion based solely on LD decay. As they point out, many other factors could account for this, and the authors should provide other lines of evidence to draw this conclusion.

      Evidence of the short breeding history in quinoa is also provided by the admixture analysis (Fig. S6) and the genetic diversity analysis (Fig. S7 and S8).

      3) The GWAS analysis is good and does provide a good foundation for quinoa genetics. The authors discuss possible candidate genes in these GWAS regions. For the thousand seed weight, the relatively small span of the GWAS peaks allows for localization of just a few genes in the GWAS region (CqPP2C5 and the CqRING). The GWAS region associated with flowering time is larger - 1 Mb with 605 genes - but the authors focus on the GLX2-1 gene. This is again a stretch, as the large region precludes narrowing the candidate list unless there was a compelling mutation (for example a deletion or insertion of a major flowering time gene).

      Altogether, 605 genes are found in the 50 kb flanking regions of the PCA-associated SNPs. This region is not 1 Mb, but 0.1 Mb in size. This was a typing error in the text, now corrected to 8.05-8.15 Mb (line 284). In this region, we found 5 genes, and 3 of them were without any known annotation. The strongest association was found in the GLX2-1 gene and this association was also ‘consistent’ between years for all four traits. We modified the text in lines 285-286 and 287-290.

      Reviewer #2:

      A key genomic study on emerging, nutritious, alternative grain crop.

      Deep genomic data on hundreds of land races/accessions.

      Population structure analysis, could be enhanced.

      Agronomic growth and yield traits are correlated and environmentally sensitive.

      Genomic dissection via GWAS to multigenic loci with candidate genes add genomic prediction and selection.

      Inference on domestication.

      To improve population structure analysis, we performed a local PCA analysis by dividing the genome into 50 kb windows, and the results of the analysis are presented in Fig. S9. The results are added to the text, lines 189-209 and 556-562.

      We agree that the growing conditions typical of lowland (longer seasons) can prevent many accessions from reaching maturity. However, we observed that all accessions flowered and produced seeds. Moreover, GWAS with PCA (CP) has been shown to be effective in multiple studies (listed below) for genetically correlated traits. Therefore, we believe our analysis could address the bias that might occur due to maturity differences. We also discuss this in lines 386-390 and 413-417.

      • Miao, C., Xu, Y., Liu, S., Schnable, P. S., & Schnable, J. C. (2020). Increased power and accuracy of causal locus identification in time series genome-wide association in sorghum. Plant physiology, 183(4), 1898-1909.

      • Yano, K., Morinaka, Y., Wang, F., Huang, P., Takehara, S., Hirai, T., ... & Matsuoka, M. (2019). GWAS with principal component analysis identifies a gene comprehensively controlling rice architecture. Proceedings of the National Academy of Sciences, 116(42), 21262-21267.

      • Aschard, H., Vilhjálmsson, B. J., Greliche, N., Morange, P. E., Trégouët, D. A., & Kraft, P. (2014). Maximizing the power of principal-component analysis of correlated phenotypes in genome-wide association studies. The American Journal of Human Genetics, 94(5), 662-676.

      Genomic selection and prediction are interesting points. We believe that our study marks an important first step on the way to genomic selection. We agree that in many breeding programs, using marker-assisted selection for polygenic traits failed. However, markers from QTL explaining a large proportion of the phenotypic variance can be useful for marker-assisted selection, as for instance, the markers from our QTL regions on Cq2A. The next step will be to provide a database for genomic selection. This requires a more extensive set of breeding lines (training population) which should be grown under different environments.

      Reviewer #3:

      The authors have re-sequenced 310 quinoa accessions and carried out field phenotyping of the same set of accessions for two years in order to characterize genetic diversity and analyze the genetic basis of agronomically important traits.

      The main strength of the manuscript is that the authors have carefully characterized more than 300 quinoa accessions, achieving a sufficiently large population size for GWAS analysis with good statistical power. It is especially promising that the phenotypes all show high heritability. This indicates that the field phenotyping was of high quality and provides a good starting point for discovering relevant marker-trait associations. In addition, the authors provide convincing evidence for distinct population characteristics of highland and lowland quinoa, adding additional information compared to previous work (Maughan, 2012).

      The weak points are related to the genotype data and the conclusions drawn based on the GWAS analysis.

      1) An important issue is related to the relatively low depth of coverage (4-10x) that was used for re-sequencing. Across the accessions, there is a pronounced negative correlation between the mean sequencing depth and the heterozygosity level, indicating that heterozygotes are overcalled in individuals with low coverage. This also results in heterozygosity levels that are generally higher than expected for what is assumed to be mainly homozygous inbred lines.

      We addressed your concern by providing the scatter plot as requested. We also calculated correlations between coverage and heterozygosity (Fig. S3b). However, the correlations were not significant, and therefore we believe that the coverage was sufficient to achieve accurate SNP calling (lines 106-108).

      2) Another potential issue concerns SNPs called in repetitive regions. Among the significant GWAS SNPs identified, a very large proportion appears to be found in intergenic regions. While this does not rule out that some of them are genuinely important associations, it does suggest a potentially high level of noise in the GWAS results. In addition to the filtering already imposed, which includes a filter for mapping quality, the SNPs called in intergenic regions with unusually high coverage could be more closely examined to determine the extent of the issue. Masking repetitive genomic regions using RepeatMasker or similar programs could be useful.

      Thank you for this suggestion; we understand that the problem could occur due to poor or incorrect mapping in the intergenic regions. Therefore, we applied stringent filtering to remove SNPs with more than 50% missing genotype data, minimum mean depth less than five, and minor allele frequency less than 5% for the GWAS analysis. SNP densities in intergenic regions are generally higher than in the genic regions. In this table, there are 511 intergenic SNPs (47% of all associations) and 300 upstream or downstream SNPs (28%) that are associated with traits. Therefore, we do not think that we have an overwhelming majority of intergenic SNPs. Also, we believe that SNPs within repetitive regions are important. For instance, repetitive elements can have a function in controlling gene expression. Moreover, since our SNP calling and filtering criteria were very stringent, the probability of having false positives in our SNP data set is very low. Therefore, we would not remove them from the GWAS analysis at this stage.
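
      The three filtering thresholds named above can be expressed as a single predicate. This is a minimal sketch under stated assumptions: the record type, field names, and toy SNPs are hypothetical, and in practice such filters are usually applied to a VCF with dedicated tools rather than in plain Python.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    """Per-SNP summary statistics (hypothetical record layout)."""
    snp_id: str
    missing_fraction: float   # fraction of accessions with no genotype call
    mean_depth: float         # mean sequencing depth across accessions
    minor_allele_freq: float  # minor allele frequency


def passes_gwas_filters(snp, max_missing=0.50, min_mean_depth=5.0, min_maf=0.05):
    """Keep a SNP unless it has >50% missing data, mean depth <5, or MAF <5%."""
    return (
        snp.missing_fraction <= max_missing
        and snp.mean_depth >= min_mean_depth
        and snp.minor_allele_freq >= min_maf
    )


# Toy records for illustration only.
toy_snps = [
    SnpStats("snp_kept", missing_fraction=0.10, mean_depth=7.2, minor_allele_freq=0.21),
    SnpStats("snp_missing", missing_fraction=0.62, mean_depth=8.0, minor_allele_freq=0.30),
    SnpStats("snp_shallow", missing_fraction=0.05, mean_depth=3.9, minor_allele_freq=0.12),
    SnpStats("snp_rare", missing_fraction=0.02, mean_depth=9.1, minor_allele_freq=0.01),
]

retained = [snp.snp_id for snp in toy_snps if passes_gwas_filters(snp)]
print(retained)
```

      Whether values exactly at a threshold are kept or dropped is a choice of this sketch; the response does not specify how boundary cases were handled.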

      3) When the authors discuss their GWAS results, they frequently focus on cherry-picked candidate genes, although, in several cases, the top SNPs in the region in question are not found within these candidates. A more broad focus on all genes within the LD blocks, while still mentioning the candidate genes, would be more informative.

      We obtained candidate genes based on the whole-genome LD average (50 kb) and we provided LD heatmaps to show that Saponin genes and GLX2-1 are in LD with the most strongly associated SNPs (modified lines 259-260, 398). For thousand seed weight, we showed that the SNPs with significant p-values are located within both CqRING and CqPP2C genes. We also modified the text accordingly (Lines 24,81,249,251,254-255,274,275,285-286,287-290,300-302,391,396,397,398,405-406,409-410,413-418,420-422).

      4) The manuscript includes statements that a particular genotype "results in" some phenotypic outcome, although no causal relationship has been demonstrated. In general, there is a tendency to draw too strong conclusions based on the GWAS results.

      We modified the text based on the reviewer’s comment. Rephrased into “associated with”.

      5) As this is primarily a resource paper, the authors should make the complete genotype and phenotype data as well as the layout of the field trials available. It would not be possible to reproduce the GWAS analysis based on the data included with the current version. They should also clarify how the quinoa accessions described will be made accessible to the community and provide all scripts used for data analysis through GitHub or a similar repository.

      Most of the accessions are available from the IPK Gatersleben and the USDA genebanks. Materials that are not available from the genebanks can be obtained from the authors with a Standard Material Transfer Agreement (SMTA). Genomic data (Ready to use vcf files) and phenotypic data are made available through the Dryad repository https://doi.org/10.5061/dryad.zgmsbcc9m. Raw sequencing data are available from NCBI SRA. Also, detailed descriptions of the germplasm, phenotyping methods, and phenotypes are posted at https://quinoa.kaust.edu.sa/#/ and published in Stanschewski et al., 2021 (see lines 603-607).

  5. Nov 2021
    1. Author Response:

      Reviewer #1 (Public Review):

      Hickey et al. studied chromatin landscape changes in early Zebrafish embryos at three distinct stages: preZGA, ZGA and postZGA. Using ChIP-seq on these time-course samples, they examined developmental genes at their regulatory elements, including promoters and enhancers, that carry nucleosomes enriched with histone variant H2A.Z, as well as post-translational modifications H3K4me1 and H3K27ac, but with low DNA methylation, in early-stage embryos prior to turning on zygotic gene expression. During embryogenesis, this group of elements recruits a Polycomb Repressive Complex 1 (PRC1) component Rnf2 to "write" the ubiquitinated H2A or H2A.Z. The mono-ubH2A/Z then recruits a PRC2 component Aebp2 to further "write" the H3K27me3 repressive mark to silence these developmentally regulated genes in later stage embryos. Using a small molecule to inhibit Rnf2 abolishes H3K27me3 and leads to ectopic gene expression.

      Most of the data for the first half of this manuscript are presented in a clear and logical manner. The conclusions based on these correlation assays are quite obvious and well supported (except a few minor points raised below for clarifications, #2-#3). The major concern is for the second half of the manuscript where a drug is used to draw causal relationships (see point #1 below).

      1. Using small molecule could have secondary effects. It also seems that the drug-induced defects cannot be reversed after being washed away. Furthermore, this drug treatment eliminates almost all H3K27me3 genome-wide, regardless of their occupancy status with mono-ubH2A/Z, making it difficult to make the causal connection between the prerequisite mono-ubH2A/Z occupancy and the subsequent de novo H3K27me3. I think it is important for the authors to address this point more directly as this is the main conclusion of this work. Could the authors perform genetic analyses to confirm the specificity of the phenotypes?

      2. Page 8, line 160-163: "Curiously, enhancer cluster 5 (Figure 2A) was unique - displaying high H3K4me1, very high H3K27ac, and open chromatin (via ATAC-seq analysis; Figure 2 - figure supplement C, D) - but bore DNA methylation - an unusual combination given the typical strong correlation between high H3K4me1 and DNA hypomethylation." I suspect that the authors are talking about the chromatin state at pre-ZGA stage as this is the only stage DNA methylation pattern was included, but it is hard to tell that this cluster displays high H3K4me1 at all.

      We now see the confusion, and are happy to clarify this. We were intending to refer to the histone marking at postZGA, and the DNAme at postZGA (for cluster #5) – as postZGA is the time when H3K4me1 is high, H3K27ac is very high, and DNAme remains high. The reviewer is right that we do not show the DNAme pattern at postZGA, only preZGA. However, the DNAme pattern stays almost constant between preZGA (2.5 hpf) and postZGA (4.3 hpf) – a result we published previously in Potok et al., 2010 (note: the maternal genome shows DNA reprogramming prior to 2.5 hpf, and is then constant through ZGA). We did not include DNAme at every stage simply to save space in Panel A, which was getting crowded. However, to avoid the reader misunderstanding our point, we have taken care to make this clear in the revised manuscript. We thank the reviewer for raising this point.

      1. Page 10, line 206-207: "PRT4165 treatment also conferred limited new/ectopic Aebp2 peaks (Figure 4C, clusters 4, 6, 7,8)", it seems that clusters 4, 6, 7, 8 together are not "limited" compared to clusters 1, 3, and 5, and could be even more abundant.

      Thank you for this comment - we agree with the reviewer and have clarified this in the text and Figure 4. In the initial version, the section where we mention ‘limited’ additional sites was intended to refer to promoters, as only a modest fraction of the ectopic sites are at promoters, but we did not provide that context in the text. Indeed, if one looks at all sites in the genome, there are a large number of ectopic sites after PRT4165 treatment. This is shown clearly in the revised Figure 4 (which shows all genomic sites) and we have clarified this in the text.

      We were curious whether there is any feature that helps us understand what might unify the ectopic binding, and therefore underlie the mechanism(s). First, we tested whether binding sites for particular transcription factors might be enriched; however, we did not find a class of binding sites that represented more than 3% of the total sites. We note that others have reported some affinity of mammalian Aebp2 for DNA and some limited sequence specificity (Kim et al., NAR 2009), and in the absence of a high-affinity H2AUb target, that shadow DNA binding function may become more apparent. Furthermore, we did not observe chromatin marks that showed a highly significant degree of overlap. Thus, although intriguing, there does not yet appear to be a logic to the ectopic binding observed.

      1. In the context of studying the chromatin state of developmental genes in early vertebrate embryos, there are two recent publications in mouse embryos which also investigated the crosstalk between mono-ubH2A and H3K27me3 at the ZGA transition in mouse (https://doi.org/10.1038/s41588-021-00821-2 and https://doi.org/10.1038/s41588-021-00820-3). It would be informative to add some discussion for comparisons between these two vertebrate organisms.

      Reviewer #2 (Public Review):

      One model for polycomb domain establishment suggests that PRC2 adds H3K27me3 first, and then recruits PRC1 for silencing. The key evidence for this model is the H3K27me3-binding module CBX proteins in canonical PRC1 complexes. This model has been revised by recent studies, and it is now well recognized that the polycomb domains can be de novo established in a different order. In other scenarios, including X inactivation, a non-canonical PRC1 complex that lacks CBX proteins catalyzes ubH2A first, and PRC2 complex is subsequently recruited through recognizing ubH2A modification by its Jarid2 and Aebp2 subunits.

      In this manuscript, Hickey and co-workers analyzed the temporal change of various epigenetic marks around ZGA stages during zebrafish early embryo development. Based on their experimental data and bioinformatic analysis, they suggest that polycomb establishment in zebrafish embryo is following the 'non-canonical' order, in which H3K27me3 establishment is dependent on ubH2A pre-deposition and the following recruitment of Aebp2-PRC2 complex. Moreover, they suggest that polycomb-silenced developmental genes are solely repressed by ubH2A, independent of H3K27me3. Overall, the functional analysis (RNF2 inhibitor experiments) conducted in the current study highlights the critical function of PRC1 and ubH2A in silencing developmental genes during early embryo development. Moreover, this study provides clues that could reconcile with the earlier observations that H3K27me3 seems largely dispensable for silencing developmental genes in zebrafish early embryo (e.g. PMID: 31488564).

      The main concern is that two similar studies have just been published in Nature Genetics using mouse early embryos, and the observations of this manuscript largely agree with the two mouse studies, which limits the novelty of this study.

      In addition, certain conclusions in the manuscript require further experimental support:

      1. While the authors claim that H3K27me3 is established after ZGA, it is quite surprising to me that they did NOT analyze the H3K27me3 pattern before ZGA. While IF staining suggests a minimal level of H3K27me3 before ZGA (Fig1 S2B), previous ChIP-seq analysis demonstrates that H3K27me3 is present (e.g. PMID: 22137762).

      Briefly, in our own work, we do not detect H3K27me3 by IF prior to ZGA, and we could not detect H3K27me3 peaks by ChIP during preZGA (also mentioned as ‘data not shown’ in Murphy et al., 2018).

      1. While the RNF2 inhibitor experiment clearly demonstrates that PRC1 is required for the deposition of both ubH2A and H3K27me3, that does not necessarily mean that PRC1-mediated ubH2A deposition precedes H3K27me3. The establishment and maintenance of polycomb domain usually requires the crosstalk and reinforcement between polycomb complexes. Therefore, the deficiency in either PRC1 or PRC2 complex may lead to the decreased level of both marks. To clarify a hierarchical order of the polycomb domain establishment, a phenotypic analysis of PRC2 deficiency is also necessary.

      Here, we emphasize that prior to performing the inhibitor experiment, we addressed the temporal order of addition in Figure 1 and in Figure 1 – figure supplement 1. H2Aub1 is added extensively to thousands of developmental genes during preZGA, well before H3K27me3 is detected. We interpret this as evidence that H2Aub1 temporally precedes H3K27me3 during embryonic development. We will also mention (described in the Discussion) that maternal zygotic loss of Ezh2, which eliminates all H3K27me3 in the genome at all embryo stages, does not result in the activation of developmental genes.

      1. Parental difference. As shown in Fig.1B, ubH2A level varies greatly in sperm and egg, which suggests that the reprogramming process of ubH2A (and perhaps H3K27me3) distribution could be significantly different for the two parental alleles. It would be interesting to analyze the ubH2A and H3K27me3 distribution in germ cells before fertilization.

      We appreciate the reviewer’s comment and agree that this would be an interesting line of inquiry. However, this would require genomics analyses from reciprocal crosses of highly polymorphic fish strains. This would involve very considerable additional work. Therefore, we will consider this in our future studies.

      1. The role of Aebp2 subunit. Given the well-characterized function of Aebp2 in recognizing ubH2A, an involvement of Aebp2-PRC2 complex in establishing H3K27me3 on PRC1 pre-deposited regions is not unexpected. Indeed, Aebp2 co-localized well with ubH2A marked regions (Fig.3). However, an issue not clarified in the manuscript is whether Aebp2 is the sole subunit for the recruitment of PRC2 to ubH2A marked regions. Paralleled analysis of the changes for Aebp2 and H3K27me3 upon RNF2 inhibitor treatment is necessary, and Aebp2-dependent and -independent regions should be separately classified for analysis.

      2. Role of PRC1 on the temporal regulation of gene expression during early development. The authors only analyzed the RNA-seq results for RNF2i-treated embryos post ZGA. Therefore, it is currently not clear if the role of PRC1 in transcriptional repression is restricted to post-ZGA stages. RNA-seq analysis of RNF2i-treated embryos on those stages is also warranted.

    1. Author Response:

      Reviewer #1 (Public Review):

      The main finding - that the moment-to-moment relationship between excitability and perception is coupled to the body's slower respiratory oscillation - is novel, interesting, and important for advancing our understanding of how the brain-body system works as a whole. The experiment is simple and elegant, and the authors strike the right level of making the most of the data without doing too much and obscuring the main findings. The primary weakness, in my opinion, is the inability to distinguish between the possibility that respiration modulates excitability and the possibility that respiration modulates something boring like signal-to-noise ratio. In terms of conclusions, I thought the authors stuck pretty well to the data. The one place where the conclusions felt a little bold was in terms of the respiration <> alpha <> behavior relationship, where it felt the authors had already made up their minds re: causality. I agree that it probably makes more sense for respiration to influence something about the brain than vice versa, and the background presented in the Intro/Discussion supports this. However, the analysis only tells us that the behavioral performance was modulated by both alpha and respiration (and their interaction, but this is no way causal). Overall, it will be necessary to differentiate the current interpretation from the possibility that breathing and alpha are two unrelated time courses that influence behavior at the same time (and even interact in how they influence behavior, but just not interact with each other), and I do not believe the phase-amplitude coupling analysis is sufficient for this.

      We thank the reviewer for their positive and constructive evaluation of our work.

      Reviewer #2 (Public Review):

      Kluger and colleagues investigated the influence of respiration on visual sensory perception in a near-threshold task and argue that the detected correlation between respiration phase and detection precision is linked to alpha power, which in turn is modulated by the phase of respiration. The experiments involved detecting a low-contrast visual stimulus to the left or right of a fixation point with contrast settings adjusted via an adaptive staircase approach to reach a desired 60% hit rate, resulting in an observed hit rate of 54%. The main findings are that mutual information between the discrete outcome of hit-or-miss and the continuous contrast variable is significantly increased when respiration phase is considered as well. Furthermore, results show that neuronal alpha oscillation power is modulated in phase with respiration and that perception accuracy is correlated with alpha power. Time resolved correlation analysis aligned on respiration phase shows that this correlation peaks during inspiration around the same phase where the psychometric function for the visual detection task reaches a minimum. The experimental design and data analysis seem solid but there are several concerns regarding the novelty of the findings and the interpretation of the results.

      Major concerns: The finding that visual perception is modulated by the respiration cycle is not new (see e.g. Flexman et al. 1974 or Zelano et al. 2016).

      There are multiple studies going back decades that show alpha oscillation power to be modulated by breathing (e.g. Stancák et al., 1993, Bing-Canar et al. 2016). Also, as the authors acknowledge, it is well-established that alpha power correlates with neuronal excitability and perception threshold. What seems to be new in this study is the use of a linear mixed effect model to analyze the relationship between alpha power, respiration phase and perception accuracy. However, the results mostly seem to confirm previous findings.

      Thank you for giving us the opportunity to clarify our approach and the conceptual novelty it provides. First, not at all do we claim that our study is the first to demonstrate respiration-related alpha changes. Not only do we prominently cite the work by Zelano and colleagues (JNeuro, 2016) in the Introduction and Discussion sections, we also have previous work from our own lab demonstrating these effects (see Kluger & Gross, PLoS Biol 2021). Second, the reviewer’s comment that ‘the results mostly seem to confirm previous findings’ unfortunately appears to frame a critical proof-of-concept as a lack of novelty: In order for us to claim a triadic relationship between respiration, excitability, and behaviour, it is paramount to first demonstrate that assumptions about pairwise relations (such as respiration <> alpha power and alpha power <> behaviour) are supported, which of course means replicating known results in our data. Third, in order to evaluate the novelty of our present study, it is crucial to consider its core aim, which was to characterise how automatic respiration is related to lowest-level perception by means of respiration-induced modulation of neural oscillations. At this point, we respectfully disagree with the reviewer’s assessment of our results being mostly replicative, as the references they provide differ from our approach in various key aspects: The classic study by Flexman and colleagues (1974) merely differentiates between inspiration and expiration, critically without accounting for the asymmetry between the two respiratory phases. Zelano and colleagues (2016) did not investigate visual perception at all, but instead asked participants to categorise emotional face stimuli (termed ‘emotion recognition task’). 
Stancák and colleagues (1993) did not investigate automatic, but paced breathing, which involves continuous, conscious top-down control of one’s breathing rhythm - a demand that is not comparable to automatic, natural breathing we investigate here. The same is true for any kind of respiratory intervention or training like the ‘mindfulness-of-breathing exercise’ employed in the study by Bing-Canar and colleagues (2016). Once again, the oscillatory changes reported by the authors are not induced by automatic breathing, but instead reflect the outcome of a conscious manipulation of the breathing rhythm. In highlighting the key differences between previous studies and our approach, we do hope to have dispelled the reviewer’s initial concern regarding the novelty of our findings.

      Magnetoencephalography captures broad band neuronal activity including gamma frequencies. As the authors show (Fig. 4) and other studies have shown, the power of neuronal oscillations across multiple frequency bands is modulated by respiration phase. Gamma and beta oscillations have been implicated in sensory processing as well. Support for the author's hypothesis that the perception threshold modulation with respiration is due to alpha power modulation would be strengthened if they could show that the power of oscillations in other frequency bands are not or only weakly linked to perception accuracy.

      We thank the reviewer for their well-justified suggestion to extend the spectral scope of our analyses to include other frequency bands. In response to their comment, we have recomputed our analysis pipeline for the frequency range between 2 - 70Hz. While the whole analysis and results are described in a new Supplementary Text and Supplementary Figures (see below), we outline key findings here.

      In keeping with the structure of our main analyses, we first computed cluster-corrected whole-scalp topographies for delta, theta, alpha, beta, and gamma bands for hits vs misses over time intervals 1s prior to stimulus presentation:

      *Fig. S4 | Band-specific topographies over time. Whole-scalp topographic distribution of normalised pre- and peristimulus power differences between hits and misses, separately for each frequency band. Channels with significant differences in the respective band are marked (cluster-corrected within the respective time frame). Related to Fig. 3.*

      Compared to the clear parieto-occipital topography of prestimulus alpha modulations, delta and theta effects were prominently shifted to anterior sensors, which renders their involvement in low-level visual processing highly unlikely. No significant effects were observed in the gamma range. In contrast, beta-band modulations were closest to the alpha effects in their topography, covering parietal as well as occipital sites. Although the size of normalised effects was markedly smaller in the beta band (compared to alpha frequencies, cf. colour scaling), the topographic distribution of prestimulus modulations as well as the spectral proximity of the two bands prompted further investigation of beta involvement. To this end, we computed the instantaneous correlation between individual beta power (over the respiration cycle) and respiratory phase, analogous to our main analysis shown in Fig. 4c. Consistent with the TFR analysis shown above, no significant correlations between oscillatory power and respiration time courses were found for delta, theta, and gamma bands. For the beta band, however, we found a significant correlation during the inspiratory phase, similar to the alpha correlation described in the main text (and shown for comparison in the new Supplementary Fig. S5):

      *Fig. S5 | Instantaneous correlation of beta power and perceptual sensitivity. Group-level correlation between individual beta and PsychF threshold courses (averaged between 14 - 30 Hz) with significant phase vector (length of seven time points) marked by dark grey dots (cluster-corrected). Correlation time course of the alpha band (see Fig. 4c) shown for reference in light grey. Related to Fig. 4.*

      While both alpha and beta power were correlated to the breathing signal during the inspiratory phase, the correlation time courses suggested that there might be differential effects in both frequency bands, as indicated by the phase shift visible in Supplementary Fig S5. Therefore, we finally recomputed the LMEM visualised in Fig. 4 with an additional factor for beta power. In this extended model, significant effects were found for both alpha (t(1790) = 3.27, p < .001) and beta power (t(1790) = 4.83, p < .001). Beta showed significant interactions with the sine of the respiratory signal (t(1790) = -3.52, p < .001) as well as with alpha power (t(1790) = -4.63, p < .001). Comparing the LMEM to the previous model which only contained alpha power (along with respiratory sine and cosine) confirmed the significant contribution of beta power in explaining PsychF threshold variation by means of a theoretical likelihood ratio test (χ²(4) = 60.43, p < .001). Overall, we thus found beta power to be i) significantly modulated by respiration (see Fig 1), ii) significantly suppressed over parieto-occipital sensors for hits vs misses (see Fig. S4), and iii) a significant contributor to variations in PsychF threshold (see Fig S5). Collectively, these findings suggest differential roles of alpha and beta power, which we discuss in the main text as well as in the Supplementary Text:

      “Whole-scalp control analyses across all frequency bands demonstrated that this topographical pattern was unique to alpha and beta prestimulus power (see Supplementary Text 1 and Fig. S4).”

      “Control analyses across all frequency bands yielded a significant instantaneous correlation between PsychF threshold and beta power as well, albeit at a slightly later phase (see Fig. S5). No significant correlations were found for the remaining frequency bands.”

      “Accordingly, one recent study proposed that the alpha rhythm shapes the strength of neural stimulus representations by modulating excitability (Iemi et al., 2021). Previous work by Michalareas and colleagues (2016) as well as our own data (see Supplementary Material) point towards an interaction between alpha and beta bands, as beta oscillations have very recently been implicated in mediating top-down signals from the frontal eye field (FEF) that modulate excitability in the visual cortex during spatial attention (Veniero et al., 2021). Our findings suggest that this top-down signalling is modulated across the respiration cycle in a way that changes behavioural performance.”
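
For readers who want to check the likelihood ratio statistic reported above, the p-value for χ²(4) = 60.43 can be recomputed by hand. The sketch below is illustrative only and is not the authors' analysis code: the helper name `chi2_sf_even_df` is hypothetical, and it relies on the closed-form chi-square survival function that holds only for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with even df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    if x < 0:
        raise ValueError("x must be non-negative")
    half = x / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))


# Statistic and degrees of freedom as reported in the response above.
p_value = chi2_sf_even_df(60.43, 4)
print(f"p = {p_value:.2e}")
```

The resulting p-value lies far below .001, in line with the reported result. The four degrees of freedom correspond to the number of parameters by which the extended model exceeds the alpha-only model.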

      In the discussion the authors speculate that respiration locked modulation of alpha power and associated neuronal excitability could be based on the modulation of blood CO2 levels. Most recent studies of respiratory modulation of brain activity have demonstrated significant differences between nasal and oral breathing, with nasal breathing (through activation of the olfactory bulb) typically resulting in a stronger influence of respiration on neuronal activity and behavioral performance than oral breathing. The authors only tested nasal breathing. If blood CO2 fluctuations are indeed responsible for the observed effect, there should be no difference in outcome between nasal and oral breathing. Comparing the two conditions would thus provide interesting additional information about the possible underlying mechanisms.

      We appreciate the reviewer’s well-justified remarks regarding the differential effects for nasal and oral breathing and their implications on underlying mechanisms such as CO2. In revising the present as well as other manuscripts, it has become evident that fluctuations of CO2 alone (and, as we previously discussed, related changes in pH) cannot possibly explain the effects we and others are observing. Therefore, the revised manuscript no longer discusses CO2 as a potential mechanism. We have removed the corresponding paragraph and instead refer to the distinction between nasal and oral breathing to strengthen the argument for OB-induced cross-frequency coupling:

      “As outlined in the introduction, there is broad consensus that cross-frequency coupling (Canolty and Knight, 2010; Jensen and Colgin, 2007) plays a central role in translating respiratory to neural rhythms: Respiration entrains neural activity within the olfactory tract via mechanoreceptors, after which the phase of this infraslow rhythm is coupled to the amplitude of faster oscillations (see Fontanini and Bower, 2006; Ito et al., 2014). While this mechanism is difficult to investigate directly in humans, converging evidence for the importance of bulbar rhythms comes from animal bulbectomy studies (Ito et al., 2014) and the fact that respiration-related changes in both oscillatory power and behaviour dissipate during oral breathing (Zelano et al., 2016; Perl et al., 2019). Thus, rhythmic nasal respiration conceivably aligns rhythmic brain activity across the brain, which in turn influences behaviour. In our present paradigm, transient phases of heightened excitability would then be explained by decreased inhibitory influence on neural signalling within the visual cortex, leading to increased postsynaptic gain and higher detection rates. Given that the breathing act is under voluntary control, the question then becomes to what extent respiration may be actively used to synchronise information sampling with phasic states of heightened excitability.”

      Reviewer #3 (Public Review):

      The topic is timely, the study is well-designed, and the work has been performed in a highly competent manner. The authors relate three variables: respiration, alpha power and perceptual performance, constituting a link between somatic and neuronal physiology and cognition. A particular strength is the temporal resolution of respiration effects on cognition (continuous analysis of the respiration cycle). Furthermore, results are well contextualized by very comprehensively written introduction and discussion sections (which, nevertheless, could be slightly shortened).

      We do appreciate the reviewer’s positive evaluation of our manuscript and are thankful for their constructive remarks. We respond to their comments in detail below and have shortened the Discussion section in response to one of the reviewer’s remarks (kindly see points 1.1 and 2 below).

      I have three points of criticism, all meant in a constructive way:

      1. I wonder whether the authors could have gone one step further in the analysis of causal mechanisms, rather than correlations. The analysis of timing (Fig. 4d) and the last sentence of the abstract suggest that they imagine a causal role of respiratory feedback on cognitive performance, mediated via coordination of brain activity (in the specific case, by increasing excitability in visual areas). This could be made more explicit by appropriate experiments and data analysis:

      1.1. Manipulating the input signal: former studies suggest that nasal respiration is crucial for effects on brain oscillations and/or performance (e.g. Yanovsky et al., 2014; Zelano et al., 2016). Thus, the causal inference could be easily checked by comparing nasal versus oral respiration, without changing gas- and pH-parameters of activity of brainstem centers. >Admittedly, this experiment may add significant work to the present data which, by themselves, are already very strong.

      We thank the reviewer for their insightful comment regarding the question of causality. We acknowledge that our interpretation should have been phrased a little more cautiously. Therefore, we have rephrased corresponding paragraphs at various instances throughout the manuscript (kindly see below). Particularly under current circumstances, we further appreciate the reviewer’s concern regarding the acquisition of additional data for a direct comparison of nasal vs oral breathing. Their comment is of course entirely valid and we were eager to address it, especially since it relates to CO2- and/or pH-related mechanisms of RMBOs we previously discussed. In light of the reviewer’s comments (also see their related comment #2 below) and convincing evidence from both animal and human studies that already compared nasal and oral breathing, we no longer feel that changes in CO2 provide a reasonable explanation for respiration-related oscillatory and behavioural effects we observed here. Consequently, we have removed the corresponding paragraph from the Discussion section which now reads as follows:

      “As outlined in the introduction, there is broad consensus that cross-frequency coupling (Canolty and Knight, 2010; Jensen and Colgin, 2007) plays a central role in translating respiratory to neural rhythms: Respiration entrains neural activity within the olfactory tract via mechanoreceptors, after which the phase of this infraslow rhythm is coupled to the amplitude of faster oscillations (see Fontanini and Bower, 2006; Ito et al., 2014). While this mechanism is difficult to investigate directly in humans, converging evidence for the importance of bulbar rhythms comes from animal bulbectomy studies (Ito et al., 2014) and the fact that respiration-related changes in both oscillatory power and behaviour dissipate during oral breathing (Zelano et al., 2016; Perl et al., 2019). Thus, rhythmic nasal respiration conceivably aligns rhythmic brain activity across the brain, which in turn influences behaviour. In our present paradigm, transient phases of heightened excitability would then be explained by decreased inhibitory influence on neural signalling within the visual cortex, leading to increased postsynaptic gain and higher detection rates. Given that the breathing act is under voluntary control, the question then becomes to what extent respiration may be actively used to synchronise information sampling with phasic states of heightened excitability.”

      1.2. Temporal relations: The authors show that respiration-induced alpha modulation precedes behavioral modulation (Fig. 4d and related results text). Again, this finding suggests a causal influence of respiration on performance, mediated by alpha suppression (see results, lines 318-320). Could the data be directly tested for causality (e.g. by applying Granger causality, dynamic causal modelling or other methods)? If this is difficult, the question of causality should at least be discussed more explicitly.

      We appreciate the reviewer’s constructive criticism and their suggestion to employ causal analyses. While we agree that the overall pattern of results strongly suggests a causal cascade of respiration -> excitability -> perception, our interpretation with regard to a dynamic mechanism was probably overly strong. Unfortunately, it is indeed difficult to use directional analyses like Granger causality or DCM on the current data, since these methods quantify the relationship between two time series. They would not allow us to investigate the triad of respiration, alpha power, and behaviour, as we have discrete responses (i.e., single events) instead of a continuous behavioural measure. In fact, we are currently preparing a directional analysis of respiration-brain coupling (in resting-state data without a behavioural component) for an upcoming manuscript. In response to the reviewer’s remarks, we have toned down our interpretation throughout the manuscript and explicitly discuss the question of causality in the Discussion section of the revised manuscript:

      “The bootstrapping procedure yielded a confidence interval of [-33.17 -29.25] degrees for the peak effect of alpha power. While these results strongly suggest that respiration-alpha coupling temporally precedes behavioural consequences, they do not provide sufficient evidence for a strict causal interpretation (see Discussion)”

      “Rigorous future work is needed to investigate potentially causal effects of respiration-brain coupling on behaviour, e.g. by means of directed connectivity within task-related networks. A second promising line of research considers top-down respiratory modulation as a function of stimulus characteristics (such as predictability). This would grant fundamental insights into whether respiration is actively adapted to optimise sensory sampling in different contexts, as suggested by the animal literature.”
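
To make the bootstrapped confidence interval quoted above more concrete, the following is a minimal sketch of a percentile bootstrap for a mean phase angle. It is not the authors' procedure: the resampling unit (participants), the use of the circular mean, the function names, and the example values are all assumptions made for illustration.

```python
import math
import random


def circular_mean_deg(angles_deg):
    """Circular mean of angles given in degrees (result between -180 and 180)."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c))


def bootstrap_ci(angles_deg, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the circular mean.

    Sorting the bootstrap means is only meaningful when the values lie
    far from the +/-180 degree wrap-around, as in the example below.
    """
    rng = random.Random(seed)
    n = len(angles_deg)
    means = sorted(
        circular_mean_deg([rng.choice(angles_deg) for _ in range(n)])
        for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical per-participant peak phases (degrees); NOT data from the study.
peak_phases = [-40.0, -35.5, -28.0, -31.2, -25.4, -33.8, -30.1, -36.7, -29.3, -27.9]
ci_low, ci_high = bootstrap_ci(peak_phases)
print(f"95% CI: [{ci_low:.2f}, {ci_high:.2f}] degrees")
```

An interval of this kind describes the precision of the estimated peak phase; as the quoted passage notes, it does not by itself establish a causal ordering.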

      1. At various instances, the authors suggest that respiration-induced changes in pH may be responsible for the changes in cortical excitability which, in turn, affect behavioral performance. In the discussion, they quote respective literature (lines 406-418). I glanced through the quoted papers by Feldman, Chesler, Lee, Dulla and Gourine - as far as I could see none of them suggests that the cyclic process of respiration induces significant cyclic shifts of pH in the brain parenchyma (if at all, this may occur in specialized chemosensory neurons in the brainstem). Moreover, recent real-time measurements by Zhang et al. (Chem. Sci 12:7369-7376) do also not reveal such cyclic changes in the cortex. Finally, translating oscillatory extracellular pH changes (if existent) into changes in inhibitory efficacy would require some time, potentially inducing delays and variance onto the cyclic changes at the network level. I feel that the evidence for the proposed mechanism is not sufficient, notwithstanding that it is a valid hypothesis. Please check and correct the interpretation of the cited literature if necessary.

      We acknowledge the reviewer’s caution regarding our suggestion of pH involvement, which is closely related to their previous comment (kindly see 1.1 above). As the reviewer mentions themselves, there are several studies demonstrating an absence of both neural and behavioural modulations for oral (vs nasal) breathing. These reports provide direct evidence against a mechanism driven by changes in CO2 and/or pH, which would be identical for nasal and oral breathing. Moreover, a second valid criticism is the uncertain temporal delay introduced by the (hypothetical) translation of pH changes into neural signals, which would most likely be incompatible with the ‘online’ (i.e., within-cycle) effects we report here. Therefore, as outlined in our response above, we have removed the pH-related suggestions from the Discussion section.

      1. Finally, some illustrations should be presented in a clearer way for those not familiar with the specifics of MEG analysis.

      We appreciate the reviewer’s suggestions regarding the clarity of our manuscript.

    1. Author Response:

      Reviewer #1 (Public Review):

      The model proposed here is the first large-scale model that actually performs a cognitive task, which in this case is working memory but could easily extend to decision making in general as is acknowledged by the authors. Briefly, each of the 30 areas is simulated as a rate-based Wong-Wang circuit (i.e. two excitatory pools inhibit each other through a third, inhibitory population). The authors use previously collected anatomical data to constrain the model and show a qualitative match with the data, in particular how mnemonic activity emerges somewhat abruptly along the brain hierarchy.
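
The local circuit motif described above can be sketched with a generic two-pool rate model. This is not the Wong-Wang formulation and not the authors' model: the inhibitory population is folded into an effective cross-inhibition weight (`w_cross`), the transfer function is a plain logistic, and every parameter name and value below is a hypothetical choice made only to show how a transient cue can leave persistent, selective activity.

```python
import math


def sigmoid(x):
    """Logistic transfer function mapping net input to a normalised rate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate(cue_ms=500, delay_ms=1500, dt=1.0, tau=20.0,
             w_self=8.0, w_cross=4.0, theta=4.0, cue_strength=3.0):
    """Euler-integrate two excitatory pools with effective cross-inhibition.

    Pool 1 receives a transient cue; the cue is then removed for the delay.
    Returns the rates of both pools at the end of the delay period.
    """
    r1, r2 = 0.0, 0.0
    for step in range(int((cue_ms + delay_ms) / dt)):
        cue = cue_strength if step * dt < cue_ms else 0.0
        in1 = w_self * r1 - w_cross * r2 - theta + cue
        in2 = w_self * r2 - w_cross * r1 - theta
        r1 += dt / tau * (-r1 + sigmoid(in1))
        r2 += dt / tau * (-r2 + sigmoid(in2))
    return r1, r2


r1_final, r2_final = simulate()
print(f"end of delay: pool 1 = {r1_final:.3f}, pool 2 = {r2_final:.3f}")
```

With these illustrative settings the cued pool stays active after the cue is withdrawn while the other pool remains suppressed; calling `simulate(cue_strength=0.0)` leaves both pools in the low-activity state.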

      Strengths Previous models have focused on neural dynamics during the so-called "resting state", in which subjects are not performing any cognitive task - thus, resting. This study is therefore an important improvement in the field of large-scale modelling and will certainly become an influential reference for future modelling efforts. As typically done in large-scale modelling, some anatomical data is used to constrain the model. The model shows several interesting characteristics, in particular how distributed working memory is more resilient to distractors and how the global attractors can be turned off by inhibition of only top areas.

      Weaknesses: It is not clear how some of these results emerge, and some "biological constraints" do not seem to constrain the model. Moreover, some claims are slightly exaggerated, in particular how the model matches the data in the literature (which in some cases it does not) or how somatosensory working memory can be simulated by simply stimulating the "somatosensory cortex".

      This paper has two different models, one being a simplified version of the main model. However, it is not very clear what the simplified model adds to the main findings, if not to show that the empirical anatomical connectivity does not constrain the full model.

      We thank the reviewer for this evaluation, and for appreciating the innovative character of our study in implementing a cognitive function in a data-constrained large-scale brain model. We hope that it will be useful for future studies planning to add cognitive functions to their large-scale models, and also for experimentalists who might benefit from this insight.

      In response to the detailed comments of the reviewer, and to address the weaknesses identified above, we have rewritten parts of the text, clarified important concepts and included new simulations. Briefly:

      -We have clarified the nature and effects of the ‘biological constraints’ that we use. The full model that we use is indeed data-constrained, in the sense that we use real data to determine the values of many parameters. Having a data-constrained model, however, does not mean that all the results will be equally constrained. Some model results will critically depend on (some) data used to constrain the model, while other results will be more robust to changes in these parameters. We have highlighted this point and we also added explanations for each of the results presented.

      -We have corrected several claims throughout the text to bring them more in line with experimental evidence, and included the new references suggested by the reviewer to this effect. For example, for the case of somatosensory WM mentioned by the reviewer, we have indicated that the existence of a ‘gating’ mechanism (explored in a supplementary figure) is important for achieving an accurate match with the experimentally observed effects of somatosensory stimulation.

      -Finally, we have highlighted the complementary benefits of the full and simplified models, and improved our motivation for the latter. Briefly, the simplified model allows us to identify the key ingredients needed for distributed WM (useful to generalize to other animal models), while the full model ensures that the main findings are still present when more realistic assumptions are made. A good example is the counterstream inhibitory bias, which is in principle not necessary for a simplified model but becomes a crucial factor to implement the distributed WM mechanism in our macaque model.

      Reviewer #2 (Public Review):

      There is a lot to like about this manuscript. It provides a large-scale model of a well-known phenomenon, the "delay activity" underlying working memory, our oldest and most enduring model of a cognitive function. The authors correctly state that despite the ubiquity of delay activity, there is little known about the macro and micro circuitry that produces it. The authors offer a computational model with testable hypotheses that is rooted in biology. I think this will be of interest to a wide variety of researchers just as delay activity is studied across a variety of animal models, brain systems, and behavior. It is also well-written.

      My main concern is the authors may be self-handicapping the impact of their model by not taking into account newer observations about delay activity. For a number of years now, evidence has been building that working memory is more complicated than "persistent activity" alone. Stokes, Pasternak, Dehaene, Miller and others have been mounting considerable evidence for more complex dynamics and for "activity-silent" mechanisms where memories are briefly held in latent (non-active) forms between bouts of spiking. There is also mounting evidence that the thalamus plays a key role in working memory (and attention). In particular, higher thalamic nuclei are critical for regulating cortical feedback. Cortical feedback plays a central role in the model presented here. The model presented in this manuscript just deals with persistent attractor states and the cortex alone.

      This is not to say that this manuscript does not have good value as is. No one disputes that some form of elevated, sustained, activity underlies working memory. This work adds insights into how that activity gets sustained and the role of, and interactions between, different cortical areas. The observation that the prefrontal and parietal cortex are more critical than other areas, that there are "hidden" attractor states, and "counterstream inhibitory bias" are important insights (and, importantly, testable). They will likely remain relevant even as the field is moving beyond persistent attractor states alone as the model for working memory. The new developments do not argue against the importance of delay activity in working memory. They show that there is more to the story, as inevitably happens in brain science.

      The authors do include a paragraph in the Discussion referencing the newer developments. Kudos to them for that. However, it is presented as "new stuff to address in the future". Well, that future is now. These "newer" developments have been mounting over the past 10 years. The worry here is that by relying so heavily on the older persistent attractor dynamics model and presenting it as the only model, the authors are putting an early expiration date on their work, at least in terms of how it will be received and disseminated.

      We thank the reviewer for a careful and positive evaluation of our work. We consider that the main point raised here is indeed crucial: classical explanations of WM based on elevated and constant firing are an important part of the story; however, other alternative or complementary approaches developed in the past years also deserve attention. These approaches include, to name a few, activity-silent mechanisms (Mongillo et al. 2008, Trübutschek et al. 2017), dynamic hidden states (Wolff et al. 2017), persistent activity without feedback (Goldman 2009), and paradigms relying on gamma bursts (Miller et al. 2018).

      It’s important to highlight, however, that our approach is “attractor network theory”, not “persistent activity theory”, and an attractor does not have to be a steady state (tonic firing) but may display complex spatiotemporal patterns (fluid turbulence with tremendously rich temporal dynamics and eddies on many spatial scales is an attractor). We have now largely eliminated the use of “persistent” in the manuscript. On the other hand, for lack of a better word it’s fine to still use that term, if it is understood in a more general sense, which also includes stable representations in which the activity of individual neurons varies along the delay period (Goldman, 2009; Murray et al. 2017) or rhythmic activity which persists over time (Miller et al. 2018). The attractor network theory should be contrasted conceptually with mechanisms based on intrinsically transient memory traces (see Wang TINS 2021 for a more elaborate discussion on this).
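
      The point that an attractor need not be a steady state can be illustrated with a toy system. The sketch below is not part of our model; it integrates a generic two-variable oscillator in polar form (dr/dt = r(1 - r^2), dtheta/dt = omega), chosen purely as an assumption for illustration. Trajectories starting from very different initial amplitudes converge onto the same orbit, so the orbit is an attractor, yet the observable x = r cos(theta) never settles to a constant value.

```python
import math

def simulate_limit_cycle(r0, theta0=0.0, omega=2 * math.pi, dt=0.001, n_steps=20000):
    """Euler-integrate dr/dt = r(1 - r^2), dtheta/dt = omega.

    Hypothetical toy system (not the manuscript's model). Returns the
    radius and the Cartesian observable x = r*cos(theta) at every step.
    """
    r, theta = r0, theta0
    radii, xs = [], []
    for _ in range(n_steps):
        r += dt * r * (1.0 - r * r)   # amplitude relaxes toward r = 1
        theta += dt * omega           # phase keeps advancing
        radii.append(r)
        xs.append(r * math.cos(theta))
    return radii, xs

# Two very different starting amplitudes end up on the same orbit.
radii_small, xs_small = simulate_limit_cycle(r0=0.1)
radii_large, xs_large = simulate_limit_cycle(r0=2.0)

print("final radius from r0=0.1:", radii_small[-1])
print("final radius from r0=2.0:", radii_large[-1])
print("range of x over the last 2 s:", min(xs_small[-2000:]), max(xs_small[-2000:]))
```

      In this toy case the memory of "being on the orbit" is stable while the activity itself keeps changing, which is the sense in which we use the term attractor.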

      Our proposal for distributed WM has a general aim and it’s not restricted to the classical ‘elevated constant firing’ scenario. Following the reviewer’s suggestion, we have rewritten the text to make sure that multiple mechanisms of WM are acknowledged in different parts of the text, not only in a paragraph in the Discussion. We have also acknowledged the importance of thalamocortical interactions and cited previous relevant studies in this sense (such as Guo et al. 2017), also as a response to comments from Reviewer 1.

      In addition, we have attempted to go beyond a simple rewriting and, using a variation of our simplified model, we now show that distributed WM representations can also happen in the context of activity-silent models (Figure 3–figure supplement 1). In particular, we use a simplified network model with reduced local and long-range connectivity strength and incorporate short-term synaptic facilitation in synaptic projections. Our model results show that, while activity-silent memory traces can’t be maintained when areas are isolated from each other, inter-areal projections reinforce the synaptic efficacy levels and lead to a distributed representation via activity-silent mechanisms.
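
      For readers unfamiliar with short-term facilitation, the following minimal sketch shows how a synapse can carry a latent trace while the presynaptic rate is zero. It uses a commonly used rate form of facilitation and depression dynamics in the spirit of Mongillo et al. 2008; the parameter values (U, tau_f, tau_d), the stimulus, and all function names are illustrative assumptions, and the sketch does not reproduce our multi-area network or the inter-areal reinforcement described above.

```python
def simulate_facilitation(rate_fn, U=0.2, tau_f=1.5, tau_d=0.2, dt=0.001, n_steps=3000):
    """Euler-integrate a single facilitating/depressing synapse.

    du/dt = (U - u)/tau_f + U*(1 - u)*r   (facilitation variable)
    dx/dt = (1 - x)/tau_d - u*x*r         (available resources)
    All parameter values are assumed for illustration only.
    """
    u, x = U, 1.0
    us, xs = [], []
    for step in range(n_steps):
        r = rate_fn(step * dt)  # presynaptic rate in Hz
        du = (U - u) / tau_f + U * (1.0 - u) * r
        dx = (1.0 - x) / tau_d - u * x * r
        u += dt * du
        x += dt * dx
        us.append(u)
        xs.append(x)
    return us, xs

def stimulus(t):
    """Hypothetical input: 50 Hz between 0.5 s and 1.0 s, silent otherwise."""
    return 50.0 if 0.5 <= t < 1.0 else 0.0

us, xs = simulate_facilitation(stimulus)

u_before = us[499]        # just before stimulus onset
u_after_delay = us[1999]  # about 1 s after stimulus offset, rate is zero
print("u before stimulus:", u_before)
print("u after 1 s of silence:", u_after_delay)
print("efficacy u*x after 1 s of silence:", u_after_delay * xs[1999])
```

      In this single-synapse sketch the facilitation variable remains elevated for roughly tau_f after the input ends and then decays back to baseline, which is why an isolated trace of this kind is transient.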

      We hope that this result serves to prove the generality of our distributed WM framework, and opens the door to subsequent studies focusing not only on distributed activity-silent mechanisms, but also on distributed frameworks relying on other WM mechanisms.