  1. Nov 2025
    1. Author Response

      Reviewer #1 (Public Review):

      The claim that the hippocampal code is predictive because it generates responses that match the variable most needed for each task (time or distance) is not fully supported by the analyses provided in the manuscript. For example, in the elapsed-time task there are also place cells, and in the fixed-distance task there are also cells that encode other features. Rather than a predictive code, this could be a regular sampling of the environment with an over-representation of the most salient variable that animals need in order to collect rewards.

      We concur with the Reviewer’s reservation. Claims about predictive coding were removed, and the following possible explanation for the over-representation was suggested instead:

      "These results underscore the flexible coding capabilities of the hippocampus, which are shaped by over-representation of salient variables associated with reward conditions. " (page 1 line 23, page 4 line 27)

      In addition, the analyses provided in the manuscript are rather simple, and better controls could be provided. Improving the analytical quantification of the results is necessary to support the main claim.

      We improved the quantification as detailed below, in response to the reviewer’s specific comments.

      What is the relationship of each type of cell with the speed of the animal?

      The cells were assigned to the different types according to their responses during runs across all speeds. However, we checked, for each cell type, how the speed of the animal affects the peak firing rate of the cells. The results of this analysis are presented in Author response image 1. Bars represent the maximum firing rate (mean ± SEM) of all cells of a given type across runs within the specified speed range.

      Author response image 1.

      We did not find a significant interaction effect of speed and cell type on the maximum firing rate (two-way ANOVA, p > 0.98).
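      As a minimal sketch of how a per-speed summary like Author response image 1 could be produced (this is not the authors' analysis code), the Python below bins runs by speed and reports the mean ± SEM of the peak firing rate for each cell type. It assumes a hypothetical record layout of `(cell_type, run_speed, peak_rate)` tuples and caller-chosen speed bin edges; the helper names are illustrative.

```python
import math
from collections import defaultdict

def mean_sem(values):
    """Return (mean, standard error of the mean); SEM is NaN for a single value."""
    n = len(values)
    mean = sum(values) / n
    if n < 2:
        return mean, float("nan")
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    return mean, math.sqrt(var / n)

def peak_rate_by_speed(records, bin_edges):
    """Summarize peak firing rates per (cell type, speed bin).

    records: iterable of (cell_type, run_speed, peak_rate) tuples (hypothetical layout).
    bin_edges: ascending speed edges; a run falls in bin i when
               bin_edges[i] <= speed < bin_edges[i + 1]. Runs outside all bins are ignored.
    """
    groups = defaultdict(list)
    for cell_type, speed, peak in records:
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= speed < bin_edges[i + 1]:
                groups[(cell_type, i)].append(peak)
                break  # each run belongs to at most one bin
    return {key: mean_sem(vals) for key, vals in sorted(groups.items())}
```

      The interaction test itself could then be run on the same long-format data, for example by fitting an ordinary-least-squares model of the form `rate ~ C(cell_type) * C(speed_bin)` in statsmodels and passing it to `statsmodels.stats.anova.anova_lm`; which software was actually used is not stated here.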

      What is the relationship with the number of trials that the animal has run (first 10 trials, last 10 trials, ...)?

      Some of the animals were subjected to only one type of session, and some were trained without recording. Therefore, to answer this question, we restricted our analysis to recording sessions in which the animal switched from fixed-time to fixed-distance or vice versa. We compared the first 20 runs with the last 20 runs (data from 10 runs do not provide enough statistical power for this analysis). The results are shown in Author response table 1.

      Author response table 1.

      To assess the dynamics of the coding flexibility, we defined the Time-Distance index (TDI), which quantifies the balance between the proportions of distance cells and time cells at a given time: $\mathrm{TDI} = (N_{\mathrm{Distance}} - N_{\mathrm{Time}})/(N_{\mathrm{Distance}} + N_{\mathrm{Time}})$. The TDI lies in the range [0, 1] if the majority of cells are classified as distance cells, and in the range [-1, 0] if the majority of cells are classified as time cells. Chi-square tests for differences in proportions did not reveal significant differences (after correction for multiple comparisons).

      The shaded boxes in Author response table 1 indicate the sessions that followed a transition between session types.
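      To make the index and the proportion test concrete, here is a minimal sketch in Python. It assumes the comparison for one session is arranged as a 2x2 count table (rows: first 20 vs. last 20 runs; columns: distance vs. time cells) and uses Pearson's chi-square without continuity correction; whether the original analysis used that exact layout or a Yates correction, and which multiple-comparison correction was applied, is not specified here, so treat these as assumptions.

```python
import math

def time_distance_index(n_distance, n_time):
    """TDI = (N_distance - N_time) / (N_distance + N_time), in [-1, 1]."""
    total = n_distance + n_time
    if total == 0:
        raise ValueError("no classified cells")
    return (n_distance - n_time) / total

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]],
    e.g. rows = first/last 20 runs, columns = distance/time cell counts.
    Returns (chi2, p) with 1 degree of freedom."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```

      For example, with hypothetical counts of 12 distance and 8 time cells in the first 20 runs and 5 and 15 in the last 20, the TDI moves from 0.2 to -0.5 and the uncorrected p is about 0.025; a Bonferroni correction would multiply that p by the number of sessions tested.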

      What is the average firing rate of each neuron?

      This information has now been added to the titles of the panels in Figure 2 and Figure 2-figure supplement 1.

      Is there any relationship between intrinsic firing rate and the type of coding that the cell develops in each task?

      Author response image 2 compares the firing rates of Time cells and Distance cells.

      The distributions are similar (p = 0.975 and p = 0.675 for peak firing rate and mean firing rate, respectively; Kolmogorov-Smirnov (KS) test).

      Author response image 2.

      This figure has been added to the supplementary figures (Figure 3-figure supplement 3).
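      For readers who want to see what the two-sample KS comparison computes, the following is a minimal stdlib-only sketch: the statistic is the largest gap between the two empirical CDFs, and the p-value uses the asymptotic Kolmogorov distribution with the small-sample adjustment described in Numerical Recipes. This is an illustration, not the authors' code; for real analyses `scipy.stats.ks_2samp` is the usual choice and can compute exact p-values for small samples.

```python
import math

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    n1, n2 = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n1 and j < n2:
        x = min(a[i], b[j])
        # Step past every sample equal to x in both lists so ties are handled.
        while i < n1 and a[i] <= x:
            i += 1
        while j < n2 and b[j] <= x:
            j += 1
        d = max(d, abs(i / n1 - j / n2))
    return d

def ks_pvalue(d, n1, n2):
    """Asymptotic two-sided p-value (Kolmogorov series, Numerical Recipes adjustment)."""
    ne = n1 * n2 / (n1 + n2)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return 1.0  # the series equals 1 to within about 1e-12 in this range
    total = 0.0
    for k in range(1, 101):
        term = 2 * (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
        total += term
        if abs(term) < 1e-12:
            break
    return min(max(total, 0.0), 1.0)
```

      Applied to the per-cell peak (or mean) firing rates of Time cells versus Distance cells, a large p such as the reported 0.975 means the two empirical distributions are not detectably different.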

      What is the relation of the units of each type with LFP features (theta phase, ripple recruitment)?

      We had LFP recordings for 15 out of 18 sessions. A large proportion of the cells showed phase precession (see Author response table 2). An example is shown in Author response image 3. We could not find a significant relation between phase precession and the cell type or the trial type.

      The table on the left shows the total number of cells analyzed; the table on the right shows the percentage of cells that had a significant linear fit of the theta phase within 80% of the field width, when analyzed per time (top right) or per distance (bottom right). FDist/FTime denote fixed-distance and fixed-time trials, and Dist/Time denote the cell type.

      We did not identify ripple events during treadmill runs.

      Author response table 2

      Author response image 3
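      The phase-precession criterion above (a significant linear fit of theta phase within the central 80% of the field) can be sketched as follows. This is a hypothetical illustration, assuming spike positions already normalized to [0, 1] across the field (by elapsed time or by distance, matching the two analyses in Author response table 2) and assessing significance with a permutation test; the actual fitting and significance procedure are not described in this response. A plain linear fit also ignores phase wrap-around, which circular-linear regression methods handle explicitly.

```python
import random

def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def central_fraction(pos, phase, frac=0.8):
    """Keep spikes whose normalized in-field position lies in the central `frac` of the field."""
    lo, hi = (1 - frac) / 2, 1 - (1 - frac) / 2
    kept = [(p, ph) for p, ph in zip(pos, phase) if lo <= p <= hi]
    return [p for p, _ in kept], [ph for _, ph in kept]

def precession_test(pos, phase, n_shuffles=1000, seed=0):
    """Slope of theta phase vs. normalized position, with a two-sided permutation p-value."""
    x, y = central_fraction(pos, phase)
    slope, _ = linear_fit(x, y)
    rng = random.Random(seed)  # fixed seed so the p-value is reproducible
    y_shuf = list(y)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(y_shuf)  # break the position-phase pairing
        if abs(linear_fit(x, y_shuf)[0]) >= abs(slope):
            extreme += 1
    return slope, (extreme + 1) / (n_shuffles + 1)
```

      A cell would then be counted as precessing when the slope is negative and the p-value falls below the chosen threshold.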

      Reviewer #3 (Public Review):

      Weaknesses:

      The original study of Kraus et al. consisted of 3 rats for which all sessions, including both training and recording, were of one type. Another 3 rats had a hybrid mixture of distance and time sessions. This is mentioned very briefly in the main text.

      It would appear that the theory of reward might lead to different predictions that could be verified by comparing these animals session to session at a finer grain. For example, are there examples of cells switching or transforming their “predictive” representations when a large number of trials in one session type is followed by a large number of trials of the opposite type?

      For another example, the transition from training to recording could give similar opportunities. It seems at least possible that ignoring these issues could cause a loss of power.

      We could not compare a particular cell for switching between encodings, since the different types of trial were performed on different days. As an alternative, we compared the populations of cells within the first 20 vs. the last 20 trials in recording sessions in which the animal switched from fixed-time to fixed-distance or vice versa (see Author response table 1). The “Time-Distance balance index” (TDI) is defined as $\mathrm{TDI} = (N_{\mathrm{Distance}} - N_{\mathrm{Time}})/(N_{\mathrm{Distance}} + N_{\mathrm{Time}})$; it ranges between 0 and 1 if the majority of cells are classified as distance cells, and between -1 and 0 if the majority of cells are classified as time cells.

      In all three animals there appears to be a change between the first 20 runs and the last 20 runs of the same session following a switch between trial types. However, this change is significant and in the expected direction in only one of the animals (BK49, p = 0.02, chi-square test).

      The grayed boxes in Author response table 1 indicate the sessions that followed a transition between session types.

      Some circularities in the construction and interpretation of the time-cell and distance-cell classifiers are not clearly addressed. The classifiers currently appear to be fit to predict the type of session a cell’s response patterns are observed within. But it is tautological to use the session type to define the cell type. I sense this is ultimately reasonable because of how the classifier is built, but this concern is not addressed or explained.

      We regret that the term ‘classifiers’ was not sufficiently precise. We used this term to describe the metrics designed to express the relation between firing time and velocity in order to classify cells, not classifiers fit to predict the session type. We believe this to be the source of the apparent circularity. To avoid this confusion, we have now replaced the term “classifier” with the term “metric” throughout the manuscript.
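      The point about circularity can be illustrated with a hypothetical metric of this kind. The response does not spell out the actual formula, so the sketch below is only a stand-in: it compares the run-to-run variability of the elapsed time at a cell's firing peak with that of the distance travelled at the peak (speed times time). Its only inputs are peak firing times and running speeds; the session label never enters, so using it to label cells cannot be tautological with respect to session type. The sign convention mirrors the TDI (negative for time-like, positive for distance-like).

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

def time_distance_metric(peak_times, speeds):
    """Hypothetical metric in [-1, 1] computed per cell across runs.

    peak_times: elapsed time of the firing peak on each run.
    speeds: running speed on the same runs (assumed constant within a run).
    Negative -> peak locked to elapsed time; positive -> locked to distance.
    """
    distances = [t * v for t, v in zip(peak_times, speeds)]
    cv_time = coefficient_of_variation(peak_times)
    cv_dist = coefficient_of_variation(distances)
    return (cv_time - cv_dist) / (cv_time + cv_dist)
```

      A cell that always peaks at the same elapsed time regardless of speed scores -1, and one that always peaks at the same distance scores +1.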

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, May et al. use H2B overexpression driven by Keratin14 Cre-mediated excision of a loxP-stop cassette to quantify bulk chromatin dynamics in the live epidermis. They observe heterogeneity of H2B distribution within the basal stem cell layer and a change in distribution when the stem cells delaminate into the suprabasal layers. They further show that these chromatin rearrangements precede cell fate commitment, as detected by adding another Cre-mediated transgene (tetO-Cre-mediated Keratin10 reporter). Finally, they generate an MS2 stem-loop transgene for the keratin 10 transcript and observe transcriptional bursting.

      We would like to clarify for the reviewer that the H2B system used is a transgenic allele of histone-2B-GFP that is driven directly by the Keratin-14 promoter (Kanda et al., 1998; Tumbar et al., 2004). This system does not rely on any Cre-mediated excision of the LoxP-stop cassette, and these mice do not carry Cre alleles. We will touch on this point below when addressing the comment on Cre expression in cells and the raised question on whether it influences the quantifications of chromatin compaction.

      The manuscript uses elegant in vivo imaging approaches to describe a set of observations that are logically based on a panel of studies that have used genetic approaches to dissect the role of heterochromatin and histone/DNA modifications in epidermal state transitions. In addition, the MS2 stem-loop analysis is a nice technical advance, confirming transcriptional bursting as a general phenomenon of how transcription is regulated in cells (see work from Daniel Larson, Jonathan Chubb, Arjun Raj, and others).

      We thank the reviewer for their recognition of our contribution to the transcription field. To deepen the connection between our data and previous characterizations of transcriptional dynamics in other systems, we have added new analyses of K10MS2 transcriptional bursting on a finer temporal scale (Fig 5G-K). We find pervasive “transcriptional bursting,” consistent with findings in vitro and in other model organisms, and a surprising variation in burst durations. We believe these additional analyses significantly strengthen our conclusions and the relevance of our study to the overall transcription field.
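      One simple way to extract burst durations from an MS2 spot-intensity trace is to threshold it and measure each contiguous run of above-threshold frames. The sketch below assumes a single fixed threshold, a known frame interval, and a minimum burst length, all hypothetical parameters; the burst-calling procedure behind Fig 5G-K is not described here and may differ (for example, hysteresis thresholds or hidden Markov models are also common).

```python
def detect_bursts(intensity, frame_interval, threshold, min_frames=2):
    """Return (start_time, duration) for each run of frames with intensity >= threshold
    that lasts at least `min_frames` frames. Times are in the units of `frame_interval`."""
    bursts = []
    start = None
    # A trailing sentinel below any threshold closes a burst still open at the end.
    for i, value in enumerate(list(intensity) + [float("-inf")]):
        if value >= threshold and start is None:
            start = i  # burst onset
        elif value < threshold and start is not None:
            length = i - start
            if length >= min_frames:
                bursts.append((start * frame_interval, length * frame_interval))
            start = None
    return bursts
```

      With a trace of `[0, 5, 6, 0, 0, 7, 0, 8, 9, 9]`, a threshold of 4, and a 30 s frame interval, the single-frame excursion at frame 5 is discarded and two bursts of 60 s and 90 s are returned; the spread of such durations across cells is what a "variation in burst durations" refers to.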

      The value of the study in my view is recapitulating these known phenomena in a live tissue setting with high-quality imaging and careful quantification. Overall, the analyses appear thorough, although the overall changes appear relatively minor, which is perhaps to be expected from imaging bulk H2B distribution as a proxy for chromatin states.

      There is one major technical concern that might impact the interpretation of the data. The authors combine Cre lines for their key conclusions (Krt10 reporter and SRF KO) and analyze single cells that thus express very high levels of Cre. Knowing that Cre will target non-loxP sites and is genotoxic, it is possible that the effect of chromatin is due to high levels of Cre expression in single cells rather than specific effects due to cell state transitions. I would encourage the authors to carefully quantify the dose-dependent effects of the Cre protein (independent of the LoxP sites) on chromatin organization. Along these lines, is the phenotype of the SRF KO similar in the presence of two Cre alleles versus just one?

      Thank you for these kind words. This is an important potential caveat to consider. We believe that Cre activity does not significantly affect the chromatin compaction profiles for several reasons. First, we interrogated Cre activity. The quantifications in Figure 1A-E and Figure 2B-C are from mice containing the K14H2B-GFP allele alone, which do not carry any Cre allele. When these data were compared to those from mice that had been treated with a high dose of tamoxifen to induce Cre-mediated recombination in the vast majority of cells, the chromatin compaction profiles were not significantly different (Supp Fig 3C). We have added this comparison to Supplemental Figure 3 and addressed this point in the text (page 9). To further determine whether Cre-mediated recombination affects our measurement of chromatin compaction, we also analyzed adjacent basal cells with and without Cre activity in the same animal. K14H2B-GFP; K14CreER; tdTomato mice were induced with a low dose of tamoxifen such that roughly 65% of epidermal cells underwent Cre recombination, as demonstrated by expression of the tdTomato fluorescent reporter (Gallini et al., 2022). The mice also received a punch biopsy on the unimaged ear. Three days post injury and six days after Cre induction, the chromatin compaction profiles of cells positive and negative for Cre-mediated recombination were also not significantly different (Rebuttal Figure 1). Together, these direct comparisons between cells exposed to Cre activity and cells not exposed to Cre activity indicate that Cre activity at levels comparable to those used in our experiments has no measurable effect on our measurements of chromatin compaction.

      Rebuttal Figure 1: Effect of Cre expression on chromatin compaction profiles

      The second issue is the conclusion of "chromatin spinning". Concluding that chromatin is spinning would in my view require that the authors demonstrate that the nuclear envelope is not moving or is moving less than the chromatin. To support this conclusion the authors should do double imaging for example with LINC complex proteins, an ER/outer nuclear membrane marker, or equivalent.

      This is an excellent point. While we expect that the entire nucleus is spinning, based on observations others have made in in vitro fibroblast systems (Kumar et al., 2014; Zhu et al., 2018), we describe our observation as “chromatin spinning” instead of “nuclear spinning” because the K14H2B-GFP allele only allows us to directly visualize chromatin itself.

      Unfortunately, LINC complex proteins and nuclear membrane proteins have not been fluorescently tagged in mice, which prevents us from visualizing their dynamics in vivo. Establishing these new tools and performing the experiments would take more than a year and is therefore beyond the scope of the current paper. Additionally, the relatively uniform distribution of these proteins across the nuclear membrane would not allow us to visualize their potential spinning. We have made efforts toward the reviewer’s question by asking whether other compartments within the cell also spin in delaminating cells. To do this, we leveraged a mouse line developed by Claudio Franco’s lab (Barbacena et al., 2019), which fluorescently labels both the chromatin (H2B-GFP) and the Golgi (GTS-mCherry). As expected, this model showed a perinuclear and polarized Golgi in skin fibroblasts (Rebuttal Figure 2). However, this tool is incompatible with our questions in epidermal cells for two reasons. First, the system is toxic to epithelial cells in vivo, resulting in apoptosis, nuclear fragmentation, and binucleate cells. Second, the Golgi is not discretely polarized (or even perinuclear) in epithelial cells (Rebuttal Figure 2). As such, although we observe chromatin spinning in delaminating basal cells, we are uncertain whether the whole nucleus or any other cellular compartments are spinning in these cells.

      Rebuttal Figure 2: Interrogation of intracellular spinning

      Given the above reasoning and efforts, we have altered the text and specified that we only have the capacity to visualize chromatin through the H2B-GFP allele and that we hypothesize the entire nucleus is spinning (page 11).

      Reviewer #2 (Public Review):

      In this work entitled "Live imaging reveals chromatin compaction transitions and dynamic transcriptional bursting during stem cell differentiation in vivo," the authors use a combination of genetic and imaging tools to characterize dynamic changes in chromatin compaction of cells undergoing epidermal stem cell differentiation and to relate chromatin compaction to transcriptional regulation in vivo. They track this phenomenon by imaging the epithelium of the ear of live mice, thus in a physiological context. By following individual nuclei expressing H2B-GFP over periods ranging from hours up to 3 days, they develop a strategy to quantify the profile of chromatin compaction across different epidermal layers based on normalized intensity profiles of H2B-GFP. They observe that cells belonging to the basal stem cell layer display a considerable level of internuclear variability in chromatin compaction that is cell-cycle independent. Instead, intercellular variability in chromatin compaction appears more related to the differentiation status of the cells, as it is stable over hours but dynamic over days. The authors show that differentiated nuclei in the spinous layer exhibit higher chromatin compaction. They also identified a subset of cells in the basal stem cell layer with an intermediate profile of chromatin compaction and with dynamic expression of the early differentiation marker keratin 10. Lastly, they show that the expression of keratin 10 precedes chromatin compaction, establishing relevant temporal relationships in the process of epidermal differentiation.

      This work includes a number of challenging approaches and techniques, since it is carried out in living mice. It also provides nice tools and methods to study chromatin structure in vivo over multiple days and within a physiological differentiation system. On the other hand, the results are descriptive and, in some respects, expected in line with previous observations.

      Thank you very much for this great summary, kind words, and the recommendations listed below. We will address each of them specifically. We have also deepened the analysis of transcriptional dynamics in ways that are more comparable with how other groups have studied transcription and included those results in Figure 5.
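      The reviewer's summary mentions quantifying chromatin compaction from normalized H2B-GFP intensity profiles. The exact procedure is not given in this response, so the sketch below is only one plausible reading: each nucleus's pixel intensities are divided by that nucleus's mean intensity and histogrammed into fixed bins, yielding a profile that is comparable across nuclei with different overall H2B-GFP levels. The bin edges and the function name are hypothetical.

```python
def compaction_profile(pixel_values, bin_edges):
    """Fraction of a nucleus's pixels in each mean-normalized intensity bin.

    Dividing by the nuclear mean removes differences in total H2B-GFP level,
    so a profile with more weight in the low and high bins indicates more
    heterogeneous (more compacted) chromatin. Bin i covers
    [bin_edges[i], bin_edges[i + 1]); pixels outside all bins are not counted,
    so the fractions can sum to less than 1.
    """
    mean = sum(pixel_values) / len(pixel_values)
    normalized = [v / mean for v in pixel_values]
    counts = [0] * (len(bin_edges) - 1)
    for v in normalized:
        for i in range(len(counts)):
            if bin_edges[i] <= v < bin_edges[i + 1]:
                counts[i] += 1
                break
    n = len(normalized)
    return [c / n for c in counts]
```

      Profiles computed this way for individual nuclei could then be compared between layers or between Cre-positive and Cre-negative cells.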

      References

      Kanda, T., Sullivan, K.F., and Wahl, G.M. (1998). Histone–GFP fusion protein enables sensitive analysis of chromosome dynamics in living mammalian cells. Current Biology 8, 377–385. 10.1016/S0960-9822(98)70156-3.

      Tumbar, T., Guasch, G., Greco, V., Blanpain, C., Lowry, W.E., Rendl, M., and Fuchs, E. (2004). Defining the epithelial stem cell niche in skin. Science 303, 359–363. 10.1126/science.1092436.

      Kumar, A., Maitra, A., Sumit, M., Ramaswamy, S., and Shivashankar, G.V. (2014). Actomyosin contractility rotates the cell nucleus. Sci Rep 4, 3781. 10.1038/srep03781.

      Zhu, R., Liu, C., and Gundersen, G.G. (2018). Nuclear positioning in migrating fibroblasts. Seminars in Cell & Developmental Biology 82, 41–50. 10.1016/j.semcdb.2017.11.006.

      Gallini, S., Rahman, N.-T., Annusver, K., Gonzalez, D.G., Yun, S., Matte-Martone, C., Xin, T., Lathrop, E., Suozzi, K.C., Kasper, M., and Greco, V. (2022). Injury suppresses Ras cell competitive advantage through enhanced wild-type cell proliferation. bioRxiv 2022.01.05.475078. 10.1101/2022.01.05.475078.

      Barbacena, P., Ouarné, M., Haigh, J.J., Vasconcelos, F.F., Pezzarossa, A., and Franco, C.A. (2019). GNrep mouse: A reporter mouse for front-rear cell polarity. Genesis. 10.1002/dvg.23299.

      Pineda, C.M., Park, S., Mesa, K.R., Wolfel, M., Gonzalez, D.G., Haberman, A.M., Rompolas, P., and Greco, V. (2015). Intravital imaging of hair follicle regeneration in the mouse. Nature Protocols. 10.1038/nprot.2015.070.

    1. Author Response:

      Reviewer #1 (Public Review):

      This manuscript describes a series of behavioral experiments in which foraging rats are subjected to a novel fear conditioning paradigm. Different groups of animals receive a shock to the dorsal surface of the body paired with either tone, an artificial owl driven forward with pneumatic pressure, or a tone/owl combination. An additional control condition pairs tone with owl alone (i.e., no shock is delivered). In a subsequent test, only owl+shock and tone/owl+shock animals show increased latency to forage and a withdrawal response to tone (even though owl-shock rats do not experience tone during conditioning). The authors conclude that this tone response is due to sensitization and that fear conditioning does not occur in their experimental setup.

      This approach is intriguing, and the issues raised by the manuscript are extremely important for the field to consider. However, there are many ways to interpret the results as they stand. One issue of primary importance is whether it can indeed be claimed that conditioning did not readily occur in the tone+shock group. The lack of a particular behavioral conditioned reaction does not equate to an absence of conditioning. It is possible that unseen (e.g., physiological) measures of conditioning, many of which were once standard DVs in the fear conditioning literature, are present in the tone+shock group. This possibility pushes against the claim made in the title and elsewhere. These claims should be softened.

      We agree with the reviewer and now acknowledge the following caveat in the discussion (pg. 10): “…although neither the tone-shock group nor the tone-owl group showed overt manifestations of fear conditioning (as measured by fleeing or freezing) to the tone that prevented a successful procurement of food, the possibility of physiological (e.g., cardiovascular, respiratory) changes associated with tone-induced fear (Steimer, 2002) cannot be excluded in these animals…”

      Because systemic, group-level retreat CRs are not noted in the tone+shock condition, it would indeed be important to establish if there are any experimental circumstances in which tone paired with a US applied to the dorsal surface of the body can produce consistent reactions (e.g., freezing) to tone alone. Though it may seem likely that tone + dorsal shock would indeed produce freezing in a different setting, this result should not be taken for granted - we've known since the 'noisy water' experiment (Garcia & Koelling, 1966) that not every CS pairs with every US and that association can indeed be selective. A positive control would be clarifying. If the authors could demonstrate that tone+dorsal shock produces freezing to tone in a commonly used fear conditioning setup (i.e., a standard cubicle chamber), then the lack of a retreat CR in their naturalistic paradigm would gain added meaning.

      This is an excellent suggestion. As recommended, we performed a positive control experiment in which naïve rats that underwent the same subcutaneous wire implant surgery were placed in a standard experimental chamber and presented with a delayed tone-shock pairing (same tone frequency/intensity and shock intensity/duration; the 24.1 s CS duration was based on the mean CS duration of tone-shock animals in the naturalistic fear conditioning experiment). As can be seen in Author response image 1 (Figure 4 in the revised manuscript), these animals exhibited reliable postshock freezing in a conditioning chamber (fear conditioning day 1) and tone CS-evoked freezing in a novel chamber (tone testing day 2), indicating that our original finding (i.e., no evidence of auditory and contextual fear conditioning in an ecologically relevant environment) is unlikely to be due to the dorsal neck/body shock US per se.

      Author response image 1. Auditory fear conditioning in a standard experimental chamber. (A) Illustrations of a rat implanted with wires subcutaneously in the dorsal neck/body region undergoing successive days of habituation (10 min tethered, conditioning chamber), training (a single tone CS-shock US pairing), and tone testing (context shift). (B) Mean (crimson line) and individual (gray lines) percent freezing data from 8 rats (4 females, 4 males) during training in context A: 3 min baseline (BL1, BL2, BL3); 23.1 s epoch of tone (T) excluding 1 s overlap with shock (S); 1 min postshock (PS). (C) Mean and individual percent freezing data during tone testing in context B: 1 min baseline (BL1); 3 min tone (T1, T2, T3); 1 min post-tone (PT). (D) Mean + SEM (bar) and individual (dots) percent freezing to tone CS before (Train, T) and after (Test, T1) undergoing auditory fear conditioning (paired t-test; t(7) = -3.163, p = 0.016). * p < 0.05
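      The freezing measure and the paired comparison in panel D can be sketched as below. This is a minimal illustration, assuming freezing is scored as a per-frame (or per-bin) boolean, which the caption does not specify, and computing the paired t statistic as Train minus Test so that a negative t corresponds to more freezing at test, matching the sign of the reported t(7) = -3.163. The p-value would then come from a t distribution with n - 1 degrees of freedom, for instance via `scipy.stats.ttest_rel`; it is not reimplemented here.

```python
import math
import statistics

def percent_freezing(freeze_flags):
    """Percentage of scored frames (or time bins) classified as freezing."""
    return 100.0 * sum(1 for f in freeze_flags if f) / len(freeze_flags)

def paired_t(before, after):
    """Paired t statistic and degrees of freedom for the differences before - after."""
    diffs = [x - y for x, y in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return statistics.mean(diffs) / se, n - 1
```

      With 8 rats, `paired_t` returns 7 degrees of freedom, as in the caption.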

      The altered withdrawal trajectory seen in owl+shock and tone/owl+shock groups occurs in neither the tone+shock nor the tone+owl group, introducing the possibility that it results from the specific pairing of owl and shock. Put differently - this response may indeed be an associative CR. Do altered withdrawal angles persist if animals that receive owl+shock are exposed to owl again the next day? Do manipulations of the owl and shock that diminish fear conditioning (e.g., unpairing of owl and shock stimuli) eliminate deflected withdrawal angles when the subject is exposed to owl alone? If so, it would cut against the interpretation that fear conditioning does not occur in the setup described here, and would instead demonstrate that it is indeed central to predatory defense. This interpretation is compatible with the effect of hippocampal lesion on freezing evoked by a live predator. Destruction of the rat hippocampus diminishes cat-evoked freezing - this is thought to occur because the rapid association of the cat's various features with threatening action is not formed by the rat (Fanselow, 2000, 2018). Even though this interpretation of the results differs from the authors', it in no way diminishes the interest of this work. This paradigm may indeed be a novel means by which to study rapidly acquired associations with ethological relevance. Follow-up experiments of the type described above are necessary to disambiguate opposing views of the current dataset.

      Whether “altered withdrawal angles persist if animals that receive owl+shock [a US-US pairing] are exposed to owl again the next day” is an interesting question, as it is conceivable that the owl US (Zambetti et al., 2019, iScience) can function as a CS to evoke anticipatory responses characteristic of conditioned fear. This possibility is now mentioned as a caveat (pg. 10): “…the erratic escape trajectory behavior exhibited by owl-shock and tone/owl-shock animals may be indicative of rapid associative processes at work (Fanselow 2018). For example, the immediate-shock (and delayed shock-context shift) deficit in freezing (e.g., Fanselow 1986; Landeira-Fernandez et al., 2006) provides compelling evidence that postshock freezing is not a UR but rather a CR to the contextual representation CS that rapidly became associated with the footshock US. In a similar vein then the erratic escape CR topography in owl-shock and tone/owl-shock animals might represent a shift in ‘functional CR topography’ (Fanselow & Wassum 2016) resulting from the rapid association between some salient features of the owl and the dorsal neck/body shock. A rapid owl-shock association nevertheless cannot explain the owl-shock animals’ subsequent fleeing behavior to a novel tone (in the absence of owl), which likely reflects nonassociative fear.”

      Reviewer #2 (Public Review):

      This work addresses an interesting question: whether a simple, one-trial CS+US (Pavlovian) association occurs in a naturalistic environment. Pavlovian fear conditioning consists of repeated presentations of a neutral sensory signal (tone, CS) paired with a mild US, usually foot-shock (<1 mA; thus unpleasant rather than painful), and the CS+US association drives associative learning. In this paper, a single 2.5 mA electrical shock was paired with a novel 80 dB tone, and the occurrence of learning was monitored by measuring the success rate and latency of foraging for food. Some animals experienced an owl looming, matched with the US, just before reaching the food. The authors placed hunger-motivated rats into a custom-built arena equipped with a safe nest, a gate, and a food zone, as well as with delivery of a self-controlled US (electrical shock in the neck muscle and/or owl looming). The US was activated by the rats approaching the food. Thus, a conflicting situation was provoked in which procuring the food is paired with an aversive conditioned signal. Four groups of rats were included in the experiments based on their conditioning types: tone+shock, tone+shock+owl, shock+owl and tone+owl. As a result of these conditioning procedures, none of the rats procured the food; instead, they fled to the nest. In contrast, in the retrieval phases (the next two days), the tone-shock and tone-owl groups successfully procured the pellets during the conditioned tone presentation, but the tone-shock-owl group did not. Rats in the latter group fled to the nest upon tone presentation at the food zone. As the shock-owl animals (conditioned without tone) also fled to the nest when presented with the (unfamiliar) tone, their flight responses and those of the tone+shock+owl group were attributed to a non-associative, sensitization-like process. Furthermore, during the pre-tone trials, all groups showed behavior similar to that in the tone test. These findings led the authors to conclude that classical Pavlovian fear conditioning may not occur in an ecologically relevant environment.

      The question raised is relevant to a broad audience of neuroscientists and behavioral scientists. However, as the fear conditioning paradigm used is not a common one, it is difficult to interpret the findings. It is based on a single pairing of an unfamiliar, salient tone with a very strong (traumatizing?) electrical shock delivered directly into the neck muscle, and an innate signal (owl looming). In addition, as the tone presentation was followed by many events (gate opening, presence of food, shock and/or owl looming) in front of the animals, it is hard to imagine what sort of tone association could be formed at all.

      We thank the reviewer for mentioning several important considerations. With regard to the shock amplitude used here, fear conditioning studies in rats have employed a wide range of numbers, durations and intensities of footshock; e.g., three footshocks: 1.0 mA/0.75-s and 4.0 mA/3-s (Fanselow 1984), 75 footshocks: 1 mA/2-s (Maren 1999; Zimmerman et al. 2007). Note also that 16-20 periorbital shocks (2.0 mA, 8 pulse train at 5 Hz) have been used in auditory fear conditioning in rats (Moita et al. 2003; Blair et al. 2005). Thus, it is unlikely that the single 2.5 mA dorsal neck/body shock (subcutaneous and not in the neck muscle) used in the present study is particularly traumatizing compared to the higher intensity/longer duration (e.g., 4.0 mA/3-s) and far more numerous (e.g., 75) footshocks employed in fear conditioning studies.

      The relationship between footshock intensity and fear conditioning also warrants further discussion. Sigmundi, Bouton, and Bolles (1980) examined conditioned freezing in rats to 15 footshocks of 0.5, 1.0 and 2.0 mA intensities (0.5-s duration) and found that “[tone] CS-evoked freezing increased with US intensity.” In contrast, Fanselow (1984) observed relatively higher contextual freezing in rats subjected to three bouts of 1.0 mA/0.75-s than 4 mA/3-s footshocks. Regardless, the animals that received three 4 mA/3-s footshocks still exhibited robust freezing. Based on the positive control experimental results (see above), it is unlikely that the present study’s failure to observe conditioned fear is due to the use of a 2.5 mA shock intensity.

      As the animals in the present study underwent 5 baseline days of foraging (3 trials per day), they would have been habituated to the computer-controlled automated gate opening-closing and the presence of food by the time of the tone-shock, tone-owl, owl-shock and tone-owl/shock events, making it unlikely that the tone would become associated with the gate/food stimuli. In the employed delay conditioning configuration, the tone CS has greater temporal contiguity with the US (shock and/or owl), and the US is both novel and surprising relative to the other stimuli in the arena environment. Thus, it is more plausible that the tone CS would be associated with the intended US. In summary, we believe that if fear conditioning required relatively sterile environmental settings in order to occur, then fear conditioning would be implausible in the natural world, which is filled with dynamic, complex stimuli.

      One could also argue that if a hungry animal does not try to collect food after an unpleasant, even painful, experience, it will normally die soon (thus, not collecting food is not a 'natural' behavior). The tone+shock and tone+owl groups showed similar behavioral features throughout the entire experiments, which may be consistent with such natural events: although these rats had had a negative experience, they still approached the food zone due to their hunger. Because the rats were still motivated to approach the food, the authors concluded that no association was formed. Based on this single measure, is it right to do so?

      In nature, prey animals adjust their foraging behavior to minimize danger (e.g., Stephens and Krebs 1986 Foraging Theory; Lima and Dill 1990 Can J Zool); thus, it is improbable that an aversive experience will lead to the end of food-seeking behavior and thereby to death. Indeed, Choi and Kim (2010 Proc Natl Acad Sci) employed a seminaturalistic environment similar to that of the present study and found that rats adjust their foraging behavior as a function of the predatory threat distance, consistent with the “predatory imminence” model (Fanselow and Lester 1988). Since only behavioral measures of fear were assessed (i.e., fleeing, latency to enter the foraging zone, pellet procurement), we now acknowledge a caveat in the discussion (see response to Reviewer 1’s comment 1). Note, however, that unlike the tone-shock paired animals, which failed to flee to the tone CS and successfully procured the food pellet, the owl-shock animals exhibited robust fear behavior (promptly fleeing and ceasing foraging) to a novel tone.

      Reviewer #3 (Public Review):

      In this study, the authors aimed to test whether rats could be fear conditioned by pairing a tone with a subdermal electric shock, an owl-like approaching stimulus, or a combination of these in a naturalistic-like environment. The authors designed a task in which rats foraging for food were exposed, in a single trial, to a tone paired with a shock, a tone paired with an owl-like stimulus, a tone paired with a combination of the owl and the shock, or the owl paired with a shock. The authors indexed behaviors related to food approach after conditioning. The authors found that animals exposed to the owl-shock or the tone/owl-shock pairing displayed a higher latency to approach the food reward compared to animals that were presented with the tone-shock or the tone-owl pairing. These results suggest that pairing the owl with the shock was sufficient to induce inhibitory avoidance, whereas a single pairing of the tone-shock or the tone-owl was not. The authors concluded that standard fear conditioning does not readily occur in a naturalistic-like environment and that the inhibitory avoidance induced by the owl-shock pairing could be the result of increased sensitization rather than a fear association.

      Strengths:

      The manuscript is well-written, the behavioral assay is innovative, and the results are interesting. The inclusion of both males and females, and the behavioral sex comparison was commendable. The findings are timely and would be highly relevant to the field.

      Weaknesses:

      However, in its current state, this study does not provide convincing evidence to support the main claim that Pavlovian fear conditioning does not readily occur in naturalistic environments. The innovative task presented in this study is more akin to an inhibitory avoidance task than to fear conditioning and should be reframed accordingly.

      The reviewer’s comment is theoretically important in translating laboratory studies of fear to real-world situations. Because our animals were engaged in a purposive/goal-oriented foraging behavior (that is, leaving the nest in search of food in an open space brought about the tone-shock, tone-owl, owl-shock and tone/owl-shock outcomes), one can make the case that this is in principle an inhibitory avoidance (instrumental fear conditioning) task rather than a Pavlovian fear conditioning task. A pertinent question, then, is whether procedurally ‘pure’ laboratory Pavlovian conditioning tasks (i.e., displacing animals from their home cage to an experimental chamber and presenting a CS and US) are possible in real-world settings, where the behaviors of animals and humans are largely purposive/goal-oriented (Tolman 1948 Psychol Rev). It is generally accepted that “Outside the laboratory, stimulus [Pavlovian] learning and response [Instrumental] learning are almost inseparable (Bouton 2007 Learning and Behavior, pg. 28).” The goal of our study was to investigate whether widely employed auditory fear conditioning readily produces associative fear memory that guides future behavior in animals performing naturalistic foraging behavior, and insofar as it presents a salient tone CS followed by an aversive shock US, the present study has a Pavlovian fear component.

      We thank the reviewer for raising this concern and have addressed the Pavlovian vs. Instrumental fear conditioning aspects of our study in the revised manuscript (pg. 10): “…there are obvious procedural differences between standard fear conditioning versus naturalistic fear conditioning. In the former paradigm, typically ad libitum fed animals are placed in an experimental chamber for a fixed time before receiving a CS-US pairing (irrespective of their ongoing behavior). Thus, the CS duration and ISI are constant across subjects. In our study, hunger-motivated rats searching for food must navigate to a fixed location in a large arena before experiencing a CS-US pairing (instrumental- or response-contingent). Because animals approach the US trigger zone at different latencies, the CS duration and ISI are variable across subjects.”

      References

      Bernstein, I. L., Vitiello, M. V., & Sigmundi, R. A. (1980). Effects of interference stimuli on the acquisition of learned aversions to foods in the rat. J Comp Physiol Psychol, 94(5), 921-931. doi:10.1037/h0077807

      Blair, H. T., Huynh, V. K., Vaz, V. T., Van, J., Patel, R. R., Hiteshi, A. K., . . . Tarpley, J. W. (2005). Unilateral storage of fear memories by the amygdala. J Neurosci, 25(16), 4198-4205. doi:10.1523/JNEUROSCI.0674-05.2005

      Bouton, M. E. (2007). Learning and Behavior. Sinauer Associates.

      Choi, J. S., & Kim, J. J. (2010). Amygdala regulates risk of predation in rats foraging in a dynamic fear environment. Proc Natl Acad Sci U S A, 107(50), 21773-21777. doi:10.1073/pnas.1010079108

      Fanselow, M. S. (1984). Shock-induced analgesia on the formalin test: effects of shock severity, naloxone, hypophysectomy, and associative variables. Behav Neurosci, 98(1), 79-95. doi:10.1037//0735-7044.98.1.79

      Fanselow, M. S. (1986). Associative vs topographical accounts of the immediate shock freezing deficit in rats: Implications for the response selection rules governing species-specific defensive reactions. Learning and Motivation, 17(1), 16-39. doi:10.1016/0023-9690(86)90018-4

      Fanselow, M. S. (2018). The Role of Learning in Threat Imminence and Defensive Behaviors. Curr Opin Behav Sci, 24, 44-49. doi:10.1016/j.cobeha.2018.03.003

      Fanselow, M. S., & Lester, L. S. (1988). A functional behavioristic approach to aversively motivated behavior: Predatory imminence as a determinant of the topography of defensive behavior. Lawrence Erlbaum Associates Inc.

      Fanselow, M. S., & Wassum, K. M. (2016). The Origins and Organization of Vertebrate Pavlovian Conditioning. Cold Spring Harbor Perspectives in Biology, 8(1), a021717. doi:10.1101/cshperspect.a021717

      Landeira-Fernandez, J., DeCola, J. P., Kim, J. J., & Fanselow, M. S. (2006). Immediate shock deficit in fear conditioning: effects of shock manipulations. Behav Neurosci, 120(4), 873-879. doi:10.1037/0735-7044.120.4.873

      Lima, S. L., & Dill, L. M. (1990). Behavioral Decisions Made under the Risk of Predation: A Review and Prospectus. Canadian Journal of Zoology, 68(4), 619-640. doi:10.1139/z90-092

      Maren, S. (1999). Neurotoxic basolateral amygdala lesions impair learning and memory but not the performance of conditional fear in rats. J Neurosci, 19(19), 8696-8703.

      Moita, M. A., Rosis, S., Zhou, Y., LeDoux, J. E., & Blair, H. T. (2003). Hippocampal place cells acquire location-specific responses to the conditioned stimulus during auditory fear conditioning. Neuron, 37(3), 485-497. doi:10.1016/s0896-6273(03)00033-3

      Sigmundi, R. A., Bouton, M. E., & Bolles, R. C. (1980). Conditioned Freezing in the Rat as a Function of Shock-Intensity and Cs Modality. Bulletin of the Psychonomic Society, 15(4), 254-256.

      Steimer, T. (2002). The biology of fear- and anxiety-related behaviors. Dialogues Clin Neurosci, 4(3), 231-249.

      Stephens, D. W., & Krebs, J. R. (1986). Foraging Theory. Princeton University Press.

      Tolman, E. C. (1948). Cognitive maps in rats and men. Psychol Rev, 55(4), 189-208. doi:10.1037/h0061626

      Zambetti, P. R., Schuessler, B. P., & Kim, J. J. (2019). Sex Differences in Foraging Rats to Naturalistic Aerial Predator Stimuli. iScience, 16, 442-452. doi:10.1016/j.isci.2019.06.011

      Zimmerman, J. M., Rabinak, C. A., McLachlan, I. G., & Maren, S. (2007). The central nucleus of the amygdala is essential for acquiring and expressing conditional fear after overtraining. Learn Mem, 14(9), 634-644. doi:10.1101/lm.607207

    1. Author Response:

      Reviewer #1 (Public Review):

      Overview

      This is a well-conducted study and speaks to an interesting finding on an important topic: whether ethological validity causes co-variation in gamma above and beyond the ethological differences already present in systemic stimulus sensitivity.

      I like the fact that while this finding (seeing red = ethologically valid = more gamma) seems to favor views the PI has argued for, the paper comes to a much simpler and more mechanistic conclusion. In short, it's good science.

      I think they missed a key logical point of analysis, in failing to dive into ERF <----> gamma relationships. In contrast to the modeled assumption that they have succeeded in color matching to create matched LGN output, the ERF and its distinct features are metrics of afferent drive in their own data. And, their data seem to suggest these two variables are not tightly correlated, so at very least it is a topic that needs treatment and clarity as discussed below.

      Further ERF analyses are detailed below.

      Minor concerns

      In general, very well motivated and described; a few terms need more precision ("speedily" and "staircased" are too imprecise given their precise psychophysical goals).

      We have revised the results to clarify:

      "For colored disks, the change was a small decrement in color contrast, for gratings a small decrement in luminance contrast. In both cases, the decrement was continuously QUEST-staircased (Watson and Pelli, 1983) per participant and color/grating to 85% correct detection performance. Subjects then reported the side of the contrast decrement relative to the fixation spot as fast as possible (max. 1 s), using a button press."

      The resulting reaction times are reported slightly later in the results section.
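      For readers unfamiliar with QUEST, the sketch below shows the idea of a grid-based Bayesian staircase in its spirit: a posterior over log threshold is updated after each trial, and the next trial is placed where the current estimate predicts 85% correct. The Weibull parameters (slope 3.5, lapse 0.01, guess rate 0.5 for the left/right report), the Gaussian prior, and the use of the posterior mean are assumptions for illustration, not the study's settings, and `GridQuest` is a hypothetical helper rather than any toolbox's API.

```python
import numpy as np

# Assumed psychometric parameters (illustrative, not the study's values):
# Weibull slope BETA, lapse rate DELTA, guess rate GAMMA for a two-alternative
# (left/right) report, and the 85%-correct target mentioned in the text.
BETA, DELTA, GAMMA, TARGET = 3.5, 0.01, 0.5, 0.85


def p_correct(log_intensity, log_threshold):
    """QUEST-style Weibull: probability correct on a log10 intensity axis."""
    w = 1.0 - np.exp(-10.0 ** (BETA * (log_intensity - log_threshold)))
    return DELTA * GAMMA + (1.0 - DELTA) * (GAMMA + (1.0 - GAMMA) * w)


def target_offset():
    """Log-intensity offset above threshold at which p_correct equals TARGET."""
    w = ((TARGET - DELTA * GAMMA) / (1.0 - DELTA) - GAMMA) / (1.0 - GAMMA)
    return np.log10(-np.log(1.0 - w)) / BETA


class GridQuest:
    """Hypothetical minimal grid-based Bayesian staircase in the spirit of QUEST."""

    def __init__(self, prior_mean, prior_sd, half_range=2.0, n_points=401):
        self.grid = np.linspace(prior_mean - half_range, prior_mean + half_range, n_points)
        # Gaussian prior on log threshold, kept in log space for numerical stability.
        self.log_post = -0.5 * ((self.grid - prior_mean) / prior_sd) ** 2

    def posterior(self):
        p = np.exp(self.log_post - self.log_post.max())
        return p / p.sum()

    def mean(self):
        return float(np.sum(self.grid * self.posterior()))

    def next_intensity(self):
        # Place the next trial where the current estimate predicts TARGET correct.
        return self.mean() + target_offset()

    def update(self, log_intensity, correct):
        # Multiply the posterior by the likelihood of the observed response.
        p = p_correct(log_intensity, self.grid)
        self.log_post = self.log_post + np.log(p if correct else 1.0 - p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_threshold = -1.0  # hypothetical simulated observer, log10 contrast units
    q = GridQuest(prior_mean=-0.5, prior_sd=0.5)
    for _ in range(80):
        x = q.next_intensity()
        q.update(x, bool(rng.random() < p_correct(x, true_threshold)))
    print(f"estimated log10 threshold: {q.mean():.2f}")
```

      Because the likelihood of a correct response falls as the hypothesized threshold rises, a correct response shifts the posterior toward lower thresholds (and an error toward higher ones), which is what drives the staircase.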

      I got confused some about the across-group gamma analysis:

      "The induced change spectra were fit per participant and stimulus with the sum of a linear slope and up to two Gaussians." What is the linear slope?

      The slope is used as the null model – we only regarded gamma peaks as significant if they explained spectrum variance beyond any linear offsets in the change spectra. We have clarified in the Results:

      "To test for the existence of gamma peaks, we fit the per-participant, per-stimulus change spectra with three models: a) the sum of two Gaussians and a linear slope, b) the sum of one Gaussian and a linear slope, and c) only a linear slope (without any peaks), and chose the best-fitting model using adjusted R² values."
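      The model choice described above can be sketched as follows, assuming the three fits have already been obtained (the nonlinear fitting itself is not shown). The parameter counts are an assumption: the linear slope contributes one predictor beyond its offset, and each Gaussian adds three (amplitude, centre, width), giving 1, 4 and 7 predictors. `adjusted_r2` and `select_model` are hypothetical helper names.

```python
import numpy as np


def adjusted_r2(observed, predicted, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) (n - 1) / (n - p - 1).

    n_predictors (p) counts fitted parameters beyond the intercept/offset; how
    the authors counted parameters is an assumption here.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    n = observed.size
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


def select_model(observed, fits):
    """fits maps a model name to (predicted_values, n_predictors).

    Returns the name with the highest adjusted R^2, plus all scores.
    """
    scores = {name: adjusted_r2(observed, pred, p) for name, (pred, p) in fits.items()}
    return max(scores, key=scores.get), scores
```

      The adjustment penalizes the extra Gaussian parameters, so a peak is only retained when it explains variance beyond what the linear slope (the null model) already captures.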

      To me, a few other analyses approaches would have been intuitive. First, before averaging peak-aligned data, might consider transforming into log, and might consider making average data with measures that don't confound peak height and frequency spread (e.g., using the FWHM/peak power as your shape for each, then averaging).

      The reviewer comments on averaging peak-aligned data. This had been done specifically in Fig. 3C. Correspondingly, we understood the reviewer’s suggestion as a modification of that analysis, which we have now undertaken with the following steps: 1) log-transform the power-change values, which we did by transforming them into dB; 2) derive FWHM and peak power values per participant, and then average those, which we did by a) fitting Gaussians to the per-participant, per-stimulus power change spectra and b) quantifying the FWHM via the Gaussian’s standard deviation (to which it is proportional) and the peak power as the Gaussian’s amplitude; 3) average those parameters over subjects and display the resulting Gaussians. The resulting Gaussians are now shown in the new panel A in Figure 3-figure supplement 1.

      (A) Per-participant, the induced gamma power change peak in dB was fitted with a Gaussian added to an offset (for full description, see Methods). Plotted is the resulting Gaussian, with peak power and variance averaged over participants.

      Results seem to be broadly consistent with Fig. 3C.
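      A minimal sketch of the parameter-averaging step, assuming each participant's peak-aligned dB spectrum has already been fitted with a Gaussian (fitting not shown). It uses the standard relation FWHM = 2√(2 ln 2)·σ ≈ 2.3548·σ. The caption mentions averaging the variance; this sketch averages the standard deviation instead, and averaging σ² would be a one-line change. All function names are hypothetical.

```python
import numpy as np

# FWHM of a Gaussian is proportional to its standard deviation.
FWHM_PER_SIGMA = 2.0 * np.sqrt(2.0 * np.log(2.0))  # about 2.3548


def power_change_db(stim_power, baseline_power):
    """Power change in decibels: 10 * log10(stimulus / baseline)."""
    return 10.0 * np.log10(np.asarray(stim_power, float) / np.asarray(baseline_power, float))


def gaussian(rel_freqs, amplitude, sigma):
    """Gaussian centred on the (peak-aligned) frequency offset 0 Hz."""
    return amplitude * np.exp(-0.5 * (np.asarray(rel_freqs, float) / sigma) ** 2)


def average_gaussian(params, rel_freqs):
    """params: (n_participants, 2) array of fitted (amplitude_db, sigma_hz).

    Averages each parameter over participants and returns the curve they
    define, plus the mean amplitude and the corresponding FWHM.
    """
    mean_amp, mean_sigma = np.mean(np.asarray(params, float), axis=0)
    curve = gaussian(rel_freqs, mean_amp, mean_sigma)
    return curve, mean_amp, mean_sigma * FWHM_PER_SIGMA
```

      Averaging fitted parameters rather than raw spectra keeps peak height and spectral spread separate, which is the confound the reviewer wanted to avoid.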

      Moderate

      I. I would like to see a more precise treatment of ERF and gamma power. The initial slope of the ERF should, by typical convention, correlate strongly with input strength, and the peak should similarly be a predictor of such drive, albeit a weaker one. Figure 4C looks good, but I'm totally confused about what this is showing. If drive = gamma in color space, then these ERF features and gamma power should (by Occam's sledgehammer…) be correlated. I invoke the sledgehammer, not the razor, because I could easily be wrong, but if you could unpack this relationship convincingly, this would be a far stronger foundation for the 'equalized for drive, gamma doesn't change across colors' argument…(see also IIB below)…

      …and, in my own squinting, there is a difference (~25%) in the evoked dipole amplitudes for the vertically aligned opponent pairs of red and green (along the L-M axis, Fig 2C), on which much hinges in this paper, but no difference in gamma power for these pairs. How is that possible? This logic doesn't support the main prediction that drive-matched differences = matched gamma…Again, I'm happy to be wrong, but I would like to see this analyzed and explained intuitively.

      As suggested by the reviewer, we have delved deeper into ERF analyses. Firstly, we overhauled our ERF analysis to extract per-color ERF shape measures (such as timing and slope) and added them as panels A and B in Figure 2-figure supplement 1:

      Figure 2-figure supplement 1. ERF and reaction time results: (A) Average pre-peak slope of the N70 ERF component (extracted from 2-12 ms before per-color, per-participant peak time) for all colors. (B) Average peak time of the N70 ERF component for all colors. […]. For panels A-C, error bars represent 95% CIs over participants, bar orientation represents stimulus orientation in DKL space. The length of the scale bar corresponds to the distance from the edge of the hexagon to the outer ring.
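      The slope measure in panel A can be sketched as below: locate the N70 peak, then fit a line to the ERF from 12 ms to 2 ms before it. The 50-100 ms search window and the assumption that the N70 appears as a negative deflection in the source time course are illustrative choices not stated in the text (source dipole sign is arbitrary); `n70_peak_and_slope` is a hypothetical helper.

```python
import numpy as np


def n70_peak_and_slope(times_ms, erf, search_ms=(50.0, 100.0), fit_ms=(12.0, 2.0)):
    """Return (peak_time_ms, peak_value, pre_peak_slope) for a negative-going N70.

    search_ms: window in which the peak is sought (an assumption here).
    fit_ms: the fit spans from fit_ms[0] to fit_ms[1] ms before the peak,
    matching the 2-12 ms window described in the caption.
    """
    times_ms = np.asarray(times_ms, dtype=float)
    erf = np.asarray(erf, dtype=float)
    in_search = np.flatnonzero((times_ms >= search_ms[0]) & (times_ms <= search_ms[1]))
    peak_idx = in_search[np.argmin(erf[in_search])]  # N70 assumed negative in sign
    peak_time = times_ms[peak_idx]
    in_fit = (times_ms >= peak_time - fit_ms[0]) & (times_ms <= peak_time - fit_ms[1])
    slope, _intercept = np.polyfit(times_ms[in_fit], erf[in_fit], 1)  # units per ms
    return peak_time, erf[peak_idx], slope
```

      Anchoring the fit window to each participant's and each color's own peak time keeps latency differences (such as the later N70 for blue) from contaminating the slope estimate.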

      We have revised the results to report those analyses:

      "The initial ERF slope is sometimes used to estimate feedforward drive. We extracted the per-participant, per-color N70 initial slope and found significant differences over hues (F(4.89, 141.68) = 7.53, pGG < 4 × 10⁻⁶). Specifically, it was shallower for blue hues compared to all other hues except for green and green-blue (all pHolm < 7 × 10⁻⁴), while it was not significantly different between all other stimulus hue pairs (all pHolm > 0.07, Figure 2-figure supplement 1A), demonstrating that stimulus drive (as estimated by ERF slope) was approximately equalized over all hues but blue.

      The peak time of the N70 component was significantly later for blue stimuli (Mean = 88.6 ms, CI95% = [84.9 ms, 92.1 ms]) compared to all (all pHolm < 0.02) but yellow, green and green-yellow stimuli, for yellow (Mean = 84.4 ms, CI95% = [81.6 ms, 87.6 ms]) compared to red and red-blue stimuli (all pHolm < 0.03), and fastest for red stimuli (Mean = 77.9 ms, CI95% = [74.5 ms, 81.1 ms]) showing a general pattern of slower N70 peaks for stimuli on the S-(L+M) axis, especially for blue (Figure 2-figure supplement 1B)."
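      The pHolm values above refer to the Holm-Bonferroni step-down correction. A minimal sketch of how adjusted p-values are computed for a family of pairwise tests (e.g., the 28 pairwise hue comparisons): sort the p-values, multiply the k-th smallest by (m − k + 1), enforce monotonicity, and cap at 1. `holm_adjust` is a hypothetical helper; established packages provide the same correction.

```python
import numpy as np


def holm_adjust(pvals):
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The k-th smallest p-value (0-based k) is multiplied by (m - k).
    adjusted_sorted = np.minimum(1.0, (m - np.arange(m)) * p[order])
    # Step-down procedure: adjusted values must not decrease with rank.
    adjusted_sorted = np.maximum.accumulate(adjusted_sorted)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted
```

      Compared with a plain Bonferroni correction (every p multiplied by m), Holm's procedure controls the same family-wise error rate while being uniformly more powerful.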

      We also checked if our main findings (equivalence of drive-controlled red and green stimuli, weaker responses for S+ stimuli) are robust when controlled for differences in ERF parameters and added in the Results:

      "To attempt to control for potential remaining differences in input drive that the DKL normalization missed, we regressed out per-participant, per-color, the N70 slope and amplitude from the induced gamma power. Results remained equivalent along the L-M axis: The induced gamma power change residuals were not statistically different between red and green stimuli (Red: 8.22, CI95% = [-0.42, 16.85], Green: 12.09, CI95% = [5.44, 18.75], t(29) = 1.35, pHolm = 1.0, BF01 = 3.00).

      As we found differences in initial ERF slope especially for blue stimuli, we checked if this was sufficient to explain weaker induced gamma power for blue stimuli. While blue stimuli still showed weaker gamma-power change residuals than yellow stimuli (Blue: -11.23, CI95% = [-16.89, -5.57], Yellow: -6.35, CI95% = [-11.20, -1.50]), this difference did not reach significance when regressing out changes in N70 slope and amplitude (t(29) = 1.65, pHolm = 0.88). This suggests that lower levels of input drive generated by equicontrast blue versus yellow stimuli might explain the weaker gamma oscillations induced by them."
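      The control analysis above ("regressed out per-participant, per-color, the N70 slope and amplitude") can be sketched as an ordinary least-squares fit within each participant across colors, keeping the residuals. Including an intercept is an assumption, since the design matrix is not specified here; `residualize` is a hypothetical helper.

```python
import numpy as np


def residualize(gamma, slope, amplitude):
    """Regress N70 slope and amplitude out of gamma power, per participant.

    All inputs are (n_participants, n_colors) arrays. Within each participant
    the regression runs across colors and includes an intercept (an assumption).
    """
    gamma = np.asarray(gamma, dtype=float)
    slope = np.asarray(slope, dtype=float)
    amplitude = np.asarray(amplitude, dtype=float)
    resid = np.empty_like(gamma)
    for i in range(gamma.shape[0]):
        design = np.column_stack([np.ones(gamma.shape[1]), slope[i], amplitude[i]])
        coefs, *_ = np.linalg.lstsq(design, gamma[i], rcond=None)
        resid[i] = gamma[i] - design @ coefs
    return resid
```

      The residuals for two colors (e.g., red and green, or blue and yellow) can then be compared across participants with a paired t-test such as `scipy.stats.ttest_rel`. By construction the residuals are uncorrelated with the N70 measures within each participant.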

      We added accordingly in the Discussion:

      "The fact that controlling for N70 amplitude and slope strongly diminished the recorded differences in induced gamma power between S+ and S- stimuli supports the idea that the recorded differences in induced gamma power over the S-(L+M) axis might be due to pure S+ stimuli generating weaker input drive to V1 compared to DKL-equicontrast S- stimuli, even when cone contrasts are equalized."

      Additionally, we made the correlation between ERF amplitude and induced gamma power clearer to read by correlating them directly. Accordingly, the relevant paragraph in the results now reads:

      "In addition, there were significant correlations between the N70 ERF component and induced gamma power: The extracted N70 amplitude was correlated across colors with the induced gamma power change within participants with on average r = -0.38 (CI95% = [-0.49, -0.28], pWilcoxon < 4 × 10⁻⁶). This correlation was specific to the gamma band and the N70 component: Across colors, there were significant correlation clusters between V1 dipole moment 68-79 ms post-stimulus onset and induced power between 28-54 Hz and 72 Hz (Figure 4C, rmax = 0.30, pTmax < 0.05, corrected for multiple comparisons across time and frequency)."
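      The within-participant correlation reported above (one r per participant across colors, then averaged) can be sketched as follows. Whether the r values were averaged directly or after a Fisher z-transform (`np.arctanh`) is not stated, so that remains a choice; `within_subject_r` is a hypothetical helper.

```python
import numpy as np


def within_subject_r(x, y):
    """Pearson r across colors, computed separately for each participant.

    x, y: (n_participants, n_colors) arrays, e.g. N70 amplitude and gamma change.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    num = np.sum(xc * yc, axis=1)
    den = np.sqrt(np.sum(xc ** 2, axis=1) * np.sum(yc ** 2, axis=1))
    return num / den
```

      A group-level test of whether the per-participant r values differ from zero can then use a Wilcoxon signed-rank test, e.g. `scipy.stats.wilcoxon(r)`. Correlating within participants removes between-subject differences in overall ERF and gamma magnitude.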

      II. As indicated above, the paper rests on accurate modeling of human LGN recruitment, based in fact on human cone recruitment. However, the exact details of how such matching was obtained were discussed only briefly; this technical detail is much more than just a detail in a study on color matching. I am not against the logic, nor do I know of a flaw, but it's the hinge of the paper and is dealt with glancingly.

      A. Some discussion of model limitations

      B. Why it's valid to assume LGN matching has been achieved using data from the periphery: To my knowledge, nobody has ever recorded single units in human LGN with these color stimuli…in contrast, the ERF is 'in their hands' and could be directly related (or not) to gamma and to the color matching predictions of their model.

      We have revised the respective paragraph of the introduction to read:

      "Earlier work has established in the non-human primate that LGN responses to color stimuli can be well explained by measuring retinal cone absorption spectra and constructing the following cone-contrast axes: L+M (capturing luminance), L-M (capturing redness vs. greenness), and S-(L+M) (capturing S-cone activation, which corresponds to violet vs. yellow hues). These axes span a color space referred to as DKL space (Derrington, Krauskopf, and Lennie, 1984). This insight can be translated to humans (for recent examples, see Olkkonen et al., 2008; Witzel and Gegenfurtner, 2018), if one assumes that human LGN responses have a similar dependence on human cone responses. Recordings of human LGN single units to colored stimuli are not available (to our knowledge). Yet, sensitivity spectra of human retinal cones have been determined by a number of approaches, including ex-vivo retinal unit recordings (Schnapf et al., 1987) and psychophysical color matching (Stockman and Sharpe, 2000). These human cone sensitivity spectra, together with the mentioned assumption, allow one to determine a DKL space for human observers. To show color stimuli in coordinates that model LGN activation (and thereby V1 input), monitor light emission spectra for colored stimuli can be measured to define the strength of S-, M-, and L-cone excitation they induce. Then, stimuli and stimulus background can be picked from an equiluminance plane in DKL space."
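      The pipeline described above (measured monitor spectra, cone excitations, then cone-contrast axes) can be sketched as follows. This is a deliberately simplified, unscaled version: the proper DKL construction (Derrington, Krauskopf, and Lennie, 1984) weights and normalizes the cone signals, which this sketch omits, and the cone fundamentals (e.g., Stockman and Sharpe, 2000) are assumed to be sampled on the same wavelength grid as the measured spectra. All function names are hypothetical.

```python
import numpy as np


def cone_excitations(spectrum, fundamentals, d_lambda_nm=1.0):
    """L, M, S excitations: emission spectrum summed against cone fundamentals.

    spectrum: (n_wavelengths,) measured monitor emission for one stimulus.
    fundamentals: (3, n_wavelengths) L, M, S sensitivities on the same grid
    (loading real fundamentals is not shown). The sum approximates an integral.
    """
    return np.asarray(fundamentals, float) @ np.asarray(spectrum, float) * d_lambda_nm


def cone_contrast(stimulus_lms, background_lms):
    """Weber cone contrast of a stimulus relative to the background."""
    stimulus_lms = np.asarray(stimulus_lms, float)
    background_lms = np.asarray(background_lms, float)
    return (stimulus_lms - background_lms) / background_lms


def dkl_like_axes(contrast):
    """Unscaled, unweighted L+M, L-M and S-(L+M) combinations of cone contrasts."""
    c_l, c_m, c_s = contrast
    return {"L+M": c_l + c_m, "L-M": c_l - c_m, "S-(L+M)": c_s - (c_l + c_m)}
```

      In this simplified form, a stimulus that raises L-cone contrast and lowers M-cone contrast by the same amount has zero "L+M" component, i.e., it lies in the (unweighted) equiluminant plane while moving along the L-M axis.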

      Reviewer #2 (Public Review):

      The major strengths of this study are the use of MEG measurements to obtain spatially resolved estimates of gamma rhythms from a large(ish) sample of human participants, during presentation of stimuli that are generally well matched for cone contrast. Responses were obtained using a 10deg diameter uniform field presented in and around the centre of gaze. The authors find that stimuli with equivalent cone contrast in the L-M axis generated equivalent gamma, i.e., that 'red' (+L-M) stimuli do not generate stronger responses than 'green' (-L+M) stimuli. The MEG measurements are carefully made, and participants performed a decrement-detection task away from the centre of gaze (but within the stimulus), allowing measurements of perceptual performance and in addition controlling attention.

      There are a number of additional observations that make clear that the color and contrast of stimuli are important in understanding gamma. Psychophysical performance was worst for stimuli modulated along the +S-(L+M) direction, and these directions also evoked weakest evoked potentials and induced gamma. There also appear to be additional physiological asymmetries along non-cardinal color directions (e.g. Fig 2C, Fig 3E). The asymmetries between non-cardinal stimuli may parallel those seen in other physiological and perceptual studies and could be drawn out (e.g. Danilova and Mollon, Journal of Vision 2010; Goddard et al., Journal of Vision 2010; Lafer-Sousa et al., JOSA 2012).

      We thank the reviewer for the pointers to relevant literature and have added in the Discussion:

      "Concerning off-axis colors (red-blue, green-blue, green-yellow and red-yellow), we found stronger gamma power and ERF N70 responses to stimuli along the green-yellow/red-blue axis (which has been called lime-magenta in previous studies) compared to stimuli along the red-yellow/green-blue axis (orange-cyan). In human studies varying color contrast along these axes, lime-magenta has also been found to induce stronger fMRI responses (Goddard et al., 2010; but see Lafer-Sousa et al., 2012), and psychophysical work has proposed a cortical color channel along this axis (Danilova and Mollon, 2010; but see Witzel and Gegenfurtner, 2013)."

      Similarly, the asymmetry between +S and -S modulation is striking and needs a better explanation within the model (that thalamic input strength predicts gamma strength), given that +S inputs to cortex appear to be, if anything, stronger than -S inputs (e.g. DeValois et al. PNAS 2000).

      We followed the reviewer’s suggestion and modified the Discussion to read:

      "Contrary to the unified pathway for L-M activation, stimuli high and low on the S-(L+M) axis (S+ and S-) each target different cell populations in the LGN, and different cortical layers within V1 (Chatterjee and Callaway, 2003; De Valois et al., 2000), whereby the S+ pathway shows higher LGN neuron and V1 afferent input numbers (Chatterjee and Callaway, 2003). Other metrics of V1 activation, such as ERPs/ERFs, reveal that these more numerous S+ inputs result in a weaker evoked potential that also shows a longer latency (our data; Nunez et al., 2021). The origin of this dissociation might lie in different input timing or less cortical amplification, but remains unclear so far. Interestingly, our results suggest that cortical gamma is more closely related to the processes reflected in the ERP/ERF: Stimuli inducing stronger ERF induced stronger gamma; and controlling for ERF-based measures of input drive abolished differences between S+ and S- stimuli in our data."

      Given that this asymmetry presents a potential exception to the direct association between LGN drive and V1 gamma power, we have toned down claims of a direct input drive to gamma power relationship in the Title and text and have refocused instead on L-M contrast.

      My only real concern is that the authors use a precomputed DKL color space for all observers. The problem with this approach is that the isoluminant plane of DKL color space is predicated on a particular balance of L- and M-cones to Vlambda, and individuals can show substantial variability of the angle of the isoluminant plane in DKL space (e.g. He, Cruz and Eskew, Journal of Vision 2020). There is a non-negligible chance that all the responses to colored stimuli may therefore be predicted by projection of the stimuli onto each individual's idiosyncratic Vlambda (that is, the residual luminance contrast in the stimulus). While this would be laborious to assess in the MEG measurements, it may be possible to assess perceptually as in the He paper above or by similar methods. Regardless, the authors should consider the implications. This is important because, for example, it may suggest the importance of signals from the magnocellular pathway, which are thought to be important for Vlambda.

      We followed the suggestion of the reviewer, performed additional analyses and report the new results in the following Results text:

      "When perceptual (instead of neuronal) definitions of equiluminance are used, there is substantial between-subject variability in the ratio of relative L- and M-cone contributions to perceived luminance, with a mean ratio of L/M luminance contributions of 1.5-2.3 (He et al., 2020). Our perceptual results are consistent with that: we had determined the color-contrast change-detection threshold per color; we used the inverse of this threshold as a metric of color change-detection performance; and the ratio of this performance metric between red and green (L divided by M) had an average value of 1.48, with substantial variability over subjects (CI95% = [1.33, 1.66]).

      If such variability also affected the neuronal ERF and gamma power measures reported here, L/M-ratios in color-contrast change-detection thresholds should be correlated across subjects with L/M-ratios in ERF amplitude and induced gamma power. This was not the case: Change-detection threshold red/green ratios were neither correlated with ERF N70 amplitude red/green ratios (ρ = 0.09, p = 0.65), nor with induced gamma power red/green ratios (ρ = -0.17, p = 0.38)."
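      The ratio analysis above can be sketched as follows: performance is the inverse of the change-detection threshold, so the red/green performance ratio equals the green threshold divided by the red threshold, and the per-subject ratios are then rank-correlated with the corresponding neural ratios. The simple ranking below does not handle ties (`scipy.stats.spearmanr` does); all function names are hypothetical.

```python
import numpy as np


def red_green_performance_ratio(threshold_red, threshold_green):
    """Ratio of performances, where performance = 1 / threshold.

    (1 / t_red) / (1 / t_green) simplifies to t_green / t_red.
    """
    return np.asarray(threshold_green, float) / np.asarray(threshold_red, float)


def rank(values):
    """0-based ranks without tie handling."""
    values = np.asarray(values, float)
    ranks = np.empty(values.size)
    ranks[np.argsort(values)] = np.arange(values.size)
    return ranks


def spearman_rho(x, y):
    """Spearman correlation as the Pearson correlation of ranks."""
    return float(np.corrcoef(rank(x), rank(y))[0, 1])
```

      Using a rank correlation makes the test insensitive to the skew that ratios of positive quantities often show across subjects.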

      Reviewer #3 (Public Review):

      This is an interesting article studying human color perception using MEG. The specific aim was to study differences in color perception related to different S-, M-, and L-cone excitation levels and especially whether red color is perceived differently from other colors. To my knowledge, this is the first study of its kind and as such very interesting. The methods are excellent and the manuscript is well written, as expected from this lab. However, the illustration of the results is not optimal and could be enhanced.

      Major

      The results presented in the manuscript are very interesting, but they are not presented comprehensively enough to evaluate their validity. The main results of the manuscript are that the gamma-band responses to stimuli with absolute L-M contrast, i.e. green and red stimuli, do not differ, but they differ for stimuli on the S-(L+M) (blue vs red-green) axis, and gamma-band responses for blue stimuli are smaller. These data are presented in Figure 3, but in its current form, these results are not well conveyed by the figure. The main results are illustrated in Figures 3B and 3C, which show the average waveforms for the grating and for the different color stimuli. While there are confidence limits for the gamma-band responses for the grating stimuli, there are no confidence limits for the responses to the different color stimuli. Therefore, the main results on the similarities and differences between the responses to different colors can't be evaluated based on the figure, and hence confidence limits should be added to these data.

      Figure 3E reports the gamma-power change values after alignment to the individual peak gamma frequencies, i.e. the values used for statistics, and does report confidence intervals. Yet, we see the reviewer's point that confidence intervals are also helpful in the non-aligned/complete spectra. We found that including confidence intervals in Figure 3B,C, with the many overlapping spectra, renders those panels unreadable. Therefore, we included the new panel Figure 3-figure supplement 2A, showing each color’s spectrum separately:

      (A) Per-color average induced power change spectra. Banding shows 95% confidence intervals over participants. Note that the y-axis varies between colors.

      It is also not clear from the figure legend, from which time-window data is averaged for the waveforms.

      We have added in the legend:

      "All panels show power change 0.3 s to 1.3 s after stimulus onset, relative to baseline."

      The time-resolved profile of gamma-power changes is illustrated in Fig. 3D. This figure would be a perfect place to illustrate the main results. However, of all color stimuli, these TFRs are shown only for the green stimuli, not for the red-green differences nor for the blue stimuli, for which responses were smaller. Why are these TFRs not shown for all color stimuli and for their differences?

      Figure 3-figure supplement 3. Per-color time-frequency responses: Average stimulus-induced power change in V1 as a function of time and frequency, plotted for each color.

      We agree with the reviewer that TFR plots can be very informative. We followed their request and included TFRs for each color as Figure 3-Figure supplement 3.

      Regarding the suggestion to also include TFRs for the differences between colors, we note that this would amount to 28 TFRs, one for each color combination. Furthermore, while gamma peaks were often clear, their peak frequencies varied substantially across subjects and colors. Therefore, we based our statistical analysis on the power at the peak frequencies, corresponding to peak-aligned spectra (Fig. 3C). A comparison of Figure 3C with Figure 3B shows that the shape of non-aligned average spectra is strongly affected by inter-subject peak-frequency variability and is thereby hard to interpret. Therefore, we refrained from showing TFRs for differences between colors, which would also lack the required peak alignment.
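      The peak alignment referred to above can be sketched as follows: find each participant's peak within a gamma search band, re-express the spectrum in frequency offsets relative to that peak, and average the aligned spectra. The 30-90 Hz search band, the ±20-bin window and the NaN padding at the spectrum edges are illustrative assumptions, not the study's parameters; `peak_aligned` is a hypothetical helper.

```python
import numpy as np


def peak_aligned(freqs, spectra, band=(30.0, 90.0), half_width=20):
    """Align spectra on each row's peak frequency within `band`.

    freqs: (n_freqs,) evenly spaced frequencies in Hz.
    spectra: (n_participants, n_freqs) power-change spectra.
    Returns (relative_freqs, aligned, peak_freqs); aligned[:, half_width] holds
    the power at each participant's peak. Bins falling outside the measured
    range are NaN, so group averages should use np.nanmean.
    """
    freqs = np.asarray(freqs, dtype=float)
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    df = freqs[1] - freqs[0]
    band_idx = np.flatnonzero((freqs >= band[0]) & (freqs <= band[1]))
    offsets = np.arange(-half_width, half_width + 1)
    aligned = np.full((spectra.shape[0], offsets.size), np.nan)
    peak_freqs = np.empty(spectra.shape[0])
    for i, spectrum in enumerate(spectra):
        peak = band_idx[np.argmax(spectrum[band_idx])]
        peak_freqs[i] = freqs[peak]
        idx = peak + offsets
        valid = (idx >= 0) & (idx < freqs.size)
        aligned[i, valid] = spectrum[idx[valid]]
    return offsets * df, aligned, peak_freqs
```

      Averaging `aligned` over participants yields a peak-aligned spectrum, whereas averaging the raw spectra smears peaks that sit at different frequencies, which is why non-aligned averages and difference TFRs are hard to interpret.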

    1. Author Response:

      Reviewer #1:

      Insulin-secreting beta-cells are electrically excitable, and action potential firing in these cells leads to an increase in the cytoplasmic calcium concentration that in turn stimulates insulin release. Beta-cells are electrically coupled to their neighbours, and electrical activity and calcium waves are synchronised across the pancreatic islets. How these oscillations are initiated is not known. In this study, the authors identify a subset of 'first responder' beta-cells that are the first to respond to glucose and that initiate a propagating Ca2+ wave across the islet. These cells may be particularly responsive because of their intrinsic electrophysiological properties. Somewhat unexpectedly, the electrical coupling of first responder cells appears weaker than that of the other islet cells, but this paradox is well explained by the authors. Finally, the authors provide evidence of a hierarchy of beta-cells within the islets and that if the first responder cells are destroyed, other islet cells are ready to take over.

      The strengths of the paper are the advanced calcium imaging, the photoablation experiments and the longitudinal measurements (up to 48h).

      Whilst I find the evidence for the existence of first responders and a hierarchy convincing, the link between first responders in isolated individual islets and the first-phase insulin secretion seen in vivo (which becomes impaired in type-2 diabetes) seems somewhat overstated. It is difficult to see how first responders in one islet could synchronise secretion from thousands (rodents) to millions (man) of islets, and it might be wise to tone down this particular aspect.

      We thank the reviewer for highlighting this point. We acknowledge that we did not measure insulin release from individual islets after first responder cell ablation, where we observed a diminished first-phase Ca2+ response. We do note that studies have linked the first-phase Ca2+ response to first-phase insulin release [Henquin et al, Diabetes (2006) and Head et al, Diabetes (2012)], albeit with additional amplifying signals at higher glucose elevations. Thus a diminished first-phase Ca2+ response would imply diminished first-phase insulin release (although, given the amplifying signals, the converse would not necessarily hold).

      Nevertheless, there are also important caveats to our experiment. Within each islet we ablated a single first responder cell. In small islets this ablation diminished Ca2+ in the plane that we imaged; in larger islets it did not, pointing to the presence of multiple first responder cells. Furthermore, we only observed the plane of the islet containing the ablated first responder, so [Ca2+] elsewhere in the islet may not have been significantly disrupted. Thus even a small islet may contain redundancy, with multiple first responder cells that together drive first-phase [Ca2+] across the islet, so that loss of a single first responder cell disrupts Ca2+ only locally. The relationship we observe between the timing of the [Ca2+] response and distance from the first responder supports this notion, as do results from the islet model, in which >10% of cells had to be ablated to significantly disrupt first-phase Ca2+.

      While we already discuss the issue of redundancy in large islets and in 3D, we now briefly mention the importance of measuring insulin release.

      Reviewer #2:

      Kravets et al. further explored the functional heterogeneity of insulin-secreting beta cells in isolated mouse islets. They used slow cytosolic calcium [Ca2+] oscillations, with a cycle period of 2 to several minutes, in both phases of glucose-dependent beta cell activity triggered by a switch from an unphysiologically low (2 mM) to an unphysiologically high (11 mM) glucose concentration. Based on the presented evidence, they described a distinct population of beta cells responsible for driving the first-phase [Ca2+] elevation and characterised it as different from some other previously described functional subpopulations.

      Strengths:

      The study uses advanced experimental approaches to address the specific role a subpopulation of beta cells plays during the first phase of the islet response to 11 mM glucose or to strong secretagogues like glibenclamide. It identifies elements of a broad-scale complex network in the slow-time-scale [Ca2+] oscillations, and the authors appropriately discuss the presence of the most connected cells (network hubs) in these slower [Ca2+] oscillations as well.

      Weakness:

      The critical weakness of the paper is the evaluation of the linear regressions that should support the impact of relative proximity (Fig. 1E), of response consistency (Fig. 2C), and of increased excitability of the first responder cells (Fig. 3B). None of the datasets provided in the submission satisfies the criterion of normality of the distribution of regression residuals. In addition, the data interpreted as showing that the majority of first responder cells retain their early response time could equally be interpreted as showing that the majority do not.

      We thank the reviewers for their input, which opened multiple opportunities for us to improve our analysis and strengthen our arguments for the existence and consistency of first responder cells. We present a more detailed analysis for each of these figures below and describe how it is included in the manuscript.

      As described below, we performed additional in-depth analysis and statistical evaluation of the data presented in Figures 1E, 2C, and 3B. We now report that two of the datasets (Fig.1 E, Fig.2 C) satisfy the criterion of normality of the distribution of regression residuals. The third dataset (Fig.3 B) does not satisfy this criterion, and we have updated our interpretation of these data in the text.

      Figure 1E Statistics, Scatter: We now show the slope, the p-value for deviation of the slope from 0, and the r^2 value in Fig.1 E. While the scatter is large (r^2=0.1549 in Fig.1E) when cells at all distances from the first responder cell are included, it diminishes substantially when we consider only cells located closer to the first responder (r^2=0.3219 in Fig.S1 F); the response times for cells at distances up to 60 μm from the first responder cells are now shown in Fig.S1 F. We chose 60 μm because it is the maximum first-to-last responder distance in our data set (see red box in Fig.1D).
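      A minimal sketch of how such a proximity cutoff could be derived and applied, assuming each islet contributes one first-responder and one last-responder position in μm. The function names and input layout are hypothetical, not the authors' code.

```python
import math

def proximity_cutoff(islets):
    """Largest first-to-last responder distance (μm) over all islets.

    islets: iterable of (first_xy, last_xy) position pairs in μm.
    """
    return max(math.dist(first, last) for first, last in islets)

def within_cutoff(distances_um, response_times, cutoff_um):
    """Keep (distance, response time) pairs for cells within cutoff_um of the first responder."""
    return [(d, t) for d, t in zip(distances_um, response_times) if d <= cutoff_um]

# Two hypothetical islets: first-to-last distances of 50 μm and 60 μm
cutoff = proximity_cutoff([((0, 0), (30, 40)), ((10, 10), (10, 70))])
near_cells = within_cutoff([10, 59.9, 60, 75], [1, 2, 3, 4], cutoff)
```

      Tying the cutoff to the largest observed first-to-last distance, rather than to an arbitrary radius, keeps the restriction data-driven; the regression and r^2 are then computed only on the retained pairs.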

      Additionally, we noticed that larger islets may contain multiple domains, each with its own first responder at the center (now in Fig.S1 E and below). The linear distance/time dependence is preserved within each domain.

      Figure 1E Normality of residuals: We appreciate the reviewer’s suggestion and now see that the original “distance vs time” dependence in Fig.1 E did not pass the test for normality of residuals. When plotted as distance (μm) vs response time (percentile), the data for all distances still did not pass the Shapiro-Wilk test for normality of residuals (see QQ plot “All distances” below). However, for cells located within 60 μm of the first responder, the residuals pass the Shapiro-Wilk normality test. The QQ plots for “up to 60 μm distances” are included in Fig.S1 G.
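      A minimal sketch of this residual check, assuming an ordinary least-squares fit via scipy.stats.linregress and SciPy's Shapiro-Wilk implementation (scipy.stats.shapiro); scipy.stats.probplot supplies the QQ-plot coordinates. The function name and the 0.05 threshold are illustrative choices, not taken from the manuscript.

```python
import numpy as np
from scipy import stats

def residual_normality(x, y, alpha=0.05):
    """Fit y = a + b*x by least squares and test the residuals for normality."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    fit = stats.linregress(x, y)
    residuals = y - (fit.intercept + fit.slope * x)
    w, p = stats.shapiro(residuals)
    # QQ-plot points: theoretical normal quantiles vs. ordered residuals
    (osm, osr), _ = stats.probplot(residuals, dist="norm")
    return {"W": w, "p": p, "normal": p > alpha, "qq": (osm, osr)}

rng = np.random.default_rng(0)
x = rng.uniform(0, 60, 300)  # e.g. distance from the first responder, μm
normal_res = residual_normality(x, 0.5 * x + rng.normal(0, 5, 300))
skewed_res = residual_normality(x, 0.5 * x + rng.exponential(5, 300))
```

      Note that p > alpha means normality was not rejected at that level, not that it was demonstrated; the QQ plot is a useful visual complement.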

      Figure 2C Statistic and Scatter: After consulting a biostatistician (Dr. Laura Pyle), we realized that because the response times during initial and repeated glucose elevation were measured in the same islet, these were repeated measurements on the same statistical units (i.e., a longitudinal study). This required a mixed-model analysis rather than the simple linear regression we used initially. We have now applied a linear mixed-effects model (LMEM) to LN-transformed data (original data + 0.0001); the 0.0001 offset was added to avoid taking LN(0).
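      A minimal sketch of such a model, assuming a random intercept per islet and the statsmodels formula interface (statsmodels.formula.api.mixedlm, with pandas). The column names, the random-intercept-only structure and the synthetic data are assumptions for illustration; the authors' exact model specification is described in their Source data file, not here.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

OFFSET = 1e-4  # small constant so that LN(0) is never evaluated

def fit_response_time_lmem(df):
    """Random-intercept LMEM of LN(repeat) on LN(initial), grouped by islet.

    df needs columns 'islet', 'initial' and 'repeat' (response times, >= 0).
    Returns (slope, p_value, fitted_result).
    """
    data = df.assign(
        log_initial=np.log(df["initial"] + OFFSET),
        log_repeat=np.log(df["repeat"] + OFFSET),
    )
    result = smf.mixedlm("log_repeat ~ log_initial", data, groups=data["islet"]).fit()
    return result.params["log_initial"], result.pvalues["log_initial"], result

# Synthetic data: 8 islets x 30 cells; cell 0 of each islet is the first responder (time 0)
rng = np.random.default_rng(0)
rows = []
for islet in range(8):
    shift = rng.normal(0, 0.2)  # islet-level offset, absorbed by the random intercept
    for cell in range(30):
        t0 = 0.0 if cell == 0 else rng.uniform(0.01, 1.0)
        rows.append({"islet": islet, "initial": t0,
                     "repeat": t0 * np.exp(shift + rng.normal(0, 0.3))})
slope, p_value, result = fit_response_time_lmem(pd.DataFrame(rows))
```

      Grouping by islet treats cells from the same islet as correlated observations, which is the reason a mixed model is preferred over pooling all cells into one ordinary regression.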

      We now show the LMEM-derived slope and the p-value for deviation of the slope from 0 in Fig.2 C. Further, we sorted the data presented in Fig.2 C by distance to each of the first responders (now added to Fig.2D). An example of sorted vs non-sorted response times in a large islet with multiple first responders is added to the Source Data – Figure 1. Scatter was substantially lower in the distance-sorted data than in the non-sorted data, which indicates that the consistency of a cell’s glucose response correlates with its proximity to the first responder. We also discuss this in the first subsection of the Discussion.
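      The reply does not spell out how cells are assigned when an islet has several first responders. One plausible reading, used here only as an assumption, is that each cell belongs to the domain of its nearest first responder and cells are then ordered by that distance. Positions are taken as 2-D coordinates in μm in the imaged plane; the function names are hypothetical.

```python
import math

def nearest_first_responder(cells, first_responders):
    """For each cell (x, y) in μm, return (index of nearest first responder, distance in μm)."""
    out = []
    for cx, cy in cells:
        dists = [math.hypot(cx - fx, cy - fy) for fx, fy in first_responders]
        k = min(range(len(dists)), key=dists.__getitem__)
        out.append((k, dists[k]))
    return out

def sort_by_domain(cells, first_responders):
    """Cell indices grouped by nearest first responder, ordered by distance within each group."""
    assignment = nearest_first_responder(cells, first_responders)
    order = sorted(range(len(cells)), key=lambda i: assignment[i])
    return order, assignment

# Two first responders 100 μm apart, four cells on the line between them
order, assignment = sort_by_domain([(90, 0), (5, 0), (60, 0), (30, 0)], [(0, 0), (100, 0)])
```

      Here cells at 5 and 30 μm fall in the first domain and cells at 90 and 60 μm in the second (10 and 40 μm from its first responder), giving the order [1, 3, 0, 2].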

      Figure 2C Normality of residuals: The residuals of the LMEM of the LN-transformed data pass the Shapiro-Wilk normality test. To perform the natural-log transformation, we added a very small number (0.0001) to all 0 values in the data set presented in Fig.2C, D, and Fig.S4 A. Details of the LMEM and its output are added to the Source data – Statistical analysis file.

      Figure 3B Statistic and Scatter: We now show the LMEM-derived slope and the p-value for deviation of the slope from 0 in Fig.3 B (below). The LMEM-derived slope has a p-value of 0.1925, indicating that it is not significantly different from 0. This result changes our original interpretation, and we have edited the associated Results and Discussion accordingly.

      Figure 3B Normality of residuals: This data set does not pass the Shapiro-Wilk test.

      A major issue of the work is also that it is unnecessarily complicated. In the Results section, the authors introduce a number of beta cell subpopulations: first responder cell, last responder cell, wave origin cell, wave end cell, hub-like phase 1, hub-like phase 2, and random cells, which are all defined in exclusively relative terms, regarding the time within which the cells responded, phase lags of their oscillations, or mutual distances within the islet. These cell types also partially overlap.

      To address this comment, we added Table 1 to describe the properties of these different populations.

      Their choice to use the diameter percentile as a metric for distances between cells is not well substantiated, since they do not demonstrate how islet size variability would influence the conclusion. All presented islets are of rather comparable size, within the diffusion limits.

      We replaced normalized distances in Fig.1 D with absolute distance from first responder in μm.

      The functional hierarchy of cells defining the first response should be reflected in the consistency of their relative response times. The authors claim that the spatial organisation is consistent over a time of up to 24 hours. First, it is not clear why this prolonged consistency would be an advantage compared with its absence. The linear regression analysis between the initial and repeated relative activation times does suggest a significant correlation, but the distribution of regression residuals of the provided data is again not normal, so the result is inconclusive despite the low p-value. Of the cells defined as first responders in the initial stimulation, 50% were also part of that subpopulation during the second stimulation, which appears rather random.

      We began describing our analysis of the response times to initial and repeated glucose stimulation earlier in this reply. Further evidence for the distance dependence of response-time consistency is now presented in Fig.S4 A, which shows response-time consistency for cells within 60 μm, 50 μm, and 40 μm of the first responder. The closer a cell is located to the first responder, the higher the consistency of its response time (i.e., the lower the scatter; see below).

      If we analyze these data with a linear regression model, where r^2 allows us to quantify the decrease in scatter, we observe r^2 values of 0.3013, 0.3228, and 0.3674 for cells within 60 μm, 50 μm, and 40 μm of the first responder, respectively (below). These data are not included in the manuscript because the residuals of this model do not pass the Shapiro-Wilk normality test (whereas those of the LMEM do).
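      A minimal sketch of the radius sweep, assuming per-cell arrays of distance to the first responder (μm) and initial and repeated response times, with an ordinary least-squares fit from scipy.stats.linregress at each radius. The synthetic data, in which response-time inconsistency grows with distance, are purely illustrative and the function name is hypothetical.

```python
import numpy as np
from scipy import stats

def r2_by_radius(distance_um, t_initial, t_repeat, radii_um=(60, 50, 40)):
    """r^2 of repeated vs. initial response time for cells within each radius of the first responder."""
    d = np.asarray(distance_um, float)
    a = np.asarray(t_initial, float)
    b = np.asarray(t_repeat, float)
    return {r: stats.linregress(a[d <= r], b[d <= r]).rvalue ** 2 for r in radii_um}

rng = np.random.default_rng(2)
d = rng.uniform(0, 60, 600)
t_init = rng.uniform(0, 1, 600)
# Illustrative assumption: the mismatch between stimulations grows with distance
t_rep = t_init + rng.normal(0, 1, 600) * 0.01 * d
r2 = r2_by_radius(d, t_init, t_rep)
```

      Because the subsets are nested, r^2 rising as the radius shrinks reflects lower scatter among the cells closest to the first responder; as the reply notes, r^2 from this model should be read alongside a residual check.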

      One of the most surprising features of this study is the total lack of fast [Ca2+] oscillations, which in mouse islets stimulated with 11 mM glucose are typically several seconds long and should be easily detected at the measurement speed used.

      The data used in this manuscript contain Ca2+ dynamics from islets with (a) slow oscillations only, (b) fast oscillations superimposed on slow oscillations, or (c) no obvious oscillations (likely continual spiking). Representative curves are shown below. Because we focused our study on the slow oscillations, we used dynamics of type (a) in our figures, which gave the impression that no fast oscillations were present. In our analysis of type (b) dynamics, we used Fourier transformation to separate the slow oscillations from the fast ones (described in Methods). Type (c) dynamics were excluded from the analysis of the oscillatory phase and were used only for the first-phase analysis. We indicate this exclusion in the Methods.
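      The filter described in the Methods is not reproduced here. This sketch only illustrates the general idea, assuming a hard low-pass cutoff applied in the frequency domain with NumPy's real FFT; the 1 Hz sampling rate and 1/30 Hz cutoff in the example are illustrative values, not the authors' parameters.

```python
import numpy as np

def split_slow_fast(trace, fs_hz, cutoff_hz):
    """Split a Ca2+ trace into slow and fast parts with a hard FFT low-pass at cutoff_hz."""
    x = np.asarray(trace, float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs_hz)
    # Keep only components at or below the cutoff (including the DC term)
    slow = np.fft.irfft(np.where(freqs <= cutoff_hz, spectrum, 0), n=x.size)
    return slow, x - slow

fs = 1.0                                      # samples per second (illustrative)
t = np.arange(1200) / fs                      # 20 min recording
slow_true = np.sin(2 * np.pi * t / 120)       # slow oscillation, 2 min period
fast_true = 0.3 * np.sin(2 * np.pi * t / 10)  # fast oscillation, 10 s period
slow, fast = split_slow_fast(slow_true + fast_true, fs, cutoff_hz=1 / 30)
```

      A hard spectral cutoff is the simplest choice and works cleanly here because both synthetic components complete whole cycles in the window; real traces usually benefit from detrending or a smoother filter edge to limit ringing.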

      And lastly, we should not perpetuate imprecise information about the disease if we know better. The first sentence of the Introduction, stating that "Diabetes is a disease characterised by high blood glucose, …", is not precise. The term diabetes by itself describes only polyuria. Regarding the role of high glucose, to quote a textbook by K. Frayn and R. Evans (Human metabolism: a regulatory perspective, 4th ed., 2019): “The changes in glucose metabolism are usually regarded as the "hallmark" of diabetes mellitus, and treatment is always monitored by the level of glucose in the blood. However, it has been said that if it were as easy to measure fatty acids in the blood as it is to measure glucose, we would think of diabetes mellitus mainly as a disorder of fat metabolism.”

      We acknowledge that “diabetes” alone refers to polyuria, and we now use “Diabetes Mellitus” to refer more precisely to the disease we discuss. We originally stated “Diabetes is a disease characterized by high blood glucose, ...” because this is in line with internationally accepted diagnostic and classification criteria, such as position statements from the American Diabetes Association [“Diagnosis and Classification of Diabetes Mellitus”, American Diabetes Association, Diabetes Care, 36 (2013)]. We certainly acknowledge that the glucose-centric approach to characterizing and diagnosing Diabetes Mellitus largely stems from the ease with which glucose can be measured. Thus, if blood lipids could be measured as easily, we might characterize diabetes as a disease of hyperlipidemia (depending on how lipidemia links with the complications of diabetes).

    1. Author Response:

      Joint Public Review:

      A highly robust result when investigating how neural population activity is impacted by performance in a task is that the trial-to-trial correlations (noise correlations) between neurons are reduced as performance increases. However, the theoretical and experimental literature has so far failed to account for this robust link, since reduced noise correlations do not systematically contribute to improved availability or transmission of information (often measured by decoding stimulus identity). This paper sets out to address this discrepancy by proposing that the key to linking noise correlations to decoding, and thus bridging the gap with performance, is to rethink the decoders we use: instead of decoders optimized for the specific task imposed on the animal on any given trial (A vs B / B vs C / A vs C), they hypothesize that we should favor a decoder optimized for a general readout of stimulus properties (A vs B vs C).

      To test this hypothesis, the authors use a combination of quantitative data analysis and mechanistic network modeling. Data were recorded from neuronal populations in area V4 of two monkeys trained to perform an orientation change detection task, where the magnitude of orientation change could vary across trials, and the change could happen at cued (attended) or uncued (unattended) locations in the visual field. The model, which extends previous work by the authors, reproduces many basic features of the data, and both the model and data offer support for the hypothesis.

      The reviewers agreed that this is a potentially important contribution, that addresses a widely observed, but puzzling, relation between perceptual performance and noise correlations. The clarity of the hypothesis, and the combination of data analysis and computational modelling are two essential strengths of the paper.

      Overall, this paper identifies a new factor to be taken into account when analysing neural data: the choice of decoder, and in particular how general or specific the decoder is. The fact that the generality of the decoder sheds light on the much-debated question of noise correlations underscores its importance. The paper therefore opens multiple avenues for future research to probe this new idea, in particular for tasks with multiple stimulus dimensions.

      Nonetheless, as detailed below, the reviewers believe the clarity of the manuscript could be further improved on several points, and some additional analysis of the data would provide a more straightforward test of the hypothesis.

      1. It would be important to verify that the model reproduces the correlation between noise and signal correlations, since this is a key argument leading to the authors' hypothesis.

      We have incorporated this verification of the model into the manuscript, as referred to below in the Results:

      “Importantly, this model reproduces the correlation between noise and signal correlations (Figure 2–figure supplement 1) observed in electrophysiological data (Cohen & Maunsell, 2009; Cohen & Kohn, 2011). This correlation between the shared noise and the shared tuning is a key component of the general decoder hypothesis. We observed this strong relationship between noise and signal correlations in our recorded neurons (Figure 2–figure supplement 1A) as well as in our modeled data (Figure 2–figure supplement 1B). Using this model, we were able to measure the relationship between noise and signal correlations for varying strengths of attentional modulation. Consistent with the predictions of the general decoder hypothesis, attention weakened the relationship between noise and signal correlations (Figure 2–figure supplement 1C).”

      The new figure is as below:

      Figure 2–figure supplement 1. The model reproduces the relationship between noise and signal correlations that is key to the general decoder hypothesis. (A) As previously observed in electrophysiological data (Cohen & Maunsell, 2009; Cohen & Kohn, 2011), we observe a strong relationship between noise and signal correlations. During additional recordings collected during most recording sessions (for Monkey 1 illustrated here, n = 37 days with additional recordings), the monkey was rewarded for passively fixating the center of the monitor while Gabors with randomly interleaved orientations were flashed at the receptive field location (‘Stim 2’ location in Figure 1C). The presented orientations spanned the full range of stimulus orientations (12 equally spaced orientations from 0 to 330 degrees). We calculated the signal correlation for each pair of units based on their mean responses to each of the 12 orientations. We define the noise correlation for each pair of units as the average noise correlation for each orientation. The plot depicts signal correlation as a function of noise correlation across all recording sessions, binned into 8 equally sized sets of unit pairs. Error bars represent SEM. (B) The model reproduces the relationship between noise and signal correlations. Signal correlation is plotted as a function of noise correlation, binned into 20 equally sized sets of unit pairs (n = 2000 neurons), for each attentional modulation strength (green: least attended; yellow: most attended). The results were averaged over 50 tested orientations. (C) The slope of the relationship between noise and signal correlations (y-axis) decreases with increasing attentional modulation (x-axis). This suggests that noise is less aligned with signal correlation with increasing attentional modulation.
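      The quantities in panel (A) can be sketched as follows, assuming spike counts stored as an array of shape (orientations, trials, units). As the caption describes, signal correlation is computed from each unit's mean responses across the 12 orientations, noise correlation is the per-orientation trial-to-trial correlation averaged over orientations, and unit pairs are grouped into equally sized bins. The function names and array layout are hypothetical; the simulated units have independent Poisson noise only to check the computation.

```python
import numpy as np

def signal_and_noise_correlations(responses):
    """responses: (n_orientations, n_trials, n_units) spike counts -> (signal, noise) matrices."""
    r = np.asarray(responses, float)
    tuning = r.mean(axis=1)                          # mean response to each orientation
    signal_corr = np.corrcoef(tuning, rowvar=False)  # correlate tuning curves across units
    # Correlate trial-to-trial fluctuations within each orientation, then average
    noise_corr = np.mean([np.corrcoef(r[k], rowvar=False) for k in range(r.shape[0])], axis=0)
    return signal_corr, noise_corr

def pair_values(matrix):
    """One value per unit pair (upper triangle, excluding the diagonal)."""
    i, j = np.triu_indices(matrix.shape[0], k=1)
    return matrix[i, j]

def binned_relationship(noise, signal, n_bins=8):
    """Sort pairs by noise correlation and average both quantities in equally sized bins."""
    bins = np.array_split(np.argsort(noise), n_bins)
    return np.array([noise[b].mean() for b in bins]), np.array([signal[b].mean() for b in bins])

rng = np.random.default_rng(0)
theta = np.deg2rad(np.arange(0, 360, 30))               # 12 orientations, 0 to 330 degrees
phi = np.deg2rad(np.repeat([0, 90, 180, 270], 5))       # 20 units, 5 per preferred angle
rates = 5 + 10 * np.exp(np.cos(theta[:, None] - phi[None, :]) - 1)  # (12, 20)
counts = rng.poisson(rates[:, None, :], size=(12, 200, 20))
sc, nc = signal_and_noise_correlations(counts)
x_bins, y_bins = binned_relationship(pair_values(nc), pair_values(sc))
```

      With independent noise the binned curve should be flat; the positive relationship reported in the caption therefore reflects shared variability that is aligned with shared tuning.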

      2. Testing the hypothesis of the general decoder:

      2.1 In the data, the authors compare mainly the specific (stimulus) decoder and the monkey's choice decoder. The general stimulus decoder is only considered in fig. 3f, because data across multiple orientations are available only for the cued condition, and therefore the general and specific decoders cannot be compared for changes between cued and uncued. However, the hypothesized relation between mean correlations and performance should also be true within a fixed attention condition (cued), comparing sessions with larger vs. smaller correlation. In other words, if the hypothesis is correct, you should find that performance of the "most general" decoder (as in fig. 3f) correlates negatively with average noise correlations, across sessions, more so than the "most specific" decoder.

      We have added a new supplementary figure to the manuscript:

      Figure 3–figure supplement 1. Based on the electrophysiological data, the performance of the monkey’s decoder was more related to mean correlated variability than the performance of the specific decoder within each attention condition. (A) Within the cued attention condition, the performance of the monkey’s decoder was more related to mean correlated variability (left plot; correlation coefficient: n = 71 days, r = -0.23, p = 0.058) than the performance of the specific decoder (right plot; correlation coefficient: r = 0.038, p = 0.75). The correlation coefficients associated with the two decoders were significantly different from each other (Williams’ procedure: t = 3.8, p = 1.5 x 10^-4). Best fit lines plotted in gray. Data from both monkeys combined (Monkey 1 data shown in orange: n = 44 days; Monkey 2 data shown in purple: n = 27 days) with mean correlated variability z-scored within monkey. (B) The data within the uncued attention condition showed a similar pattern, with the performance of the monkey’s decoder more related to mean correlated variability (n = 69 days, r = -0.20, p = 0.14) than the performance of the specific decoder (r = 0.085, p = 0.51; Williams’ procedure: t = 2.0, p = 0.049). Conventions as in (A) (Monkey 1: n = 42 days – see Methods for data exclusions as in Figure 3C; Monkey 2: n = 27 days).
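      Williams' procedure compares two correlations that share a variable: here, mean correlated variability (j) correlated with monkey's-decoder performance (k) and with specific-decoder performance (h). A minimal sketch of the standard Williams' t statistic (df = n - 3) follows. It also needs the correlation between the two decoder performances (r_kh), which is not reported in this caption, so the value used in the example is a placeholder and the output is not meant to reproduce the reported t.

```python
import math
from scipy import stats

def williams_t(r_jk, r_jh, r_kh, n):
    """Williams' t for two dependent correlations that share variable j (df = n - 3).

    r_jk, r_jh: correlations of j with k and with h; r_kh: correlation between k and h.
    Returns (t, two-sided p).
    """
    det_r = 1 - r_jk**2 - r_jh**2 - r_kh**2 + 2 * r_jk * r_jh * r_kh  # determinant of 3x3 R
    r_bar = (r_jk + r_jh) / 2
    t = (r_jk - r_jh) * math.sqrt(
        (n - 1) * (1 + r_kh)
        / (2 * (n - 1) / (n - 3) * det_r + r_bar**2 * (1 - r_kh) ** 3)
    )
    p = 2 * stats.t.sf(abs(t), df=n - 3)
    return t, p

# r_jk, r_jh and n from panel (A); r_kh = 0.4 is a placeholder, not a reported value
t_example, p_example = williams_t(-0.23, 0.038, 0.4, 71)
```

      The test is needed because both correlations are computed on the same sessions, so an independent-samples comparison of two correlation coefficients would be inappropriate.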

      2.2 In figure 3f, a more straightforward and precise comparison is to use the stimulus decoders to predict the choice, and test whether the more specific or the more general can predict choices more accurately.

      We have added a new panel to Figure 3 (Figure 3G) that illustrates the results of this analysis comparing whether the specific or more-general decoders predict the monkey’s trial-by-trial choices more accurately:

      Figure 3… (G) The more general the decoder (x-axis), the better its performance predicting the monkey’s choices on the median changed orientation trials (y-axis; the proportion of leave-one-out trials in which the decoder correctly predicted the monkey’s decision as to whether the orientation was the starting orientation or the median changed orientation). Conventions as in (F) (see Methods for n values).
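      The leave-one-out logic behind this panel can be sketched as follows. The decoder type is not stated in this excerpt, so a ridge-regularised Fisher linear discriminant is used purely as a stand-in; responses are assumed to be an (n_trials, n_units) array with binary stimulus labels and binary choices. For a "general" decoder the training trials would pool several changed orientations; the sketch keeps a single binary label for simplicity, and all names are hypothetical.

```python
import numpy as np

def fit_lda(X, y, ridge=1e-3):
    """Fisher linear discriminant for labels y in {0, 1}; returns weights w and bias b."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance, lightly regularised so it is always invertible
    cov = np.cov(np.vstack([X0 - mu0, X1 - mu1]), rowvar=False) + ridge * np.eye(X.shape[1])
    w = np.linalg.solve(cov, mu1 - mu0)
    b = -w @ (mu0 + mu1) / 2
    return w, b

def loo_choice_accuracy(X, stimulus, choice, ridge=1e-3):
    """Train on stimulus labels without trial i, then check whether the decoder's
    output on trial i matches the monkey's choice on that trial."""
    X = np.asarray(X, float)
    stimulus = np.asarray(stimulus)
    n = len(X)
    hits = 0
    for i in range(n):
        keep = np.arange(n) != i
        w, b = fit_lda(X[keep], stimulus[keep], ridge)
        hits += int((X[i] @ w + b > 0) == bool(choice[i]))
    return hits / n

rng = np.random.default_rng(3)
stim = np.repeat([0, 1], 40)
X = rng.normal(0, 1, (80, 10)) + 2.0 * stim[:, None]  # well-separated synthetic classes
```

      The decoder is always trained on stimulus labels; only the scoring target changes from stimulus identity to the monkey's choice, which is what distinguishes this analysis from ordinary decoding accuracy.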

      The description of this new panel in the Results section is as below:

      “Further, the more general the decoder, the better it predicted the monkey’s trial-by-trial choices on the median changed orientation trials (Figure 3G).”

      The updated Methods section describing this new panel is as below:

      “For Figure 3G, we performed analyses similar to those for Figure 3F, in that we tested each stimulus decoder: ‘1 ori’ decoders (n = 8 decoders; 1 specific decoder for either the first, second, fourth, or fifth largest changed orientation, for each of the 2 monkeys), ‘2 oris’ decoders (n = 12 decoders; 1 decoder for each of the 6 combinations of 2 changed orientations, for each of the 2 monkeys), ‘3 oris’ decoders (n = 8 decoders; 1 decoder for each of the 4 combinations of 3 changed orientations, for each of the 2 monkeys), and ‘4 oris’ decoders (n = 2 decoders; 1 decoder for the 1 combination of 4 changed orientations, for each of the 2 monkeys). However, unlike in Figure 3F, where the performance of the stimulus decoders was compared to the performance of the monkey’s decoder on the median orientation-change trials, here we calculated the performance of the stimulus decoder when tasked with predicting the trial-by-trial choices that the monkey made on the median orientation-change trials. We plotted the proportion of leave-one-out trials in which each decoder correctly predicted the monkey’s choice as to whether the orientation was the starting orientation or the median changed orientation.”
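      The decoder counts in this Methods passage follow from choosing subsets of the four non-median changed orientations. A quick check, assuming five changed orientations ranked by size with the third (median) held out for testing, as the passage implies:

```python
from itertools import combinations

# Changed orientations ranked by size (1 = largest); rank 3, the median, is the test condition
TRAIN_RANKS = (1, 2, 4, 5)
N_MONKEYS = 2

def decoder_sets(k):
    """All training sets of k changed orientations available for one monkey."""
    return list(combinations(TRAIN_RANKS, k))

counts = {k: len(decoder_sets(k)) * N_MONKEYS for k in range(1, 5)}
```

      This yields 8, 12, 8 and 2 decoders for the ‘1 ori’ to ‘4 oris’ groups, matching the n values given above.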

      3. The main goal of the manuscript is to determine the impact of noise correlations on various decoding schemes. The figures however only show how decoding co-varies with correlations, but a direct, more causal analysis of the effect of correlations on decoding seems to be missing. Such an analysis can be obtained by comparing decoding on simultaneously recorded activity with decoding on trial-shuffled activity, in which noise-correlations are removed.
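      The trial-shuffle control the reviewers describe (not an analysis reported in the manuscript) can be sketched as follows, assuming responses stored as (conditions, trials, units). Each unit's trials are permuted independently within each condition, which preserves every unit's response distribution per condition while breaking trial-to-trial co-fluctuations. The function name is hypothetical.

```python
import numpy as np

def shuffle_trials(responses, rng):
    """Remove noise correlations by permuting each unit's trials independently per condition."""
    out = np.array(responses, dtype=float, copy=True)
    n_cond, n_trials, n_units = out.shape
    for c in range(n_cond):
        for u in range(n_units):
            out[c, :, u] = out[c, rng.permutation(n_trials), u]
    return out

rng = np.random.default_rng(1)
shared = rng.normal(size=(2, 500, 1))               # fluctuation shared by all units
resp = shared + 0.5 * rng.normal(size=(2, 500, 6))  # pairwise correlation of about 0.8
shuffled = shuffle_trials(resp, rng)
```

      Decoding the original and shuffled responses with the same decoder would then isolate the contribution of noise correlations to decoding performance.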

      We have added the following Discussion section to address this point:

      “The purpose of this study was to investigate the relationship between mean correlated variability and a general decoder. We made an initial test of the overarching hypothesis that observers use a general decoding strategy in feature-rich environments by testing whether a decoder optimized for a broader range of stimulus values better matched the decoder actually used by the monkeys than a specific decoder optimized for a narrower range of stimulus values. We purposefully did not make claims about the utility of correlated variability relative to hypothetical situations in which correlated variability does not exist in the responses of a group of neurons, as we suspect that this is not a physiologically realistic condition. Studies that causally manipulate the level of correlated variability in neuronal populations to measure the true physiological and behavioral effects of increasing or decreasing correlated variability levels, through pharmacological or genetic means, may provide important insights into the impact of correlated variability on various decoding strategies.”

      4. How different are the four different decoders (specific/monkey, cued/uncued)? It would be interesting to see how much they overlap. More generally, the authors should discuss the alternative that attention modulates also the readout/decoding weights, rather than or in addition to modulating V4 activity.

      We have added the following to the manuscript:

      “A fixed readout mechanism

      A prior study from our lab found that attention, rather than changing the neuronal weights of the observer’s decoder, reshaped neuronal population activity to better align with a fixed readout mechanism (Ruff & Cohen, 2019). To test whether the neuronal weights of the monkey’s decoder changed across attention conditions (attended versus unattended), Ruff and Cohen switched the neuronal weights across conditions, testing the stimulus information in one attention condition with the neuronal weights from the other. They found that even with the switched weights, the performance of the monkey’s decoder was still higher in the attended condition. The results of this study support the conclusion that attention reshapes neuronal activity so that a fixed readout mechanism can better read out stimulus information. In other words, differences in the performance of the monkey’s decoder across attention conditions may be due to differences in how well the neuronal activity aligns with a fixed decoder.

      Our study extends the findings of Ruff and Cohen to test whether that fixed readout mechanism is determined by a general decoding strategy. Our findings support the hypothesis that observers use a general decoding strategy in the face of changing stimulus and task conditions. Our findings do not exclude other potential explanations for the suboptimality of the monkey’s decoder, nor do they exclude the possibility that attention modulates decoder neuronal weights. However, our findings together with those of Ruff and Cohen shed light on why neuronal decoders are suboptimal in a manner that aligns the fixed decoder axis with the correlated variability axis (Ni et al., 2018; Ruff et al., 2018).”

      5. Quantifying the link between model and data:

      5.1 The text providing motivation for the model could be improved. The motivation used in the manuscript is, essentially, that the model allows one to extrapolate beyond the data (more stimuli, more repetitions, more neurons). The dangers of extrapolation beyond the range of the data are however well known. A model that extrapolates beyond existing data is useful to design new experiments and test predictions, but this is not done here. Because the manuscript is about information and decoding, a better motivation is the fact that this model takes an actual image as input, and produces tuning and covariance compatible with each other because they are constrained by an actual network that processes the input (as opposed to parametric models where tuning and covariance can be manipulated independently).

      We have modified the manuscript as below:

      “Here, we describe a circuit model that we designed to allow us to compare the specific and monkey’s decoders from our electrophysiological dataset to modeled ideal specific and general decoders. The primary benefit of our model is that it can take actual images as inputs and produce neuronal tuning and covariance that are compatible with each other because of constraints from the simulated network that processed the inputs (Huang et al., 2019). Parametric models in which tuning and covariance can be manipulated independently would not provide such constraints. In our model, the mean correlated variability of the population activity is restricted to very few dimensions, matching experimentally recorded data from visual cortex demonstrating that mean correlated variability occupies a low-dimensional subset of the full neuronal population space (Ecker et al., 2014; Goris et al., 2014; Huang et al., 2019; Kanashiro et al., 2017; Lin et al., 2015; Rabinowitz et al., 2015; Semedo et al., 2019; Williamson et al., 2016).”

      “Our study also demonstrates the utility of combining electrophysiological and circuit modeling approaches to studying neural coding. Our model mimicked the correlated variability and effects of attention in our physiological data. Critically, our model produced neuronal tuning and covariance based on the constraints of an actual network capable of processing images as inputs.”

      We have also removed the Results and Discussion text that suggested that the model allowed us to extrapolate beyond the data.

      5.2 The ring structure, and the orientation of correlations (Fig 2b) seem to be key ingredients of the model, but are they based on data, or ad-hoc assumptions?

      We have modified the manuscript to clarify this point, as below:

      “As the basis for our modeled general decoder, we first mapped the n-dimensional neuronal activity of our model in response to the full range of orientations to a 2-dimensional space. Because the neurons were tuned for orientation, we could map the n-dimensional population responses to a ring (Figure 2B, C). The orientation of correlations (the shape of each color cloud in Figure 2B) was not an assumed parameter, and illustrates the outcome of the correlation structure and dimensionality modeled by our data. In Figure 2B, we can see that the fluctuations along the radial directions are much larger than those along other directions for a given orientation. This is consistent with the low-dimensional structure of the modeled neuronal activity. In our model, the fluctuations of the neurons, mapped to the radial direction on the ring, were more elongated in the unattended state (Figure 2B) than in the attended state (Figure 2C).”
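      The text does not state how the n-dimensional activity was projected onto the ring. A population-vector projection is one common choice and is used here purely as an assumption; it illustrates why a shared gain fluctuation appears along the radial direction. Because the 12 test orientations span 0 to 330 degrees, the sketch treats stimulus angles as points on a full circle (for 180-degree-periodic orientation one would double the angles first). All names are hypothetical.

```python
import numpy as np

def population_vector(rates, preferred_rad):
    """Map an n-unit response vector to 2-D: rates weighted by unit vectors at preferred angles."""
    rates = np.asarray(rates, float)
    return np.array([rates @ np.cos(preferred_rad), rates @ np.sin(preferred_rad)])

def radial_tangential(point, mean_point):
    """Split a fluctuation around mean_point into radial and tangential components."""
    radial_dir = mean_point / np.linalg.norm(mean_point)
    tangential_dir = np.array([-radial_dir[1], radial_dir[0]])
    delta = point - mean_point
    return float(delta @ radial_dir), float(delta @ tangential_dir)

phi = np.linspace(0, 2 * np.pi, 16, endpoint=False)  # preferred angles of 16 units
stim = np.deg2rad(60)
mean_rates = 1 + np.exp(np.cos(phi - stim))          # bell-shaped tuning around the stimulus
center = population_vector(mean_rates, phi)
# A 20% gain increase shared by all units, i.e. a one-dimensional fluctuation
radial, tangential = radial_tangential(population_vector(1.2 * mean_rates, phi), center)
```

      A multiplicative gain change scales the population vector without rotating it, so it moves the point purely outward along the radius; fluctuations confined to such a low-dimensional gain axis would elongate each cloud radially, as described for Figure 2B.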

      5.3 In the model, the specific decoder is quite strongly linked to correlated variability and the improvement of the general decoder is clear but incremental (0.66 vs 0.83), whereas in the data there is really no correlation at all (Fig 3c). This is a bit problematic because the authors begin by stating that specific decoders cannot explain the link between noise correlations and accuracy, but their specific decoder clearly shows a link.

      We appreciate this point and have modified the manuscript as below:

      “Indeed, we found that just as the performance of the physiological monkey’s decoder was more strongly related to mean correlated variability than the performance of the physiological specific decoder (Figure 3C; see Figure 3–figure supplement 1 for analyses per attention condition), the performance of the modeled general decoder was more strongly related to mean correlated variability than the performance of the modeled specific decoder (Figure 3D). We modeled much stronger relationships to correlated variability (Figure 3D) than observed with our physiological data (Figure 3C). We observed that the correlation with specific decoder performance was significant with the modeled data but not with the physiological data. This is not surprising as we saw attentional effects, albeit small ones, on specific decoder performance with both the physiological and the modeled data (Figure 3A, B). Even small attentional effects would result in a correlation between decoder performance and mean correlated variability with a large enough range of mean correlated variability values. It is possible that with enough electrophysiological data, the performance of the specific decoder would be significantly related to correlated variability, as well. As described above, our focus is not on whether the performance of any one decoder is significantly correlated with mean correlated variability, but on which decoder provides a better explanation of the frequently observed relationship between performance and mean correlated variability. The performance of the general decoder was more strongly related to mean correlated variability than the performance of the specific decoder.”

      “Our results suggest that the relationship between behavior and mean correlated variability is more consistent with observers using a more general strategy that employs the same neuronal weights for decoding any stimulus change.”

      6. General decoder: Some parts of the text (e.g., Line 60, Line 413) refer to a decoder that accounts for discrimination along different stimulus dimensions (e.g., different values of orientation, or different colors of the visual input). But the results of the manuscript concern a general decoder for multiple values along a single stimulus dimension. The disconnect should be discussed, and the relation between these two scenarios explained.

      We have modified the manuscript as below:

      “Here, we report the results of an initial test of this overarching hypothesis, based on a single stimulus dimension. We used a simple, well-studied behavioral task to test whether a more-general decoder (optimized for a broader range of stimulus values along a single dimension) better explained the relationship between behavior and mean correlated variability than a more-specific decoder (optimized for a narrower range of stimulus values along a single dimension). Specifically, we used a well-studied orientation change-detection task (Cohen & Maunsell, 2009) to test whether a general decoder for the full range of stimulus orientations better explained the relationship between behavior and mean correlated variability than a specific decoder for the orientation change presented in the behavioral trial at hand.

      This test based on a single stimulus dimension is an important initial test of the general decoder hypothesis because many of the studies that found that performance increased when mean correlated variability decreased used a change-detection task…”

      “We performed this initial test of the overarching general decoder hypothesis in the context of a change-detection task along a single stimulus dimension because this type of task was used in many of the studies that reported a relationship between perceptual performance and mean correlated variability (Cohen & Maunsell, 2009; 2011; Herrero et al., 2013; Luo & Maunsell, 2015; Mayo & Maunsell, 2016; Nandy et al., 2017; Ni et al., 2018; Ruff & Cohen, 2016; 2019; Verhoef & Maunsell, 2017; Yan et al., 2014; Zénon & Krauzlis, 2012). This simple and well-studied task provided an ideal initial test of our general decoder hypothesis.

      This initial test of the general decoder hypothesis suggests that a more general decoding strategy may explain observations in studies that use a variety of behavioral and stimulus conditions.”

      “This initial study of the general decoder hypothesis tested this idea in the context of a visual environment in which stimulus values only changed along a single dimension. However, our overarching hypothesis is that observers use a general decoding strategy in the complex and feature-rich visual scenes encountered in natural environments. In everyday environments, visual stimuli can change rapidly and unpredictably along many stimulus dimensions. The hypothesis that such a truly general decoder explains the relationship between perceptual performance and mean correlated variability is suggested by our finding that the modeled general decoder for orientation was more strongly related to mean correlated variability than the modeled specific decoder (Figure 3D). Future tests of a general decoder for multiple stimulus features would be needed to determine if this decoding strategy is used in the face of multiple changing stimulus features. Further, such tests would need to consider alternative hypotheses for how sensory information is decoded when observing multiple aspects of a stimulus (Berkes et al., 2009; Deneve, 2012; Lorteije et al., 2015). Studies that use complex or naturalistic visual stimuli may be ideal for further investigations of this hypothesis.”

      7. Some statements in the discussion, such as line 354 ("the relationship between behavior and mean correlated variability is explained by the hypothesis that observers use a general strategy"), should be qualified: the authors clearly show that the general decoder amplifies the relationship, but in their own data the relationship already exists with a specific decoder.

      We have modified the manuscript as below:

      “Our results suggest that the relationship between behavior and mean correlated variability is more consistent with observers using a more general strategy that employs the same neuronal weights for decoding any stimulus change.”

      “Together, these results support the hypothesis that observers use a more general decoding strategy in scenarios that require flexibility to changing stimulus conditions.”

      “This initial test of the general decoder hypothesis suggests that a more general decoding strategy may explain observations in studies that use a variety of behavioral and stimulus conditions.”

      8. Low-Dimensionality, beginning of Introduction and end of Discussion: experimentally, cortical activity is low-dimensional, and the proposed model captures that. But some of the reviewers did not understand the argument offered for why this matters for the relation between average correlations and performance. It seems that the dimensionality of the population covariance is not relevant. The point instead is that a change in the amplitude of fluctuations along the f'f' direction necessarily impacts the performance of a "specific" decoder, whereas changes along all other dimensions can be accounted for by the appropriate weights of the "specific" decoder. On the other hand, changes in fluctuation strength along multiple directions may impact the performance of the "general" decoder.

      We have modified the manuscript as below:

      “These observations comprise a paradox because changes in this simple measure should have a minimal effect on information coding. Recent theoretical work shows that neuronal population decoders that extract the maximum amount of sensory information for the specific task at hand can easily ignore mean correlated noise (Kafashan et al., 2021; Kanitscheider et al., 2015b; Moreno-Bote et al., 2014; Pitkow et al., 2015; Rumyantsev et al., 2020; for review, see Kohn et al., 2016). Decoders for the specific task at hand can ignore mean correlated variability because it does not corrupt the dimensions of neuronal population space that are most informative about the stimulus (Moreno-Bote et al., 2014).”

      “Our results address a paradox in the literature. Electrophysiological and theoretical evidence supports that there is a relationship between mean correlated variability and perceptual performance (Abbott & Dayan, 1999; Clery et al., 2017; Haefner et al., 2013; Jin et al., 2019; Ni et al., 2018; Ruff & Cohen, 2019; reviewed by Ruff et al., 2018). Yet, a specific decoding strategy in which different sets of neuronal weights are used to decode different stimulus changes cannot easily explain this relationship (Kafashan et al., 2021; Kanitscheider et al., 2015b; Moreno-Bote et al., 2014; Pitkow et al., 2015; Rumyantsev et al., 2020; reviewed by Kohn et al., 2016). This is because specific decoders of neuronal population activity can easily ignore changes in mean correlated noise (Moreno-Bote et al., 2014).”
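      The reviewers' point about the f' direction, and the statement that specific decoders "can easily ignore" mean correlated noise, can be illustrated with linear Fisher information, f'ᵀΣ⁻¹f', which sets the sensitivity of the best linear decoder for one particular stimulus change. The sketch below is illustrative only and is not the authors' model: the tuning-curve slopes f', the independent unit-variance noise, and the single shared fluctuation mode of strength eps are all hypothetical choices. It covers only the "specific" side of the argument; a "general" decoder with fixed weights is not modeled here.

```python
import numpy as np


def linear_fisher_information(fprime: np.ndarray, cov: np.ndarray) -> float:
    """Linear Fisher information f'^T Sigma^{-1} f': the sensitivity of the
    best linear (task-specific) decoder for a small stimulus change."""
    return float(fprime @ np.linalg.solve(cov, fprime))


def shared_noise_effect(n_neurons: int = 50, eps: float = 5.0, seed: int = 0):
    """Compare information with (a) independent noise only, (b) an added shared
    fluctuation orthogonal to f', and (c) an added shared fluctuation along f'.
    All inputs are hypothetical; this is not fitted to any data."""
    rng = np.random.default_rng(seed)
    fprime = rng.normal(size=n_neurons)  # hypothetical tuning-curve slopes
    base_cov = np.eye(n_neurons)         # hypothetical independent, unit-variance noise

    # Unit vector orthogonal to f' (one Gram-Schmidt step).
    u = rng.normal(size=n_neurons)
    u -= (u @ fprime) / (fprime @ fprime) * fprime
    u /= np.linalg.norm(u)

    # Unit vector aligned with f' (the information-limiting direction).
    v = fprime / np.linalg.norm(fprime)

    info_base = linear_fisher_information(fprime, base_cov)
    info_orth = linear_fisher_information(fprime, base_cov + eps * np.outer(u, u))
    info_along = linear_fisher_information(fprime, base_cov + eps * np.outer(v, v))
    return info_base, info_orth, info_along


if __name__ == "__main__":
    base, orth, along = shared_noise_effect()
    print(f"independent noise only:  {base:.3f}")
    print(f"shared noise orthogonal: {orth:.3f}")   # equals the baseline
    print(f"shared noise along f':   {along:.3f}")  # baseline / (1 + eps)
```

      With this identity-plus-rank-one covariance, the shared fluctuation orthogonal to f' leaves the specific decoder's information unchanged, whereas the same amount of shared variance along f' divides it by (1 + eps). This is the asymmetry the reviewers describe, in the spirit of Moreno-Bote et al. (2014).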

    1. Author Response:

      Reviewer #1 (Public Review):

      The introduction felt a bit short. Early on, I was hoping for a hint at which biotic and abiotic factors UV pigmentation could be important for, and how this might be important for adaptation. A bit more on previous work on the genetics of UV pigmentation could be added too. I think a bit more on sunflowers more generally (what petiolaris is, where natural pops are distributed, etc.) would be helpful. This seems more relevant than its status as an emoji, for example.

      We had opted to provide some of the relevant background in the corresponding sections of the manuscript, but agree that it would be beneficial to expand the introduction. In the revised version of the manuscript, we have modified the introduction and the first section of Results and Discussion to include more information about wild sunflowers, possible adaptive functions of floral UV patterns, and previous work on the genetic basis of floral UV patterning. More generally, we have strived to provide more background information throughout the manuscript.

      The authors present the % of Vp explained by the Chr15 SNP. Perhaps I missed it, but it might be nice to also present the narrow sense heritability and how much of Va is explained.

      Narrow-sense heritability for LUVp is extremely high in our H. annuus GWAS population; four different software packages [EMMAX (Kang et al., Nat Genet 2010), GEMMA (Zhou and Stephens, Nat Genet 2012), GCTA (Yang et al., Am J Hum Genet 2011) and BOLT-LMM (Loh et al., Nat Genet 2015)] provided h2 estimates of ~1. While it is possible that these estimates are somewhat inflated by the presence of a single locus of extremely large effect, all individuals in this population were grown at the same time under the same conditions, and limited environmental effects would therefore be expected. The percentage of additive variance explained by HaMYB111 therefore appears to be equal to the percentage of phenotypic variance (~62%).

      We have included details in the Methods section – Genome-wide association mapping, and added this information to the relevant section of the main text:

      “The chromosome 15 SNP with the strongest association with ligule UV pigmentation patterns in H. annuus (henceforth “Chr15_LUVp SNP”) explained 62% of the observed phenotypic and additive variation (narrow-sense heritability for LUVp in this dataset is ~1).”
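      The step from "62% of phenotypic variance" to "62% of additive variance" is a simple ratio. A minimal worked version, assuming the SNP's effect is purely additive and taking the h2 ≈ 1 estimate at face value (the response above notes it may be somewhat inflated), with V_SNP the variance explained by the Chr15_LUVp SNP, V_A the additive variance, and V_P the phenotypic variance:

$$\frac{V_{\mathrm{SNP}}}{V_A} = \frac{V_{\mathrm{SNP}}/V_P}{V_A/V_P} = \frac{0.62}{h^2} \approx \frac{0.62}{1} = 0.62$$

      If the true h2 were lower than 1, the share of additive variance attributable to the SNP would be correspondingly larger than 62%.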

      A few lines of discussion about why the Chr15 allele might be observed at only low frequencies in petiolaris I think would be of interest - the authors appear to argue that the same abiotic factors may be at play in petiolaris, so why don't we see this allele at frequencies higher than 2%? Is it recent? Geographically localized?

      That is a very interesting observation, and we currently do not have enough data to provide a definitive answer to why that is. From GWAS, HaMYB111 does not seem to play a measurable role in controlling variation for LUVp in H. petiolaris; even when we repeat the GWAS with MAF > 1%, so that the Chr15_LUVp SNP would be included in the analysis, there is no significant association between that SNP and LUVp (the significant association on chr. 15 seen in the Manhattan plot for H. petiolaris is ~20 Mbp downstream of HaMYB111). The rarity of the L allele in H. petiolaris could complicate detection of a GWAS signal; on the other hand, the few H. petiolaris individuals carrying the L allele have, on average, only marginally larger LUVp than the rest of the population (LL = 0.32 allele).

      The two most likely explanations for the low frequencies of the L allele in H. petiolaris are differences in alleles, or their effect, between H. annuus and H. petiolaris; or, as suggested by the reviewer, a recent introgression. In H. annuus, the Chr15_LUVp SNP is likely not the actual causal polymorphism affecting HaMYB111 activity, but is only in LD with it (or them); this association might be absent in H. petiolaris alleles. An alternative possibility is that downstream differences in the genetic network regulating flavonol glycosides biosynthesis mask the effect of different HaMYB111 alleles.

      H. annuus and H. petiolaris hybridize frequently across their range, so this could be a recent introgression that has not established itself; alternatively, physiological differences in H. petiolaris could make the L allele less advantageous, so the introgressed allele is simply being maintained by drift (or recurring hybridization). Further analysis of genetic and functional diversity at HaMYB111 in H. petiolaris will be required to differentiate between these possibilities.

      We have added a few sentences highlighting some of these possible explanations at the end of the main text of the manuscript, which now reads:

      “Despite a more limited range of variation for LUVp, a similar trend (larger UV patterns in drier, colder environments) is present also in H. petiolaris (Figure 4 – figure supplement 4). Interestingly, while the L allele at Chr_15 LUVp SNP is present in H. petiolaris (Figure 1 – figure supplement 2), it is found only at a very low frequency, and does not seem to significantly affect floral UV patterns in this species (Figure 2a). This could represent a recent introgression, since H. annuus and H. petiolaris are known to hybridize in nature (Heiser, 1947, Yatabe et al., 2007). Alternatively, the Chr_15 LUVp SNP might not be associated with functional differences in HaMYB111 in H. petiolaris, or differences in genetic networks or physiology between H. annuus and H. petiolaris could mask the effect of this allele, or limit its adaptive advantage, in the latter species.“

      Page 14: It's unclear to me why there is any need to discretize the LUVp values for the analyses presented here. Seems like it makes sense to either 1) analyze by genotype of plant at the Chr15 SNP, if known, or 2) treat it as a continuous variable and analyze accordingly.

      We designed our experiment to be a comparison between three well-defined phenotypic classes, to reduce the experimental noise inherent to pollinator visitation trials. As a consequence, intermediate phenotypic classes (0.3 < LUVp < 0.5 and 0.8 < LUVp < 0.95) are not represented in the experiment, and therefore we believe that analyzing LUVp as a continuous variable would be less appropriate in this case. In the revised manuscript, we have provided a modified Figure 4 – figure supplement 1 in which individual data points are shown (colour-coded by pollinator type), as well as fitted lines showing the general trend across the data.

      The individuals in pollinator visitation experiments were not genotyped for the Chr15_LUVp SNP; while having that information might provide a more direct link between HaMYB111 and pollinator visitation rates, our main interest in this experiment was to test the possible adaptive effects of variation in floral UV pigmentation.

      Page 14: I'm not sure you can infer selection from the % of plants grown in the experiment unless the experiment was a true random sample from a larger metapopulation that is homogenous for pollinator preference. In addition, I thought one of the Ashman papers had actually argued for intermediate level UV abundance in the presence of UV?

      We have removed mentions of selection from the sentence - while the 110 populations included in our 2019 common garden experiment were selected to represent the whole range of H. annuus, we agree that the pattern we observe is at best suggestive. We have, however, kept a modified version of the sentence in the revised version of the manuscript, since we believe that is an interesting observation. The sentence now reads:

      “Pollination rates are known to be yield-limiting in sunflower (Greenleaf and Kremen, 2006), and a strong reduction in pollination could therefore have a negative effect on fitness; consistent with this, plants with very small LUVp values were rare (~1.5% of individuals) in our common garden experiment, which was designed to provide a balanced representation of the natural range of H. annuus.”. (new lines 373-378)

      It is correct that Koski et al., Nature Plants 2015 found intermediate UV patterns to increase pollen viability in excised flowers of Argentina anserina exposed to artificial UV radiation. However, the authors also remark that larger UV patterns would probably be favoured in natural environments, in which UV radiation would be more than two times higher than in their experimental setting. Additionally, when using artificial flowers, they found that pollen viability increased linearly with the size of floral UV pattern.

      More generally, as we discuss later on in the manuscript, the pollen protection mechanism proposed in Koski et al., Nature Plants 2015 is unlikely to be as important in sunflower inflorescences, which are much flatter than the bowl-shaped flowers of A. anserina; consistent with this, and contrary to what was observed for A. anserina, we found no correlation between UV radiation and floral UV patterns in wild sunflowers (Figure 4c).

      I would reduce or remove the text around L316-321. If there's good a priori reason to believe flower heat isn't a big deal (L. 323) and the experimental data back that up, why add 5 lines talking up the hypothesis?

      We had fairly strong reasons to believe temperature might play an important role in floral UV pattern diversity: a link between flower temperature and UV patterns has been proposed before (Koski et al., Current Biol 2020); a very strong correlation exists between temperature and LUVp in our dataset; and, perhaps more importantly, inflorescence temperature is known to have a major effect on pollinator attraction (Atamian et al., Science 2016; Creux et al., New Phytol 2021). While it is known that UV radiation is not particularly energetic, we didn’t mean line 323 to imply that we were sure a priori that there wouldn’t be any effect of UV patterns on inflorescence temperature.

      In the revised manuscript, we have re-organized that section and provided the information reported in line 323 (UV radiation accounts for only 3-7% of the total radiation at earth level) before the experimental results, to clarify what our thought process was in designing those experiments. The paragraph now reads:

      “By absorbing more radiation, larger UV bullseyes could therefore contribute to increasing the temperature of sunflower inflorescences, and their attractiveness to pollinators, in cold climates. However, UV wavelengths represent only a small fraction (3-7%) of the solar radiation reaching the Earth's surface (compared to >50% for visible wavelengths), and might therefore not provide sufficient energy to significantly warm up the ligules (Nunez et al., 1994). In line with this observation, different levels of UV pigmentation had no effect on the temperature of inflorescences or individual ligules exposed to sunlight (Figure 4e-g; Figure 4 – figure supplement 3).”

      Page 17: The discussion of flower size is interesting. Is there any phenotypic or genetic correlation between LUVP and flower size?

      This is a really interesting question! There is no obvious genetic correlation between LUVp and flower size: in GWAS, HaMYB111 is not associated with any of the floral characteristics we measured (flowerhead diameter; disk diameter; ligule length; ligule width; relative ligule size; see Todesco et al., Nature 2020). There is also no significant association between ligule length and LUVp (R^2 = 0.0024, P = 0.1282), and only a very weak positive association between inflorescence size and LUVp (R^2 = 0.0243, P = 0.00013; see attached figure). There is, however, a stronger positive correlation between LUVp and disk size (the disk being the central part of the sunflower inflorescence, composed of the fertile florets; R^2 = 0.1478, P = 2.78 × 10^-21), and as a consequence a negative correlation between LUVp and relative ligule size (that is, the length of the ligule relative to the diameter of the whole inflorescence; R^2 = 0.1216, P = 1.46 × 10^-17). This means that, given inflorescences of the same size, plants with large LUVp values will tend to have smaller ligules and larger disks. Since the disk of sunflower inflorescences is uniformly UV-absorbing, this would further increase the size of the UV-absorbing region in these inflorescences.

      While it is tempting to speculate that this might be connected with regulation of transpiration (meaning that plants with larger LUVp further reduce transpiration from ligules by having smaller ligules; relative ligule size is also positively correlated with summer humidity, R^2 = 0.2536, P = 2.86 × 10^-5), there are many other fitness-related factors that could determine inflorescence size, and disk size in particular (seed size, florets/seed number...). Additionally, in common garden experiments, flowerhead size (and plant size in general) is affected by flowering time, which is also one of the reasons why we use LUVp to measure floral UV patterns instead of absolute measurements of bullseye size; in a previous work from our group in Helianthus argophyllus, size measurements for inflorescence and UV bullseye mapped to the same locus as flowering time, while genetic regulation of LUVp was independent of flowering time (Moyers et al., Ann Bot 2017). Flowering time in H. annuus is known to be strongly affected by photoperiod (Blackman et al., Mol Ecol 2011), meaning that the flowering time we measured in Vancouver might not reflect the exact flowering time in the populations of origin of those plants, with consequences for inflorescence size.

      In summary, there is an interesting pattern of concordance between floral UV pattern and some aspects of inflorescence morphology, but we think it would be premature to draw any inference from them. Measurements of inflorescence parameters in natural populations would be much more informative in this respect.

      Reviewer #2 (Public Review):

      The genetic analysis is rigorously conducted with multiple Helianthus species and accessions of H. annuus. The same QTL was identified in two Helianthus species, and fine-mapped to promoter regions of HaMyb111.

      While there is a significant association at the beginning of chr. 15 in the GWAS for H. petiolaris petiolaris, we should clarify that that peak is unfortunately ~20 Mbp away from HaMYB111. While it is not impossible that the difference is due to reference biases in mapping H. petiolaris reads to the cultivated H. annuus genome, the most conservative explanation is that those two QTL are unrelated. We have clarified this in the legend to Fig. 2 in the revised manuscript.

      The allelic variation of the TF was carefully mapped in many populations and accessions. Flavonol glycosides were found to correlate spatially and developmentally in ligules and correlate with Myb111 transcript abundances, and a downstream flavonoid biosynthetic gene. Heterologous expression in Arabidopsis in Atmyb12 mutants showed HaMyb111 to be able to regulate flavonol glycoside accumulations, albeit with different molecules than those that accumulate in Helianthus. Several lines of evidence are consistent with transcriptional regulation of myb111 accounting for the variation in bullseye size.

      Functional analysis examined three possible functional roles, in pollinator attraction, thermal regulation of flowers, and water loss in excised flowers (ligules?), providing support for the first and last, but not the second possible functions, confirming the results of previous studies on the pollinator attraction and water loss functions for flavonol glycosides. The thermal imaging work of dawn-exposed flower heads provided an elegant falsification of the temperature regulation hypothesis. Biogeographic clines in bullseye size correlated with temperature and humidity clines, providing a confirmation of the hypothesis posed by Koski and Ashman about the patterns being consistent with Gloger's rule, and historical trends from herbaria collections over climate change and ozone depletion scenarios. The work hence represents a major advance from Moyers et al. 2017's genetic analysis of bullseyes in sunflowers, and confirms the role established in Petunia for this Myb TF for flavonoid glycoside accumulations, in a new tissue, the ligule.

      Thank you. We have specified in the legend of Fig. 4i of the revised manuscript that desiccation was measured in individual detached ligules, and added further details about the experiment in the Methods section.

      While there is a correlation between pigmentation and temperature/humidity in our dataset, it goes in the opposite direction to what would be expected under Gloger’s rule; that is, we see stronger pigmentation in drier/colder environments, contrary to what is generally observed in animals. This is also contrary to what was observed in Koski and Ashman, Nature Plants 2015, where the authors found that floral UV pigmentation increased at lower latitudes and higher levels of UV radiation. While possibly rarer, such “anti-Gloger” patterns have been observed in plants before (Lev-Yadun, Plant Signal Behav 2016).

      Weakness: The authors were not able to confirm their inferences about myb111 function through direct manipulations of the locus in sunflower.

      That is unfortunately correct. Reliable and efficient transformation of cultivated sunflower (much less of wild sunflower species) has eluded the sunflower community (including our laboratories) so far – see for example discussion on the topic in Lewi et al. Agrobacterium protocols 2016, and Sujatha et al. PCTOC 2012. We had therefore to rely on heterologous complementation in Arabidopsis; while this approach has limitations, we believe that its results, given also the similarity in expression patterns between HaMYB111 and AtMYB111, and in combination with the other experiments reported in our manuscript, make a convincing case that HaMYB111 regulates flavonol glycosides accumulation in sunflower ligules.

      Given that the flavonol glycosides that accumulate in Helianthus are different from those regulated when the gene is heterologously expressed in Arabidopsis, the biochemical function of Hamyb111, while quite reasonable, is not completely watertight. The flavonol glycosides are not fully characterized (only MS/MS data are provided) and named only with cryptic abbreviations in the main figures.

      We believe that the fact that expression of HaMYB111 in the Arabidopsis myb111 mutant reproduces the very same pattern of flavonol glycosides accumulation found in wild type Col-0 is proof that its biochemical function is the same as that of the endogenous AtMYB111 gene; that is, HaMYB111 induces expression of the same genes involved in flavonol glycosides biosynthesis in Arabidopsis. Differences in function between HaMYB111 and AtMYB111 would have resulted in different flavonol profiles between wild type Col-0 and 35S::HaMYB111 myb111 lines. It should be noted that the known direct targets of AtMYB111 in Arabidopsis are genes involved in the production of the basic flavonol aglycone (Stracke et al., Plant J 2007). Differences in flavonol glycoside profiles between the two species are likely due to broader differences between the genetic networks regulating flavonol biosynthesis: additional layers of regulation of the genes targeted by MYB111, or differential regulation (or presence/absence variation) of genes controlling downstream flavonol glycosylation and conversion between different flavonols.

      In the revised manuscript, we have added the full names of all identified peaks to the legend of Figures 3a,b,e.

      This and the differences in metabolite accumulations between Arabidopsis and Helianthus become a bit problematic for the functional interpretations. And here the authors may want to re-read Gronquist et al. 2002: PNAS as a cautionary tale about inferring function from the spatial location of metabolites. In this study, the Eisner/Meinwald team discovered that embedded in the UV-absorbing floral nectar guides, amongst the expected array of flavonoid glycosides, were isoprenylated phloroglucinols, which have both UV-absorbing and herbivore defensive properties. Hence the authors may want to re-examine some of the other unidentified metabolites in the tissues of the bullseyes, including the caffeoyl quinic acids, for alternative functional hypotheses for their observed variation in bullseye size (e.g., herbivore defense of ligules).

      This is a good point, and we have included a more explicit mention of the possible role of caffeoyl quinic acid (CQA) as a UV pigment in the main text, and have highlighted at the end of the manuscript other possible factors that could contribute to variation for floral UV patterns in wild sunflowers.

      We should note, however, that CQA plays a considerably smaller role than flavonols in explaining UV absorbance in UV-absorbing (parts of) sunflower ligules, and the difference in abundance with respect to UV-reflecting (parts of) ligules is much less obvious than for flavonols (height of the absorbance peak is reduced only 2-3 times in UV-reflecting tissues for CQA, vs. 7-70 fold reductions for individual quercetin glycosides). Therefore, flavonols are clearly the main pigment responsible for UV patterning in ligules. This is in contrast with the situation for Hypericum calycinum reported in Gronquist et al., PNAS 2002, where dearomatized isoprenylated phloroglucinols (DIPs) are much more abundant than flavonols in most floral tissues, including petals. The localization of DIPs accumulation, in reproductive organs and on the abaxial (“lower”) side of the petals (so that they would be exposed when the flower is closed), is also more consistent with a role in prevention of herbivory; no UV pigmentation is found on the adaxial (“upper”) side of the petals in this species, where it would be expected if it played a role in pollinator attraction.

      The hypotheses regarding a role for the flavonoid glycosides regulated by Myb111 expression in transpirational mitigation and hence conferring a selective advantage under high temperatures and low and high humidities, are not strongly supported by the data provided. The water loss data from excised flowers (or ligules-can't tell from the methods descriptions) is not equivalent to measures of transpiration rates (the stomatal controlled release of water), which are better performed with intact flowers by porometry or other forms of gas-exchange measures. Excised tissues tend to have uncontrolled stomatal function, and elevated cuticular water loss at damaged sites. The putative fitness benefits of variable bullseye size under different humidity regimes, proposed to explain the observed geographical clines in bullseye size remain untested.

      We have clarified in the text and Methods section that the desiccation experiments were performed on detached ligules. We agree that the results of these experiments do not constitute a direct proof that UV patterns/flavonol levels have an impact on plant fitness under different humidities in the wild; our aim was simply to provide a plausible physiological explanation for the correlation we observe between floral UV patterns and relative humidity. However, we do believe they are strongly suggestive of a role for floral flavonol/UV patterns in regulating transpiration, which is consistent with previous observations that flowers are a major source of transpiration in plants (Galen et al., Am Nat 2000, and other references in the manuscript). As suggested also by other reviewers, we have softened our interpretation of these results to clarify that they are suggestive, but not proof, of a connection between floral UV patterns, ligule transpiration and environmental humidity levels.

      “While desiccation rates are only a proxy for transpiration in field conditions (Duursma et al. 2019, Hygen et al. 1951), and other factors might affect ligule transpiration in this set of lines, this evidence (strong correlation between LUVp and summer relative humidity; known role of flavonol glycosides in regulating transpiration; and correlation between extent of ligule UV pigmentation and desiccation rates) suggests that variation in floral UV pigmentation in sunflowers is driven by the role of flavonol glycosides in reducing water loss from ligules, with larger floral UV patterns helping prevent drought stress in drier environments.” (new lines 462-469)

      Detached ligules were chosen to avoid confounding the results, should differences between lines in the physiology of the rest of the inflorescence/plant also affect rates of water loss. Desiccation/water loss measurements were performed for consistency with the experiments reported in Nakabayashi et al., Plant J. 2014, in which the effects of flavonol accumulation (through overexpression of AtMYB12) on water loss/drought resistance were first reported. It should also be noted that the use of detached organs to study the effect of desiccation on transpiration, water loss and drought responses is common in the literature (see for example Hygen, Physiol Plant 1951; Aguilar et al., J Exp Bot 2000; Chen et al., PNAS 2011; Egea et al., Sci Rep 2018; Duursma et al., New Phytol 2019, among others). While removing the ligules creates a more stressful/artificial situation, mechanical factors are likely to affect all ligules and leaves in the same way, and we can see no obvious reason why that would affect the small LUVp group more than the large LUVp group (individuals in the two groups were selected to represent several geographically unrelated populations).

      We have added some of the aforementioned references to the main text and Methods section of the revised manuscript to support our use of this experimental setup.

      Alternative functional hypotheses for the observed variation in bullseye size, such as roles in herbivore resistance or floral volatile release, could also be mentioned in the Discussion. Are the large ligules involved in floral scent release?

      We have added sentences to the Results and Discussion and the Conclusions sections of the revised manuscript to explore possible additional factors that could influence patterns of UV pigmentation across sunflower populations, including resistance to herbivory and floral volatiles. While some work has been done to characterize floral volatiles in sunflower (e.g. Etievant et al. J. Agric. Food Chem; Pham-Delegue et al. J. Chem. Ecol. 1989), to our knowledge the role of ligules in their production has not been investigated.

      In the revised manuscript, the section “A dual role for floral UV pigmentation” now includes the sentences:

      “Although pollinator preferences in this experiment could still be affected by other unmeasured factors (nectar content, floral volatiles), these results are consistent with previous results showing that floral UV patterns play a major role in pollinator attraction (Horth et al., 2014, Koski and Ashman, 2014, Rae and Vamosi, 2013, Sheehan et al., 2016).” (new lines 378-381)

      And the Conclusions section includes the sentence:

      “It should be noted that, while we have examined some of the most likely factors explaining the distribution of variation for floral UV patterns in wild H. annuus across North America, other abiotic factors could play a role, as well as biotic ones (e.g. the aforementioned differences in pollinator assemblages, or a role of UV pigments in protection from herbivory (Gronquist et al., 2001)).” (new lines 540-544)

      Reviewer #3 (Public Review):

      Todesco et al undertake an ambitious study to understand UV-absorbing variation in sunflower inflorescences, which often, but not always display a "bullseye" pattern of UV-absorbance generated by ligules of the ray flowers. [...] I think this manuscript has high potential impact on science on both of these fronts.

      Thank you! We are aware that our experiments do not provide a direct link between UV patterns and fitness in natural populations (although we think they are strongly suggestive) and that, as also pointed out by other reviewers, there are other possible (unmeasured) factors that could explain, or contribute to explaining, the patterns we observed. In the revised manuscript we have better characterized the aims and interpretation of our desiccation experiment, and modified the main text to acknowledge other possible factors affecting pollination preferences (nectar production, floral volatiles) and variation for floral UV patterns in H. annuus (pollinator assemblages, resistance to herbivory).

    1. Author Response

      Reviewer #1 (Public Review):

      The work by Yijun Zhang and Zhimin He et al. analyzes the role of HDAC3 within DC subsets. Using an inducible ERT2-Cre mouse model, they observe that pDCs, but not cDCs, depend on HDAC3. The requirement for this histone modifier appears to arise early during development, around the CLP stage. Tamoxifen-treated mice lack almost all pDCs besides lymphoid progenitors. Through bulk RNA-seq experiments the authors identify multiple DC-specific target genes within the remaining pDCs, and using CUT&Tag technology they further validate some of the identified targets of HDAC3. Collectively, the study is well executed and shows the requirement for HDAC3 in pDCs but not cDCs, in line with the recent findings of a lymphoid origin of pDCs.

      1) While the authors provide extensive data on the requirement for HDAC3 within progenitors, the high expression of HDAC3 in mature pDCs may underlie a functional requirement. Have you tested IFN production in CD11c-Cre pDCs? Are there transcriptional differences between pDCs from HDAC3 CD11c-Cre and WT mice?

      We greatly appreciate the reviewer’s point. We have confirmed that Hdac3 can be efficiently deleted in pDCs of Hdac3fl/fl-CD11c Cre mice (Figure 5-figure supplement 1 in revised manuscript). Furthermore, in those Hdac3fl/fl-CD11c Cre mice, we have observed significantly decreased expression of key cytokines (Ifna, Ifnb, and Ifnl) by pDCs upon activation by CpG ODN (shown in Author response image 1). Therefore, HDAC3 is also required for proper pDC function. However, we have yet to conduct RNA-seq analysis comparing pDCs from HDAC CD11c cre and WT mice.

      Author response image 1.

      Cytokine expression in Hdac3 deficient pDCs upon activation

      2) A more detailed characterization of the progenitor compartment that is compromised following depletion would be important, as also suggested in the specific points.

      We thank the reviewer for this constructive suggestion. We have performed a thorough analysis of the phenotype of hematopoietic stem cells and progenitor cells at various developmental stages in the bone marrow of Hdac3 deficient mice, based on the gating strategy from the recommended reference. Briefly, we analyzed the progenitor subpopulations described by Pietras et al. (2015), namely MPP2, MPP3 and MPP4, using the same gating strategy for hematopoietic stem/progenitor cells. As shown in Author response image 2 and Author response image 3, the number of LSK cells was increased in Hdac3 deficient mice, especially in the MPP2 and MPP3 subpopulations, whereas MPP4 showed no significant change. In contrast, the numbers of LT-HSC, ST-HSC and CLP were all dramatically decreased. This result has been optimized and added as Figure 3A in the revised manuscript. The relevant description has been added and underlined in the revised manuscript, Page 6, Lines 164-168.

      Author response image 2.

      Gating strategy for hematopoietic stem/progenitor cells in bone marrow.

      Author response image 3.

      Hematopoietic stem/progenitor cells in Hdac3 deficient mice

      Reviewer #2 (Public Review):

      In this article Zhang et al. report that Histone Deacetylase 3 (HDAC3) is highly expressed in mouse pDCs and that pDC development is severely affected both in vivo and in vitro when using mice harbouring a conditional deletion of HDAC3. However, pDC numbers are not affected in Hdac3fl/fl Itgax-Cre mice, indicating that HDAC3 is dispensable in CD11c+ late stages of pDC differentiation. Indeed, the authors provide wide experimental evidence for a role of HDAC3 in early precursors of pDC development, by combining adoptive transfer, gene expression profiling and in vitro differentiation experiments. Mechanistically, the authors have demonstrated that HDAC3 activity represses the expression of several transcription factors promoting cDC1 development, thus allowing the expression of genes involved in pDC development. In conclusion, these findings reveal HDAC3 as a key epigenetic regulator of the expression of the transcription factors required for pDC vs cDC1 developmental fate.

      These results are novel and very promising. However, supplementary information and eventual further investigations are required to improve the clarity and the robustness of this article.

      Major points

      1) The gating strategy adopted to identify pDC in the BM and in the spleen should be entirely described and shown, at least as a Supplementary Figure. For the BM the authors indicate in the M & M section that they negatively selected cells for CD8a and B220, but both markers are actually expressed by differentiated pDC. However, in the Figures 1 and 2 pDC has been shown to be gated on CD19- CD11b- CD11c+. What is the precise protocol followed for pDC gating in the different organs and experiments?

      We apologize for not clearly describing the protocols used in this study. Please see the detailed gating strategy for pDCs in bone marrow, and for pDCs and cDCs in spleen (Figure 4 and Figure 5). This information has now been added to Figure 1−figure supplement 3. The relevant description has been underlined on Page 5, Lines 113-116, of the revised manuscript.

      We would like to clarify that in our study we used two different panels of antibody cocktails: one for bone marrow Lin- cells, including mAbs to CD2/CD3/TER-119/Ly6G/B220/CD11b/CD8/CD19; the other for DC enrichment, including mAbs to CD3/CD90/TER-119/Ly6G/CD19. We included B220 in the Lineage cocktail to deplete B cells and pDCs, in order to enrich for progenitor cells from bone marrow. However, when enriching for pDCs and cDCs, B220 and CD8a were not included in the cocktail, to avoid depleting the pDC and cDC1 subsets. For the flow cytometry analysis of pDCs, we gated pDCs as the CD19−CD11b−CD11c+B220+SiglecH+ population in both bone marrow and spleen. The relevant description has been underlined in the revised manuscript, Page 16, Lines 431-434.

      2) pDCs identified in the BM as SiglecH+ B220+ can actually contain DC precursors, which can also express these markers. This could explain why the impact of HDAC3 deletion appears stronger in the spleen than in the BM (Figures 1A and 2A). Along the same line, I think that it would be important to show the phenotype of pDCs in control vs HDAC3-deleted mice for the different pDC markers used (SiglecH, B220, Bst2), and I would suggest also including Ly6D, taking into account the results obtained in Figures 4 and 7. Finally, as HDAC3 deletion induces downregulation of CD8a in cDC1 and pDCs express CD8a, it would be important to analyse the expression of this marker on control vs HDAC3-deleted pDCs.

      We agree with the reviewer’s points. In the revised manuscript, we incorporated the major surface markers, including Siglec H, B220, Ly6D, and PDCA-1, all of which consistently showed a substantial decrease in the pDC population in Hdac3 deficient mice. Moreover, we noticed that Ly6D+ pDCs showed a greater decrease in Hdac3 deficient mice. Additionally, the percentage and number of both CD8+ pDCs and CD8- pDCs were decreased in Hdac3 deficient mice (Author response image 4). These results are shown in Figure 1−figure supplement 4 of the revised manuscript. The relevant description has been added and underlined in the revised manuscript, Page 5, Lines 121-125.

      Author response image 4.

      Bone marrow pDCs in Hdac3 deficient mice revealed by multiple surface markers

      3) How do the authors explain that in the absence of HDAC3 cDC2 development increased in vivo in chimeric mice, but reduced in vitro (Figures 2B and 2E)?

      As explained in our response to Minor point 5 of Reviewer #1, we suggest that the variability may be explained by the timing of analysis after HDAC3 deletion. In Figure 2C, we analyzed cells from the recipients one week after the final tamoxifen treatment and, after pooling all the experimental data, observed no significant change in the percentage of cDC2. In Figure 2E, tamoxifen was administered at Day 0 of Flt3L-mediated DC differentiation in vitro, and the DC subsets generated were then analyzed at different time points. We observed no significant changes in cDCs and cDC2 at Day 5, but decreases in the percentage of cDC2 were observed at Day 7 and Day 9. This suggests that the cDC subsets at Day 5 might have originated from progenitors at a later stage, while those at Day 7 and Day 9 might originate from earlier progenitors. Therefore, based on these in vitro and in vivo experiments, we believe that the variation in the cDC2 phenotype might be attributed to the different stages of the progenitors that generated these cDCs.

      4) More generally, as also reported by the authors (line 207), reconstitution with HDAC3-deleted cells is inefficient. Although cDCs seem not to be impacted, are other lymphoid or myeloid cells affected? This would be expected, as HDAC3 regulates T and B cell development, as well as macrophage function. This would be important to know, although it does not call into question the results shown, as they were obtained in a competitive context.

      In this study, we found no significant influence on T cells, mature B cells or NK cells, but immature B cells were significantly decreased, in Hdac3-ERT2-Cre mice after tamoxifen treatment (Figure 6). However, in the bone marrow chimera experiments, the numbers of major lymphoid cells were decreased due to the impaired reconstitution capacity of Hdac3 deficient progenitors. Consistent with our finding, it has been reported that HDAC3 was required for T cell and B cell generation, in HDAC3-VavCre mice (Summers et al., 2013), and was necessary for T cell maturation (Hsu et al., 2015). Moreover, HDAC3 is also required for the expression of inflammatory genes in macrophages upon activation (Chen et al., 2012; Nguyen et al., 2020).

      5) What are the precise gating strategies used to identify the different hematopoietic precursors in the Figure 4 ? In particular, is there any lineage exclusion performed?

      We apologize for not describing the experimental procedures clearly. In this study we enriched the lineage negative (Lin−) cells from the bone marrow using a Lineage-depleting antibody cocktail including mAbs to CD2/CD3/TER-119/Ly6G/B220/CD11b/CD8/CD19. We also provide the gating strategy implemented for sorting LSK and CDP populations from the Lin− cells in the bone marrow (Author response image 5), shown in the Figure 3A and Figure4−figure supplement 1 of revised manuscript.

      Author response image 5.

      Gating strategy for LSK, CD115+ CDP and CD115− CDP in bone marrow

      6) Moreover, what is the SiglecH+ CD11c- population appearing in the spleen of mice reconstituted with HDAC3-deleted CDP, in Fig 4D?

      We also noticed the appearance of a SiglecH+CD11c− cell population in the spleen of recipient mice reconstituted with HDAC3-deficient CD115−CDPs, while the presence of this population was not as significant in the HDAC3-Ctrl group, as shown in Figure 4D. We speculate that this SiglecH+CD11c− cell population might represent some cells at a differentiation stage earlier than pre-DCs. Alternatively, the relatively increased percentage of this population derived from HDAC3-deficient CD115−CDP might be due to the substantially decreased total numbers of DCs. This could be clarified by further analysis using additional cell surface markers.

      7) Finally, in Fig 4H, how do the authors explain that Hdac3fl/fl express Il7r, while they are supposed to be sorted CD127- cells?

      This is indeed an interesting question. In this study, we confirmed that CD115−CDPs were isolated from the surface CD127− cell population for RNA-seq analysis, and the purity of the sorted cells was checked (Author response image 6), as shown in Figure 4−figure supplement 1 of the revised manuscript.

      A possible explanation for the expression of Il7r mRNA in some HDAC3fl/fl CD115−CDPs, as revealed by RNA-seq analysis in Figure 4H, is that these cells express CD127 on their surface at a very low level, so they could not be efficiently excluded by sorting for surface CD127− cells.

      Author response image 6.

      CD115−CDPs sorting from Hdac3-Ctrl and Hdac3-KO mice

      8) What is known about the expression of HDAC3 in the different hematopoietic precursors analysed in this study? This information is available only for a few of them in Supplementary Figure 1. If not yet studied, they should be addressed.

      We conducted additional analysis of Hdac3 expression in hematopoietic progenitor cells at different stages, based on RNA-seq data. The data revealed a relatively consistent level of Hdac3 expression across progenitor populations, including HSC, MPP4, CLP, CDP and BM pDCs (Author response image 7). This suggests that HDAC3 may play an important role in the regulation of hematopoiesis at multiple stages. This information has now been added to Figure 1−figure supplement 1B of the revised manuscript.

      Author response image 7.

      Hdac3 expression in hematopoietic progenitor cells

      9) It would be highly informative to extend CUT and Tag studies to Irf8 and Tcf4, if this is technically feasible.

      We totally agree with the reviewer. We did attempt to use CUT&Tag to compare the binding sites of IRF8 and TCF4 in wild-type and Hdac3-deficient pDCs. However, obtaining reliable results proved technically unfeasible because of the limited number of cells we could obtain from HDAC3 deficient mice. We are committed to exploring alternative approaches or technologies to address this issue in future studies.

    1. Author Response:

      Reviewer #1:

      1) The user manual and tutorial are well documented, although the actual code could do with more explicit documentation and comments throughout. The overall organisation of the code is also a bit messy.

      We have now implemented an ongoing, automated code review via Codacy (https://app.codacy.com/gh/caseypaquola/BigBrainWarp/dashboard). The grade is published as a badge on GitHub. We improved the quality of the code to an A grade by increasing comments and fixing code style issues. Additionally, we standardised the nomenclature throughout the toolbox to improve consistency across scripts and we restructured the bigbrainwarp function.

      2) My understanding is that this toolbox can take maps from BigBrain to MRI space and vice versa, but the maps that go in the direction BigBrain->MRI seem to be confined to those provided in the toolbox (essentially the density profiles). What if someone wants to do some different analysis on the BigBrain data (e.g. looking at cellular morphology) and wants that mapped onto MRI spaces? Does this tool allow for analyses that involve the raw BigBrain data? If so, then at what resolution and with what scripts? I think this tool will have much more impact if that was possible. Currently, it looks as though the 3 tutorial examples are basically the only thing that can be done (although I may be lacking imagination here).

      The bigbrainwarp function allows input of raw BigBrain data in volume and surface forms. For volumetric inputs, the image must be aligned to the full BigBrain or BigBrainSym volume, but the function is agnostic to the input voxel resolution. We have also added an option for the user to specify the output voxel resolution. For example,

      bigbrainwarp --in_space bigbrain --in_vol cellular_morphology_in_bigbrain.nii \
          --interp linear --out_space icbm --out_res 0.5 \
          --desc cellular_morphology --wd working_directory

      where “cellular_morphology_in_bigbrain.nii” was generated from a BigBrain volume (see Table 2 below for all parameters). The BigBrain volume may be the 100-1000um resolution images provided on the ftp or a resampled version of these images, as long as the full field of view is maintained. For surface-based inputs, the data must contain a value for each vertex of the BigBrain/BigBrainSym mesh. We have clarified these points in the Methods, illustrated the potential transformations in an extended Figure 3 and highlighted the distinctiveness of the tutorial transformations in the Results.

      3) An obvious caveat to BigBrain is that it is a single brain, and we know there are sometimes substantial individual variations in e.g. areal definition. This is only slightly touched upon in the Discussion. It might be worth commenting on this more. As I see it, there are multiple considerations. For example: (i) Surface-to-surface registration in the presence of morphological idiosyncrasies: what parts of the brain can we "trust" and what parts are uncertain? (ii) MRI parcellations mapped onto BigBrain will vary in how accurately they reflect the BigBrain areal boundaries: if histological boundaries do not correspond with MRI-derived ones, is that because BigBrain is slightly different or is it a genuine divergence between modalities? Of course, addressing these questions is out of the scope of this manuscript, but some discussion could be useful; I also think this toolbox may be useful for addressing these very concerns!

      We agree that these are important questions and hope that BigBrainWarp will propel further research. Here, we consider these questions from two perspectives; the accuracy of the transformations and the potential influence of individual variation. For the former, we conducted a quantitative analysis on the accuracy of transformations used in BigBrainWarp (new Figure 2). We provide a function (evaluate_warp.sh) for BigBrainWarp users to assess accuracy of novel deformation fields and encourage detailed inspection of accuracy estimates and deformation effects for region of interest studies. For the latter, we expanded our Discussion of previous research on inter-individual variability and comment on the potential implications of unquantified inter-individual variability for the interpretation of BigBrain-MRI comparisons.

      Methods (P.7-8):

      “A prior study (Xiao et al., 2019) was able to further improve the accuracy of the transformation for subcortical structures and the hippocampus using a two-stage multi-contrast registration. The first stage involved nonlinear registration of BigBrainSym to a PD25 T1-T2 fusion atlas (Xiao et al., 2017, 2015), using manual segmentations of the basal ganglia, red nucleus, thalamus, amygdala, and hippocampus as additional shape priors. Notably, the PD25 T1-T2 fusion contrast is more similar to the BigBrainSym intensity contrast than a T1-weighted image. The second stage involved nonlinear registration of PD25 to ICBM2009sym and ICBM2009asym using multiple contrasts. The deformation fields were made available on Open Science Framework (https://osf.io/xkqb3/). The accuracy of the transformations was evaluated relative to overlap of region labels and alignment of anatomical fiducials (Lau et al., 2019). The two-stage procedure resulted in 0.86-0.97 Dice coefficients for region labels, improving upon direct overlap of BigBrainSym with ICBM2009sym (0.55-0.91 Dice) (Figure 2Aii, 2Aiv top). Transformed anatomical fiducials exhibited 1.77±1.25mm errors, on par with direct overlap of BigBrainSym with ICBM2009sym (1.83±1.47mm) (Figure 2Aiii, 2Aiv below). The maximum misregistration distance (BigBrainSym=6.36mm, Xiao=5.29mm) provides an approximation of the degree of uncertainty in the transformation. In line with this work, BigBrainWarp enables evaluation of novel deformation fields using anatomical fiducials and region labels (evaluate_warps.sh). The script accepts a nonlinear transformation file for registration of BigBrainSym to ICBM2009sym, or vice versa, and returns the Jacobian map, Dice coefficients for labelled regions and landmark misregistration distances for the anatomical fiducials.

      The unique morphology of BigBrain also presents challenges for surface-based transformations. Idiosyncratic gyrification of certain regions of BigBrain, especially the anterior cingulate, causes misregistration (Lewis et al., 2020). Additionally, the areal midline representation of BigBrain, following inflation to a sphere, is disproportionately smaller than in standard surface templates, which is related to differences in surface area, in hemisphere separation methods, and in tessellation methods. To overcome these issues, ongoing work (Lewis et al., 2020) combines a specialised BigBrain surface mesh with multimodal surface matching [MSM; (Robinson et al., 2018, 2014)] to co-register BigBrain to standard surface templates. In the first step, the BigBrain surface meshes were re-tessellated as unstructured meshes with variable vertex density (Möbius and Kobbelt, 2010) to be more compatible with FreeSurfer-generated meshes. Then, coarse-to-fine MSM registration was applied in three stages. First, an affine rotation was applied to the BigBrain sphere, with an additional “nudge” based on an anterior cingulate landmark. Next, nonlinear/discrete alignment was performed using sulcal depth maps (emphasising global scale, Figure 2Biii), followed by nonlinear/discrete alignment using curvature maps (emphasising finer detail, Figure 2Biii). The higher-order MSM procedure that was implemented for BigBrain maximises concordance of these features while minimising surface deformations in a physically plausible manner, accounting for size and shape distortions (Figure 2Bi) (Knutsen et al., 2010; Robinson et al., 2018). This modified MSMsulc+curv pipeline improves the accuracy of transformed cortical maps (4.38±3.25mm), compared to a standard MSMsulc approach (8.02±7.53mm) (Figure 2Bii-iii) (Lewis et al., 2020).”
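      The two accuracy measures described above, Dice overlap of region labels and landmark misregistration distance, can be illustrated with a short sketch. This is not the code of the toolbox's evaluation script. It is a minimal, hypothetical Python version that assumes the native and transformed label volumes have already been resampled onto the same voxel grid and flattened into integer sequences (0 = background), and that paired landmark coordinates are given in millimetres in the same space. The function names are illustrative only.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def dice_per_label(native: Sequence[int], transformed: Sequence[int]) -> Dict[int, float]:
    """Dice = 2|A ∩ B| / (|A| + |B|) for every non-zero label.

    Assumes both label volumes are flattened on the same voxel grid;
    label 0 is treated as background and skipped (an assumption).
    """
    if len(native) != len(transformed):
        raise ValueError("label volumes must share the same voxel grid")
    labels = (set(native) | set(transformed)) - {0}
    scores: Dict[int, float] = {}
    for lab in sorted(labels):
        size_a = sum(1 for v in native if v == lab)
        size_b = sum(1 for v in transformed if v == lab)
        overlap = sum(1 for a, b in zip(native, transformed) if a == lab and b == lab)
        # size_a + size_b > 0 because the label occurs in at least one volume
        scores[lab] = 2.0 * overlap / (size_a + size_b)
    return scores


def landmark_errors(native: Sequence[Point], transformed: Sequence[Point]) -> List[float]:
    """Euclidean distance (mm) between each native landmark and its transformed partner."""
    if len(native) != len(transformed):
        raise ValueError("landmark lists must be paired one-to-one")
    return [math.dist(p, q) for p, q in zip(native, transformed)]


def summarise(errors: Sequence[float]) -> Tuple[float, float, float]:
    """Mean, sample SD and maximum error (needs at least two values for the SD)."""
    return statistics.mean(errors), statistics.stdev(errors), max(errors)
```

      In this sketch the mean ± SD corresponds to the style of the fiducial errors reported above, and the maximum corresponds to the "maximum misregistration distance" used as a rough bound on uncertainty.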

      Figure 2: Evaluating BigBrain-MRI transformations. A) Volume-based transformations. i. Jacobian determinant of the deformation field, shown with a sagittal slice and stratified by lobe. Subcortical+ includes the shape priors (as described in Methods) and the + connotes the hippocampus, which is allocortical. Lobe labels were defined based on assignment of CerebrA atlas labels (Manera et al., 2020) to each lobe. ii. Sagittal slices illustrate the overlap of native ICBM2009b and transformed subcortical+ labels. iii. Superior view of anatomical fiducials (Lau et al., 2019). iv. Violin plots show the Dice coefficient of regional overlap (ii) and landmark misregistration (iii) for the BigBrainSym and Xiao et al. approaches. Higher Dice coefficients show improved registration of subcortical+ regions with Xiao et al., while distributions of landmark misregistration indicate similar performance for alignment of anatomical fiducials. B) Surface-based transformations. i. Inflated BigBrain surface projections and ridgeplots illustrate regional variation in the distortions of the mesh invoked by the modified MSMsulc+curv pipeline. ii. Eighteen anatomical landmarks shown on the inflated BigBrain surface (above) and inflated fsaverage (below). BigBrain landmarks were transformed to fsaverage using the modified MSMsulc+curv pipeline. Accuracy of the transformation was calculated on fsaverage as the geodesic distance between landmarks transformed from BigBrain and the native fsaverage landmarks. iii. Sulcal depth and curvature maps are shown on the inflated BigBrain surface. Violin plots show the improved accuracy of the transformation using the modified MSMsulc+curv pipeline, compared to a standard MSMsulc approach.
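      Panel Ai summarises local volume change with the Jacobian determinant of the deformation field. As a hedged sketch (not the toolbox's implementation), assume a dense displacement field u stored as nested lists of (ux, uy, uz) values in millimetres on an isotropic grid; the determinant of I + ∇u can then be estimated with central differences. Values above 1 indicate local expansion, values below 1 local contraction, and values at or below 0 folding. Whether expansion refers to BigBrain or MRI space depends on the direction convention of the field, which is an assumption here.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Field = Sequence[Sequence[Sequence[Vec3]]]


def det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def jacobian_determinant(disp: Field, spacing: float = 1.0) -> List[List[List[Optional[float]]]]:
    """det(I + grad u) at interior voxels via central differences.

    disp[i][j][k] is the hypothetical displacement (ux, uy, uz) at voxel (i, j, k);
    boundary voxels, where central differences are undefined, are returned as None.
    """
    nx, ny, nz = len(disp), len(disp[0]), len(disp[0][0])
    out: List[List[List[Optional[float]]]] = [
        [[None] * nz for _ in range(ny)] for _ in range(nx)
    ]
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                # Partial derivatives of each displacement component c along x, y and z
                dx = [(disp[i + 1][j][k][c] - disp[i - 1][j][k][c]) / (2 * spacing) for c in range(3)]
                dy = [(disp[i][j + 1][k][c] - disp[i][j - 1][k][c]) / (2 * spacing) for c in range(3)]
                dz = [(disp[i][j][k + 1][c] - disp[i][j][k - 1][c]) / (2 * spacing) for c in range(3)]
                grads = (dx, dy, dz)
                # J[c][d] = delta_cd + du_c / dx_d
                jac = [[(1.0 if c == d else 0.0) + grads[d][c] for d in range(3)] for c in range(3)]
                out[i][j][k] = det3(jac)
    return out
```

      For example, a uniform 10% stretch along every axis (u = 0.1·x) gives a determinant of about 1.1³ = 1.331 at each interior voxel, whereas a pure shear leaves it at 1.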

      Discussion (P.18):

      “Cortical folding is variably associated with cytoarchitecture, however. The correspondence of morphology with cytoarchitectonic boundaries is stronger in primary sensory than association cortex (Fischl et al., 2008; Rajkowska and Goldman-Rakic, 1995a, 1995b). Incorporating more anatomical information in the alignment algorithm, such as intracortical myelin or connectivity, may benefit registration, as has been shown in neuroimaging (Orasanu et al., 2016; Robinson et al., 2018; Tardif et al., 2015). Overall, evaluating the accuracy of volume- and surface-based transformations is important for selecting the optimal procedure given a specific research question and to gauge the degree of uncertainty in a registration.”

      Discussion (P.19):

      “Despite all its promises, the singular nature of BigBrain currently prohibits replication and does not capture important inter-individual variation. While large-scale cytoarchitectural patterns are conserved across individuals, the position of areal boundaries relative to sulci varies, especially in association cortex (Amunts et al., 2020; Fischl et al., 2008; Zilles and Amunts, 2013). This can affect interpretation of BigBrain-MRI comparisons. For instance, in tutorial 3, low predictive accuracy of functional communities by cytoarchitecture may be attributable to subject-specific topographies, which are well established in functional imaging (Benkarim et al., 2020; Braga and Buckner, 2017; Gordon et al., 2017; Kong et al., 2019). Future studies should consider the influence of inter-subject variability in concert with the precision of transformations, as these two elements of uncertainty can impact our interpretations, especially at higher granularity.”

      Reviewer #2:

      This is a nice paper presenting a review of recent developments and research resulting from BigBrain and a tutorial guiding use of the BigBrainWarp toolbox. This toolbox supports registration to, and from, standard MRI volumetric and surface templates, together with mapping derived features between spaces. Examples include projecting histological gradients estimated from BigBrain onto fsaverage (and the ICBM2009 atlas) and projecting Yeo functional parcels onto the BigBrain atlas.

      The key strength of this paper is that it supports and expands on a comprehensive tutorial and docker support available from the website. The tutorials there go into even more detail (with accompanying bash scripts) of how to run the full pipelines detailed in the paper. The docker makes the tool very easy to install but I was also able to install from source. The tutorials are diverse examples of broad possible applications; as such the combined resource has the potential to be highly impactful.

      The minor weaknesses of the paper relate to its clarity and depth. Firstly, I found the motivations of the paper initially unclear from the abstract. I would recommend much more clearly stating that this is a review paper of recent research developments resulting from the BigBrain atlas, and a tutorial to accompany the bash scripts which apply the warps between spaces. The registration methodology is explained elsewhere.

      In the revised Abstract (P.1), we emphasise that the manuscript involves a review of recent literature, the introduction of BigBrainWarp, and easy-to-follow tutorials to demonstrate its utility.

      “Neuroimaging stands to benefit from emerging ultrahigh-resolution 3D histological atlases of the human brain; the first of which is “BigBrain”. Here, we review recent methodological advances for the integration of BigBrain with multi-modal neuroimaging and introduce a toolbox, “BigBrainWarp”, that combines these developments. The aim of BigBrainWarp is to simplify workflows and support the adoption of best practices. This is accomplished with a simple wrapper function that allows users to easily map data between BigBrain and standard MRI spaces. The function automatically pulls specialised transformation procedures, based on ongoing research from a wide collaborative network of researchers. Additionally, the toolbox improves accessibility of histological information through dissemination of ready-to-use cytoarchitectural features. Finally, we demonstrate the utility of BigBrainWarp with three tutorials and discuss the potential of the toolbox to support multi-scale investigations of brain organisation.”

      I also found parts of the paper difficult to follow: as a methodologist without comprehensive knowledge of neuroanatomical terminology, I would recommend that the review of past work be written in a more 'lay' way. In many cases, the figure captions also seemed insufficient at first. For example, it was not immediately obvious to me what is meant by 'mesiotemporal confluence', and Fig 1G is not referenced specifically in the text. In Fig 3C it is not immediately clear from the caption text that the cortical image represents the correlation from the plots, specifically since functional connectivity is itself estimated through correlation.

      In the updated manuscript, we have tried to remove neuroanatomical jargon and clearly define uncommon terms at the first instance in text. For example,

      “Evidence has been provided that cortical organisation goes beyond a segregation into areas. For example, large-scale gradients that span areas and cytoarchitectonic heterogeneity within a cortical area have been reported (Amunts and Zilles, 2015; Goulas et al., 2018; Wang, 2020). Such progress became feasible through integration of classical techniques with computational methods, supporting more observer-independent evaluation of architectonic principles (Amunts et al., 2020; Paquola et al., 2019; Schiffer et al., 2020; Spitzer et al., 2018). This paves the way for novel investigations of the cellular landscape of the brain.”

      “Using the proximal-distal axis of the hippocampus, we were able to bridge the isocortical and hippocampal surface models recapitulating the smooth confluence of cortical types in the mesiotemporal lobe, i.e. the mesiotemporal confluence (Figure 1G).”

      “Here, we illustrate how we can track resting-state functional connectivity changes along the latero-medial axis of the mesiotemporal lobe, from parahippocampal isocortex towards hippocampal allocortex, hereafter referred to as the iso-to-allocortical axis.”

      Additionally, we have expanded the captions for clarity. For example, Figure 3:

      “C) Intrinsic functional connectivity was calculated between each voxel of the iso-to-allocortical axis and 1000 isocortical parcels. For each parcel, we calculated the product-moment correlation (r) of rsFC strength with iso-to-allocortical axis position. Thus, positive values (red) indicate that rsFC of that isocortical parcel with the mesiotemporal lobe increases along the iso-to-allocortical axis, whereas negative values (blue) indicate a decrease in rsFC along the iso-to-allocortical axis.”
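The per-parcel statistic in this caption can be illustrated with a minimal NumPy sketch. It assumes a hypothetical array `rsfc` of shape (axis voxels, parcels) holding rsFC strength, and a vector `axis_position` giving each voxel's position along the iso-to-allocortical axis; the function name and data layout are illustrative and are not part of BigBrainWarp.

```python
import numpy as np

def axis_rsfc_correlation(rsfc, axis_position):
    """Pearson r between rsFC strength and axis position, one value per parcel.

    rsfc          : (n_axis_voxels, n_parcels) rsFC strength of each axis voxel
                    with each isocortical parcel (hypothetical layout)
    axis_position : (n_axis_voxels,) position of each voxel along the
                    iso-to-allocortical axis
    """
    rsfc = np.asarray(rsfc, dtype=float)
    x = np.asarray(axis_position, dtype=float)
    x = x - x.mean()                      # centre the axis positions
    y = rsfc - rsfc.mean(axis=0)          # centre each parcel's rsFC profile
    num = x @ y                           # covariance numerator per parcel
    den = np.sqrt((x ** 2).sum() * (y ** 2).sum(axis=0))
    return num / den                      # > 0: rsFC increases along the axis


# Toy example: 200 axis voxels, 1000 parcels of random connectivity
rng = np.random.default_rng(0)
pos = np.linspace(0.0, 1.0, 200)
r = axis_rsfc_correlation(rng.normal(size=(200, 1000)), pos)
print(r.shape)  # (1000,)
```

A positive r (red in the figure) means the parcel's connectivity with the mesiotemporal lobe grows towards the allocortical end of the axis; mapping `r` back onto the 1000 parcels gives the cortical image.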

      My minor concern is over the lack of details in relation to the registration pipelines. I understand these are either covered in previous papers or are probably destined for bespoke publications (in the case of the surface registration approach) but these details are important for readers to understand the constraints and limitations of the software. At this time, the details for the surface registration only relate to an OHBM poster and not a publication, which I was unable to find online until I went through the tutorial on the BigBrain website. In general I think a paper should have enough information on key techniques to stand alone without having to reference other publications, so, in my opinion, a high level review of these pipelines should be added here.

      There is not enough detail on the registration. For the surface, what features were used to drive alignment, how was it parameterised (in particular the regularisation: strain, pairwise or areal), and how was it pre-processed prior to running MSM? All these details seem to be in the excellent poster. I appreciate that work deserves a standalone publication, but some details are required here for users to understand the challenges, constraints and limitations of the alignment. Similar high-level details should be given for the registration work.

      We expanded descriptions of the registration strategies behind BigBrainWarp, especially so for the surface-based registration. Additionally, we created a new Figure to illustrate how the accuracy of the transformations may be evaluated.

      Methods (P.7-8):

      “For the initial BigBrain release (Amunts et al., 2013), full BigBrain volumes were resampled to ICBM2009sym (a symmetric MNI152 template) and MNI-ADNI (an older adult T1-weighted template) (Fonov et al., 2011). Registration of BigBrain to ICBM2009sym, known as BigBrainSym, involved a linear then a nonlinear transformation (available on ftp://bigbrain.loris.ca/BigBrainRelease.2015/). The nonlinear transformation was defined by a symmetric diffeomorphic optimiser [SyN algorithm, (Avants et al., 2008)] that maximised the cross-correlation of the BigBrain volume with inverted intensities and a population-averaged T1-weighted map in ICBM2009sym space. The Jacobian determinant of the deformation field illustrates the degree and direction of distortions on the BigBrain volume (Figure 2Ai top).
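To make the Jacobian map concrete, here is a minimal sketch assuming the deformation is stored as a dense displacement field `disp` of shape (3, X, Y, Z) in millimetres, so that each point maps as x → x + u(x). The function name and storage layout are assumptions for illustration; SyN/ANTs warp files have their own on-disk formats. Determinants above 1 indicate local expansion and values below 1 local contraction.

```python
import numpy as np

def jacobian_determinant(disp, spacing=(1.0, 1.0, 1.0)):
    """Voxel-wise Jacobian determinant of the mapping x -> x + u(x).

    disp    : (3, X, Y, Z) displacement u along each axis, in mm (assumed layout)
    spacing : voxel size in mm along each axis
    """
    disp = np.asarray(disp, dtype=float)
    jac = np.empty((3, 3) + disp.shape[1:])
    for i in range(3):
        # np.gradient returns [du_i/dx_0, du_i/dx_1, du_i/dx_2]
        grads = np.gradient(disp[i], *spacing)
        for j in range(3):
            jac[i, j] = grads[j]
    jac += np.eye(3)[:, :, None, None, None]   # J = I + grad(u)
    # move the 3x3 matrix axes last so np.linalg.det works voxel-wise
    return np.linalg.det(np.moveaxis(jac, (0, 1), (-2, -1)))


# Uniform 10% stretch along every axis: determinant 1.1**3 everywhere
grid = np.stack(np.meshgrid(*[np.arange(5.0)] * 3, indexing="ij"))
det = jacobian_determinant(0.1 * grid)
print(round(float(det.mean()), 3))  # 1.331
```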

      A prior study (Xiao et al., 2019) was able to further improve the accuracy of the transformation for subcortical structures and the hippocampus using a two-stage multi-contrast registration. The first stage involved nonlinear registration of BigBrainSym to a PD25 T1-T2 fusion atlas (Xiao et al., 2017, 2015), using manual segmentations of the basal ganglia, red nucleus, thalamus, amygdala, and hippocampus as additional shape priors. Notably, the PD25 T1-T2 fusion contrast is more similar to the BigBrainSym intensity contrast than a T1-weighted image. The second stage involved nonlinear registration of PD25 to ICBM2009sym and ICBM2009asym using multiple contrasts. The deformation fields were made available on Open Science Framework (https://osf.io/xkqb3/). The accuracy of the transformations was evaluated relative to overlap of region labels and alignment of anatomical fiducials (Lau et al., 2019). The two-stage procedure resulted in 0.86-0.97 Dice coefficients for region labels, improving upon direct overlap of BigBrainSym with ICBM2009sym (0.55-0.91 Dice) (Figure 2Aii, 2Aiv top). Transformed anatomical fiducials exhibited 1.77±1.25mm errors, on par with direct overlap of BigBrainSym with ICBM2009sym (1.83±1.47mm) (Figure 2Aiii, 2Aiv bottom). The maximum misregistration distance (BigBrainSym=6.36mm, Xiao=5.29mm) provides an approximation of the degree of uncertainty in the transformation. In line with this work, BigBrainWarp enables evaluation of novel deformation fields using anatomical fiducials and region labels (evaluate_warps.sh). The script accepts a nonlinear transformation file for registration of BigBrainSym to ICBM2009sym, or vice versa, and returns the Jacobian map, Dice coefficients for labelled regions and landmark misregistration distances for the anatomical fiducials.
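The two accuracy measures named here, label overlap and fiducial misregistration, can be sketched in a few lines of NumPy. This is an illustration of the standard formulas under the assumption that labels are integer volumes in a common space and fiducials are paired (N, 3) coordinates in mm; it is not the implementation inside evaluate_warps.sh, and the function names are hypothetical.

```python
import numpy as np

def dice_per_label(seg_a, seg_b, labels):
    """Dice = 2 * overlap / (|A| + |B|) for each label in two label volumes."""
    seg_a, seg_b = np.asarray(seg_a), np.asarray(seg_b)
    scores = {}
    for lab in labels:
        a, b = seg_a == lab, seg_b == lab
        denom = a.sum() + b.sum()
        # float() keeps plain Python numbers in the returned dict
        scores[lab] = float(2.0 * np.logical_and(a, b).sum() / denom) if denom else float("nan")
    return scores

def landmark_errors(moved_mm, reference_mm):
    """Mean, SD and max Euclidean distance (mm) between paired fiducials."""
    d = np.linalg.norm(np.asarray(moved_mm, float) - np.asarray(reference_mm, float), axis=1)
    return float(d.mean()), float(d.std()), float(d.max())


print(dice_per_label([1, 1, 0, 0], [1, 0, 0, 0], labels=[1]))  # Dice for label 1 is 2/3
print(landmark_errors([[0, 0, 0], [3, 4, 0]], [[0, 0, 0], [0, 0, 0]]))  # (2.5, 2.5, 5.0)
```

The maximum distance returned by `landmark_errors` corresponds to the "maximum misregistration distance" used above as a rough bound on transformation uncertainty.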

      The unique morphology of BigBrain also presents challenges for surface-based transformations. Idiosyncratic gyrification of certain regions of BigBrain, especially the anterior cingulate, causes misregistration (Lewis et al., 2020). Additionally, the areal midline representation of BigBrain, following inflation to a sphere, is disproportionately smaller than in standard surface templates, owing to differences in surface area, hemisphere separation methods, and tessellation methods. To overcome these issues, ongoing work (Lewis et al., 2020) combines a specialised BigBrain surface mesh with multimodal surface matching [MSM; (Robinson et al., 2018, 2014)] to co-register BigBrain to standard surface templates. In the first step, the BigBrain surface meshes were re-tessellated as unstructured meshes with variable vertex density (Möbius and Kobbelt, 2010) to be more compatible with FreeSurfer-generated meshes. Then, coarse-to-fine MSM registration was applied in three stages. First, an affine rotation was applied to the BigBrain sphere, with an additional “nudge” based on an anterior cingulate landmark. Next, nonlinear/discrete alignment was performed using sulcal depth maps (emphasising global scale, Figure 2Biii), followed by nonlinear/discrete alignment using curvature maps (emphasising finer detail, Figure 2Biii). The higher-order MSM procedure implemented for BigBrain maximises concordance of these features while minimising surface deformations in a physically plausible manner, accounting for size and shape distortions (Figure 2Bi) (Knutsen et al., 2010; Robinson et al., 2018). This modified MSMsulc+curv pipeline improves the accuracy of transformed cortical maps (4.38±3.25mm), compared to a standard MSMsulc approach (8.02±7.53mm) (Figure 2Bii-iii) (Lewis et al., 2020).”

      (SEE FIGURE 2 in Response to Reviewer #1)

      I would also recommend more guidance in terms of limitations relating to inter-subject variation. My interpretation of the results of tutorial 3 is that topographic variation of the cortex could easily be driving the greater variation of the frontoparietal networks. Either that, or the Yeo parcellation has insufficient granularity; however, in that case any attempt to go to finer MRI-driven parcellations, for example the HCP parcellation, would create its own problems due to subject-specific variability.

      We agree that inter-individual variation may contribute to the low predictive accuracy of functional communities by cytoarchitecture. We expanded upon this possibility in the revised Discussion (P. 19) and recommend that future studies examine the uncertainty of subject-specific topographies in concert with uncertainties of transformations.

      “These features depict the vast cytoarchitectural heterogeneity of the cortex and enable evaluation of homogeneity within imaging-based parcellations, for example macroscale functional communities (Yeo et al., 2011). The present analysis showed limited predictability of functional communities by cytoarchitectural profiles, even when accounting for uncertainty at the boundaries (Gordon et al., 2016). [...] Despite all its promises, the singular nature of BigBrain currently prohibits replication and does not capture important inter-individual variation. While large-scale cytoarchitectural patterns are conserved across individuals, the position of boundaries relative to sulci varies, especially in association cortex (Amunts et al., 2020; Fischl et al., 2008; Zilles and Amunts, 2013). This can affect interpretation of BigBrain-MRI comparisons. For instance, in tutorial 3, low predictive accuracy of functional communities by cytoarchitecture may be attributable to subject-specific topographies, which are well established in functional imaging (Benkarim et al., 2020; Braga and Buckner, 2017; Gordon et al., 2017; Kong et al., 2019). Future studies should consider the influence of inter-subject variability in concert with the precision of transformations, as these two elements of uncertainty can impact our interpretations, especially at higher granularity.”

      Reviewer #3:

      The authors make the case for the importance of considering high-resolution, cell-scale, histological knowledge for the analysis and interpretation of low-resolution MRI data. The manuscript describes the aims and relevance of the BigBrain project. The BigBrain is the whole brain of a single individual, sliced at 20 µm and scanned at 1 µm resolution. Over recent years, sustained work by the BigBrain team has led to the creation of a precise cell-scale 3D reconstruction of this brain, together with manual and automatic segmentations of different structures. The manuscript introduces a new tool, BigBrainWarp, which consolidates several of the tools used to analyse BigBrain into a single, easy-to-use and well-documented tool. This tool should make it easy for any researcher to use the wealth of information available in the BigBrain for the annotation of their own neuroimaging data. The authors provide three examples of the utilisation of BigBrainWarp and show how it can provide additional insight for analysing and understanding neuroimaging data. The BigBrainWarp tool should have an important impact on neuroimaging research, helping bridge the multi-scale resolution gap and providing a way for neuroimaging researchers to include cell-scale phenomena in their study of brain data. All data and code are available open source, open access.

      Main concern:

      One of the longstanding debates in the neuroimaging community concerns the relationship between brain geometry (in particular gyral/sulcal anatomy) and the cytoarchitectonic, connective and functional organisation of the brain. There are various examples of correspondence, but also many analyses showing its absence, particularly in associative cortex (for example, Fischl et al. (2008), by some of the co-authors of the present manuscript). The manuscript emphasises the accuracy of the transformations to the different atlas spaces, which may give some readers a false impression. True, towards the end of the manuscript the authors briefly indicate the difficulty of having a single brain as the source of histological data. I think, however, that the manuscript would benefit from making this point more clearly, providing future users of BigBrainWarp with some conceptual elements and references that may help them properly appraise their results. In particular, it would be helpful to briefly describe which aspects of brain organisation were used to guide the deformation to the different templates: whether they were based only on external anatomy, or whether they took into account other aspects such as myelination, thickness, …

      We agree with the Reviewer that the accuracy of the transformation and the potential influence of inter-individual variability should be carefully considered in BigBrain-MRI studies. To highlight these issues in the updated manuscript, we first conducted a quantitative analysis on the accuracy of transformations used in BigBrainWarp (new Figure 2). We provide a function (evaluate_warp.sh) for users to assess accuracy of novel deformation fields and encourage detailed inspection of accuracy estimates and deformation effects for region of interest studies. Second, we expanded our discussion of previous research on inter-individual variability and comment on the potential implications of unquantified inter-individual variability for the interpretation of BigBrain-MRI comparisons.

      Methods (P.7-8):

      “A prior study (Xiao et al., 2019) was able to further improve the accuracy of the transformation for subcortical structures and the hippocampus using a two-stage multi-contrast registration. The first stage involved nonlinear registration of BigBrainSym to a PD25 T1-T2 fusion atlas (Xiao et al., 2017, 2015), using manual segmentations of the basal ganglia, red nucleus, thalamus, amygdala, and hippocampus as additional shape priors. Notably, the PD25 T1-T2 fusion contrast is more similar to the BigBrainSym intensity contrast than a T1-weighted image. The second stage involved nonlinear registration of PD25 to ICBM2009sym and ICBM2009asym using multiple contrasts. The deformation fields were made available on Open Science Framework (https://osf.io/xkqb3/). The accuracy of the transformations was evaluated relative to overlap of region labels and alignment of anatomical fiducials (Lau et al., 2019). The two-stage procedure resulted in 0.86-0.97 Dice coefficients for region labels, improving upon direct overlap of BigBrainSym with ICBM2009sym (0.55-0.91 Dice) (Figure 2Aii, 2Aiv top). Transformed anatomical fiducials exhibited 1.77±1.25mm errors, on par with direct overlap of BigBrainSym with ICBM2009sym (1.83±1.47mm) (Figure 2Aiii, 2Aiv below). The maximum misregistration distance (BigBrainSym=6.36mm, Xiao=5.29mm) provides an approximation of the degree of uncertainty in the transformation. In line with this work, BigBrainWarp enables evaluation of novel deformation fields using anatomical fiducials and region labels (evaluate_warps.sh). The script accepts a nonlinear transformation file for registration of BigBrainSym to ICBM2009sym, or vice versa, and returns the Jacobian map, Dice coefficients for labelled regions and landmark misregistration distances for the anatomical fiducials.

      The unique morphology of BigBrain also presents challenges for surface-based transformations. Idiosyncratic gyrification of certain regions of BigBrain, especially the anterior cingulate, causes misregistration (Lewis et al., 2020). Additionally, the areal midline representation of BigBrain, following inflation to a sphere, is disproportionately smaller than in standard surface templates, owing to differences in surface area, hemisphere separation methods, and tessellation methods. To overcome these issues, ongoing work (Lewis et al., 2020) combines a specialised BigBrain surface mesh with multimodal surface matching [MSM; (Robinson et al., 2018, 2014)] to co-register BigBrain to standard surface templates. In the first step, the BigBrain surface meshes were re-tessellated as unstructured meshes with variable vertex density (Möbius and Kobbelt, 2010) to be more compatible with FreeSurfer-generated meshes. Then, coarse-to-fine MSM registration was applied in three stages. First, an affine rotation was applied to the BigBrain sphere, with an additional “nudge” based on an anterior cingulate landmark. Next, nonlinear/discrete alignment was performed using sulcal depth maps (emphasising global scale, Figure 2Biii), followed by nonlinear/discrete alignment using curvature maps (emphasising finer detail, Figure 2Biii). The higher-order MSM procedure implemented for BigBrain maximises concordance of these features while minimising surface deformations in a physically plausible manner, accounting for size and shape distortions (Figure 2Bi) (Knutsen et al., 2010; Robinson et al., 2018). This modified MSMsulc+curv pipeline improves the accuracy of transformed cortical maps (4.38±3.25mm), compared to a standard MSMsulc approach (8.02±7.53mm) (Figure 2Bii-iii) (Lewis et al., 2020).”

      (SEE Figure 2 in response to previous reviewers)

      Discussion (P.18, 19):

      “Cortical folding is variably associated with cytoarchitecture, however. The correspondence of morphology with cytoarchitectonic boundaries is stronger in primary sensory than association cortex (Fischl et al., 2008; Rajkowska and Goldman-Rakic, 1995a, 1995b). Incorporating more anatomical information in the alignment algorithm, such as intracortical myelin or connectivity, may benefit registration, as has been shown in neuroimaging (Orasanu et al., 2016; Robinson et al., 2018; Tardif et al., 2015). Overall, evaluating the accuracy of volume- and surface-based transformations is important for selecting the optimal procedure given a specific research question and to gauge the degree of uncertainty in a registration.”

      “Despite all its promises, the singular nature of BigBrain currently prohibits replication and does not capture important inter-individual variation. While large-scale cytoarchitectural patterns are conserved across individuals, the position of boundaries relative to sulci varies, especially in association cortex (Amunts et al., 2020; Fischl et al., 2008; Zilles and Amunts, 2013). This can have implications for the interpretation of BigBrain-MRI comparisons. For instance, in tutorial 3, low predictive accuracy of functional communities by cytoarchitecture may be attributable to subject-specific topographies, which are well established in functional imaging (Benkarim et al., 2020; Braga and Buckner, 2017; Gordon et al., 2017; Kong et al., 2019). Future studies should consider the influence of inter-subject variability in concert with the precision of transformations, as these two elements of uncertainty can impact our interpretations, especially at higher granularity.”

      Minor:

      1) In the abstract and later in p9 the authors talk about "state-of-the-art" non-linear deformation matrices. This may be confusing for some readers. To me, in brain imaging a matrix is most often a 4x4 affine matrix describing a linear transformation. However, the authors seem to be describing a more complex, non-linear deformation field. Whereas building a deformation matrix (4x4 affine) is not a big challenge, I agree that more sophisticated tools should provide more sophisticated deformation fields. The authors may consider using "deformation field" instead of "deformation matrix", but I leave that to their judgment.

      As suggested, we changed the text to “deformation field” where relevant.
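The distinction the reviewer draws can be shown with a small NumPy sketch: a 4x4 affine matrix (12 free parameters) moves every point by the same rule, whereas a nonlinear deformation field stores one displacement vector per voxel. The function names, the (X, Y, Z, 3) field layout and the nearest-voxel lookup are simplifying assumptions; real registration tools interpolate the field.

```python
import numpy as np

def apply_affine(matrix, xyz):
    """Linear case: one 4x4 matrix (12 free parameters) moves every point."""
    xyz = np.asarray(xyz, dtype=float)
    homog = np.c_[xyz, np.ones(len(xyz))]      # (N, 4) homogeneous coordinates
    return (homog @ np.asarray(matrix, float).T)[:, :3]

def apply_displacement_field(field, ijk):
    """Nonlinear case: a (X, Y, Z, 3) field stores one displacement per voxel.

    Nearest-voxel lookup keeps the sketch short; real tools interpolate.
    """
    ijk = np.asarray(ijk, dtype=float)
    idx = np.clip(np.rint(ijk).astype(int), 0, np.array(field.shape[:3]) - 1)
    return ijk + field[idx[:, 0], idx[:, 1], idx[:, 2]]


shift = np.eye(4)
shift[:3, 3] = [1.0, 0.0, 0.0]                   # translate 1 unit along x
print(apply_affine(shift, [[0, 0, 0], [2, 2, 2]]))  # every point moves identically

field = np.zeros((4, 4, 4, 3))
field[2, 2, 2] = [0.0, 0.5, 0.0]                 # only this voxel is displaced
print(apply_displacement_field(field, [[0, 0, 0], [2, 2, 2]]))
```

Even this toy 4x4x4 field already has 192 parameters, which is why a "deformation field" is the more accurate term for the nonlinear transformations.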

      2) In the results section, p11, the authors highlight the challenge of segmenting thalamic nuclei or different hippocampal regions, and suggest that this should be simplified by the use of the histological BigBrain data. However, the atlases currently provided in the OSF project do not include these more refined parcellations: there is one single "Thalamus" label, and one single "Hippocampus" label (not really single: left and right). This could be explicitly stated to prevent readers from having too high expectations (although I am certain that those finer parcellations should come in the very near future).

      We updated the text to reflect the current state of such parcellations. While subthalamic nuclei are not yet segmented (to our knowledge), one of the present authors has segmented hippocampal subfields (https://osf.io/bqus3/) and we highlight this in the Results (P.11-12):

      “Despite MRI acquisitions at high and ultra-high fields reaching submillimeter resolutions with ongoing technical advances, certain brain structures and subregions remain difficult to identify (Kulaga-Yoskovitz et al., 2015; Wisse et al., 2017; Yushkevich et al., 2015). For example, there are challenges in reliably defining the subthalamic nucleus (not yet released for BigBrain) or hippocampal Cornu Ammonis subfields [manual segmentation available on BigBrain, https://osf.io/bqus3/, (DeKraker et al., 2019)]. BigBrain-defined labels can be transformed to a standard imaging space for further investigation. Thus, this approach can support exploration of the functional architecture of histologically-defined regions of interest.”

    1. Author Response:

      Reviewer #2 (Public Review):

      Summary:

      Frey et al develop an automated decoding method, based on convolutional neural networks, for wideband neural activity recordings. This allows the entire neural signal (across all frequency bands) to be used as decoding inputs, as opposed to spike sorting or using specific LFP frequency bands. They show improved decoding accuracy relative to a standard Bayesian decoder, and then demonstrate how their method can find the frequency bands that are important for decoding a given variable. This can help researchers to determine what aspects of the neural signal relate to given variables.

      Impact:

      I think this is a tool that has the potential to be widely useful for neuroscientists as part of their data analysis pipelines. The authors have publicly available code on github and Colab notebooks that make it easy to get started using their method.

      Relation to other methods:

      This paper takes the following three methods used in machine learning and signal processing, and combines them in a very useful way. 1) Frequency-based representations based on spectrograms or wavelet decompositions (e.g. Golshan et al, Journal of Neuroscience Methods, 2020; Vilamala et al, 2017 IEEE International Workshop on Machine Learning for Signal Processing). This is used for preprocessing the neural data; 2) Convolutional neural networks (many examples in Livezey and Glaser, Briefings in Bioinformatics, 2020). This is used to predict the decoding output; 3) Permutation feature importance, aka a shuffle analysis (https://scikit-learn.org/stable/modules/permutation_importance.html; https://compstat-lmu.github.io/iml_methods_limitations/pfi.html). This is used to determine which input features are important. I think the authors could slightly improve their discussion/referencing of the connection to the related literature.
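The third ingredient, permutation feature importance, can be sketched generically in NumPy. This is an illustration of the technique the reviewer names, not the authors' influence-measure code: it assumes a hypothetical 2-D feature matrix with one column per frequency band and any `predict` callable, whereas the real inputs are (time x channels x frequencies) wavelet tensors.

```python
import numpy as np

def band_permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Permutation feature importance, one score per column (e.g. frequency band).

    predict : callable mapping X (n_samples, n_bands) -> predictions (n_samples,)
    Importance of band k = mean increase in absolute error after shuffling
    column k across samples, which breaks its relationship with y.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    base_err = np.mean(np.abs(predict(X) - y))
    importance = np.zeros(X.shape[1])
    for k in range(X.shape[1]):
        errs = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, k] = rng.permutation(X_perm[:, k])
            errs.append(np.mean(np.abs(predict(X_perm) - y)))
        importance[k] = np.mean(errs) - base_err
    return importance


rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
w = np.array([2.0, 0.0, 1.0])        # band 1 carries no information
y = X @ w
print(band_permutation_importance(lambda Z: Z @ w, X, y))
```

As the reviewer notes below, when bands are correlated, shuffling one band can leave its information recoverable from its neighbours, so importance may be spread or underestimated; the toy example uses independent columns to avoid this.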

      Overall, I think this paper is a very useful contribution, but I do have a few concerns, as described below.

      We thank the reviewer for the encouraging feedback and the helpful summary of the approaches we used. We are happy to read that they consider the framework to be a very useful contribution to the field of neuroscience. The reviewer raises several important questions regarding the influence measure/feature importance, the data format of the SVM and how the model can be used on EEG/ECoG datasets. Moreover, they suggest clarifying the general overview of the approach and connecting it more closely to the related literature. These are very helpful and thoughtful comments and we are grateful to be given the opportunity to address them.

      Concerns:

      1) The interpretability of the method is not validated in simulations. To trust that this method uncovers the true frequency bands that matter for decoding a variable, I feel it's important to show the method discovers the truth when it is actually known (unlike in neural data). As a simple suggestion, you could take an actual wavelet decomposition, and create a simple linear mapping from a couple of the frequency bands to an imaginary variable; then, see whether your method determines these frequencies are the important ones. Even if the model does not recover the ground truth frequency bands perfectly (e.g. if it says correlated frequency bands matter, which is often a limitation of permutation feature importance), this would be very valuable for readers to be aware of.

      2) It's unclear how much data is needed to accurately recover the frequency bands that matter for decoding, which may be an important consideration for someone wanting to use your method. This could be tested in simulations as described above, and by subsampling from your CA1 recordings to see how the relative influence plots change.

      We thank the reviewer for this really interesting suggestion to validate our model using simulations. Accordingly, we have now trained our model on simulated behaviours, which we created via linear mapping to frequency bands. As shown in Figure 3 - Supplement 2B, the frequency bands modulated by the simulated behaviour can be clearly distinguished from the unmodulated frequency bands. To make the synthetic data more plausible we chose different multipliers (betas) for each frequency component which explains the difference between the peak at 58Hz (beta = 2) and the peak at 3750Hz (beta = 1).

      To generate a more detailed understanding of how the detected influence of a variable changes based on the amount of data available, we conducted an additional analysis. Using the real data, we subsampled the training data from 1 to 35 minutes and fully retrained the model using cross-validation. We then used the original feature importance implementation to calculate influence scores across each cross-validation split. To quantify the similarity between the original influence measure and the downsampled influence, we calculated the Pearson correlation between the downsampled influence and the one obtained when using the full training set. As can be seen in Figure 3 - Supplement 2A, our model achieves an accurate representation of the true influence with as little as 5 minutes of training data (mean Pearson's r = 0.89 ± 0.06).
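The similarity score described here can be sketched as follows, assuming each influence profile is a 1-D array of per-frequency influence values (the function name is hypothetical). The spread is reported as the population SD across splits; whether the manuscript's "± 0.06" is an SD or another dispersion measure is not stated here.

```python
import numpy as np

def influence_similarity(full_influence, split_influences):
    """Pearson r between the full-data influence profile and each split's profile.

    Returns (mean r, SD of r, per-split r values).
    """
    full = np.ravel(np.asarray(full_influence, dtype=float))
    rs = np.array([np.corrcoef(full, np.ravel(s))[0, 1] for s in split_influences])
    return float(rs.mean()), float(rs.std()), rs


full = np.array([0.1, 0.5, 0.2, 0.9])        # toy influence per frequency band
splits = [full * 2.0, full + 0.05, -full]    # rescaled, shifted, inverted profiles
mean_r, sd_r, per_split = influence_similarity(full, splits)
```

Because Pearson r ignores scale and offset, a downsampled model that recovers the same shape of influence profile scores close to 1 even if its absolute influence values differ.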

      Page 8-9: To further assess the robustness of the influence measure we conducted two additional analyses. First, we tested how results depended on the amount of training data (1-35 minutes; see Methods). We found that our model achieves an accurate representation of the true influence with as little as 5 minutes of training data (mean Pearson's r = 0.89 ± 0.06, Figure 3 - Supplement 2A). Secondly, we assessed influence accuracy on a simulated behaviour in which we varied the ground truth frequency information (see Methods). The model trained on the simulated behaviour is able to accurately represent the ground truth information (modulated frequencies 58 Hz & 3750 Hz, Figure 3 - Supplement 2B).

      Page 20: To evaluate whether the influence measure accurately captures the true information content, we used simulated behaviours for which the ground truth information was known. We used the preprocessed wavelet-transformed data from one animal and created a simulated behaviour $y_{sb}$ using uniform random noise. Two frequency bands were then modulated by the simulated behaviour using $f_{new} = f_{old} \cdot \beta \cdot y_{sb}$. We used $\beta = 2$ for 58 Hz and $\beta = 1$ for 3750 Hz. We then retrained the model using five-fold cross-validation and evaluated the influence measure as previously described. We report the proportion of frequency bands that fall into the correct frequencies (i.e. the frequencies we chose to modulate, 58 Hz & 3750 Hz).
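The simulation rule $f_{new} = f_{old} \cdot \beta \cdot y_{sb}$ can be sketched in NumPy. This assumes the wavelet coefficients are held in an array of shape (time, channels, frequencies) and that each target frequency is matched to its nearest band; the function name, array layout and toy frequency grid are assumptions for illustration, not the authors' code.

```python
import numpy as np

def simulate_behaviour(wavelets, freqs, betas, seed=0):
    """Modulate selected frequency bands by a simulated behaviour y_sb.

    wavelets : (n_time, n_channels, n_freqs) preprocessed coefficients (assumed layout)
    freqs    : (n_freqs,) centre frequency of each band in Hz
    betas    : {frequency_hz: beta}; the nearest band to each frequency is modulated
    Implements f_new = f_old * beta * y_sb, with y_sb ~ Uniform(0, 1).
    """
    rng = np.random.default_rng(seed)
    y_sb = rng.uniform(size=wavelets.shape[0])          # one value per time step
    out = np.array(wavelets, dtype=float, copy=True)
    freqs = np.asarray(freqs, dtype=float)
    for f_hz, beta in betas.items():
        k = int(np.argmin(np.abs(freqs - f_hz)))        # nearest band index
        out[:, :, k] *= beta * y_sb[:, None]            # broadcast over channels
    return out, y_sb


freqs = np.array([2.0, 8.0, 58.0, 250.0, 3750.0])       # toy frequency grid
data = np.ones((100, 4, len(freqs)))
sim, y_sb = simulate_behaviour(data, freqs, {58.0: 2.0, 3750.0: 1.0})
```

A decoder trained to predict `y_sb` from `sim` should, if the influence measure works, assign high influence only to the 58 Hz and 3750 Hz bands, with the larger β expected to give the larger peak.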

      New supplementary Figure:

      Figure 3 - Supplement 2: Decoding influence for downsampled models and simulations. (A) To measure the robustness of the influence measure we downsampled the training data and retrained the model using cross-validation. We plot the Pearson correlation between the original influence distribution using the full training set and the influence distribution obtained from the downsampled data. Each dot shows one cross-validation split. Inset shows influence plots for two runs, one for 35 minutes of training data, the other in which model training consisted of only 5 minutes of training data. (B) We quantified our influence measure using simulated behaviours. We used the wavelet preprocessed data from one CA1 recording and simulated two behavioural variables which were modulated by two frequencies (58Hz & 3750Hz) using different multipliers (betas 2 & 1). We then trained the model using cross-validation and calculated the influence scores via feature shuffling.

      3)

      a) It is not clear why your method leads to an increase in decoding accuracy (Fig. 1)? Is this simply because of the preprocessing you are using (using the Wavelet coefficients as inputs), or because of your convolutional neural network. Having a control where you provide the wavelet coefficients as inputs into a feedforward neural network would be useful, and a more meaningful comparison than the SVM. Side note - please provide more information on the SVM you are using for comparison (what is the kernel function, are you using regularization?).

      We thank the reviewer for this suggestion and are sorry for the lack of documentation regarding the support vector machine model. The support vector machine was indeed trained on the wavelet-transformed data and not on the spike-sorted data, as we wanted a comparison model that also uses the raw data. The high error of the support vector machine on wavelet-transformed data might stem from two problems: (1) by design, the input loses all spatially relevant information, as the 3-D representation (frequencies x channels x time) needs to be flattened into a 1-D vector in order to train an SVM on it, and (2) the SVM therefore needs to deal with a huge number of features. For example, even though the wavelets are downsampled to 30Hz, one sample still consists of 212992 features (64 timesteps * 128 channels * 26 frequencies), which makes the SVM very slow to train and prone to overfitting the training set.

      This exact problem would also be present in a feedforward neural network that uses the wavelet coefficients as input. Any hidden layer connected to the input with a reasonable number of hidden units results in a multi-million-parameter model (e.g. 512 units result in 109051904 parameters for the first layer alone). These models are notoriously hard to train and won’t fit on many consumer-grade GPUs, which is why for most spatial signals, including images or higher-dimensional signals, convolutional layers are the preferred and often only option to train these models.
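The arithmetic behind these figures can be checked directly. The first two numbers reproduce the counts given above (the 109051904 figure corresponds to the weights only; adding 512 biases gives 109052416). The convolutional layer is a hypothetical configuration (26 frequency bands as input channels, 64 filters, 3x3 kernel) chosen only to show that a convolution's parameter count does not grow with input size.

```python
def dense_params(n_inputs, n_units, bias=True):
    """Weights (+ optional biases) of a fully connected layer."""
    return n_inputs * n_units + (n_units if bias else 0)

def conv2d_params(in_channels, out_channels, kernel_h, kernel_w, bias=True):
    """Weights (+ optional biases) of a 2-D convolution; independent of input size."""
    return in_channels * out_channels * kernel_h * kernel_w + (out_channels if bias else 0)


n_features = 64 * 128 * 26               # timesteps x channels x frequencies
print(n_features)                        # 212992
print(dense_params(n_features, 512, bias=False))  # 109051904
print(conv2d_params(26, 64, 3, 3))       # 15040 (hypothetical conv layer)
```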

      We have now included more detailed information about the SVM (including kernel function and regularization parameters) in the methods section of the manuscript.

      Page 19: To generate a further baseline measure of performance when decoding using wavelet-transformed coefficients, we trained support vector machines to decode position from wavelet-transformed CA1 recordings. We used either a linear kernel or a non-linear radial basis function (RBF) kernel to train the model, using a regularization factor of C = 100. For the non-linear RBF kernel, we set gamma to the default 1 / (num_features * var(X)) as implemented in the sklearn framework. The SVM model was trained on the same wavelet coefficients as the convolutional neural network.
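
      As a rough illustration of the kernel settings described above, the sketch below computes gamma as 1 / (n_features * var(X)), with the variance taken over all entries of X (this corresponds to gamma="scale" in scikit-learn), and evaluates the linear and RBF kernels in plain Python. The toy matrix and helper names are hypothetical. In scikit-learn itself one would pass C=100 and gamma="scale" to an SVM estimator; C acts as an inverse regularization strength, so larger values mean weaker regularization.

```python
import math

def scale_gamma(X):
    """gamma = 1 / (n_features * var(X)), where var is the population variance
    over all entries of X (the convention behind scikit-learn's gamma='scale')."""
    n_features = len(X[0])
    values = [v for row in X for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return 1.0 / (n_features * var)

def rbf_kernel(x, y, gamma):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def linear_kernel(x, y):
    """Linear kernel k(x, y) = <x, y>."""
    return sum(a * b for a, b in zip(x, y))

# Toy "flattened wavelet" samples: 3 samples x 4 features (illustrative only)
X = [[0.0, 1.0, 2.0, 3.0],
     [1.0, 1.0, 0.0, 2.0],
     [2.0, 0.0, 1.0, 1.0]]
gamma = scale_gamma(X)
print(gamma, rbf_kernel(X[0], X[1], gamma), linear_kernel(X[0], X[1]))
```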

      b) Relatedly, because the reason for the increase in decoding accuracy is not clear, I don't think you can make the claim that "The high accuracy and efficiency of the model suggest that our model utilizes additional information contained in the LFP as well as from sub-threshold spikes and those that were not successfully clustered." (line 122). Based on the shown evidence, it seems to me that all of the benefits vs. the Bayesian decoder could just be due to the nonlinearities of the convolutional neural network.

      Thank you for raising this interesting point regarding the linear vs. non-linear information contained in the neural data. Indeed, when the model is trained with linear activation functions in the convolutional and fully connected layers, its performance drops significantly. To quantify this, we ran the model with three different activation-function configurations: (1) non-linear activation functions only in the convolutional layers, (2) non-linear activation functions only in the fully connected layers, or (3) linear activation functions throughout the whole model. As expected, the model with only linear activation functions performed worst (decoding error: all layers linear 61.61 cm ± 33.85 cm; non-linear convolutional layers 22.99 cm ± 18.67 cm; non-linear fully connected layers 47.03 cm ± 29.61 cm; all layers non-linear 18.89 cm ± 4.66 cm). For comparison, the Bayesian decoder achieves a decoding error of 23.25 cm ± 2.79 cm on these data.

      Thus, it appears that the reviewer is correct: the advantage of the CNN model comes in part from the non-linearity of the convolutional layers. A corollary is that the neural data likely contain non-linear elements that the CNN, but not the Bayesian decoder, can access. However, the CNN also receives wider-band inputs and thus has the potential to use information beyond detected spikes alone.
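
      One way to see why the all-linear configuration performs worst: without non-linear activations, any stack of convolutional or fully connected layers collapses into a single affine map, so the network can only express linear functions of its input. The toy check below shows this for two small dense layers with hypothetical weights (not the architecture from the manuscript); convolutions are linear operations too, so the same argument applies to them.

```python
def matvec(W, x):
    """Matrix-vector product W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Matrix product A @ B for A (m x k) and B (k x n)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def vadd(a, b):
    return [x + y for x, y in zip(a, b)]

def relu(v):
    return [max(0.0, x) for x in v]

# Two hypothetical layers: 3 inputs -> 2 hidden units -> 2 outputs
W1 = [[1.0, -2.0, 0.5], [0.0, 1.0, -1.0]]
b1 = [0.1, -0.2]
W2 = [[2.0, 1.0], [-1.0, 3.0]]
b2 = [0.0, 0.5]

def two_layer(x, activation=lambda v: v):
    """W2 @ act(W1 @ x + b1) + b2; identity activation by default."""
    return vadd(matvec(W2, activation(vadd(matvec(W1, x), b1))), b2)

# With identity activations the stack equals ONE affine layer: (W2 W1) x + (W2 b1 + b2)
W = matmul(W2, W1)
b = vadd(matvec(W2, b1), b2)

x = [0.3, -1.0, 2.0]
linear_stack = two_layer(x)
collapsed = vadd(matvec(W, x), b)
nonlinear = two_layer(x, relu)  # a ReLU breaks the collapse
print(linear_stack, collapsed, nonlinear)
```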

      In response to the reviewer's point and to the new analysis of the LFP models raised by Reviewer 1, we have now reworded this sentence in the manuscript.

      Page 4: The high accuracy and efficiency of the model for these harder samples suggest that the CNN utilizes additional information from sub-threshold spikes and those that were not successfully clustered, as well as nonlinear information which is not available to the Bayesian decoder.

    1. Author Response

      Reviewer #2 (Public Review):

      Portes et al. investigated the nanoscale architecture and dynamics of the osteoclast sealing zone using high-end microscopy techniques. They first use DONALD 3D single molecule localization microscopy on osteoclasts seeded on glass to study the lateral and axial localization of key components of the sealing zone. They show that for some components (vinculin, talin C-terminus), the axial localization was higher when molecules were in close proximity to the actin core, while for other components (cortactin, actinin, filamin, paxillin), there was no difference in height as a function of distance from the actin core. They next show that random illumination microscopy (RIM) is a suitable technique to study the sealing zone of osteoclasts on a bone mimetic substrate. They continue to use RIM to show that the dynamics of neighbouring podosomes correlate up to a distance of about 1.5 µm. They next show that within the sealing zone, groups of podosomes are surrounded by the classical adhesion adaptor proteins such as vinculin, talin and paxillin, while actinin is present at the periphery of all single cores. This suggests that the sealing zone has an "intermediate" level of organization and that groups of podosomes form a functional unit within the sealing zone. The authors lastly demonstrate that the fluorescence intensity of the cores within these groups correlates with the intensity of the adaptor proteins that surround the group, and that the fluorescence intensities of the cores within one group also correlate with each other.

      Strengths:

      The authors use bone slices to evaluate the nanoscale organization of cytoskeletal components in the sealing zone. Podosome conformations in osteoclasts strongly depend on the substrate type and the usage of bone slices accurately mimics the physiological environment in which osteoclasts reside in vivo.

      The authors use state-of-the-art imaging approaches to evaluate the nanoscale organization and dynamics of multiple podosome components in the sealing zone.

      The identification of groups of podosomes that demonstrate correlated dynamics within the sealing zone is a novel finding that is convincingly demonstrated.

      We thank the reviewer for these encouraging comments and the valuable suggestions below.

      Weaknesses:

      The rationale for the analysis performed on the DONALD super-resolution images (explained in Figure S1) is unclear. The analysis is also not properly explained and it is unclear how the data should be interpreted or put into context. Specific comments related to this analysis:

      • The authors distinguish between the internal and external sides of the cell when reporting the height of the investigated proteins, but it is unclear why this is done. Also, while the authors make this distinction, no conclusions are derived from it, and only the height values for the internal side of the cell are mentioned in the text.

      As the sealing zone is usually located near the cell periphery, we wondered whether the proximity of the peripheral plasma membrane, and a possible difference in tension between the inner and outer parts, could influence the molecular architecture of the structure; this is why we distinguished between the inner and outer sides of the structures. However, our analyses revealed little difference between these two sides, the most striking being a closer proximity of vinculin to the cores on the outer side of the belt. We now make this explicit in the manuscript (P3, L113-116).

      • It is very unclear how the distance of the investigated proteins to the actin core is calculated. From Figure S1, it seems that a rectangle centered on a podosome is taken, but the rectangle in the example contains more than one core. This would seem to affect the interpretation of the data presented in the figures that contain the height values. The authors should better explain how the analysis was performed and how it deals with the presence of multiple podosome cores in the rectangle of interest.

      We apologize for this omission. In order not to bias the analysis, the protein distance was calculated for all cores present, not just one. This is now specified in the legend of the figure.

      • In the text, the distance of the proteins with respect to the actin core is given (350-710 nm, depending on the specific protein and its localization towards the external or internal part of the cell). It is mentioned that the measurements are not shown, but it should be better explained how these numbers were derived from the data, and the measurements (average, SD/SEM) should be shown.

      These values correspond to the maxima of the distributions of the different podosome markers shown in Figure 1G. Each of these proteins (vinculin, talin, filamin-A and paxillin) has a broad distribution marked by a depletion at the core, and not a peak as suggested by the first version of the manuscript. We propose not to indicate these values in the revised version in order to simplify the manuscript and not to confuse the reader.

      • Related to the previous comment: while it is mentioned that vinculin, for example, is located ~500 nm from the actin core, the height values (Figure 1E) are binned within 50 nm of the core. This does not seem to match. It would be very helpful if the authors added how many localizations are found so close to the core. Since this number is expected to be low, it would also be valuable if the authors discussed what this means for the difference in height between molecules found close to and away from the core.

      Indeed, as shown in Figure 1G, vinculin is much less present in the center of actin cores than at 500 nm from these cores. The graph shown in Figure 1E, which shows the height of vinculin as a function of the distance to the core without indicating the proportion of molecules detected, can indeed be confusing. That said, a large number of molecules were detected: 197,967 for the vinculin graph, including 5,973 within 300 nm of the core, which is far from negligible. To make this graph easier to understand, as well as the graphs of the heights of the other proteins studied (Figures 1 and S2), we now superimpose the frequency of the localizations on the height distributions (new Figure 1E,F), still compiled in Figure 1G.
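
      The response states that protein distances were computed with respect to all cores in the region rather than a single one. A minimal sketch of such an analysis, assuming each localization is assigned to its nearest core and heights are binned in 50-nm distance steps with the count reported next to the mean height, could look like this. The function names and toy coordinates are hypothetical and are not the authors' code or data.

```python
import math
from collections import defaultdict

def nearest_core_distance(xy, cores):
    """Lateral distance (nm) from a localization to the closest actin core."""
    x, y = xy
    return min(math.hypot(x - cx, y - cy) for cx, cy in cores)

def height_profile(localizations, cores, bin_nm=50):
    """Bin localization heights by distance to the nearest core.

    localizations: list of (x, y, z) in nm; cores: list of (x, y) core centres.
    Returns {bin_start_nm: (count, mean_height_nm)} so sparse bins stay visible.
    """
    bins = defaultdict(list)
    for x, y, z in localizations:
        d = nearest_core_distance((x, y), cores)
        bins[int(d // bin_nm) * bin_nm].append(z)
    return {b: (len(zs), sum(zs) / len(zs)) for b, zs in sorted(bins.items())}

# Toy data: two cores in one region of interest and a few localizations (nm)
cores = [(0.0, 0.0), (1000.0, 0.0)]
locs = [(10.0, 0.0, 60.0), (20.0, 10.0, 80.0),     # close to core 1
        (980.0, 0.0, 70.0),                         # close to core 2
        (500.0, 0.0, 150.0), (520.0, 5.0, 130.0)]   # between the cores
profile = height_profile(locs, cores)
for start, (n, mean_z) in profile.items():
    print(f"{start}-{start + 50} nm: n={n}, mean height={mean_z:.0f} nm")
```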

      • For cortactin, filamin A and actinin, it is found that they reside on average at a height of approximately 150 nm, even up to a large distance from the podosome core. It is unclear how these values should be interpreted. 150 nm is well above the location where actin is expected to be (and also well above the average actin height found by the authors, approximately 80 nm at larger distances from the cores). The authors should add a discussion of what type of structures cortactin, filamin A and actinin would associate with at this position, or how this height can be explained. This should also be included in the final model of Figure 6. In the current cartoon, filamin A, for example, seems to be associated with the integrins, but this does not match the height observed by the authors.

      The average heights of cortactin, filamin-A and actinin are indeed around 150 nm, but these proteins are actually present over a wider range of heights (0-400 nm), as shown in the histograms in Figure 1H. These values are therefore not inconsistent with the distribution of actin, which indeed has a lower average height but is also present over this entire range (histogram now added in Figure 1H). These analyses suggest that there are different sets of actin filaments and that there is proportionally more cortactin, filamin-A and actinin on the higher actin filaments than on those close to the plasma membrane. To fully account for these results, we now point out the potential presence of different sets of actin filaments in the discussion (P7, L266-275) and have corrected the model shown in the new Figure 6, placing a population of filamin A on the radial filaments, not just associated with integrins, and adding filamin A and actinin to the side view of the model to show their likely localisation.

      The authors mention that the RIM resolution is 100nm and 300nm in the lateral and axial direction, respectively. This should also be confirmed on the bone slices with beads. It is well conceivable that the optical properties of bone have an effect on the optimal RIM resolution.

      To evaluate RIM resolution on osteoclast samples, as suggested by the reviewer, we performed experiments with beads and used the Fourier Ring Correlation (FRC) method (Nieuwenhuizen et al., Nat Methods 2013). This consists of acquiring two RIM images with two different speckle illumination sequences and comparing the correlation of the two images in Fourier space. The following figure shows the correlation curve as a function of spatial frequency. The FIRE number, the inverse of the spatial frequency at which the FRC curve falls to a correlation value of 1/7, gives an estimate of the image resolution.

      Using this approach, we estimated the resolution to be 125 nm on average.
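
      A minimal sketch of how a FIRE value is read off an FRC curve: find the first spatial frequency at which the correlation falls below 1/7, interpolating linearly between samples, and invert it. The curve values below are made up for illustration and do not reproduce the measurement reported above; computing the FRC itself (correlating the two images ring by ring in Fourier space) is not shown.

```python
def fire_resolution(freqs, frc, threshold=1 / 7):
    """Return 1 / f_cross, where f_cross is the first spatial frequency at which
    the FRC curve drops below `threshold` (linear interpolation between samples).

    freqs: increasing spatial frequencies in 1/nm; frc: FRC values at those freqs.
    Returns None if the sampled curve never crosses the threshold.
    """
    for i in range(1, len(frc)):
        if frc[i - 1] >= threshold > frc[i]:
            f0, f1 = freqs[i - 1], freqs[i]
            c0, c1 = frc[i - 1], frc[i]
            f_cross = f0 + (threshold - c0) * (f1 - f0) / (c1 - c0)
            return 1.0 / f_cross
    return None

# Hypothetical FRC curve sampled every 0.001 nm^-1 (illustrative values only)
freqs = [0.001 * k for k in range(1, 11)]
frc = [0.98, 0.95, 0.90, 0.80, 0.60, 0.40, 0.25, 0.15, 0.10, 0.05]
resolution_nm = fire_resolution(freqs, frc)
print(f"FIRE resolution: {resolution_nm:.1f} nm")
```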

      The authors find three specific fluctuation periods (100 s/25 s/7 s), but it is unclear what these periods mean. The authors only very briefly mention that these periods correlate with similar observations in macrophages, but they should also add the implications of this finding and suggest a possible molecular mechanism that underlies these different fluctuations.

      We agree with this comment. So far, the mechanisms regulating these oscillations (whether purely mechanical or involving signaling), as well as their importance for podosome and sealing zone function, are not understood. In van den Dries et al. Nat Commun 2013 and Labernadie et al. Nat Commun 2014, it was shown that these oscillations in macrophage podosomes depend on myosin IIA activity. It would thus be interesting to explore the effects of drugs interfering with actin polymerization on both the periodicity and the spatial synchrony properties of the sealing zone. We now discuss this point in the manuscript (P7, L296-300).

      The authors find that actinin-1 is localized around the podosome cores, while filamin and vinculin surround groups of podosomes. The current representative images chosen to support this difference, though, display a very different density of podosome cores. The filamin and vinculin images seem to have a much denser podosome content than the actinin and cortactin images. I would encourage the authors to select more comparable images so that the difference in localization of the associated proteins can be fully appreciated.

      This is a good point. Indeed, not all sealing zones are alike, especially with respect to the density of actin cores. This is why we have chosen to show a gallery of different cases (now in Figure S7) rather than intentionally selecting the same patterns in the main figures, so as not to mislead the reader. Importantly, whatever the actin density, we find the same localizations for the different proteins.

      In Figures 4 and 5, the authors show that the sealing zone is subdivided into groups of podosomes, and it is implied that these form functional units within the sealing zone. Yet, it is unclear how persistent these groups are. Considering the dynamic nature of podosomes in other cell types (as also demonstrated in the supplementary movies), it is quite conceivable that these groups continuously fuse and remodel. To better define the nature of these groups of podosomes, the authors should add an analysis of these podosome groups and measure parameters such as group stability, podosome number per group, group size, etc. This would greatly enhance the novel aspects of the findings in this paper.

      Following the reviewer’s suggestion, we have quantified the number of podosomes per group and the group size. These islets of clustered cores measured 2.3 ± 2.1 µm² (mean ± SD) and contained 7 ± 8 cores (mean ± SD). These results are now included in the manuscript (P6, L213). Unfortunately, we could not accurately measure the stability of the clusters, as this would require long and challenging time-lapse RIM imaging of osteoclasts expressing both paxillin-GFP and lifeact-mCherry, which we were able to achieve only on a few cells and over short timescales.

      The authors mention in the discussion that their finding about the groups of podosomes is very different from the "double circle" distribution found in previous publications. Yet, it is unclear what explains these different observations. While the authors use RIM super-resolution in this paper to assess the localization of the adaptor proteins, it is very unlikely that this is the source of this difference since the groups of podosomes would have been easily identified by conventional or confocal microscopy as well. The authors should add an extended discussion on how these differences could be explained and what this means for bone resorption properties.

      Indeed, our observation that the sealing zone is composed of islets of actin cores bordered by a network of adhesion complexes diverges from most previous studies, which describe a “double circle” organization. We believe that this difference may come not only from the high resolution of our images, but mainly from the fact that most studies on the organization of sealing zones have been performed on mouse osteoclasts. We also believe that this particular organisation probably allows efficient sealing of the osteoclast plasma membrane to the bone surface and maintains the resorption lacuna and the diffusion barrier. We now indicate this in the discussion (P7, L286-288).

    1. Author response:

      Reviewer #1 (Public Review):

      How does the brain respond to the input of different complexity, and does this ability to respond change with age?

      The study by Lalwani et al. tried to address this question by pulling together a number of neuroscientific methodologies (fMRI, MRS, drug challenge, perceptual psychophysics). A major strength of the paper is that it is backed up by robust sample sizes and careful choices in data analysis, translating into a more rigorous understanding of the sensory input as well as the neural metric. The authors apply a novel analysis method developed in human resting-state MRI data to task-based data in the visual cortex, specifically investigating the variability of the neural response to stimuli of different levels of visual complexity. A subset of participants took part in a placebo-controlled drug challenge and functional neuroimaging. This experiment showed that increases in GABA have differential effects on participants with different baseline levels of GABA in the visual cortex, possibly modulating perceptual performance in those with lower baseline GABA. A caveat is that no single cohort took part in all study elements, i.e., visual discrimination with drug challenge and neuroimaging. Hence, the causal relationship is limited to the neural variability measure and does not extend to visual performance. Nevertheless, the consistent use of visual stimuli across approaches permits an exceptionally high level of comparability across modalities (the computational, behavioural, and fMRI analyses all draw on the same set of images). The conclusions that can be drawn from such a coherent data set are strong.

      The community will benefit from the technical advances in the study, especially the calculation of BOLD variability, when described appropriately, encouraging further linkage between complementary measures of brain activity, neurochemistry, and signal processing.

      Thank you for your review. We agree that a future study with a single cohort would be an excellent follow-up.

      Reviewer #2 (Public Review):

      Lalwani et al. measured BOLD variability during the viewing of houses and faces in groups of young and old healthy adults and measured ventrovisual cortex GABA+ at rest using MR spectroscopy. The influence of the GABA-A agonist lorazepam on BOLD variability during task performance was also assessed, and baseline GABA+ levels were considered as a mediating variable. The relationship of local GABA to changes in variability in BOLD signal, and how both properties change with age, are important and interesting questions. The authors feature the following results: 1) younger adults exhibit greater task-dependent changes in BOLD variability and higher resting visual cortical GABA+ content than older adults, 2) greater BOLD variability scales with GABA+ levels across the combined age groups, 3) administration of a GABA-A agonist increased condition differences in BOLD variability in individuals with lower baseline GABA+ levels but decreased condition differences in BOLD variability in individuals with higher baseline GABA+ levels, and 4) resting GABA+ levels correlated with a measure of visual sensory ability derived from a set of discrimination tasks that incorporated a variety of stimulus categories.

      Strengths of the study design include the pharmacological manipulation for gauging a possible causal relationship between GABA activity and task-related adjustments in BOLD variability. The consideration of baseline GABA+ levels for interpreting this relationship is particularly valuable. The assessment of feature-richness across multiple visual stimulus categories provided support for the use of a single visual sensory factor score to examine individual differences in behavioral performance relative to age, GABA, and BOLD measurements.

      Weaknesses of the study include the absence of an interpretation of the physiological mechanisms that contribute to variability in BOLD signal, particularly for the chosen contrast that compared viewing houses with viewing faces.

      It would be useful to know whether any of the observed effects can be explained by patterns in the mean BOLD signal, independent of variability.

      One of the first pre-processing steps in computing SDBOLD involves subtracting the block mean from the fMRI signal for each task condition. Therefore, patterns observed in BOLD signal variability are not driven by differences in mean BOLD. Moreover, as noted above, to further confirm this, we performed an additional mean-BOLD-based analysis (see Supplementary Materials, p. 3). The results suggest that ∆⃗ MEANBOLD is actually larger in older adults than in younger adults (∆⃗ SDBOLD exhibited the opposite pattern), but, more importantly, ∆⃗ MEANBOLD is not correlated with GABA or with visual performance. This is also consistent with prior research (Garrett et al., 2011, 2013, 2015, 2020) that found MEANBOLD to be relatively insensitive to behavioral performance.
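
      To make the role of block-mean removal concrete, here is a minimal sketch of an SDBOLD-style computation for one voxel: each block is centred on its own mean, the blocks of a condition are concatenated, and the standard deviation is taken. It assumes the sample SD and a house-minus-face difference for ∆⃗ SDBOLD; both are illustrative assumptions, and the toy values are not real data. Shifting any block by a constant (a change in mean BOLD) leaves the result unchanged.

```python
import statistics

def sd_bold(blocks):
    """SD of a voxel's signal for one condition (sketch): subtract each block's
    own mean, concatenate the centred blocks, then take the sample SD."""
    centered = []
    for block in blocks:
        block_mean = sum(block) / len(block)
        centered.extend(v - block_mean for v in block)
    return statistics.stdev(centered)

# Toy time series: two blocks per condition (illustrative values only)
house_blocks = [[1.0, 3.0, 2.0, 4.0], [2.0, 0.0, 1.0, 3.0]]
face_blocks = [[1.0, 2.0, 1.0, 2.0], [0.0, 1.0, 0.0, 1.0]]

# Assumed definition: condition difference = house SD minus face SD
delta_sd = sd_bold(house_blocks) - sd_bold(face_blocks)

# Adding different constant offsets to each block changes mean BOLD, not SDBOLD
shifted_house = [[v + 100.0 for v in house_blocks[0]],
                 [v - 50.0 for v in house_blocks[1]]]
print(delta_sd, sd_bold(house_blocks), sd_bold(shifted_house))
```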

      The positive correlation between resting GABA+ levels and the task-condition effect on BOLD variability reaches significance at the total group level, when the young and old groups are combined, but not separately within each group. This correlation may be explained by age-related differences since younger adults had higher values than older adults for both types of measurements. This is not to suggest that the relationship is not meaningful or interesting, but that it may be conceptualized differently than presented.

      Thank you for this important point. The relationship between GABA and ∆⃗ SDBOLD shown in Figure 3 is also significant within each age group separately (Lines 386-388). The model used both age group and GABA as predictors of ∆⃗ SDBOLD and found that both had a significant effect, while the age group x GABA interaction was not significant. The effect of age on ∆⃗ SDBOLD therefore does not fully explain the observed relationship between GABA and ∆⃗ SDBOLD, because this relationship is significant in each age group individually and in the whole sample even when variance explained by age is accounted for. The revision clarifies this important point (Lines 488-492). Thank you for raising it.

      Two separate dosages of lorazepam were used across individuals, but the details of why and how this was done are not provided, and the possible effects of the dose are not considered.

      Good point. We used two dosages to maximize our chances of finding a dosage that had a robust effect. The specific dosage was randomly assigned across participants, and dosage did not differ across age groups or baseline GABA levels. We also controlled for drug dosage when examining the drug-related shift in ∆⃗ SDBOLD. We have clarified these points in the revision and highlighted the analysis that found no effect of dosage on the drug-related shift in ∆⃗ SDBOLD (Lines 407-418).

      The observation of greater BOLD variability during the viewing of houses than faces may be specific to these two behavioral conditions, and lingering questions about whether these effects generalize to other types of visual stimuli, or other non-visual behaviors, in old and young adults, limit the generalizability of the immediate findings.

      We agree that examining the factors that influence BOLD variability is an important topic for future research. In particular, although it is increasingly well known that variability modulation itself can occur in a host of different tasks and research contexts across the lifespan (see Garrett et al., 2013; Waschke et al., 2021), addressing whether variability modulation occurs in response to stimulus complexity in general will require future work examining a range of stimulus categories beyond faces and houses. This is indeed an active area of research in Dr. Garrett’s group, where visual stimuli from many different categories are examined (e.g., for a recent approach, see Waschke et al., 2023, bioRxiv). Regardless, only face and house stimuli were available in the current dataset. We therefore exploited the finding that BOLD variability tends to be larger for house stimuli than for face stimuli (in line with the HMAX model output) to demonstrate that the degree to which a given individual modulates BOLD variability in response to stimulus category is related to their age, GABA levels, and behavioral performance.

      The observed age-related differences in patterns of BOLD activity and ventrovisual cortex GABA+ levels along with the investigation of GABA-agonist effects in the context of baseline GABA+ levels are particularly valuable to the field, and merit follow-up. Assessing background neurochemical levels is generally important for understanding individualized drug effects. Therefore, the data are particularly useful in the fields of aging, neuroimaging, and vision research.

      Thank you, we agree!

      Reviewer #3 (Public Review):

      The role of neural variability in various cognitive functions is one of the focal contentions in systems and computational neuroscience. In this study, the authors used a large-scale cohort dataset to investigate the relationship between neural variability measured by fMRI and several factors, including stimulus complexity, GABA levels, aging, and visual performance. Such investigations are valuable because neural variability, although an important topic, has so far mostly been studied in animal neurophysiology; there is little evidence in humans. Also, the conclusions are built on a large-scale cohort dataset that includes multi-modal data. Such a dataset per se is a big advantage. Pharmacological manipulations and MRS acquisitions are rare in this line of research. Overall, I think this study is well designed, and the manuscript reads well. I have listed my comments below and hope my suggestions can further improve the paper.

      Strength:

      1) The study design is astonishingly rich. The authors used task-based fMRI, MRS, a population contrast (aging vs. control), and psychophysical testing. I appreciate the motivation and effort behind collecting such a rich dataset.

      2) The MRS part is good. I am not an expert in MRS, so I cannot comment on MRS data acquisition and analyses. But I think linking neural variability to GABA in humans is in general a good idea. There has long been interest in the cause of neural variability, and inhibition in local neural circuits has been hypothesized as one of the key factors.

      3) The pharmacological manipulation is particularly interesting, as it provides at least some evidence for a causal effect of GABA on deltaSDBOLD. I think this is quite novel.

      Weakness:

      1) I am concerned about the definition of neural variability. In electrophysiological studies, neural variability can be defined as Poisson-like spike count variability. In the fMRI world, however, there is no consensus on what neural variability is. There are at least three definitions. The first is the variability (e.g., SD) of the voxel response time series, as used here and in the resting-state fMRI world. The second is to regress out the stimulus-evoked activation and calculate only the SD of the residuals (i.e., background variability). The third is to calculate the trial-by-trial variability of beta estimates from general linear modeling. The relations between these three types of variability and other factors currently remain unclear, as does the link between neuronal variability and voxel variability. I don't think the computational principles discovered for neuronal variability necessarily apply to voxel responses. I hope the authors can acknowledge and discuss these differences.

      These are very important points, thank you for raising them. Although we agree that the majority of the single cell electrophysiology world indeed seems to prefer Poisson-like spiking variability as an easy and tractable estimate, it is certainly not the only variability approach in that field (e.g., entropy; see our most recent work in humans where spiking entropy outperforms simple spike counts to predict memory performance; Waschke et al., 2023, bioRxiv). In LFP, EEG/MEG and fMRI, there is indeed no singular consensus on what variability “is”, and in our opinion, that is a good thing. We have reported at length in past work about entire families of measures of signal variability, from simple variance, to power, to entropy, and beyond (see Table 1 in Waschke et al, 2021, Neuron). In principle, these measures are quite complementary, obviating the need to establish any single-measure consensus per se. Rather than viewing the three measures of neural variability that the reviewer mentioned as competing definitions, we prefer to view them as different sources of variance. For example, from each of the three sources of variance the reviewer suggests, any number of variability measures could be computed.

      The current study uses the standard deviation of concatenated blocked time series, computed separately for the face and house viewing conditions (the same estimation approach used in our very earliest studies on signal variability; Garrett et al., 2010, JNeurosci). In those early studies, and in nearly every one thereafter (see Waschke et al., 2021, Neuron), there is no ostensible link between SDBOLD (as we normally compute it) and average BOLD from either multivariate or GLM models; as such, in past work we have not found any clear difference in SDBOLD results depending on whether average “evoked” responses are removed. This is perhaps also why removing ERPs from EEG time series rarely influences estimates of variability in our work (e.g., Kloosterman et al., 2020, eLife).

      The third definition the reviewer notes refers to variability of beta estimates over trials. Our most recent work has done exactly this (e.g., Skowron et al., 2023, bioRxiv), calculating the SD even over single time point-wise beta estimates so that we may better control the extraction of time points prior to variability estimation. Although direct comparisons have not yet been published by us, variability over single TR beta estimates and variability over the time series without beta estimation are very highly correlated in our work (in the .80 range; e.g., Kloosterman et al., in prep).
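
      To make the three "sources of variance" discussed above concrete, the toy sketch below computes, for one simulated voxel, (1) the SD of the concatenated time series, (2) the SD of the residuals after regressing out a modelled evoked response, and (3) the SD across trials of single-trial beta estimates. It uses a single hypothetical regressor and made-up data, and is not the pipeline used in the manuscript or in the cited work.

```python
import statistics

def ols_fit(x, y):
    """Slope and intercept of y ~ x by ordinary least squares (one regressor)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def three_variabilities(trials, regressor):
    """trials: per-trial voxel time series; regressor: modelled evoked response
    (same length as each trial). Returns the three variability measures."""
    series = [v for trial in trials for v in trial]
    # (1) SD of the concatenated time series itself
    sd_series = statistics.stdev(series)
    # (2) SD of residuals after removing the evoked response fitted over all trials
    design = regressor * len(trials)
    slope, intercept = ols_fit(design, series)
    residuals = [y - (slope * x + intercept) for x, y in zip(design, series)]
    sd_resid = statistics.stdev(residuals)
    # (3) SD across trials of single-trial beta (slope) estimates
    betas = [ols_fit(regressor, trial)[0] for trial in trials]
    sd_beta = statistics.stdev(betas)
    return sd_series, sd_resid, sd_beta

# Hypothetical evoked-response shape and three noisy trials
regressor = [0.0, 1.0, 2.0, 1.0, 0.0]
trials = [[0.1, 1.2, 2.0, 0.9, 0.0],
          [0.0, 0.8, 1.6, 0.7, 0.2],
          [0.2, 1.1, 2.4, 1.3, -0.1]]
sd_series, sd_resid, sd_beta = three_variabilities(trials, regressor)
print(f"series SD {sd_series:.3f}, residual SD {sd_resid:.3f}, beta SD {sd_beta:.3f}")
```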

      Re: the reviewer’s point that “It also remains unclear the links between neuronal variability and voxel variability. I don’t think the computational principles discovered in neuronal variability also apply to voxel responses. I hope the authors can acknowledge their differences and discuss their differences.” If we understand correctly, the reviewer maybe asking about within-person links between single-cell neuronal variability (to allow Poisson-like spiking variability) and voxel variability in fMRI? No such study has been conducted to date to our knowledge (such data almost don’t exist). Or rather, perhaps the reviewer is noting a more general point regarding the “computational principles” of variability in these different domains? If that is true, then a few points are worth noting. First, there is absolutely no expectation of Poisson distributions in continuous brain imaging-based time series (LFP, E/MEG, fMRI). To our knowledge, such distributions (which have equivalent means and variances, allowing e.g., Fano factors to be estimated) are mathematically possible in spiking because of the binary nature of spikes; when mean rates rise, so too do variances given that activity pushes away from the floor (of no activity). In continuous time signals, there is no effective “zero”, so a mathematical floor does not exist outright. This is likely why means and variances are not well coupled in continuous time signals (see Garrett et al., 2013, NBR; Waschke et al., 2021, Neuron); anything can happen. Regardless, convergence is beginning to be revealed between the effects noted from spiking and continuous time estimates of variability. For example, we show that spiking variability can show a similar, behaviourally relevant coupling to the complexity of visual input (Waschke et al., 2023, bioRxiv) as seen in the current study and in past work (e.g., Garrett et al., 2020, NeuroImage). 
Whether such convergence reflects common computational principles of variability remains to be seen in future work, despite known associations between single cell recordings and BOLD overall (e.g., Logothetis and colleagues, 2001, 2002, 2004, 2008).
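      The mean-variance argument above can be illustrated with a small simulation. This is a minimal sketch, not an analysis from any of the cited studies: it assumes Poisson spike counts with an arbitrary rate of 5 and a Gaussian stand-in for a continuous signal with an arbitrary baseline of 100, and the helper names are hypothetical. For the counts, variance/mean (the Fano factor) sits near 1; for the continuous signal, adding a constant offset changes the mean but not the variance, so the same ratio depends on where “zero” happens to be placed.

```python
import math
import random
import statistics


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Draw one Poisson count with Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def fano_factor(values) -> float:
    """Population variance divided by mean."""
    return statistics.pvariance(values) / statistics.fmean(values)


rng = random.Random(0)

# Spike counts: bounded below by zero, variance tracks the mean (Fano factor near 1).
spike_counts = [poisson_sample(5.0, rng) for _ in range(20000)]
spike_fano = fano_factor(spike_counts)

# Continuous signal with an arbitrary baseline (hypothetical values): shifting it
# changes the mean but not the variance, so variance/mean is not anchored to a floor.
signal = [rng.gauss(100.0, 2.0) for _ in range(20000)]
shifted = [x + 900.0 for x in signal]

print(f"spike counts: {spike_fano:.2f}")
print(f"continuous:   {fano_factor(signal):.4f}")
print(f"shifted:      {fano_factor(shifted):.4f}")
```

      The shifted signal has exactly the same fluctuations as the original, yet its variance/mean ratio is roughly ten times smaller, which is one way to see why a Poisson-style mean-variance coupling has no natural reference point in continuous signals.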

      Given the intricacies of these arguments, we don’t currently include this discussion in the revised text. However, we would be happy to include aspects of this content in the main paper if the reviewer sees fit.

      2) If I understand it correctly, the positive relationship between stimulus complexity and voxel variability has been found in the authors' previous work. Thus, the claims in the abstract in lines 14-15, and section 1 in results are exaggerated. The results simply replicate the findings in the previous work. This should be clearly stated.

      Good point. Since this finding was a replication and an extension, we reported these results mostly in the supplementary materials. The stimulus set used for the current study differs from that of Garrett et al. 2020, and therefore a replication is important. Moreover, we have extended these findings across young and older adults (previous work was based on older adults alone). We have modified the text to clarify which parts of the current study are replication and which are extension or novel (Lines 14, 345 and 467). Thanks for the suggestion.

      3) It is difficult for me to comprehend the U-shaped account of baseline GABA and shift in deltaSDBOLD. If deltaSDBOLD per se is good, as evidenced by the positive relationship between brainscore and visual sensitivity as shown in Fig. 5b and the discussion in lines 432-440, why should the brain decrease deltaSDBOLD? Or did I miss something? I understand that "average is good, outliers are bad". But a more detailed theory is needed to account for such effects.

      When GABA levels are increased beyond optimal levels, neuronal firing rates are reduced, effectively dampening neural activity and limiting dynamic range; in the present study, this resulted in reduced ∆⃗SDBOLD. Thus, the observed drug-related decrease in ∆⃗SDBOLD was most pronounced in participants with already high levels of GABA. We have now added an explanation for the expected inverted-U (Lines 523-546). The following figure illustrates this with a hypothetical curve and shows how different parts of Fig 4 might map onto different points on such a curve.

      Author response image 1.

      Lines 523-546 – “We found in humans that the drug-related shift in ∆⃗SDBOLD could be either positive or negative, while being negatively related to baseline GABA. Thus, boosting GABA activity with the drug during visual processing in participants with lower baseline GABA levels and low levels of ∆⃗SDBOLD resulted in an increase in ∆⃗SDBOLD (i.e., a positive change in ∆⃗SDBOLD on drug compared to off drug). However, in participants with higher baseline GABA levels and higher ∆⃗SDBOLD, when GABA was increased presumably beyond optimal levels, participants experienced no change or even a decrease in ∆⃗SDBOLD on drug. These findings thus provide the first evidence in humans for an inverted-U account of how GABA may link to variability modulation.

      Boosting low GABA levels in older adults helps increase ∆⃗SDBOLD, but why does increasing GABA levels lead to reduced ∆⃗SDBOLD in others? One explanation is that higher than optimal levels of inhibition in a neuronal system can lead to dampening of the entire network. The reduced neuronal firing decreases the number of states the network can visit and decreases the dynamic range of the network. Indeed, some anesthetics work by increasing GABA activity (for example, propofol, a general anesthetic, modulates activity at GABA-A receptors), and GABA is known for its sedative properties. Previous research showed that propofol leads to a steeper power spectral slope (a measure of the “construction” of signal variance) in monkey ECoG recordings (Gao et al., 2017). Networks function optimally only when dynamics are stabilized by sufficient inhibition. Thus, there is an inverted-U relationship between ∆⃗SDBOLD and GABA that is similar to that observed with other neurotransmitters.”

      4) Related to the 3rd question, can you show the relationship between the shift of deltaSDBOLD (i.e., the delta of deltaSDBOLD) and visual performance?

      We did not have data on visual performance from the same participants that completed the drug-based part of the study (Subset1 vs 3; see Figure 1); therefore, we unfortunately cannot directly investigate the relationship between the drug-related shift of ∆⃗SDBOLD and visual performance. We have now highlighted this as a limitation of the current study (Lines 589-592), where we state: “One limitation of the current study is that participants who received the drug manipulation did not complete the visual discrimination task, thus we could not directly assess how the drug-related change in ∆⃗SDBOLD impacted visual performance.”

      5) Are the datasets openly available? I didn't find the data availability statement.

      An Excel sheet with all the processed data needed to reproduce the figures and results has been included in the source data submitted with the manuscript, together with a data dictionary describing the columns. The raw MRI, MRS and fMRI data used in the current manuscript were collected as part of a larger (MIND) study and will eventually be made publicly available on completion of the study (around 2027). Before that time, the raw data can be obtained for research purposes upon reasonable request. Processing code will be made available on GitHub.

    1. Author Response:

      Reviewer #1:

      Salehinejad et al. run a battery of tests to investigate the effects of sleep deprivation on cortical excitability using TMS, LTP/LTD-like plasticity using tDCS, EEG-derived measures and behavioral task-performance. The study confirms evidence for sleep deprivation resulting in an increase in cortical excitability, diminishing LTP-like plasticity changes, increase in EEG theta band-power and worse task-performance. Additionally, a protocol usually resulting in LTD-like plasticity results in LTP-like changes in the sleep deprivation condition.

      We appreciate the reviewer's time for carefully reading our work and providing important suggestions/recommendations. In what follows, we address the comments one by one, revise the main text accordingly, and reproduce the changes here as well.

      1) My main comment is regarding the motivation for executing this specific study setup, which did not become clear to me. It's a robust experimental design, with general approach quite similar to the (in the current manuscript heavily cited) Kuhn et al. 2016 study (which investigates cortical excitability, EEG markers, and changes in LTP mechanisms), with additional inclusion of LTD-plasticity measures. The authors list comprehensiveness as motivation, but the power of a comprehensive study like this would lie in being able to make comparisons across measures to identify new interrelations or interesting subgroups of participants differentially affected by sleep deprivations. These comparisons are presented in l. 322 and otherwise at the end of the supplementary material and the study does not seem to be designed with these as the main motivation in mind. Could the authors comment on this and clarify their motivation? Maybe the authors can highlight in what way their study constitutes a methodological improvement and incorporates new aspects regarding hypothesis development as compared to e.g. Kuhn et al. 2016; currently, the authors highlight mainly the addition of LTD-plasticity protocols. Similarly, no motivation/context/hypotheses are given for saliva testing. There are a lot of different results, but e.g. the cortical excitability results are not discussed in depth, e.g. there is no effect on IO curve, but on other measures of excitability, the conclusion of that paragraph is only "our results demonstrate that corticocortical and corticospinal excitability are upscaled after sleep deprivation." There are some conflicting results regarding cortical excitability measures in the literature, possibly this could be discussed, so the reader can evaluate in what way the current study constitutes an improvement, for instance methodologically, over previous studies.

      Thank you for your comment/suggestion. The main motivation behind this study was to examine different physiological, behavioral, and cognitive measures under different sleep conditions and to provide a reasonably complete overview. This approach was not covered in detail by previous work, which is often limited to one or two pieces of behavioral and/or physiological evidence. Our study was not sufficiently powered to identify new interrelations between measures, because this was a secondary aim, although we found some relevant associations in exploratory analyses (i.e., association of motor learning with plasticity, and of cortical excitability with memory and attention). Future studies that are sufficiently powered for these comparisons are needed to explore interrelations between physiological and cognitive parameters more clearly, and we state this as a limitation (Page 22).

      That said, we agree that specific rationales of the study were not sufficiently clarified in the previous version. We rephrased and clarified respective motivations and rationales here:

      1) By comprehensive, we mean that we obtained measures ranging from basic physiological parameters to behavior and higher-order cognition, which have not been sufficiently covered so far. This also includes the exploration of expected associations between behavioral motor learning and plasticity measures, as well as between excitability parameters and cognitive functions.

      2) In the Kuhn et al. (2016) study, cortical excitability was assessed via the TMS intensity (single-pulse protocol) needed to elicit a predefined amplitude of the motor-evoked potential, which is a relatively unspecific parameter of corticospinal excitability. In the present study, cortical excitability was monitored by different TMS protocols, which cover not only corticospinal excitability, but also intracortical inhibition, facilitation, I-wave facilitation, and short-latency afferent inhibition, and which allow more specific conclusions with respect to the involvement of cortical systems, neurotransmitters, and neuromodulators.

      3) Furthermore, Kuhn et al. (2016) only investigated LTP-like, but not LTD-like plasticity. To the best of our knowledge, LTD-like plasticity has also not been investigated in previous work. LTD-like plasticity is, however, relevant for cognitive processing, and knowledge about alterations of this kind of plasticity is important for a mechanistic understanding of sleep-dependent plasticity alterations: the conversion of LTD-like to LTP-like plasticity under sleep deprivation is crucial for interpreting the study results as likely caused by cortical hyperactivity.

      4) Finally, an important motivation was to compare how brain physiology and cognition are affected by sleep deprivation with chronotype-dependent brain physiology and cognitive performance, especially at non-preferred times of the day. Our findings regarding the latter were recently published (Salehinejad et al., 2021), and comparing the present study with the published one has novel and important implications. Specifically, the results of both studies imply that the mechanisms underlying reduced performance after sleep deprivation and at a non-optimal time of day differ meaningfully.

      We clarified these motivations in the introduction and discussion. Please see the revised text below:

      "The number of available studies about the impact of sleep deprivation on human brain physiology relevant for cognitive processes is limited, and knowledge is incomplete. With respect to cortical excitability, Kuhn et al. (2016) showed increased excitability under sleep deprivation via a global measure of corticospinal excitability, the TMS intensity needed to induce motor-evoked potentials of a specific amplitude. Specific information about the cortical systems, including the neurotransmitters and neuromodulators involved in these effects (e.g. glutamatergic, GABAergic, cholinergic), is however missing. The level of cortical excitability affects neuroplasticity, a relevant physiological derivate of learning and memory formation. Kuhn and co-workers (2016) accordingly describe a sleep deprivation-dependent alteration of LTP-like plasticity in humans. The effects of sleep deprivation on LTD-like plasticity, which are required for a complete picture, have however not been explored so far. In the present study, we aimed to complete the current knowledge and also explored cognitive performance on tasks that critically depend on cortical excitability (working memory and attention) and neuroplasticity (motor learning), to gain mechanistic knowledge about sleep deprivation-dependent performance decline. Finally, we aimed to explore whether the impact of sleep deprivation on brain physiology and cognitive performance differs from the effects of non-optimal time of day performance in different chronotypes, which we recently explored in a parallel study with an identical experimental design (Salehinejad et al., 2021). The use of measures of different modalities in this study allows us to comprehensively investigate the impact of sleep deprivation on brain and cognitive functions, which is largely missing in the human literature."

      We added more details about the rationale for saliva sampling:

      "We also assessed resting-EEG theta/alpha, as an indirect measure of homeostatic sleep pressure, and examined cortisol and melatonin concentration to see how these are affected under sleep conditions, given the reported mixed effects in previous studies."

      We also rephrased the cortical excitability results. Please see the revised text below:

      "Taken together, our results demonstrate that glutamate-related intracortical excitability is upscaled after sleep deprivation. Moreover, cortical inhibition was decreased or turned into facilitation, which is indicative of enhanced cortical excitability as a result of GABAergic reduction. Corticospinal excitability showed only a trendwise upscaling, indicative of a major contribution of cortical, but not downstream, excitability to this sleep deprivation-related enhancement."

      "The increase of cortical excitability parameters and the resultant synaptic saturation following sleep deprivation can explain the respective cognitive performance decline. It is, however, worth noting that our study was not powered to identify these correlations with sufficient reliability, and future studies that are powered for this aim are needed.

      Our findings have several implications. First, they show that sleep and circadian preference (i.e., chronotype) have functionally different impacts on human brain physiology and cognition. The same parameters of brain physiology and cognition were recently investigated at circadian optimal vs non-optimal time of day in two groups of early and late chronotypes (Salehinejad et al., 2021). While we found decreased cortical facilitation and lower neuroplasticity induction (same for both LTP and LTD) at the circadian nonpreferred time in that study (Salehinejad et al., 2021), in the present study we observed upscaled cortical excitability and a functionally different pattern of neuroplasticity alteration (i.e., diminished LTP-like plasticity induction and conversion of LTD- to LTP-like plasticity)."

      2) EEG-measures. In general, I find the presented evidence for a link between synaptic strength and human theta power weak. In humans, rhythmic theta activity can be found mostly in the form of midfrontal theta. Here, the largest changes seem to be in posterior electrodes (judging by Fig. 4, bottom row), which will not capture rhythmic midfrontal theta in humans. Can the authors explain the scaling of the Fig. 4 top vs. bottom row, there seems to be a mismatch? No legend is given for the bottom row. The activity captured here is probably related to changes in nonrhythmic 1/f-type activity (which displays large changes relating to arousal; see e.g. https://elifesciences.org/articles/55092). It would be of benefit to see a power spectrum for the EEG-measures to see the specific type of power changes across all frequencies and to verify that these are actually oscillatory peaks in individual subjects. As far as I understood, the referenced study Vyazovskiy et al., 2008 contains no information regarding theta as a marker for synaptic potentiation. The evidence that synaptic strength is captured by the specifically used measures needs to be strengthened or statements like "measured synaptic strength via the resting-EEG theta/alpha pattern" need to be more carefully stated.

      Thank you for this comment. We removed the Pz electrode from the figure and instead added F3 and F4 along with Fz and Cz to capture more mid-frontal regions. Please see the revised Figure 4. The top rows now include only midfrontal and midcentral areas (Fz, Cz, F3, F4), and show numerical comparisons of midfrontal theta which is significantly different across conditions (and larger after sleep deprivation). The purpose of the bottom figures, which are removed now, was just to provide an overall visual comparison of theta distribution across sleep conditions. However, we agree that the bottom-row figures are misleading because these just capture average theta band power without specifying midfrontal regions. We removed this part of the figure to prevent confusion. Please see below.

      Regarding the power spectrum, we also added a new panel (Fig. 4g) showing how different frequency bands of the power spectrum are affected by sleep deprivation. Please see the revised Figure 4 below.

      Updated results, pages 12-13:

      "In line with this, we investigated how sleep deprivation affects resting-state brain oscillations in the theta band (4-7 Hz), the beta band (15-30 Hz), as another marker of cortical excitability, vigilance and arousal (Eoh et al., 2005; Fischer et al., 2008), and the alpha band (8-14 Hz), which is important for cognition (e.g., memory, attention) (Klimesch, 2012). To this end, we analyzed EEG spectral power at mid-frontocentral electrodes (Fz, Cz, F3, F4) using a 4×2 mixed ANOVA. For theta activity, significant main effects of location (F(1.71)=18.68, p<0.001; ηp²=0.40) and sleep condition (F(1)=17.82, p<0.001; ηp²=0.39) were observed, but no interaction, indicating that theta oscillations at frontocentral regions were similarly affected by sleep deprivation. Post hoc tests (paired, p<0.05) revealed that theta oscillations, grand averaged at mid-central electrodes, were significantly increased after sleep deprivation (p<0.001) (Fig. 4a,b). For the alpha band, the main effects of location (F(1.49)=12.92, p<0.001; ηp²=0.31) and sleep condition (F(1)=5.03, p=0.033; ηp²=0.15) and their interaction (F(2.31)=4.60, p=0.010; ηp²=0.14) were significant. Alpha oscillations, grand averaged at mid-frontocentral electrodes, were significantly decreased after sleep deprivation (p=0.033) (Fig. 4c,d). Finally, the analysis of beta spectral power showed significant main effects of location (F(1.34)=6.73, p=0.008; ηp²=0.19) and sleep condition (F(1)=6.98, p=0.013; ηp²=0.20) but no significant interaction. Beta oscillations, grand averaged at mid-frontocentral electrodes, were significantly increased after sleep deprivation (p=0.013) (Fig. 4e,f)."

      Fig. 4. Resting-state theta, alpha, and beta oscillations at electrodes Fz, Cz, F3 and F4. a,b, Theta band activity was significantly higher after sleep deprivation vs the sufficient sleep condition (Fz: t=4.61, p<0.001; Cz: t=2.22, p=0.034; F3: t=2.93, p=0.007; F4: t=4.78, p<0.001). c,d, Alpha band activity was significantly lower at electrodes Fz and Cz (Fz: t=2.39, p=0.023; Cz: t=2.65, p=0.013) after sleep deprivation vs the sufficient sleep condition. e,f, Beta band activity was significantly higher at electrodes Fz, Cz and F4 after sleep deprivation compared with the sufficient sleep condition (Fz: t=3.06, p=0.005; Cz: t=2.38, p=0.024; F4: t=2.25, p=0.032). g, Power spectrum including theta (4-7 Hz), alpha (8-14 Hz), and beta (15-30 Hz) bands at electrodes Fz, Cz, F3 and F4, respectively. Data of one participant were excluded due to excessive noise. All pairwise comparisons for each electrode were calculated via post hoc Student’s t-tests (paired, p<0.05). n=29. Error bars represent s.e.m. ns = nonsignificant; asterisks indicate significant differences. Boxes indicate the interquartile range that contains 50% of values (25th to 75th percentile), and whiskers show the 1st to 99th percentiles.

      Regarding the reference, unfortunately, we had intended to cite a different work by the Vyazovskiy team, namely Vyazovskiy et al. (2005). We removed this reference, toned down the statement about synaptic strength in the introduction, and added new relevant references. Please see below:

      Revised text, Results, page 12:

      "So far, we found that sleep deprivation upscales cortical excitability, prevents induction of LTP-like plasticity, presumably due to saturated synaptic potentiation, and converts LTD- into LTP-like plasticity. Previous studies in animals (Vyazovskiy and Tobler, 2005; Leemburg et al., 2010) and humans (Finelli et al., 2000) have shown that EEG theta activity is a marker for homeostatic sleep pressure and increased cortical excitability (Kuhn et al., 2016)."

      3) In general, the authors do a good job pointing out multiple-comparison-corrected tests. In some cases, e.g. for their correlational analyses across measures, significant results are reported, but without a clearer discussion on what other tests were computed and how correction was applied, the evidence strength of these are hard to evaluate. Please check for all presented correlations.

      Thank you for your comment. For correlational analyses, no correction for multiple comparisons was computed, because these were secondary exploratory analyses. We state this now clearly in the manuscript. For the other analyses, the description of multiple comparisons is included below:

      Methods, pages 35-37:

      "For the TMS protocols with a double-pulse condition (i.e., SICI-ICF, I-wave facilitation, SAI), the resulting mean values were normalized to the respective single-pulse condition. First, mean values were calculated individually and then inter-individual means were calculated for each condition. For the I-O curves, absolute MEP values were used. To test for statistical significance, repeated-measures ANOVAs were performed with ISIs, TMS intensity (in I-O curve only), and condition (sufficient sleep vs sleep deprivation) as within-subject factors and MEP amplitude as the dependent variable. In case of significant results of the ANOVA, post hoc comparisons were performed using Bonferroni-corrected t-tests to compare mean MEP amplitudes of each condition against the baseline MEP and to contrast sufficient sleep vs sleep deprivation conditions. To determine if individual baseline measures differed within and between sessions, SI1mV and Baseline MEP were entered as dependent variables in a mixed-model ANOVA with session (4 levels) and condition (sufficient sleep vs sleep deprivation) as within-subject factors, and group (anodal vs cathodal) as between-subject factor. The mean MEP amplitude for each measurement time-point was normalized to the session’s baseline (the individual mean divided by the baseline mean), resulting in values representing either increased (> 1.0) or decreased (< 1.0) excitability. Individual averages of the normalized MEP from each time-point were then calculated and entered as dependent variables in a mixed-model ANOVA with repeated measures with stimulation condition (active, sham), time-point (8 levels), and sleep condition (normal vs deprivation) as within-subject factors and group (anodal vs cathodal) as between-subject factor. 
In case of significant ANOVA results, post hoc comparisons of MEP amplitudes at each time point were performed using Bonferroni-corrected t-tests to examine if active stimulation resulted in a significant difference relative to sham (comparison 1), baseline (comparison 2), the respective stimulation condition at sufficient sleep vs sleep deprivation (comparison 3), and the between-group comparisons at respective timepoints (comparison 4).

      The mean RT, RT variability and accuracy of blocks were entered as dependent variables in repeated-measures ANOVAs with block (5 vs 6, 6 vs 7) and condition (sufficient sleep vs sleep deprivation) as within-subject factors. Because the RT differences between blocks 5 vs 6 and 6 vs 7 were those of major interest, post hoc comparisons were performed on RT differences between these blocks using paired-sample t-tests (two-tailed, p<0.05) without correction for multiple comparisons. For 3-back, Stroop and AX-CPT tasks, mean and standard deviation of RT and accuracy were calculated and entered as dependent variables in repeated-measures ANOVAs with sleep condition (sufficient sleep vs sleep deprivation) as the within-subject factor. For significant ANOVA results, post hoc comparisons of dependent variables were performed using paired-sample t-tests (two-tailed, p<0.05) without correction for multiple comparisons.

      For the resting-state data, brain oscillations at mid-central electrodes (Fz, Cz, F3, F4) were analyzed with a 4×2 ANOVA with location (Fz, Cz, F3, F4) and sleep condition (sufficient sleep vs sleep deprivation) as the within-subject factors. For all tasks, individual ERP means were grand-averaged and entered as dependent variables in repeated-measures ANOVAs with sleep condition (sufficient sleep vs sleep deprivation) as the within-subject factor. Post hoc comparisons of grand-averaged amplitudes were performed using paired-sample t-tests (two-tailed, p<0.05) without correction for multiple comparisons.

      To assess the relationship between induced neuroplasticity and motor sequence learning, and the relationship between cortical excitability and cognitive task performance, we calculated Pearson correlations. For the first correlation, we used individual grand-averaged MEP amplitudes obtained from anodal and cathodal tDCS pooled for the time-points between 0 and 20 min after interventions, and individual motor learning performance (i.e. BL6-5 and BL6-7 RT difference) across sleep conditions. For the second correlation, we used individual grand-averaged MEP amplitudes obtained from each TMS protocol and individual accuracy/RT obtained from each task across sleep conditions. No correction for multiple comparisons was done for correlational analyses as these were secondary exploratory analyses."
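      The baseline normalization and the Bonferroni correction described in the Methods text above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the function names normalize_to_baseline and bonferroni, the MEP amplitudes, and the p-values are all hypothetical, and only the p-value adjustment step of the Bonferroni-corrected t-tests is shown (the t-tests themselves are omitted).

```python
from statistics import fmean


def normalize_to_baseline(timepoint_meps: dict, baseline_meps: list) -> dict:
    """Hypothetical helper: divide the mean MEP amplitude at each time-point by the baseline mean.

    Values > 1.0 indicate increased and values < 1.0 decreased excitability.
    """
    baseline_mean = fmean(baseline_meps)
    return {tp: fmean(amps) / baseline_mean for tp, amps in timepoint_meps.items()}


def bonferroni(p_values: list) -> list:
    """Hypothetical helper: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(p * m, 1.0) for p in p_values]


# Invented amplitudes (mV) for one participant and one session, for illustration only.
baseline = [1.0, 1.1, 0.9]
after_stimulation = {
    "0 min": [1.3, 1.2, 1.4],
    "10 min": [0.8, 0.9, 0.7],
}
normalized = normalize_to_baseline(after_stimulation, baseline)

# Invented p-values from three post hoc comparisons.
adjusted = bonferroni([0.01, 0.02, 0.5])

print(normalized)
print(adjusted)
```

      Dividing by the baseline mean makes 1.0 the "no change" reference, which is why normalized values above and below 1.0 can be read directly as increased and decreased excitability.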

      There are also inconsistencies like: "The average levels of cortisol and melatonin were lower after sleep deprivation vs sufficient sleep (cortisol: 3.51±2.20 vs 4.85±3.23, p=0.05; melatonin 10.50±10.66 vs 16.07±14.94, p=0.16)"

      The p-values are not significant here?

      Thank you for your comment. The p-value was only marginally significant for the cortisol level changes. We clarified this in the revision. Please see below:

      Revised text, page 19:

      "The average levels of cortisol and melatonin were numerically lower after sleep deprivation vs sufficient sleep (cortisol: 3.51±2.20 vs 4.85±3.23, p=0.056; melatonin 10.50±10.66 vs 16.07±14.94, p=0.16), but these differences were only marginally significant for the cortisol level and showed only a trendwise reduction for melatonin."

      Reviewer #2:

      This study represents the currently most comprehensive characterization of indices of synaptic plasticity and cognition in humans in the context of sleep deprivation. It provides further support for an interplay between the time course of synaptic strength/cortical excitability (homeostatic plasticity) and the inducibility of associative synaptic LTP- and LTD-like plasticity. The study is of great interest, the translation of findings is of potential clinical relevance, the methods appear to be solid and the results are mostly convincing. I believe that the writing of the manuscript should be improved (e.g., quality of referencing, clearer framework and hypotheses, reduction of redundancies, and more precise discussion). However, all of these points can be addressed since the overall concept, design, conduct and findings are convincing and of great interest to the field of sleep research, but also more broadly to the neurosciences, to clinicians and the public.

      We appreciate the reviewer's time for carefully reading our work and providing important suggestions/recommendations.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Lujan and colleagues describes a series of cellular phenotypes associated with the depletion of TANGO2, a poorly characterized gene product but relevant to neurological and muscular disorders. The authors report that TANGO2 associates with membrane-bound organelles, mainly mitochondria, impacting lipid metabolism and the accumulation of reactive oxygen species. Based on these observations, the authors speculate that TANGO2 functions in Acyl-CoA metabolism.

      The observations are generally convincing and most of the conclusions appear logical. While the function of TANGO2 remains unclear, the finding that it interferes with lipid metabolism is novel and important. This observation was not developed to a great extent and based on the data presented, the link between TANGO2 and acyl-CoA, as proposed by the authors, appears rather speculative.

      We thank you for your advice and now include additional data that lends support to the role of TANGO2 in lipid metabolism. We have changed the title accordingly.

      1) The data with overexpressed TANGO2 looks convincing but I wonder if the authors analyzed the localization of endogenous TANGO2 by immunofluorescence using the antibody described in Figure S2. The idea that TANGO2 localizes to membrane contact sites between mitochondria and the ER and LDs would also be strengthened by experiments including multiple organelle markers.

      We agree that most of the data on TANGO2 localization are based on overexpression of the protein. As suggested by the reviewer, and despite the lack of commercial antibodies suitable for immunofluorescence (see the following chart), we tested the commercial antibody described in Figure 2 on HepG2 and U2OS cells. Moreover, we used Förster resonance energy transfer (FRET) to analyze the proximity of TANGO2 and Tom20, a specific outer mitochondrial membrane protein. In addition, we imaged cells expressing tagged TANGO2 together with either tagged VAP-B, an integral ER protein in the mitochondria-associated membranes (doi:10.1093/hmg/ddr559), or tagged GPAT4-Hairpin, an integral LD protein (doi:10.1016/j.devcel.2013.01.013). These data strengthen our proposal and are presented in the revised manuscript.

      As suggested by the reviewer, we have also imaged two additional cell lines (HepG2 and U2OS) stained with the anti-TANGO2 antibody (Novus Biologicals) that was used for western blotting (see chart above). As shown in the following figure, the commercial antibody produces substantial staining beyond mitochondria, especially in U2OS cells, where it also appears to label the nucleus.

      2) The changes in LD size in TANGO2-depleted cells are very interesting and consistent with the role of TANGO2 in lipid metabolism. From the lipidomics analysis, it seems that the relative levels of the main neutral lipids in TANGO2-depleted cells remain unaltered (TAG) or even decrease (CE). Therefore, it would be interesting to explore the increase in LD size further, for example by analyzing/displaying the absolute levels of neutral lipids in the various conditions.

      We agree with the reviewer and now present the absolute levels of lipids of interest in the various conditions of the lipidomics analyses (Figure S3).

      3) Most of the lipidomics changes in TANGO2-depleted cells are observed in lipid species present in very low amounts, while the relative abundance of major phospholipids (PC, PE, PI) remains mostly unchanged. It would be good to also display the absolute levels of the various lipids analyzed. This is an important point to clarify, as it would be unlikely that these major phospholipids are unaffected by an overall defect in Acyl-CoA metabolism, as proposed by the authors.

      As stated above, we have now included the absolute levels of lipids of interest in the various conditions of the lipidomics analyses (Figure S3).

    1. Author Response:

      Reviewer #1 (Public Review):

      In this article, Bollmann and colleagues demonstrated both theoretically and experimentally that blood vessels could be targeted at the mesoscopic scale with time-of-flight magnetic resonance imaging (TOF-MRI). With a mathematical model that includes partial voluming effects explicitly, they outline how small voxels reduce the dependency of blood dwell time, a key parameter of the TOF sequence, on blood velocity. Through several experiments on three human subjects, they show that increasing resolution improves contrast and evaluate additional issues such as vessel displacement artifacts and the separation of veins and arteries.

      The overall presentation of the main finding, that small voxels are beneficial for mesoscopic pial vessels, is clear and well discussed, although difficult to grasp fully without a good prior understanding of the underlying TOF-MRI sequence principles. Results are convincing, and some of the data, both raw and processed, have been provided publicly. Visual inspection and comparisons of different scans are provided, although no quantification or statistical comparison of the results is included.

      Potential applications of the study are varied, from modeling more precisely functional MRI signals to assessing the health of small vessels. Overall, this article reopens a window on studying the vasculature of the human brain in great detail, for which studies have been surprisingly limited until recently.

      In summary, this article provides a clear demonstration that small pial vessels can indeed be imaged successfully with extremely high voxel resolution. There are however several concerns with the current manuscript, hopefully addressable within the study.

      Thank you very much for this encouraging review. While smaller voxel sizes theoretically benefit all blood vessels, we are specifically targeting the (small) pial arteries here, as the inflow-effect in veins is unreliable and susceptibility-based contrasts are much more suited for this part of the vasculature. (We have clarified this in the revised manuscript by substituting ‘vessel’ with ‘artery’ wherever appropriate.) Using a partial-volume model and a relative contrast formulation, we find that the blood delivery time is not the limiting factor when imaging pial arteries, but the voxel size is. Taking into account the comparatively fast blood velocities even in pial arteries with diameters ≤ 200 µm (using t_delivery=l_voxel/v_blood), we find that blood dwell times are sufficiently long for the small voxel sizes considered here to employ the simpler formulation of the flow-related enhancement effect. In other words, small voxels eliminate blood dwell time as a consideration for the blood velocities expected for pial arteries.

      We have extended the description of the TOF-MRA sequence in the revised manuscript, and all data and simulations/analyses presented in this manuscript are now publicly available at https://osf.io/nr6gc/ and https://gitlab.com/SaskiaB/pialvesseltof.git, respectively. This includes additional quantifications of the FRE effect for large vessels (adding to the assessment for small vessels already included), and the effect of voxel size on vessel segmentations.

      Main points:

      1) The manuscript needs clarifying through some additional background information for a readership wider than expert MR physicists. The TOF-MRA sequence and its underlying principles should be introduced first thing, even before discussing vascular anatomy, as it is the key to understanding what aspects of blood physiology and MRI parameters matter here. MR physics shorthand terms should be avoided or defined, as 'spins' or 'relaxation' are not obvious to everybody. The relationship between delivery time and slab thickness should be made clear as well.

      Thank you for this valuable comment that the Theory section is perhaps not accessible for all readers. We have adapted the manuscript in several locations to provide more background information and details on time-of-flight contrast. We found, however, that there is no concise way to first present the MR physics part and then introduce the pial arterial vasculature, as the optimization presented therein is targeted towards this structure. To address this comment, we have therefore opted to provide a brief introduction to TOF-MRA first in the Introduction, and then a more in-depth description in the Theory section.

      Introduction section:

      "Recent studies have shown the potential of time-of-flight (TOF) based magnetic resonance angiography (MRA) at 7 Tesla (T) in subcortical areas (Bouvy et al., 2016, 2014; Ladd, 2007; Mattern et al., 2018; Schulz et al., 2016; von Morze et al., 2007). In brief, TOF-MRA uses the high signal intensity caused by inflowing water protons in the blood to generate contrast, rather than an exogenous contrast agent. By adjusting the imaging parameters of a gradient-recalled echo (GRE) sequence, namely the repetition time (T_R) and flip angle, the signal from static tissue in the background can be suppressed, and high image intensities are only present in blood vessels freshly filled with non-saturated inflowing blood. As the blood flows through the vasculature within the imaging volume, its signal intensity slowly decreases. (For a comprehensive introduction to the principles of MRA, see for example Carr and Carroll (2012)). At ultra-high field, the increased signal-to-noise ratio (SNR), the longer T_1 relaxation times of blood and grey matter, and the potential for higher resolution are key benefits (von Morze et al., 2007)."

      Theory section:

      "Flow-related enhancement

      Before discussing the effects of vessel size, we briefly revisit the fundamental theory of the flow-related enhancement effect used in TOF-MRA. Taking into account the specific properties of pial arteries, we will then extend the classical description to this new regime. In general, TOF-MRA creates high signal intensities in arteries using inflowing blood as an endogenous contrast agent. The object magnetization—created through the interaction between the quantum mechanical spins of water protons and the magnetic field—provides the signal source (or magnetization) accessed via excitation with radiofrequency (RF) waves (called RF pulses) and the reception of ‘echo’ signals emitted by the sample around the same frequency. The T1-contrast in TOF-MRA is based on the difference in the steady-state magnetization of static tissue, which is continuously saturated by RF pulses during the imaging, and the increased or enhanced longitudinal magnetization of inflowing blood water spins, which have experienced no or few RF pulses. In other words, in TOF-MRA we see enhancement for blood that flows into the imaging volume."

      "Since the coverage or slab thickness in TOF-MRA is usually kept small to minimize blood delivery time by shortening the path length of the vessel contained within the slab (Parker et al., 1991), and because we are focused here on the pial vasculature, we have limited our considerations to a maximum blood delivery time of 1000 ms, with values of a few hundred milliseconds being more likely."
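
      To illustrate why a short delivery time matters, the following is a minimal Python sketch (not the authors' simulation code) of the textbook spoiled gradient-echo recursion for longitudinal magnetization. It assumes ideal spoiling, instantaneous RF pulses and plug flow, and takes the relative FRE as (M_blood - M_tissue)/M_tissue. The TR, flip angle and T1 values are illustrative assumptions, not the protocol or relaxation times used in the manuscript.

```python
import math


def steady_state(tr_ms: float, t1_ms: float, flip_deg: float, m0: float = 1.0) -> float:
    """Ernst steady-state M_z (just before each pulse) of static, fully spoiled tissue."""
    e1 = math.exp(-tr_ms / t1_ms)
    cos_a = math.cos(math.radians(flip_deg))
    return m0 * (1.0 - e1) / (1.0 - e1 * cos_a)


def mz_after_pulses(n_pulses: int, tr_ms: float, t1_ms: float,
                    flip_deg: float, m0: float = 1.0) -> float:
    """M_z just before the (n+1)-th pulse for spins that entered the slab fully relaxed."""
    e1 = math.exp(-tr_ms / t1_ms)
    q = e1 * math.cos(math.radians(flip_deg))  # each TR shrinks the excess by this factor
    m_ss = steady_state(tr_ms, t1_ms, flip_deg, m0)
    return m_ss + (m0 - m_ss) * q ** n_pulses


def relative_fre(delivery_ms: float, tr_ms: float = 20.0, flip_deg: float = 18.0,
                 t1_blood_ms: float = 2100.0, t1_tissue_ms: float = 1900.0) -> float:
    """(M_blood - M_tissue) / M_tissue for blood that has spent delivery_ms in the slab.

    All default parameter values are illustrative assumptions.
    """
    n_pulses = int(delivery_ms // tr_ms)  # RF pulses experienced (plug flow assumed)
    m_blood = mz_after_pulses(n_pulses, tr_ms, t1_blood_ms, flip_deg)
    m_tissue = steady_state(tr_ms, t1_tissue_ms, flip_deg)
    return (m_blood - m_tissue) / m_tissue


for t in (0, 100, 300, 1000):
    print(f"delivery time {t:4d} ms: relative FRE {relative_fre(t):.2f}")
```

      With these assumed values the relative FRE decreases steadily as the delivery time approaches 1000 ms, because every TR multiplies the excess magnetization of the inflowing blood by exp(-TR/T1)·cos(flip angle).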

      2) The main discussion of higher resolution leading to improvements rather than loss presented here seems a bit one-sided: for a more objective understanding of the differences, it would be worthwhile to explicitly derive the 'classical' treatment and show how it leads to different conclusions than the present one. In particular, the link made in the discussion between using relative magnetization and modeling partial voluming seems unclear, as both are unrelated. One could also argue that in theory higher resolution imaging is always better, but of course there are practical considerations in play: SNR, dynamics of the measured effect vs speed of acquisition, motion, etc. These issues are not really integrated into the model, even though they provide strong constraints on what can be done. It would be good to at least discuss the constraints that 140 or 160 microns resolution imposes on what is achievable at present.

      Thank you for this excellent suggestion. We found it instructive to illustrate the different effects separately, i.e. relative vs. absolute FRE, and then partial volume vs. no-partial volume effects. In response to comment R2.8 of Reviewer 2, we also clarified the derivation of the relative FRE vs the ‘classical’ absolute FRE (please see R2.8). Accordingly, the manuscript now includes the theoretical derivation in the Theory section and an explicit demonstration of how the classical treatment leads to different conclusions in the Supplementary Material. The important insight gained in our work is that only when considering relative FRE and partial-volume effects together, can we conclude that smaller voxels are advantageous. We have added the following section in the Supplementary Material:

      "Effect of FRE Definition and Interaction with Partial-Volume Model

      For the definition of the FRE effect employed in this study, we used a measure of relative FRE (Al-Kwifi et al., 2002) in combination with a partial-volume model (Eq. 6). To illustrate the implications of these two effects, as well as their interaction, we have estimated the relative and absolute FRE for an artery with a diameter of 200 µm or 2 000 µm (i.e. no partial-volume effects at the centre of the vessel). The absolute FRE expression explicitly takes the voxel volume into account, and so instead of Eq. (6) for the relative FRE we used"

      Eq. (1)

      "Note that the division by M_zS^tissue⋅l_voxel^3 to obtain the relative FRE from this expression removes the contribution of the total voxel volume (l_voxel^3). Supplementary Figure 2 shows that, when partial volume effects are present, the highest relative FRE arises in voxels with the same size as or smaller than the vessel diameter (Supplementary Figure 2A), whereas the absolute FRE increases with voxel size (Supplementary Figure 2C). If no partial-volume effects are present, the relative FRE becomes independent of voxel size (Supplementary Figure 2B), whereas the absolute FRE increases with voxel size (Supplementary Figure 2D). While the partial-volume effects for the relative FRE are substantial, they are much more subtle when using the absolute FRE and do not alter the overall characteristics."

      Supplementary Figure 2: Effect of voxel size and blood delivery time on the flow-related enhancement (FRE) using either a relative (A,B) (Eq. (3)) or an absolute (C,D) (Eq. (12)) FRE definition, assuming a pial artery diameter of 200 µm (A,C) or 2 000 µm (B,D; i.e. no partial-volume effects at the central voxel of this artery).
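
      The opposite voxel-size dependence of the two definitions can be reproduced with a deliberately simplified partial-volume geometry. The sketch below is an illustration only, not Eq. (6) or the absolute-FRE equation of the manuscript: it assumes a straight cylindrical artery of diameter d passing through the centre of a cubic voxel of side l, so that the blood volume fraction is min(1, πd²/(4l²)), and it holds the (hypothetical) blood and tissue magnetizations fixed, i.e. it ignores any voxel-size dependence of the blood dwell time.

```python
import math


def blood_fraction(d_um: float, l_um: float) -> float:
    """Blood volume fraction of a cubic voxel (side l) crossed centrally by a
    cylindrical vessel (diameter d); clamped to 1 when the voxel lies in the lumen."""
    return min(1.0, math.pi * d_um**2 / (4.0 * l_um**2))


def fre_relative(d_um, l_um, m_blood, m_tissue):
    """Relative FRE: normalised by the tissue signal, so the voxel volume cancels."""
    return blood_fraction(d_um, l_um) * (m_blood - m_tissue) / m_tissue


def fre_absolute(d_um, l_um, m_blood, m_tissue):
    """Absolute FRE: magnetization difference integrated over the voxel (um^3 units)."""
    return l_um**3 * blood_fraction(d_um, l_um) * (m_blood - m_tissue)


M_BLOOD, M_TISSUE = 0.6, 0.2  # hypothetical, voxel-size-independent magnetizations
for d in (200, 2000):
    for l in (100, 200, 400, 800):
        rel = fre_relative(d, l, M_BLOOD, M_TISSUE)
        ab = fre_absolute(d, l, M_BLOOD, M_TISSUE)
        print(f"d={d:5d} um  l={l:4d} um  relative={rel:.3f}  absolute={ab:.3g}")
```

      For d = 2000 µm the relative FRE is identical at all four voxel sizes while the absolute FRE grows with l³; for d = 200 µm the relative FRE falls once the voxel exceeds the vessel, whereas the absolute FRE still increases, mirroring the pattern described for panels A to D.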

      In addition, we have also clarified the contribution of the two definitions and their interaction in the Discussion section. Following the suggestion of Reviewer 2, we have extended our interpretation of relative FRE. In brief, absolute FRE is closely related to the physical origin of the contrast, whereas relative FRE is much more concerned with the “segmentability” of a vessel (please see R2.8 for more details):

      "Extending classical FRE treatments to the pial vasculature

      There are several major modifications in our approach to this topic that might explain why, in contrast to predictions from classical FRE treatments, it is indeed possible to image pial arteries. For instance, the definition of vessel contrast or flow-related enhancement is often stated as an absolute difference between blood and tissue signal (Brown et al., 2014a; Carr and Carroll, 2012; Du et al., 1993, 1996; Haacke et al., 1990; Venkatesan and Haacke, 1997). Here, however, we follow the approach of Al-Kwifi et al. (2002) and consider relative contrast. While this distinction may seem to be semantic, the effect of voxel volume on FRE for these two definitions is exactly opposite: Du et al. (1996) concluded that larger voxel size increases the (absolute) vessel-background contrast, whereas here we predict an increase in relative FRE for small arteries with decreasing voxel size. Therefore, predictions of the depiction of small arteries with decreasing voxel size differ depending on whether one is considering absolute contrast, i.e. difference in longitudinal magnetization, or relative contrast, i.e. contrast differences independent of total voxel size. Importantly, this prediction changes for large arteries where the voxel contains only vessel lumen, in which case the relative FRE remains constant across voxel sizes, but the absolute FRE increases with voxel size (Supplementary Figure 2). Overall, the interpretations of relative and absolute FRE differ, and one measure may be more appropriate for certain applications than the other. Absolute FRE describes the difference in magnetization and is thus tightly linked to the underlying physical mechanism. Relative FRE, however, describes the image contrast and segmentability. If blood and tissue magnetization are equal, both contrast measures would equal zero and indicate that no contrast difference is present. 
However, when there is signal in the vessel and as the tissue magnetization approaches zero, the absolute FRE approaches the blood magnetization (assuming no partial-volume effects), whereas the relative FRE approaches infinity. While this infinite relative FRE does not directly relate to the underlying physical process of ‘infinite’ signal enhancement through inflowing blood, it instead characterizes the segmentability of the image in that an image with zero intensity in the background and non-zero values in the structures of interest can be segmented perfectly and trivially. Accordingly, numerous empirical observations (Al-Kwifi et al., 2002; Bouvy et al., 2014; Haacke et al., 1990; Ladd, 2007; Mattern et al., 2018; von Morze et al., 2007) and the data provided here (Figure 5, 6 and 7) have shown the benefit of smaller voxel sizes if the aim is to visualize and segment small arteries."

      Note that our formulation of the FRE—even without considering SNR—does not suggest that higher resolution is always better, but instead should be matched to the size of the target arteries:

      "Importantly, note that our treatment of the FRE does not suggest that an arbitrarily small voxel size is needed, but instead that voxel sizes appropriate for the arterial diameter of interest are beneficial (in line with the classic “matched-filter” rationale (North, 1963)). Voxels smaller than the arterial diameter would not yield substantial benefits (Figure 5) and may result in SNR reductions that would hinder segmentation performance."

      Further, we have also extended the concluding paragraph of the Imaging limitation section to also include a practical perspective:

      "In summary, numerous theoretical and practical considerations remain for optimal imaging of pial arteries using time-of-flight contrast. Depending on the application, advanced displacement artefact compensation strategies may be required, and zero-filling could provide better vessel depiction. Further, an optimal trade-off between SNR, voxel size and acquisition time needs to be found. Currently, the partial-volume FRE model only considers voxel size, and—as we reduced the voxel size in the experiments—we (partially) compensated the reduction in SNR through longer scan times. This, ultimately, also required the use of prospective motion correction to enable the very long acquisition times necessary for a 140 µm isotropic voxel size. Often, anisotropic voxels are used to reduce acquisition time and increase SNR while maintaining in-plane resolution. This may indeed prove advantageous when the (also highly anisotropic) arteries align with the anisotropic acquisition, e.g. when imaging the large supplying arteries oriented mostly in the head-foot direction. In the case of pial arteries, however, there is no preferred orientation because of the convoluted nature of the pial arterial vasculature encapsulating the complex folding of the cortex (see section Anatomical architecture of the pial arterial vasculature). A further reduction in voxel size may be possible in dedicated research settings utilizing even longer acquisition times and/or larger acquisition volumes to maintain SNR. However, if acquisition time is limited, voxel size and SNR need to be carefully balanced against each other."
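
      The SNR side of this trade-off can be made concrete with the common rule of thumb that, with all other parameters fixed, SNR scales with voxel volume and with the square root of acquisition time. This scaling is an assumption of the sketch below (it ignores, for example, changes in readout bandwidth or parallel-imaging penalties), not a result of the manuscript.

```python
def voxel_volume_ratio(l_new_mm: float, l_old_mm: float) -> float:
    """Ratio of isotropic voxel volumes (new / old)."""
    return (l_new_mm / l_old_mm) ** 3


def time_factor_for_equal_snr(l_new_mm: float, l_old_mm: float) -> float:
    """Acquisition-time factor needed to keep SNR constant, assuming
    SNR is proportional to voxel volume times sqrt(acquisition time)."""
    return voxel_volume_ratio(l_new_mm, l_old_mm) ** -2


print(f"{1 - voxel_volume_ratio(0.14, 0.16):.0%} smaller voxel volume")  # 33% smaller voxel volume
print(f"{time_factor_for_equal_snr(0.14, 0.16):.2f}x longer scan for equal SNR")  # 2.23x ...
print(f"{time_factor_for_equal_snr(0.3, 0.8):.0f}x longer scan for 0.8 mm -> 0.3 mm")  # 360x ...
```

      Under this assumption, going from 0.16 mm to 0.14 mm isotropic reduces the voxel volume by about 33 % and would need roughly 2.2 times the acquisition time to preserve SNR, whereas fully compensating the step from 0.8 mm to 0.3 mm would need a factor of several hundred, which is why SNR can only be partially compensated in practice.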

      3) The article seems to imply that TOF-MRA is the only adequate technique to image brain vasculature, while T2 mapping, UHF T1 mapping (see e.g. Choi et al., https://doi.org/10.1016/j.neuroimage.2020.117259), phase (e.g. Fan et al., doi:10.1038/jcbfm.2014.187), QSM (see e.g. Huck et al., https://doi.org/10.1007/s00429-019-01919-4), or a combination (Bernier et al., https://doi.org/10.1002/hbm.24337, Ward et al., https://doi.org/10.1016/j.neuroimage.2017.10.049) all depict some level of vascular detail. It would be worth quickly reviewing the different effects of blood on MRI contrast and how those have been used in different approaches to measure vasculature. This would in particular help clarify the experiment combining TOF with T2* mapping used to separate arteries from veins (more on this question below).

      We apologize if we inadvertently created the impression that TOF-MRA is a suitable technique to image the complete brain vasculature, and we agree that susceptibility-based methods are much more suitable for venous structures. As outlined above, we have revised the manuscript in various sections to indicate that it is the pial arterial vasculature we are targeting. We have added a statement on imaging the venous vasculature in the Discussion section. Please see our response below regarding the use of T2* to separate arteries and veins.

      "The advantages of imaging the pial arterial vasculature using TOF-MRA without an exogenous contrast agent lie in its non-invasiveness and the potential to combine these data with various other structural and functional image contrasts provided by MRI. One common application is to acquire a velocity-encoded contrast such as phase-contrast MRA (Arts et al., 2021; Bouvy et al., 2016). Another interesting approach utilises the inherent time-of-flight contrast in magnetization-prepared two rapid acquisition gradient echo (MP2RAGE) images acquired at ultra-high field, which simultaneously provide vascular and structural data, albeit at lower achievable resolution and lower FRE compared to the TOF-MRA data in our study (Choi et al., 2020). In summary, we expect high-resolution TOF-MRA to be applicable also for group studies to address numerous questions regarding the relationship of arterial topology and morphometry to the anatomical and functional organization of the brain, and the influence of arterial topology and morphometry on brain hemodynamics in humans. In addition, imaging of the pial venous vasculature—using susceptibility-based contrasts such as T2*-weighted magnitude (Gulban et al., 2021) or phase imaging (Fan et al., 2015), susceptibility-weighted imaging (SWI) (Eckstein et al., 2021; Reichenbach et al., 1997) or quantitative susceptibility mapping (QSM) (Bernier et al., 2018; Huck et al., 2019; Mattern et al., 2019; Ward et al., 2018)—would enable a comprehensive assessment of the complete cortical vasculature and how both arteries and veins shape brain hemodynamics."

      4) The results, while very impressive, are mostly qualitative. This seems a missed opportunity to strengthen the points of the paper: given the segmentations already made, the amount/density of detected vessels could be compared across scans for the data of Fig. 5 and 7. The minimum distance between vessels could be measured in Fig. 8 to show a 2D distribution and/or a spatial map of the displacement. The number of vessels labeled as veins instead of arteries in Fig. 9 could be given.

      We fully agree that estimating these quantitative measures would be very interesting; however, this would require the development of a comprehensive analysis framework, which would considerably shift the focus of this paper from data acquisition and flow-related enhancement to data analysis. As noted in the discussion section Challenges for vessel segmentation algorithms, ‘The vessel segmentations presented here were performed to illustrate the sensitivity of the image acquisition to small pial arteries’, because the smallest arteries tend to be concealed in the maximum intensity projections. Further, the interpretation of these measures is not straightforward. For example, the number of detected vessels for the artery depicted in Figure 5 does not change across resolutions, but their length does. We have therefore estimated the relative increase in skeleton length across resolutions for Figures 5 and 7. However, these estimates are not only a function of the voxel size but also of the underlying vasculature, i.e. the number of arteries with a certain diameter present, and may thus not generalise well to enable quantitative predictions of the improvement expected from increased resolutions. We have added an illustration of these analyses in the Supplementary Material, and the following additions in the Methods, Results and Discussion sections.

      "For vessel segmentation, a semi-automatic segmentation pipeline was implemented in Matlab R2020a (The MathWorks, Natick, MA) using the UniQC toolbox (Frässle et al., 2021): First, a brain mask was created through thresholding, which was then manually corrected in ITK-SNAP (http://www.itksnap.org/) (Yushkevich et al., 2006) such that pial vessels were included. For the high-resolution TOF data (Figures 6 and 7, Supplementary Figure 4), denoising to remove high-frequency noise was performed using the implementation of an adaptive non-local means denoising algorithm (Manjón et al., 2010) provided in DenoiseImage within the ANTs toolbox, with the search radius for the denoising set to 5 voxels and noise type set to Rician. Next, the brain mask was applied to the bias-corrected and denoised data (if applicable). Then, a vessel mask was created based on a manually defined threshold, and clusters with fewer than 10 or 5 voxels for the high- and low-resolution acquisitions, respectively, were removed from the vessel mask. Finally, an iterative region-growing procedure starting at each voxel of the initial vessel mask was applied that successively included additional voxels into the vessel mask if they were connected to a voxel which was already included and above a manually defined threshold (which was slightly lower than the previous threshold). Both thresholds were applied globally but manually adjusted for each slab. No correction for motion between slabs was applied. The Matlab code describing the segmentation algorithm as well as the analysis of the two-echo TOF acquisition outlined in the following paragraph are also included in our GitLab repository (https://gitlab.com/SaskiaB/pialvesseltof.git). To assess the data quality, maximum intensity projections (MIPs) were created and the outlines of the segmentation MIPs were added as an overlay.
To estimate the increased detection of vessels with higher resolutions, we computed the relative increase in the length of the segmented vessels for the data presented in Figure 5 (0.8 mm, 0.5 mm, 0.4 mm and 0.3 mm isotropic voxel size) and Figure 7 (0.16 mm and 0.14 mm isotropic voxel size) by computing the skeleton using the bwskel Matlab function and then calculating the skeleton length as the number of voxels in the skeleton multiplied by the voxel size."
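
      For readers who wish to reproduce the thresholding steps outside MATLAB, the following NumPy sketch approximates the last three steps described above: a global high threshold, removal of small clusters, and region growing into connected voxels above a lower threshold. It is not the authors' code (which is in the GitLab repository); bias correction and denoising are omitted, and face (6-)connectivity is an assumption, since the connectivity used in the original pipeline is not stated.

```python
from collections import deque

import numpy as np

# Face (6-)connectivity in 3D: an assumption, not taken from the original pipeline.
OFFSETS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def neighbours(idx, shape):
    """Yield in-bounds face neighbours of a 3D voxel index."""
    for off in OFFSETS:
        n = tuple(int(i) + o for i, o in zip(idx, off))
        if all(0 <= k < s for k, s in zip(n, shape)):
            yield n


def remove_small_clusters(mask, min_size):
    """Keep only connected components with at least min_size voxels."""
    keep = np.zeros(mask.shape, dtype=bool)
    seen = np.zeros(mask.shape, dtype=bool)
    for start in zip(*np.nonzero(mask)):
        start = tuple(int(i) for i in start)
        if seen[start]:
            continue
        seen[start] = True
        component, queue = [], deque([start])
        while queue:  # breadth-first flood fill of one component
            idx = queue.popleft()
            component.append(idx)
            for n in neighbours(idx, mask.shape):
                if mask[n] and not seen[n]:
                    seen[n] = True
                    queue.append(n)
        if len(component) >= min_size:
            for idx in component:
                keep[idx] = True
    return keep


def region_grow(seed, candidate):
    """Add candidate voxels connected to the seed through other candidates."""
    grown = seed.copy()
    queue = deque(tuple(int(i) for i in v) for v in zip(*np.nonzero(seed)))
    while queue:
        idx = queue.popleft()
        for n in neighbours(idx, seed.shape):
            if candidate[n] and not grown[n]:
                grown[n] = True
                queue.append(n)
    return grown


def segment_vessels(image, brain_mask, high_thr, low_thr, min_cluster):
    """High threshold -> small-cluster removal -> region growing above low_thr."""
    inside = brain_mask.astype(bool)
    initial = remove_small_clusters((image > high_thr) & inside, min_cluster)
    return region_grow(initial, (image > low_thr) & inside)


volume = np.zeros((3, 3, 12))
volume[1, 1, :5] = 10.0    # bright vessel segment
volume[1, 1, 5:10] = 6.0   # fainter continuation of the same vessel
volume[0, 0, 11] = 10.0    # isolated bright noise voxel
vessels = segment_vessels(volume, np.ones(volume.shape, bool), 8.0, 5.0, min_cluster=3)
print(int(vessels.sum()))  # prints 10
```

      Because every seed voxel is queued at the start, a single breadth-first pass reproduces the effect of the iterative growth; the isolated noise voxel is removed by the cluster-size criterion and is not reached by the growth step.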

      "To investigate the effect of voxel size on vessel FRE, we acquired data at four different voxel sizes ranging from 0.8 mm to 0.3 mm isotropic resolution, adjusting only the encoding matrix, with imaging parameters being otherwise identical (FOV, TR, TE, flip angle, R, slab thickness, see section Data acquisition). The total acquisition time increases from less than 2 minutes for the lowest resolution scan to over 6 minutes for the highest resolution scan as a result. Figure 5 shows thin maximum intensity projections of a small vessel. While the vessel is not detectable at the largest voxel size, it slowly emerges as the voxel size decreases and approaches the vessel size. Presumably, this is driven by the considerable increase in FRE as seen in the single slice view (Figure 5, small inserts). Accordingly, the FRE computed from the vessel mask for the smallest part of the vessel (Figure 5, red mask) increases substantially with decreasing voxel size. More precisely, reducing the voxel size from 0.8 mm, 0.5 mm or 0.4 mm to 0.3 mm increases the FRE by 2900 %, 165 % and 85 %, respectively. Assuming a vessel diameter of 300 μm, the partial-volume FRE model (section Introducing a partial-volume model) would predict similar ratios of 611%, 178% and 78%. However, as long as the vessel is larger than the voxel (Figure 5, blue mask), the relative FRE does not change with resolution (see also Effect of FRE Definition and Interaction with Partial-Volume Model in the Supplementary Material). To illustrate the gain in sensitivity to detect smaller arteries, we have estimated the relative increase of the total length of the segmented vasculature (Supplementary Figure 9): reducing the voxel size from 0.8 mm to 0.5 mm isotropic increases the skeleton length by 44 %, reducing the voxel size from 0.5 mm to 0.4 mm isotropic increases the skeleton length by 28 %, and reducing the voxel size from 0.4 mm to 0.3 mm isotropic increases the skeleton length by 31 %. 
In summary, when imaging small pial arteries, these data support the hypothesis that it is primarily the voxel size, not the blood delivery time, which determines whether vessels can be resolved."
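
      The predicted ratios quoted above follow directly from the partial-volume argument: once the voxel is larger than the vessel, the blood volume fraction, and hence the relative FRE, scales with 1/l_voxel². A minimal check, assuming a cylindrical 300 µm vessel centred in a cubic voxel and voxel-size-independent magnetizations (our simplification of the model), reproduces the quoted values:

```python
import math


def blood_fraction(d_mm: float, l_mm: float) -> float:
    """Blood volume fraction of a cubic voxel crossed centrally by a cylindrical vessel."""
    return min(1.0, math.pi * d_mm**2 / (4.0 * l_mm**2))


def predicted_fre_increase(l_large_mm: float, l_small_mm: float, d_mm: float) -> float:
    """Percent increase in relative FRE when shrinking the voxel from l_large to l_small."""
    return 100.0 * (blood_fraction(d_mm, l_small_mm) / blood_fraction(d_mm, l_large_mm) - 1.0)


for l in (0.8, 0.5, 0.4):
    print(f"{l} mm -> 0.3 mm: +{predicted_fre_increase(l, 0.3, 0.3):.0f} %")
# prints +611 %, +178 % and +78 % on three lines
```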

      "Indeed, the reduction in voxel volume by 33 % revealed additional small branches connected to larger arteries (see also Supplementary Figure 8). For this example, we found an overall increase in skeleton length of 14 % (see also Supplementary Figure 9)."

      "We therefore expect this strategy to enable an efficient image acquisition without the need for additional venous suppression RF pulses. Once these challenges for vessel segmentation algorithms are addressed, a thorough quantification of the arterial vasculature can be performed. For example, the skeletonization procedure used to estimate the increase of the total length of the segmented vasculature (Supplementary Figure 9) exhibits errors particularly in the unwanted sinuses and large veins. While they are consistently present across voxel sizes, and thus may have less impact on relative change in skeleton length, they need to be addressed when estimating the absolute length of the vasculature, or other higher-order features such as number of new branches. (Note that we have also performed the skeletonization procedure on the maximum intensity projections to reduce the number of artefacts and obtained comparable results: reducing the voxel size from 0.8 mm to 0.5 mm isotropic increases the skeleton length by 44 % (3D) vs 37 % (2D), reducing the voxel size from 0.5 mm to 0.4 mm isotropic increases the skeleton length by 28 % (3D) vs 26 % (2D), reducing the voxel size from 0.4 mm to 0.3 mm isotropic increases the skeleton length by 31 % (3D) vs 16 % (2D), and reducing the voxel size from 0.16 mm to 0.14 mm isotropic increases the skeleton length by 14 % (3D) vs 24 % (2D).)"
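
      The length measure used here (number of skeleton voxels multiplied by the voxel size) and the way step-wise increases combine can be sketched as follows. The skeleton itself would come from bwskel in MATLAB (in Python, scikit-image's skeletonize is a comparable tool); the voxel counts below are hypothetical, and chaining the reported 3D increases assumes that all four acquisitions cover the same vessels.

```python
def skeleton_length_mm(n_skeleton_voxels: int, voxel_size_mm: float) -> float:
    """Length estimate used in the text: number of skeleton voxels times voxel size."""
    return n_skeleton_voxels * voxel_size_mm


def relative_increase(length_new: float, length_ref: float) -> float:
    """Fractional change in skeleton length relative to a reference scan."""
    return length_new / length_ref - 1.0


# Hypothetical skeleton voxel counts at 0.8 mm and 0.5 mm isotropic.
ref = skeleton_length_mm(1000, 0.8)  # 800 mm
new = skeleton_length_mm(2000, 0.5)  # 1000 mm
print(f"{relative_increase(new, ref):.0%}")  # 25%

# Compounding the reported 3D step-wise increases (0.8 -> 0.5 -> 0.4 -> 0.3 mm).
overall = 1.0
for step in (0.44, 0.28, 0.31):
    overall *= 1.0 + step
print(f"0.8 mm -> 0.3 mm: +{overall - 1.0:.0%}")
```

      Compounding 44 %, 28 % and 31 % in this way corresponds to an overall increase in skeleton length of roughly 141 % from 0.8 mm to 0.3 mm isotropic.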

      Supplementary Figure 9: Increase of vessel skeleton length with voxel size reduction. Axial maximum intensity projections for data acquired with voxel sizes ranging from 0.8 mm to 0.3 mm isotropic (top; corresponding to Figure 5) and from 0.16 mm to 0.14 mm isotropic (bottom; corresponding to Figure 7) are shown. Vessel skeletons derived from segmentations performed for each resolution are overlaid in red. A reduction in voxel size is accompanied by a corresponding increase in vessel skeleton length.

      Regarding further quantification of the vessel displacement presented in Figure 8, we have estimated the displacement using the Horn-Schunck optical flow estimator (Horn and Schunck, 1981; Mustafa, 2016) (https://github.com/Mustafa3946/Horn-Schunck-3D-Optical-Flow). However, the results are dominated by the larger arteries, whereas we are mostly interested in the displacement of the smallest arteries; therefore, this quantification may not be helpful.

      Because the theoretical relationship between vessel displacement and blood velocity is well known (Eq. 7), and we have also outlined the expected blood velocity as a function of arterial diameter in Figure 2, which provided estimates of displacements that matched what was found in our data (as reported in our original submission), we believe that the new quantification in this form does not add value to the manuscript. What would be interesting would be to explore the use of this displacement artefact as a measure of blood velocities. This, however, would require more substantial analyses in particular for estimation of the arterial diameter and additional validation data (e.g. phase-contrast MRA). We have outlined this avenue in the Discussion section. What is relevant to the main aim of this study, namely imaging of small pial arteries, is the insight that blood velocities are indeed sufficiently fast to cause displacement artefacts even in smaller arteries. We have clarified this in the Results section:

      "Note that correction techniques exist to remove displaced vessels from the image (Gulban et al., 2021), but they cannot revert the vessels to their original location. Alternatively, this artefact could also potentially be utilised as a rough measure of blood velocity."

      "At a delay time of 10 ms between phase encoding and echo time, the observed displacement of approximately 2 mm in some of the larger vessels would correspond to a blood velocity of 200 mm/s, which is well within the expected range (Figure 2). For the smallest arteries, a displacement of one voxel (0.4 mm) can be observed, indicative of blood velocities of 40 mm/s. Note that the vessel displacement can be observed in all vessels visible at this resolution, indicating high blood velocities throughout much of the pial arterial vasculature. Thus, assuming a blood velocity of 40 mm/s (Figure 2) and a delay time of 5 ms for the high-resolution acquisitions (Figure 6), vessel displacements of 0.2 mm are possible, representing a shift of 1–2 voxels."
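
      The arithmetic in this paragraph uses the simple relation displacement = velocity × delay between phase encoding and echo. The sketch below assumes flow along the displacement direction and ignores any geometric factors that Eq. (7) of the manuscript may include; the 0.16 mm and 0.14 mm voxel sizes are taken from the high-resolution acquisitions of Figure 7 for illustration.

```python
def shift_mm(velocity_mm_s: float, delay_ms: float) -> float:
    """Apparent vessel shift: velocity times the phase-encode-to-echo delay."""
    return velocity_mm_s * delay_ms / 1000.0


def velocity_from_shift(shift: float, delay_ms: float) -> float:
    """Invert the relation: rough velocity (mm/s) from an observed shift (mm)."""
    return shift * 1000.0 / delay_ms


print(velocity_from_shift(2.0, 10.0))  # larger vessels: 200.0 mm/s
print(velocity_from_shift(0.4, 10.0))  # one 0.4 mm voxel: 40.0 mm/s
for voxel in (0.16, 0.14):
    print(f"{shift_mm(40.0, 5.0) / voxel:.2f} voxels at {voxel} mm")  # 1.25, then 1.43
```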

      Regarding the number of vessels labelled as veins, please see our response below to R1.5.

      In the main quantification given, the estimation of FRE increase with resolution, it would make more sense to perform the segmentation independently for each scan and estimate the corresponding FRE: using the mask from the highest resolution scan only biases the results. It is unclear also if the background tissue measurement one voxel outside took partial voluming into account (by leaving a one voxel free interface between vessel and background). In this analysis, it would also be interesting to estimate SNR, so you can compare SNR and FRE across resolutions, also helpful for the discussion on SNR.

      The FRE serves as an indicator of the potential performance of any segmentation algorithm (including manual segmentation) (also see our discussion on the interpretation of FRE in our response to R1.2). If we were to segment each scan individually, we would, in the ideal case, always obtain the same FRE estimate, as FRE influences the performance of the segmentation algorithm. In practice, this simply means that it is not possible to segment the vessel in the low-resolution image to its full extent that is visible in the high-resolution image, because the FRE is too low for small vessels. However, we agree with the core point that the reviewer is making, and so to help address this, a valuable addition would be to compare the FRE for the section of a vessel that is visible at all resolutions, where we found—within the accuracy of the transformations and resampling across such vastly different resolutions—that the FRE does not increase any further with higher resolution if the vessel is larger than the voxel size (page 18 and Figure 5). As stated in the Methods section, and as noted by the reviewer, we used the voxels immediately next to the vessel mask to define the background tissue signal level. Any resulting potential partial-volume effects in these background voxels would affect all voxel sizes, introducing a consistent bias that would not impact our comparison. However, inspection of the image data in Figure 5 showed partial-volume effects predominantly within those voxels intersecting the vessel, rather than voxels surrounding the vessel, in agreement with our model of FRE.

      "All imaging data were slab-wise bias-field corrected using the N4BiasFieldCorrection (Tustison et al., 2010) tool in ANTs (Avants et al., 2009) with the default parameters. To compare the empirical FRE across the four different resolutions (Figure 5), manual masks were first created for the smallest part of the vessel in the image with the highest resolution and for the largest part of the vessel in the image with the lowest resolution. Then, rigid-body transformation parameters from the low-resolution to the high-resolution (and the high-resolution to the low-resolution) images were estimated using coregister in SPM (https://www.fil.ion.ucl.ac.uk/spm/), and their inverse was applied to the vessel mask using SPM’s reslice. To calculate the empirical FRE (Eq. (3)), the mean of the intensity values within the vessel mask was used to approximate the blood magnetization, and the mean of the intensity values one voxel outside of the vessel mask was used as the tissue magnetization."

      "To investigate the effect of voxel size on vessel FRE, we acquired data at four different voxel sizes ranging from 0.8 mm to 0.3 mm isotropic resolution, adjusting only the encoding matrix, with imaging parameters being otherwise identical (FOV, TR, TE, flip angle, R, slab thickness, see section Data acquisition). The total acquisition time increases from less than 2 minutes for the lowest resolution scan to over 6 minutes for the highest resolution scan as a result. Figure 5 shows thin maximum intensity projections of a small vessel. While the vessel is not detectable at the largest voxel size, it slowly emerges as the voxel size decreases and approaches the vessel size. Presumably, this is driven by the considerable increase in FRE as seen in the single slice view (Figure 5, small inserts). Accordingly, the FRE computed from the vessel mask for the smallest part of the vessel (Figure 5, red mask) increases substantially with decreasing voxel size. More precisely, reducing the voxel size from 0.8 mm, 0.5 mm or 0.4 mm to 0.3 mm increases the FRE by 2900 %, 165 % and 85 %, respectively. Assuming a vessel diameter of 300 μm, the partial-volume FRE model (section Introducing a partial-volume model) would predict similar ratios of 611%, 178% and 78%. However, if the vessel is larger than the voxel (Figure 5, blue mask), the relative FRE remains constant across resolutions (see also Effect of FRE Definition and Interaction with Partial-Volume Model in the Supplementary Material). To illustrate the gain in sensitivity to smaller arteries, we have estimated the relative increase of the total length of the segmented vasculature (Supplementary Figure 9): reducing the voxel size from 0.8 mm to 0.5 mm isotropic increases the skeleton length by 44 %, reducing the voxel size from 0.5 mm to 0.4 mm isotropic increases the skeleton length by 28 %, and reducing the voxel size from 0.4 mm to 0.3 mm isotropic increases the skeleton length by 31 %. 
In summary, when imaging small pial arteries, these data support the hypothesis that it is primarily the voxel size, not blood delivery time, which determines whether vessels can be resolved."
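The predicted ratios of 611 %, 178 % and 78 % can be reproduced with a simplified version of the partial-volume idea: a straight cylindrical vessel of diameter d crossing a cubic voxel of edge length l fills a fraction π d²/(4 l²) of it, and the voxel's FRE scales linearly with that fraction (capped at 1 once the vessel is larger than the voxel). This is our reconstruction for illustration and not necessarily the exact form of the manuscript's model; the 1 mm "large vessel" diameter below is hypothetical.

```python
import math


def blood_fraction(diameter_mm: float, voxel_mm: float) -> float:
    """Fraction of a cubic voxel filled by a straight cylindrical vessel crossing it.

    Simplified assumption: the vessel runs parallel to a voxel edge; once the
    cross-section would exceed the voxel, the voxel is treated as fully filled.
    """
    return min(1.0, math.pi * diameter_mm**2 / (4.0 * voxel_mm**2))


def relative_fre_increase(diameter_mm: float, coarse_mm: float, fine_mm: float) -> float:
    """Percent increase in partial-volume FRE when moving from coarse to fine voxels."""
    ratio = blood_fraction(diameter_mm, fine_mm) / blood_fraction(diameter_mm, coarse_mm)
    return 100.0 * (ratio - 1.0)


small_vessel = 0.3   # mm, as assumed in the text
large_vessel = 1.0   # mm, hypothetical vessel larger than all voxel sizes
for coarse in (0.8, 0.5, 0.4):
    small = relative_fre_increase(small_vessel, coarse, 0.3)
    large = relative_fre_increase(large_vessel, coarse, 0.3)
    print(f"{coarse} -> 0.3 mm: small vessel +{small:.0f} %, large vessel +{large:.0f} %")
```

For a vessel no larger than either voxel, the π/4 factor cancels and the predicted ratio reduces to (l_coarse/l_fine)²; the capped case mirrors the constant FRE observed for the blue mask.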

      Figure 5: Effect of voxel size on flow-related vessel enhancement. Thin axial maximum intensity projections containing a small artery acquired with different voxel sizes ranging from 0.8 mm to 0.3 mm isotropic are shown. The FRE is estimated using the mean intensity value within the vessel masks depicted on the left, and the mean intensity values of the surrounding tissue. The small insert shows a section of the artery as it lies within a single slice. A reduction in voxel size is accompanied by a corresponding increase in FRE (red mask), whereas no further increase is obtained once the voxel size is equal to or smaller than the vessel size (blue mask).

      After many internal discussions, we had to conclude that deriving a meaningful SNR analysis that would benefit the reader was not possible given the available data, due to the complex relationship between voxel size and other imaging parameters in practice. Specifically, we reduced the voxel size but at the same time increased the acquisition time by increasing the number of encoding steps, which we have now also highlighted in the manuscript. We have, however, added additional considerations about balancing SNR and segmentation performance. Note that these considerations are not specific to imaging the pial arteries but apply to all MRA acquisitions, and have thus been discussed previously in the literature. Here, we wanted to focus on the novel insights gained in our study. Importantly, while we previously noted that reducing voxel size improves contrast in vessels whose diameters are smaller than the voxel size, we now explicitly acknowledge that, for vessels whose diameters are larger than the voxel size, reducing the voxel size is not helpful, since it only reduces SNR without any gain in contrast, and may hinder segmentation performance and thus become counterproductive.

      "In general, we have not considered SNR, but only FRE, i.e. the (relative) image contrast, assuming that segmentation algorithms would benefit from higher contrast for smaller arteries. Importantly, the acquisition parameters available to maximize FRE are limited, namely repetition time, flip angle and voxel size. SNR, however, can be improved via numerous avenues independent of these parameters (Brown et al., 2014b; Du et al., 1996; Heverhagen et al., 2008; Parker et al., 1991; Triantafyllou et al., 2011; Venkatesan and Haacke, 1997), the simplest being longer acquisition times. If the aim is to optimize a segmentation outcome for a given acquisition time, the trade-off between contrast and SNR for the specific segmentation algorithm needs to be determined (Klepaczko et al., 2016; Lesage et al., 2009; Moccia et al., 2018; Phellan and Forkert, 2017). Our own—albeit limited—experience has shown that segmentation algorithms (including manual segmentation) can accommodate a perhaps surprising amount of noise using prior knowledge and neighborhood information, making these high-resolution acquisitions possible. Importantly, note that our treatment of the FRE does not suggest that an arbitrarily small voxel size is needed, but instead that voxel sizes appropriate for the arterial diameter of interest are beneficial (in line with the classic “matched-filter” rationale (North, 1963)). Voxels smaller than the arterial diameter would not yield substantial benefits (Figure 5) and may result in SNR reductions that would hinder segmentation performance."

      5) The separation of arterial and venous components is a bit puzzling, partly because the methodology used is not fully explained, but also partly because the reasons invoked (flow artefact in large pial veins) do not match the results (many small vessels are included as veins). This question of separating both types of vessels is quite important for applications, so the whole procedure should be explained in detail. The use of short T2* seemed also sub-optimal, as both arteries and veins result in shorter T2* compared to most brain tissues: wouldn't a susceptibility-based measure (SWI or better QSM) provide a better separation? Finally, since the T2* map and the regular TOF map are at different resolutions, masking out the vessels labeled as veins will likely result in the smaller veins being left out.

      We agree that while the technical details of this approach were provided in the Data analysis section, the rationale behind it was only briefly mentioned. We have therefore included an additional section Inflow-artefacts in sinuses and pial veins in the Theory section of the manuscript. We have also extended the discussion of the advantages and disadvantages of the different susceptibility-based contrasts, namely T2*, SWI and QSM. While in theory both T2* and QSM should allow the reliable differentiation of arterial and venous blood, we found T2* to perform more robustly: QSM can fail in many places, because the strong susceptibility sources within the superior sagittal and transversal sinuses and pial veins, and their proximity to the brain surface, require dedicated processing (Stewart et al., 2022). Further, we have also elaborated in the Discussion section why the interpretation of Figure 9 regarding the absence or presence of small veins is challenging. Namely, the intensity-based segmentation used here provides only an incomplete segmentation even of the larger sinuses, because the overall lower intensity found in veins, combined with the heterogeneity of the intensities in veins, violates the assumption made by most vascular segmentation approaches of homogeneous, high image intensities within vessels, which is satisfied in arteries (page 29f) (see also the illustration below). Accordingly, quantifying the number of vessels labelled as veins (R1.4a) would provide misleading results, as often only small subsets of the same sinus or vein are segmented.

      "Inflow-artefacts in sinuses and pial veins

      Inflow in large pial veins and the sagittal and transverse sinuses can cause flow-related enhancement in these non-arterial vessels. One common strategy to remove this unwanted signal enhancement is to apply venous suppression pulses during the data acquisition, which saturate blood spins outside the imaging slab. Disadvantages of this technique are the technical challenges of applying these pulses at ultra-high field due to constraints of the specific absorption rate (SAR) and the necessary increase in acquisition time (Conolly et al., 1988; Heverhagen et al., 2008; Johst et al., 2012; Maderwald et al., 2008; Schmitter et al., 2012; Zhang et al., 2015). In addition, optimal positioning of the saturation slab in the case of pial arteries requires further investigation, and in particular suppressing signal from the superior sagittal sinus without interfering with the imaging of the pial arterial vasculature at the top of the cortex might prove challenging. Furthermore, this venous saturation strategy is based on the assumption that arterial blood is traveling head-wards while venous blood is drained foot-wards. For the complex and convoluted trajectory of pial vessels this directionality-based saturation might be oversimplified, particularly when considering the higher-order branches of the pial arteries and veins on the cortical surface. Inspired by techniques to simultaneously acquire a TOF image for angiography and a susceptibility-weighted image for venography (Bae et al., 2010; Deistung et al., 2009; Du et al., 1994; Du and Jin, 2008), we set out to explore the possibility of removing unwanted venous structures from the segmentation of the pial arterial vasculature during data postprocessing.
Because arteries filled with oxygenated blood have T2* values similar to tissue, while veins have much shorter T2* values due to the presence of deoxygenated blood (Pauling and Coryell, 1936; Peters et al., 2007; Uludağ et al., 2009; Zhao et al., 2007), we used this criterion to remove vessels with short T2* values from the segmentation (see Data Analysis for details). In addition, we also explored whether unwanted venous structures in the high-resolution TOF images (where a two-echo acquisition is not feasible due to the longer readout) can be removed based on detecting them in a lower-resolution image."

      "Removal of pial veins

      Inflow in large pial veins and the superior sagittal and transverse sinuses can cause a flow-related enhancement in these non-arterial vessels (Figure 9, left). The higher concentration of deoxygenated haemoglobin in these vessels leads to shorter T2* values (Pauling and Coryell, 1936), which can be estimated using a two-echo TOF acquisition (see also Inflow-artefacts in sinuses and pial veins). These vessels can be identified in the segmentation based on their T2* values (Figure 9, left), and removed from the angiogram (Figure 9, right) (Bae et al., 2010; Deistung et al., 2009; Du et al., 1994; Du and Jin, 2008). In particular, the superior and inferior sagittal and the transversal sinuses and large veins which exhibited an inhomogeneous intensity profile and a steep loss of intensity at the slab boundary were identified as non-arterial (Figure 9, left). Further, we also explored the option of removing unwanted venous vessels from the high-resolution TOF image (Figure 7) using a low-resolution two-echo TOF (not shown). This indeed allowed us to remove the strong signal enhancement in the sagittal sinuses and numerous larger veins, although some small veins, which are characterised by inhomogeneous intensity profiles and can be detected visually by experienced raters, remain."

      Figure 9: Removal of non-arterial vessels in time-of-flight imaging. LEFT: Segmentation of arteries (red) and veins (blue) using T2* estimates. RIGHT: Time-of-flight angiogram after vein removal.
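With two echoes, a voxel-wise T2* estimate follows from mono-exponential decay, S(TE) = S0·exp(−TE/T2*), which gives T2* = (TE2 − TE1)/ln(S1/S2). The sketch below applies that formula and labels vessel voxels with short T2* as venous. The echo times and the 20 ms cut-off are hypothetical placeholders rather than the values used in the study (the Data Analysis section and the linked repository describe those).

```python
import numpy as np


def t2star_two_echo(s1: np.ndarray, s2: np.ndarray, te1_ms: float, te2_ms: float) -> np.ndarray:
    """Voxel-wise T2* (ms) from two echoes, assuming mono-exponential decay."""
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        t2s = (te2_ms - te1_ms) / np.log(s1 / s2)
    # Non-decaying or invalid voxels (s2 >= s1, zeros) are set to +inf,
    # so they are never labelled as venous below.
    return np.where(np.isfinite(t2s) & (t2s > 0), t2s, np.inf)


def label_veins(vessel_mask: np.ndarray, t2s_ms: np.ndarray, cutoff_ms: float) -> np.ndarray:
    """Vessel voxels whose T2* falls below a (hypothetical) cut-off."""
    return vessel_mask & (t2s_ms < cutoff_ms)


# Synthetic check: an "artery" with tissue-like T2* and a "vein" with short T2*.
te1, te2 = 3.0, 8.0                      # ms, hypothetical echo times
true_t2s = np.array([30.0, 10.0])        # ms: artery-like, vein-like (illustrative)
s1 = 1000.0 * np.exp(-te1 / true_t2s)
s2 = 1000.0 * np.exp(-te2 / true_t2s)
est = t2star_two_echo(s1, s2, te1, te2)
veins = label_veins(np.array([True, True]), est, cutoff_ms=20.0)
print(est, veins)
```

Applying a mask estimated at low resolution to a high-resolution image additionally requires resampling it onto the high-resolution grid, which is where small veins can be missed, as the reviewer notes.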

      Our approach also assumes that the unwanted veins are large enough that they are also resolved in the low-resolution image. If we consider the source of the FRE effect, it might indeed be exclusively large veins that are present in TOF-MRA data, which would suggest that our assumption is valid. Fundamentally, the FRE depends on the inflow of un-saturated spins into the imaging slab. However, small veins drain capillary beds in the local tissue, i.e. the tissue within the slab. (Note that due to the slice oversampling implemented in our acquisition, spins just above or below the slab will also be excited.) Thus, small veins only contain blood water spins that have experienced a large number of RF pulses due to the long transit time through the pial arterial vasculature, the capillaries and the intracortical venules. Hence, their longitudinal magnetization would be similar to that of stationary tissue. To generate an FRE effect in veins, "pass-through" venous blood from outside the imaging slab is required. This is only available in veins that are passing through the imaging slab, which have much larger diameters. These theoretical considerations are corroborated by the findings in Figure 9, where large disconnected vessels with varying intensity profiles were identified as non-arterial. Due to the heterogeneous intensity profiles in large veins and the sagittal and transversal sinuses, the intensity-based segmentation applied here may only label a subset of the vessel lumen, creating the impression of many small veins. This is particularly the case for the straight and inferior sagittal sinus in the bottom slab of Figure 9.
Nevertheless, future studies potentially combining anatomical prior knowledge, advanced segmentation algorithms and susceptibility measures would be capable of removing these unwanted veins in post-processing to enable an efficient TOF-MRA image acquisition dedicated to optimally detecting small arteries without the need for additional venous suppression RF pulses.
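The argument that blood in small veins has experienced many RF pulses can be made concrete with the standard spoiled-gradient-echo recursion M_{k+1} = M_k·cos θ·E1 + M0·(1 − E1), with E1 = exp(−TR/T1). Blood entering fully relaxed (M = M0) approaches the same steady state as stationary tissue, so its enhancement decays as (cos θ·E1)^n after n pulses. This sketch assumes equal T1 for blood and tissue (T1 = 2.1 s is an illustrative value) and uses TR = 20 ms and θ = 18° from the text; the transit times are hypothetical.

```python
import math


def longitudinal_after_pulses(n_pulses: int, flip_deg: float, tr_s: float,
                              t1_s: float, m0: float = 1.0) -> float:
    """Longitudinal magnetization just before the next excitation, starting fully relaxed."""
    e1 = math.exp(-tr_s / t1_s)
    cos_t = math.cos(math.radians(flip_deg))
    mz = m0
    for _ in range(n_pulses):
        mz = mz * cos_t * e1 + m0 * (1.0 - e1)  # excite, then relax for one TR
    return mz


def steady_state(flip_deg: float, tr_s: float, t1_s: float, m0: float = 1.0) -> float:
    """Spoiled gradient-echo steady-state longitudinal magnetization (stationary tissue)."""
    e1 = math.exp(-tr_s / t1_s)
    return m0 * (1.0 - e1) / (1.0 - math.cos(math.radians(flip_deg)) * e1)


TR, FLIP, T1 = 0.020, 18.0, 2.1   # s, degrees, s (T1 illustrative, equal for blood and tissue)
m_ss = steady_state(FLIP, TR, T1)
for transit_s in (0.1, 0.5, 1.0, 3.0):   # hypothetical time spent inside the excited slab
    n = round(transit_s / TR)
    m = longitudinal_after_pulses(n, FLIP, TR, T1)
    print(f"{transit_s:4.1f} s ({n:3d} pulses): enhancement {m / m_ss - 1.0:.3f}")
```

After a few seconds inside the excited region, the remaining enhancement is negligible, which is the behaviour the paragraph attributes to small veins draining local capillary beds.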

      6) A more general question also is why this imaging method is limited to pial vessels: at 140 microns, the larger intra-cortical vessels should be appearing (group 6 in Duvernoy, 1981: diameters between 50 and 240 microns). Are there other reasons these vessels are not detected? Similarly, it seems there is no arterial vasculature detected in the white matter here: it is due to the rather superior location of the imaging slab, or a limitation of the method? Likewise, all three results focus on a rather homogeneous region of cerebral cortex, in terms of vascularisation. It would be interesting for applications to demonstrate the capabilities of the method in more complex regions, e.g. the densely vascularised cerebellum, or more heterogeneous regions like the midbrain. Finally, it is notable that all three subjects appear to have rather different densities of vessels, from sparse (participant II) to dense (participant I), with some inhomogeneities in density (frontal region in participant III) and inconsistencies in detection (sinuses absent in participant II). All these points should be discussed.

      While we are aware that the diameter of intracortical arteries has been suggested to be up to 240 µm (Duvernoy et al., 1981), it remains unclear how prevalent intracortical arteries of this size are. For example, note that in a different context in the Duvernoy study (in the revised manuscript), the following values are mentioned (which we followed in Figure 1):

      “Central arteries of the lobule always have a large diameter of 260 µ to 280 µ, at their origin. Peripheral arteries have an average diameter of 150 µ to 180 µ. At the cortex surface, all arterioles of 50 µ or less, penetrate the cortex or form anastomoses. The diameter of most of these penetrating arteries is approximately 40 µ.”

      Further, the examinations by Hirsch et al. (2012) (albeit in the macaque brain) showed one (exemplary) intracortical artery belonging to group 6 (Figure 1B), whose diameter appears to be below 100 µm. Given these discrepancies and the fact that intracortical arteries in group 5 only reach 75 µm, we suspect that intracortical arteries with diameters > 140 µm are a very rare occurrence, which we might not have encountered in this data set.

      Similarly, arteries in white matter (Nonaka et al., 2003) and the cerebellum (Duvernoy et al., 1983) are beyond our resolution at the moment. The midbrain is an interesting suggestion, although we believe that the cortical areas chosen here, with their gradual reduction in diameter along the vascular tree, provide a better illustration of the effect of voxel size than the rather abrupt reduction in vascular diameter found in the midbrain. We have added a note on the even higher resolution requirements to the Discussion section:

      "In summary, we expect high-resolution TOF-MRA to be applicable also for group studies, to address numerous questions regarding the relationship of arterial topology and morphometry to the anatomical and functional organization of the brain, and the influence of arterial topology and morphometry on brain hemodynamics in humans. Notably, we have focused on imaging pial arteries of the human cerebrum; however, other brain structures such as the cerebellum, subcortex and white matter are of course also of interest. While the same theoretical considerations apply, imaging the arterial vasculature in these structures will require even smaller voxel sizes due to their smaller arterial diameters (Duvernoy et al., 1983, 1981; Nonaka et al., 2003)."

      Regarding the apparent sparsity of results from participant II, this is mostly driven by the much smaller coverage in this subject (19.6 mm in Participant II vs. 50 mm and 58 mm in Participant I and III, respectively). The reduction in density in the frontal regions might indeed reflect a difference in anatomy, or might be driven by the presence of more false-positive veins in Participant I than in Participant III in these areas. Following the depiction in Duvernoy et al. (1981), one would not expect large arteries in frontal areas, but large veins are common. Thus, the additional vessels in Participant I in the frontal areas might well be false-positive veins, and their removal would result in similar densities for both participants. Indeed, as pointed out in section Future directions, we would expect a lower arterial density in frontal and posterior areas than in middle areas. The sinuses (and other large false-positive veins) in Participant II have been removed as outlined and discussed in sections Removal of pial veins and Challenges for vessel segmentation algorithms, respectively.

      7) One of the main practical limitations of the proposed method is the use of a very small imaging slab. It is mentioned in the discussion that thicker slabs are not only possible, but beneficial both in terms of SNR and acceleration possibilities. What are the limitations that prevented their use in the present study? With the current approach, what would be the estimated time needed to acquire the vascular map of an entire brain? It would also be good to indicate whether specific processing was needed to stitch together the multiple slab images in Fig. 6-9, S2.

      Time-of-flight acquisitions are commonly performed with thin acquisition slabs, following initial investigations by Parker et al. (1991) to maximise vessel sensitivity and minimize noise. We therefore followed this practice for our initial investigations but wanted to point out in the discussion that thicker slabs might provide several advantages that need to be evaluated in future studies. This would include theoretical and empirical evaluations balancing SNR gains from larger excitation volumes and SNR losses due to more acceleration. For this study, we have chosen the slab thickness such as to keep the acquisition time at a reasonable amount to minimize motion artefacts (as outlined in the Discussion). In addition, due to the extreme matrix sizes in particular for the 0.14 mm acquisition, we were also limited in the number of data points per image that can be indexed. Overcoming this limit would require even more substantial changes to the sequence than what we have already performed. With 16 slabs, assuming optimal FOV orientation, full-brain coverage including the cerebellum of 95 % of the population (Mennes et al., 2014) could be achieved with an acquisition time of 16 × 11 min 42 s = 3 h 7 min 12 s at 0.16 mm isotropic voxel size. No stitching of the individual slabs was performed, as subject motion was minimal. We have added a corresponding comment to the Data Analysis section.

      "Both thresholds were applied globally but manually adjusted for each slab. No correction for motion between slabs was applied as subject motion was minimal. The Matlab code describing the segmentation algorithm as well es the analysis of the two-echo TOF acquisition outlined in the following paragraph are also included in the github repository (https://gitlab.com/SaskiaB/pialvesseltof.git)."

      8) Some researchers and clinicians will argue that you can attain best results with anisotropic voxels, combining higher SNR and higher resolution. It would be good to briefly mention why isotropic voxels are preferred here, and whether anisotropic voxels would make sense at all in this context.

      Anisotropic voxels can be advantageous if the underlying object is anisotropic, e.g. an artery running straight through the slab, which would have a certain diameter (imaged using the high-resolution plane) and an 'infinite' elongation (in the low-resolution direction). However, the vessels targeted here can have any orientation and curvature; an anisotropic acquisition could therefore introduce a bias favouring vessels with a particular orientation relative to the voxel grid. Note that the same argument applies when answering the question of why a further reduction in slab thickness would eventually result in less increase in FRE (section Introducing a partial-volume model). We have added a corresponding comment in our discussion on practical imaging considerations:

      "In summary, numerous theoretical and practical considerations remain for optimal imaging of pial arteries using time-of-flight contrast. Depending on the application, advanced displacement artefact compensation strategies may be required, and zero-filling could provide better vessel depiction. Further, an optimal trade-off between SNR, voxel size and acquisition time needs to be found. Currently, the partial-volume FRE model only considers voxel size, and—as we reduced the voxel size in the experiments—we (partially) compensated the reduction in SNR through longer scan times. This, ultimately, also required the use of prospective motion correction to enable the very long acquisition times necessary for 140 µm isotropic voxel size. Often, anisotropic voxels are used to reduce acquisition time and increase SNR while maintaining in-plane resolution. This may indeed prove advantageous when the (also highly anisotropic) arteries align with the anisotropic acquisition, e.g. when imaging the large supplying arteries oriented mostly in the head-foot direction. In the case of pial arteries, however, there is not preferred orientation because of the convoluted nature of the pial arterial vasculature encapsulating the complex folding of the cortex (see section Anatomical architecture of the pial arterial vasculature). A further reduction in voxel size may be possible in dedicated research settings utilizing even longer acquisition times and a larger field-of-view to maintain SNR. However, if acquisition time is limited, voxel size and SNR need to be carefully balanced against each other."

      Reviewer #2 (Public Review):

      Overview

      This paper explores the use of inflow contrast MRI for imaging the pial arteries. The paper begins by providing a thorough background description of pial arteries, including past studies investigating the velocity and diameter. Following this, the authors consider this information to optimize the contrast between pial arteries and background tissue. This analysis reveals spatial resolution to be a strong factor influencing the contrast of the pial arteries. Finally, experiments are performed on a 7T MRI to investigate: the effect of spatial resolution by acquiring images at multiple resolutions, demonstrate the feasibility of acquiring ultrahigh resolution 3D TOF, the effect of displacement artifacts, and the prospect of using T2* to remove venous voxels.

      Impression

      There is certainly interest in tools to improve our understanding of the architecture of the small vessels of the brain and this work does address this. The background description of the pial arteries is very complete and the manuscript is very well prepared. The images are also extremely impressive, likely benefiting from motion correction, 7T, and a very long scan time. The authors also commit to open science and provide the data in an open platform. Given this, I do feel the manuscript to be of value to the community; however, there are concerns with the methods for optimization, the qualitative nature of the experiments, and conclusions drawn from some of the experiments.

      Specific Comments:

      1) Figure 3 and Theory surrounding. The optimization shown in Figure 3 is based on fixing the flip angle or the TR. As is well described in the literature, there is a strong interdependency of flip angle and TR. This is all well described in the literature dating back to the early 90s. While I think it reasonable to consider these effects in optimization, the language needs to include this interdependency or simply reference past work and specify how the flip angle was chosen. The human experiments do not include any investigation of flip angle or TR optimization.

      We thank the reviewer for raising this valuable point, and we fully agree that there is an interdependency between these two parameters. To simplify our optimization, we did fix one parameter value at a time, but in the revised manuscript we clarified that both parameters can be optimized simultaneously. Importantly, a large range of parameter values will result in a similar FRE in the small artery regime, which is illustrated in the optimization provided in the main text. We have therefore chosen the repetition time based on encoding efficiency and then set a corresponding excitation flip angle. In addition, we have also provided additional simulations in the supplementary material outlining the interdependency for the case of pial arteries.

      "Optimization of repetition time and excitation flip angle

      As the main goal of the optimisation here was to start within an already established parameter range for TOF imaging at ultra-high field (Kang et al., 2010; Stamm et al., 2013; von Morze et al., 2007), we only needed to then further tailor these for small arteries by considering a third parameter, namely the blood delivery time. From a practical perspective, a TR of 20 ms as a reference point was favourable, as it offered a time-efficient readout minimizing wait times between excitations but allowing low encoding bandwidths to maximize SNR. Due to the interdependency of flip angle and repetition time, for any one blood delivery time any FRE could (in theory) be achieved. For example, a similar FRE curve at 18 ° flip angle and 5 ms TR can also be achieved at 28 ° flip angle and 20 ms TR; or the FRE curve at 18 ° flip angle and 30 ms TR is comparable to the FRE curve at 8 ° flip angle and 5 ms TR (Supplementary Figure 3 TOP). In addition, the difference between optimal parameter settings diminishes for long blood delivery times, such that at a blood delivery time of 500 ms (Supplementary Figure 3 BOTTOM), the optimal flip angle at a TR of 15 ms, 20 ms or 25 ms would be 14 °, 16 ° and 18 °, respectively. This is in contrast to a blood delivery time of 100 ms, where the optimal flip angles would be 32 °, 37 ° and 41 °. In conclusion, in the regime of small arteries, long TR values in combination with low flip angles ensure flow-related enhancement at blood delivery times of 200 ms and above; within this regime, further optimization of the parameter values yields only marginal gains, and the optimal values are all similar."

      Supplementary Figure 3: Optimal imaging parameters for small arteries. This assessment follows the simulations presented in Figure 3, but in addition shows the interdependency for the corresponding third parameter (either flip angle or repetition time). TOP: Flip angles close to the Ernst angle show only a marginal flow-related enhancement; however, the influence of the blood delivery time decreases further (LEFT). As the flip angle increases well above the values used in this study, the flow-related enhancement in the small artery regime remains low even for the longer repetition times considered here (RIGHT). BOTTOM: The optimal excitation flip angle shows reduced variability across repetition times in the small artery regime compared to shorter blood delivery times.
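The trend in the optimal flip angles quoted above can be explored with a simplified model: equal T1 for blood and tissue, stationary tissue in the spoiled-gradient-echo steady state, and blood arriving fully relaxed and experiencing n = t_delivery/TR − 1 excitations before the one that is measured (the pulse count is our assumption). Under these assumptions FRE ∝ (1 − cos θ)(cos θ)^n, whose maximum lies at cos θ = n/(n + 1), independent of T1; a brute-force search over θ gives the same optimum. With this pulse count, the sketch yields values that round to those quoted for TR = 15, 20 and 25 ms, but it is an illustration under stated assumptions and not the authors' simulation code.

```python
import math


def fre(flip_deg: float, tr_ms: float, delivery_ms: float, t1_ms: float = 2100.0) -> float:
    """Relative FRE of fresh blood vs. steady-state tissue (equal T1 assumed, M0 = 1)."""
    e1 = math.exp(-tr_ms / t1_ms)
    c = math.cos(math.radians(flip_deg))
    m_ss = (1.0 - e1) / (1.0 - c * e1)            # tissue steady state
    n = delivery_ms / tr_ms - 1.0                   # assumed number of prior excitations
    m_blood = m_ss + (1.0 - m_ss) * (c * e1) ** n   # fresh blood relaxing towards m_ss
    return (m_blood - m_ss) / m_ss


def optimal_flip_grid(tr_ms: float, delivery_ms: float, step_deg: float = 0.01) -> float:
    """Brute-force search for the flip angle (degrees) maximizing FRE."""
    angles = [i * step_deg for i in range(1, int(90.0 / step_deg))]
    return max(angles, key=lambda a: fre(a, tr_ms, delivery_ms))


def optimal_flip_closed_form(tr_ms: float, delivery_ms: float) -> float:
    """Analytic optimum cos(theta) = n / (n + 1) for the same simplified model."""
    n = delivery_ms / tr_ms - 1.0
    return math.degrees(math.acos(n / (n + 1.0)))


for delivery in (100.0, 500.0):
    grid = [optimal_flip_grid(tr, delivery) for tr in (15.0, 20.0, 25.0)]
    exact = [optimal_flip_closed_form(tr, delivery) for tr in (15.0, 20.0, 25.0)]
    print(delivery, [round(g) for g in grid], [round(e, 1) for e in exact])
```

Because the optimum depends on θ only through (1 − cos θ)(cos θ)^n, longer delivery times (larger n) push it towards lower flip angles, and the spread across TR values shrinks in absolute terms, mirroring the behaviour described for the small-artery regime.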

      "Based on these equations, optimal T_R and excitation flip angle values (θ) can be calculated for the blood delivery times under consideration (Figure 3). To better illustrate the regime of small arteries, we have illustrated the effect of either flip angle or T_R while keeping the other parameter values fixed to the value that was ultimately used in the experiments; although both parameters can also be optimized simultaneously (Haacke et al., 1990). Supplementary Figure 3 further delineates the interdependency between flip angle and T_R within a parameter range commonly used for TOF imaging at ultra-high field (Kang et al., 2010; Stamm et al., 2013; von Morze et al., 2007). Note how longer T_R values still provide an FRE effect even at very long blood delivery times, whereas using shorter T_R values can suppress the FRE effect (Figure 3, left). Similarly, at lower flip angles the FRE effect is still present for long blood delivery times, but it is not available anymore at larger flip angles, which, however, would give maximum FRE for shorter blood delivery times (Figure 3, right). Due to the non-linear relationships of both blood delivery time and flip angle with FRE, the optimal imaging parameters deviate considerably when comparing blood delivery times of 100 ms and 300 ms, but the differences between 300 ms and 1000 ms are less pronounced. In the following simulations and measurements, we have thus used a T_R value of 20 ms, i.e. a value only slightly longer than the readout of the high-resolution TOF acquisitions, which allowed time-efficient data acquisition, and a nominal excitation flip angle of 18°. From a practical standpoint, these values are also favorable as the low flip angle reduces the specific absorption rate (Fiedler et al., 2018) and the long T_R value decreases the potential for peripheral nerve stimulation (Mansfield and Harvey, 1993)."

      2) Figure 4 and Theory surrounding. A major limitation of this analysis is the lack of inclusion of noise in the analysis. I believe the results to be obvious that the FRE will be modulated by partial volume effects, here described quadratically by assuming the vessel to pass through the voxel. This would substantially modify the analysis, with a shift towards higher voxel volumes (scan time being equal). The authors suggest the FRE to be the dominant factor affecting segmentation; however, segmentation is limited by noise as much as contrast.

      We of course agree with the reviewer that the contrast-to-noise ratio is a key factor that determines the detection of vessels and the quality of the segmentation; however, there are subtleties regarding the exact inter-relationship between CNR, resolution, and segmentation performance.

      The main purpose of Figure 4 is not to provide a trade-off between flow-related enhancement and signal-to-noise ratio (in particular as SNR is modulated by many more factors than voxel size alone, e.g. acquisition time, coil geometry and instrumentation), but to determine whether the limiting factor for imaging pial arteries is the reduction in flow-related enhancement due to long blood delivery times (which is the explanation often found in the literature (Chen et al., 2018; Haacke et al., 1990; Masaryk et al., 1989; Mut et al., 2014; Park et al., 2020; Parker et al., 1991; Wilms et al., 2001; Wright et al., 2013)) or partial-volume effects. Furthermore, when reducing voxel size, one will also likely increase the number of encoding steps to maintain the imaging coverage (i.e., the field-of-view), so the relationship between voxel size and SNR in practice is not straightforward. Therefore, we concluded that, given the available data, deriving an SNR analysis that would benefit the reader was not possible due to the complex relationship between voxel size and other imaging parameters. Note that these considerations are not specific to imaging the pial arteries but apply to all MRA acquisitions, and have thus been discussed previously in the literature. Here, we wanted to focus on the novel insight gained in our study, namely an expression for how the relative FRE contrast changes with voxel size under assumptions that apply to imaging pial arteries.
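      Why voxel size alone does not fix SNR can be illustrated with the classic proportionality SNR ∝ V_voxel · √(total sampling time) (see e.g. Brown et al., 2014b). The sketch below assumes an idealized, unaccelerated Cartesian 3D acquisition with a fixed readout duration per k-space line and ignores coil, field-strength and reconstruction effects; the function names, FOV and readout duration are hypothetical.

```python
import math


def relative_snr(voxel_mm, n_pe1, n_pe2, readout_ms, n_avg=1):
    """Relative SNR (arbitrary units) of an unaccelerated 3D Cartesian scan:
    SNR ~ voxel volume * sqrt(total ADC sampling time)."""
    voxel_volume = voxel_mm ** 3
    sampling_time_ms = n_pe1 * n_pe2 * n_avg * readout_ms
    return voxel_volume * math.sqrt(sampling_time_ms)


def snr_ratio(coarse_mm, fine_mm, fov_pe1_mm, fov_pe2_mm, readout_ms,
              keep_coverage=True):
    """SNR(fine) / SNR(coarse) for isotropic voxels at a fixed readout duration.

    keep_coverage=True : phase-encoding steps grow as FOV / voxel size,
                         so the scan gets longer.
    keep_coverage=False: the encoding matrix is unchanged, so coverage shrinks.
    """
    n1_coarse, n2_coarse = fov_pe1_mm / coarse_mm, fov_pe2_mm / coarse_mm
    if keep_coverage:
        n1_fine, n2_fine = fov_pe1_mm / fine_mm, fov_pe2_mm / fine_mm
    else:
        n1_fine, n2_fine = n1_coarse, n2_coarse
    snr_coarse = relative_snr(coarse_mm, n1_coarse, n2_coarse, readout_ms)
    snr_fine = relative_snr(fine_mm, n1_fine, n2_fine, readout_ms)
    return snr_fine / snr_coarse


if __name__ == "__main__":
    # Hypothetical FOV (mm) and readout duration (ms), for illustration only.
    for keep in (True, False):
        r = snr_ratio(0.8, 0.3, fov_pe1_mm=192.0, fov_pe2_mm=19.2,
                      readout_ms=15.0, keep_coverage=keep)
        print(f"keep_coverage={keep}: SNR(0.3 mm) / SNR(0.8 mm) = {r:.3f}")
```

      Under these idealized assumptions, keeping the coverage makes the SNR penalty scale with the square rather than the cube of the voxel edge, at the cost of more phase-encoding steps and hence a longer scan; acceleration, partial Fourier and coil geometry alter both numbers in practice.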

      Further, depending on the definition of FRE and whether partial-volume effects are included (see also our response to R2.8), larger voxel volumes have been found to be theoretically advantageous even when only considering contrast (Du et al., 1996; Venkatesan and Haacke, 1997), which is not in line with empirical observations (Al-Kwifi et al., 2002; Bouvy et al., 2014; Haacke et al., 1990; Ladd, 2007; Mattern et al., 2018; von Morze et al., 2007).

      The notion that vessel segmentation algorithms perform well on noisy data but poorly on low-contrast data was mainly driven by our own experience. Nevertheless, we believe that the assumption that (all) segmentation algorithms depend linearly on contrast and noise (which the formulation of a contrast-to-noise ratio presumes) is likewise not warranted. Indeed, the necessary trade-off between FRE and SNR might be a property of the particular segmentation algorithm being used rather than a general property of the acquisition. Please also note that our analysis of the FRE does not suggest that an arbitrarily high resolution is needed. Importantly, while we previously noted that reducing voxel size improves contrast in vessels whose diameters are smaller than the voxel size, we now explicitly acknowledge that, for vessels whose diameters are larger than the voxel size, reducing the voxel size is not helpful, since it only reduces SNR without any gain in contrast, and may hinder segmentation performance and thus become counterproductive. We take the reviewer's point, however, and acknowledge that these intricacies need to be mentioned; we have therefore rephrased the statement in the Discussion as follows:

      "In general, we have not considered SNR, but only FRE, i.e. the (relative) image contrast, assuming that segmentation algorithms would benefit from higher contrast for smaller arteries. Importantly, the acquisition parameters available to maximize FRE are limited, namely repetition time, flip angle and voxel size. SNR, however, can be improved via numerous avenues independent of these parameters (Brown et al., 2014b; Du et al., 1996; Heverhagen et al., 2008; Parker et al., 1991; Triantafyllou et al., 2011; Venkatesan and Haacke, 1997), the simplest being longer acquisition times. If the aim is to optimize a segmentation outcome for a given acquisition time, the trade-off between contrast and SNR for the specific segmentation algorithm needs to be determined (Klepaczko et al., 2016; Lesage et al., 2009; Moccia et al., 2018; Phellan and Forkert, 2017). Our own—albeit limited—experience has shown that segmentation algorithms (including manual segmentation) can accommodate a perhaps surprising amount of noise using prior knowledge and neighborhood information, making these high-resolution acquisitions possible. Importantly, note that our treatment of the FRE does not suggest that an arbitrarily small voxel size is needed, but instead that voxel sizes appropriate for the arterial diameter of interest are beneficial (in line with the classic “matched-filter” rationale (North, 1963)). Voxels smaller than the arterial diameter would not yield substantial benefits (Figure 5) and may result in SNR reductions that would hinder segmentation performance."

      3) Page 11, Line 225. "only a fraction of the blood is replaced" I think the language should be reworded. There are certainly water molecules in blood which have experienced more excitation B1 pulses due to the parabolic flow upstream and the temporal variation in flow. There is magnetization diffusion which reduces the discrepancy; however, it seems pertinent to just say the authors assume the signal is represented by the average arrival time. This analysis is never verified and is only approximate anyways. The "blood dwell time" is also an average since voxels near the wall will travel more slowly. Overall, I recommend reducing the conjecture in this section.

      We fully agree that our treatment of the blood dwell time does not account for the much more complex flow patterns found in cortical arteries. However, our aim was not to comment on these complex patterns, such as laminar flow, which would require a more in-depth treatment, but to help establish whether, in the simplest scenario assuming plug flow, the often-mentioned slow blood flow requires multiple velocity compartments to describe the FRE (as is commonly done for 2D MRA (Brown et al., 2014a; Carr and Carroll, 2012)). Moreover, as the small arteries targeted here are often just one voxel thick, all signals are integrated within that voxel (i.e. there is no separate voxel near the wall in which blood travels more slowly), which may average out more complex effects. We have clarified the purpose and scope of this section as follows:

      "In classical descriptions of the FRE effect (Brown et al., 2014a; Carr and Carroll, 2012), significant emphasis is placed on the effect of multiple “velocity segments” within a slice in the 2D imaging case. Using the simplified plug-flow model, where the cross-sectional profile of blood velocity within the vessel is constant and effects such as drag along the vessel wall are not considered, these segments can be described as ‘disks’ of blood that do not completely traverse the full slice within one T_R, and, thus, only a fraction of the blood in the slice is replaced. Consequently, estimation of the FRE effect would then need to accommodate contributions from multiple ‘disks’ that have experienced 1 to k RF pulses. In the case of 3D imaging as employed here, multiple velocity segments within one voxel are generally not considered, as the voxel sizes in 3D are often smaller than the slice thickness in 2D imaging and it is assumed that the blood completely traverses a voxel each T_R. However, the question arises whether this assumption holds for pial arteries, where blood velocity is considerably lower than in intracranial vessels (Figure 2). To answer this question, we have computed the blood dwell time, i.e. the average time it takes the blood to traverse a voxel, as a function of blood velocity and voxel size (Figure 2). For reference, the blood velocity estimates from the three studies mentioned above (Bouvy et al., 2016; Kobari et al., 1984; Nagaoka and Yoshida, 2006) have been added in this plot as horizontal white lines. For the voxel sizes of interest here, i.e. 50–300 μm, blood dwell times are, for all but the slowest flows, well below commonly used repetition times (Brown et al., 2014a; Carr and Carroll, 2012; Ladd, 2007; von Morze et al., 2007). 
Thus, in a first approximation using the plug-flow model, and contrary to what one might expect from classical treatments, it is not necessary to include several velocity segments for the voxel sizes of interest when considering pial arteries, and the FRE effect can be described by equations (1) – (3), simplifying our characterization of FRE for these vessels. When considering the effect of more complex flow patterns, it is important to bear in mind that the arteries targeted here are only one voxel thick, and signals are integrated across the whole artery."
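      The dwell-time comparison amounts to simple arithmetic under the plug-flow model: dwell time is voxel size divided by velocity, and comparing it with T_R shows whether the blood in a voxel is replaced between consecutive RF pulses. The velocities below are hypothetical examples, not the literature estimates plotted in Figure 2; T_R = 20 ms is the value used in this study, and the function names are illustrative.

```python
TR_MS = 20.0  # repetition time used in this study


def dwell_time_ms(voxel_um, velocity_mm_per_s):
    """Plug-flow dwell time = voxel length / blood velocity.
    Micrometres divided by mm/s yields milliseconds directly."""
    return voxel_um / velocity_mm_per_s


def dwell_in_tr_units(voxel_um, velocity_mm_per_s, tr_ms=TR_MS):
    """Dwell time expressed in repetitions; values below 1 mean the blood in
    the voxel is fully replaced between consecutive RF pulses."""
    return dwell_time_ms(voxel_um, velocity_mm_per_s) / tr_ms


if __name__ == "__main__":
    for velocity in (5.0, 20.0, 50.0):       # hypothetical velocities (mm/s)
        for voxel in (50, 160, 300):          # voxel sizes (micrometres)
            print(f"v = {velocity:4.0f} mm/s, voxel = {voxel:3d} um: "
                  f"dwell = {dwell_time_ms(voxel, velocity):5.1f} ms "
                  f"({dwell_in_tr_units(voxel, velocity):.2f} TR)")
```

      Only when this ratio exceeds 1 would blood ‘disks’ that have experienced different numbers of RF pulses coexist within one voxel, which is the situation the multiple-velocity-segment treatment of 2D MRA addresses.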

      4) Page 13, Line 260. "two-compartment modelling" I think this section is better labeled "Extension to consider partial volume effects" The compartments are not interacting in any sense in this work.

      Thank you for this suggestion. We have replaced the heading with ‘Introducing a partial-volume model’ (page 14) and replaced all instances of ‘two-compartment model’ with ‘partial-volume model’.

      5) Page 14, Line 284. "In practice, a reduction in slab …." "reducing the voxel size is a much more promising avenue" There is a fair amount of conjecture here which is not supported by experiments. While this may be true, the authors also use a classical approach with quite thin slabs.

      The slab thickness used in our experiments was mainly limited by the acquisition time and the participants' ability to lie still. We did perform one measurement with a thicker slab in a very experienced participant, but found that with an acquisition time of over 20 minutes, motion artefacts were unavoidable. The data presented in Figure 5 were acquired with similar slab thicknesses, supporting the statement that reducing the voxel size is a promising avenue for imaging small pial arteries. However, we have indeed not provided an empirical comparison of the effect of slab thickness. Nevertheless, we believe it remains useful to make the theoretical argument that, due to the convoluted geometry of the pial arterial vasculature, a reduction in slab thickness may not reduce the blood delivery time if no reduction in intra-slab vessel length can be achieved, i.e. if the majority of the artery is still contained in the smaller slab. We have clarified the statement and removed the direct comparison (‘much more’ promising) as follows:

      "In theory, a reduction in blood delivery time increases the FRE in both regimes, and, if the vessel is smaller than the voxel, so would a reduction in voxel size. In practice, a reduction in slab thickness (the default strategy in classical TOF-MRA to reduce blood delivery time) might not provide substantial FRE increases for pial arteries. This is due to their convoluted geometry (see section Anatomical architecture of the pial arterial vasculature), where a reduction in slab thickness may not necessarily reduce the vessel segment length if the majority of the artery is still contained within the smaller slab. Thus, given the small arterial diameter, reducing the voxel size is a promising avenue when imaging the pial arterial vasculature."

      6) Figure 5. These image differences are highly exaggerated by the lack of zero filling (or any interpolation) and the fact that the wildly different. The interpolation should be addressed, and the scan time discrepancy listed as a limitation.

      We have extended the discussion around zero-filling by including additional considerations based on the imaging parameters in Figure 5 and highlighted the substantial differences in voxel volume. Our choice not to perform zero-filling was driven by the open question of what an ‘optimal’ zero-filling factor would be. We have also highlighted the substantial differences in acquisition time when describing the results.

      Changes made to the results section:

      "To investigate the effect of voxel size on vessel FRE, we acquired data at four different voxel sizes ranging from 0.8 mm to 0.3 mm isotropic resolution, adjusting only the encoding matrix, with imaging parameters being otherwise identical (FOV, TR, TE, flip angle, R, slab thickness, see section Data acquisition). The total acquisition time increases from less than 2 minutes for the lowest resolution scan to over 6 minutes for the highest resolution scan as a result."

      Changes made to the discussion section:

      "Nevertheless, slight qualitative improvements in image appearance have been reported for higher zero-filling factors (Du et al., 1994), presumably owing to a smoother representation of the vessels (Bartholdi and Ernst, 1973). In contrast, Mattern et al. (2018) reported no improvement in vessel contrast for their high-resolution data. Ultimately, for each application, e.g. visual evaluation vs. automatic segmentation, the optimal zero-filling factor needs to be determined, balancing image appearance (Du et al., 1994; Zhu et al., 2013) against the loss in statistical independence of the image noise across voxels. For example, in Figure 5, when comparing across different voxel sizes, the visual impression might improve with zero-filling. However, it remains unclear whether the same zero-filling factor should be applied for each voxel size, in which case the overall difference in resolution would remain, namely a nearly 20-fold reduction in voxel volume when moving from 0.8-mm isotropic to 0.3-mm isotropic voxel size. Alternatively, the same ‘zero-filled’ voxel sizes could be used for evaluation, although then nearly 94 % of the samples used to reconstruct the image with 0.8-mm voxel size would be zero-valued for a 0.3-mm isotropic resolution. Consequently, all data presented in this study were reconstructed without zero-filling."
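      Both figures in the paragraph above follow from how k-space extent scales with voxel size. A minimal sketch, assuming encoding matrices that scale exactly with the inverse voxel size in all three dimensions; the actual matrices are rounded, so the idealized numbers can differ slightly from those quoted. The function names are illustrative.

```python
def volume_ratio(coarse_mm, fine_mm):
    """How many times smaller the fine isotropic voxel is than the coarse one."""
    return (coarse_mm / fine_mm) ** 3


def zero_filled_fraction(coarse_mm, fine_mm):
    """Fraction of a fine-resolution 3D k-space grid that is zero when the
    coarse acquisition is zero-filled to the fine voxel size.
    The acquired k-space extent per axis scales with 1 / voxel size."""
    acquired_fraction = (fine_mm / coarse_mm) ** 3
    return 1.0 - acquired_fraction


if __name__ == "__main__":
    print(f"volume ratio 0.8 mm -> 0.3 mm: {volume_ratio(0.8, 0.3):.1f}")
    print(f"zero-filled fraction: {zero_filled_fraction(0.8, 0.3):.3f}")
```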

      7) Figure 7. Given the limited nature of experiment may it not also be possible the subject moved more, had differing brain blood flow, etc. Were these lengthy scans acquired in the same session? Many of these differences could be attributed to other differences than the small difference in spatial resolution.

      The scans were acquired in the same session using the same prospective motion correction procedure. Note that the acquisition time of the images with 0.16 mm isotropic voxel size was comparatively short, taking just under 12 minutes. Although the difference in spatial resolution may seem small, it still amounts to a 33% reduction in voxel volume. For comparison, reducing the voxel size from 0.4 mm to 0.3 mm also ‘only’ reduces the voxel volume by 58 %—not even twice as much. Overall, we fully agree that additional validation and optimisation of the imaging parameters for pial arteries are beneficial and have added a corresponding statement to the Discussion section.
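      The two percentages above follow from the cubic dependence of voxel volume on the isotropic edge length; a quick check (the function name is illustrative):

```python
def volume_reduction_percent(old_mm, new_mm):
    """Percentage reduction in isotropic voxel volume from old_mm to new_mm."""
    return 100.0 * (1.0 - (new_mm / old_mm) ** 3)


if __name__ == "__main__":
    print(f"0.16 -> 0.14 mm: {volume_reduction_percent(0.16, 0.14):.1f} %")
    print(f"0.4  -> 0.3  mm: {volume_reduction_percent(0.4, 0.3):.1f} %")
```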

      Changes made to the results section (also in response to Reviewer 1 (R1.22))

      "We have also acquired one single slab with an isotropic voxel size of 0.16 mm with prospective motion correction for this participant in the same session to compare to the acquisition with 0.14 mm isotropic voxel size and to test whether any gains in FRE are still possible at this level of the vascular tree."

      Changes made to the discussion section:

      "Acquiring these data at even higher field strengths would boost SNR (Edelstein et al., 1986; Pohmann et al., 2016) to partially compensate for SNR losses due to acceleration and may enable faster imaging and/or smaller voxel sizes. This could facilitate the identification of the ultimate limit of the flow-related enhancement effect and reveal the stage of the vascular tree at which the blood delivery time becomes the limiting factor. While Figure 7 indicates the potential for voxel sizes below 0.16 mm, the singular nature of this comparison warrants further investigation."

      8) Page 22, Line 395. Would the analysis be any different with an absolute difference? The FRE (Eq 6) divides by a constant value. Clearly there is value in the difference as other subtractive inflow imaging would have infinite FRE (not considering noise as the authors do).

      Absolutely; using an absolute FRE would result in the highest FRE for the largest voxel size, whereas in our data small vessels are more easily detected with the smallest voxel size. We also note that relative FRE would indeed become infinite if the value in the denominator representing the tissue signal was zero, but this special case highlights how relative FRE can help characterize “segmentability”: a vessel with any intensity surrounded by tissue with an intensity of zero is trivially, or infinitely, segmentable. We have added this point to the revised manuscript as indicated below.

      Following the suggestion of Reviewer 1 (R1.2), we have included additional simulations to clarify the effects of relative FRE definition and partial-volume model, in which we show that only when considering both together are smaller voxel sizes advantageous (Supplementary Material).

      "Effect of FRE Definition and Interaction with Partial-Volume Model

      For the definition of the FRE effect in this study, we used a measure of relative FRE (Al-Kwifi et al., 2002) in combination with a partial-volume model (Eq. 6). To illustrate the effect of these two definitions, as well as their interaction, we have estimated the relative and absolute FRE for arteries with diameters of 200 µm and 2 000 µm (the latter without partial-volume effects). The absolute FRE explicitly takes the voxel volume into account, i.e. instead of Eq. (6) for the relative FRE we used"

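      $$\mathrm{FRE}_{\mathrm{abs}} = M_z^{\mathrm{total}} - V_{\mathrm{voxel}}\, M_{zS}^{\mathrm{tissue}} \tag{1}$$

      Note that the division by $V_{\mathrm{voxel}}\, M_{zS}^{\mathrm{tissue}}$ to obtain the relative FRE removes the contribution of the total voxel volume.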

      "Supplementary Figure 2 shows that, when partial volume effects are present, the highest relative FRE arises in voxels with the same size as or smaller than the vessel diameter (Supplementary Figure 2A), whereas the absolute FRE increases with voxel size (Supplementary Figure 2C). If no partial-volume effects are present, the relative FRE becomes independent of voxel size (Supplementary Figure 2B), whereas the absolute FRE increases with voxel size (Supplementary Figure 2D). While the partial-volume effects for the relative FRE are substantial, they are much more subtle when using the absolute FRE and do not alter the overall characteristics."

      Supplementary Figure 2: Effect of voxel size and blood delivery time on the flow-related enhancement (FRE) using either a relative (A,B) (Eq. (3)) or an absolute (C,D) (Eq. (12)) FRE definition, assuming a pial artery diameter of 200 μm (A,C) or 2 000 µm (B,D), i.e. no partial-volume effects in the central voxel of the artery considered here.
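      A compact sketch of the comparison in Supplementary Figure 2, under stated assumptions: the vessel is a cylinder through the voxel centre and parallel to one voxel edge, so the blood fraction is a circle-square overlap; the blood and tissue magnetizations are fixed placeholder values standing in for Eqs. (1)–(3) at one blood delivery time; and, following the voxel-volume-scaled partial-volume model described in the text, the relative and absolute FRE are taken as V_rel^blood·(M_blood − M_tissue)/M_tissue and V_voxel·V_rel^blood·(M_blood − M_tissue). All names and values are illustrative.

```python
import math

M_BLOOD = 0.50   # placeholder longitudinal magnetization of inflowing blood
M_TISSUE = 0.17  # placeholder steady-state tissue magnetization


def blood_fraction(d_um, l_um):
    """Area fraction of an l x l voxel cross-section covered by a centred
    circle of diameter d (vessel axis parallel to a voxel edge)."""
    if d_um <= l_um:                      # vessel fits inside the voxel
        return math.pi * d_um ** 2 / (4 * l_um ** 2)
    if d_um >= math.sqrt(2) * l_um:       # voxel lies entirely inside the lumen
        return 1.0
    # Circle area minus the four circular segments that extend past the square.
    area = (math.pi * d_um ** 2 / 4
            - d_um ** 2 * math.acos(l_um / d_um)
            + l_um * math.sqrt(d_um ** 2 - l_um ** 2))
    return area / l_um ** 2


def fre_relative(d_um, l_um):
    """Relative FRE with partial volume: voxel volume cancels out."""
    return blood_fraction(d_um, l_um) * (M_BLOOD - M_TISSUE) / M_TISSUE


def fre_absolute(d_um, l_um):
    """Absolute FRE with partial volume: scales with voxel volume (mm^3)."""
    voxel_volume_mm3 = (l_um / 1000.0) ** 3
    return voxel_volume_mm3 * blood_fraction(d_um, l_um) * (M_BLOOD - M_TISSUE)


if __name__ == "__main__":
    voxel_sizes = (50, 100, 200, 300, 400, 800)  # micrometres
    for diameter in (200, 2000):
        print(f"vessel diameter {diameter} um")
        for l in voxel_sizes:
            print(f"  voxel {l:3d} um: relative {fre_relative(diameter, l):.3f}, "
                  f"absolute {fre_absolute(diameter, l):.2e}")
```

      For the 200 µm vessel the relative FRE peaks once the voxel is no larger than the vessel and falls off beyond it, whereas the absolute FRE keeps growing with voxel size; for the 2 000 µm vessel the relative FRE is flat across these voxel sizes.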

      Following the established literature (Brown et al., 2014a; Carr and Carroll, 2012; Haacke et al., 1990), and because we ultimately derive a relative measure, we had omitted the effect of voxel volume on the longitudinal magnetization in our derivations. This made it appear as if we were dividing by a constant in Eq. 6, whereas in fact the effect of total voxel volume cancels out for the relative FRE. We have now made this more explicit in our derivation of the partial-volume model.

      "Introducing a partial-volume model

      "To account for the effect of voxel volume on the FRE, the total longitudinal magnetization M_z must also account for the number of spins contained within a voxel (Du et al., 1996; Venkatesan and Haacke, 1997). A simple approximation can be obtained by scaling the longitudinal magnetization with the voxel volume (Venkatesan and Haacke, 1997). To then include partial-volume effects, the total longitudinal magnetization in a voxel, M_z^total, becomes the sum of the contributions from the stationary tissue M_zS^tissue and the inflowing blood M_z^blood, weighted by their respective volume fractions V_rel:"


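      $$M_z^{\mathrm{total}} = V_{\mathrm{voxel}}\left(V_{\mathrm{rel}}^{\mathrm{blood}}\, M_z^{\mathrm{blood}} + V_{\mathrm{rel}}^{\mathrm{tissue}}\, M_{zS}^{\mathrm{tissue}}\right) \tag{4}$$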

      For simplicity, we assume that a single vessel is located at the center of the voxel and approximate it as a cylinder with diameter d_vessel and length l_voxel, the side length of an assumed isotropic voxel. The relative volume fraction of blood V_rel^blood is the ratio of vessel volume within the voxel to total voxel volume (see section Estimation of vessel-volume fraction in the Supplementary Material), and the tissue volume fraction V_rel^tissue is the remainder that is not filled with blood, or

      $$V_{\mathrm{rel}}^{\mathrm{tissue}} = 1 - V_{\mathrm{rel}}^{\mathrm{blood}} \tag{5}$$
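      For the centred-cylinder geometry described above, and additionally assuming that the vessel axis runs parallel to one voxel edge, the vessel-volume fraction reduces to the overlap of a circle of diameter d_vessel with a square of side l_voxel. One closed form is given below; the procedure in the Supplementary Material may differ in detail.

      $$
      V_{\mathrm{rel}}^{\mathrm{blood}} =
      \begin{cases}
      \dfrac{\pi d_{\mathrm{vessel}}^2}{4\, l_{\mathrm{voxel}}^2}, & d_{\mathrm{vessel}} \le l_{\mathrm{voxel}},\\[2ex]
      \dfrac{1}{l_{\mathrm{voxel}}^2}\left[\dfrac{\pi d_{\mathrm{vessel}}^2}{4} - d_{\mathrm{vessel}}^2 \arccos\!\left(\dfrac{l_{\mathrm{voxel}}}{d_{\mathrm{vessel}}}\right) + l_{\mathrm{voxel}}\sqrt{d_{\mathrm{vessel}}^2 - l_{\mathrm{voxel}}^2}\right], & l_{\mathrm{voxel}} < d_{\mathrm{vessel}} < \sqrt{2}\, l_{\mathrm{voxel}},\\[2ex]
      1, & d_{\mathrm{vessel}} \ge \sqrt{2}\, l_{\mathrm{voxel}}.
      \end{cases}
      $$

      The first case gives the 1/l_voxel² fall-off of the blood fraction once the voxel exceeds the vessel diameter; the three branches agree at their boundaries (π/4 at d_vessel = l_voxel and 1 at d_vessel = √2 l_voxel).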

      We can now replace the blood magnetization in Eq. (3) with the total longitudinal magnetization of the voxel to compute the FRE as a function of the vessel-volume fraction:

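      $$\mathrm{FRE} = \frac{M_z^{\mathrm{total}} - V_{\mathrm{voxel}}\, M_{zS}^{\mathrm{tissue}}}{V_{\mathrm{voxel}}\, M_{zS}^{\mathrm{tissue}}} \tag{6}$$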
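      Combining the partial-volume equations shows where the voxel volume goes. Writing Eq. (4) in the voxel-volume-scaled form M_z^total = V_voxel (V_rel^blood M_z^blood + V_rel^tissue M_zS^tissue) described in the text, using Eq. (5), and normalizing by the tissue magnetization of a voxel of the same volume, V_voxel M_zS^tissue (this form of the normalization is an assumption here):

      $$
      \begin{aligned}
      \mathrm{FRE} &= \frac{V_{\mathrm{voxel}}\left[V_{\mathrm{rel}}^{\mathrm{blood}} M_z^{\mathrm{blood}} + \left(1 - V_{\mathrm{rel}}^{\mathrm{blood}}\right) M_{zS}^{\mathrm{tissue}}\right] - V_{\mathrm{voxel}}\, M_{zS}^{\mathrm{tissue}}}{V_{\mathrm{voxel}}\, M_{zS}^{\mathrm{tissue}}} \\
      &= V_{\mathrm{rel}}^{\mathrm{blood}}\,\frac{M_z^{\mathrm{blood}} - M_{zS}^{\mathrm{tissue}}}{M_{zS}^{\mathrm{tissue}}},
      \end{aligned}
      $$

      whereas the corresponding absolute difference, M_z^total − V_voxel M_zS^tissue = V_voxel V_rel^blood (M_z^blood − M_zS^tissue), retains the voxel volume. For a vessel narrower than the voxel, V_rel^blood ∝ 1/l_voxel², so the relative FRE falls with increasing voxel size while the absolute difference grows in proportion to l_voxel.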

      Based on your suggestion, we have also extended our interpretation of relative and absolute FRE. Indeed, a subtractive flow technique where no signal in the background remains and only intensities in the object are present would have infinite relative FRE, as this basically constitutes a perfect segmentation (bar a simple thresholding step).

      "Extending classical FRE treatments to the pial vasculature

      There are several major modifications in our approach to this topic that might explain why, in contrast to predictions from classical FRE treatments, it is indeed possible to image pial arteries. For instance, the definition of vessel contrast or flow-related enhancement is often stated as an absolute difference between blood and tissue signal (Brown et al., 2014a; Carr and Carroll, 2012; Du et al., 1993, 1996; Haacke et al., 1990; Venkatesan and Haacke, 1997). Here, however, we follow the approach of Al-Kwifi et al. (2002) and consider relative contrast. While this distinction may seem to be semantic, the effect of voxel volume on FRE for these two definitions is exactly opposite: Du et al. (1996) concluded that larger voxel size increases the (absolute) vessel-background contrast, whereas here we predict an increase in relative FRE for small arteries with decreasing voxel size. Therefore, predictions of the depiction of small arteries with decreasing voxel size differ depending on whether one is considering absolute contrast, i.e. difference in longitudinal magnetization, or relative contrast, i.e. contrast differences independent of total voxel size. Importantly, this prediction changes for large arteries where the voxel contains only vessel lumen, in which case the relative FRE remains constant across voxel sizes, but the absolute FRE increases with voxel size (Supplementary Figure 9). Overall, the interpretations of relative and absolute FRE differ, and one measure may be more appropriate for certain applications than the other. Absolute FRE describes the difference in magnetization and is thus tightly linked to the underlying physical mechanism. Relative FRE, however, describes the image contrast and segmentability. If blood and tissue magnetization are equal, both contrast measures would equal zero and indicate that no contrast difference is present. 
However, when there is signal in the vessel and as the tissue magnetization approaches zero, the absolute FRE approaches the blood magnetization (assuming no partial-volume effects), whereas the relative FRE approaches infinity. While this infinite relative FRE does not directly relate to the underlying physical process of ‘infinite’ signal enhancement through inflowing blood, it instead characterizes the segmentability of the image in that an image with zero intensity in the background and non-zero values in the structures of interest can be segmented perfectly and trivially. Accordingly, numerous empirical observations (Al-Kwifi et al., 2002; Bouvy et al., 2014; Haacke et al., 1990; Ladd, 2007; Mattern et al., 2018; von Morze et al., 2007) and the data provided here (Figure 5, 6 and 7) have shown the benefit of smaller voxel sizes if the aim is to visualize and segment small arteries."
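      The limiting behaviour of the two definitions can be illustrated numerically. The sketch ignores partial-volume effects and uses arbitrary placeholder magnetization values; the function names are illustrative.

```python
def absolute_contrast(m_blood, m_tissue):
    """Absolute FRE: difference in longitudinal magnetization."""
    return m_blood - m_tissue


def relative_contrast(m_blood, m_tissue):
    """Relative FRE: contrast normalized by the tissue magnetization."""
    return (m_blood - m_tissue) / m_tissue


if __name__ == "__main__":
    m_blood = 0.5
    for m_tissue in (0.5, 0.2, 0.05, 0.005, 0.0005):
        print(f"M_tissue = {m_tissue:<7} "
              f"absolute = {absolute_contrast(m_blood, m_tissue):.4f}  "
              f"relative = {relative_contrast(m_blood, m_tissue):.1f}")
```

      As the tissue magnetization shrinks, the absolute value saturates at the blood magnetization while the relative value grows without bound, matching the segmentability interpretation above.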

      9) Page 22, Line 400. "The appropriateness of " This also ignores noise. The absolute enhancement is the inherent magnetization available. The results in Figure 5, 6, 7 don't readily support a ratio over an absolute difference accounting for partial volume effects.

      We hope that the additional explanations of the effect of the relative FRE definition in combination with a partial-volume model, and of the interpretation of relative FRE, provided in the previous response (R2.8), together with Figures 5, 6 and 7, which show smaller arteries for smaller voxels, clarify our argument that only relative FRE in combination with a partial-volume model can explain why smaller voxel sizes are advantageous for depicting small arteries.

      While we appreciate that there exists a fundamental relationship between SNR and voxel volume in MR (Brown et al., 2014b), this relationship is also modulated by many more factors (as we have argued in our responses to R2.2 and R1.4b).

      We hope that the additional derivations and simulations provided in the previous response have clarified why a relative FRE model in combination with a partial-volume model helps to explain the enhanced detectability of small vessels with small voxels.

      10) Page 24, Line 453. "strategies, such as radial and spiral acquisitions, experience no vessel displacement artefact" These do observe flow related distortions as well, just not typically called displacement.

      Yes, this is a helpful point, as these methods will also experience a degradation of spatial accuracy due to flow effects, which will propagate into errors in the segmentation.

      As the reviewer suggests, flow-related artefacts in radial and spiral acquisitions usually manifest as a slight blur, and less as the prominent displacement found in Cartesian sampling schemes. We have added a corresponding clarification to the Discussion section:

      "Other encoding strategies, such as radial and spiral acquisitions, experience no vessel displacement artefact because phase and frequency encoding take place in the same instant; although a slight blur might be observed instead (Nishimura et al., 1995, 1991). However, both trajectories pose engineering challenges and much higher demands on hardware and reconstruction algorithms than the Cartesian readouts employed here (Kasper et al., 2018; Shu et al., 2016); particularly to achieve 3D acquisitions with 160 µm isotropic resolution."

      11) Page 24, Line 272. "although even with this nearly ideal subject behaviour approximately 1 in 4 scans still had to be discarded and repeated" This is certainly a potential source of bias in the comparisons.

      We apologize if this section was written in a misleading way. For the comparison presented in Figure 7, we acquired one additional slab in the same session at 0.16 mm voxel size using the same prospective motion correction procedure as for the 0.14 mm data. For the images shown in Figure 6 and Supplementary Figure 4 at 0.16 mm voxel size, we did not use a motion correction system and, thus, had to discard a portion of the data. We have clarified in the Discussion section that prospective motion correction was used for both resolutions in the comparison of the high-resolution data:

      "This allowed for the successful correction of head motion of approximately 1 mm over the 60-minute scan session, showing the utility of prospective motion correction at these very high resolutions. Note that for the comparison in Figure 7, one slab with 0.16 mm voxel size was acquired in the same session also using the prospective motion correction system. However, for the data shown in Figure 6 and Supplementary Figure 4, no prospective motion correction was used, and we instead relied on the experienced participants who contributed to this study. We found that the acquisition of TOF data with 0.16 mm isotropic voxel size in under 12 minutes acquisition time per slab is possible without discernible motion artifacts, although even with this nearly ideal subject behaviour approximately 1 in 4 scans still had to be discarded and repeated."

      12) Page 25, Line 489. "then need to include the effects of various analog and digital filters" While the analysis may benefit from some of this, most is not at all required for analysis based on optimization of the imaging parameters.

      We have included all four correction factors for completeness, given the unique acquisition-parameter and contrast space our time-of-flight acquisition occupies, e.g. a very low bandwidth of only 100 Hz, very large matrix sizes (> 1024 samples), and ideally zero SNR in the background (fully suppressed tissue signal). However, we agree that probably the most important factor is the non-central chi distribution of the noise in magnitude images from multiple-channel coil arrays, and have added this qualification to the text:

      "Accordingly, SNR predictions then need to include the effects of various analog and digital filters, the number of acquired samples, the noise covariance correction factor, and—most importantly—the non-central chi distribution of the noise statistics of the final magnitude image (Triantafyllou et al., 2011)."
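      Why the noise distribution matters for TOF data with an (ideally) fully suppressed background can be seen from the mean of the magnitude noise floor. In signal-free regions, the non-central chi distribution reduces to a central chi distribution with 2n degrees of freedom for an n-channel sum-of-squares combination, whose mean is σ√2·Γ(n + ½)/Γ(n). The sketch assumes uncorrelated channels with equal noise σ per real and imaginary component and no parallel-imaging reconstruction, and checks the closed form against a small Monte Carlo simulation; the function names are illustrative.

```python
import math
import random


def chi_noise_floor(sigma, n_channels):
    """Mean magnitude of pure noise after sum-of-squares combination of
    n_channels uncorrelated channels (central chi, 2 * n_channels d.o.f.)."""
    n = n_channels
    return sigma * math.sqrt(2.0) * math.exp(math.lgamma(n + 0.5) - math.lgamma(n))


def simulated_noise_floor(sigma, n_channels, trials=10000, seed=0):
    """Monte Carlo estimate of the same quantity from complex Gaussian noise."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        power = sum(rng.gauss(0.0, sigma) ** 2 + rng.gauss(0.0, sigma) ** 2
                    for _ in range(n_channels))
        total += math.sqrt(power)
    return total / trials


if __name__ == "__main__":
    for n in (1, 8, 32):
        print(f"{n:2d} channels: closed form {chi_noise_floor(1.0, n):.3f}, "
              f"simulated {simulated_noise_floor(1.0, n):.3f}")
```

      The mean grows roughly as σ√(2n), so the background of a many-channel magnitude image is not zero even when the tissue signal is fully suppressed.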

      Al-Kwifi, O., Emery, D.J., Wilman, A.H., 2002. Vessel contrast at three Tesla in time-of-flight magnetic resonance angiography of the intracranial and carotid arteries. Magnetic Resonance Imaging 20, 181–187. https://doi.org/10.1016/S0730-725X(02)00486-1

      Arts, T., Meijs, T.A., Grotenhuis, H., Voskuil, M., Siero, J., Biessels, G.J., Zwanenburg, J., 2021. Velocity and Pulsatility Measures in the Perforating Arteries of the Basal Ganglia at 3T MRI in Reference to 7T MRI. Frontiers in Neuroscience 15.

      Avants, B.B., Tustison, N., Song, G., 2009. Advanced normalization tools (ANTS). Insight J. 2, 1–35.

      Bae, K.T., Park, S.-H., Moon, C.-H., Kim, J.-H., Kaya, D., Zhao, T., 2010. Dual-echo arteriovenography imaging with 7T MRI: CODEA with 7T. J. Magn. Reson. Imaging 31, 255–261. https://doi.org/10.1002/jmri.22019

      Bartholdi, E., Ernst, R.R., 1973. Fourier spectroscopy and the causality principle. Journal of Magnetic Resonance (1969) 11, 9–19. https://doi.org/10.1016/0022-2364(73)90076-0

      Bernier, M., Cunnane, S.C., Whittingstall, K., 2018. The morphology of the human cerebrovascular system. Human Brain Mapping 39, 4962–4975. https://doi.org/10.1002/hbm.24337

      Bouvy, W.H., Biessels, G.J., Kuijf, H.J., Kappelle, L.J., Luijten, P.R., Zwanenburg, J.J.M., 2014. Visualization of Perivascular Spaces and Perforating Arteries With 7 T Magnetic Resonance Imaging. Investigative Radiology 49, 307–313. https://doi.org/10.1097/RLI.0000000000000027

      Bouvy, W.H., Geurts, L.J., Kuijf, H.J., Luijten, P.R., Kappelle, L.J., Biessels, G.J., Zwanenburg, J.J.M., 2016. Assessment of blood flow velocity and pulsatility in cerebral perforating arteries with 7-T quantitative flow MRI. NMR Biomed. 29, 1295–1304. https://doi.org/10.1002/nbm.3306

      Brown, R.W., Cheng, Y.-C.N., Haacke, E.M., Thompson, M.R., Venkatesan, R., 2014a. Chapter 24 - MR Angiography and Flow Quantification, in: Magnetic Resonance Imaging. John Wiley & Sons, Ltd, pp. 701–737. https://doi.org/10.1002/9781118633953.ch24

      Brown, R.W., Cheng, Y.-C.N., Haacke, E.M., Thompson, M.R., Venkatesan, R., 2014b. Chapter 15 - Signal, Contrast, and Noise, in: Magnetic Resonance Imaging. John Wiley & Sons, Ltd, pp. 325–373. https://doi.org/10.1002/9781118633953.ch15

      Carr, J.C., Carroll, T.J., 2012. Magnetic resonance angiography: principles and applications. Springer, New York.

      Cassot, F., Lauwers, F., Fouard, C., Prohaska, S., Lauwers-Cances, V., 2006. A Novel Three-Dimensional Computer-Assisted Method for a Quantitative Study of Microvascular Networks of the Human Cerebral Cortex. Microcirculation 13, 1–18. https://doi.org/10.1080/10739680500383407

      Chen, L., Mossa-Basha, M., Balu, N., Canton, G., Sun, J., Pimentel, K., Hatsukami, T.S., Hwang, J.-N., Yuan, C., 2018. Development of a quantitative intracranial vascular features extraction tool on 3DMRA using semiautomated open-curve active contour vessel tracing. Magn. Reson. Med 79, 3229–3238. https://doi.org/10.1002/mrm.26961

      Choi, U.-S., Kawaguchi, H., Kida, I., 2020. Cerebral artery segmentation based on magnetization-prepared two rapid acquisition gradient echo multi-contrast images in 7 Tesla magnetic resonance imaging. NeuroImage 222, 117259. https://doi.org/10.1016/j.neuroimage.2020.117259

      Conolly, S., Nishimura, D., Macovski, A., Glover, G., 1988. Variable-rate selective excitation. Journal of Magnetic Resonance (1969) 78, 440–458. https://doi.org/10.1016/0022-2364(88)90131-X

      Deistung, A., Dittrich, E., Sedlacik, J., Rauscher, A., Reichenbach, J.R., 2009. ToF-SWI: Simultaneous time of flight and fully flow compensated susceptibility weighted imaging. J. Magn. Reson. Imaging 29, 1478–1484. https://doi.org/10.1002/jmri.21673

      Detre, J.A., Leigh, J.S., Williams, D.S., Koretsky, A.P., 1992. Perfusion imaging. Magnetic Resonance in Medicine 23, 37–45. https://doi.org/10.1002/mrm.1910230106

      Du, Y., Parker, D.L., Davis, W.L., Blatter, D.D., 1993. Contrast-to-Noise-Ratio Measurements in Three-Dimensional Magnetic Resonance Angiography. Investigative Radiology 28, 1004–1009.

      Du, Y.P., Jin, Z., 2008. Simultaneous acquisition of MR angiography and venography (MRAV). Magn. Reson. Med. 59, 954–958. https://doi.org/10.1002/mrm.21581

      Du, Y.P., Parker, D.L., Davis, W.L., Cao, G., 1994. Reduction of partial-volume artifacts with zero-filled interpolation in three-dimensional MR angiography. J. Magn. Reson. Imaging 4, 733–741. https://doi.org/10.1002/jmri.1880040517

      Du, Y.P., Parker, D.L., Davis, W.L., Cao, G., Buswell, H.R., Goodrich, K.C., 1996. Experimental and theoretical studies of vessel contrast-to-noise ratio in intracranial time-of-flight MR angiography. Journal of Magnetic Resonance Imaging 6, 99–108. https://doi.org/10.1002/jmri.1880060120

      Duvernoy, H., Delon, S., Vannson, J.L., 1983. The Vascularization of The Human Cerebellar Cortex. Brain Research Bulletin 11, 419–480.

      Duvernoy, H.M., Delon, S., Vannson, J.L., 1981. Cortical blood vessels of the human brain. Brain Research Bulletin 7, 519–579. https://doi.org/10.1016/0361-9230(81)90007-1

      Eckstein, K., Bachrata, B., Hangel, G., Widhalm, G., Enzinger, C., Barth, M., Trattnig, S., Robinson, S.D., 2021. Improved susceptibility weighted imaging at ultra-high field using bipolar multi-echo acquisition and optimized image processing: CLEAR-SWI. NeuroImage 237, 118175. https://doi.org/10.1016/j.neuroimage.2021.118175

      Edelstein, W.A., Glover, G.H., Hardy, C.J., Redington, R.W., 1986. The intrinsic signal-to-noise ratio in NMR imaging. Magn. Reson. Med. 3, 604–618. https://doi.org/10.1002/mrm.1910030413

      Fan, A.P., Govindarajan, S.T., Kinkel, R.P., Madigan, N.K., Nielsen, A.S., Benner, T., Tinelli, E., Rosen, B.R., Adalsteinsson, E., Mainero, C., 2015. Quantitative oxygen extraction fraction from 7-Tesla MRI phase: reproducibility and application in multiple sclerosis. J Cereb Blood Flow Metab 35, 131–139. https://doi.org/10.1038/jcbfm.2014.187

      Fiedler, T.M., Ladd, M.E., Bitz, A.K., 2018. SAR Simulations & Safety. NeuroImage 168, 33–58. https://doi.org/10.1016/j.neuroimage.2017.03.035

      Frässle, S., Aponte, E.A., Bollmann, S., Brodersen, K.H., Do, C.T., Harrison, O.K., Harrison, S.J., Heinzle, J., Iglesias, S., Kasper, L., Lomakina, E.I., Mathys, C., Müller-Schrader, M., Pereira, I., Petzschner, F.H., Raman, S., Schöbi, D., Toussaint, B., Weber, L.A., Yao, Y., Stephan, K.E., 2021. TAPAS: An Open-Source Software Package for Translational Neuromodeling and Computational Psychiatry. Front. Psychiatry 12. https://doi.org/10.3389/fpsyt.2021.680811

      Gulban, O.F., Bollmann, S., Huber, R., Wagstyl, K., Goebel, R., Poser, B.A., Kay, K., Ivanov, D., 2021. Mesoscopic Quantification of Cortical Architecture in the Living Human Brain. https://doi.org/10.1101/2021.11.25.470023

      Haacke, E.M., Masaryk, T.J., Wielopolski, P.A., Zypman, F.R., Tkach, J.A., Amartur, S., Mitchell, J., Clampitt, M., Paschal, C., 1990. Optimizing blood vessel contrast in fast three-dimensional MRI. Magn. Reson. Med. 14, 202–221. https://doi.org/10.1002/mrm.1910140207

      Helthuis, J.H.G., van Doormaal, T.P.C., Hillen, B., Bleys, R.L.A.W., Harteveld, A.A., Hendrikse, J., van der Toorn, A., Brozici, M., Zwanenburg, J.J.M., van der Zwan, A., 2019. Branching Pattern of the Cerebral Arterial Tree. Anat Rec 302, 1434–1446. https://doi.org/10.1002/ar.23994

      Heverhagen, J.T., Bourekas, E., Sammet, S., Knopp, M.V., Schmalbrock, P., 2008. Time-of-Flight Magnetic Resonance Angiography at 7 Tesla. Investigative Radiology 43, 568–573. https://doi.org/10.1097/RLI.0b013e31817e9b2c

      Hirsch, S., Reichold, J., Schneider, M., Székely, G., Weber, B., 2012. Topology and Hemodynamics of the Cortical Cerebrovascular System. J Cereb Blood Flow Metab 32, 952–967. https://doi.org/10.1038/jcbfm.2012.39

      Horn, B.K.P., Schunck, B.G., 1981. Determining optical flow. Artificial Intelligence 17, 185–203. https://doi.org/10.1016/0004-3702(81)90024-2

      Huck, J., Wanner, Y., Fan, A.P., Jäger, A.-T., Grahl, S., Schneider, U., Villringer, A., Steele, C.J., Tardif, C.L., Bazin, P.-L., Gauthier, C.J., 2019. High resolution atlas of the venous brain vasculature from 7 T quantitative susceptibility maps. Brain Struct Funct 224, 2467–2485. https://doi.org/10.1007/s00429-019-01919-4

      Johst, S., Wrede, K.H., Ladd, M.E., Maderwald, S., 2012. Time-of-Flight Magnetic Resonance Angiography at 7 T Using Venous Saturation Pulses With Reduced Flip Angles. Investigative Radiology 47, 445–450. https://doi.org/10.1097/RLI.0b013e31824ef21f

      Kang, C.-K., Park, C.-A., Kim, K.-N., Hong, S.-M., Park, C.-W., Kim, Y.-B., Cho, Z.-H., 2010. Non-invasive visualization of basilar artery perforators with 7T MR angiography. Journal of Magnetic Resonance Imaging 32, 544–550. https://doi.org/10.1002/jmri.22250

      Kasper, L., Engel, M., Barmet, C., Haeberlin, M., Wilm, B.J., Dietrich, B.E., Schmid, T., Gross, S., Brunner, D.O., Stephan, K.E., Pruessmann, K.P., 2018. Rapid anatomical brain imaging using spiral acquisition and an expanded signal model. NeuroImage 168, 88–100. https://doi.org/10.1016/j.neuroimage.2017.07.062

      Klepaczko, A., Szczypiński, P., Deistung, A., Reichenbach, J.R., Materka, A., 2016. Simulation of MR angiography imaging for validation of cerebral arteries segmentation algorithms. Computer Methods and Programs in Biomedicine 137, 293–309. https://doi.org/10.1016/j.cmpb.2016.09.020

      Kobari, M., Gotoh, F., Fukuuchi, Y., Tanaka, K., Suzuki, N., Uematsu, D., 1984. Blood Flow Velocity in the Pial Arteries of Cats, with Particular Reference to the Vessel Diameter. J Cereb Blood Flow Metab 4, 110–114. https://doi.org/10.1038/jcbfm.1984.15

      Ladd, M.E., 2007. High-Field-Strength Magnetic Resonance: Potential and Limits. Top Magn Reson Imaging 18, 139–152.

      Lesage, D., Angelini, E.D., Bloch, I., Funka-Lea, G., 2009. A review of 3D vessel lumen segmentation techniques: Models, features and extraction schemes. Medical Image Analysis 13, 819–845. https://doi.org/10.1016/j.media.2009.07.011

      Maderwald, S., Ladd, S.C., Gizewski, E.R., Kraff, O., Theysohn, J.M., Wicklow, K., Moenninghoff, C., Wanke, I., Ladd, M.E., Quick, H.H., 2008. To TOF or not to TOF: strategies for non-contrast-enhanced intracranial MRA at 7 T. Magn Reson Mater Phy 21, 159. https://doi.org/10.1007/s10334-007-0096-9

      Manjón, J.V., Coupé, P., Martí‐Bonmatí, L., Collins, D.L., Robles, M., 2010. Adaptive non-local means denoising of MR images with spatially varying noise levels. Journal of Magnetic Resonance Imaging 31, 192–203. https://doi.org/10.1002/jmri.22003

      Mansfield, P., Harvey, P.R., 1993. Limits to neural stimulation in echo-planar imaging. Magn. Reson. Med. 29, 746–758. https://doi.org/10.1002/mrm.1910290606

      Masaryk, T.J., Modic, M.T., Ross, J.S., Ruggieri, P.M., Laub, G.A., Lenz, G.W., Haacke, E.M., Selman, W.R., Wiznitzer, M., Harik, S.I., 1989. Intracranial circulation: preliminary clinical results with three-dimensional (volume) MR angiography. Radiology 171, 793–799. https://doi.org/10.1148/radiology.171.3.2717754

      Mattern, H., Sciarra, A., Godenschweger, F., Stucht, D., Lüsebrink, F., Rose, G., Speck, O., 2018. Prospective motion correction enables highest resolution time-of-flight angiography at 7T. Magn. Reson. Med 80, 248–258. https://doi.org/10.1002/mrm.27033

      Mattern, H., Sciarra, A., Lüsebrink, F., Acosta‐Cabronero, J., Speck, O., 2019. Prospective motion correction improves high‐resolution quantitative susceptibility mapping at 7T. Magn. Reson. Med 81, 1605–1619. https://doi.org/10.1002/mrm.27509

      Mennes, M., Jenkinson, M., Valabregue, R., Buitelaar, J.K., Beckmann, C., Smith, S., 2014. Optimizing full-brain coverage in human brain MRI through population distributions of brain size. NeuroImage 98, 513–520. https://doi.org/10.1016/j.neuroimage.2014.04.030

      Moccia, S., De Momi, E., El Hadji, S., Mattos, L.S., 2018. Blood vessel segmentation algorithms — Review of methods, datasets and evaluation metrics. Computer Methods and Programs in Biomedicine 158, 71–91. https://doi.org/10.1016/j.cmpb.2018.02.001

      Mustafa, M.A.R., 2016. A data-driven learning approach to image registration.

      Mut, F., Wright, S., Ascoli, G.A., Cebral, J.R., 2014. Morphometric, geographic, and territorial characterization of brain arterial trees. International Journal for Numerical Methods in Biomedical Engineering 30, 755–766. https://doi.org/10.1002/cnm.2627

      Nagaoka, T., Yoshida, A., 2006. Noninvasive Evaluation of Wall Shear Stress on Retinal Microcirculation in Humans. Invest. Ophthalmol. Vis. Sci. 47, 1113. https://doi.org/10.1167/iovs.05-0218

      Nishimura, D.G., Irarrazabal, P., Meyer, C.H., 1995. A Velocity k-Space Analysis of Flow Effects in Echo-Planar and Spiral Imaging. Magnetic Resonance in Medicine 33, 549–556. https://doi.org/10.1002/mrm.1910330414

      Nishimura, D.G., Jackson, J.I., Pauly, J.M., 1991. On the nature and reduction of the displacement artifact in flow images. Magnetic Resonance in Medicine 22, 481–492. https://doi.org/10.1002/mrm.1910220255

      Nonaka, H., Akima, M., Hatori, T., Nagayama, T., Zhang, Z., Ihara, F., 2003. Microvasculature of the human cerebral white matter: Arteries of the deep white matter. Neuropathology 23, 111–118. https://doi.org/10.1046/j.1440-1789.2003.00486.x

      North, D.O., 1963. An Analysis of the factors which determine signal/noise discrimination in pulsed-carrier systems. Proceedings of the IEEE 51, 1016–1027. https://doi.org/10.1109/PROC.1963.2383

      Park, C.S., Hartung, G., Alaraj, A., Du, X., Charbel, F.T., Linninger, A.A., 2020. Quantification of blood flow patterns in the cerebral arterial circulation of individual (human) subjects. Int J Numer Meth Biomed Engng 36. https://doi.org/10.1002/cnm.3288

      Parker, D.L., Goodrich, K.C., Roberts, J.A., Chapman, B.E., Jeong, E.-K., Kim, S.-E., Tsuruda, J.S., Katzman, G.L., 2003. The need for phase-encoding flow compensation in high-resolution intracranial magnetic resonance angiography. J. Magn. Reson. Imaging 18, 121–127. https://doi.org/10.1002/jmri.10322

      Parker, D.L., Yuan, C., Blatter, D.D., 1991. MR angiography by multiple thin slab 3D acquisition. Magn. Reson. Med. 17, 434–451. https://doi.org/10.1002/mrm.1910170215

      Pauling, L., Coryell, C.D., 1936. The magnetic properties and structure of hemoglobin, oxyhemoglobin and carbonmonoxyhemoglobin. Proceedings of the National Academy of Sciences 22, 210–216. https://doi.org/10.1073/pnas.22.4.210

      Payne, S.J., 2017. Cerebral Blood Flow And Metabolism: A Quantitative Approach. World Scientific.

      Peters, A.M., Brookes, M.J., Hoogenraad, F.G., Gowland, P.A., Francis, S.T., Morris, P.G., Bowtell, R., 2007. T2* measurements in human brain at 1.5, 3 and 7 T. Magnetic Resonance Imaging 25, 748–753. https://doi.org/10.1016/j.mri.2007.02.014

      Pfeifer, R.A., 1930. Grundlegende Untersuchungen für die Angioarchitektonik des menschlichen Gehirns. Julius Springer, Berlin.

      Phellan, R., Forkert, N.D., 2017. Comparison of vessel enhancement algorithms applied to time-of-flight MRA images for cerebrovascular segmentation. Medical Physics 44, 5901–5915. https://doi.org/10.1002/mp.12560

      Pohmann, R., Speck, O., Scheffler, K., 2016. Signal-to-Noise Ratio and MR Tissue Parameters in Human Brain Imaging at 3, 7, and 9.4 Tesla Using Current Receive Coil Arrays. Magn. Reson. Med. 75, 801–809. https://doi.org/10.1002/mrm.25677

      Reichenbach, J.R., Venkatesan, R., Schillinger, D.J., Kido, D.K., Haacke, E.M., 1997. Small vessels in the human brain: MR venography with deoxyhemoglobin as an intrinsic contrast agent. Radiology 204, 272–277. https://doi.org/10.1148/radiology.204.1.9205259

      Schmid, F., Barrett, M.J.P., Jenny, P., Weber, B., 2019. Vascular density and distribution in neocortex. NeuroImage 197, 792–805. https://doi.org/10.1016/j.neuroimage.2017.06.046

      Schmitter, S., Bock, M., Johst, S., Auerbach, E.J., Uğurbil, K., Moortele, P.-F.V. de, 2012. Contrast enhancement in TOF cerebral angiography at 7 T using saturation and MT pulses under SAR constraints: Impact of VERSE and sparse pulses. Magnetic Resonance in Medicine 68, 188–197. https://doi.org/10.1002/mrm.23226

      Schulz, J., Boyacioglu, R., Norris, D.G., 2016. Multiband multislab 3D time-of-flight magnetic resonance angiography for reduced acquisition time and improved sensitivity. Magn Reson Med 75, 1662–8. https://doi.org/10.1002/mrm.25774

      Shu, C.Y., Sanganahalli, B.G., Coman, D., Herman, P., Hyder, F., 2016. New horizons in neurometabolic and neurovascular coupling from calibrated fMRI, in: Progress in Brain Research. Elsevier, pp. 99–122. https://doi.org/10.1016/bs.pbr.2016.02.003

      Stamm, A.C., Wright, C.L., Knopp, M.V., Schmalbrock, P., Heverhagen, J.T., 2013. Phase contrast and time-of-flight magnetic resonance angiography of the intracerebral arteries at 1.5, 3 and 7 T. Magnetic Resonance Imaging 31, 545–549. https://doi.org/10.1016/j.mri.2012.10.023

      Stewart, A.W., Robinson, S.D., O’Brien, K., Jin, J., Widhalm, G., Hangel, G., Walls, A., Goodwin, J., Eckstein, K., Tourell, M., Morgan, C., Narayanan, A., Barth, M., Bollmann, S., 2022. QSMxT: Robust masking and artifact reduction for quantitative susceptibility mapping. Magnetic Resonance in Medicine 87, 1289–1300. https://doi.org/10.1002/mrm.29048

      Stucht, D., Danishad, K.A., Schulze, P., Godenschweger, F., Zaitsev, M., Speck, O., 2015. Highest Resolution In Vivo Human Brain MRI Using Prospective Motion Correction. PLoS ONE 10, e0133921. https://doi.org/10.1371/journal.pone.0133921

      Szikla, G., Bouvier, G., Hori, T., Petrov, V., 1977. Angiography of the Human Brain Cortex. Springer Berlin Heidelberg, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-81145-6

      Triantafyllou, C., Polimeni, J.R., Wald, L.L., 2011. Physiological noise and signal-to-noise ratio in fMRI with multi-channel array coils. NeuroImage 55, 597–606. https://doi.org/10.1016/j.neuroimage.2010.11.084

      Tustison, N.J., Avants, B.B., Cook, P.A., Zheng, Y., Egan, A., Yushkevich, P.A., Gee, J.C., 2010. N4ITK: Improved N3 Bias Correction. IEEE Transactions on Medical Imaging 29, 1310–1320. https://doi.org/10.1109/TMI.2010.2046908

      Uludağ, K., Müller-Bierl, B., Uğurbil, K., 2009. An integrative model for neuronal activity-induced signal changes for gradient and spin echo functional imaging. NeuroImage 48, 150–165. https://doi.org/10.1016/j.neuroimage.2009.05.051

      Venkatesan, R., Haacke, E.M., 1997. Role of high resolution in magnetic resonance (MR) imaging: Applications to MR angiography, intracranial T1-weighted imaging, and image interpolation. International Journal of Imaging Systems and Technology 8, 529–543. https://doi.org/10.1002/(SICI)1098-1098(1997)8:6<529::AID-IMA5>3.0.CO;2-C

      von Morze, C., Xu, D., Purcell, D.D., Hess, C.P., Mukherjee, P., Saloner, D., Kelley, D.A.C., Vigneron, D.B., 2007. Intracranial time-of-flight MR angiography at 7T with comparison to 3T. J. Magn. Reson. Imaging 26, 900–904. https://doi.org/10.1002/jmri.21097

      Ward, P.G.D., Ferris, N.J., Raniga, P., Dowe, D.L., Ng, A.C.L., Barnes, D.G., Egan, G.F., 2018. Combining images and anatomical knowledge to improve automated vein segmentation in MRI. NeuroImage 165, 294–305. https://doi.org/10.1016/j.neuroimage.2017.10.049

      Wilms, G., Bosmans, H., Demaerel, Ph., Marchal, G., 2001. Magnetic resonance angiography of the intracranial vessels. European Journal of Radiology 38, 10–18. https://doi.org/10.1016/S0720-048X(01)00285-6

      Wright, S.N., Kochunov, P., Mut, F., Bergamino, M., Brown, K.M., Mazziotta, J.C., Toga, A.W., Cebral, J.R., Ascoli, G.A., 2013. Digital reconstruction and morphometric analysis of human brain arterial vasculature from magnetic resonance angiography. NeuroImage 82, 170–181. https://doi.org/10.1016/j.neuroimage.2013.05.089

      Yushkevich, P.A., Piven, J., Hazlett, H.C., Smith, R.G., Ho, S., Gee, J.C., Gerig, G., 2006. User-guided 3D active contour segmentation of anatomical structures: Significantly improved efficiency and reliability. NeuroImage 31, 1116–1128. https://doi.org/10.1016/j.neuroimage.2006.01.015

      Zhang, Z., Deng, X., Weng, D., An, J., Zuo, Z., Wang, B., Wei, N., Zhao, J., Xue, R., 2015. Segmented TOF at 7T MRI: Technique and clinical applications. Magnetic Resonance Imaging 33, 1043–1050. https://doi.org/10.1016/j.mri.2015.07.002

      Zhao, J.M., Clingman, C.S., Närväinen, M.J., Kauppinen, R.A., van Zijl, P.C.M., 2007. Oxygenation and hematocrit dependence of transverse relaxation rates of blood at 3T. Magn. Reson. Med. 58, 592–597. https://doi.org/10.1002/mrm.21342

      Zhu, X., Tomanek, B., Sharp, J., 2013. A pixel is an artifact: On the necessity of zero-filling in fourier imaging. Concepts Magn. Reson. 42A, 32–44. https://doi.org/10.1002/cmr.a.21256

    1. Author Response

      Reviewer #2 (Public Review):

      I have only one concern with the study. I am not fully convinced that the disruption of behavioral updating is specifically due to NA signaling within OFC. In the first two studies, they observed non-specific anatomical effects, likely due to the ablation of fibers of passage through OFC. The DREADD experiment is claimed to allay this concern. However, the DCZ was injected systemically. This means that any collaterals of LC NA neurons outside OFC will also be suppressed. While the lack of effect with the mPFC projection is interesting, this does not preclude an effect mediated in other target regions. Overall, I believe that none of the experiments truly demonstrate a specific effect of NA in OFC. A few experimental options that could be considered are injection of DCZ directly into OFC, optogenetic inhibition of fibers in OFC, or pharmacological disruption of NA signaling in OFC.

      Another option is to measure the effect of the toxin ablations from experiments 1 and 2 not just in mPFC but in other regions. If, outside of OFC, the non-specific effect is truly confined to mPFC, that would give more confidence that the mPFC projection is the only other viable pathway mediating the effect.

      As requested, we have quantified the effect of toxin ablations in neighbouring cortical regions known to be involved in goal-directed behavior, namely the insular cortex (IC, e.g., Balleine & Dickinson, 2000; Parkes & Balleine, 2013), the medial orbitofrontal cortex (MO, e.g., Bradfield et al., 2015; Gourley et al., 2016) and the secondary motor cortex (M2, Gremel et al., 2016). Briefly, we found that injection of the saporin toxin in the VO and LO (Experiment 1) led to a significant decrease in NA fiber density in all examined regions. Injection of 6-OHDA also produced a significant loss of NA fibres in MO and M2 but not in the insular cortex. These results are presented in Suppl. Figures 1 and 3 (pages 28 and 30), and the statistics are reported in the main text (pages 6 and 11).

      We have also added the following to our discussion on the reason for the off-target depletions that we observed and acknowledged the potential role of collateral LC neurons:

      Page 21, line starting 374: “The use of the saporin toxin led to a dramatic decrease of NA fiber density in all analysed cortical areas (Suppl Fig 1). This may be due to diffusion of the toxin from the injection site, the existence of collateral LC neurons and/or fibers passing through the ventral portion of the OFC but targeting other cortical areas (Cerpa et al 2019). However, injection of 6-OHDA led to much less offsite NA depletion, suggesting that a large part of the previous observation is toxin-specific. Indeed, no significant loss of NA fibers was visible in the insular cortex, which has been previously implicated in goal-directed behaviour (Balleine & Dickinson, 2000; Parkes et al., 2013; 2015; 2017). We did nevertheless observe an offsite depletion in more proximal prefrontal areas (prelimbic and medial orbitofrontal cortices), albeit a more modest depletion than that observed using the saporin toxin. Several studies have described the projection pattern of LC cells. These studies, using various techniques, indicate that LC cells mainly target a single region, and that only a small proportion of LC neurons collateralize to minor targets (Plummer et al., 2020, Kebschull et al 2016, Uematsu et al 2017, Chandler et al 2014). Therefore, even if the OFC noradrenergic innervation is presumably specific (Chandler et al 2013), we cannot rule out a possible collateralization of some neurons toward neighbouring prefrontal areas (PL and MO). We have previously discussed that the posterior ventral portion of the OFC is an entry point for LC fibers en passant, which ultimately target other prefrontal areas (Cerpa et al 2019).

      To achieve a greater anatomical selectivity we used a CAV-2 vector carrying the noradrenergic promoter PRS to target either the LC:A32 or the LC:OFC pathways (Hayat et al., 2020; Hirschberg et al., 2017). It has been shown that the CAV-2 vector can infect axons of passage; however, the vector does not spread more than 200 µm from the injection site (Schwarz et al 2015). Therefore, when targeting the OFC we injected anteriorly to the level where the highest density of fibers of passage is expected (Cerpa et al 2019) in order to minimize infection of such fibers and restrict inhibition to our pathway of interest.

      Overall, the current behavioural results are in line with our previous work showing that the ability to associate new outcomes to previously acquired actions is impaired following chemogenetic inhibition of the VO and LO (Parkes et al., 2018) or disconnection of the VO and LO from the submedius thalamic nucleus (Fresno et al 2019). These results point to a necessary role of the ventral and lateral parts of the OFC and its noradrenergic innervation for updating A-O associations. However, it is worth mentioning that different subregions of the OFC, both along the medio-lateral and antero-posterior axes of OFC, display clear functional heterogeneities (Dalton et al 2016, Izquierdo 2017, Panayi & Killcross, 2018, Bradfield et al 2018, Barreiros et al 2021). Therefore, while we have previously focused on the anatomical heterogeneity of the noradrenergic innervation in these prefrontal subregions (Cerpa et al 2019), a thorough characterization of its functional role in each of these subregions still needs to be addressed.”

      One last concern is that the lack of effect following disruption of the mPFC projection may still reflect experimental issues rather than a true absence of effect. If the authors have some evidence that the mPFC projection disruption produced some other behavioral effect, that would make the lack of effect in this case more convincing.

      Unfortunately, we do not provide evidence in the current paper that disrupting the LC:mPFC (now termed LC:A32 in the current study, based on the recommendation of reviewer 1) projection produces some other behavioural effect. However, in an on-going series of experiments, using the same tools as the current study, we found that inhibiting the LC:A32, but not LC:OFC, pathway impairs Pavlovian contingency degradation as shown in the figure below. We therefore believe that the failure of LC:mPFC pathway inhibition to affect outcome identity reversal in the present study is not due to experimental issues. Please note that in the figure below mPFC is referred to as area 32 (A32), as requested by reviewer 1.

      Figure 1. A) Experimental timeline for the Pavlovian contingency degradation procedure. Prior to behavioural training, rats were injected with CAV2-PRS-hM4D-mCherry into either the vlOFC or area 32 (A32). Number of food port entries during the non-degraded CS and degraded CS for rats injected with vehicle and rats injected with DCZ during degradation training (B, D) and the test in extinction (C, E). Inhibition of the LC:vlOFC had no effect on Pavlovian contingency degradation, whereas inhibition of LC:A32 during degradation training rendered rats insensitive to the change in the causal relationship between the CS and the US.

      Reviewer #3 (Public Review):

      I would be curious about the authors' thoughts regarding the recent Duan ... Robbins Neuron paper (https://pubmed.ncbi.nlm.nih.gov/34171290/), in which marmosets displayed paradoxical responses to VLO inactivation and stimulation in contingency degradation tasks. Are there ways to reconcile these reports?

      We previously argued that the updating processes underlying changes in causal contingency versus outcome identity may be supported by different prefrontal regions (Cerpa et al., 2021, Behav Neurosci). Unfortunately, the tasks used in the current study do not allow us to test if our rats are sensitive to changes in the action-outcome contingency. In fact, the effect of inactivation (or overactivation) of the ventral and lateral regions of OFC on an instrumental contingency degradation task similar to that used in Duan et al (2022) has not yet been examined in rats.

      Indeed, while it is stated in Duan et al (2022) that rats with lesions of lateral OFC are insensitive to contingency degradation, none of the citations provided support this conclusion (Balleine & Dickinson, 1998; Corbit & Balleine, 2003; Ostlund and Balleine, 2007; Yin et al., 2005). Balleine and Dickinson (1998) assessed the effect of prelimbic and insular cortex lesions (insular anteroposterior coordinate +1.2), with only the former affecting instrumental contingency degradation. Ostlund and Balleine (2007) assessed the effect of orbitofrontal lesions on Pavlovian contingency degradation (degradation of the S-O contingency) not instrumental contingency degradation. Finally, Corbit and Balleine (2003) and Yin et al (2005) assessed the effect of prelimbic and dorsomedial striatum lesions, respectively. Nevertheless, there are some reports on the effect of chemogenetic inhibition of VO/LO on degradation in a nose-poke response task but the results are conflicting (e.g., Whyte et al., 2019; Zimmerman et al., 2017; 2018). It would be very interesting to study the impact of both inactivation and overactivation of VO and LO in rats to compare with the results found in marmosets, using comparable tasks.

      We have added the following to our discussion, which cites Duan et al (2022) and highlights the need to better understand the role of VO and LO in contingency degradation.

      Page 24, line starting 450: “However, it is not yet clear if the NA-OFC system is also involved in detecting the causal relationship between an action and its outcome (see Cerpa et al., 2021 for a discussion). Some have reported impaired adaptation to contingency changes following inhibition of VO and LO or BDNF-knockdown in these regions (Whyte et al., 2019; Zimmerman et al., 2017), while another study shows that inhibition of VO/LO leaves sensitivity to degradation intact, at least during an initial test (Zimmerman et al., 2018). Interestingly, a recent paper in marmosets demonstrates that inactivation of anterior OFC (area 11) improves instrumental contingency degradation, whereas overactivation impairs degradation (Duan et al., 2022). The potential role of the rodent ventral and lateral regions of OFC, and the NA innervation of OFC, in adapting to degradation of instrumental contingencies requires further investigation.”

    1. Author Response:

      Reviewer #1 (Public Review):

      There is growing precedent for the utility of GWAS-type analyses in elucidating otherwise cryptic genotypic associations with specific Mtb phenotypes, most commonly drug resistance. This study represents the latest instalment of this type of approach, utilizing a large set of WGS data from clinical Mtb isolates and refining the search for DR-associated alleles by restricting the set to those predicted (or known) to be phenotypically DR. This revealed a number of potential candidate mutations, including some in nucleotide excision repair (uvrA, uvrB), in base excision repair (mutY), and homologous recombination (recF). In validating these leads with functional assays, the authors present evidence supporting the impact of the identified mutations on antibiotic susceptibility in vitro and in macrophage and animal infection models. These results extend the number of candidate mutations associated with Mtb drug resistance; however, the following must be considered:

      (i) The GWAS analysis is the basis of this study, yet the description of the approach used and presentation of results obtained is occasionally obscure; for example, the authors report the use of known drug resistance phenotypes (where available) or inferences of drug-resistance from genotypic data to enhance the potential to identify other mutations that might be implicated in enabling the DR mutations, yet their list of known DR mutations seems to be predominantly rare or unusual mutations, not those commonly associated with clinical DR-TB. In addition, the distribution of the identified resistance-associated mutations across the different lineages needs to be explained more clearly.

      In the revised manuscript, we have performed a phylogenetic analysis of the strains used. A phylogenetic tree was generated using Mycobacterium canetti as an outgroup (Figure 1b). The phylogenetic analysis suggests that the strains cluster into lineages 1, 2, 3 and 4. Lineages 2, 3 and 4 cluster together, and lineage 1 is monophyletic, as reported previously. The genome sequence data of 2773 clinical strains were downloaded from NCBI. These strains were also part of the GWAS analyses performed by Coll et al. (https://pubmed.ncbi.nlm.nih.gov/29358649/) and Manson et al. (https://pubmed.ncbi.nlm.nih.gov/28092681/). The phenotypes of the strains used for the association analysis were reported in these previous studies; we have not performed any additional predictions. The supplementary tables provide the lineage of origin of each strain used in the study (Supplementary Files 1 & 2). The distribution of resistance-associated mutations across strains is shown in Figure 2-figure supplement 6a-h. As suggested, we have also looked for direct-target mutations in strains that harbor mutations in the DNA repair genes (Figure 2-figure supplement 6i-k).

      We identified mostly rare mutations for the following reasons:

      1. For association mapping, we looked for mutations that were present only in the multidrug-resistant strains and absent from the susceptible strains. This strategy yielded mostly variants specific to the multidrug-resistant phenotype.

      2. We used a mixed linear model (MLM) for the association analysis. The MLM corrects for population-specific SNP effects using PCA and kinship corrections. The false discovery rate (FDR)-adjusted p-values in the GAPIT software are stringent, because GAPIT corrects the effect of each marker for both population structure (Q) and kinship (K). Therefore, the probability of identifying false-positive SNPs is very low. We combined this with Bonferroni correction to identify markers associated with the drug-resistant phenotype.
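      The Bonferroni step in point 2 amounts to comparing each marker's p-value with α divided by the number of markers tested. The following is a minimal Python sketch of that rule only, not of GAPIT's MLM; the significance level α = 0.05 and the example p-values are assumptions for illustration. If all of the ~160,000 SNPs identified in this dataset were tested, the per-SNP cutoff at α = 0.05 would be 0.05 / 160,000 ≈ 3.1 × 10⁻⁷.

```python
from typing import Sequence


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


def bonferroni_hits(pvalues: Sequence[float], alpha: float = 0.05) -> list[int]:
    """Indices of markers whose p-value falls below the Bonferroni cutoff."""
    cutoff = bonferroni_threshold(len(pvalues), alpha)
    return [i for i, p in enumerate(pvalues) if p < cutoff]


if __name__ == "__main__":
    # Cutoff if ~160,000 SNPs were tested (assumed alpha = 0.05)
    print(bonferroni_threshold(160_000))
    # Hypothetical p-values for three markers: only the first survives
    print(bonferroni_hits([1e-8, 0.02, 0.5]))
```

      Bonferroni is deliberately conservative: it controls the chance of any false positive across all markers, which is consistent with the authors' point that rare, strongly associated variants are what remain after filtering.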

      (ii) By combining target gene deletions with different complementation alleles, the authors provide compelling microbiological evidence supporting the inferred role of the mutY and uvrB mutations in enhanced survival under antibiotic treatment. The experimental work, however, is limited to assessments of competitive survival in various models, with/without antibiotic selection, or to mutant frequency analyses; there is no direct evidence provided in support of the proposed mechanism.

      To ascertain whether the better survival of RvΔmutY or RvΔmutY::mutY-R262Q is indeed due to the acquisition of mutations in the direct targets of antibiotics, we performed WGS of the strains from the ex vivo evolution experiment (Figure 5). Genomic DNA extracted from ten independent colonies (grown in vitro) was mixed in equal proportions before library preparation. Only SNPs present in >20% of reads were retained for the analysis. Sequencing of Rv grown in vitro showed that the laboratory strain had accumulated 100 SNPs relative to the reference strain; the sequence of the Rv laboratory strain was therefore used as the reference for subsequent analysis. WGS data for RvΔmutY, RvΔmutY::mutY, and RvΔmutY::mutY-R262Q grown in vitro showed no mutations in the antibiotic target genes. Similarly, after the final round of ex vivo selection in the presence or absence of antibiotics, ten independent colonies were picked from each set of 7H11-OADC plates for WGS. In the absence of antibiotics, no direct-target mutations were identified in the ex vivo passaged strains (Figure 6a & e). In the presence of isoniazid, we found katG mutations (Ser315Thr or Ser315Ile) in Rv and RvΔmutY but not in RvΔmutY::mutY or RvΔmutY::mutY-R262Q (Figure 6b & e). These findings are congruent with the ex vivo evolution CFU analysis, in which we did not observe a significant increase in the survival of RvΔmutY or RvΔmutY::mutY-R262Q in the presence of isoniazid (Figure 5). In the presence of ciprofloxacin and rifampicin, direct-target mutations were identified in gyrA and rpoB (Figure 6c-e): Asp94Glu/Asp94Gly mutations in gyrA and His445Tyr/Ser450Leu mutations in rpoB of RvΔmutY and RvΔmutY::mutY-R262Q, respectively. No direct-target mutations were identified in Rv or RvΔmutY::mutY, suggesting that perturbed DNA repair facilitates the acquisition of drug resistance-conferring mutations in Mtb (Figure 6c-e & Supplementary File 8).
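      The >20% read-support rule applied to the pooled DNA can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the VariantCall record, the per-variant read counts, and the "background" set of variants already present in the Rv laboratory strain are all hypothetical. With ten colonies pooled in equal proportions, a variant carried by a single colony would be expected near 10% of reads, so a >20% cutoff favours variants shared by several colonies.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class VariantCall:
    gene: str
    change: str      # amino-acid change, e.g. "Ser315Thr"
    alt_reads: int   # reads supporting the variant allele
    depth: int       # total reads covering the position


def filter_pooled_calls(calls: Iterable[VariantCall],
                        background: set[tuple[str, str]],
                        min_fraction: float = 0.20) -> list[VariantCall]:
    """Keep calls supported by more than min_fraction of reads that are not
    already present in the parental laboratory strain (the background set)."""
    kept = []
    for call in calls:
        if call.depth == 0 or (call.gene, call.change) in background:
            continue
        if call.alt_reads / call.depth > min_fraction:
            kept.append(call)
    return kept


if __name__ == "__main__":
    background = {("geneX", "Ala10Val")}  # hypothetical lab-strain variant
    calls = [
        VariantCall("katG", "Ser315Thr", 34, 100),
        VariantCall("rpoB", "Ser450Leu", 12, 100),  # ~one colony's worth: dropped
        VariantCall("geneX", "Ala10Val", 98, 100),  # background: dropped
    ]
    for call in filter_pooled_calls(calls, background):
        print(call.gene, call.change)
```

      Subtracting the laboratory-strain background mirrors the step of re-referencing to the sequenced Rv laboratory strain, so that its 100 pre-existing SNPs are not counted as newly acquired.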

      To determine whether the better survival of RvΔmutY or RvΔmutY::mutY-R262Q in the guinea pig infection experiment (Figure 8) is due to the accumulation of mutations in the host, we performed WGS of the strains isolated from guinea pig lungs. The analysis revealed that specific genes, such as cobQ1, smc, espI, and valS, were mutated only in RvΔmutY and RvΔmutY::mutY-R262Q but not in Rv or RvΔmutY::mutY. In addition, tcrA and gatA were mutated only in RvΔmutY, whereas rv0746 was mutated exclusively in RvΔmutY::mutY (Figure 8-figure supplement 2). However, we did not observe any direct-target mutations, possibly because the guinea pigs were not treated with antibiotics. These data suggest that continued long-term selection pressure is necessary for the bacilli to acquire such mutations.

      (iii) The low drug concentrations used (especially of rifampicin against M. smegmatis) suggest that the identified mutations confer low-level resistance to multiple antimycobacterial agents, in turn implying tolerance rather than resistance. If correct, it would be interesting to know how broadly tolerant strains containing these mutations are; that is, whether susceptibility is decreased to a broad range of antibiotics with different mechanisms of action (including both cidal and static agents), and whether the extent of the decrease can be determined quantitatively (for example, as a change in MIC value).

      To evaluate the effect of different drugs on the survival of RvΔmutY and RvΔmutY::mutY-R262Q, we performed killing kinetics in the presence and absence of isoniazid, rifampicin, ciprofloxacin, and ethambutol (Figure 4a). In the absence of antibiotics, the growth kinetics of Rv, RvΔmutY, RvΔmutY::mutY, and RvΔmutY::mutY-R262Q were similar (Figure 4b). In the presence of isoniazid, a ~2-log decrease in bacterial survival was observed on day 3 in Rv and RvΔmutY::mutY, whereas in RvΔmutY and RvΔmutY::mutY-R262Q the decrease was limited to ~1.5 log (Figure 4c). A similar trend was apparent on days 6 and 9, suggesting a ~5-fold increase in the survival of RvΔmutY and RvΔmutY::mutY-R262Q compared with Rv and RvΔmutY::mutY (Figure 4c). Interestingly, in the presence of ethambutol, we did not observe any significant difference (Figure 4d). In the presence of rifampicin and ciprofloxacin, we observed a ~10-fold increase in the survival of RvΔmutY and RvΔmutY::mutY-R262Q compared with Rv and RvΔmutY::mutY (Figure 4e-f). These results suggest that the absence of mutY, or the presence of the mutY variant, helps the bacilli withstand antibiotic stress.
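      The log-based and fold-based figures in the kill-kinetics description are related by simple arithmetic. A minimal Python sketch, assuming that "log" refers to log10 CFU (the response does not state the base) and that both strains start from the same inoculum; the inputs are the approximate day-3 isoniazid values quoted above.

```python
import math


def log10_reduction(cfu_start: float, cfu_end: float) -> float:
    """Log10 drop in CFU between two time points (positive means killing)."""
    return math.log10(cfu_start) - math.log10(cfu_end)


def fold_survival_advantage(reduction_a: float, reduction_b: float) -> float:
    """How many times more bacteria survive in strain B than in strain A,
    given each strain's log10 reduction from the same starting inoculum."""
    return 10 ** (reduction_a - reduction_b)


if __name__ == "__main__":
    # ~2-log kill (wild type) vs ~1.5-log kill (mutY deletion), day 3, isoniazid
    print(round(fold_survival_advantage(2.0, 1.5), 2))  # 3.16
```

      Under these assumptions, a 0.5-log difference corresponds to roughly 3-fold more survivors, and the ~5-fold difference reported for days 6 and 9 would correspond to a gap of about 0.7 log (log10 5 ≈ 0.70).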

      Reviewer #2 (Public Review):

      This interesting manuscript uses a collection of whole genome sequences of TB isolates to associate specific sequence polymorphisms with MDR/XDR strains and, having found certain mutations in DNA repair pathways, performs a detailed analysis of several of them. The evaluation of the MutY polymorphism reveals that it is a loss-of-function allele and that TB strains carrying this mutation have a higher mutation frequency and enhanced survival during serial passage in macrophages. The strengths of the manuscript are the leveraging of a large sequence dataset to derive interesting candidate mutations in DNA repair pathways and the demonstration that at least one of these mutations has a detectable effect on mutagenicity and pathogenesis. The main weakness of the manuscript is a lack of experimental exploration of the mechanism by which loss of a DNA repair pathway would enhance survival in vivo. The model presented is that these phenotypes are due to hypermutagenicity and thereby evolution of enhanced pathogenesis, but this is not directly tested or investigated. There are also technical concerns about some of the experimental data, which could be strengthened.

      This paper presents the following data:

      • Analyzed whole-genome sequences of 2773 clinical strains: 160,000 SNPs identified
      • 1815 drug-susceptible/422 MDR/XDR strains: 188 mutations correlated with drug resistance.
      • Novel mutations associated with the drug resistance have been found in base excision repair (BER), nucleotide excision repair (NER), and homologous recombination (HR) pathway genes (mutY, uvrA, uvrB, and recF).
      • Specific mutations mutY-R262Q and uvrB-A524V were studied.
      • mutY-R262Q and uvrB-A524V mutations behave as loss-of-function alleles in vivo, as measured by non-complementation of the increased mutation frequency measured by resistance to Rif and INH.
      • The mutY deletion and the mutY-R262Q mutation increase Mtb survival over WT in macrophages when Mtb has not been submitted to previous rounds of macrophage infection.
      • This advantage is exacerbated in the presence of antibiotics (Rif and Cipro but not INH).
      • The MutY deletion and the MutY-R262Q mutation result in an enhanced survival of Mtb during guinea pig infection.

      Major issues:

      The finding that mutations in MutY confer an advantage during macrophage infection is convincing based on the macrophage experiments, but it is premature to conclude that the mechanism of this effect is hypermutagenesis and selection of fitter bacterial clones. It has been described in E. coli (Foti et al., 2012) and recently in mycobacteria (Dupuy et al., 2020) that the MutY/MutM excision pathways can increase the lethality of antibiotic treatment because of double-strand breaks caused by adenine/oxoG excisions. The higher survival of the mutY mutant during antibiotic treatment could instead be due to reduced adenine/oxoG excision in the mutant rather than to the acquisition of advantageous mutations, or to some other mechanism. The same hypothesis cannot be excluded for the guinea pig experiments (no antibiotics, but oxidative stress mediated by host defenses could also increase oxoG) and should at least be discussed. An experiment that would support the idea that the in vivo advantage is due to hypermutagenesis would be whole genome sequencing of the output versus input populations to directly document increased mutagenesis. Similarly, is the ΔmutY survival advantage after rounds of macrophage infection dependent on the macrophage environment? What happens if the ΔmutY strain is cultivated in vitro in 7H9 (for the same number of generations) before infecting macrophages?

      We thank the reviewer for the insightful comments. To ascertain whether the better survival of RvΔmutY or RvΔmutY::mutY-R262Q is indeed due to the acquisition of mutations in the direct targets of antibiotics, we performed WGS of the strains from the ex vivo evolution experiment (Figure 5). Genomic DNA extracted from ten independent colonies (grown in vitro) was mixed in equal proportions prior to library preparation. Only SNPs present in >20% of reads were retained for the analysis. Sequencing of Rv grown in vitro showed that the laboratory strain had accumulated 100 SNPs relative to the reference strain; the sequence of the Rv laboratory strain was therefore used as the reference for subsequent analysis. WGS data for RvΔmutY, RvΔmutY::mutY, and RvΔmutY::mutY-R262Q grown in vitro showed no mutations in the antibiotic target genes. Similarly, after the final round of ex vivo selection in the presence or absence of antibiotics, ten independent colonies were picked from each set of 7H11-OADC plates for WGS. In the absence of antibiotics, no direct-target mutations were identified in the ex vivo passaged strains (Figure 6a & e). In the presence of isoniazid, we found katG mutations (Ser315Thr or Ser315Ile) in Rv and RvΔmutY but not in RvΔmutY::mutY or RvΔmutY::mutY-R262Q (Figure 6b & e). These findings are congruent with the ex vivo evolution CFU analysis, in which we did not observe a significant increase in the survival of RvΔmutY or RvΔmutY::mutY-R262Q in the presence of isoniazid (Figure 5). In the presence of ciprofloxacin and rifampicin, direct-target mutations were identified in gyrA and rpoB (Figure 6c-e): Asp94Glu/Asp94Gly mutations in gyrA and His445Tyr/Ser450Leu mutations in rpoB of RvΔmutY and RvΔmutY::mutY-R262Q, respectively. No direct-target mutations were identified in Rv or RvΔmutY::mutY, suggesting that perturbed DNA repair facilitates the acquisition of drug resistance-conferring mutations in Mtb (Figure 6c-e & Supplementary File 8).

      To determine whether the better survival of RvΔmutY or RvΔmutY::mutY-R262Q in the guinea pig infection experiment (Figure 8) is due to the accumulation of mutations in the host, we performed WGS of the strains isolated from guinea pig lungs. The analysis revealed that specific genes, such as cobQ1, smc, espI, and valS, were mutated only in RvΔmutY and RvΔmutY::mutY-R262Q but not in Rv or RvΔmutY::mutY. In addition, tcrA and gatA were mutated only in RvΔmutY, whereas rv0746 was mutated exclusively in RvΔmutY::mutY (Figure 8-figure supplement 2). However, we did not observe any direct-target mutations, possibly because the guinea pigs were not treated with antibiotics. These data suggest that continued long-term selection pressure is necessary for the bacilli to acquire such mutations.

      • It would be useful to present more data about the strain relatedness and genome characteristics of the DNA repair mutant strains in the GWAS. For example, the model would suggest that strains carrying DNA repair mutations should have a higher SNP load than control strains. Additionally, it would be helpful to know whether the identified DNA repair pathway mutations come from epidemiologically linked strains in the collection, to deduce whether these events arise repeatedly or reflect a founder effect of a single mutant, since the number of strains carrying each mutation is small.

      We analyzed the genomes of the clinical strains that possess DNA repair gene mutations to determine additional polymorphisms. The number of SNPs in the strains harboring DNA repair mutations and in the drug-susceptible strains appears to be similar; any marginal difference was not statistically significant.

      We agree with the reviewer that these strains might be epidemiologically linked. In the present study, all the strains harboring a mutation in mutY belong to lineage 4. We observed that all the mutY mutation-containing strains were either MDR or pre-XDR, in contrast to the drug-susceptible strains of the same clade.

      • Some of the mutation frequency, survival, and competition data could be strengthened by more experimental replicates. Data in lines 370-372 (mutation frequency), lines 387-388 (survival of strains ex vivo), and line 394 (competition experiment) state: "Two biologically independent experiments were performed. Each experiment was performed in technical triplicates. Data represent one of the two biological experiments." Two biological replicates are insufficient for the phenotypes presented, and all replicates should be included in the analysis. In addition, the definition of "technical triplicates" should be given: does this mean the same culture sampled in triplicate?

      We thank the reviewer for the comment. We performed at least two independent experiments, each with biological triplicates (not technical triplicates); we apologize for describing this incorrectly. We report data from one independent experiment consisting of at least three biological replicates. For the mutation rate analysis, we performed the experiment using six independent colonies. These points are now stated in the Methods and figure legends of the revised manuscript.

      • MutY phenotypes. One caveat to the conclusion that the MutY R262Q mutant is nonfunctional is that expression of the complementing protein was not examined. It would be informative to comment on the location of this mutation in relation to the known structures of MutY proteins. Similarly, a uvrB null strain has a clear UV-sensitivity phenotype in the literature, so a fuller interrogation of UV killing would be informative regarding the A524V mutation.

      We have now included western blot data for both complementation strains (Figure 3-figure supplement 1). We agree with the reviewer that the uvrB null mutant may have a UV-sensitivity phenotype, but we did not perform this experiment in the present study.

      Reviewer #3 (Public Review):

      STRENGTHS

      • This ambitious study is broad in scope, beginning with a bacterial GWAS study and extending all the way to in vivo guinea pig infection models.

      • Numerous reports have attempted to identify Mtb strains with elevated mutation rates, and the results are conflicting. The present study sets out to thoroughly evaluate one such mutation that may produce a mutator phenotype, mutY-Arg262Gln.

      WEAKNESSES

      • While the authors' follow-up experiments with the mutY-Arg262Gln allele are all consistent with the conclusion that this mutation elevates the mutation rate in Mtb and thus could promote the evolution of drug resistance, further work is needed to unambiguously demonstrate this link.

      • The authors highlight five mutations in genes associated with DNA replication and/or repair from their GWAS analysis:

      o dnaA-Arg233Gln: as the authors note in the Discussion, Hicks et al. associate SNPs in dnaA with low-level isoniazid resistance, as a result of lowered katG expression. Since this is unrelated to their focus on DNA repair genes whose mutation could elevate mutation rates, I would consider removing this allele from the Table.

      As suggested, we have removed the dnaA from Table 3.

      o mutY-Arg262Gln: querying publicly available whole genome sequences of clinical Mtb isolates, this SNP appears to be restricted to lineage 4.3 (L4.3). All of these L4.3 strains appear to be drug-resistant. How many times did the mutY-Arg262Gln mutation evolve in the authors' dataset? If there is evidence of homoplastic evolution, this would strengthen their case. If not, it doesn't mean the authors' findings are incorrect, but it does elevate the risk that this mutation could be a passenger (i.e. not driver) mutation. To address this, the authors could attempt to date when the mutY-Arg262Gln mutation arose. If it was before the evolution of drug-resistance-conferring alleles in these L4.3 strains, that is consistent with (but not proof of) a driver mutation. If mutY-Arg262Gln arose after, this is much more consistent with a passenger mutation.

      As pointed out by the reviewer, the mutY-Arg262Gln mutation is restricted to lineage 4. We checked the mutY gene sequence in the strains harboring the mutY-Arg262Gln mutation and in sensitive strains of the same clade. We identified only the reported mutation in the drug-resistant strains, and there was no synonymous mutation that could be used for molecular clock analysis. To ascertain whether it is a passenger or a driver mutation, we performed multiple experiments, which suggest that the identified mutation aids in the acquisition of drug resistance.

      o uvrB-Ala524Val: curiously, we don't see this SNP in our dataset of publicly available whole genome sequences of clinical Mtb isolates (~45,000 genomes).

      We have rechecked this SNP in our dataset. This SNP was present in 87 drug-resistant strains that belong to lineage 2.

      o uvrA-Gln135Lys: this SNP also appears to be restricted to lineage 4.3. Same question as for mutY-Arg262Gln.

      As pointed out by the reviewer, the uvrA-Gln135Lys mutation is restricted to lineage 4. We identified only the reported mutation in the drug-resistant strains, and there was no synonymous mutation that could be used for molecular clock analysis.

      o recF-Gly269Gly: this is a very common mutation; is it unique to lineage 2.2.1? Same question as for mutY-Arg262Gln.

      The recF-Gly269Gly mutation was present in lineage 2 strains. Here too, we identified only the reported mutation in the drug-resistant strains, and there was no synonymous mutation that could be used for molecular clock analysis.

      • The CRYPTIC consortium recently published a number of preprints on biorxiv detailing very large GWAS studies in Mtb. Did any of these reports also associate drug resistance with mutY? If yes, this should be stated. If not, the potential reasons for this discrepancy should be discussed.

      We checked the recently published CRYPTIC consortium article (https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001721#sec012) for mutY-Arg262Gln. We did not find the mutY-Arg262Gln mutation in their analysis; this is due to the different strains used in the two studies. However, we identified the recF-Gly269Gly mutation in their dataset.

      • Based on the authors follow-up studies in vivo, MutY-Arg262Gln is presumed to be a loss-of-function allele. If the authors could convincingly demonstrate this biochemically with recombinant proteins, this would significantly strengthen their case.

      Experiments performed in Msm and Mtb mutant strains suggest that the MutY variant is a loss-of-function allele. We have not performed in vitro assays to confirm this.

      • If the authors are correct and mutY-Arg262Gln strains have elevated mutation rates, presumably there would be evidence of this in the clinical strain sequencing data. Do mutY-Arg262Gln-containing strains have elevated C→G or C→A mutations in their genomes? Presumably such strains would also have a higher number of SNPs than closely related strains that are WT for mutY; is this the case?

      We analyzed the genomes of the clinical strains that possess DNA repair gene mutations to determine additional polymorphisms. The number of SNPs appears to be higher in the strains harboring DNA repair mutations than in the drug-susceptible strains. We also looked for C→T and C→G mutations in the same strains. C→T mutations are more frequent in the strains harboring the mutY variant than in the susceptible strains (Figure 2-figure supplement 6l). However, we could not perform statistical analysis, as the number of strains harboring the mutY variant is limited to eight. Thus, the data suggest that, empirically, strains harboring the mutY variant carry more SNPs elsewhere in the genome and more C→T mutations. We do not state these conclusions strongly in the manuscript because their statistical significance could not be assessed.
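      Mutation-spectrum comparisons such as the C→T and C→G counts above, and the C→A class raised by the reviewer, are commonly made after folding each substitution onto a pyrimidine reference base, because a G→A change on one strand is a C→T change on the other. The following is a minimal Python sketch of that tally, assuming SNPs are available as (reference base, alternative base) pairs on the forward strand; it is an illustration, not the authors' pipeline, and the example calls are hypothetical.

```python
from collections import Counter
from typing import Iterable

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref: str, alt: str) -> str:
    """Fold a single-base substitution onto a pyrimidine (C or T) reference base,
    so that a change and its reverse complement (e.g. G>T and C>A) share one class."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("G", "A"):  # purine reference: report the complementary strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def spectrum(snps: Iterable[tuple[str, str]]) -> Counter:
    """Count SNPs per substitution class (six possible classes)."""
    return Counter(substitution_class(ref, alt) for ref, alt in snps)


if __name__ == "__main__":
    # Hypothetical forward-strand calls from one genome
    calls = [("C", "T"), ("G", "A"), ("G", "T"), ("C", "G"), ("A", "G")]
    print(spectrum(calls))
```

      Comparing such per-genome tallies between mutY-variant and susceptible strains of the same clade would be one way to present the C→T, C→G, and C→A counts side by side.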

      • Although more laborious, mutation rates measured by Luria-Delbrück fluctuation analysis are more accurate than mutation frequencies. I would recommend repeating key experiments with Luria-Delbrück fluctuation analysis. It is also important to report both drug-resistant colony counts and total CFU in these sorts of experiments. Given the clumpy nature of mycobacteria, mutation rates can appear to be artificially elevated due to low total CFU rather than an increase in the number of drug-resistant colonies.

      As suggested, we determined the mutation rate in the presence of isoniazid, rifampicin, and ciprofloxacin (Figure 3g-j). Relative to Rv, the fold increase in the mutation rate for RvΔmutY, RvΔmutY::mutY, and RvΔmutY::mutY-R262Q was 2.90, 0.76, and 3.0, respectively, with isoniazid; 5.62, 1.13, and 5.10 with rifampicin; and 9.14, 1.57, and 8.71 with ciprofloxacin.
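      For readers less familiar with fluctuation analysis, the following is a minimal Python sketch of how a mutation rate can be derived from fluctuation-test counts, using the Ma-Sandri-Sarkar maximum-likelihood estimator (a standard Luria-Delbrück method, not necessarily the one the authors used). It assumes parallel cultures with equal final population size N_t, plating of each whole culture, and no correction for plating efficiency or phenotypic lag; all counts and N_t are hypothetical. The mutation rate is approximately m / N_t, which is why total CFU must be reported, and a fold increase like those above is the ratio of two such rates.

```python
import math
from typing import Sequence


def ld_probabilities(m: float, max_count: int) -> list[float]:
    """P(r mutants) for r = 0..max_count under the Luria-Delbrueck (Lea-Coulson)
    model, computed with the Ma-Sandri-Sarkar recursion."""
    p = [math.exp(-m)]
    for r in range(1, max_count + 1):
        p.append(m / r * sum(p[i] / (r - i + 1) for i in range(r)))
    return p


def log_likelihood(m: float, counts: Sequence[int]) -> float:
    p = ld_probabilities(m, max(counts))
    return sum(math.log(p[c]) for c in counts)


def estimate_m(counts: Sequence[int], lo: float = 1e-3, hi: float = 100.0,
               tol: float = 1e-6) -> float:
    """Maximum-likelihood m (expected mutations per culture), found by
    golden-section search on log(m); assumes a unimodal likelihood."""
    a, b = math.log(lo), math.log(hi)
    g = (math.sqrt(5) - 1) / 2
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if log_likelihood(math.exp(c), counts) > log_likelihood(math.exp(d), counts):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return math.exp((a + b) / 2)


def mutation_rate(counts: Sequence[int], cells_per_culture: float) -> float:
    """Approximate mutations per cell per generation: m / N_t."""
    return estimate_m(counts) / cells_per_culture


if __name__ == "__main__":
    # Hypothetical resistant-colony counts from ten parallel cultures per strain
    wild_type = [0, 1, 0, 3, 1, 0, 2, 0, 1, 5]
    mutant = [2, 6, 1, 9, 3, 0, 14, 4, 2, 7]
    n_t = 1e8  # hypothetical total CFU per culture
    fold = mutation_rate(mutant, n_t) / mutation_rate(wild_type, n_t)
    print(f"fold increase in mutation rate: {fold:.2f}")
```

      Unlike a mutation frequency (resistant colonies divided by total CFU), this estimator is not dominated by the occasional "jackpot" culture, which is the main reason fluctuation analysis is preferred. Dedicated tools such as the rSalvador R package implement this and related estimators with confidence intervals.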

      • Figure 4 would appear to be measuring drug tolerance, not resistance? Are the elevated CFU in the presence of drugs in the mutY-Arg262Gln strain due to an increase in the number of drug-resistant or drug-sensitive bacteria? This could be assessed by quantifying the resulting CFU in the presence or absence of the indicated drugs.

      To ascertain whether the better survival is due to the acquisition of mutations in the direct targets of antibiotics or to drug tolerance, we performed WGS of the strains from the ex vivo evolution experiment (Figure 5). Genomic DNA extracted from ten independent colonies (grown in vitro) was mixed in equal proportions prior to library preparation. Only SNPs present in >20% of reads were retained for the analysis. Sequencing of Rv grown in vitro showed that the laboratory strain had accumulated 100 SNPs relative to the reference strain; the sequence of the Rv laboratory strain was therefore used as the reference for subsequent analysis. WGS data for RvΔmutY, RvΔmutY::mutY, and RvΔmutY::mutY-R262Q grown in vitro showed no mutations in the antibiotic target genes. Similarly, after the final round of ex vivo selection in the presence or absence of antibiotics, ten independent colonies were picked from each set of 7H11-OADC plates for WGS. In the absence of antibiotics, no direct-target mutations were identified in the ex vivo passaged strains (Figure 6a & e). In the presence of isoniazid, we found katG mutations (Ser315Thr or Ser315Ile) in Rv and RvΔmutY but not in RvΔmutY::mutY or RvΔmutY::mutY-R262Q (Figure 6b & e). These findings are congruent with the ex vivo evolution CFU analysis, in which we did not observe a significant increase in the survival of RvΔmutY or RvΔmutY::mutY-R262Q in the presence of isoniazid (Figure 5). In the presence of ciprofloxacin and rifampicin, direct-target mutations were identified in gyrA and rpoB (Figure 6c-e): Asp94Glu/Asp94Gly mutations in gyrA and His445Tyr/Ser450Leu mutations in rpoB of RvΔmutY and RvΔmutY::mutY-R262Q, respectively. No direct-target mutations were identified in Rv or RvΔmutY::mutY, suggesting that perturbed DNA repair facilitates the acquisition of drug resistance-conferring mutations in Mtb (Figure 6c-e & Supplementary File 8).

      To determine whether the better survival of RvΔmutY or RvΔmutY::mutY-R262Q in the guinea pig infection experiment (Figure 8) is due to the accumulation of mutations in the host, we performed WGS of the strains isolated from guinea pig lungs. The analysis revealed that specific genes, such as cobQ1, smc, espI, and valS, were mutated only in RvΔmutY and RvΔmutY::mutY-R262Q but not in Rv or RvΔmutY::mutY. In addition, tcrA and gatA were mutated only in RvΔmutY, whereas rv0746 was mutated exclusively in RvΔmutY::mutY (Figure 8-figure supplement 2). However, we did not observe any direct-target mutations, possibly because the guinea pigs were not treated with antibiotics. These data suggest that continued long-term selection pressure is necessary for the bacilli to acquire such mutations.

    1. Author Response

      Reviewer #1 (Public Review):

      The data support the claims, and the manuscript does not have significant weaknesses in its present form. Key strengths of the paper include using a creative HR-based reporter system combining different inducible DSB positions along a chromosome arm and testing plasmid-based and chromosomal donor sequences. Combining that system with the visualization of specific chromosomal sites via microscopy is powerful. Overall, this work will constitute a timely and helpful contribution to the field of DSB/genome mobility in DNA repair, especially in yeast, and may inform similar mechanisms in other organisms. Importantly, this study also reconciles some of the apparent contradictions in the field.

      We thank the reviewer for these positive comments on the quality of the THRIV system and on its value in helping us understand global mobility and reconcile the different studies in the field. The possibility that these mobility responses also exist in other organisms is attractive, because they could be a way to anticipate the position of the damage in the genome and its possible outcome.

      Reviewer #2 (Public Review):

      The authors are clarifying the role of global mobility in homologous recombination (HR). Global mobility is positively correlated with recombinant product formation in some reports. However, some studies argue the contrary and report that global mobility is not essential for HR. To characterize the role of global chromatin mobility during HR, the authors set up a system in haploid yeast cells that allows simultaneous tracking of HR at the single-cell level and the analysis of different DSB induction positions. By moving the position of the DSB within their system, the authors postulate that the chromosomal conformation surrounding a DNA break affects the global mobility response. Finally, the authors assessed the contributions of H2A(X) phosphorylation, checkpoint progression and Rad51 to the mobility response.

      One of the strengths of the manuscript is the development of "THRIV" as an efficient method for tracking homologous recombination in vivo. The authors take advantage of the power of yeast genetics and use gene deletions as well as mutations to test the contribution of H2A(X) phosphorylation, checkpoint progression and Rad51 to the mobility response in their THRIV system.

      A major weakness in the manuscript is the lack of a marker to indicate that DSB formation has occurred (or is occurring). Although there is 80% I-SceI cutting at 6 hours, around 20% of the cells are uncut and cannot be distinguished from those that are cut (or have already been repaired). Thus, the MSD analysis is done blind with respect to which cells are actually undergoing DSB repair.

      The authors clearly outlined their aims and have substantial evidence to support their conclusions. They discovered new features of global mobility that may clear up some of the controversies in the field. They overinterpreted some of their observations, but these criticisms can be easily addressed.

      The authors addressed conflicting results concerning the importance of global mobility to HR, and their results aid in reconciling some of the controversies in the field. A key strength of this manuscript is the analysis of global mobility in response to breaks at different locations within chromosomes. They identified two types of DSB-induced global chromatin mobility involved in HR and postulate that they differ based on the position of the DSB. For example, DSBs close to the centromere exhibit increased global mobility that is not essential for repair and depends solely on H2A(X) phosphorylation. However, if the DSB is far away from the centromere, then global mobility is essential for HR and is dependent on H2A(X) phosphorylation, checkpoint progression as well as the Rad51 recombinase.

      The Bloom lab had previously identified differences in mobility based on the position of the tracked site. However, in the study reported here, the mobility response is analyzed after inducing DSBs located at different positions along the chromosome.

      They also addressed the question of the importance of the Rad51 protein in increased global mobility in haploid cells. Previous studies used DNA damaging agents that induce DSBs randomly throughout the genome, where it would have been rare to induce DSBs near the centromere. In the studies reported in this manuscript, they find no increase in global mobility in a rad51∆ background for breaks induced near the centromere (proximal), but find that breaks induced near the telomeres (distal), are dependent on both gamma-H2A(X) spreading and the Rad51 recombinase.

      We thank the referee for his constructive comments on the strength of our system in accurately determining the impact of a DSB according to its position in the genome. The issue of damaged cells that could not be identified is very important and exciting, because it confronts our data with the question of biological heterogeneity. We provide evidence for the consistency of our findings despite our inability to distinguish damaged from undamaged cells.
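      Reviewer #2's concern about uncut cells in the MSD analysis can be made concrete with a minimal sketch of how a time-averaged mean squared displacement (MSD) is computed for one tracked locus and then pooled across cells. This is an illustration, not the authors' analysis code; it assumes one (x, y) focus position per frame at a constant frame interval, and omits 3D positions and localization-error correction. If per-cell curves are averaged with equal weight, a population containing ~20% uncut cells yields a mixture weighted roughly 80:20 toward cut cells, which dilutes, but does not by itself abolish, a DSB-induced increase.

```python
from typing import Sequence

Point = tuple[float, float]


def time_averaged_msd(track: Sequence[Point], max_lag: int) -> list[float]:
    """Mean squared displacement for lags 1..max_lag (in frames) of one 2D track."""
    if not 1 <= max_lag < len(track):
        raise ValueError("max_lag must be between 1 and len(track) - 1")
    msd = []
    for lag in range(1, max_lag + 1):
        squared = [
            (track[i + lag][0] - track[i][0]) ** 2 + (track[i + lag][1] - track[i][1]) ** 2
            for i in range(len(track) - lag)
        ]
        msd.append(sum(squared) / len(squared))
    return msd


def population_msd(tracks: Sequence[Sequence[Point]], max_lag: int) -> list[float]:
    """Equal-weight average of per-cell MSD curves (cut and uncut cells pooled)."""
    curves = [time_averaged_msd(t, max_lag) for t in tracks]
    return [sum(values) / len(values) for values in zip(*curves)]


if __name__ == "__main__":
    moving = [(float(i), 0.0) for i in range(5)]  # hypothetical mobile locus
    still = [(0.0, 0.0)] * 5                     # hypothetical immobile locus
    print(time_averaged_msd(moving, 2), population_msd([moving, still], 2))
```

      Keeping the per-cell curves, rather than only the pooled average, would allow cells to be stratified later if a marker of cutting became available.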

      Reviewer #3 (Public Review):

      In this study, Garcia Fernandez et al. employ a variety of genetic constructs to define the mechanism underlying the global chromatin mobility elicited in response to a single DNA double-strand break (DSB). Such local and global chromatin mobility increases were described a decade ago by the Gasser and Rothstein laboratories, and a number of determinants have been identified: one epistasis group results in H2A-S129 phosphorylation via Rad9 and Mec1 activation. The mechanism is thought to be due to chromatin rigidification (Herbert 2017; Miné-Hattab 2017) or general eviction of histones (Cheblal 2020). More enigmatically, the global chromatin mobility increase also depends on Rad51, a central recombination protein downstream of checkpoint activation (Smith & Rothstein 2017), which is also required for local DSB mobility (Dion .. Gasser 2012). The authors set out to address this difficulty in the field.

      A premise of their study is the convergence of two types of observations: First, the H2A phosphorylation ChIP profile matches that of Rad51, with both spreading in trans on other chromosomes at the level of centromeres when a DSB occurs in the vicinity of one of them (Renkawitz 2014). Second, global mobility depends on H2A phosphorylation and on Rad51 (their previous study Herbert 2017). They thus address whether the Rad51-ssDNA filament (and associated proteins) marks the chromatin engaged during the homology search. They found that the extent of the mobility depends on the residency time of the filament in a particular genomic and nuclear region, which can be induced at an initially distant trans site by providing a region of homology. Unfortunately, these findings are not clearly apparent from the title and the abstract, and in fact somewhat misrepresented in the manuscript, which would call for a rewrite (see points below).

      The main goal of our study was to understand the role of global mobility in repair by homologous recombination, depending on the location of the damage. We found distinct global mobility mechanisms, in particular regarding the involvement of the Rad51 nucleofilament, depending on whether or not the DSB was pericentromeric. It is thus likely that when the DSB is far from the pericentromere, the residence time of the Rad51 nucleofilament with the donor has an impact on global mobility. Although our experiments were not designed to directly address the residence time of the nucleofilament, we now discuss the causes and consequences of global mobility in more detail.

      To this end, they induce the formation of a site-specific DSB in either of two regions, a centromere-proximal region and a telomere-proximal region, and measure the mobility of an undamaged site near the centromere on another chromosome (with a LacO-LacI-GFP system). This system reveals that only the centromere-proximal DSB induces the mobility of the centromere-proximal undamaged site, in a Rad9- and Rad51-independent manner. Providing a homologous donor in the vicinity of the LacO array (albeit in trans) restores its mobility when the DSB is located in a subtelomeric region, in a Rad9- and Rad51-dependent fashion. These genetic requirements are the same as those described for local DSB mobility (Dion & Gasser 2012), drawing a link between the two types of mobility which, to my knowledge, has not been described before. The authors should focus their message (too scattered in the current manuscript) on these key findings and on the diffusive "painting" model, in which the canvas is H2A, the moving paintbrush Mec1, and the hand the Rad51-ssDNA filament whose movement depends on Rad9. In the absence of Rad51-Rad9 the hand stays still, only decorating H2A in its immediate environment. The amount of paint deposited depends on the residency time of the Rad51-ssDNA-Mec1 filament in a given nuclear region. This synthesis is in agreement with the data presented and contrasts with their proposal that "two types of global mobility" exist.

      The brush model is very useful in explaining the distal mobility, which is indeed linked to the genetic requirements of local mobility, but it is also helpful to think of a different model than the brush model when pericentromeric damage occurs. To stay with the painting analogy, this model would be similar to the pouring technique, in which oil paint is deposited on water and spreads in a multidirectional manner. It is likely that Mec1 or Tel1 are the factors responsible for this spreading pattern. We therefore propose to maintain the notion of two distinct types of mobility. Without going into pictorial techniques in the text, we have attempted to clarify these two models in the manuscript.

      The rest of the manuscript attempts to define a role in DSB repair for this phospho-H2A-dependent mobility, using a fluorescence recovery assay upon DSB repair. They correlate a defect in the centromere-proximal mobility (in the rad9 or h2a-s129a mutant) when a DSB is distantly induced in the subtelomere with a defect in repairing the DSB. Repair efficiency is not affected by these mutations when the donor is initially located close to the DSB site. This part is less convincing, as repair failure specifically at a distant donor in the rad9 and H2A-S129A mutants may result from chromatin-related defects other than mobility (e.g. affecting homology sampling, DNA strand invasion, D-loop extension, D-loop disruption, etc.), which could be partially alleviated by repeated DSB-donor encounters when the two are spatially close. In fact, suggesting that undamaged site mobility is required for the early step of the homology search directly contradicts the fact that the centromere-proximal mobility induced by a subtelomeric DSB depends on the presence of a donor near the centromere: mobility is thus a product of homology identification and increased Rad51-ssDNA filament residency in the vicinity of the centromere, and so lies downstream of the homology search. This is a major pitfall in their interpretation and model.

      We thank the referee for helping to clarify the question of the cause and consequence of global mobility. As he pointed out, the fact that a donor is required to observe both H2A phosphorylation and distal mobility implicates the recombination process itself, as well as the residence time of the Rad51 nucleofilament, in γ-H2A(X) spreading, and indicates that recombination would be the cause of distal mobility. In contrast, the fact that proximal mobility can exist independently of homologous recombination suggests that, in this particular configuration, HR would then be a consequence of proximal mobility.

      In conclusion, I think the data presented are of importance, as they identify a link between local and global chromatin mobility. The authors should rewrite their manuscript and reorganize the figures to focus on the painter model that their data support. I propose experiments that will help bolster the manuscript conclusions.

      1) Attempt dual-color tracking of the DSB (i.e. Rad52-mCherry or Ddc1-mCherry) and the donor site, and track MSD as a function of proximity between the DSB and the Lac array (with DSB +/-dCen). The expectation is that only upon contact (or after getting in close range) should the MSD at the centromere-proximal LacO array increase with a DSB at a subtelomere. Furthermore, this approach will help distinguish MSDs in cells bearing a DSB (Rad52 foci) from undamaged ones (no Rad52 foci)(see Mine-Hattab & Rothstein 2012). This would help overcome the inefficient DSB induction of their system (less than 50% at 1 hr post-galactose addition, and reaching 80% at 6 hr). For the reader to have a better appreciation of the data distribution, replace the whisker plots of MSD at 10 seconds with either scatter dot plot or violin plots, whichever conveys most clearly the distribution of the data: indeed, a bimodal distribution is expected in the current data, with undamaged cells having lower, and damaged cells having higher MSDs.

      The reviewer raises two points here.

      The first point concerns the residence time of the Rad51 filament with the donor when a subtelomeric DSB occurs. Measuring MSDs as a function of the distance between the donor and Rad52-mCherry (or Ddc1-mCherry) would allow us to decide whether global mobility is a cause or a consequence. If mobility is the consequence of (stochastic) contact, leading to more efficient homologous recombination, we would see an increase in MSDs only when the distance between donor and filament is small. Conversely, if global mobility is the cause of contact, the increase in mobility would be visible even when the distance between donor and filament is large. This would require a labelling system with three different fluorophores: one for global mobility, one for the donor and one for following the filament. This triple labelling remains to be developed.
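      If such dual-color tracks became available, the analysis could look roughly like the sketch below. It is hypothetical: the function names, the frame interval, the 10-second lag and the distance cutoff separating "near" from "far" cells are illustrative assumptions, and the MSD here is the time-averaged squared displacement of a single 2D track.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) position in micrometres


def msd_at_lag(track: Sequence[Point], frame_interval_s: float, lag_s: float) -> float:
    """Time-averaged mean squared displacement of one 2D track at a single time lag."""
    lag_frames = round(lag_s / frame_interval_s)
    if lag_frames < 1 or lag_frames >= len(track):
        raise ValueError("lag must span at least one frame and fit inside the track")
    squared = []
    for i in range(len(track) - lag_frames):
        (x0, y0), (x1, y1) = track[i], track[i + lag_frames]
        squared.append((x1 - x0) ** 2 + (y1 - y0) ** 2)
    return sum(squared) / len(squared)


def mean_msd_by_proximity(cells: Sequence[Tuple[float, float]],
                          cutoff_um: float) -> Tuple[float, float]:
    """Average per-cell MSDs separately for cells whose DSB-to-array distance
    (first tuple element) is at or below, or above, a hypothetical cutoff."""
    near = [msd for dist, msd in cells if dist <= cutoff_um]
    far = [msd for dist, msd in cells if dist > cutoff_um]

    def average(values: List[float]) -> float:
        return sum(values) / len(values) if values else float("nan")

    return average(near), average(far)
```

      Comparing the "near" and "far" averages is the cause-versus-consequence test described above: an MSD increase confined to the near group would point to contact-driven mobility.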

      The second point concerns the important question of population heterogeneity, a central challenge in biology. Here we wish to distinguish between undamaged and damaged cells. Even if damaged cells had been selected, this would not entirely resolve the inherent cell-to-cell variation: at a given time, a damaged cell may move little and, conversely, an undamaged cell may move more. The question of heterogeneity is therefore important and the subject of intense research that goes beyond the scope of our work (Altschuler and Wu, 2010). However, to begin to clarify whether a bias could exist when considering a mixed population (20% undamaged and 80% damaged), we analyzed MSDs using a scatter plot. We considered the two cell populations in which damage is best controlled: i) the red population, which we know has been repaired and, importantly, has lost the cut site and will not be cut again (undamaged-only population), and ii) the white population, blocked in G2/M because it is damaged and not repaired (damaged-only population). These two populations show very significant differences in their median MSDs. We artificially mixed the MSD values obtained from these two populations at a ratio of 20% undamaged-only cells to 80% damaged-only cells. We observed that the mean MSDs of the damaged-only and undamaged-only cells were significantly different. Yet, the mean MSD of damaged-only cells was not statistically different from the mean MSD of the 20%-80% mixed population. Thus, the conclusions based on the average MSDs of all cells remain consistent.

      Scatter plot showing the MSD at 10 seconds of the damaged-only population (in white), the repaired-only population (in red), and the 20%-80% mixed population.
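      The artificial mixing described above can be sketched as follows. The MSD pools are synthetic placeholders, and the permutation test on the difference of means is a swapped-in choice for illustration; it is not necessarily the statistical test used in the manuscript.

```python
import random
from statistics import fmean
from typing import List, Sequence


def mix_populations(undamaged: Sequence[float], damaged: Sequence[float],
                    frac_undamaged: float, n: int, rng: random.Random) -> List[float]:
    """Resample n MSD values: a fraction from the undamaged-only pool, the rest from the damaged-only pool."""
    n_undamaged = round(frac_undamaged * n)
    return (rng.choices(undamaged, k=n_undamaged)
            + rng.choices(damaged, k=n - n_undamaged))


def permutation_p_value(a: Sequence[float], b: Sequence[float],
                        n_perm: int, rng: random.Random) -> float:
    """Two-sided permutation test on the absolute difference of means."""
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled values
        if abs(fmean(pooled[:len(a)]) - fmean(pooled[len(a):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p > 0
```

      With real data, one would compare the damaged-only pool against the 20%-80% mixture in the same way; a mixture dominated by damaged cells is expected to have a mean close to the damaged-only mean.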

      2) Perform the phospho-H2A ChIP-qPCR in the C and S strains in the absence of Rad51 and Rad9, to strengthen the painter model.

      ChIP experiments in mutant backgrounds, as well as phosphorylation/dephosphorylation kinetics, would corroborate the mobility data described here, but are beyond the scope of this manuscript. Yet, a phospho-H2A ChIP experiment was performed in a Δrad51 mutant in Renkawitz et al. 2013. In that case, γH2A propagation was restricted to the region around the DSB, corroborating both the requirement for Rad51 in distal mobility and the lack of requirement for Rad51 in proximal mobility.

      3) Their data at least partly run against previously published results, or fail to account for them. For instance, it is hard to see how their model (or the painter model) could explain the constitutively activated global mobility increase observed by Smith .. Rothstein 2018 in a rad51 rad52 mutant. Furthermore, the Gasser lab linked the increased chromatin mobility to a general loss of histones genome-wide, which would be inconsistent with the more localized mechanism proposed here. Do they represent an independent mechanism? These conflicting observations need to be discussed in detail.

      Apart from the fact that the mechanisms in place in a haploid or a diploid cell are not necessarily comparable, it is not clear to us that our data are inconsistent with that of Smith et al. (Smith et al., 2018). Indeed, it is not known by which mechanisms the increase in global mobility is constitutively activated in a Δrad51 Δrad52 mutant. But according to their hypothesis the induction of a checkpoint is likely and so is the phosphorylation of H2A. It would be interesting to verify γH2A in such a context. This question is now mentioned in the main text.

      Concerning histone loss, it appears to differ depending on the number of DSBs. Upon multiple DNA damage following genotoxic treatment with Zeocin, Susan Gasser's group has clearly established that nucleosome loss occurs (Cheblal et al., 2020; Hauer et al., 2017). Nucleosome loss, like H2A phosphorylation as we have shown (Garcia Fernandez et al., 2021; Herbert et al., 2017), leads to increased global mobility. The state of chromatin following these histone losses or modifications is not yet fully understood, but the two could coexist. In the case of a single DSB induced by HO, it is the local mobility of the MAT locus that is examined (Fig. 3B in Cheblal et al., 2020). In this case, the increase in mobility is indeed dependent on Arp8, which controls histone degradation, and correlates with a polymer pattern consistent with normal chromatin. It is likely that histone degradation occurs locally when a single DSB occurs. Whether histone loss occurs genome-wide remains an open question. If histone eviction nevertheless occurred globally upon a single DSB, both types of modification could be possible. This aspect is now mentioned in the Discussion.

    1. Author Response:

      Reviewer #3 (Public Review):

      INaR is related to an alternative inactivation mode of voltage-activated sodium channels. It was suggested that, in addition to the canonical fast inactivation pathway, an intracellular charged particle blocks the sodium channel alpha subunit from the intracellular side. Putative particles identified were the sodium channel beta4 subunit and fibroblast growth factor 14. However, abolishing the expression of either protein does not eliminate INaR. Therefore, as recently suggested by several authors, it is conceivable that INaR is not mediated by a particle-driven mechanism at all. Instead, these and other proteins might bind to the pore-forming alpha subunit and endow it with an alternative inactivation pathway, as envisioned in this paper by the authors.

      The main experimental findings were: (1) The amplitude of INaR is independent of the voltage of the preceding step. (2) The peak amplitudes of INaR depend on the duration of the depolarizing step but are independent of the sodium driving force. (3) INaT and INaR are differentially sensitive to recovery from inactivation. Based on their experimental data, the authors put forward a kinetic scheme that was fitted to their voltage-clamp patch-clamp recordings of freshly isolated Purkinje cells. The kinetic model proposed here has one open state and three inactivated states: two states related to fast inactivation (IF1, IF2) and one state related to a slower process (IS). Notably, IS and IF are not linked directly in the kinetic scheme.
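      To make the described topology concrete, here is a minimal sketch that integrates a five-state version of the scheme (one closed state C, the open state O, IF1, IF2 and IS) with forward Euler. The single closed state, the exact set of transitions and every rate constant are hypothetical placeholders, not the fitted model; consistent with the description, IF1/IF2 and IS are not directly connected.

```python
from typing import Dict, Tuple

# Hypothetical rate constants (1/ms) at one fixed depolarized voltage; illustrative only.
# There is no IF1/IF2 <-> IS transition, mirroring the scheme described above.
RATES: Dict[Tuple[str, str], float] = {
    ("C", "O"): 2.0, ("O", "C"): 0.1,
    ("O", "IF1"): 1.0, ("IF1", "O"): 0.01,
    ("IF1", "IF2"): 0.2, ("IF2", "IF1"): 0.05,
    ("O", "IS"): 0.1, ("IS", "O"): 0.01,
    ("C", "IS"): 0.05, ("IS", "C"): 0.01,  # closed-state entry into IS
}


def simulate(rates: Dict[Tuple[str, str], float], p0: Dict[str, float],
             dt_ms: float, n_steps: int) -> Dict[str, float]:
    """Forward-Euler integration of the master equation for a continuous-time Markov chain."""
    p = dict(p0)
    for _ in range(n_steps):
        dp = {state: 0.0 for state in p}
        for (src, dst), k in rates.items():
            flux = k * p[src] * dt_ms  # probability moved from src to dst in this step
            dp[src] -= flux
            dp[dst] += flux
        for state in p:
            p[state] += dp[state]
    return p
```

      The open probability P_O obtained this way is what a macroscopic current would scale with, for example I = g_max * P_O * (V - E_Na) under an ohmic assumption. The time step must stay small relative to the fastest rate for forward Euler to remain stable.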

      In my humble opinion, the proposed kinetic model fails to explain important experimental aspects and falls short of being related to the molecular machinery of sodium channels, as outlined below. Still, it is high time to advance the concepts of INaR. The new experimental findings of the authors are important in this respect, and some ideas of the new model might be integrated into future kinetic schemes. In addition, the framework of INaR is not easy to get hold of, given the many experimental findings in the literature. Likely, my review also falls short in some aspects. Discussion is much needed and appreciated.

      INaT & INaR decay: The authors stated that the decay speeds of INaT and INaR are different and hence that different mechanisms are involved. However, at a given voltage (-45 mV) they have nicely illustrated (Fig. 2D and, in the simulation, Fig. 3H) that this is not the case. This statement is also not compatible with the Markov model used, because (at a given voltage) the decay of both current identities proceeds from the same open state. Apparent inactivation time constants might be different, though, due to the transition to the open state.

      We apologize that the language used was confusing. Our suggestion that there is more than one pathway for inactivation (from an open/conducting state) is based on the observation that the decay of INaT is biexponential at steady-state voltages. In the revised manuscript, we point out (lines 546-549) that, at some voltages, the slower of the two decay time constants (of INaT) is identical to the time constant of INaR decay. We also discuss how this observation was previously interpreted (Raman and Bean, 2001).
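      In equation form, the observation can be written as below. The symbols A_f, A_s, τ_f, τ_s and τ_R are introduced here for illustration only and assume that the decay phase of INaR is fit with a single exponential.

```latex
\begin{aligned}
I_{\mathrm{NaT}}(t) &= A_f\, e^{-t/\tau_f} + A_s\, e^{-t/\tau_s}, \qquad \tau_f < \tau_s,\\
I_{\mathrm{NaR}}(t) &\propto e^{-t/\tau_R} \quad \text{(decay phase)},\\
\tau_s(V) &= \tau_R(V) \quad \text{at some test voltages } V.
\end{aligned}
```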

      Accumulation in the IS state after INaT inactivation in IF1 and IF2 has to proceed through closed states. How is this compatible with current NaV models? The authors have addressed this issue in the discussion. The arguments they have brought forward are not convincing to me, since toxins and mutations grossly impair channel function.

      Thank you for this comment. We would like to point out that, in our Markov model, Nav channels may accumulate in IS through either the closed state or open state. This requires, of course, that Nav channels can recover from inactivation prior to deactivation. While we agree that toxins and mutations can grossly impair channel function, we think these studies remain crucial in revealing the potential gating mechanisms of Nav channel pore-forming subunits, and how these mechanisms may vary across cell types that express different combinations of accessory proteins.

      Fast inactivation - parallel inactivation pathways: Related to the comment above, the motivation to introduce a second fast-inactivated state IF2 is not clear. Using three states for inactivation would imply three inactivation time constants (O->IF1, IF1->IF2, O->IS), which are indeed partially visible in the simulation (Fig. 3). However, experimental data on INaT inactivation seldom require more than one time constant for fast inactivation. Importantly, the authors do not provide data on INaT inactivation of the model in Fig. 3. Fast inactivation is mapped to the binding of the IFM particle. In this model, at slightly negative potentials IF1 and IF2 reverse from absorbing states to dissipating states. How is this compatible with the IFM mechanism? Additionally, the statements in the Discussion are not helpful: either a second time constant is required for IF (two distinct states, with two time constants) or it is not.

      We thank this Reviewer for this comment. We tried to develop the model based on previous data on Nav channel inactivation. Indeed, much experimental data exists for the fast inactivation pathway (O -> IF1). As we noted in the Discussion, without the inclusion of the IF2 state we were unable to fully reproduce our experimental data, which led us to add the IF2 state. As with all model development, we balanced the need to faithfully reproduce the experimental data with efforts to limit the complexity of the model structure. In addition, as noted in the Methods section, our routine is an automatic parameter optimization routine that seeks to minimize the error between simulation and experiments. We can never be sure that we have found the global minimum, nor can we exclude that the optimization got stuck in a local minimum when simulating without IF2. In other words, there may be a parameter set that sufficiently fits the data without inclusion of IF2, but we were unable to find it. As a safeguard against local minima, we used multiple starts of the optimization routine with different initial parameter sets. In each case, we were unable to find an acceptable parameter set.
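      The multistart safeguard described above can be sketched as follows. This is a generic illustration, not the authors' optimization routine: the local optimizer is a simple compass (pattern) search, and the objective is a toy double-well function standing in for the simulation-versus-experiment error.

```python
import random
from typing import Callable, List, Sequence, Tuple

Objective = Callable[[List[float]], float]


def compass_search(f: Objective, x0: Sequence[float], step: float = 1.0,
                   tol: float = 1e-6, max_iter: int = 10_000) -> Tuple[List[float], float]:
    """Derivative-free local minimizer: probe +/- step along each coordinate, halve the step on failure."""
    x = list(x0)
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                trial = list(x)
                trial[i] += sign * step
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
                    break
            if improved:
                break
        if not improved:
            step /= 2.0
    return x, fx


def multistart(f: Objective, bounds: Sequence[Tuple[float, float]],
               n_starts: int, seed: int = 0) -> Tuple[List[float], float]:
    """Run the local search from several random initial parameter sets and keep the best result."""
    rng = random.Random(seed)
    best_x: List[float] = []
    best_f = float("inf")
    for _ in range(n_starts):
        x0 = [rng.uniform(lo, hi) for lo, hi in bounds]
        x, fx = compass_search(f, x0)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


def double_well(x: List[float]) -> float:
    """Toy objective: local minimum near x = +1.97, global minimum near x = -2.03."""
    return (x[0] ** 2 - 4.0) ** 2 + x[0]
```

      A single start at x = 3 stays in the right-hand (local) well, whereas ten random starts in [-4, 4] recover the global minimum. Multiple starts reduce, but cannot eliminate, the risk of missing a better parameter set, which is the caveat stated above.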

      We agree with this Reviewer that at slightly negative potentials (compared to strong depolarizations), channels exit the IF1 state at different rates, although we would point out that channels dissipate from the IF1 state (accumulating into IS1) under both conditions (see Figure 8B-C). This requires the binding and unbinding of the IFM motif to occur with some voltage sensitivity. We believe this to be a possibility in light of evidence suggesting that IFM binding (and fast inactivation) is an allosteric effect (Yan et al., 2017) and evidence showing that mutations in the pore-lining S6 segments can shift the voltage dependence of fast inactivation without correlated shifts in the voltage dependence of activation (Cervenka et al., 2018). However, it remains unclear how voltage sensing in the Nav channel interacts with fast- and slow-inactivation processes.

      Due to space constraints in Figure 3, we did not show a plot of INaT voltage dependence. However, below, please find the experimental data (points), and simulated (line) INaT in our model.

      Differential recovery of INaT & INaR: Different kinetics for INaT and INaR are a very interesting finding. In my opinion, these data are not compatible with the proposed Markov model (and the authors do not provide data on the simulation). If INaT1 and INaT2 (Fig. 5 A) have the same amplitude, the occupancy of the open state must be the same. I think there is no way to proceed differentially to the open state of INaR in subsequent steps unless, e.g., slow inactivated states are introduced.

      Thank you for bringing up this important point. The differential recovery of INaT and INaR indicates there are distinct Nav channel populations underlying the Nav currents in Purkinje neurons. We make this point on lines 632-635 of the revised manuscript. Because our Markov model is used to simulate a single channel population, we do not expect the model to reproduce the results shown in Figure 5. We have now added this point to the Discussion section on lines 637-640.

      Kinetic scheme: Comparison with the Raman-Bean model is a bit unfair unless the parameters are fitted to the same dataset used in this study. However, the authors have an important point in stating that this model could not reproduce all aspects of INaR. A more detailed discussion (and maybe analysis) of the states required for the models would be ideal, including recent literature (e.g., J Physiol. 2020 Jan;598(2):381-40). Could the Raman-Bean model perform better if an additional inactivated state were introduced? Are alternative connections possible in the proposed model? How ambiguous is the model? Given my statements above, is a second open state required? Finally, a better link between the introduced states and NaV structure-function relationships would be beneficial.

      These are all excellent points. We absolutely agree; it was/is not our intention to “prove” that the Raman-Bean model does not fit our dataset (as you mention, with proper refinement of the parameters, some of the data may be well fit). In fact, qualitatively we found the Raman-Bean model quite consistent with our dataset (which is an excellent validation of both the model, and our data). It was our intention to show (in Figure 7) that there is good agreement between the Raman-Bean model and our experimental data for steady state inactivation (C), availability (D), and recovery from inactivation (E). While we find the magnitude of the resurgent current (F) to be markedly different than the Raman-Bean data, we now note this to likely be due to the large differences in the extracellular Na+ concentrations used in voltage-clamp experiments (lines 440-444). Our models, however, specifically differ in our parallel fast and slow inactivation pathways (Figure 7H). As seen in the Raman-Bean model, in response to a prolonged depolarizing holding potential, there is negligible inactivation, as the OB state remains absorbent until the channel is repolarized. This is primarily because the channel must transit through the Open state on repolarization. We find distinctly different behavior in our data. As seen in the experimental data shown in 7H, despite a prolonged depolarization, Nav channels begin to inactivate and accumulate in the slow inactivated state without prerequisite channel opening. This behavior is impossible to fit in the Raman-Bean model, given the topological constraint of the model requiring a single pathway through the open state from the OB state.

      To that point, it is also unlikely that the addition of inactivated states to the Raman-Bean model would help fit this new dataset. Indeed, the Raman-Bean model contains 7 inactivated states. If there were a connection OB -> I6, it is possible that direct inactivation (bypassing the O state) may help. Again, however, it is not our intention to discredit the Raman-Bean model, nor is it our intention to improve the Raman-Bean model. With new datasets, a fresh look at model topology was undertaken, which is how we developed our proposed model.
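      The topological constraint discussed in the two preceding paragraphs can be checked mechanically with a graph search. The two state graphs below are deliberately simplified, hypothetical stand-ins (one closed state, one inactivated state in the blocked-state scheme), not the full Raman-Bean or proposed models; the question asked is whether an inactivated state can be reached from a starting state without passing through O.

```python
from collections import deque
from typing import Dict, List, Set


def reachable_avoiding(graph: Dict[str, List[str]], start: str,
                       targets: Set[str], forbidden: str) -> bool:
    """Breadth-first search: can any target state be reached from start without entering `forbidden`?"""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in targets:
            return True
        for nxt in graph.get(state, []):
            if nxt != forbidden and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Simplified, hypothetical topologies (neighbours listed for reversible transitions).
BLOCKED_LIKE = {
    "C": ["O", "I"],
    "O": ["C", "OB", "I"],
    "OB": ["O"],  # the blocked state connects only to O
    "I": ["C", "O"],
}
PARALLEL_LIKE = {
    "C": ["O", "IS"],  # closed-state entry into slow inactivation
    "O": ["C", "IF1", "IS"],
    "IF1": ["O", "IF2"],
    "IF2": ["IF1"],
    "IS": ["C", "O"],
}
```

      In the blocked-state graph, a channel sitting in OB cannot reach I without re-entering O, whereas in the parallel graph IS is reachable from C directly; in the same graph, IF2 cannot reach IS without passing through O because the fast and slow inactivated states are not linked.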

      This Reviewer astutely points out a known limitation of Markov (state-chain) modeling: it is impossible to establish the uniqueness, or ambiguity, of a model (both in its parameters and in its topology). Following the results of Menon et al. 2009 (PNAS vol. 106 / #39 / 16829 – 16834), who used a state-mutating genetic algorithm to vary the topologies of a Markov model, our group (Mangold et al. 2021, PLoS Comp Bio) recently published an algorithm to enumerate all distinct model structures using rooted graph theory (e.g. all possible combinations of models rooted around a single open state). What we found (which is not entirely surprising) is that many model structures and parameter sets adequately fit certain datasets (e.g., cardiac Nav channels).

      Therefore, the goal is never to find the model (indeed we don’t propose that we have done so), but rather to find a model with acceptable fits to the data and then use that model to hypothesize why that model structure works, as well as to hypothesize higher dimensional dynamics. We make these points in the revised manuscript (lines 591-597).
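      To illustrate why many candidate topologies exist even for small schemes, the sketch below brute-forces the number of connected, undirected state graphs on n labeled states. This is not the rooted-graph enumeration of Mangold et al. 2021: it ignores rate parameters, transition direction and which state is open, and simply counts connectivity patterns.

```python
from itertools import combinations
from typing import Sequence, Tuple

Edge = Tuple[int, int]


def is_connected(n: int, edges: Sequence[Edge]) -> bool:
    """Depth-first search from state 0; True if every state is reached."""
    neighbours = {i: set() for i in range(n)}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == n


def count_connected_topologies(n: int) -> int:
    """Count undirected, connected state graphs on n labeled states by brute force."""
    possible = list(combinations(range(n), 2))
    count = 0
    # A connected graph on n states needs at least n - 1 transitions.
    for n_edges in range(n - 1, len(possible) + 1):
        for edges in combinations(possible, n_edges):
            if is_connected(n, edges):
                count += 1
    return count
```

      The counts grow quickly (4, 38 and 728 for 3, 4 and 5 states), before any rate parameters are even considered.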

      We did not specifically explore the impact of a second open state in our modeling and simulation studies, but we would certainly agree that a model with a second open state may recapitulate the dataset.

    1. Author Response

      Reviewer #3: (Public Review):

      In this manuscript, Li et al. examine the molecular interaction of Rabphilin 3A with the SNARE complex protein SNAP25 and its potential impact on SNARE complex assembly and dense core vesicle fusion.

      Overall, the literature on rabphilin as a major Rab3/Rab27 effector in synaptic function has been quite enigmatic. After its cloning and initial biochemical analysis, rather little new has been found about rabphilin, in particular since loss-of-function analyses have shown rather mild synaptic phenotypes (Schluter 1999, Deak 2006), arguing against a crucial role for rabphilin in synaptic function.

      While the interaction of rabphilin with SNAP25 via the bottom part of its C2 domain has already been described biochemically and structurally in Deak et al. 2006 and elsewhere, the authors make significant efforts to further map the interactions between SNAP25 and rabphilin, and indeed identify additional binding motifs in the first 10 amino acids of SNAP25 that appear critical for the rabphilin interaction.

      Using knockdown-rescue experiments for SNAP25, TIRF-based imaging analysis of labeled dense core vesicles showed that the N-terminus of SN25 is absolutely essential for SV membrane proximity and release. Similar, somewhat weaker phenotypes were observed when binding-deficient rabphilin mutants were overexpressed in PC12 cells coexpressing WT rabphilin. The loss-of-function phenotypes of the SN25 and rabphilin interaction mutants led the authors to claim that rabphilin-SN25 interactions are critical for docking and exocytosis. The role of these interaction sites was subsequently tested in SNARE assembly assays, which largely supported rabphilin accelerating SNARE assembly in a manner dependent on the SN25 N-terminus.

      Regarding the impact of this work, the transition of synaptic vesicles to form a fusion-competent trans-SNARE complex is very critical to our understanding of regulated vesicle exocytosis, and the authors put forward an attractive model in which rabphilin aids in catalyzing SNARE complex assembly by controlling the α-helicity of the SNAP25 SNARE motif. This would provide a regulatory mechanism similar to those put forward for the other two SNARE proteins via their interactions with Munc18 and intersection, respectively.

      We thank Reviewer #3 for the summary of the paper and for the praise of our work. Our point-by-point replies are as follows:

      While the discovery of the novel interaction site of rabphilin with the N-terminus of SNAP25 is interesting, I have issues with the functional experiments. The key question for the paper is whether it provides convincing data on the functional role of the interactions, given the history of loss-of-function phenotypes for rabphilin. First, the authors use PC12 cells and dense core vesicle docking and fusion assays. Primary neurons, in which rabphilin function has been tested before, have unfortunately not been used, reducing the impact of the docking and fusion phenotypes.

      We have discussed these questions as mentioned in our response to Essential Revisions 3 and added this corresponding passage to the Discussion section (pp.18-19, lines 407-427).

      In particular, the loss-of-function phenotype in docking and fusion of the N-terminally deleted SNAP25 in figure 3 is profound, and at a similar level to the complete loss of the SNARE protein itself. This is of concern, as it stands in stark contrast to mammalian neurons, where the phenotype of SNAP25 loss is very severe while rabphilin loss has almost no effect on secretion. This would argue that the N-terminus of SNAP25 has other critical functions besides interacting with rabphilin. In addition, it could indicate that the N-terminal SNAP25 deletion mutant may be made in the cell (as indicated by the western blot) but may not be properly trafficked to the site of release.

      To test whether the N-peptide deletion mutant of SN25 can properly target to the plasma membrane, we overexpressed SN25 FL or SN25 (11–206) with a C-terminal EGFP tag in PC12 cells and monitored the localization of SN25 FL-EGFP and SN25 (11–206)-EGFP near the plasma membrane by TIRF microscopy. We observed that the average fluorescence intensity of SN25 (11–206)-EGFP showed no significant difference from that of SN25 FL-EGFP, as shown below, suggesting that the N-peptide deletion does not affect the trafficking of SN25 to the plasma membrane.

      (A) TIRF imaging assay to monitor the localization of SN25-EGFP near the plasma membrane. Overexpression of SN25 FL-EGFP (left) and SN25 (11–206)-EGFP (right) using pEGFP-N3 vector in PC12 cells. Scale bars, 10 μm. (B) Quantification of the average fluorescence intensity of SN25-EGFP near the plasma membrane in (A). Data are presented as mean ± SEM (n ≥ 10 cells in each). Statistical significance and P values were determined by Student’s t-test. ns, not significant.
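      For readers who want to reproduce this kind of comparison, the two-sample Student's t statistic with pooled variance can be sketched as below. The input values are placeholders rather than the measured intensities; a library routine such as scipy.stats.ttest_ind (with its default equal_var=True) additionally returns the P value.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def student_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample Student's t statistic with pooled (equal) variance, and its degrees of freedom."""
    na, nb = len(a), len(b)
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return t, na + nb - 2
```

      The statistic is then compared with a t distribution with na + nb - 2 degrees of freedom; "ns" in the caption corresponds to a P value above the chosen significance level.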

    1. Author Response

      Reviewer #1 (Public Review):

      The authors present a PyTorch-based simulator for prosthetic vision. The model takes in the anatomical location of a visual cortical prosthesis as well as a series of electrical stimuli to be applied to each electrode, and outputs the resulting phosphenes. To demonstrate the usefulness of the simulator, the paper reproduces psychometric curves from the literature and uses the simulator in the loop to learn optimized stimuli.

      One of the major strengths of the paper is its modeling work: the authors make good use of existing knowledge about retinotopic maps and psychometric curves that describe phosphene appearance in response to single-electrode stimulation. Using PyTorch as a backbone is another strength, as it allows for GPU acceleration and seamless integration with common deep learning models. This work is likely to be impactful for the field of sight restoration.

      1) However, one of the major weaknesses of the paper is its model validation - while some results seem to be presented for data the model was fit on (as opposed to held-out test data), other results lack quantitative metrics and a comparison to a baseline ("null hypothesis") model. On the one hand, it appears that the data presented in Figs. 3-5 was used to fit some of the open parameters of the model, as mentioned in Subsection G of the Methods. Hence it is misleading to present these as model "predictions", which are typically presented for held-out test data to demonstrate a model's ability to generalize. Instead, this is more of a descriptive model than a predictive one, and its ability to generalize to new patients remains yet to be demonstrated.

      We agree that the original presentation of the model fits might give rise to unwanted confusion. In the revision, we have adapted the fit of the thresholding mechanism to include a 3-fold cross-validation, where part of the data was excluded during the fitting and used as test sets to calculate the model’s performance. The results of the cross-validation are now presented in panel D of Figure 3. The fitting of the brightness and temporal dynamics parameters using cross-validation was not feasible due to the limited amount of quantitative data describing temporal dynamics and phosphene size and brightness for intracortical electrodes. To avoid confusion, we have adapted the corresponding text and figure captions to specify that we are using a fit as a description of the data.
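      The 3-fold cross-validation procedure can be sketched as follows. This is a minimal, standard-library Python illustration and not the code used for the revised Figure 3: the helper names (`k_fold_splits`, `cross_validate`) are hypothetical, the midpoint-threshold model is a deliberately simple stand-in for the simulator's thresholding mechanism, and the charge and detection values are synthetic.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_splits(n_samples: int, k: int = 3, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; every sample appears in exactly one test fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [sorted(indices[i::k]) for i in range(k)]
    for i, test in enumerate(folds):
        # Training set = all folds except the held-out one.
        train = sorted(j for m, fold in enumerate(folds) if m != i for j in fold)
        yield train, test


def cross_validate(x: Sequence[float], y: Sequence[bool],
                   fit: Callable, score: Callable, k: int = 3, seed: int = 0) -> List[float]:
    """Fit on k-1 folds, score on the held-out fold, and return one score per fold."""
    scores = []
    for train, test in k_fold_splits(len(x), k, seed):
        params = fit([x[i] for i in train], [y[i] for i in train])
        scores.append(score(params, [x[i] for i in test], [y[i] for i in test]))
    return scores


# Hypothetical stand-in model, NOT the simulator's thresholding mechanism:
# the threshold is the midpoint between the mean charge of detected and undetected trials.
def fit_midpoint_threshold(charges: List[float], detected: List[bool]) -> float:
    hits = [c for c, d in zip(charges, detected) if d]
    misses = [c for c, d in zip(charges, detected) if not d]
    return (sum(hits) / len(hits) + sum(misses) / len(misses)) / 2


def accuracy(threshold: float, charges: List[float], detected: List[bool]) -> float:
    """Fraction of held-out trials whose detection is predicted correctly."""
    predictions = [c > threshold for c in charges]
    return sum(p == d for p, d in zip(predictions, detected)) / len(charges)


if __name__ == "__main__":
    charges = [float(c) for c in range(1, 13)]  # synthetic charge-per-phase values
    detected = [c > 6 for c in charges]         # synthetic detection outcomes
    print(cross_validate(charges, detected, fit_midpoint_threshold, accuracy, k=3))
```

      The key property is that each score is computed only on trials that were excluded from the corresponding fit, which is what distinguishes a prediction on held-out data from a description of the fitted data.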

      We note that the goal of the simulator is not to provide a single set of parameters that describes precise phosphene perception for all patients but that it could also be used to capture variability among patients. Indeed, the model can be tailored to new patients based on a small data set. Figure 3-figure supplement 1 exemplifies how our simulator can be tailored to several data sets collected from patients with surface electrodes. Future clinical experiments might be used to verify how well the simulator can be tailored to the data of other patients.

      Specifically, we have made the following changes to the manuscript:

      • Caption Figure 2: the fitted peak brightness levels reproduced by our model

      • Caption Figure 3: The model's probability of phosphene perception is visualized as a function of charge per phase

      • Caption Figure 3: Predicted probabilities in panel (d) are the results of a 3-fold cross-validation on held-out test data.

      • Line 250: we included biologically inspired methods to model the perceptual effects of different stimulation parameters

      • Line 271: Each frame, the simulator maps electrical stimulation parameters (stimulation current, pulse width and frequency) to an estimated phosphene perception

      • Lines 335-336: such that 95% of the Gaussian falls within the fitted phosphene size.

      • Lines 469-470: Figure 4 displays the simulator's fit on the temporal dynamics found in a previously published study by Schmidt et al. (1996).

      • Lines 922-925: Notably, the trade-off between model complexity and accurate psychophysical fits or predictions is a recurrent theme in the validation of the components implemented in our simulator.
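      The phrase "95% of the Gaussian falls within the fitted phosphene size" fixes the Gaussian width. Assuming an isotropic 2-D Gaussian and assuming that the fitted size is a diameter d, the mass inside radius r is 1 − exp(−r²/(2σ²)), so σ = (d/2) / √(−2 ln 0.05), i.e. roughly (d/2) / 2.45. The sketch below is illustrative only and the function names are hypothetical; if a 1-D convention were used instead, the factor would be 1.96 rather than about 2.45.

```python
import math


def sigma_from_phosphene_size(diameter_deg: float, mass: float = 0.95) -> float:
    """Sigma of an isotropic 2-D Gaussian whose central disc of the given diameter holds `mass`."""
    radius = diameter_deg / 2
    # Mass inside radius r of an isotropic 2-D Gaussian: 1 - exp(-r^2 / (2 sigma^2)).
    return radius / math.sqrt(-2.0 * math.log(1.0 - mass))


def mass_within(radius: float, sigma: float) -> float:
    """Fraction of an isotropic 2-D Gaussian that lies within `radius` of its center."""
    return 1.0 - math.exp(-radius ** 2 / (2 * sigma ** 2))


if __name__ == "__main__":
    s = sigma_from_phosphene_size(1.0)  # a 1-degree phosphene
    print(s, mass_within(0.5, s))
```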

      2) On the other hand, the results presented in Fig. 8 as part of the end-to-end learning process are not accompanied by any sort of quantitative metric or comparison to a baseline model.

      We now realize that the presentation of the end-to-end results might have given the impression that we present novel image processing strategies. However, the development of a novel image processing strategy is outside the scope of the study. Instead, the study aims to provide an improved simulation which can be used for more realistic assessment of different stimulation protocols. The simulator needs to fit experimental data, and it should run fast (so it can be used in behavioral experiments). Importantly, as demonstrated in our end-to-end experiments, the model can be used in differentiable programming pipelines (so it can be used in computational optimization experiments), which is a valuable contribution in itself because it lends itself to many machine learning approaches which can improve the realism of the simulation.

      We have rephrased our study aims in the discussion to improve clarity.

      • Lines 275-279: In the sections below, we discuss the different components of the simulator model, followed by a description of some showcase experiments that assess the ability to fit recent clinical data and the practical usability of our simulator in simulation experiments.

      • Lines 810-814: Computational optimization approaches can also aid in the development of safe stimulation protocols, because they allow a faster exploration of the large parameter space and enable task-driven optimization of image processing strategies (Granley et al., 2022; Fauvel et al., 2022; White et al., 2019; Küçükoglü et al. 2022; de Ruyter van Steveninck et al., 2022; Ghaffari et al., 2021).

      • Lines 814-819: Ultimately, the development of task-relevant scene-processing algorithms will likely benefit both from computational optimization experiments as well as exploratory SPV studies with human observers. With the presented simulator we aim to contribute a flexible toolkit for such experiments.

      • Lines 842-853: Eventually, the functional quality of the artificial vision will not only depend on the correspondence between the visual environment and the phosphene encoding, but also on the implant recipient's ability to extract that information into a usable percept. The functional quality of end-to-end generated phosphene encodings in daily life tasks will need to be evaluated in future experiments. Regardless of the implementation, it will always be important to include human observers (both sighted experimental subjects and actual prosthetic implant users) in the optimization cycle to ensure subjective interpretability for the end user (Fauvel et al., 2022; Beyeler & Sanchez-Garcia, 2022).

      3) The results seem to assume that all phosphenes are small Gaussian blobs, and that these phosphenes combine linearly when multiple electrodes are stimulated. Both assumptions are frequently challenged by the field. For all these reasons, it is challenging to assess the potential and practical utility of this approach as well as get a sense of its limitations.

      The reviewer raises a valid point and a similar point was raised by a different reviewer (our response is duplicated). As pointed out in the discussion, many aspects about multi-electrode phosphene perception are still unclear. On the one hand, the literature is in agreement that there is some degree of predictability: some papers explicitly state that phosphenes produced by multiple patterns are generally additive (Dobelle & Mladejovsky, 1974), that the locations are predictable (Bosking et al., 2018) and that multi-electrode stimulation can be used to generate complex, interpretable patterns of phosphenes (Chen et al., 2020; Fernández et al., 2021). On the other hand, however, in some cases, the stimulation of multiple electrodes is reported to lead to brighter phosphenes (Fernández et al., 2021), fused or displaced phosphenes (Schmidt et al., 1996; Bak et al., 1990) or unpredicted phosphene patterns (Fernández et al., 2021). It is likely that the probability of these interference patterns decreases when the distance between the stimulated electrodes increases. An empirical finding is that the critical distance for intracortical stimulation is approximately 1 mm (Ghose & Maunsell, 2012).

      We note that our simulator is not restricted to the simulation of linearly combined Gaussian blobs. Some irregularities, such as elongated phosphene shapes, were already supported in the previous version of our software. Furthermore, we added a supplementary figure that displays a possible approach to simulate some of the more complex electrode interactions that are reported in the literature, with only minor adaptations to the code. Our study thereby aims to present a flexible simulation toolkit that can be adapted to the needs of the user.

      Adjustments:

      • Added Figure 1-figure supplement 3 on irregular phosphene percepts.

      • Lines 957-970: Furthermore, in contrast to the assumptions of our model, interactions between simultaneous stimulation of multiple electrodes can have an effect on the phosphene size and sometimes lead to unexpected percepts (Fernández et al., 2021; Dobelle & Mladejovsky, 1974; Bak et al., 1990). Although our software supports basic exploratory experimentation of non-linear interactions (see Figure 1-figure supplement 3), by default, our simulator assumes independence between electrodes. Multi-phosphene percepts are modeled using linear summation of the independent percepts. These assumptions seem to hold for intracortical electrodes separated by more than 1 mm (Ghose & Maunsell, 2012), but may underestimate the complexities observed when electrodes are nearer. Further clinical and theoretical modeling work could help to improve our understanding of these non-linear dynamics.
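      The default linear-summation assumption can be illustrated with a minimal sketch in plain Python: each electrode's percept is an isotropic Gaussian blob, and the multi-phosphene percept is their pixel-wise sum. The function names, the grid parameters, and the omission of any brightness saturation or clipping are assumptions made for illustration; this is not the simulator's PyTorch implementation.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]
Phosphene = Tuple[Tuple[float, float], float, float]  # (center in deg, sigma in deg, brightness)


def render_phosphene(center: Tuple[float, float], sigma: float, brightness: float,
                     size: int, fov: float) -> Grid:
    """Render one isotropic Gaussian phosphene on a size x size grid spanning [-fov/2, fov/2] deg."""
    cx, cy = center
    step = fov / (size - 1)
    grid = []
    for r in range(size):
        y = -fov / 2 + r * step
        row = []
        for c in range(size):
            x = -fov / 2 + c * step
            row.append(brightness * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2)))
        grid.append(row)
    return grid


def render_linear(phosphenes: Sequence[Phosphene], size: int, fov: float) -> Grid:
    """Independence assumption: the combined percept is the sum of single-electrode percepts."""
    total = [[0.0] * size for _ in range(size)]
    for center, sigma, brightness in phosphenes:
        single = render_phosphene(center, sigma, brightness, size, fov)
        for r in range(size):
            for c in range(size):
                total[r][c] += single[r][c]
    return total
```

      Any of the non-linear interactions discussed above (brightening, fusion, displacement) would replace the plain sum in `render_linear`, which is why the independence assumption is most defensible for widely spaced electrodes.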

      4) Another weakness of the paper is the term "biologically plausible", which appears throughout the manuscript but is not clearly defined. In its current form, it is not clear what makes this simulator "biologically plausible" - it certainly contains a retinotopic map and is fit on psychophysical data, but it does not seem to contain any other "biological" detail.

      We thank the reviewer for the remark. We improved our description of what makes the simulator “biologically plausible” in the introduction (line 78): ‘‘Biological plausibility, in our work's context, points to the simulation's ability to capture essential biological features of the visual system in a manner consistent with empirical findings: our simulator integrates quantitative findings and models from the literature on cortical stimulation in V1 [...]”. In addition, we mention in the discussion (lines 611 - 621): “The aim of this study is to present a biologically plausible phosphene simulator, which takes realistic ranges of stimulation parameters, and generates a phenomenologically accurate representation of phosphene vision using differentiable functions. In order to achieve this, we have modeled and incorporated an extensive body of work regarding the psychophysics of phosphene perception. From the results presented in section H, we observe that our simulator is able to produce phosphene percepts that match the descriptions of phosphene vision that were gathered in basic and clinical visual neuroprosthetics studies over the past decades.”

      5) In fact, for the most part the paper seems to ignore the fact that implanting a prosthesis in one cerebral hemisphere will produce phosphenes that are restricted to one half of the visual field. Yet Figures 6 and 8 present phosphenes that seemingly appear in both hemifields. I do not find this very "biologically plausible".

      We agree with the reviewer that contemporary experiments with implantable electrodes usually test electrodes in a single hemisphere. However, future clinically useful approaches should use bilaterally implanted electrode arrays. Our simulator can present phosphene locations in either one or both hemifields.

      We have made the following textual changes:

      • Fig. 1 caption: Example renderings after initializing the simulator with four 10 × 10 electrode arrays (indicated with roman numerals) placed in the right hemisphere (electrode spacing: 4 mm, in correspondence with the commonly used 'Utah array' (Maynard et al., 1997)).

      • Lines 518-525: The simulator is initialized with 1000 possible phosphenes in both hemifields, covering a field of view of 16 degrees of visual angle. Note that the simulated electrode density and placement differ from current prototype implants and the simulation can be considered to be an ambitious scenario from a surgical point of view, given the folding of the visual cortex and the part of the retinotopic map in V1 that is buried in the calcarine sulcus.

      • Lines 546-547: with the same phosphene coverage as the previously described experiment
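      To illustrate why an implant in one hemisphere yields phosphenes in only one hemifield, the sketch below maps an electrode grid on a flattened V1 to visual-field positions using the inverse of a monopole (log-polar) retinotopic model, w = k·log(z + a). The constants k = 15 mm and a = 0.7 deg, the grid spacing and position, and all function names are illustrative assumptions, not the parameters used in the manuscript.

```python
import cmath
from typing import List, Tuple

K_MM = 15.0  # assumed monopole scaling constant (mm); illustrative value
A_DEG = 0.7  # assumed foveal constant (deg); illustrative value


def cortex_to_visual_field(x_mm: float, y_mm: float, hemisphere: str = "right") -> Tuple[float, float]:
    """Invert the monopole map w = K * log(z + A).

    (x_mm, y_mm) is a position on a flattened V1, with x running along the eccentricity axis.
    z is the visual-field position (deg) in contralateral-hemifield coordinates. The result is
    only meaningful for cortical positions that map into that hemifield.
    """
    z = cmath.exp(complex(x_mm, y_mm) / K_MM) - A_DEG
    x_deg, y_deg = z.real, z.imag
    if hemisphere == "right":
        x_deg = -x_deg  # right V1 represents the left visual hemifield
    return x_deg, y_deg


def grid_phosphene_locations(n: int, spacing_mm: float, center_mm: Tuple[float, float],
                             hemisphere: str = "right") -> List[Tuple[float, float]]:
    """Visual-field locations (deg) for an n x n electrode grid centered at center_mm."""
    cx, cy = center_mm
    offsets = [(i - (n - 1) / 2) * spacing_mm for i in range(n)]
    return [cortex_to_visual_field(cx + dx, cy + dy, hemisphere) for dx in offsets for dy in offsets]


if __name__ == "__main__":
    locs = grid_phosphene_locations(10, 0.4, (20.0, 0.0), hemisphere="right")
    print(min(x for x, _ in locs), max(x for x, _ in locs))
```

      Placing grids on both hemispheres (calling the function once with `hemisphere="right"` and once with `"left"`) is what produces the bilateral phosphene layouts mentioned above.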

      Reviewer #2 (Public Review):

      Van der Grinten and De Ruyter van Steveninck et al. present a design for simulating cortical-visual-prosthesis phosphenes that emphasizes features important for optimizing the use of such prostheses. The characteristics of simulated individual phosphenes were shown to agree well with data published from the use of cortical visual prostheses in humans. By ensuring that functions used to generate the simulations were differentiable, the authors permitted and demonstrated integration of the simulations into deep-learning algorithms. In concept, such algorithms could thereby identify parameters for translating images or videos into stimulation sequences that would be most effective for artificial vision. There are, however, limitations to the simulation that will limit its applicability to current prostheses.

      The verification of how phosphenes are simulated for individual electrodes is very compelling. Visual-prosthesis simulations often do ignore the physiologic foundation underlying the generation of phosphenes. The authors' simulation takes into account how stimulation parameters contribute to phosphene appearance and show how that relationship can fit data from actual implanted volunteers. This provides an excellent foundation for determining optimal stimulation parameters with reasonable confidence in how parameter selections will affect individual-electrode phosphenes.

      We thank the reviewer for these supportive comments.

      Issues with the applicability and reliability of the simulation are detailed below:

      1) The utility of this simulation design, as described, unfortunately breaks down beyond the scope of individual electrodes. To model the simultaneous activation of multiple electrodes, the authors' design linearly adds individual-electrode phosphenes together. This produces relatively clean collections of dots that one could think of as pixels in a crude digital display. Modeling phosphenes in such a way assumes that each electrode and the network it activates operate independently of other electrodes and their neuronal targets. Unfortunately, as the authors acknowledge and as noted in the studies they used to fit and verify individual-electrode phosphene characteristics, simultaneous stimulation of multiple electrodes often obscures features of individual-electrode phosphenes and can produce unexpected phosphene patterns. This simulation does not reflect these nonlinearities in how electrode activations combine. Nonlinearities in electrode combinations can be as subtle as the phosphenes becoming brighter while still remaining distinct, or as problematic as generating only a single small phosphene that is indistinguishable from the activation of a subset of the electrodes activated, or that of a single electrode.

      If a visual prosthesis happens to generate some phosphenes that can be elicited independently, a simulator of this type could perhaps be used by processing stimulation from independent groups of electrodes and adding their phosphenes together in the visual field.

      The reviewer raises a valid point and a similar point was raised by a different reviewer (our response is duplicated). As pointed out in the discussion, many aspects about multi-electrode phosphene perception are still unclear. On the one hand, the literature is in agreement that there is some degree of predictability: some papers explicitly state that phosphenes produced by multiple patterns are generally additive (Dobelle & Mladejovsky, 1974), that the locations are predictable (Bosking et al., 2018) and that multi-electrode stimulation can be used to generate complex, interpretable patterns of phosphenes (Chen et al., 2020; Fernández et al., 2021). On the other hand, however, in some cases, the stimulation of multiple electrodes is reported to lead to brighter phosphenes (Fernández et al., 2021), fused or displaced phosphenes (Schmidt et al., 1996; Bak et al., 1990) or unpredicted phosphene patterns (Fernández et al., 2021). It is likely that the probability of these interference patterns decreases when the distance between the stimulated electrodes increases. An empirical finding is that the critical distance for intracortical stimulation is approximately 1 mm (Ghose & Maunsell, 2012).

      We note that our simulator is not restricted to the simulation of linearly combined Gaussian blobs. Some irregularities, such as elongated phosphene shapes, were already supported in the previous version of our software. Furthermore, we added a supplementary figure that displays a possible approach to simulate some of the more complex electrode interactions that are reported in the literature, with only minor adaptations to the code. Our study thereby aims to present a flexible simulation toolkit that can be adapted to the needs of the user.

      Adjustments:

      • Lines 957-970: Furthermore, in contrast to the assumptions of our model, interactions between simultaneous stimulation of multiple electrodes can have an effect on the phosphene size and sometimes lead to unexpected percepts (Fernández et al., 2021; Dobelle & Mladejovsky, 1974; Bak et al., 1990). Although our software supports basic exploratory experimentation of non-linear interactions (see Figure 1-figure supplement 3), by default, our simulator assumes independence between electrodes. Multi-phosphene percepts are modeled using linear summation of the independent percepts. These assumptions seem to hold for intracortical electrodes separated by more than 1 mm (Ghose & Maunsell, 2012), but may underestimate the complexities observed when electrodes are nearer. Further clinical and theoretical modeling work could help to improve our understanding of these non-linear dynamics.

      • Added Figure 1-figure supplement 3 on irregular phosphene percepts.

      2) Verification of how the simulation renders individual phosphenes based on stimulation parameters is an important step in confirming agreement between the simulation and the function of implanted devices. That verification was well demonstrated. The end use of a visual-prosthesis simulation, however, would likely not be optimizing just the appearance of phosphenes, but predicting and optimizing functional performance in visual tasks. Investigating whether this simulator can suggest visual-task performance, either with sighted volunteers or a decoder model, that is similar to published task performance from visual-prosthesis implantees would be a necessary step for true validation.

      We agree with the reviewer that it will be vital to investigate the utility of the simulator in tasks. However, the literature on the performance of users of a cortical prosthesis in visually-guided tasks is scarce, making it difficult to compare task performance between simulated versus real prosthetic vision.

      Secondly, the main objective of the current study is to propose a simulator that emulates the sensory / perceptual experience, i.e. the low-level perceptual correspondence. Once more behavioral data from prosthetic users become available, studies can use the simulator to make these comparisons.

      Regarding the comparison to simulated prosthetic vision in sighted volunteers, there are some fundamental limitations. For instance, sighted subjects are exposed for a shorter duration to the (simulated) artificial percept and lack the experience and training that prosthesis users get. Furthermore, sighted subjects may be unfamiliar with compensation strategies that blind individuals have developed. It will therefore be important to conduct clinical experiments.

      To convey more clearly that our experiments are performed to verify the practical usability in future behavioral experiments, we have incorporated the following textual adjustments:

      • Lines 275-279: In the sections below, we discuss the different components of the simulator model, followed by a description of some showcase experiments that assess the ability to fit recent clinical data and the practical usability of our simulator in simulation experiments.

      • Lines 842-853: Eventually, the functional quality of the artificial vision will not only depend on the correspondence between the visual environment and the phosphene encoding, but also on the implant recipient's ability to extract that information into a usable percept. The functional quality of end-to-end generated phosphene encodings in daily life tasks will need to be evaluated in future experiments. Regardless of the implementation, it will always be important to include human observers (both sighted experimental subjects and actual prosthetic implant users) in the optimization cycle to ensure subjective interpretability for the end user (Fauvel et al., 2022; Beyeler & Sanchez-Garcia, 2022).

      3) A feature of this simulation is being able to convert stimulation of V1 to phosphenes in the visual field. If used, this feature would likely only be able to simulate a subset of phosphenes generated by a prosthesis. Much of V1 is buried within the calcarine sulcus, and electrode placement within the calcarine sulcus is not currently feasible. As a result, stimulation of visual cortex typically involves combinations of the limited portions of V1 that lie outside the sulcus and higher visual areas, such as V2.

      We agree that some areas (most notably the calcarine sulcus) are difficult to access in a surgical implantation procedure. A realistic simulation of state-of-the-art cortical stimulation should only partially cover the visual field with phosphenes. However, it may be predicted that some of these challenges will be addressed by new technologies. We chose to make the simulator as generally applicable as possible and users of the simulator can decide which phosphene locations are simulated. To demonstrate that our simulator can be flexibly initialized to simulate specific implantation locations using third-party software, we have now added a supplementary figure (Figure 1-figure supplement 1) that displays a demonstration of an electrode grid placement on a 3D brain model, generating the phosphene locations from receptive field maps. However, the simulator is general and can also be used to guide future strategies that aim to, e.g., cover the entire field with electrodes or compare performance between upper and lower hemifields.

      Reviewer #3 (Public Review):

      The authors are presenting a new simulation for artificial vision that incorporates many recent advances in our understanding of the neural response to electrical stimulation, specifically within the field of visual prosthetics. The authors succeed in integrating multiple results from other researchers on aspects of V1 response to electrical stimulation to create a system that more accurately models V1 activation in a visual prosthesis than other simulators. The authors then attempt to demonstrate the value of such a system by adding a decoding stage and using machine-learning techniques to optimize the system to various configurations.

      1) While there is merit to being able to apply various constraints (such as maximum current levels) and have the system attempt to find a solution that maximizes recoverable information, the interpretability of such encodings to a hypothetical recipient of such a system is not addressed. The authors demonstrate that they are able to recapitulate various standard encodings through this automated mechanism, but the advantages to using it as opposed to mechanisms that directly detect and encode, e.g., edges, are insufficiently justified.

      We thank the reviewer for this constructive remark. Our simulator is designed for more realistic assessment of different stimulation protocols in behavioral experiments or in computational optimization experiments. The presented end-to-end experiments are a demonstration of the practical usability of our simulator in computational experiments, building on a previously existing line of research. In fact, our simulator is compatible with any arbitrary encoding strategy.

      As our paper is focused on the development of a novel tool for this existing line of research, we do not aim to make claims about the functional quality of end-to-end encoders compared to alternative encoding methods (such as edge detection). That said, we agree with the reviewer that it would be useful to discuss the benefits of end-to-end optimization compared to, e.g., edge detection.

      We have incorporated several textual changes to give a more nuanced overview and to acknowledge that many benefits remain to be tested. Furthermore, we have restated our study aims more clearly in the discussion to clarify the distinction between the goals of the current paper and the various encoding strategies that remain to be tested.

      • Lines 275-279: In the sections below, we discuss the different components of the simulator model, followed by a description of some showcase experiments that assess the ability to fit recent clinical data and the practical usability of our simulator in simulation experiments.

      • Lines 810-814: Computational optimization approaches can also aid in the development of safe stimulation protocols, because they allow a faster exploration of the large parameter space and enable task-driven optimization of image processing strategies (Granley et al., 2022; Fauvel et al., 2022; White et al., 2019; Küçükoglü et al. 2022; de Ruyter van Steveninck, Güçlü et al., 2022; Ghaffari et al., 2021).

      • Lines 842-853: Eventually, the functional quality of the artificial vision will not only depend on the correspondence between the visual environment and the phosphene encoding, but also on the implant recipient's ability to extract that information into a usable percept. The functional quality of end-to-end generated phosphene encodings in daily life tasks will need to be evaluated in future experiments. Regardless of the implementation, it will always be important to include human observers (both sighted experimental subjects and actual prosthetic implant users) in the optimization cycle to ensure subjective interpretability for the end user (Fauvel et al., 2022; Beyeler & Sanchez-Garcia, 2022).

      2) The authors make a few mistakes in their interpretation of biological mechanisms, and the introduction lacks appropriate depth of review of existing literature, giving the reader the mistaken impression that this simulator is the only attempt ever made at biologically plausible simulation, rather than merely the most recent refinement that builds on decades of work across the field.

      We thank the reviewer for this insight. We have improved the coverage of the previous literature to give credit where credit is due, and to address the long history of simulated phosphene vision.

      Textual changes:

      • Lines 64-70: Although the aforementioned SPV literature has provided us with major fundamental insights, the perceptual realism of electrically generated phosphenes and some aspects of the biological plausibility of the simulations can be further improved by integrating existing knowledge of phosphene vision and its underlying physiology.

      • Lines 164-190: The aforementioned studies used varying degrees of simplification of phosphene vision in their simulations. For instance, many included equally-sized phosphenes that were uniformly distributed over the visual field (informally referred to as the ‘scoreboard model’). Furthermore, most studies assumed either full control over phosphene brightness or used binary levels of brightness (e.g. 'on' / 'off'), but did not provide a description of the associated electrical stimulation parameters. Several studies have explicitly made steps towards more realistic phosphene simulations, by taking into account cortical magnification or using visuotopic maps (Fehervari et al., 2010; Li et al., 2013; Srivastava et al., 2009; Paraskevoudi et al., 2021), simulating noise and electrode dropout (Dagnelie et al., 2007), or using varying levels of brightness (Vergnieux et al., 2017; Sanchez-Garcia et al., 2022; Parikh et al., 2013). However, no phosphene simulations have modeled temporal dynamics or provided a description of the parameters used for electrical stimulation. Some recent studies developed descriptive models of the phosphene size or brightness as a function of the stimulation parameters (Winawer et al., 2016; Bosking et al., 2017). Another very recent study has developed a deep-learning based model for predicting a realistic phosphene percept for single stimulating electrodes (Granley et al., 2022). These studies have made important contributions to improve our understanding of the effects of different stimulation parameters. The present work builds on these previous insights to provide a full simulation model that can be used for the functional evaluation of cortical visual prosthetic systems.

      • Lines 137-140: Due to the cortical magnification (the foveal information is represented by a relatively large surface area in the visual cortex as a result of variation of retinal RF size) the size of the phosphene increases with its eccentricity (Winawer & Parvizi, 2016, Bosking et al., 2017).

      • Lines 883-893: Even after loss of vision, the brain integrates eye movements for the localization of visual stimuli (Reuschel et al., 2012), and in cortical prostheses the position of the artificially induced percept will shift along with eye movements (Brindley & Lewin, 1968; Schmidt et al., 1996). Therefore, in prostheses with a head-mounted camera, misalignment between the camera orientation and the pupillary axes can induce localization problems (Caspi et al., 2018; Paraskevoudi & Pezaris, 2019; Sabbah et al., 2014; Schmidt et al., 1996). Previous SPV studies have demonstrated that eye-tracking can be implemented to simulate the gaze-coupled perception of phosphenes (Cha et al., 1992; Sommerhalder et al., 2004; Dagnelie et al., 2006; McIntosh et al., 2013; Paraskevoudi & Pezaris, 2021; Rassia & Pezaris, 2018; Titchener et al., 2018; Srivastava et al., 2009).
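      The relationship between cortical magnification and phosphene size can be made concrete with a short sketch. It assumes the commonly used inverse-linear magnification function M(E) = k / (E + a), with the Horton and Hoyt (1991) values k = 17.3 mm and a = 0.75 deg, and a hypothetical fixed diameter D of activated cortex. Under these assumptions the phosphene diameter is D·(E + a)/k degrees, which grows linearly with eccentricity E. The function names are illustrative and not taken from the simulator.

```python
def cortical_magnification(ecc_deg: float, k: float = 17.3, a: float = 0.75) -> float:
    """Linear cortical magnification M(E) = k / (E + a), in mm of V1 per degree of visual angle."""
    return k / (ecc_deg + a)


def phosphene_size_deg(ecc_deg: float, activated_diameter_mm: float = 1.0) -> float:
    """Phosphene diameter (deg) for a hypothetical, fixed diameter of activated cortex (mm)."""
    return activated_diameter_mm / cortical_magnification(ecc_deg)


if __name__ == "__main__":
    for ecc in (1.0, 5.0, 10.0, 20.0):
        print(f"{ecc:5.1f} deg eccentricity -> {phosphene_size_deg(ecc):.3f} deg phosphene")
```

      The same amount of activated cortex therefore covers a larger patch of the visual field in the periphery than near the fovea, which is the effect described in lines 137-140.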

      3) The authors have importantly not included gaze position compensation which adds more complexity than the authors suggest it would, and also means the simulator lacks a basic, fundamental feature that strongly limits its utility.

      We agree with the reviewer that the inclusion of gaze position to simulate gaze-centered phosphene locations is an important requirement for a realistic simulation. We have made several textual adjustments to section M1 to improve the clarity of the explanation and we have added several references to address the simulation literature that took eye movements into account.

      In addition, we included a link to some demonstration videos in which we illustrate that the simulator can be used for gaze-centered phosphene simulation. The simulation models the phosphene locations based on the gaze direction, and updates the input with changes in the gaze direction. The stimulation pattern is chosen to encode the visual environment at the location where the gaze is directed. Gaze contingent processing has been implemented in prior simulation studies (for instance: Paraskevoudi et al., 2021; Rassia et al., 2018; Titchener et al., 2018) and even in the clinical setting with users of the Argus II implant (Caspi et al., 2018). From a modeling perspective, it is relatively straightforward to simulate gaze-centered phosphene locations and gaze contingent image processing (our code will be made publicly available). At the same time, however, seen from a clinical and hardware engineering perspective, the implementation of eye-tracking in a prosthetic system for blind individuals might come with additional complexities. This is now acknowledged explicitly in the manuscript.
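      The gaze-centered simulation described above can be summarized in a minimal sketch. It assumes a small-angle approximation in which visual-field offsets (in degrees) simply add to the gaze direction; the function names are hypothetical and this is not the code in the dynaphos repository. Phosphene locations are fixed relative to gaze, so in head-centered coordinates they move with the eyes; in the gaze-contingent condition the camera image is sampled at those shifted locations, whereas a head-centered encoder ignores gaze.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (horizontal, vertical) in degrees


def phosphenes_in_head_frame(phosphene_locs: List[Point], gaze: Point) -> List[Point]:
    """Phosphenes are fixed relative to gaze, so in head coordinates they shift with the eyes."""
    gx, gy = gaze
    return [(x + gx, y + gy) for x, y in phosphene_locs]


def sample_positions(phosphene_locs: List[Point], gaze: Point, gaze_contingent: bool) -> List[Point]:
    """Where in the head-centered camera image each phosphene's stimulation value is read from.

    gaze_contingent=True: encode the scene where the phosphene is actually perceived.
    gaze_contingent=False: encode fixed camera positions, ignoring eye movements.
    """
    if gaze_contingent:
        return phosphenes_in_head_frame(phosphene_locs, gaze)
    return list(phosphene_locs)
```

      With a head-centered encoder, any gaze offset makes the perceived phosphene position and the encoded scene location disagree by exactly that offset, which is the localization problem that gaze-contingent processing is meant to avoid.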

      Textual adjustment:

      Lines 883-910: Even after loss of vision, the brain integrates eye movements for the localization of visual stimuli (Reuschel et al., 2012), and in cortical prostheses the position of the artificially induced percept will shift along with eye movements (Brindley & Lewin, 1968; Schmidt et al., 1996). Therefore, in prostheses with a head-mounted camera, misalignment between the camera orientation and the pupillary axes can induce localization problems (Caspi et al., 2018; Paraskevoudi & Pezaris, 2019; Sabbah et al., 2014; Schmidt et al., 1996). Previous SPV studies have demonstrated that eye-tracking can be implemented to simulate the gaze-coupled perception of phosphenes (Cha et al., 1992; Sommerhalder et al., 2004; Dagnelie et al., 2006; McIntosh et al., 2013; Paraskevoudi et al., 2021; Rassia et al., 2018; Titchener et al., 2018; Srivastava et al., 2009). Note that some of the cited studies implemented a simulation condition where not only the simulated phosphene locations, but also the stimulation protocol depended on the gaze direction. More specifically, instead of representing the head-centered camera input, the stimulation pattern was chosen to encode the external environment at the location where the gaze was directed. While further research is required, there is some preliminary evidence that such gaze-contingent image processing can improve the functional and subjective quality of prosthetic vision (Caspi et al., 2018; Paraskevoudi et al., 2021; Rassia et al., 2018; Titchener et al., 2018). Some example videos of gaze-contingent simulated prosthetic vision can be retrieved from our repository (https://github.com/neuralcodinglab/dynaphos/blob/main/examples/). Note that an eye-tracker will be required to produce gaze-contingent image processing in visual prostheses and there might be unforeseen complexities in the clinical implementation thereof. The study of oculomotor behavior in blind individuals (with or without a visual prosthesis) is still an ongoing line of research (Caspi et al., 2018; Kwon et al., 2013; Sabbah et al., 2014; Hafed et al., 2016).

      4) Finally, the computational capacity required to run the described system is substantial and is not one that would plausibly be used as part of an actual device, suggesting that there may be difficulties with converting results from this simulator to an implantable system.

      The software runs in real time with affordable, consumer-grade hardware. In Author response image 1 we present the results of performance testing with a 2016 model MSI GeForce GTX 1080 (priced around €600).

      Author response image 1.

      Note that the GPU is used only for the computation and rendering of the phosphene representations from given electrode stimulation patterns, which will never be part of any prosthetic device. The choice of encoder to generate the stimulation patterns will determine the required processing capacity that needs to be included in the prosthetic system, which is unrelated to the simulator’s requirements.
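      To show what the rendering step computes, here is a deliberately simplified sketch: each electrode produces one isotropic Gaussian phosphene whose brightness scales linearly with its stimulation amplitude, and overlapping phosphenes add. The simulator described in the manuscript is considerably more detailed (this sketch ignores, for instance, any temporal dynamics), and `render_phosphenes` is a hypothetical name, so treat it as illustrative only.

```python
import numpy as np


def render_phosphenes(stim, centers_px, sigmas_px, size=256):
    """Render a size x size phosphene image from per-electrode stimulation.

    Each electrode contributes one isotropic Gaussian blob centered at (x, y)
    with width sigma, scaled linearly by its stimulation amplitude; blobs add,
    and the result is clipped to the displayable range [0, 1].
    """
    ys, xs = np.mgrid[0:size, 0:size]
    img = np.zeros((size, size))
    for amp, (cx, cy), sigma in zip(stim, centers_px, sigmas_px):
        img += amp * np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
    return np.clip(img, 0.0, 1.0)


# Usage: 1000 randomly placed phosphenes at 256x256 (all values hypothetical).
rng = np.random.default_rng(1)
n = 1000
frame = render_phosphenes(rng.uniform(0, 1, n), rng.uniform(0, 256, (n, 2)), rng.uniform(1, 4, n))
```

      Every output pixel is computed independently of the others, which is why this kind of rendering parallelises well on a GPU, whereas the encoder that produces `stim` is a separate component.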

      The following addition was made to the text:

      • Lines 488-492: Notably, even on a consumer-grade GPU (e.g. a 2016 model GeForce GTX 1080) the simulator still reaches real-time processing speeds (>100 fps) for simulations with 1000 phosphenes at 256x256 resolution.

      5) With all of that said, the results do represent an advance, and one that could have wider impact if the authors were to reduce the computational requirements, and add gaze correction.

      We appreciate the kind compliment from the reviewer and sincerely hope that our revised manuscript meets their expectations. Their feedback has been critical to reshape and improve this work.

    1. Author Response

      Reviewer #3 (Public Review):

      In this manuscript, the authors studied erythropoiesis and hematopoietic stem/progenitor cell (HSPC) phenotypes in a ribosome gene Rps12 mutant mouse model. They found that RpS12 is required for both steady-state and stress hematopoiesis. Mechanistically, RpS12+/- HSCs/MPPs exhibited increased cycling, loss of quiescence, and increased protein translation and apoptosis rates, which may be attributed to ERK and Akt/mTOR hyperactivation. Overall, this is a new mouse model that sheds light on Rps gene function in murine hematopoiesis. The phenotypic and functional analyses of the mice are largely properly controlled, robust, and analyzed.

      A major weakness of this work is its descriptive nature, without a clear mechanism that explains the phenotypes observed in RpS12+/- mice. It is possible that the counterintuitive activation of the ERK/mTOR pathway and the increased protein synthesis rate reflect compensatory negative feedback. The direct mechanism of Rps12 loss could be studied through acute loss of Rps12, which is doable using their floxed mice. At a minimum, this can be done in mammalian hematopoietic cell lines.

      We thank the reviewer for pointing this out. We have addressed this question by developing a new inducible conditional knockout Rps12 mouse model (see response below to major point 1).

      Below are some specific concerns that need to be addressed.

      1) Line 226. The authors conclude that "Together, these results suggest that RpS12 plays an essential role in HSC function, including self-renewal and differentiation." The reviewer has three concerns regarding this conclusion and the corresponding Figure 3. 1) The data show that RpS12+/- mice have decreased numbers of both total BM cells and multiple subpopulations of HSPCs. The frequency of HSPC subpopulations should also be shown to clarify whether the decreased HSPC numbers arise from decreased total BM cellularity or from a proportional decrease in frequency. 2) This figure characterizes phenotypic HSPCs in BM by flow cytometry and lineage cells in PB by CBC. HSC function and differentiation are not really examined in this figure, except for the colony assay in Figure 3K. The BMT data in Figure 4 actually address HSC function and differentiation, so the conclusion here should be rephrased. 3) Since LT-HSCs, ST-HSCs, and all MPPs are decreased in number, how can the authors conclude that Rps12 is important for HSC differentiation? No experiments presented here were specifically designed to address HSC differentiation.

      We thank the reviewer for this excellent point. We think that the main defect is in HSC and progenitor maintenance, rather than in HSC differentiation. This is consistent with the decrease in multiple HSC and progenitor populations, as observed both by calculating absolute numbers and by frequency of the parent population (see new Supplementary Figures S2C-S2C). We have removed any references to altered differentiation from the text.

      We added the population frequency data to Supplementary Figure 2 and to the corresponding text (lines 221-235).

      2) Figure 3A and 5E. The flow cytometry gating of HSC/MPP is not well performed or presented, especially the HSC plot. Populations are not well separated by phenotypic markers. This raises concerns about the validity of the quantification data.

      We selected a more representative HSC plot and included it in Figure 3A.

      3) It is very difficult to read the bone marrow cytospin images in Fig 6F without annotation of the cell types shown in the figure. WT and +/- appear remarkably different in terms of cell size and cell types. This mouse may have other profound phenotypes that need detailed examination, such as lineage cells in the BM and spleen, and colony assays for different types of progenitors.

      The purpose of the bone marrow cytospin images in Figure 6F was to show the high number of apoptotic cells in the bone marrow of Rps12 KO/+ mice compared with controls. The differences in apoptosis in the LSK and myeloid progenitor populations are quantified in the flow cytometry data shown in Figure 6G-H. A detailed quantitative analysis of different bone marrow cell populations and their relative frequencies is also shown in Figures 2 and 3. In Rps12 KO/+ bone marrow, we observed a significant decrease in multiple stem cell and progenitor populations.

      4) For all the intracellular phospho-flow data shown in Fig 7, both a negative control (fluorescent secondary antibody only) and a positive stimulus should be included. It is very concerning that the histograms show no significant changes in pAKT and pERK signaling (MFI) after SCF stimulation in WT LSKs. There are no distinct peaks that separate non-phosphorylated from phosphorylated proteins. This casts doubt on the validity of the results. It is possible, though, that RpS12+/- cells have a very high basal level of activation of the pAKT/mTOR and pERK pathways. This again may point to a negative feedback mechanism of Rps12 haploinsufficiency.

      It is true that we did not observe an increase in pAKT, p4EBP1, or pERK in control cells in every case. This is often an issue with these specific phospho-flow cytometry antibodies, as they are not very sensitive, and the response to SCF is very time-dependent. We did observe an increase in pS6 with SCF in both LSK cells and progenitors (Figure 7B, E). However, the main point of this experiment was to assess the basal level of signaling in Rps12 KO/+ vs control cells. We did not observe hypersensitivity of Rps12 KO/+ cells to SCF, but we did observe significant increases in pAKT, pS6, p4EBP1, and pERK in Rps12 KO/+ LSK cells.

      To address the concern about the validity of staining, please see the requested flow histograms for unstained cells vs individual phospho-antibodies (Ab): p4EBP1, pERK, pS6 and pAKT (Figure R1 for reviewers) below. Additionally, since staining with the surface antibodies can potentially shift the peak, we are including an additional control of the cell surface antibodies alone vs the full sample with surface antibodies and phospho-Ab: p4EBP1, pERK, pS6 and pAKT. We can include this figure in the Supplementary Data if requested.

      5) The authors performed in vitro OP-Puro assay to assess the global protein translation in different HSPC subpopulations. 1) Can the authors provide more information about the incubation media, any cytokine or serum included? The incubation media with supplements may boost the overall translation status, although cells from WT and RpS12+/- are cultured side by side. Based on this, in vivo OP-Puro assay should be performed in both genotypes. 2) Polysome profiling assay should be performed in primary HSPCs, or at least in hematopoietic cell lines. It is plausible that RpS12 haploinsufficiency may affect the content of translational polysome fractions.

      We are including these details in the methods section: for the in vitro OP-Puro assay (lines 555-565), cells were resuspended in DMEM (Corning 10-013-CV) medium supplemented with 50 µM β-mercaptoethanol (Sigma) and 20 µM OPP (Thermo Scientific C10456). Cells were incubated for 45 minutes at 37°C and then washed with Ca2+- and Mg2+-free PBS. No additional cytokines were added.

      We did not perform polysome profiling. Polysome profiling of mutant stem and progenitor cells would be very challenging, as their numbers are much reduced. We now deem this of reduced interest, given the conclusion of the revised manuscript that RpS12 haploinsufficiency reduces overall translation. Also, because in the RpS12-floxed/+;SCL-CRE-ERT mouse model with acute deletion of RpS12 we observed the expected decrease in translation in HSCs using the same ex vivo OPP protocol, we did not follow up with in vivo OPP treatment.

    1. Author Response:

      Reviewer #1 (Public Review):

      "Modality-specific tracking of attention and sensory statistics in the human electrophysiological spectral exponent," Waschke et al. This paper follows upon a recent paper by a subset of the same authors that laid out the signal-processing bases for decomposing the EEG signal into periodic (i.e., "oscillatory") and aperiodic components (Donoghue et al., 2020). Here, the focus is on establishing physiological and functional interpretations of one of these aperiodic components: the exponent x of the 1/f^x fit to the power spectrum (a.k.a. its 'slope'). This is very important work that will have strong and lasting impact on how people design and interpret the results from EEG experiments, and is also likely to trigger many reanalyses of previously published data sets. However, the manuscript could do a better job of explaining WHY this is so. In this reviewer's opinion, more linkage with elements of Donoghue et al. (2020) would help considerably.

      First, a brief summary of what this manuscript does, and why it is important. The first section reanalyzes data sets from human subjects undergoing ketamine or propofol anaesthesia, known to influence the E:I balance in the neural circuits that give rise to the EEG. This is an important step in establishing the physiological validity of the fundamental proposition that flattening of the 1/f component reflects an increase in the E:I balance whereas steepening reflects a decrease, because the effects of these two anaesthetic agents have been well established in several invasive studies. The second section demonstrates the functional properties of the 1/f slope, in that it tracks shifts of attention between visual and auditory stimuli in an electrode-specific manner (i.e., posterior for visual, central for auditory), and it also captures aperiodic structure in these stimuli. It's not too strong to say that, after this paper, EEG-related research will never be the same again. The reason for this, however, isn't stated as clearly as it could be.

      Thank you for your positive appraisal of our work! We appreciate that you see significant benefit in this work, and we also understand that you see significant room for improvement in the way results are presented, framed and discussed. We want to express our thanks for these helpful comments. Below, we elaborate on them and the changes they prompted in greater detail.

      With regard to exposition, the manuscript could be improved in terms of building on Donoghue et al. (2020). To simplify, a main take-away from Donoghue et al. (2020) is that many past interpretations of EEG signals have mistakenly attributed task- (or state-) related changes to changes in one or more oscillatory components of the signal. Perhaps most egregiously, what can appear as a change in power in the alpha band can often be shown to be better explained as no change in alpha but instead a change in either the slope or the offset of the 1/f component of the power spectrum. (E.g., the bump at 10 Hz will increase or decrease if the slope of the 1/f component changes, even though the 'true' oscillator centered at 10 Hz hasn't changed.) In this paper, the authors demonstrate that many conditions, including physiological state and cognitive challenge, influence the 1/f slope in ways that are systematic and that occur independently of changes that may or may not be occurring simultaneously in oscillatory alpha. Broadly, the authors should consider two modifications: first, point out for each key experimental finding how attributing everything to changes in oscillatory alpha (or sometimes other frequencies) would lead to flawed inference; second, don't stop at demonstrating that the slope effects hold when alpha dynamics are partialed out, but also report the converse: in what ways is oscillatory alpha sensitive to aspects of physiology and/or behavior that the 1/f slope is not? Even if there aren't any such cases (which seems unlikely), it would be informative for this to be tested and reported.
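      The 10 Hz example above can be reproduced numerically. The following is a toy sketch under stated assumptions, not the authors' pipeline: log power is modelled as a straight aperiodic line in log-log space plus a Gaussian alpha peak (similar in spirit to the "fixed" aperiodic mode of spectral parameterization), and the aperiodic line is fitted outside an assumed 6–14 Hz alpha window. All names and parameter values are hypothetical.

```python
import numpy as np

freqs = np.arange(1.0, 40.0, 0.5)  # Hz


def log_spectrum(offset, exponent, peak_height, cf=10.0, bw=1.0):
    """Toy log10 power spectrum: aperiodic line in log-log space plus a
    Gaussian alpha peak of fixed height (in log10 units) centered at cf Hz."""
    aperiodic = offset - exponent * np.log10(freqs)
    peak = peak_height * np.exp(-((freqs - cf) ** 2) / (2.0 * bw ** 2))
    return aperiodic + peak


def parameterize(log_power, alpha_band=(6.0, 14.0)):
    """Fit the aperiodic line outside the alpha band and return
    (exponent, peak height above the aperiodic fit at 10 Hz)."""
    outside = (freqs < alpha_band[0]) | (freqs > alpha_band[1])
    slope, intercept = np.polyfit(np.log10(freqs[outside]), log_power[outside], 1)
    aperiodic_fit = intercept + slope * np.log10(freqs)
    i10 = np.argmin(np.abs(freqs - 10.0))
    return -slope, log_power[i10] - aperiodic_fit[i10]


# Same alpha oscillator, two different aperiodic exponents.
flat = log_spectrum(offset=1.0, exponent=1.0, peak_height=0.5)
steep = log_spectrum(offset=1.0, exponent=1.5, peak_height=0.5)
i10 = np.argmin(np.abs(freqs - 10.0))
raw_difference = flat[i10] - steep[i10]  # apparent "alpha power" change (log10)
```

      Here `raw_difference` is 0.5 log10 units, so raw 10 Hz power differs by a factor of about 3.2 even though the simulated oscillator is identical; after parameterization, the peak height is 0.5 in both spectra and only the exponent differs (1.0 vs 1.5).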

      We agree that a stronger focus on the differentiation between oscillatory and 1/f aspects of EEG activity can help to improve the didactic strength of our manuscript. Wherever possible, we have tried to make clear that the separation of oscillatory activity and aperiodic signals is essential to not confuse one for the other. This is not only the case for the analysis of the anaesthesia data, where changes in alpha and beta power have to be separated from changes in the spectral exponent, but also applies to the proposed attention contrast, where common effects of alpha power have to be taken into account and differentiated from spectral exponents. Similarly, an alignment of stimulus spectra with EEG activity could appear as a twofold power change (e.g., increase over low, decrease over high frequencies) if no separation of oscillatory and aperiodic signal parts is performed.

      We agree that explicitly contrasting spectral exponents with estimates of low-frequency or alpha power is essential. The original version of the manuscript already included such a comparison for the effect of attention on EEG spectral exponents and alpha power, respectively. To expand this approach, we inverted the models and used stimulus spectral exponents (auditory or visual) as dependent variables, with either EEG spectral exponents, low-frequency power or alpha power as predictors (alongside the same covariates as in the winning models of the original approach). Next, we used likelihood ratio tests to compare model fit separately at each electrode, resulting in a topography of model comparisons.
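      A likelihood ratio comparison between two models of the same size that differ only in their main predictor can be sketched as twice the difference in maximised log-likelihoods. This minimal sketch assumes ordinary least-squares models with Gaussian errors and simulated data; the models in the manuscript include further covariates and may use a different estimator. Because the two models are not nested, the sketch reports only the statistic and its sign, not a p-value. Function and variable names are hypothetical.

```python
import numpy as np


def gaussian_loglik_ols(y, X):
    """Maximised Gaussian log-likelihood of an OLS regression of y on X
    (an intercept column is added)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    n = len(y)
    sigma2 = np.sum((y - X @ beta) ** 2) / n  # ML estimate of residual variance
    return -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)


def lr_statistic(y, covariates, candidate, reference):
    """2 * (logL_candidate - logL_reference) for two equally sized models that
    differ only in their main predictor; > 0 favours the candidate."""
    ll_candidate = gaussian_loglik_ols(y, np.column_stack([candidate, covariates]))
    ll_reference = gaussian_loglik_ols(y, np.column_stack([reference, covariates]))
    return 2.0 * (ll_candidate - ll_reference)


# Simulated single-trial data in which y depends on the reference predictor only.
rng = np.random.default_rng(0)
n = 500
covariates = rng.normal(size=(n, 2))
eeg_exponent = rng.normal(size=n)
alpha_power = rng.normal(size=n)
y = 2.0 * eeg_exponent + covariates @ np.array([0.5, -0.3]) + rng.normal(scale=0.5, size=n)
stat = lr_statistic(y, covariates, candidate=alpha_power, reference=eeg_exponent)
```

      In this simulation `stat` is strongly negative, i.e. the model with alpha power as main predictor fits worse than the one with the EEG exponent, which is the pattern a non-positive statistic at an electrode indicates.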

      (a) Attention contrasts

      As expected based on decades of EEG research, and as can be seen in figure 3C, average EEG alpha power changed as a function of attentional focus in a topographically specific manner. Importantly, the observed increase of alpha power from auditory to visual attention took place over and above the reported changes in EEG spectral exponents (as we had reported in the control analyses section). In other words, both EEG spectral exponents and EEG alpha power capture attention-related changes in brain dynamics, but are at least partially sensitive to distinct sources or mechanisms. In the updated version of the manuscript, we emphasize that changes in spectral exponents can often be mistaken for changes in alpha power (as in Donoghue et al., 2020), calling for a dedicated spectral parameterization approach. Attention-related changes in spectral exponents and alpha power might result from distinct modes of thalamic activity, which transitions from tonic to bursty firing and shapes cortical activity to selectively process attended sensory input. In the updated version of the manuscript, we discuss the potential role of thalamic activity in greater detail. The updated parts of the discussion section are pasted below for convenience.

      “Despite these differences in the sensitivity of EEG signals, our results provide clear evidence for a modality-specific flattening of EEG spectra through the selective allocation of attentional resources. This attention allocation likely surfaces as subtle changes in E:I balance (Borgers et al., 2005; Harris and Thiele, 2011). Importantly, these results cannot be explained by observed attention-dependent differences in neural alpha power (8–12 Hz, Fig 3), which have been suggested to capture cortical inhibition or idling states (Cooper et al., 2003; Pfurtscheller et al., 1996). Also note that the employed spectral parameterization approach enabled us to separate 1/f-like signals from oscillatory activity and hence offered distinct estimates of spectral exponent and alpha power that would otherwise have been conflated (Donoghue et al., 2020).

      How could attentional goals come to shape spectral exponents and alpha oscillations? Both attention-related changes in EEG activity might trace back to distinct functions of thalamo-cortical circuits. On the one hand, bursts of thalamic activity that project towards sensory cortical areas might sculpt cortical excitability in an attention-dependent manner by inhibiting irrelevant distracting information (Klimesch et al., 2007; Saalmann and Kastner, 2011). On the other hand, tonic thalamic activity likely drives cortical desynchronization via glutamatergic projections and, with attentional focus, results in boosted representations of stimulus information within brain signals (Cohen and Maunsell, 2011; Harris and Thiele, 2011; Sherman, 2001).

      Our findings of separate attentional modulations of both EEG spectral exponents and alpha power point towards the involvement of both thalamic modes in the realization of attentional states. Recently, momentary trade-offs between both modes of thalamic activity have been suggested to give rise to attention-related modulations of alpha power and E:I balance, as captured by EEG spectral exponents (Kosciessa et al., 2021). Here, task difficulty remained constant throughout the experiment, and fluctuations between both modes might not follow momentary demand (Kosciessa et al., 2021; Pettine et al., 2021) but varying sensory-cognitive resources.

      Additionally, modulations of alpha power and of EEG spectral exponents appeared uncorrelated across individuals, further evidence that they reflect separate neural sources. Future studies that combine a systemic manipulation of E:I (e.g., through GABAergic agonists) with the investigation of attentional load in humans are needed to specify in greater detail how thalamic activity modes drive alpha oscillations and EEG spectral exponents. Specifying potential demand- and resource-dependent trade-offs between different modes of attention-related modulations of cortical activity and sensory processing will offer crucial insights into the neural basis of adaptive behaviour.”

      (b) Stimulus spectral exponent tracking

      We inverted all models: instead of modelling EEG spectral exponents, we used auditory or visual stimulus exponents as dependent variables. Predictors were identical to the previously reported models (see supplementary table for all details) but additionally included single-trial estimates of either alpha power, low-frequency power, or EEG spectral exponents. Note that alpha power estimates were extracted using the same spectral parameterization approach that was used to estimate spectral exponents. Trials without an oscillation in the alpha range were excluded from all models to render likelihood comparisons interpretable (11.2% ± 3.4% of trials). Since oscillations were only seldom detected in the low-frequency range (1–5 Hz), we instead used single-trial power averaged across this range. For each electrode, 4 likelihood ratio tests were performed, one for each combination of stimulus modality (auditory or visual) and predictor (low-frequency or alpha power). Strikingly, low-frequency power resulted in worse model fits (non-positive likelihood ratio test statistics) compared to EEG spectral exponents across all electrodes and both stimulus modalities. The same was true for EEG alpha power when modelling auditory stimulus exponents. However, when modelling visual stimulus exponents, EEG alpha power displayed significantly improved model fit at one parietal electrode. In line with this observation, we observed a positive relationship between single-trial alpha power and visual stimulus exponents at this parietal site (see below).
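      The low-frequency predictor is described as single-trial power averaged over 1–5 Hz rather than a parameterized peak. A minimal sketch of that step, assuming a Hann-windowed periodogram of one channel and one trial (the exact spectral estimator used in the manuscript is not specified here); `band_power` is a hypothetical helper.

```python
import numpy as np


def band_power(trial, fs, fmin=1.0, fmax=5.0):
    """Mean Hann-windowed periodogram power of one single-channel trial
    within [fmin, fmax] Hz."""
    x = np.asarray(trial, dtype=float)
    x = x - x.mean()  # remove the DC offset so it does not leak into low bins
    spectrum = np.abs(np.fft.rfft(x * np.hanning(x.size))) ** 2 / x.size
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    in_band = (freqs >= fmin) & (freqs <= fmax)
    return spectrum[in_band].mean()
```

      Applied per trial and electrode, this yields one scalar predictor per trial that can replace the EEG exponent in the model comparison.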

      Figure R5. Model comparison topographies. (a) Single-trial auditory (upper row) or visual stimulus exponents (lower row) were modelled based on electrode-wise low-frequency power (left column) or alpha power (right column), among other covariates. Models were compared to a model of the same size that differed only in the main predictor, which consisted of single-trial EEG spectral exponents. Topographies display the likelihood ratio test statistic, showing no improvement in model fit compared to EEG spectral exponent-based models in all but one model family and illustrating the unique predictive power of aperiodic EEG activity in this context. Alpha power at one parietal electrode explained significantly more variance in visual stimulus exponents. (b) T values representing the main effect of alpha power on visual stimulus exponents. The highlighted electrode represents p < .05 after FDR correction.
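      The caption reports an electrode surviving FDR correction without naming the procedure. Assuming the common Benjamini–Hochberg step-up procedure (an assumption, not stated in the text), correction across electrodes can be sketched as follows; `fdr_bh` is a hypothetical name.

```python
import numpy as np


def fdr_bh(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; returns a boolean mask of
    hypotheses (e.g. electrodes) rejected at false discovery rate alpha."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    passes = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passes.any():
        k = np.nonzero(passes)[0].max()  # largest rank that meets its threshold
        reject[order[: k + 1]] = True    # reject it and all smaller p-values
    return reject
```

      The step-up rule means that one small p-value at a high rank can carry smaller p-values with it, even if those individually miss their own rank threshold.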

      (c) Behavioural relevance of spectral exponent tracking

      Given the results from (b), we refrained from re-running the PLS analysis focussing on the behavioural relevance of the links of low-frequency and alpha power with stimulus exponents. In our view, the absence of a significant link between single-trial stimulus input and a measure of neural activity in this case precludes any further analysis at the between-subject level.

      Reviewer #2 (Public Review):

      The paper investigates two separate studies looking at the spectral exponent of the EEG 1/f-like spectrum: one a study of the effect of anesthesia type (propofol vs. ketamine), using publicly available data, and the other a traditional study of auditory and visual processing relying on selective attention to one modality vs. the other. The authors make a strong case that the value of the spectral exponent depends on the relevant condition, in both studies, but the case for the spectral exponent's dependence on the Excitation:Inhibition balance is much weaker.

      The paper presents the two separate studies as tightly linked, but by the end of the paper it appears they may be quite separate.

      The anesthesia study is brief and compelling. With respect to the effect of anesthesia type on spectral exponent, the results are very strong, and, given the results of Gao et al. (2017) and the stated properties of propofol vs. ketamine, the connection to E:I balance follows naturally.

      The auditory and spectral 1/f tracking study suffers from some weaknesses.

      Most importantly, the design is elegant and the results presented are very compelling. 1) Modality-specific attention selectively reduces the EEG spectral exponent (for relevant electrodes reflecting cortical processing of that modality); 2) Changing the value of the spectral exponent in the stimulus results in a similar change in the value of the spectral exponent of the response, but only for the selectively attended modality (and only for relevant electrodes); and 3) the amount of modality-specific spectral-exponent tracking predicts behavior. The interactions and main effects found all support the importance of the spectral exponent as a physiologically and behaviorally important index.

      The main problem is a weakness in analysis regarding whether the mechanistic origin of the above effects may be due to temporal tracking of the stimulus waveform (visual contrast/acoustic envelope) by the response waveform. [In the speech literature this would be referred to as "speech tracking", or, sometimes, as speech entrainment (in the weak sense of "entrainment").] As pointed out by the authors, this is not a steady state response because the instantaneous fluctuation rate of the stimulus is constantly changing, and so cannot be analyzed as such (it is also distinct from the evoked responses analyzed). But it is a good match for other analysis methods, for instance Ed Lalor's VESPA and AESPA methods, and their reverse-correlation descendants. Specifically, Lalor et al., 2009 analyzed EEG responses to a non-sinusoidal envelope modulation of a broadband noise carrier and found strong evidence for robust temporal locking. The success of such linear methods there (AESPA for auditory; VESPA for visual) implies that a change in the stimulus spectrum exponent would produce a similar change in the response spectrum exponent, having nothing to do with E:I balance.
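      The mechanism raised here can be written out with standard linear-systems reasoning. Assume, as in VESPA/AESPA-style analyses, that the response r(t) is the stimulus waveform s(t) convolved with a fixed impulse response h(t), plus noise n(t) uncorrelated with the stimulus, and that the stimulus spectrum is a power law with exponent χ_s:

```latex
\begin{aligned}
r(t) &= (h * s)(t) + n(t) \\
P_r(f) &= |H(f)|^2 \, P_s(f) + P_n(f) \\
P_s(f) &= A\, f^{-\chi_s}, \quad P_n(f) \ll |H(f)|^2 P_s(f)
  \;\Longrightarrow\; \log P_r(f) \approx \log\!\big(A\,|H(f)|^2\big) - \chi_s \log f \\
\Delta\chi_r &\approx \Delta\chi_s \quad \text{if } H(f) \text{ is the same in both conditions}
\end{aligned}
```

      Under these assumptions, a change in the stimulus exponent reappears one-to-one in the stimulus-driven part of the response exponent without any change in E:I balance. This is the sense in which spectral-exponent tracking could arise from temporal locking alone.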

      The evoked response analysis clearly aims to go in this direction, but since it does not reflect ongoing response properties, it cannot alone speak to this.

      Because this plausible mechanism for the spectral-exponent-tracking has not been explored, it is much harder to associate the observed spectral-exponent-tracking as originating from E:I balance. The study does not then hold together well with the anesthesia study, and weakens the links to E:I balance rather than strengthening it.

      Thank you for this in-depth assessment of our work and your general positive appraisal of it. Importantly, your major point of concern seems to at least partially trace back to a regrettable misunderstanding caused by the way we presented our results in the original version of the manuscript. While the first study aimed at establishing the validity of the EEG spectral exponent as a non-invasive marker of E:I, the second study had two objectives. First, to test attention-related changes in EEG spectral exponents that we assume to depict topographically specific changes in E:I. Second, to test the link between aperiodic stimulus features and aperiodic EEG activity by comparing stimulus spectral exponents and EEG spectral exponents. We understand that the reviewer is doubtful of the link between stimulus-related EEG spectral exponent changes and E:I – and so are we.

      In the updated version of the manuscript, we have tried to make it very clear that despite the displayed and inferred links between EEG spectral exponents and E:I balance, the positive relationship between stimulus spectral exponents and EEG spectral exponents does not necessarily reflect changes in E:I. Nevertheless, we feel that study 1 and 2 integrate well as they offer a comprehensive view on 1/f-like EEG activity and its sensitivity to (1) specific anaesthesia effects, (2) attentional focus, and (3) aperiodic stimulus features in a behaviourally-relevant way. While (1) and (2) can be mapped on to one underlying mechanism, cortical E:I balance, (3) rather represents bottom-up sensory cortical effects similar to those described in SSEP or speech tracking literature. The interaction of attentional focus and stimulus tracking illustrates the connection between top-down (or anaesthesia-driven) changes in E:I as captured by the EEG spectral exponent, and bottom-up sensory-related changes in EEG activity.

      Reviewer #3 (Public Review):

      The balance between excitation and inhibition in the cortex is an interesting topic, and it has already been a focus of study for a while. The current manuscript focuses on the 1/f slope of the EEG spectra as the neural substrate of the change in the balance between excitation and inhibition. While the approach they use to analyze their data is interesting, unfortunately, for the reasons I'll outline below, the study's conclusions are not supported by the data, and the findings do not add any new insight conceptually or mechanistically to our understanding of attention, excitation or inhibition. While the study aims to "test the conjecture that 1/f-like EEG activity captures changes in the E:I balance of underlying neural populations", ultimately the central conclusions of the work are just conjecture, in that they are inferences formed without sufficient evidence.

      Anaesthesia study: EEG spectral exponents as a non-invasive approximation of E:I balance

      The authors observe that the 1/f slope differed over pre-selected central electrode sites between 4 participants undergoing ketamine and propofol anaesthesia. The rather small sample size is a cause for concern, as is the authors' rationale for looking at the central electrodes: they claim these electrodes receive contributions from many cortical and subcortical sources, but that can be said of any other electrodes at the scalp. But I believe the most critical weakness here is the authors' claim that, during anaesthesia, propofol is "known" to result in a "net" increase of inhibition, while ketamine results in an increase in net excitation. We still know very little about what is happening neurophysiologically under anaesthesia, and the concept of "net" inhibition and excitation is rather a gross simplification of what happens to the central nervous system under these two agents. Just as an example, propofol has been found to have some excitatory influence on brain function, with the dosage of the anaesthetic also playing a role: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2717965/. On the other hand, ketamine has been observed to inhibit interneurons and cortical stimulus-locked responses, but cause excitation in the auditory cortex: https://physoc.onlinelibrary.wiley.com/doi/10.1113/JP279705.

      Suffice it to say, the interaction between anaesthetic agents and the brain is rather complex. Decades of research have shown that the EEG spectrum changes during anaesthesia. To rather arbitrarily say one agent has a net inhibitory impact and another a net excitatory impact, then link those to qualitative changes in the EEG spectra of 4 participants, and further link that back to the E:I ratio is committing the scientific fallacy of Begging the Claim.

      We thank the reviewer for their insightful comments. Of course, we do not wish to challenge the complex nature of anaesthetic effects by any means and apologize if the original version of our manuscript had left that impression. Below, we outline that despite the complex impact of anaesthesia on central nervous activity, there exists plenty of evidence justifying our assumption of differentially altered E:I balance through propofol and ketamine, at least in cortical areas.

      First of all, we agree with the reviewer that a change in E:I balance certainly is not the only change that takes place in the central nervous system during anaesthesia. As has been shown before, propofol and ketamine affect the overall level of neural activity (Taub et al., 2013) and spiking (Quirk et al., 2009; Kajiwara et al., 2020), and propofol is associated with frontal alpha oscillations and widespread changes in beta power (Purdon et al., 2012). In the updated version of the manuscript, we have added references to these common patterns and discuss the oscillatory changes we observe in the current dataset.

      Importantly, while there might not be a single identifiable mechanism behind the host of different anaesthesia-induced changes in brain activity, there is relative clarity on the fact that higher doses of propofol drive a change in excitatory and inhibitory activity towards inhibition whereas ketamine drives disinhibition and hence shifts E:I towards excitation. In fact, the study by Deane et al. (2020) reports increased excitation and disinhibition in auditory cortex during ketamine anaesthesia, accompanied by stronger (not weaker, as stated by the reviewer) evoked responses. These findings speak to the validity of the simplification of a net increase of excitation under ketamine anaesthesia. Furthermore, the modelling results by McCarthy et al. (2008) address a dose- and cell-ensemble-specific effect of propofol anaesthesia: paradoxical excitation. The observation that low doses of propofol can induce a temporary increase of excitatory activity is in stark contrast to the general GABA-A-potentiating and hence inhibiting nature of propofol (Concas et al., 1991). Importantly, however, higher doses of propofol as used in the analysed dataset are widely accepted to lead to relatively increased inhibition, even after initial paradoxical excitation (Concas et al., 1991; Zhang et al., 2009; Brown et al., 2011; Ching et al., 2010). Taken together, previous invasive physiology justifies the simplification of propofol as leading to net increased inhibition and ketamine leading to net excitation. Finally, our focus on the spectral exponent does not stem from a disregard of oscillatory changes in EEG activity but rather strictly follows from previous work that demonstrated the spectral exponent as a marker of E:I balance (Gao et al., 2017; Colombo et al., 2020; Lendner et al., 2021; Chini et al., 2021). Hence, the central goal of the presented analyses and results lies in the transfer of these previous results to non-invasive EEG recordings and the parameterization approach used by us. We hope that this becomes clearer in the updated version of the manuscript and have pasted relevant parts below.

      “Both anaesthetics exert widespread effects on the overall level of neural activity (Taub et al., 2013) as well as on oscillatory activity in the alpha (8–12 Hz) and beta (~15–30 Hz) ranges. Importantly, however, propofol is known to commonly result in a net increase of inhibition (Concas et al., 1991; Franks, 2008), whereas ketamine results in a relative increase of excitation (Deane et al., 2020; Miller et al., 2016). In accordance with invasive work and single cell modelling (Chini et al., 2021; Gao et al., 2017), propofol anaesthesia should thus lead to an increase in the spectral exponent (steepening of the spectrum) and ketamine anaesthesia to a decrease (flattening). Based on previous results, the effect of anaesthesia on EEG spectral exponents is expected to be highly consistent and to display little topographical variation (Lendner et al., 2020). For simplicity, we focused on a set of 5 central electrodes that receive contributions from many cortical and subcortical sources (see Fig 1) but report topographically resolved effects in the supplements (see Fig 1 supplement 1). Here, propofol anaesthesia led to an overall increase in EEG power, which was especially pronounced in the alpha-beta range. Ketamine anaesthesia decreased the frequency of alpha oscillations and suppressed power in the beta range. Importantly, however, EEG spectral exponents that were estimated while accounting for changes in oscillatory activity increased under propofol and decreased under ketamine anaesthesia in all participants (both p_permuted < .0009, Fig 1). These results replicate previous invasive findings and support the validity of EEG spectral exponents as markers of overall E:I balance in humans.”
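      For readers less familiar with the measure, the spectral exponent can be illustrated with a simplified sketch, assuming power falls off as 1/f^chi: fitting a straight line to log power against log frequency gives a slope of -chi, so a steeper spectrum yields a larger exponent. This is not the parameterization approach used in the analyses above, which models oscillatory peaks explicitly; instead, the sketch crudely excludes a hypothetical 6–15 Hz alpha band from the fit. The function names, frequency ranges and synthetic spectra are illustrative assumptions.

```python
import numpy as np

def aperiodic_exponent(freqs, power, fit_range=(2.0, 40.0), exclude=(6.0, 15.0)):
    """Estimate chi, assuming power ~ 1 / f**chi.

    Fits a straight line in log-log space over fit_range, skipping the
    exclude band (a crude stand-in for modelling the alpha peak explicitly).
    """
    freqs = np.asarray(freqs, dtype=float)
    power = np.asarray(power, dtype=float)
    in_range = (freqs >= fit_range[0]) & (freqs <= fit_range[1])
    outside_peak = (freqs < exclude[0]) | (freqs > exclude[1])
    mask = in_range & outside_peak
    # np.polyfit with degree 1 returns [slope, intercept].
    slope, _intercept = np.polyfit(np.log10(freqs[mask]), np.log10(power[mask]), 1)
    return -slope  # the exponent is the negative log-log slope

def synthetic_spectrum(freqs, chi, alpha_gain=5.0):
    """Hypothetical spectrum: 1/f**chi background times a 10 Hz Gaussian bump (SD 1 Hz)."""
    bump = 1.0 + alpha_gain * np.exp(-((freqs - 10.0) ** 2) / 2.0)
    return freqs ** (-chi) * bump

freqs = np.arange(1.0, 45.0, 0.5)
flatter = aperiodic_exponent(freqs, synthetic_spectrum(freqs, chi=1.5))
steeper = aperiodic_exponent(freqs, synthetic_spectrum(freqs, chi=2.5))
print(round(flatter, 3), round(steeper, 3))
```

Both synthetic spectra carry the same alpha bump, yet the fit recovers exponents of approximately 1.5 and 2.5, illustrating how the aperiodic exponent can be separated from oscillatory power: the flatter spectrum corresponds to the direction expected under ketamine, the steeper one to the direction expected under propofol.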

      “[…] While the EEG spectral exponent, as a remote summary measure of brain electrical activity, cannot quantify local E:I in a given neural population, the non-invasive approximation demonstrated here enables inferences on global neural processes previously only accessible in animals and using invasive methods. Future studies should use a larger sample to directly compare dose-response relationships between GABA-A agonists or antagonists (e.g., flumazenil) and the EEG spectral exponent as well as common oscillatory changes.”

      Regarding the reviewer’s comment on our choice of electrodes, we first wish to highlight that several previous studies have revealed that anaesthesia effects commonly appear throughout the human cortex (Zhang et al., 2009; Lendner et al., 2020). Nevertheless, we understand that a priori choices of electrodes are always somewhat arbitrary. Hence, we performed pairwise comparisons of EEG spectral exponents between awake rest and each anaesthetic (ketamine, propofol) at all 60 electrodes, resulting in the topographies of t-values shown below. As can be discerned from these topographies, ketamine anaesthesia entailed a reduction of spectral exponents across most areas of the scalp, peaking at frontal and central sites. Propofol led to increased EEG spectral exponents across all electrodes without a clear spatial pattern. The absence of an effect at the left mastoid likely traces back to artefactual recordings at that electrode site. In the updated version of the manuscript, we report topographies of these comparisons in the supplements (figure 1 supplement 2).

      Figure R8. Topographically resolved t statistics comparing EEG spectral exponents between awake rest and different anaesthetics. Propofol leads to a widespread increase in spectral exponents that is present across the entire scalp (left). Ketamine leads to a reduction in spectral exponents that is widely distributed but appears to peak at frontal and central electrodes (right).
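      The per-electrode comparison described above can be sketched as a paired t statistic computed independently at every electrode. This is a minimal sketch, assuming a within-participant (paired) design with one spectral exponent per participant, state and electrode; the function name and toy values are hypothetical, and the sketch does not reproduce the permutation-based inference reported in the manuscript.

```python
import numpy as np

def paired_t_map(rest, anaesthesia):
    """Paired t statistic at each electrode.

    rest, anaesthesia: arrays of shape (n_participants, n_electrodes) holding
    spectral exponents of the same participants in the two states.
    Positive t means a larger exponent (steeper spectrum) under anaesthesia.
    """
    diff = np.asarray(anaesthesia, dtype=float) - np.asarray(rest, dtype=float)
    n_participants = diff.shape[0]
    # Standard error of the mean difference, using the sample SD (ddof=1).
    sem = diff.std(axis=0, ddof=1) / np.sqrt(n_participants)
    return diff.mean(axis=0) / sem

# Hypothetical toy data: 5 participants x 3 electrodes (differences from rest).
rest = np.zeros((5, 3))
anaesthesia = np.array([
    [1.0, -1.0,  1.0],
    [2.0, -2.0, -1.0],
    [3.0, -3.0,  1.0],
    [4.0, -4.0, -1.0],
    [5.0, -5.0,  0.0],
])
t_values = paired_t_map(rest, anaesthesia)
```

In a real analysis the resulting vector would hold one value per electrode (60 here) and be plotted as a scalp topography; in the toy data, the third electrode, whose differences have no consistent sign, yields t = 0.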

      We acknowledge the small sample size of study 1 and have added a more explicit statement of this limitation to the updated version of our manuscript. Nevertheless, owing to the consistency of the effects and the permutation-based statistics used, which are appropriate for small sample sizes, the results of study 1 can be interpreted. Furthermore, we realized that we had not included two additional participants of the publicly available dataset in our previous analysis. Both sets of recordings (ketamine / propofol) from these participants were included in the revised analyses, further strengthening the reported results. Hence, despite the small sample size (now N = 5 per group), we believe that the methods used and the consistency of effects allow for a careful but clear interpretation, especially since they are in close agreement with previous invasive and modelling results as well as recent causal manipulation studies (Gao et al., 2017; Chini et al., 2021).

      Cross-modal study: EEG spectral exponents track modality-specific, attention-induced changes in E:I

      Here the authors observe a difference in 1/f slope depending on whether the participants (n=24) were paying attention to the auditory or visual stream. My central issue here is again with the authors' assumption that cross-modal attention reflects attention-induced changes in E/I. While attention to a single sensory modality can result in decreased activity in cortical regions that process information from an unattended sensory modality, there is no basis here to say that the task-irrelevant region is actually inhibited. The authors do observe differences in 1/f slope as a function of attentional location, and these differences do account for some of the variance in behavior in the task.

      But unfortunately, beyond a purely descriptive exercise, no mechanistic insight is revealed here with regard to attentional allocation, excitation, and inhibition.

      We wish to take this opportunity to briefly elaborate on our hypotheses behind the reported attention contrasts and their interpretation. Spectral exponents of invasively recorded neural field potentials have previously been shown to reflect pronounced changes in E:I balance, including recent causal optogenetic work explicitly testing this link (Gao et al., 2017; Chini, Pfeffer & Hanganu-Opatz, 2021). In a first step, we analysed data from different anaesthetics to establish the capacity of non-invasive EEG recordings to track similar changes (see above). Building on these findings, we tested whether smaller, attention-related and topographically specific changes in E:I balance can equally be observed as changes in EEG spectral exponents. Importantly, topographically specific changes in E:I with attention have been reported previously in non-human animals (e.g., Kanashiro et al., 2017; Ni et al., 2018). We found an attention-related topographical pattern of EEG spectral exponents in support of this idea: spectral exponents at occipital channels decreased during visual attention, pointing towards a relative increase of excitatory activity in visual cortical areas. The same effect was reduced at central electrodes and for auditory attention. These findings demonstrate the potential of EEG spectral exponents to detect topographically specific, attention-related changes in brain activity that likely trace back to changes in E:I balance. Of note, we do not imply a role of E:I in the inhibition of unattended sensory input and activity in associated cortical areas, but rather point to a potentially separate role of neural alpha power in this context. While it is generally difficult to draw strictly mechanistic insights from correlational designs, our results strongly suggest a mechanistic role of modality-specific attention in EEG dynamics and E:I balance. Furthermore, by demonstrating separate effects of aperiodic activity and alpha power dynamics, we pave the way for a new line of studies (see comments by R1) on the neural dynamics of selective attention and their behavioural relevance in humans.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors have used many cleverly chosen mouse models (periodontitis models; various models that lead to an on-switch of genes) and methods (immune localizations of high quality; single cell RNA sequencing) for the quest of elucidating a role for telocytes. They describe that more telocytes are present around teeth in mice that had periodontitis. These cells proliferated, and they expressed a pattern of genes that allowed macrophages to differentiate into a different direction. In particular, they showed that telocytes in periodontitis express HGF, a molecule that steers macrophage differentiation towards a less inflammatory cell type, paving the way for recovery. As a weakness, one could state that an attempt to extrapolate to human cells is missing.

      In the Discussion, we state that further investigation in human periodontitis is required (see page 20, paragraph 416).

      Reviewer #3 (Public Review):

      Zhao and Sharpe identified telocytes in the periodontium. To address their contribution to periodontal diseases, they conducted scRNA-seq analysis and lineage tracing in mice. They demonstrated that telocytes are activated in periodontitis. The activated telocytes send HGF signals to surrounding macrophages, converting them from an M2 to an M1/M2 hybrid state. The study implies that telocytes and HGF signalling could be targeted for the potential treatment of periodontitis.

      The significance of the study could be improved by authors testing if targeting telocytes or HGF signals could ameliorate periodontitis in the mouse model. The current form of the manuscript lacks the data that demonstrate the actual contribution of telocytes in the homeostasis of periodontium or progression of periodontitis.

      Major comments:

      1) I consider genetic validation of the role of telocytes or HGF signals crucial to establish the significance of this manuscript, and I recommend either of two experiments. a. Testing the role of HGF signals by deleting the Hgf gene in telocytes. Using Wnt11-Cre; Hgf f/f mice, the authors could address the role of HGF signals in periodontitis. CX3CR1-Cre; cMet f/f mice would delete HGF signalling in monocyte-derived macrophages. This would be another verification, but it is not clear whether PDL macrophages are derived from the yolk sac or from monocytes. b. Measuring the contribution of telocytes to homeostasis or disease progression. The mouse model could be challenging, but if achieved, the system would be very informative. The authors could first check the expression of telocyte-enriched genes, such as Lgr5 or Foxl1, reported previously in telocytes of other tissues, then delete those genes under the Wnt1-Cre driver and check whether the telocyte lineage is removed. The system would be very useful for next-level studies. A DTA model could be an alternative, but Wnt1-Cre is broadly expressed in the neural crest lineage.

      These are good suggestions but unfortunately not feasible, as we do not have all the mouse lines (e.g., Hgf f/f mice). Lgr5 and Foxl1 are used as markers in the intestine but are not suitable for PDL tissue. CD34;DTA mice target CD34+ cells; however, we encountered challenges associated with induced genetic heterogeneity when using this model, which prevented us from drawing concrete conclusions from the CD34;DTA experiments. Lgr5/Foxl1 are either not expressed or overlap with CD34, and therefore do not seem suitable for us to pursue.

      2) This paper points out that the M1/M2 hybrid state of macrophages appears upon periodontitis. The authors could further characterize the hybrid macrophages by the expression of more markers, production of cytokines, and morphology. Need to clarify if this means some macrophages are in M1 state and others are in M2 state, or one macrophage possesses both M1 and M2 phenotype. Please conduct either FACS or immunofluorescence to demonstrate if one macrophage expresses both markers. Please introduce more information about the M1/M2 hybrid state of macrophage based on other present literature.

      In contrast to our single-cell sequencing data, immunolabelling did not allow us to determine whether a single macrophage possesses both M1 and M2 phenotypes.

      3) In the introduction part, the author lists several markers that can be used for telocyte identification, such as CD34+CD31-, CD34+c-Kit+, CD34+Vim+, CD34+PDGFRα+. Could authors explain why they chose CD34 CD31, but not other markers?

      As shown in the cluster images below, the other markers either do not overlap well with CD34+ cells or, in the case of Vim, are expressed more ubiquitously. We generated a new supplementary figure (Supp Fig2) and explained this in the text (page 12, lines 235-238).

      4) In figure 5g, I do not think the yellow cells show a reduction in the Tivantinib treatment group compared with the control group. Please validate the observation by gene expression analysis, WB, etc. In addition, please show the level of c-Met+ cells in the Tivantinib treatment and control groups.

      New Supp Fig4 is included to show Met expression in homeostasis and periodontitis.

    1. Author Response

      Reviewer #1 (Public Review):

      The tools and approaches in this manuscript are of broad interest, not only to protein engineers but also to the many researchers using genome-editing reagents. However, putting the work in the context of previous research, both through changing the writing and additional experiments, will be critical for taking advantage of that widespread applicability.

      Strengths:

      Overall, the data support the conclusions of the manuscript.

      The most exciting product of this work is an engineered nuclease, Nsp2-SmuCas9, that has high activity and specificity in human cells and a relaxed PAM preference for a single C base. This chimeric enzyme can efficiently induce indels at endogenous sites. While other works have presented nucleases with minimal PAM preferences, Nsp2-SmuCas9 is a useful alternative and may be preferred. It is also more compact than the standard SpCas9, making it appealing for gene therapy applications.

      Technologically, the presented approach of screening orthologs for new specificities and making chimeras to achieve further diversity is a good way to develop new genome-editing reagents. The authors used appropriate methods, such as GUIDE-seq, to complete their goals. Extending beyond the GFP-activation assay to determine activity at endogenous targets enhanced the value of the results.

      Conceptually, it is important for the field to learn that proteins with very high sequence identity (93%) can have divergent PAM preferences. Through their engineering, the authors clearly demonstrate the advantage of characterizing such close orthologs with diverse amino acids in the area of PAM recognition.

      Weaknesses:

      1) An overall weakness with the work is that it is not clear how the activity level of the relaxed PAM enzyme, Nsp2-SmuCas9, compares to existing enzymes. Is it much better than the SpCas9 that has almost no PAM preference (SpRY) or the NGN PAM (SpG)? How does it compare to the most commonly used SpCas9 nuclease, which is known to be active in a wide variety of biological contexts? The activity assessment at endogenous sites seemed to have a long timeline, as the indel rate was measured 5 days after transfection. Clarifying the effectiveness of this new nuclease would increase the impact of this work.

      We sincerely thank the reviewer for the constructive comments on our manuscript. Following the reviewer’s suggestions, we compared the editing efficiency of Nsp2Cas9, Nsp2-SmuCas9, SpCas9, SpCas9-NG, and SpCas9-RY side by side. Overall editing efficiency was low in this experiment, probably owing to low transfection efficiency. SpCas9 was the most active enzyme. Nsp2Cas9, SpCas9-NG, and SpCas9-RY displayed similar activity, whereas Nsp2-SmuCas9 was less active than the other Cas9 variants (Figure 5C).

      2) In the presentation of the manuscript, there are several weaknesses. First, while it is true that allele-specific disruption is an important application of new CRISPR proteins, there are many other reasons why they would be useful. The specific focus on this single application throughout the abstract, introduction and discussion takes away from the widespread utility of these new tools. The writing would be more compelling if it targeted a broader audience. Allele-specific targeting is also possible beyond the PAM site if the mutation is in a position with high specificity.

      Many thanks for the reviewer’s suggestions. Accordingly, we now emphasize the widespread utility of these new tools throughout the abstract, introduction, and discussion of the revised manuscript. Allele-specific targeting is now mentioned only in the discussion.

      3) Second, the introduction is further missing a discussion of other research engineering new PAM specificity or even completely removing specificity. A more convincing narrative would include reasoning for why characterizing naturally occurring orthologs is a powerful and important approach. This information is in the discussion, but it would be helpful for the reader if these points were in the introduction.

      Many thanks for the reviewer’s comments. Accordingly, we added to the introduction a discussion of other work engineering new PAM specificities, together with reasoning for why characterizing naturally occurring orthologs is a powerful and important approach.

      “Engineered Cas9 variants with flexible PAMs can increase targeting scope. For example, SaCas9 was engineered to accept an NNNRRT PAM [1], and SpCas9 was engineered to accept almost all PAMs [2], but this strategy is time-consuming and often comes at the cost of reduced on-target activity. Another strategy is to harness natural Cas9 nucleases for genome editing. We have developed several closely related Cas9 orthologs for genome editing [3, 4]. The advantage of developing tools from closely related Cas9 orthologs is that they can exchange the PAM-interacting (PI) domain. If an ortholog recognizes a particular PAM but does not work efficiently in human cells, we can use its PI domain to replace the PI domain of another ortholog and generate a chimeric Cas9.”

      4) A second concern with the presentation and analysis of the findings is a minimal connection to the structural context of the discoveries. Many readers will likely be interested in how the specificity shifts are occurring in these orthologs, which could be remedied by supplementary figures of homology models.

      We fully agree with the referee that structural models would help readers better understand the specificity shifts occurring in these orthologs. We have generated structural models of these orthologs in complex with sgRNA and DNA, using the crystal structure of Nme1Cas9 (PDB ID: 6JDV) as the template. Some specificity shifts can be well explained by these models. When the amino acid near position 5 of the PAM is histidine, its side chain can form a hydrogen bond with the O6 carbonyl oxygen of guanine. Replacement of this guanine by cytosine or thymine would cause a major clash, whereas adenine, which carries an amino group rather than an oxygen at position 6, cannot form this hydrogen bond with the histidine (Figure 2-figure supplement 2A). Likewise, an aspartate at position 5 of the PAM would favor specific recognition of cytosine via hydrogen bonding with its 4-amino group, but not of other bases, which would either cause a major clash or abolish the hydrogen bond (Figure 2-figure supplement 2B). A similar explanation applies to the apparent specificity between glutamine and adenine at position 8 of the PAM on the target sequence (Figure 2-figure supplement 2C).

      5) Along the same lines, further structural analysis of the failures would be helpful for those embarking on similar projects. Are there any differences in the sequence or structure of the 4/29 orthologs that were not functional in the GFP-activation assay compared to those that were?

      Sequence alignment indicates that the four inactive orthologs possess intact active sites. In the predicted structural models of these orthologs, we did not observe local conformational variations that preclude the interaction with sgRNA or DNA. We speculate that specific modifications of Cas9s in mammalian cells may occur, leading to the loss of enzymatic activities of the 4 orthologs.

      Calculated structural models of AseCas9, Hpa1Cas9, MspCas9, and PlaCas9. Overall calculated structures of AseCas9, Hpa1Cas9, MspCas9, and PlaCas9 with sgRNA and dsDNA.

      6) Similarly, it was surprising that the Nsp2-NarCas9 chimera was not active, and it would be helpful if the authors could speculate based on the differences between SmuCas9 and NarCas9, such as at the interface of the domains that were fused. Structural models of the fusions would help the reader to visualize the strategy. Exploring the failures and challenges is important for understanding the generalizability of the presented approach.

      Following the reviewer’s comments, we generated structural models of Nsp2-NarCas9, Nsp2-SmuCas9, and NarCas9 with SWISS-MODEL, using the crystal structure of the highly homologous Nme1Cas9 in complex with sgRNA and dsDNA (PDB ID: 6JDV) as the template. By superimposing these models, we noticed that residues G1035, K1037 and T1038 of the Nsp2-NarCas9 chimera protrude towards the DNA molecule, which would prevent DNA binding and thereby abolish editing activity (Figure 4-figure supplement 2A). In comparison, Nsp2-SmuCas9 and NarCas9, which are active, show no protrusion at the corresponding position (Figure 4-figure supplement 2B-C).

      7) Finally, the final sequence of Nsp2-SmuCas9 fusion, as well as other enzymes such as the failed Nsp2-NarCas9, are not obvious in the manuscript. I may have missed them, but I also did not see the primers used in the Methods section. Addgene submission is also encouraged and would be of great value to the scientific community.

      Thank you for your suggestions. The final sequences of Nsp2-SmuCas9 and the other enzymes are provided in Supplemental file 1, which also lists the primers used to generate the chimeric proteins. We will submit the plasmids to Addgene soon.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Li et al. characterize sex differences in the role of macrophage RELMa in protection against diet-induced obesity (DIO). This is a key area of interest, as obesity studies in mice have generally focused exclusively on male animals because they tend to gain more weight, faster, than female mice. The authors use a combination of flow cytometry, adoptive transfer, and single-cell transcriptomics to characterize the mechanism of action for female-specific DIO protection. They identify a potential role for eosinophils in mediating female DIO protection downstream of RELMa production by macrophages. They also use the transcriptomic characterization of the stromal vascular fraction of the adipose tissue to evaluate molecular and cellular drivers of this sex-specific DIO protection.

      Although the authors provide solid evidence for many claims in the manuscript, there is generally not enough information about the studies' methods (especially on the computational/data analysis aspects) for a careful evaluation of the results' robustness at this stage.

      We have significantly expanded the methodology, especially of the scRNAseq, and deposited the script and raw data in public repositories. We also validated our methods and can confirm that the analysis presented is robust. This resubmission contains new Fig 7 and new supplementary material with this methodology and validation.

      Reviewer #2 (Public Review):

      In the study by Li et al., the authors hypothesize that RELMa, a macrophage-derived protein, plays a sex-dimorphic role as a protective factor in obesity in females vs males. The authors perform largely in vivo studies utilizing male and female WT and RELMa KO mice on a high-fat diet and perform an in-depth analysis of immune cell composition, gene expression, and single-cell RNA Sequencing. The authors find that WT females are protected from obesity and inflammation vs males, and this protection is lost in female RELMa KO mice. Further analysis by the authors including flow cytometry of the visceral fat SVF in female WT mice showed reduced macrophage infiltration, higher levels of eosinophils, and Th2 cytokine expression compared to WT male mice and female KO mice. The authors show that protection from obesity and inflammation in female RELMa KO mice can be rescued with an injection of eosinophils and recombinant RELMa. Lastly, the authors use single-cell RNA-Sequencing to further analyze SVF cells in WT and KO male and female mice on a high-fat diet.

      Overall, we find that the study represents an important finding in the immunometabolism field showing that RELMa is a key myeloid-derived factor that helps influence the macrophage-eosinophil function in female mice and protects from diet-induced obesity and inflammation in a sexually dimorphic manner. Overall, the study provides strong and convincing data supporting the authors' hypothesis and conclusion.

      We thank the reviewer for their positive review of our manuscript and their helpful feedback which we address below.

      Reviewer #3 (Public Review):

      Li, Ruggiero-Ruff et al. examine the role of RELMα, an anti-inflammatory macrophage signature gene, in mediating sex differences in high-fat diet (HFD)-induced obesity in young mice. Specifically, the authors hypothesize that RELMα protects females against HFD-induced obesity. Comparisons between RELMα-knockout (KO) and wildtype (WT) mice of both sexes revealed sex- and RELMα-specific differences in weight gain, immune cell populations, and inflammatory signaling in response to HFD. RELMα-deficiency in females led to increased weight gain, expansion of pro-inflammatory macrophage populations, and eosinophil loss in response to HFD. Female RELMα-deficiency could be rescued by RELMα treatment or eosinophil transfer. Single-cell RNA-sequencing (scRNA-seq) of adipose stromal vascular fraction (SVF) revealed sex- and RELMα-dependent differences under HFD conditions and identified potential "pro-obesity" and "anti-obesity" genes in a cell-type-specific manner. Using trajectory analysis, the authors suggest dysregulation of macrophage-to-monocyte transition in RELMα-deficient mice.

      The conclusions of this paper are mostly well supported by the data, but some aspects of the statistical and single-cell analyses will need to be corrected, clarified, and extended to enhance the report.

      We thank Dr. Ocanas for their positive comments and for the helpful feedback to improve our study. We have addressed all the comments and significantly revised the manuscript.

      Strengths:

      The authors use several orthogonal approaches (i.e., flow cytometry, immunohistochemistry, scRNA-Seq) and models to support their hypotheses.

      The authors demonstrate that phenotypes observed in HFD-fed females with RELMα-deficiency (i.e., weight gain, loss of eosinophils, a gain of M1 macrophages) can be rescued by RELMα treatment or eosinophil transfer.

      The authors recognized the complexity of macrophage activation that is beyond the 'M1/M2' paradigm and informed readers in the introduction as to why this paradigm was used in this study. During the scRNA-seq analyses, the authors further sub-cluster macrophages to include more granularity.

      Weaknesses:

      1) There are several instances in the text where the authors claim that there is a significant difference between the two groups, but the statistics for these comparisons are not shown in the figure.

      Because our design involves three variables (genotype, diet and sex) and many possible comparisons, we thought it too complicated to mark every significant difference on the graphs. Instead, we sometimes reported these differences in the text with a p value, or did not mention them at all if the difference was obvious or not meaningful (for example, comparing a WT male on a control diet with a RELMα KO female on a HFD was not relevant to our hypothesis). We have now ensured clarity in the text and in the figures and addressed the reviewer’s specific point-by-point comments. We have also carefully re-evaluated the text to ensure that every significant difference we discuss is shown in the figure.

      2) It is unfortunate that eosinophils could not be identified in the single-cell analysis since this population of cells was shown to be important in rescuing the RELMα-deficiency in HFD-fed females. The authors should note in the discussion how future scRNA-Seq experiments could overcome this limitation (i.e., enriching immune cells prior to scRNA-Seq).

      We were indeed disappointed that we were not able to obtain eosinophil single-cell sequencing data, but realize that this is a reported issue in the field. We have expanded our discussion of this and cited a paper that performs eosinophil single-cell sequencing (published while our manuscript was being submitted): “At the same time as our ongoing analysis, the first publication of eosinophil single cell RNA-seq was published, using a flow cytometry-based approach rather than 10x, including RNase inhibitor in the sorting buffer, and performing prior eosinophil enrichment (PMID: 36509106). Based on guidance from 10x, we employed targeted approaches to identify eosinophil clusters according to eosinophil markers (e.g. Siglecf, Prg2, Ccr3, Il5r), and relaxed the scRNA-Seq cutoffs to include more cells and intronic content, but still could not find eosinophils. We conclude that eosinophils may be absent due to the enzyme digestion required for SVF isolation and processing for single cell sequencing, which could lead to specific loss of the eosinophil population due to low RNA content, RNases or cell viability issues. Future experiments would be needed to optimize eosinophil single cell sequencing, based on the recent publication of eosinophil single cell sequencing.”

      3a) There are several issues with the scRNA-Seq analysis and interpretation. More details on the steps taken in the single-cell analyses should be included in the methods section.

      We agree with the reviewer that more details on the steps taken in single-cell data processing and bioinformatics need to be included in the methods section. We have added more information on the methodology, organized into separate subsections within the data processing section of the Materials and Methods, and provided the code for our data processing in a public GitHub repository: https://github.com/rrugg002/Sexual-dimorphism-in-obesity-is-governed-by-RELM-regulation-of-adipose-macrophages-and-eosinophils.

      b) With regards to the 'pseudobulk' analyses presented in Figs. 5-6, several of the differentially expressed genes identified in Fig. 6 are hemoglobin genes (i.e., Hba, Hbb genes). It is not uncommon to filter these genes out of single-cell analysis since their presence usually indicates red blood cell (RBC) contamination (PMID: 31942070, PMID: 35672358). We would recommend assessing RBC contamination as well as removing Fig. 6 from the manuscript and focusing on cell-type-specific analyses. Re-analysis will likely have an impact on the overall conclusions of the study.

      Prior to our first submission, we consulted with 10x support scientists and the UCR bioinformatics core director to ensure that our analysis included the appropriate filtering, and we have now added these details in the Methods. The PMIDs provided above are from studies that examined hippocampus development (in which the animals were not perfused, so blood contamination is possible) or whole blood (in which red blood cell contamination would be substantial). In contrast, we perfused our mice and treated the single-cell suspension with RBC lysis buffer, as detailed in the Methods. We have also extended our scSeq analysis to compare hemoglobin RNA with red blood cell-specific markers, including Gypa/CD235a. While hemoglobin is expressed throughout the myeloid population in female KO mice, Gypa/CD235a, whose expression would indicate RBC contamination, is not expressed at all (see new Fig 7B). Additionally, we provide hemoglobin protein ELISA and IF staining to support our finding that macrophages from KO mice express hemoglobin protein. Finally, two publications support hemoglobin expression by nonerythroid sources, including macrophages (PMID: 10359765; PMID: 25431740). Although, based on the above, we are confident that our data are not due to RBC contamination, we cannot exclude the possibility, however unlikely, that macrophages phagocytose RBCs and specifically preserve hemoglobin RNA and protein; we therefore discuss this possibility in the text. In conclusion, based on the justification above and the new data, we are confident that our findings and overall conclusions are robust.

      To assess potential RBC contamination further, in addition to Gypa we examined the top genes expressed by murine erythrocytes (PMID: 24637361). Feature plots of these genes show little to no expression, and a distribution very different from that of the hemoglobin genes (see new Fig 7a).

      Also, we had a small cluster of potential RBCs (only 75 cells) that we filtered out of downstream DEG analysis, which revealed the same data as in the first submission.
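      The two checks described above (comparing hemoglobin transcripts with an erythrocyte-specific marker, and excluding a small putative RBC cluster before DEG analysis) can be sketched as follows. The gene symbols Hba-a1 and Hbb-bs are assumed mouse hemoglobin genes chosen for illustration, Gypa is the marker named in the text, and the toy cells and cluster names are hypothetical.

```python
HB_GENES = ["Hba-a1", "Hbb-bs"]  # assumed hemoglobin genes, for illustration
RBC_MARKERS = ["Gypa"]           # erythrocyte marker named in the text

def fraction_expressing(cells, genes):
    """Fraction of cells with at least one count for any gene in `genes`."""
    if not cells:
        return 0.0
    hits = sum(1 for cell in cells if any(cell.get(g, 0) > 0 for g in genes))
    return hits / len(cells)

def contamination_summary(cells, clusters):
    """Per cluster: (fraction expressing hemoglobin, fraction expressing RBC markers).
    Hemoglobin-positive cells that lack RBC markers argue against RBC contamination."""
    grouped = {}
    for cell, cl in zip(cells, clusters):
        grouped.setdefault(cl, []).append(cell)
    return {
        cl: (fraction_expressing(group, HB_GENES), fraction_expressing(group, RBC_MARKERS))
        for cl, group in grouped.items()
    }

def drop_clusters(cells, clusters, to_drop):
    """Remove every cell belonging to a cluster in `to_drop` (e.g., a putative RBC cluster)."""
    kept = [(cell, cl) for cell, cl in zip(cells, clusters) if cl not in to_drop]
    return [cell for cell, _ in kept], [cl for _, cl in kept]
```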

      4) Within the text, there are several instances where the authors claim that a pathway is upregulated based on their Gene Ontology (GO) over-representation analysis (ORA). To come to this conclusion, the authors identify genes that are upregulated in one condition and then perform GO-ORA on these genes. However, the authors do not consider negative regulators, whose upregulation would actually decrease the pathway. Authors should either replace their GO-ORA analysis with one that considers the magnitude and direction of differentially expressed genes and provides an activation z-score (i.e., Ingenuity Pathway Analysis) or replace instances of 'upregulated' or 'downregulated' pathways with 'over-represented' pathways.

      Unfortunately, we did not have access to IPA for this project; we have therefore revised the text to refer to over- and under-represented pathways, as suggested.
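      To make the reviewer's point concrete: a GO over-representation test asks only whether a term's genes are more common among the differentially expressed genes than expected by chance, typically via a one-sided hypergeometric test. The minimal sketch below (standard library only; the gene counts are hypothetical) shows that its inputs are counts alone, so an upregulated negative regulator contributes exactly as an upregulated activator does, which is why "over-represented" is the accurate wording.

```python
from math import comb

def ora_pvalue(k, n, K, N):
    """One-sided hypergeometric p-value, P(X >= k).

    N: genes in the background universe
    K: background genes annotated to the GO term
    n: differentially expressed (DE) genes tested
    k: DE genes annotated to the term
    Direction of change and the gene's regulatory role are not inputs.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```

      With SciPy, the same quantity is `scipy.stats.hypergeom.sf(k - 1, N, K, n)`. Direction-aware methods such as an activation z-score additionally require the sign of each gene's fold change and prior knowledge of its regulatory effect.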

      5) For Fig.7A, a representative tSNE plot for each group (WT Female, KO Female, WT Male, KO Male) should be shown to ensure there is proper integration of the clusters across groups. There are some instances where the scRNA-Seq data do not appear to be integrated properly (i.e., Supplemental Figure 2C). The authors should explore integration techniques (i.e., Seurat; PMID: 29608179) to correct for potential batch effects within the analysis.

      We thank the reviewer for the suggestion regarding proper integration of the clusters across groups. We performed integration using the Cell Ranger aggregation (aggr) pipeline (see the updated Materials and Methods section). In addition, we took several technical measures to prevent batch effects between samples. For sequencing, we followed the library sequencing depth and run parameters recommended by 10x Genomics for both gene expression and multiplexing libraries: all 3’ gene expression libraries were sequenced at a depth of 20,000 read pairs per cell, and all cell multiplexing libraries at 5,000 read pairs per cell. All libraries were paired-end, dual-indexed, and pooled on one NovaSeq flow cell lane at a 4:1 ratio (3’ gene expression:multiplexing), as recommended by 10x Genomics, to maintain nucleotide diversity and prevent batch effects during sequencing. When integrating (aggregating) all sample gene expression libraries with the Cell Ranger aggr pipeline, we applied sequencing depth normalization across samples: Cell Ranger equalizes the average read depth per cell between libraries, by subsampling reads from the deeper libraries, before merging all sample libraries and counts. This is the default setting of the aggr pipeline, and it avoids artifacts that could be introduced by differences in sequencing depth. Thus, we are confident that the changes we observed in gene expression and cell type populations reflect biological differences rather than technical variability. We have also provided a tSNE plot showing the clustering of all 12 samples after integration.

      We updated old Fig.7 (now Fig. 6) and included a representative tSNE plot for each group. We also updated the tSNE plot for Figure 5-figure supplement 2C (previously S2C) showing overall clustering amongst all groups. The largest population differences occurred in the fibroblast population and these population differences were largely due to sex differences. Because we are confident that integration was performed appropriately and that batch effects were controlled for, we believe these sex differences are a biological effect.
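      The depth normalization described above can be sketched as follows. Cell Ranger aggr's default (`--normalize=mapped`) subsamples reads from higher-depth libraries until all libraries have the same average number of confidently mapped reads per cell. This simplified sketch computes the per-library keep fraction and thins integer counts binomially; it works on toy per-gene counts rather than raw reads, and the library names and depths are hypothetical.

```python
import random

def keep_fractions(mean_reads_per_cell):
    """Fraction of reads each library keeps so that all match the shallowest library."""
    target = min(mean_reads_per_cell.values())
    return {lib: target / depth for lib, depth in mean_reads_per_cell.items()}

def thin_counts(counts, keep_fraction, rng):
    """Binomial thinning: each counted read is kept independently with keep_fraction."""
    return [sum(1 for _ in range(c) if rng.random() < keep_fraction) for c in counts]
```

      For instance, a library at 25,000 mean reads per cell would keep 80% of its reads to match one at 20,000, so that gene-level differences between them are not driven by sequencing depth.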

      6) LncRNA Gm47283 is identified as a gene that is differentially expressed by genotype in HFD females (Fig. 7G); however, according to Ensembl this gene is encoded on the Y-chromosome (https://uswest.ensembl.org/Mus_musculus/Gene/Summary?g=ENSMUSG00000096768;r=Y:90796007-90827734). The authors should use the RELMα genotype and sex chromosomally-encoded genes to confirm that their multiplexing was appropriate.

      We agree with the reviewer that it is crucial to confirm that multiplexing and all subsequent analyses were performed correctly. The comparison between males and females contains internal controls that increase confidence, such as Xist, which is expressed only in females, and Ddx3y, which is located on the Y chromosome. The lncRNA Gm47283 is located in a syntenic region of the Y chromosome and is also present in females, annotated as Gm21887 in the syntenic region of the X chromosome; it also aligns 100% with Gm55594 on the X chromosome. It is additionally referred to as erythroid differentiation regulator 1 (Erd1), X or Y depending on the chromosome, although the NCBI database specifies partial assembly and incomplete annotation. This explains why we detect expression of this gene in females. We have discussed this in the text and now refer to this lncRNA as Gm47283/Gm21887 to prevent further confusion. The RELMalpha genotype (absence in the KO) was also confirmed. Finally, the PC analysis (see Fig 5) supports clustering by group.
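      A per-sample sanity check of the multiplexing along the lines described above could compare each sample's annotated sex and genotype with pseudobulk counts of Xist, Ddx3y, and the RELMα gene (Retnla in mouse). The sketch is illustrative only: the sample table, the count threshold, and the use of raw pseudobulk totals are assumptions, and genes with X/Y homologs such as Gm47283/Gm21887 are deliberately not used as sex markers.

```python
def infer_sex(xist, ddx3y, min_count=10):
    """Call sex from pseudobulk counts of Xist (female) and Ddx3y (Y-linked)."""
    female, male = xist >= min_count, ddx3y >= min_count
    if female and not male:
        return "female"
    if male and not female:
        return "male"
    return "ambiguous"

def flag_mismatches(samples, min_count=10):
    """samples: {id: (annotated_sex, annotated_genotype, xist, ddx3y, retnla)}.
    Returns sorted ids whose counts disagree with their annotation (KO = no Retnla)."""
    bad = []
    for sid, (sex, genotype, xist, ddx3y, retnla) in samples.items():
        sex_ok = infer_sex(xist, ddx3y, min_count) == sex
        geno_ok = (retnla < min_count) == (genotype == "KO")
        if not (sex_ok and geno_ok):
            bad.append(sid)
    return sorted(bad)
```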

      7) For Fig. 8, samples should be co-clustered and integrated across groups before performing trajectory analysis to allow for direct comparisons between groups.

      We appreciate the valuable feedback and suggestions, which have helped us clarify the trajectory analysis, as follows:

      Regarding the co-clustering and integration of our samples across groups, here is the explanation of our trajectory analysis approach. We have co-clustered all of our samples using the align_cds function from the Monocle3 package. We have included the code for Figure 8 in our Github repository at https://github.com/rrugg002/Sexual-dimorphism-in-obesity-is-governed-by-RELM-regulation-of-adipose-macrophages-and-eosinophils/blob/main/Figure8.R. Specifically, lines 138, 166, 196 and 225 of the code indicate that the align_cds function was used to cluster our samples by "Sample.ID".

      The align_cds function in Monocle3 corrects batch effects by aligning cells from different groups within a cell_data_set (cds) object. When an alignment_group is specified (here, "Sample.ID"), it corrects the reduced-dimension (PCA) coordinates of the samples using mutual nearest neighbor alignment, so that cells from all samples can be co-clustered and placed on a shared trajectory, which allows direct comparisons between groups. More details about align_cds can be found here: https://rdrr.io/github/cole-trapnell-lab/monocle3/man/align_cds.html .
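      When an alignment_group is supplied, Monocle3's align_cds performs mutual nearest neighbor (MNN) batch correction, to our understanding via the batchelor package. To illustrate the idea, the sketch below finds MNN pairs between two batches in a reduced (e.g., PCA) space and shifts the second batch by the average MNN difference vector. It is a deliberately simplified global-shift version, not the batchelor implementation (which computes locally smoothed correction vectors); the coordinates and k are hypothetical.

```python
import math

def nearest(point, pool, k):
    """Indices of the k points in `pool` closest to `point` (Euclidean)."""
    return sorted(range(len(pool)), key=lambda j: math.dist(point, pool[j]))[:k]

def mnn_pairs(ref, other, k):
    """Pairs (i, j) where ref[i] and other[j] are each among the other's k nearest neighbors."""
    ref_to_other = [set(nearest(p, other, k)) for p in ref]
    other_to_ref = [set(nearest(q, ref, k)) for q in other]
    return [(i, j) for i in range(len(ref)) for j in sorted(ref_to_other[i])
            if i in other_to_ref[j]]

def mnn_correct(ref, other, k=3):
    """Shift `other` by the mean of (ref[i] - other[j]) over all MNN pairs (global shift)."""
    pairs = mnn_pairs(ref, other, k)
    if not pairs:
        return [list(q) for q in other]
    dims = len(ref[0])
    shift = [sum(ref[i][d] - other[j][d] for i, j in pairs) / len(pairs) for d in range(dims)]
    return [[q[d] + shift[d] for d in range(dims)] for q in other]
```

      In the toy case where a second batch is the first one translated by a constant offset, the shift recovers the original coordinates, which is the intuition behind co-clustering samples after alignment.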

      We hope that this additional information alleviates the reviewer’s concerns.

      8) Since the experiments presented in this report were from young mice using a single diet intervention, the authors should comment on how age and other obesogenic diets may impact the results found here. Also, the authors should expand their discussion as to what upstream regulators (i.e., hormones or genetics) may be driving the sex differences in RELMα expression in response to HFD.

      We thank the reviewer for the suggestion and have added several sentences to address this comment. However, because the reviewers commented that some of the text needs to be trimmed, an extensive discussion of the reasons for sex differences, which are numerous, is outside the scope of this manuscript. For example, sex differences can arise from any or all of the following:

      1. Sex steroid hormones (estrogen and testosterone) are an obvious possibility for sex differences and this discussion has been included below and in the text.

      2. The sex differences we observe may stem from a variety of factors besides ovarian estrogen, including extraovarian estrogen, primarily estrogen produced in adipose tissue (32119876).

      3. Sex differences exist in fat deposition, which may or may not be estrogen dependent (25578600, 21834845).

      4. Sex differences have been reported in metabolic rate and oxidative phosphorylation, which may also be independent of estrogen (28650095, and reviewed in 26339468).

      5. Sex differences exist in the immune system, some of which are estrogen independent, but dependent on sex chromosomes (32193609).

      6. Sex differences exist particularly in the myeloid lineage, which may also be estrogen independent (25869128).

      7. Sex differences have been reported in adipokine levels, including leptin and adiponectin, which influence immune cells in adipose tissue (33268480).

      The role of estrogen is also unclear, so an extensive discussion is not possible. Numerous studies have demonstrated that estrogen protects against inflammation, so it is possible that estrogen drives some of the sex differences observed here. However, several studies found that estrogen can be pro-inflammatory (20554954, 15879140, 18523261). Previous publications by us (30254630, 33268480) and others (25869128) demonstrated intrinsic sex differences in the immune system that may depend on sex chromosome complement and/or Xist expression (34103397, 30671059).

      Studies more consistently show that estrogen protects against weight gain: postmenopausal women with diminished estrogen, and ovariectomized animal models, gain weight. The effects of ovariectomy on weight gain, and its additive effects with high-fat diet, have been reported in rhesus monkeys (for example PMID: 2663699 and PMID: 16421340) and in rodents (PMID: 7349433).

      The reviewer is correct that the effects of aging or estrogen on RELMa levels would be of significant interest, and this could be a future direction of our studies. An aging-mediated increase in inflammation (including of adipose tissue, recently reviewed in 36875140), which may be dependent on estrogen, can exacerbate obesity-mediated inflammation. We have added this discussion.

      For these reasons, we limited our discussion of possible differences and stated the following in the discussion: “Several studies demonstrated the protective role of estrogen in obesity-mediated inflammation and in weight gain, as discussed above. Whether estrogen protection occurs via estrogen regulation of RELMa levels is a focus of our future studies. Alternatively, intrinsic sex differences in the immune system have been demonstrated as well (30254630, 33268480, 25869128) that are dependent on sex chromosome complement and/or Xist expression (34103397, 30671059), and RELMa may be regulated by these as well. Additionally, the ageing-mediated increase in inflammation (including of adipose tissue, recently reviewed in 36875140) may also occur via changes in RELMa levels. Our studies used young but developmentally mature mice (4-6 weeks old when placed on diet, 18 weeks old at sacrifice), and future work on aged mice would be needed to investigate aging-mediated inflammation. Furthermore, there are sex differences in fat deposition, metabolic rates and oxidative phosphorylation (reviewed in 26339468), and adipokine expression (Coss) that regulate cytokine and chemokine levels, and therefore may regulate levels of RELMa as well. These possibilities will be addressed in future studies.”

    1. Author Response

      Reviewer #1 (Public Review):

      Most work on antibiotic resistance focuses on particular resistance genes often located on plasmids, but rarely how these genes interact with others located on the chromosome of the host organism. Considering variation in the host genome and its interaction with resistance plasmids can help predict which hosts are more likely to become resistant to a given antibiotic and explain why the same plasmid may not confer the same level of resistance to different strains.

      The authors take a clever approach to finding such genetic interactions by designing an evolution experiment using E. coli carrying an MCR-1 plasmid containing resistance genes to colistin. They then select for increased resistance to colistin and sequence the genomes of the most resistant isolates. This allowed them to identify a particular gene lpxC that confers increased resistance to E. coli when combined with the MCR-1 plasmid (more than the sum of each mutation alone) and find that this is because of decreased membrane surface charge. They then investigate whether this mutation is relevant in wild E. coli isolates by analysing environmental samples from patients and other sources and find that indeed, this mutation is often found in carriers of the MCR-1 plasmid.

      The study is very well-designed and presented in a concise and logical manner. The use of evolution experiments to identify the mutations and then engineer them to quantify the epistatic effects and understand the mechanism behind them is very elegant. The real-world relevance is then supported by looking for these mutations in environmental samples. Despite this simplicity and clarity, in some places, the writing could be improved. I particularly found that the second half of the paper was not as easy to follow as the first part and could benefit from some clarifications. The figures could also contain a bit more information to help the reader.

      Thank you!

      1.1 For example, the abstract starts by talking about standing genetic variation but it's not immediately clear what is meant by that. Standing genetic variation seems to suggest that the resistance gene itself is present in the initial population, rather than variation in other loci that might affect the selection of the resistance gene. This could be better formulated.

      We have revised the abstract to be clearer about the source of genetic variation.

      1.2 The figures could be improved by being more specific about the datasets: are mutations in Figure 2 in the WT or the MCR-1 positive lines? Are the SNPs in Fig. 4A in lpxC? Do all isolates in Fig. 4 have the MCR-1 plasmid?

      Thank you for the comment. We have edited the figure legend (line 128, page 5). Yes, Fig. 4A shows SNPs in lpxC, and all the isolates in Fig 4 have the MCR-1 plasmid. We have now clarified this in the figure legend (line 230, page 9).

      1.3 Finally, the arguments being made about diversity in the different phylogroups were not very clear. This could be made more explicit at first mention, rather than later in the discussion section.

      We have revised this section to clarify these points (lines 242-245, page 10).

      Reviewer #3 (Public Review):

      Jangir et al. used an 'evolutionary ramp' experiment to evolve E. coli strains under the selection pressure of increasing colistin concentrations wherein the surviving fractions were collected for genomic analysis. They report that the mcr-1 carrying strain evolved higher colistin resistance much faster only in presence of lpxC mutations in the genome. They identify the mcr-1 and lpxC interactions to be positively epistatic and mutations only in lpxC do not lead to resistance to colistin. Taking a cue from their evolution experiments, they looked for the variations in lpxC sequences in the genomic datasets of clinical E. coli strains. They found many such variations in the genomes of clinical isolates. Importantly, they found those variations to be present even in non-resistant strains which might predispose those strains to gain untreatable levels of colistin resistance.

      Strengths:

      The study focuses on two key aspects of antibiotic resistance in clinical settings. First, is the antibiotic colistin itself which is part of the last line of defense. Second, is the importance of genomic variations in clinical isolates that have not been linked to any antibiotic resistance mechanisms. The data were presented in a logical sequence and maintained brevity. The link of lpxC to mcr-1 resistance is convincing.

      Thank you!

      Weaknesses:

      The basic premise of the paper is solid but the following should be addressed.

      3.1 In Figure 1, the authors applied the 'evolutionary ramp' method to isolate evolved strains with higher MIC to colistin; but, the conditions for the evolution of WT and strain carrying mcr-1 are different. Maintaining mcr-1 requires antibiotic selection which WT cannot withstand. Hence, if I am not mistaken, WT was not grown in the presence of any antibiotic.

      The referee’s assertion that the selective pressures experienced by the WT and MCR+ populations were different is incorrect. We increased relative antibiotic dose (i.e., as a fraction of the MIC of the parental strains) at the same rate for both the WT and MCR+ populations. This is clearly explained in the text (lines 98-100, page 3), and the absolute colistin doses are shown in Figure 1. Please also see response 2.4 above.

      In our study, we used a naturally occurring MCR-1-carrying plasmid from the IncX4 family. This plasmid is actually very stable (at least in the short term) in the absence of colistin, in spite of the costs imposed by MCR-1. We speculate that this stability partly reflects the high conjugation rate of the plasmid and the presence of a toxin-antitoxin module.

      3.2 Not only that, maintaining a ~32 Kb plasmid itself can have different selective landscapes. The authors may replicate the experiment with their low-copy clone of mcr-1 which would make it easier for the authors to have an empty vector in WT as a proper control. Since now they know the expected mutations to be in lpxC, they might sequence a PCR amplicon of that region for validation of their hypothesis.

      This is an interesting idea for a future study. We agree with the referee that the presence of the MCR-1 plasmid may impose additional selective pressures that could potentially lead to bacteria-plasmid co-evolution. However, our data suggests that bacteria-plasmid interactions were not an important selective force over the course of our experiment: we detected no mutations in the plasmid and almost all of the chromosomal mutations that we detected could be easily associated with selective pressures imposed by colistin.

      3.3 In Figure 2, what are the effects of these mutations in lpxC? The authors state that many mutations map on to the metal binding domain; but are those significant changes? LpxC is relatively well characterized and authors may want to comment on these mutations a little more.

      Yes, most of the evolved lines had mutations in the metal-binding domain, and this site is known to be very important for lpxC activity. For example, mutations at positions 79, 238, 242, and 246 lead to a 100- to 1,000-fold decrease in lpxC activity (PMID: 24117400, 24108127, and 11148046), and many of our mutations map close to these sites (lines 140-143, page 6, and Figure 2b).

      3.4 Also, lpxC mutations showed enrichment but lpxA did not. Is this suggestive of the type of Lipid A that is more preferred for the epistatic interactions? The authors may want to comment on that.

      Interestingly, the epistatic interactions could indeed depend on the type of lipid A modification and its associated pleiotropic effects, because mutations in LPS biosynthesis genes alter membrane properties and can therefore have diverse adverse effects. However, in-depth future work is required to understand how different types of changes in lipid A influence interactions with MCR. We chose not to explore this further in the paper because lpxA was rarely mutated (2/17 clones) compared to lpxC (11/17 clones).

      3.5 In Figure 3, the lpxC mutant shows a reduction in fitness in a competition assay. What is the growth pattern of individual strains?

      The standard growth curve assay shows no significant difference in growth rate between the LpxC mutant and the wild-type strain (figure below). This is not surprising, because standard growth curves are not well suited to capturing small differences in growth or fitness. We therefore emphasize the results of the competition experiment, as this is the gold-standard method for measuring fitness effects (Figure 3c).
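      For readers less familiar with competition assays, relative fitness in a head-to-head competition is commonly computed as the ratio of the two competitors' Malthusian parameters, i.e. their realized log growth over the assay. The sketch assumes that form; the densities are hypothetical, and the exact calculation behind Figure 3c may differ.

```python
import math

def relative_fitness(mut_start, mut_end, ref_start, ref_end):
    """W = ln(mut_end/mut_start) / ln(ref_end/ref_start), measured in the same culture.
    W = 1: equal fitness; W < 1: the mutant is outcompeted by the reference."""
    ref_growth = math.log(ref_end / ref_start)
    if ref_growth == 0:
        raise ValueError("reference density must change during the assay")
    return math.log(mut_end / mut_start) / ref_growth
```

      In this form, a mutant that ends at 8e7 cells/mL while the reference reaches 1e8 (both starting at 1e5) gives W of about 0.97, a difference of roughly 3% that may be hard to resolve from separate growth curves.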

      3.6 There is a possibility that slow growth of lpxC mutant provides benefit under antibiotic stress.

      This is an interesting idea, but in this case, the slow growth of the lpxC mutant is clearly associated with a small decrease in colistin resistance (Figure 3A).

      3.7 Minor comment: the three individual replicates shown in Figure 3a are all identical within a sample and do not add to the figure where n=3. The authors can simply show SD or report correct values of replicates.

      We chose to show the raw data points, as this is the style of presentation that is being increasingly used by journals (i.e., many journals now say show all raw data points when n<6 or 10). It would not make sense to show a standard error as this was equal to 0.

      3.8 In Figure 4, as the authors themselves have stated, the difference in heterogeneity could be simply due to variation within phylogroups and subsequent compositional differences within the populations. The authors must check if mutations were found in the same location of lpxA as found in their own evolved strains. Without this information, the heterogeneity data would be speculative. Adding the lpxC variants reported in figure 2 to the trees of figure 4 (right) will make it clear if their conclusion is justified.

      This is an interesting point. We found no overlap between our experimentally evolved mutations and naturally occurring lpxC mutations, either at the level of nucleotides or codons. However, it is unclear if we should expect to see an overlap for two reasons: 1. The mutations present in natural isolates likely reflect a combination of beneficial mutations, neutral mutations, and weakly deleterious mutations. The mutations found in our evolved isolates, on the other hand, are all mutations that were beneficial under colistin selection. As such, it is probably not reasonable to expect a strong overlap between the two sets of mutations. 2. The lpxC mutations that we observed in our 11 lpxC mutated isolates are highly diverse – we found no cases of parallel evolution at the nucleotide level, and only a single example of parallel evolution at the codon level. Given this, our data suggest that a very wide diversity of sites of lpxC can interact epistatically with MCR-1 to increase colistin resistance. Again, this high diversity of potential lpxC mutations should give a weak association between lab evolved and clinical isolates.

      We have added these points in the text (lines 278-304, pages 11-12).
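      The nucleotide- versus codon-level comparison described above can be expressed compactly. The sketch assumes 1-based nucleotide positions within the lpxC coding sequence; the positions in the example are hypothetical, not the study's mutations.

```python
def codon_of(nt_pos):
    """Map a 1-based nucleotide position in a coding sequence to its 1-based codon number."""
    if nt_pos < 1:
        raise ValueError("positions are 1-based")
    return (nt_pos - 1) // 3 + 1

def overlap(lab_nt, clinical_nt):
    """Shared nucleotide positions and shared codons between two sets of mutations."""
    shared_nt = set(lab_nt) & set(clinical_nt)
    shared_codons = {codon_of(p) for p in lab_nt} & {codon_of(p) for p in clinical_nt}
    return shared_nt, shared_codons
```

      Two mutations can share a codon without sharing a nucleotide (positions 4 and 5 both fall in codon 2), which is why both levels of overlap are worth reporting.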

      3.9 The authors can perform a confirmatory experiment for the pre-existing part of their hypothesis. If they perform the evolutionary ramp experiment with a strain carrying lpxC mutant strain, will they see faster evolution of high MIC mutants?

      This is an interesting idea. Our results suggest that high-level colistin resistance would evolve more rapidly in the lpxC mutant than in a wild-type strain (assuming that both had an equivalent opportunity to acquire MCR-1 by horizontal gene transfer).

      4.0 The rationale of how the presence of lpxC mutations can cause a strain without any colistin resistance to acquire mcr-1 is not addressed. The authors may want to comment on that.

      MCR-1 is carried on conjugative plasmids, and the main plasmid families that carry MCR-1 (IncI2 and IncX4) have high conjugation rates. We have changed the text of the introduction to emphasize that MCR-1 is carried on conjugative plasmids, and we have linked MCR-1 acquisition to plasmid conjugation (lines 327-328, page 13).

    1. Author Response

      Reviewer #1 (Public Review):

      Here the authors aimed to gain insight into the role of Septin-7 in skeletal muscle biology using a novel and powerful mouse model of inducible muscle specific septin-7 deletion. They combine this with CRISPR/Cas9 and shRNA mediated manipulation of Septin-7 in C2C12 cells in vitro to explore its role in muscle progenitor morphology and proliferation. There are a variety of interesting observations, with clear phenotypes induced by the Septin-7 manipulation, including effects on body weight, muscle force production, mitochondrial morphology, and cell proliferation. However each area is somewhat superficially examined, and certain conclusions require additional validation for robust support. Additionally, mechanistic insight into Septin 7's role is limited. Therefore, while the phenotypes are likely of intrigue to both the muscle and septin community, to significantly advance the field will require additional experimentation.

      Specifically, it is currently difficult to distinguish between developmental and adult roles of Septin-7. The authors induce tamoxifen-mediated deletion at 1 month of age and examine muscle structure/function only at 4 months. By not studying early time points, it is difficult to determine whether particular phenotypes are directly due to Septin deletion or a secondary consequence of muscle atrophy and/or a decline in body weight. Further, by not inducing deletion at a later time point (i.e. after 2 months when muscle is generally matured), it is difficult to assess whether septin-7 plays a role in maintaining structure and function of mature muscle, or if its primary role is in muscle development.

      We have conducted a number of trials to knock down Septin-7 expression. These included Tamoxifen treatment of Cre- pregnant mothers, shorter treatments starting early after birth, and treatments of adult animals. While treatment of pregnant mothers led to stillborn offspring, the treatments started later resulted in only a minor (less than 20%) reduction in Septin-7 expression. These lengthy trials led us, on the one hand, to concentrate on the protocol used throughout the manuscript (with which a significant reduction, up to 50%, in the expression of the protein could be achieved) and, on the other hand, to also focus on myogenic cells in culture. This choice was further supported by the finding that Septin-7 expression is highest in neonatal muscles and declines with age until adulthood (but remains essentially constant up to 18 months of age in the mice examined). As identical Tamoxifen treatment of littermate Cre- mice did not result in any of the presented alterations (as demonstrated in the Supplementary material), we conclude that these alterations are a consequence of Septin-7 down-regulation. We nonetheless fully agree with the Reviewer that some observations are most likely indirect, i.e., due to the loss of muscle mass. These include, e.g., the altered shape of the vertebrae and the consequent “hunchback” phenotype. However, this observation further supports our claim that Septin-7 is essential for the proper development of normal musculature in these animals.

      Further, the conclusion that septin-7 has an essential role in regeneration (seemingly based on expression increasing after injury) is unsupported and requires further experimentation where injury and regeneration is triggered in the absence of Septin-7 to establish a causative role.

      We agree with the Reviewer that establishing a clear causative role for Septin-7 in muscle regeneration would require a substantial amount of further experimentation on Septin-7 knock-down animals. We believe, however, that such work (a detailed description of the changes in transcription factors and key regulatory proteins, together with changes in morphology, in Septin-7 KD animals following muscle injury) is beyond the scope of the present manuscript and should be presented as a separate study. In this manuscript, we provide the essential background to substantiate this claim: we show that fusion of myogenic cells is severely hindered when Septin-7 expression is suppressed, and that Septin-7 is upregulated following muscle injury to an extent significantly greater than would be expected if the increase were simply due to the production of new muscle fibers.

      Finally, there are intriguing observations in mitochondrial and myofiber organization and mitochondrial content; however further interrogation into additional relevant metrics of each, and at different time points of Septin-7 deletion, are needed to better understand these phenotypes and gain insight into Septin-7's role in their regulation.

      Accepting the concern of the Reviewer we have conducted additional experiments to enable the proper characterization of the morphology. Additional relevant metrics – Aspect Ratios and Form Factors – have been calculated and are now incorporated into the revised MS and are presented in Figure 5.
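      Aspect ratio and form factor are commonly defined as major/minor axis length and perimeter²/(4π·area), respectively. A minimal sketch under those assumed definitions follows; the authors' exact definitions are given in their Methods, and some studies report the inverse quantity (circularity) instead. Under these definitions, a perfect circle gives 1 for both metrics, and elongated or branched mitochondria give larger values.

```python
import math

def aspect_ratio(major_axis, minor_axis):
    """Ratio of a fitted ellipse's major to minor axis length (1 for a circle)."""
    return major_axis / minor_axis

def form_factor(perimeter, area):
    """perimeter^2 / (4*pi*area): 1 for a circle, larger for elongated/branched shapes."""
    return perimeter ** 2 / (4 * math.pi * area)
```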

      Reviewer #2 (Public Review):

      This is a comprehensive work describing for the first time the location and importance of the cytoskeletal protein Septin-7 in skeletal muscle. The authors, using a Septin-7 conditional knockdown mouse model, the C2C12 cell line, and enzymatically isolated adult muscle fibers, explore the normal location of this protein in muscle fibers, the morphological alterations in conditioned knockdown conditions, the developmental alterations, and the functional alterations in terms of force production. The global picture that emerges shows Septin-7 as a fundamental brick in both muscle construction, development, and regeneration; all this leads to reinforcing the basically structural nature of this protein role.

      We thank the Reviewer for the appreciative words. We indeed believe that Septin-7 plays an important role in the proper organization and development of skeletal muscle. Even a partial knock-down of the protein at early stages of life results in a severe loss of muscle mass accompanied by skeletal deformities. At the myoblast level, a complete knock-out of the protein renders the cells unable to proliferate and form multinucleated cells, confirming the essential role of this structural protein.

      Reviewer #3 (Public Review):

      This is an original study to explore the role of Septin-7, a cytoskeleton protein, in skeletal muscle physiology. The authors produced a unique mouse model with Septin-7 conditional knockdown specifically in skeletal muscle, which allowed them to examine the structure and function changes of skeletal muscle in response to the reduced protein expression level of Septin-7 in vivo and ex vivo at different development stages without the influence of other body parts with reduced Septin-7 expression. The study on the cellular model, C2C12 myoblast/myotubes with knockdown of Septin-7 expression, provided additional evidence of the importance of this cytoskeleton protein in regulating myoblast proliferation and differentiation. The majority of the data support the major claim in this manuscript. However, additional key experiments and data analysis are needed to provide more mechanistic characterization of Septin-7 in muscle physiology.

      We would like to express our thanks to the Reviewer for the critical comments on our manuscript and for the valuable suggestions that help substantiate our claim that Septin-7 is an essential part of the cytoskeletal network in skeletal muscle and plays an important role in muscle differentiation as well as in myoblast proliferation and fusion.

      A number of additional experiments were carried out to answer the comments and concerns of the Reviewer. Immunostaining of critical proteins (actin, myosin, and the L-type calcium channel) is now presented in Figure S4 for Cre+ animals. The T-tubules of enzymatically isolated fibers from these Septin-7 knock-down mice were also stained using Di-8-ANEPPS, and the corresponding images are presented below. We describe how different Tamoxifen treatments at different time points in the intra- and extra-uterine life of the animals resulted in deletion of the SEPTIN 7 gene, which ultimately led us to use the protocol described in this manuscript (the largest reduction that still yielded viable mice). A more detailed description of how the fusion index, a clear marker of myotube differentiation, was determined using desmin staining is now included, and additional experiments (immunostaining and western blot) with MYH, as suggested by the Reviewer, are also presented. We carried out a thorough analysis of mitochondrial morphology (in line with the requirements of another Reviewer) and modified the corresponding figure in the revised MS accordingly.

      Major Concerns:

      1) The Septin-7 knockdown mouse model and the EM and IHC techniques are all established in the research group. It is a surprise to see that the authors missed the opportunity to characterize the morphological changes in the T-tubule network, triad structure, the distribution of Ca release units (i.e., IHC of DHPR and RyR), and their co-localization with other key cytoskeletal proteins (i.e., actin), etc., in muscle sections or isolated muscle fibers.

      We appreciate the reviewer's valuable critical comments. Although we were not able to comply fully with every request, we addressed as many of the mentioned shortcomings as possible, by correcting errors and by supporting our claims with further experiments. Please find our responses to each critical remark below.

      We conducted IHC staining on individual FDB fibers of C57Bl/6 mice, showing the distribution of skeletal-muscle-specific α-actinin and RyR1 alongside Septin-7 (Figure 1E and F). As demonstrated in Figure 5E and F of the original MS (Figure 5F and G in the revised version), normal triad structures were present in both Cre- and Cre+ muscle samples by EM analysis. However, the sarcomeres were distorted at places where large mitochondria appeared in Cre+ samples.

      As suggested, T-tubule staining by Di-8-ANEPPS was carried out on isolated FDB fibers from Cre- and Cre+ animals, which revealed no considerable differences between the two groups.

      Images present the T-tubule system of single muscle fibers isolated from Cre- and Cre+ FDB muscles. Di-8-ANEPPS staining reveals no considerable difference between the two types of animals, suggesting that reduced Septin-7 expression does not alter the T-tubular system of skeletal muscle cells.

      To further investigate the key components of muscle contraction and EC coupling, we carried out immunostaining in isolated single fibers from FDB muscle originating from Cre+ and Cre- mice. Immunocytochemistry revealed no significant alteration of actin, myosin 4, and L-type calcium channel labeling comparing the two mouse strains (see Figure S4 in the revised version).

      2) The authors only studied one time point following the Tamoxifen treatment (4-month old with 3-month treatment). Based on Fig 2D, a significant body weight reduction was achieved after one month of the Tamoxifen treatment (at the age of 7 weeks), indicating a potential reduced muscle development at this age. Mice are considered fully matured at the age of 2 months. It will be more informative if the muscle samples and the in vivo and in vitro muscle activity are analyzed at this time point (7 or 8-week old), which should provide a direct answer if the knockdown of Septin-7 affects the muscle development. Additionally, a time dependent correlation of the level of Septin-7 knockdown with muscle function/morphology analysis should better define the role of Septin-7 in muscle development and function.

      We agree with the Reviewer that Septin-7 presumably has a more pronounced effect in the early stage of muscle development, since we detected a higher expression level of the protein in muscle samples isolated from newborn and young animals than from adult animals. We conducted preliminary in vivo and in vitro force experiments on 2-month-old mice after 1 month of Tamoxifen treatment. Grip force had already decreased significantly in Cre+ mice, but the decrease in twitch and tetanic force of EDL and Sol did not reach significance. These experiments were followed by analysis of the Septin-7 level in the muscle samples, which showed a reduction of less than 20% on average in the samples of Cre+ mice. This suggested that a more robust suppression of Septin-7 is needed to reach a significant reduction in in vitro force; we therefore decided to extend the Tamoxifen treatment to 3 months.

      3) Although the expression level of Septin-7 is reduced during muscle development (Fig 1C), its expression is still evident at the age of 4 months (Fig 1C and Fig S1F), indicating a potential role of Septin-7 in maintaining normal muscle function. It is important to examine whether Tamoxifen treatment started after muscle maturation at the age of 2 months would affect muscle structure and function. In particular, this type of KD mouse will be critical to answer whether the KD affects the regeneration rate following muscle injury. The outcome will further test or support the authors' claim of the essential roles of Septin-7 in muscle regeneration.

      We agree with the Reviewer's opinion that Septin-7 presumably plays an essential role not only during the early development of skeletal muscle but also in the mature tissue. In our preliminary studies, Septin-7 protein expression was determined in skeletal muscle samples from mice at different developmental stages. As presented in Figure 1C, we observed a decrease in Septin-7 protein expression from newborn to adult stages. The expression profile of Septin-7 was also investigated in samples from 2-, 4-, 6-, 9-, and 18-month-old mice, and a significant decrease was observed in samples isolated from mice of 4, 6, 9, and 18 months of age (58±8%, 48±9%, 66±16%, and 54±9% relative to the 2-month-old muscles, respectively); however, there were no considerable changes between samples after 4 months of age.

      In order to generate skeletal-muscle-specific, conditional Septin-7 knock-down animals, we applied Tamoxifen treatment at different developmental stages in our preliminary studies (see the table and figures below). When Cre- pregnant females were fed Tamoxifen in the third trimester of pregnancy, it caused intrauterine lethality independent of the genotype. In accordance with animal ethics requirements, we did not continue this experimental protocol. In the next stage of our initial experiments, 3-month-old mice were treated either with intraperitoneal injections for 5 consecutive days or with a Tamoxifen diet for 4 weeks. Here, only a moderate deletion of exon4 was detected in the SEPTIN 7 gene in Cre+ animals (data obtained from these mice are shown below).

      These findings and the observation of ontogenesis-dependent expression of Septin-7 indicated its significance at the early stage of development and suggested that we should try to modify gene expression at an earlier age. Six weeks of a diet supplemented with Tamoxifen generated well-detectable exon deletion in younger (1-month-old) mice. Based on these observations, we decided to start the Tamoxifen-supplemented diet in younger (4-week-old) animals immediately after separation from the mother, and we continued the treatment for a longer period (3 months) to ensure that exon deletion would be prominent in all Cre+ animals.

      Genetic modification of the SEPTIN 7 gene following Tamoxifen treatment in the mice mentioned above (RT-PCR).

      The figure shows the presence of floxed sites in the SEPTIN 7 gene (white arrow) and the deletion of exon4 (red arrows) in the corresponding DNA samples isolated from mice treated with Tamoxifen starting at different ages and using different methods and periods of Tamoxifen application. Exon4 deletions were less than 20%; therefore, these trials were not continued. Numbers above each lane correspond to the animal IDs presented in the table above. Q – m. quadriceps, B – m. biceps femoris, P – m. pectoralis.

      The knock-down of Septin-7 in adult animals (where its expression is already low; see above) did not result in an appreciable further reduction. This led us to conclude that the role of Septin-7 is most pronounced in muscle development. In this framework, a possible function of Septin-7 in muscle regeneration following injury could be envisioned at the adult stage. This is demonstrated in Figure 6, where we show that Septin-7 is upregulated following a mild injury. However, we believe that a detailed examination of the role of Septin-7 in regeneration is beyond the scope of the current manuscript and should be the basis of further studies.

      4) Regarding the impact of Septin-7 on differentiation, it could be problematic if images with the resolution shown in Figure S4A-C were used for fusion index calculation. If those are just zoomed-in representative images and the authors used other, lower-resolution, global-view images for quantification, those images need to be shown. The authors may also need to elaborate on why they stained Desmin instead of MYH for quantification of the fusion index of myotubes (page 27). Desmin also marks mesenchymal cells.

      We apologize that the method used for the fusion index calculation was not described clearly enough. Images in Figure S4A-C show the Septin-7 and actin cytoskeletal structure in proliferating myoblasts, before the induction of differentiation. The fusion index was determined in cultures in which myotube differentiation was induced by reduced serum content (as described in Methods). We used desmin staining because this protein is expressed only in myotubes with 2 or more nuclei, where fusion of myoblasts has already started (see representative images below). Representative desmin-labeling images from control, scrambled, and KD cultures at the 5-day differentiated stage are now included in Figure S5G.
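
      For readers who want to reproduce this kind of quantification, the sketch below computes a fusion index under a common definition that we assume here: nuclei located in desmin-positive cells containing two or more nuclei, divided by all DAPI-stained nuclei in the same field, times 100. The `fusion_index` helper and the counts are hypothetical, and the manuscript's exact counting rules may differ.

```python
def fusion_index(nuclei_per_desmin_cell, total_nuclei):
    """Percentage of nuclei that sit in desmin-positive cells with >= 2 nuclei.

    nuclei_per_desmin_cell: nucleus count for each desmin-positive cell in a field
    total_nuclei: all DAPI-stained nuclei counted in the same field
    (Assumed definition; not taken from the manuscript.)
    """
    if total_nuclei <= 0:
        raise ValueError("total_nuclei must be positive")
    # Mononucleated desmin-positive cells are not counted as fused.
    fused = sum(n for n in nuclei_per_desmin_cell if n >= 2)
    return 100.0 * fused / total_nuclei


# Hypothetical counts for one field of view (not data from the study).
print(fusion_index([3, 5, 1, 2], total_nuclei=40))  # 10 fused nuclei of 40 -> 25.0
```

      Averaging this value over several fields per culture, rather than pooling nuclei across fields, keeps each field as one observation.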

      The figure presents two examples (the bottom row has now been added to Figure S5 as panel G) of the desmin-specific immunostaining used to calculate the fusion index in the different C2C12 cultures. Specific desmin signals (green) appear after mononucleated myoblasts fuse into myotubes, while non-differentiated myoblasts show no desmin immunolabeling. Nuclei are stained with DAPI (blue).

      If Septin-7 is truly affecting differentiation, a decrease of MYH 2 expression can be readily detected by IHC or WB.

      We are grateful for the Reviewer's suggestion. We have conducted immunocytochemistry and WB experiments in proliferating myoblasts and in myotubes at day 5 of differentiation. As the figure below demonstrates, myosin heavy chain-specific immunolabeling could be detected only in differentiated samples, while myoblasts did not show a positive signal. However, there are significantly fewer MYH2-positive myotubes in Septin-7 KD cultures than in control and scrambled samples. In addition, we detected a decreased WB signal for MYH2 in Septin-7 KD protein samples compared with their control counterparts.

      The figure presents MYH2-specific immunostaining in the different C2C12 cultures. Specific myosin heavy chain 2 signals (green) are present during myotube formation in differentiating cultures; however, fewer MYH2-positive myotubes are present in Septin-7 KD cultures as a result of the reduced capacity of the cells to fuse, and mostly only DAPI-stained nuclei are visible there. Proliferating myoblasts did not show specific immunolabeling for MYH2, as shown by the confocal image and the corresponding part of the WB membranes. Using WB, we also detected decreased MYH2-specific labeling in Septin-7 KD samples compared with control samples.

      Additionally, Septin-7 may also affect the migration or fusion of myoblasts instead of differentiation. The observation of altered cell morphology and filopodia/lamellipodia formation (Figure 3C) in Septin7-KD cells before differentiation also implies a potential role of Septin-7 in migration. This possibility should be at least discussed.

      We appreciate the Reviewer's comment and suggestion. A few publications show that altered septin (in some cases Septin-7) expression modifies the migration of different eukaryotic cell types, such as microvascular endothelial cells (PMID: 24451259), human epithelial cells (PMID: 31905721), neural crest cells (PMID: 2881782), and human breast cancer or lung cancer cells (PMID: 27557506, 31558699, and 32516969). Li et al. (PMID: 32382971) found that miR-127-3p regulates myoblast proliferation by targeting Septin-7. In the present manuscript, we describe that Septin-7 modification alters myoblast fusion (Figure 3J), a process that accompanies differentiation. The effect of Septin-7 gene silencing on cell migration has, in addition, been studied in detail and was presented to the Biophysical Society. These results are intended to be submitted as a separate manuscript.

      5) The image shown in Figure 5F does not support the pooled data shown in Figure 5C. The size of mitochondria is remarkably larger in Cre+ muscle (Fig 5E and 5F). The morphology of mitochondria in Cre+ muscle is apparently normal (Fig 5F), while the mitochondrial DNA content is drastically reduced (Figure 5H), which is an important discovery and deserves to be further confirmed by WB and/or qPCR for critical mitochondrial proteins (i.e., MTCOX, COXV, etc.).

      We thank the Reviewer for pointing out that the interpretation of images in Figure 5 was not clear enough. Based on this and on the explicit request from the other Reviewer, a detailed evaluation of mitochondrial morphology was carried out, and the panels of Figure 5 were redrawn and reorganized. The revised Figure 5 now presents the average Perimeter, the average Aspect Ratio, and the average Form Factor (panels C & H, for cross- & horizontal sections, respectively), the relative distributions of the areas (panels D & I, for cross- & horizontal sections, respectively), and the number of mitochondria normalized to fiber area (panel E, cross-sections). The mitochondrial DNA content is presented in panel J. As evidenced by these figures (and by the representative EM micrographs), larger mitochondria, sometimes in large associations, are present in the muscles of Cre+ animals.
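
      The shape descriptors listed above are commonly computed per segmented mitochondrion as sketched below. This assumes the widely used definitions aspect ratio = major axis / minor axis and form factor = perimeter² / (4π · area), which equal 1 for a circle and grow for elongated or branched profiles; the response does not state the exact formulas, so these definitions and the `shape_descriptors` helper are assumptions.

```python
import math


def shape_descriptors(area, perimeter, major_axis, minor_axis):
    """Return (aspect_ratio, form_factor) for one segmented mitochondrion.

    Assumed definitions (not taken from the manuscript):
      aspect_ratio = major_axis / minor_axis          (1 for a circle)
      form_factor  = perimeter**2 / (4 * pi * area)   (1 for a circle, >1 otherwise)
    Units only need to be consistent (e.g. nm and nm^2).
    """
    if area <= 0 or minor_axis <= 0:
        raise ValueError("area and minor_axis must be positive")
    aspect_ratio = major_axis / minor_axis
    form_factor = perimeter ** 2 / (4 * math.pi * area)
    return aspect_ratio, form_factor


# A circle of radius 1 should give aspect ratio 1 and form factor 1.
r = 1.0
print(shape_descriptors(math.pi * r ** 2, 2 * math.pi * r, 2 * r, 2 * r))
```

      Note that some image-analysis tools report the reciprocal quantity (circularity = 4π · area / perimeter²), so the convention should be checked before comparing values across studies.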

      Furthermore, gene expression of four essential mitochondrial proteins (cytochrome oxidase 1 (COX1), cytochrome oxidase 2 (COX2), succinate dehydrogenase (SDH), and ATP synthase) was determined by qPCR in RNA samples from different skeletal muscles of Cre- and Cre+ animals. As the figure below demonstrates, there was a tendency toward decreased expression of these genes in Cre+ muscle samples; however, no significant difference between the Cre- and Cre+ data could be detected.

      The figure shows the normalized mRNA expression of ATP synthase, SDH, COX1, and COX2 in Cre- (green) and Cre+ (red) samples isolated from m. quadriceps and m. pectoralis. Expression of each gene was determined in 3 individual animals, with technical duplicates in the qPCR analysis. The 36B4 gene, encoding acidic ribosomal phosphoprotein P0, was used as the normalizing gene.
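
      Normalization to a reference gene such as 36B4 is often done with the 2^(−ΔΔCt) method. The sketch below assumes that method and roughly equal (about 100%) amplification efficiencies for target and reference, neither of which is stated in the response; the `relative_expression` helper and the Ct values are purely illustrative.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference, ct_target_ctrl, ct_reference_ctrl):
    """Fold change of a target gene vs. a control group by the 2^-ddCt method.

    Each argument is a list of Ct values (e.g. technical duplicates).
    Assumes ~100% PCR efficiency for both target and reference (e.g. 36B4).
    """
    d_ct_sample = mean(ct_target) - mean(ct_reference)          # dCt in the test group
    d_ct_control = mean(ct_target_ctrl) - mean(ct_reference_ctrl)  # dCt in controls
    dd_ct = d_ct_sample - d_ct_control
    return 2 ** (-dd_ct)


# Hypothetical Ct values: relative to 36B4, the target amplifies one cycle later
# in the test group than in the control group, i.e. half the expression.
print(relative_expression([25.0, 25.5], [18.0, 18.5], [24.0, 24.5], [18.0, 18.5]))  # 0.5
```

      Statistical comparisons are then usually run on ΔCt values per animal (here, three animals per group) rather than on the fold changes themselves.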

      6) Figure 2 H & I: It is unclear whether the muscle force was normalized to the individual muscle weight.

      We apologize for the incomplete presentation and explanation of muscle force values. Figure 2F-I presents absolute force values without normalization to the cross-sectional area. In response to the Reviewer's comment, the averages of the normalized values are given in Table S3 of the modified manuscript.
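
      One common way to normalize isometric force to cross-sectional area (CSA) is sketched below, assuming CSA = muscle mass / (fiber length × muscle density) with a density of 1.06 g/cm³ (equivalently 1.06 mg/mm³). The response does not say how the normalization in Table S3 was performed, so the formula, the density constant, the `specific_force` helper, and all numbers are illustrative assumptions.

```python
MUSCLE_DENSITY_MG_PER_MM3 = 1.06  # assumed density of mammalian skeletal muscle


def specific_force(force_mN, muscle_mass_mg, fiber_length_mm):
    """Force per cross-sectional area, in mN/mm^2.

    CSA (mm^2) = mass (mg) / (fiber length (mm) * density (mg/mm^3)).
    """
    if muscle_mass_mg <= 0 or fiber_length_mm <= 0:
        raise ValueError("mass and fiber length must be positive")
    csa_mm2 = muscle_mass_mg / (fiber_length_mm * MUSCLE_DENSITY_MG_PER_MM3)
    return force_mN / csa_mm2


# Hypothetical muscle: 300 mN tetanic force, 10.6 mg, 5 mm fiber length.
print(specific_force(300.0, 10.6, 5.0))  # ~150 mN/mm^2
```

      Normalizing only to muscle weight, as the Reviewer mentioned, would ignore differences in fiber length between muscles such as EDL and Sol, which is why a length term is included here.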

      7) The IHC results in Figure 6B are confusing. There are no centrally located nuclei in the Pax7-alone image of Figure 6B, but they are abundant in the Pax7 + H&E image. The brown color of DAB and the purple color of hematoxylin are hard to distinguish.

      Images showing labeling of Pax7 (a transcription factor expressed in activated satellite cells) alone cannot show centrally located nuclei, because the nuclei become visible only when HE staining is applied. As the Reviewer noted, the brown color of DAB and the purple color of hematoxylin are sometimes difficult to distinguish; therefore, we first presented Pax7 expression visualized by DAB staining (localized near the sarcolemma). Next, we performed double staining for Pax7 and HE to show both the cytoplasm and the nuclei.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Goering et al. investigate subcellular RNA localization across different cell types focusing on epithelial cells (mouse C2bbe1 and human HCA-7 enterocyte monolayers, canine MDCK epithelial cells) as well as neuronal cultures (mouse CAD cells). They use their recently established Halo-seq method to investigate transcriptome-wide RNA localization biases in C2bbe1 enterocyte monolayers and find that 5'TOP-motif containing mRNAs, which encode ribosomal proteins (RPs), are enriched on the basal side of these cells. These results are supported by smFISH against endogenous RP-encoding mRNAs (RPL7 and RPS28) as well as Firefly luciferase reporter transcripts with and without mutated 5'TOP sequences. Furthermore, they find that 5'TOP-motifs are not only driving localization to the basal side of epithelial cells but also to neuronal processes. To investigate the molecular mechanism behind the observed RNA localization biases, they reduce expression of several Larp proteins and find that RNA localization is consistently Larp1-dependent. Additionally, the localization depends on the placement of the TOP sequence in the 5'UTR and not the 3'UTR. To confirm that similar RNA localization biases can be conserved across cell types for other classes of transcripts, they perform similar experiments with a GA-rich element containing Net1 3'UTR transcript, which has previously been shown to exhibit a strong localization bias in several cell types. In order to determine if motor proteins contribute to these RNA distributions, they use motor protein inhibitors to confirm that the localization of individual members of both classes of transcripts, 5'TOP and GA-rich, is kinesin-dependent and that RNA localization to specific subcellular regions is likely to coincide with RNA localization to microtubule plus ends that concentrate in the basal side of epithelial cells as well as in neuronal processes.

      In summary, Goering et al. present an interesting study that contributes to our understanding of RNA localization. While RNA localization has predominantly been studied in a single cell type or experimental system, this work looks for commonalities to explain general principles. I believe that this is an important advance, but there are several points that should be addressed.

      Comments:

      1) The Mili lab has previously characterized the localization of ribosomal proteins and NET1 to protrusions (Wang et al, 2017, Moissoglu et al 2019, Crisafis et al., 2020) and the role of kinesins in this localization (Pichon et al, 2021). These papers should be cited and their work discussed. I do not believe this reduces the novelty of this study; rather, it supports the generality of the RNA localization patterns to additional cellular locations in other cell types.

      This was an unintentional oversight on our part, and we apologize. We have added citations for the mentioned publications and discussed our work in the context of theirs.

      2) The 5'TOP motif begins with an invariant C nucleotide, and mutation of this first nucleotide next to the cap has been shown to reduce translation regulation during mTOR inhibition (Avni et al, 1994 and Biberman et al 1997) and also Larp1 binding (Lahr et al, 2017). Consequently, it is not clear to me whether RPS28 initiates transcription with an A, as indicated in Figure 3B. There also seem to be some differences in published CAGE datasets, but this point needs to be clarified. Additionally, it is not clear to me how the 5'TOP Firefly luciferase reporters were generated and whether the transcription start site and exact 5'-ends of these constructs were determined. This is again essential to determine whether it is a pyrimidine sequence in the 5'UTR or the 5'TOP motif that is important for localization, and whether Larp1 directly regulates localization by binding to the 5'TOP motif or the effect they observe is indirect (e.g., is Larp1 also basally localized?). It should also be noted that Larp1 has been suggested to bind pyrimidine-rich sequences in the 5'UTR that are not next to the cap, but the details of this interaction are less clear (Al-Ashtal et al, 2021).

      We did not fully appreciate the subtleties related to TOP motif location when we submitted this manuscript, so we thank the reviewer for pointing them out.

      We also analyzed public CAGE datasets (Andersson et al, 2014 Nat Comm) and found that the start sites for both RPL7 and RPS28 were quite variable within a window of several nucleotides (as is the case for the vast majority of genes), suggesting that a substantial fraction of both do not begin with pyrimidines (Reviewer Figure 1). Yet, by smFISH, endogenous RPL7 and RPS28 are clearly basally/neurite localized (see new figure 3C).

      Reviewer Figure 1. Analysis of transcription start sites for RPL7 (A) and RPS28 (B) using CAGE data (Andersson et al, 2014 Nat Comm). Both genes show a window of transcription start sites upstream of current gene models (blue bars at bottom).

      A more detailed analysis of our reporter transcripts containing a pyrimidine-rich element (PRRE) revealed that in these reporters, the pyrimidine-rich element was approximately 90 nucleotides into the body of the 5’ UTR. Yet these reporters are also basally/neurite localized. The organization of the PRRE-containing reporters is now shown more clearly in an updated figure 3D.
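
      To make the distinction between a cap-adjacent TOP motif and an internal pyrimidine-rich element concrete, the sketch below scans a 5' UTR for runs of pyrimidines and reports each run's offset from the first transcribed nucleotide (offset 0 means cap-adjacent). The minimum run length of 8, the `find_pyrimidine_runs` name, and the example sequences are arbitrary choices for illustration, not the criteria or sequences used in the manuscript.

```python
import re


def find_pyrimidine_runs(utr, min_len=8):
    """Return (start, end, run) for each pyrimidine (C/T/U) run of >= min_len nt.

    `start` is the 0-based offset from the first transcribed nucleotide, so a run
    with start == 0 sits immediately next to the 5' cap (TOP-like).
    """
    pattern = re.compile(r"[CTU]{%d,}" % min_len)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(utr.upper())]


# Hypothetical 5' UTRs: one with a cap-adjacent run, one with an internal run.
print(find_pyrimidine_runs("CTTTCCTCTGAGAGCG"))       # [(0, 9, 'CTTTCCTCT')]
print(find_pyrimidine_runs("GAGAGCGACCTTTCTCCTGAG"))  # [(8, 18, 'CCTTTCTCCT')]
```

      Because the offset is measured from the transcription start site, the result depends on which TSS is assumed, which is exactly why the CAGE-derived start-site windows discussed above matter.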

      From these results, it would seem that the pyrimidine-rich element need not be next to the 5’ cap in order to regulate RNA localization. To generalize this result, we first used previously identified 5’ UTR pyrimidine-rich elements that had been found to regulate translation in an mTOR-dependent manner (Hsieh et al 2012). We found that, as a class, RNAs containing these motifs were similarly basally/neurite localized as RP mRNAs. These results are presented in figures 3A and 3I.

      We then asked whether the position of the pyrimidine-rich element within the 5’ UTR of these RNAs was related to their localization. We found no relationship between element position and transcript localization; elements within the bodies of 5’ UTRs appeared just as able to promote basal/neurite localization as elements immediately next to the 5’ cap. These results are presented in figures 3B and 3J.
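
      A rank correlation is one simple way to ask whether element position relates to localization. The sketch below implements Spearman's ρ (the Pearson correlation of average ranks) in plain Python; the response does not state which test was used, and the positions and localization scores are invented for illustration.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a block of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


# Hypothetical data: element offset from the cap (nt) vs. a localization score.
positions = [0, 0, 12, 35, 60, 90]
scores = [0.41, 0.38, 0.44, 0.36, 0.42, 0.40]
print(round(spearman_rho(positions, scores), 3))
```

      A rank-based statistic is a reasonable choice here because element offsets are strongly skewed toward zero, and it makes no assumption of a linear relationship.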

      To further confirm that pyrimidine-rich elements need not be immediately next to the 5’ cap, we redesigned our RPL7-derived reporter transcripts such that the pyrimidine-rich motif was immediately adjacent to the 5’ cap. This was possible because the reporter uses a CMV promoter that reliably starts transcription at a known nucleotide. We then compared the localization of this reporter (called “RPL7 True TOP”) with that of our previous reporter, in which the pyrimidine-rich element was ~90 nt into the 5’ UTR (called “RPL7 PRRE”) (Reviewer Figure 2). As with the PRRE reporter, the True TOP reporter drove RNA localization in both epithelial and neuronal cells, while purine-containing mutant versions of the True TOP reporter did not (Reviewer Figure 2A-D). In epithelial cells, the True TOP was modestly but significantly better than the PRRE at driving basal RNA localization (Reviewer Figure 2E), while in neuronal cells the True TOP reporters were modestly, but not significantly, better. Again, this suggests that pyrimidine-rich motifs need not be immediately cap-adjacent in order to regulate RNA localization.

      Reviewer Figure 2. Experimental confirmation that pyrimidine-rich motif location within 5’ UTRs is not critical for RNA localization. (A) RPL7 True TOP smFISH in epithelial cells. (B) RPL7 True TOP smFISH in neuronal cells. (C) Quantification of epithelial cell smFISH in A. (D) Quantification of neuronal cell smFISH in B. (E) Comparison of the location in epithelial cells of endogenous RPL7 transcripts, RPL7 PRRE reporter transcripts, and RPL7 True TOP reporter transcripts. (F) Comparison of the neurite-enrichment of RPL7 PRRE reporters and RPL7 True TOP reporters. In C-F, the number of cells included in each analysis is shown.

      In response to the point about whether the localization results are direct effects of LARP1, we did not assay the binding of LARP1 to our PRRE-containing reporters, so we cannot say for sure. However, given that PRRE-dependent localization required LARP1 and there is much evidence about LARP1 binding pyrimidine-rich elements (including those that are not cap-proximal as the reviewer notes), we believe this to be the most likely explanation.

      It should also be noted here that while pyrimidine-rich motif position within the 5’ UTR may not matter, its location within the transcript does. PRREs located within 3’ UTRs were unable to direct RNA localization (Figure 5).

      3) In figure 1A, they indicate that mRNA stability can contribute to RNA localization, but this point is never discussed. This may be important to their work since Larp1 has also been found to impact mRNA half-lives (Aoki et al, 2013 and Mattijssen et al 2020, Al-Ashtal et al 2021). Is it possible the effect they see when Larp1 is depleted comes from decreased stability?

      We found that PRRE-containing reporter transcripts were generally less abundant than their mutant counterparts in C2bbe1, HCA7, and MDCK cells (figure 3 – figure supplements 5, 6, and 8) although the effect was not consistent in mouse neuronal cells (figure 3 – figure supplement 13).

      However, we don’t think it is likely that the changes in localization are due to stability changes. This abundance effect did not seem to be LARP1-dependent as both PRRE-containing and PRRE-mutant reporters were generally more expressed in LARP1-rescue epithelial cells than in LARP1 KO cells (figure 4 – figure supplement 9).

      It should be noted here that we are not ever actually measuring transcript stability but rather steady state abundances. It cannot therefore be ruled out that LARP1 is regulating the stability of our PRRE reporters. Given, though, that their localization was dependent on kinesin activity (figures 7F, 7G), we believe the most likely explanation for the localization effects is active transport.

      4) Moor et al, 2017 also saw that feeding cycles changed the localization of 5'TOP mRNAs. Similarly, does mTOR inhibition or activation, or simply active translation, alter the localization patterns they observe? Further evidence for dynamic regulation of RNA localization would strengthen this paper.

      We are very interested in this and have begun exploring it. We have data suggesting that PRREs also mediate the feeding cycle-dependent relocalization of RP mRNAs. As the reviewer says, we think this leads to a very attractive model involving mTOR, and we are currently working to test this model. However, we don’t have the room to include those results in this manuscript and would instead prefer to include them in a later manuscript that focuses on nutrient-induced dynamic relocalization.

      5) For smFISH quantification, is every mRNA treated as an independent measurement, so that the statistics are calculated on hundreds of mRNAs? Large sample sizes can give significant p-values for very small differences, as observed for Firefly vs. OSBPL3 localization. Since the biological interpretation of effect size is not always clear, I would suggest plotting RNA position per cell, or treating only biological replicates as independent measurements, to determine statistical significance. This should also be done for other smFISH comparisons.

      This is a good suggestion, and we agree that using individual puncta as independent observations will artificially inflate the statistical power in the experiment. To remedy this in the epithelial cell images, we first reanalyzed the smFISH images using each of the following as a unique observation: the mean location of all smFISH puncta in one cell, the mean location of all puncta in a field of view, and the mean location of all puncta in one coverslip. With each metric, the results we observed were very similar (Reviewer Figure 3) while the statistical power of course decreased. We therefore chose to go with the reviewer-suggested metric of mean transcript position per cell.

      Reviewer Figure 3. C2bbe1 monolayer smFISH spot position analysis. RNA localization across the apicobasal axis is measured by smFISH spot position in the Z axis. This can be plotted for each spot, but then thousands of spots artificially inflate the statistical power. Spot position can instead be averaged per cell, with cells outlined manually within the FISH-quant software. This reduces the sample size and allows for more accurate statistical analysis. When spot position is averaged per field of view, the sample size decreases further and the statistics are less powered, but the localization trends are still robust. Finally, we can average spot position per coverslip, which represents biological replicates. We lose almost all statistical power, as the sample size is limited to 3 coverslips. Despite this, the localization trends are still recognizable.

      When we use this metric, all results remain the same with the exception of the smFISH validation of endogenous OSBPL3 localization. That result loses its statistical significance and has now been omitted from the manuscript. All epithelial smFISH panels have been updated to use this new metric, and the number of cells associated with each observation is indicated for each sample.
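
      The per-cell metric described here can be sketched as follows: spot positions are first averaged within each cell, and only those per-cell means enter the comparison, so each cell (not each spot) counts as one observation. The input layout (a list of (cell_id, position) pairs with a hypothetical normalized apicobasal position, 0 = basal) and the use of Welch's t statistic are assumptions for illustration; the response does not specify the test used.

```python
from collections import defaultdict
from statistics import mean, variance


def per_cell_means(spots):
    """Collapse (cell_id, position) spot records to one mean position per cell."""
    by_cell = defaultdict(list)
    for cell_id, z in spots:
        by_cell[cell_id].append(z)
    return {cell: mean(zs) for cell, zs in by_cell.items()}


def welch_t(a, b):
    """Welch's t statistic for two independent samples (each needs >= 2 values)."""
    return (mean(a) - mean(b)) / (variance(a) / len(a) + variance(b) / len(b)) ** 0.5


# Hypothetical spots: (cell_id, normalized apicobasal position, 0 = basal).
reporter = [("c1", 0.25), ("c1", 0.375), ("c2", 0.25), ("c3", 0.125), ("c3", 0.25)]
control = [("d1", 0.5), ("d1", 0.75), ("d2", 0.5), ("d3", 0.75)]
a = list(per_cell_means(reporter).values())  # 3 observations, not 5 spots
b = list(per_cell_means(control).values())
print(len(a), len(b), round(welch_t(a, b), 2))
```

      The same grouping step applies at the field-of-view or coverslip level by swapping the key, which reproduces the progressively coarser aggregation described in Reviewer Figure 3.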

      For the neuronal images, these were already quantified at the per-cell level as we compare soma and neurite transcript counts from the same cell. In lieu of more imaging of these samples, we chose to perform subcellular fractionation into soma and neurite samples followed by RT-qPCR as an orthogonal technique (figure 3K, figure 3 supplement 14). This technique profiles the population average of approximately 3 million cells.

      6) F: How was the segmentation of soma vs. neurites performed? It would be good to have a larger image as a supplemental figure so that it is clear the proximal or distal neurites segments are being compared

      All neurite vs. soma segmentations were done manually. An example of this segmentation is included as Reviewer Figure 4. As a result, often only proximal neurite segments are included in the analysis, because it is difficult to find an entire soma and an entire neurite in one field of view. However, in our experience, including more distal neurite segments would likely only strengthen the smFISH results, as we often observe many molecules of localized transcripts in the distal tips of these neurites.

      Reviewer Figure 4. Manual segmentation of differentiated CAD soma and neurite in FISH-quant software. Neurites that do not overlap adjacent neurites are selected for imaging. Often neurites extend beyond the field of view, limiting this assay to RNA localization in proximal neurites.

      Also, it should be noted that the neuronal smFISH results are now supplemented by experiments involving subcellular fractionation and RT-qPCR (figure 3 supplement 14). These subcellular fractionation experiments collect the whole neurite, both the proximal and distal portions.

      Text has been added to the methods under the header “smFISH computational analysis” to clarify how the segmentation was done.

    1. Author Response

      Reviewer #3 (Public Review):

      Main results:

      1) TCR convergence is different from publicity: The authors look at CDR3 sequence features of convergent TCRs in the large Emerson CMV cohort. Amino acid usage does not perfectly correlate with codon degeneracy; for example, arginine (which has 6 codons) is less common in convergent TCRs, whereas leucine and serine are elevated. It's argued that there's more to convergence than just recombination biases, which makes sense. (I wonder if the trends for charged amino acids could be explained by the enrichment of convergent TCRs in CD8 T cells, which tend to have more acidic CDR3 loops). There's also a claim that the overlap between convergent and public TCRs is lower in tumors with a high mutational burden (TMB), but this part is sketchy: the definition of public TCRs is murky and hard to interpret, and the correlation between TMB and convergence-publicity overlap is modest (two cohorts with low TMB have higher overlap, and the other three have lower, but there is no association over those three, if anything the trend is in the other direction). It's also not clear why the overlap between COVID19 cohort convergent TCRs and public TCRs defined by the pre-2019 Emerson cohort should be high. A confounder here is the potential association between convergence and clonal expansion since expanded clonotypes can spawn apparently convergent TCRs due to sequencing errors. The paper "TCR Convergence in Individuals Treated With Immune Checkpoint Inhibition for Cancer" (Ref#5 here) gives evidence that sequencing errors may be inflating convergence in this specific dataset.

      We really appreciate the reviewer’s feedback. We respond to each of the reviewer’s points below:

      (1) Amino acid preference of convergent TCRs might be caused by CD8+ T cell enrichment. To test this hypothesis, we performed the same analysis using only CD8+ T cells (using the Cader 2019 lymphoma cohort). The results are shown below. We do not observe significant changes after excluding CD4+ T cells, indicating that this enrichment might be caused by factors other than CD4/CD8 differences.
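
For readers less familiar with the term, convergence here means one CDR3 amino acid sequence encoded by more than one distinct nucleotide sequence. The minimal sketch below assumes hypothetical (nucleotide, amino acid) CDR3 pairs that have already been translated; it flags convergent CDR3s and tabulates their amino acid usage, the kind of comparison the amino acid preference analysis rests on. A sequencing error that happens to produce a synonymous change would also pass this filter, which is why error handling matters for this metric.

```python
from collections import Counter, defaultdict

# Hypothetical clonotypes from one sample: (CDR3 nucleotide, CDR3 amino acid)
clonotypes = [
    ("TGTGCCAGCAGTTTA", "CASSL"),
    ("TGTGCCAGCAGCCTG", "CASSL"),  # synonymous codons -> same CDR3 protein
    ("TGTGCCAGCAGTTTC", "CASSF"),
    ("TGTGCCTGGAGTTTA", "CAWSL"),
]

def convergent_cdr3s(records):
    """CDR3 amino acid sequences encoded by >= 2 distinct nucleotide sequences."""
    nt_by_aa = defaultdict(set)
    for nt, aa in records:
        nt_by_aa[aa].add(nt)
    return {aa for aa, nts in nt_by_aa.items() if len(nts) >= 2}

def aa_frequencies(cdr3s):
    """Relative amino acid usage pooled across a set of CDR3 sequences."""
    counts = Counter("".join(cdr3s))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}

conv = convergent_cdr3s(clonotypes)
print(conv)  # {'CASSL'}
print(aa_frequencies(conv))
```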

      (2) Definition of public TCRs. We have changed the definition of public TCRs. Instead of mixing the Emerson cohort into each group and using the mixed cohort to define the public TCRs, we now use only the 666 samples of the Emerson cohort to define a single set of public TCRs, which is applied to each cohort. Both the dataset and the approach used in this manuscript are consistent with a previous study on the same topic (Madi et al., 2014, eLife).
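
The revised definition can be sketched as follows, assuming each reference repertoire is reduced to a set of CDR3 amino acid sequences. The sample-count threshold that makes a CDR3 "public" is not stated here, so `min_samples` is a hypothetical parameter; the key point illustrated is that the public set is computed once from the reference cohort and then reused unchanged for every cohort.

```python
from collections import Counter

# Hypothetical reference repertoires (stand-ins for the 666 Emerson samples)
reference_samples = [
    {"CASSL", "CASSF", "CAWSL"},
    {"CASSL", "CASRG"},
    {"CASSL", "CASSF"},
]

def public_cdr3s(samples, min_samples):
    """CDR3s observed in at least `min_samples` reference repertoires."""
    counts = Counter()
    for repertoire in samples:
        counts.update(set(repertoire))  # count each CDR3 once per sample
    return {cdr3 for cdr3, n in counts.items() if n >= min_samples}

# Defined once from the reference cohort; the threshold is hypothetical
PUBLIC = public_cdr3s(reference_samples, min_samples=2)
print(sorted(PUBLIC))  # ['CASSF', 'CASSL']
```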

      (3) Convergence-publicity overlap: We agree with the reviewer that some high-TMB tumors did not show a further decrease in convergence-publicity overlap. One potential explanation is that the relationship between the two is not linear. By adding cohorts in this revision (healthy individuals and recovered COVID-19 patients), we confirmed the previously observed overall trend between TMB and the overlap, which supports our conclusions (see figure below). We believe that the high convergence-publicity overlap in healthy cohorts might result from exposure to common antigens. Cancer patients are still exposed to these antigens, but private antigens derived from tumor cells are expected to compete for resources, thus reducing the proportion of public TCRs in the blood repertoire. The above discussion has been added to the revised manuscript:

      “Healthy individuals are expected to be exposed to common pathogens, which might induce public T cell responses. By contrast, cancer patients carry more neoantigens because of accumulated mutations, which drive their antigen-specific T cells to recognize these 'private' antigens. This reduces the proportion of public TCRs among antigen-specific TCRs. Furthermore, a higher tumor mutation burden (TMB) would indicate a higher abundance of neoantigens, resulting in a lower ratio of public TCRs.”
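
The overlap metric itself is not spelled out in this response; a minimal sketch, assuming it is the fraction of a cohort's convergent CDR3s that also belong to the fixed public set, is shown below with hypothetical sets. Under the hypothesis above, a cohort dominated by responses to common antigens should score higher than one whose convergent TCRs target private neoantigens.

```python
def convergence_publicity_overlap(convergent, public):
    """Fraction of a cohort's convergent CDR3s that also belong to the fixed public set."""
    if not convergent:
        return 0.0
    return len(convergent & public) / len(convergent)

public = {"CASSL", "CASSF", "CASRG"}                   # hypothetical fixed public set
healthy = {"CASSL", "CASSF", "CATSG"}                  # hypothetical convergent CDR3s
high_tmb_tumor = {"CASSL", "CAWSE", "CASTP", "CAVRD"}  # hypothetical convergent CDR3s

print(convergence_publicity_overlap(healthy, public))         # ~0.67
print(convergence_publicity_overlap(high_tmb_tumor, public))  # 0.25
```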

      2) Convergent TCRs are more likely to be antigen-specific: This is nicely shown on two datasets: the large dextramer dataset from 10x genomics, and the COVID19 datasets from Adaptive biotech. But given previous work on TCR convergence, for example, the Pogorelyy ALICE paper, and many others, this is also not super-surprising.

      We thank the reviewer for bringing up this related work. In the Pogorelyy ALICE paper, the authors defined TCR neighbors as CDR3 sequences differing from a given CDR3 by one nucleotide, which includes both synonymous and non-synonymous changes. In other words, ALICE counts both convergent and mismatched (Hamming distance 1) sequences as neighbors. Although ALICE is highly relevant, our approach differs in focusing only on convergence, as mismatches have been extensively investigated by previous studies. We have now added this paper as Ref 27 and discussed the difference between ALICE and our method in the revised manuscript.
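
The distinction can be sketched in a few lines, following the description in this response (neighbors one nucleotide apart); the published ALICE implementation may define its neighborhood differently, so the `hamming1_neighbors` function is only an illustration of the contrast, and all sequences are hypothetical.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def hamming1_neighbors(query_nt, records):
    """Clonotypes exactly one nucleotide away from the query (synonymous or not)."""
    return [(nt, aa) for nt, aa in records
            if len(nt) == len(query_nt) and hamming(nt, query_nt) == 1]

def convergent_partners(query_nt, query_aa, records):
    """Other nucleotide sequences that encode the identical CDR3 amino acid sequence."""
    return [(nt, aa) for nt, aa in records if aa == query_aa and nt != query_nt]

# Hypothetical clonotypes: (CDR3 nucleotide, CDR3 amino acid)
records = [
    ("TGTGCCAGCAGTTTA", "CASSL"),  # query
    ("TGTGCCAGCAGTTTG", "CASSL"),  # 1 nt away, synonymous (TTA/TTG = Leu)
    ("TGTGCCAGCAGCCTG", "CASSL"),  # 3 nt away, synonymous
    ("TGTGCCAGCAGTTTC", "CASSF"),  # 1 nt away, non-synonymous (TTC = Phe)
]
query_nt, query_aa = records[0]

print(hamming1_neighbors(query_nt, records))
print(convergent_partners(query_nt, query_aa, records))
```

The synonymous one-mismatch sequence appears in both lists; the non-synonymous variant is only a one-mismatch neighbor, and the three-mismatch synonymous variant is only a convergent partner.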

      3) Convergent T cells exhibit a CD8+ cytotoxic gene signature: This is based on a nice analysis of mouse and human single-cell datasets. One striking finding is that convergent TCRs are WAY more common in CD8+ T cells than in CD4+ T cells. It would be interesting to know how much of this could be explained by greater clonal expansion of CD8+ T cells, together with sequencing errors. A subtle point here is that some of the P values are probably inflated by the presence of expanded clonotypes: a group of cells belonging to the same expanded clonotype will tend to have similar gene expression (and therefore similar cluster membership), and will necessarily all be either convergent or not convergent collectively since they share the same TCR. So it's probably not quite right to treat them as independent for the purposes of assessing associations between gene expression clusters and convergence (or any other TCR-defined feature). You can see evidence for clonal expansion in Figure 3C, where TRAV genes are among the most enriched, suggesting that Cluster 04 may contain expanded clones.

      (1) We agree with the reviewer that a possible explanation of the CD8/CD4 difference is the greater clonal expansion of CD8+ T cells. We tested this hypothesis by counting T cell clones instead of cells, which removes the effect of CD8+ T cell expansion. We first analyzed the bulk TCR repertoire sequencing samples (Figure 3-figure supplement 2C-2D; see figure below) and observed higher convergence levels for CD8+ T cell clones than for CD4+ T cell clones. A description of this analysis was added to the last paragraph of the Results section “Convergent T cells exhibit a CD8+ cytotoxic gene signature” as follows:

      “The results may be explained by larger cell expansions of CD8+ T cells than CD4+ T cells. Therefore, we calculated the number of convergent clones within CD8+ T cells and CD4+ T cells from the above datasets to exclude the effects of cell expansion. As a result, in the scRNA-seq mouse data, while only 1.54% of the CD4+ clones were convergent, 3.76% of the CD8+ clones showed convergence. Likewise, in the human scRNA-seq data, 0.17% of CD4+ T cell clones and 1.03% of CD8+ T cell clones were convergent. In the bulk TCR-seq lymphoma data, similar results were observed: the gap between the convergence levels of CD4+ and CD8+ T cells narrowed but remained significant (Figure 3—figure supplement 2C-2D). In conclusion, these results suggest that CD8+ T cells show higher levels of convergence than CD4+ T cells, which supports our hypothesis that convergent T cells are more likely to be antigen-experienced. This observation has been tested using multiple datasets with diverse sequencing platforms and sequencing depths to minimize the impact of batch effects or other technical artifacts.”
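
Why counting clones instead of cells matters can be seen in a toy example. This is a minimal sketch assuming hypothetical single-cell records in which convergence is a property of the clone (all cells of a clone share its TCR); it is not the analysis code used in the manuscript.

```python
# Hypothetical single-cell records: (cell_id, subset, clone_id, clone_is_convergent)
cells = [
    ("c1", "CD8", "A", True), ("c2", "CD8", "A", True), ("c3", "CD8", "A", True),
    ("c4", "CD8", "B", False),
    ("c5", "CD4", "C", False), ("c6", "CD4", "D", True),
]

def convergent_fraction(records, subset, by_clone):
    """Fraction of convergent cells (or clones) within one T cell subset."""
    rows = [r for r in records if r[1] == subset]
    if by_clone:
        # Collapse cells to one entry per clone so an expanded clone counts once.
        rows = list({r[2]: r for r in rows}.values())
    return sum(r[3] for r in rows) / len(rows)

for subset in ("CD8", "CD4"):
    print(subset,
          convergent_fraction(cells, subset, by_clone=False),
          convergent_fraction(cells, subset, by_clone=True))
```

Here the expanded convergent CD8+ clone A raises the per-cell CD8+ fraction to 0.75, whereas the per-clone fraction is 0.5.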

      (2) We next investigated the effect of clonal expansion in the single-cell analysis. We agree with the reviewer that some highly expanded convergent clones could inflate the p-values. Therefore, we revised the calculation of TCR convergence to count T cell clones instead of individual cells. The clusters of interest mentioned in the paper (for both mouse and human data) remain at the top convergence level among all clusters (see table below), with p-values estimated using the exact binomial test. These results support our hypothesis that TCR convergence is enriched in T cell clusters that are more likely to be antigen-experienced.
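
A clone-level exact binomial test for one cluster can be sketched as below, assuming (our assumption, not stated in the response) a one-sided test of whether the cluster's convergent clone count exceeds what the overall background convergence rate predicts. All counts are hypothetical; `scipy.stats.binomtest` with `alternative="greater"` would give the same tail probability.

```python
from math import comb

def binom_upper_tail(k, n, p):
    """Exact one-sided P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical cluster: 12 of 100 clones convergent; background convergence rate 5 %
n_clones, k_convergent, background = 100, 12, 0.05
p_value = binom_upper_tail(k_convergent, n_clones, background)
print(f"P(X >= {k_convergent}) = {p_value:.2e}")
```

Because each clone contributes a single trial, an expanded clone no longer adds many correlated "successes" to the test.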

      4) TCR convergence is associated with the clinical outcome of ICB treatment: The associations for the first analysis are described as significant in the text, and they are, but just barely (0.045 and 0.047, but you have to check the figure to see that).

      As suggested by the reviewer, we have added the p-values to the text so that they are easier to see. In this revision, we adopted another definition of convergence level, changing from the ratio of convergent TCRs to the absolute number of convergent T cell clones within each sample. The p-values were more significant using this new indicator (0.02 and 0.00038). To avoid the effect of other variables that might be correlated with convergence levels, especially sequencing depth, we fitted a multivariate Cox model for both datasets tested in the paper, adjusting for TCR clonality, TCR diversity and sequencing depth (and, for the melanoma data, different treatment methods). Convergence remains significantly prognostic after adjusting for these additional variables.
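
The adjustment described above corresponds to the standard multivariate Cox proportional hazards form sketched below for patient i. The covariate labels are our shorthand for the variables listed, and for the melanoma data a treatment-indicator term would be added; this shows the model structure only, not the fitted coefficients.

```latex
h_i(t) = h_0(t)\,\exp\left(\beta_1\,\mathrm{conv}_i + \beta_2\,\mathrm{clonality}_i
       + \beta_3\,\mathrm{diversity}_i + \beta_4\,\mathrm{depth}_i\right),
\qquad \mathrm{HR}_{\mathrm{conv}} = e^{\beta_1}
```

Convergence is prognostic after adjustment when the test of β1 = 0 is rejected with the other covariates kept in the model, and exp(β1) is the hazard ratio per unit increase in convergence.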

      5) Introduction/Discussion: Overall, the authors could do a better job citing previous work on convergence, for example, papers from Venturi on convergent recombination and the work from Mora and Walczak (ALICE, another recombination modeling). They also present the use of convergence as an ICB biomarker as a novel finding, but Ref 5 introduces this concept and validates it in another cohort. Ref 5 also has a careful analysis of the link between sequencing errors and convergence, which could have been more carefully considered here.

      We thank the reviewer for this excellent suggestion. We have added the citation of Venturi et al. on convergent recombination as Ref 43 and cited it in the last paragraph of the Results section:

      “Convergent recombination was claimed to be the mechanistic basis for public TCR response in many previous studies (Quigley et al., 2010; Venturi et al., 2006).”

      We also included work from Mora and Walczak in the fourth paragraph of the introduction and the third paragraph of the discussion as Ref 27 to introduce this TCR similarity-based clustering method as well as its application in predicting ICB response:

      “This idea has led to the development of several TCR similarity-based clustering algorithms, such as ALICE (Pogorelyy et al., 2019), TCRdist (Dash et al., 2017), GLIPH2 (Huang et al., 2020), iSMART (Zhang et al., 2020), and GIANA (Zhang et al., 2021), for studying antigen-driven T cell expansion during viral infection or tumorigenesis.”

      “In addition, the potential prognostic value of TCR convergence and TCR similarity-based clustering has been tested in other studies (Looney et al., 2019; Pogorelyy et al., 2019).”

      Ref 5 is now cited in the fourth paragraph of the Discussion, where we discuss the effect of sequencing errors on TCR convergence:

      “Improper handling of sequencing errors may result in the overestimation of TCR convergence (Looney et al., 2019).”

    1. Author Response:

      We have now revised the manuscript to address the helpful comments and criticisms from the reviewers. The revised manuscript includes additional experiments demonstrating that inclusion of Csn2/Cas9 in the in vitro assays does not suppress the disintegration activity of Cas1-Cas2 to favor integration. These additional factors do not confer strand selectivity on integration either. Furthermore, the results of integration reactions using substrates mimicking PAM-containing pre-spacers have also been added.

      New figures and figure modifications at a glance:

      1) The new Figure 2 shows Cas1-Cas2 reactions in a linear target site and the effects of Csn2 and/or Cas9 on proto-spacer insertion into this target (Reviewer 1).

      The original Figure 2 (with slight modifications) is now moved to ‘Supplementary Data’ as Figure 2-figure supplement 2, and shows proto-spacer insertion by Cas1-Cas2 into a nicked linear target site (Reviewer 2). Figure 2 is the only one in the main set of figures that has been extensively modified.

      2) The new Figure 2-figure supplement 1 (under ‘Supplementary Data’) shows the effects of Csn2, Cas9 or both on proto-spacer integration-disintegration by Cas1-Cas2 when the target site is present in a supercoiled plasmid (Reviewer 1).

      3) The new Figure 4-figure supplement 1 lists the sequences of the full- and half-target sites used for the reactions shown in Figure 4 (Reviewer 2).

      4) The new Figure 2-figure supplement 3 shows the insertion properties of PAM-containing pre-spacer mimics in reactions with Cas1-Cas2 alone or supplemented with Csn2, Cas9 or both (Reviewer 1).

      5) The new Figure 6-figure supplement 1 gives a structural perspective of the trombone substrates used for the reactions shown in Figure 6B, C (Reviewer 1).

      6) The original Supplementary Figure S8 showing assays for PAM-specific cleavage by Cas1-Cas2 has been removed (Reviewer 1).

      7) There are no changes in the other figures under ‘Supplementary Data’, although several have new numbers consistent with the revisions made.

      Public Review (Reviewers #1 and #2):

      The present work is a critical extension of the in vitro biochemical activities of the Cas1-Cas2 complex described by Wright and Doudna (Nat Struct Mol Biol, 2016; 23: 876-883). We have kept all experimental conditions nearly identical to those used by these authors to make the results from the two studies directly comparable. Importantly, we now show that the prior model for proto-spacer integration into the CRISPR locus by Cas1-Cas2 is an oversimplification of a much more nuanced mechanism.

      While both reviewers recognize the importance of our findings in challenging the current thinking on the adaptation mechanism of CRISPR immunity, they express reservations as to whether the in vitro results recapitulate the in vivo mechanism of spacer acquisition. This seems to us to be too broad a criticism from which few (if any) biochemical experiments can be immune.

      Our key finding is that disintegration during the second step of proto-spacer integration generates a DNA structure that has all the hallmarks of a DNA damage intermediate that the bacterial repair machinery can readily process into an authentic integration product. We invoke no new or ad hoc mechanisms, and the model we propose fits neatly into the DNA gap-filling mechanisms known to operate in DNA transposition pathways.

      The proto-spacer is functionally a ‘micro-transposon’, whose shortness imposes severe torsional strain on the transposition intermediate that precedes the final integration product. In vitro experiments suggest that transcription is potentially capable of resolving this intermediate (Budhathoki et al., Nat Struct Mol Biol, 2020, 27: 489-99). In principle, replication can also accomplish this task. Our study now demonstrates that simply nicking the DNA (disintegration) is an equally effective solution for relieving the topological stress accompanying integration. DNA loose ends can then be readily tied up by the bacterial repair machinery.

      We concur with the concluding sentence of reviewer 2, “The simple conclusion that Cas1-Cas2 catalyzed hydrolysis of a phosphodiester may relieve strain and allow productive transposition to occur doesn’t get emphasized enough in my opinion.” We have now expanded on this point in the revised ‘Discussion’.

      Reviewer #1:

      In addition, the in vitro system used here is only partially reconstituted. The substrates lack a PAM sequence, which is necessary for protospacers to be incorporated in the correct orientation and may help direct the first integration event to the L-R junction. Presumably because of this all the reactions presented do not analyze the orientation of the incorporated prespacer sequence. Cas9 and Csn2 are also absent (as are other potentially required host factors), which are necessary for correct integration in vivo.

      1A. Strand specificity: The in vitro integration reactions with the Cas1-Cas2 complex were done using a protospacer of the optimal size (26 nt on each strand, with the four 3’-proximal bases on each strand unpaired). Either proto-spacer strand is equally competent to initiate the strand transfer reaction, as could be inferred from Figure 3 of the original submission. Here, reactions utilized modified proto-spacers that differed in their top and bottom strand lengths. They gave two insertion products (IP) each at the L-R (leader-repeat) and R-S (repeat-spacer) junctions of a normal target site. In modified targets in which integration was limited to just the L-R junction, two insertion products were formed. One panel of Figure 3 (which is retained in the revised manuscript) showing the four insertion products from the normal target (lane 10) and two from the modified targets (lanes 11-13) for a protospacer with 26 nt and 31 nt long strands is displayed below.

      The ability of either proto-spacer strand to initiate integration is now more directly shown in Figure 2 (new) of the revised manuscript. Here the labeled top or bottom strand of the proto-spacer (PS) gave insertion products (IP) at the L-R and R-S junctions of the target site. Panel B of Figure 2 (pasted below) demonstrates this result.

      1B. Cas9, Csn2 included reactions: The data for reactions containing Csn2 or Cas9 or both were not shown previously, as they did not alter Cas1-Cas2 activity by promoting strand specificity of integration or suppressing disintegration. These results are now shown in the revised Figure 2 (linear target) and the new Figure 2-figure supplement 1 (supercoiled target). Portions of these figures are shown below.

      The relevant revised text describing the lack of strand specificity to proto-spacer integration by Cas1-Cas2 and the Csn2/Cas9 effects on integration is pasted below.

      Page 15, lines 229-235.

      "Unlike orientation-specific proto-spacer integration in vivo, Cas1-Cas2 reactions in vitro showed no strand-specificity (Figure 2B). This bias-free insertion of the top or bottom strand from the proto-spacer was unchanged by the addition of Csn2 or Cas9 or both to the reactions (Figure 2C-E). These proteins, singly or in combination, also failed to stabilize proto-spacer integrations in the supercoiled plasmid target (Figure 2-figure supplement 1). Instead, they inhibited plasmid relaxation. Inhibition could occur at the level of integration per se or strand rotation during integration-disintegration."

      1C. PAM-containing substrates: We have now tested Cas1-Cas2 activity (with and without added Csn2 or Cas9 or both) on PAM-containing substrates that mimic ‘pre-spacers’ (Figure 2-figure supplement 3, new).

      In these substrates, a proto-spacer strand of the standard length (26 nt; lacking PAM or its complement) is inserted at the L-R junction with higher efficiency than the longer strand (containing PAM or its complement). Following the first integration at L-R, the pre-spacer mimics containing > 26 nt in one strand or both strands are inhibited in the second strand transfer to the R-S junction. A portion of Figure 2-figure supplement 3 illustrating these points is shown below.

      The revised ‘Results’ section has the following added description of the activities of PAM-containing pre-spacer mimics.

      Pages 16-19, lines 265-297. Cas1-Cas2 activity on pre-spacer mimics carrying the PAM sequence

      "The strand cleavage and strand transfer steps of proto-spacer insertion at the CRISPR locus must engender safeguards against self-targeting of the inserted spacer as well as its non-functional orientation. However, no strand selectivity is seen in the in vitro Cas1-Cas2 reactions with already processed proto-spacers lacking the PAM sequence (Figures 2 and 3). By coordinating PAM-specific cleavage of a pre-spacer with transfer of this cleaved strand to the L-R junction, the inserted spacer will be in the correct orientation to generate a functional crRNA. To examine this possibility, we tested the integration characteristics of pre-spacer mimics containing the PAM sequence.

      The inclusion of PAM or PAM and its complement in the integration substrates (Figure 2-figure supplement 3A) did not confer strand specificity on reactions with Cas1-Cas2 alone or with added Csn2, Cas9 or both (Figure 2-figure supplement 3B-E). Optimal integration by Cas1-Cas2 occurred with the 26 nt strands of the native protospacer with their 4 nt 3’-overhangs (Figure 2-figure supplement 3B-E; lanes 2). The pre-spacer mimics containing one or both > 26 nt strands had reduced integration competence (Figure 2-figure supplement 3B-E; lanes 4). Even here, the 26 nt strand with the 4 nt overhang (Figure 2-figure supplement 3C; lane 4) was preferred in integration over the longer 29 nt PAM-containing strand (Figure 2-figure supplement 3D; lane 4) or the 33 nt PAM complement-containing strand (Figure 2-figure supplement 3E; lane 4). In contrast to the processed proto-spacer that gave nearly equal integration at L-R and R-S, IP(L-R) ≈ IP(R-S) (Figure 2-figure supplement 3B-E; lanes 2), the longer pre-spacer mimics were inhibited in integration at R-S, IP(L-R) > IP(R-S) (Figure 2-figure supplement 3B-E; lanes 4). This is the expected outcome if the initial strand transfer occurs at L-R, and a ruler-like mechanism orients the reactive 3’-hydroxyl for the second strand transfer at R-S. This sequential two-step scheme for proto-spacer integration is consistent with the results shown in Figure 3 as well. These reaction features were not modulated by Csn2 or Cas9 (Figure 2-figure supplement 3B-E; lanes 6 and 8), although Csn2 plus Cas9 was inhibitory (Figure 2-figure supplement 3B-E; lanes 10).

      There is no evidence for integration accompanying PAM-specific cleavage in our in vitro reactions. In the E. coli CRISPR system, Cas1-Cas2 is apparently sufficient for PAM-specific cleavage in vitro (22). By contrast, in the S. pyogenes system, cleavage is attributed to Cas9 or as yet uncharacterized bacterial nuclease(s) (35). The mechanism for generating an integration-proficient and orientation-specific proto-spacer, which may not be conserved among CRISPR systems, is poorly understood at this time."

    1. Author Response

      Reviewer #1 (Public Review):

      Kazrin appears to be implicated in many diverse cellular functions, and accordingly, localizes to many subcellular sites. Exactly what it does is unclear. The authors perform a fairly detailed analysis of Kazrin in-cell function, and find that it is important for the perinuclear localization of Tfn, and that it binds to members of the AP-1 complex (e.g., gamma-adaptin). The authors note that the C-terminus of Kazrin (which is predicted to be intrinsically disordered) forms punctate structures in the cytoplasm that colocalize with components of the endosomal machinery. Finally, the authors employ co-immunoprecipitation assays to show that both the N- and C-termini of Kazrin interact with dynactin and the dynein light-intermediate chain.

      Much of the data presented in the manuscript are of fairly high quality and describe a potentially novel function for Kazrin C. However, I had a few issues with some of the language used throughout, the manner of data presentation, and some of their interpretations. Most notably, I think in its current form, the manuscript does not strongly support the authors' main conclusion: that Kazrin is a dynein-dynactin adaptor, as stated in their title. Without more direct support for this function, the authors need to soften their language. Specific points are listed below.

      Major comments:

      1) I agree with the authors that the data provided in the manuscript suggest that Kazrin may indeed be an endosomal adaptor for dynein-dynactin. However, without more direct evidence to support this notion, the authors need to soften their language stating as much. For example, the title as stated would need to be changed, as would much of the language in the first paragraph of the discussion. Alternatively, the manuscript could be significantly strengthened if the authors performed a more direct assay to test this idea. For example, the authors could use methods employed previously (e.g., McKenney et al., Science 2014) to this end. In brief, the authors can simply use their recombinant Kazrin C (with a GFP) to pull out dynein-dynactin from cell extracts and perform single molecule assays as previously described.

      While this is certainly an excellent suggestion, in vitro dynein/dynactin motility assays are not straightforward experiments for laboratories that do not use them routinely. We therefore asked Dr. Thomas Surrey (Centre for Genomic Regulation, Barcelona), an expert in the biochemistry and biophysics of microtubule dynamics, to help us with this kind of analysis. In their setting, TIRF microscopy is used to follow EGFP-dynein/dynactin motility along microtubules immobilized on cover slides (Jha et al., 2017). As shown in figure R1, more binding of EGFP-dynein to the microtubules is observed when purified kazrin is added to the assay (from 20 to 400 nM), but there is no increase in the number or processivity of EGFP-dynein motility events. These results are hard to interpret at this point. Kazrin might still be an activating adaptor, with a component missing from the assay (e.g., an activating posttranslational modification or a particular subunit of the dynein or dynactin complexes), or it could increase the processivity of dynein-dynactin in complex with another bona fide activating adaptor, as has been demonstrated for LIS1 (Baumbach et al., 2017; Gutierrez et al., 2017). Alternatively, kazrin could transport dynactin and/or dynein to the microtubule plus ends in a kinesin-1-dependent manner, in order to load the peripheral endosomes with the minus end-directed motor (Yamada et al., 2008).

      Figure R1. Kazrin C purified from E. coli increases binding of dynein to microtubules but does not increase the number or processivity of EGFP-dynein motility events. A. TIRF (Total Internal Reflection Fluorescence) micrographs of microtubule-coated cover slides incubated with 10 nM EGFP-dynein and 20 nM dynactin in the presence or absence of 20 nM kazrin C, expressed and purified from E. coli. B. Kymographs of TIRF movies of microtubule-coated cover slides incubated with purified 10 nM EGFP-dynein, 20 nM dynactin and either 400 nM of the activating adaptor BICD2 (1:2:40 ratio) (left panel) or kazrin C (right panel). Red squares indicate processive dynein motility events induced by BICD2.

      Investigating the molecular activity of kazrin on dynein/dynactin motility is a whole project in itself that we feel is beyond the scope of the present manuscript. Therefore, as suggested by the BRE, we have chosen to soften the conclusions and classify kazrin as a putative “candidate” dynein/dynactin adaptor based on its interactome, domain organization and subcellular localization, as well as on the endosome motility defects observed in vivo upon its depletion. We also discuss other possibilities, such as those outlined above.

      2) I'm not sure I agree with the use of the term 'condensates' used throughout the manuscript to describe the cytoplasmic Kazrin foci. 'Condensates' is a very specific term that is used to describe membraneless organelles. Given the presumed association of Kazrin with membrane-bound compartments, I think it's more reasonable to assume these foci are quite distinct from condensates.

      We actually used “condensates” to avoid implying that the kazrin IDR generates membraneless compartments or induces liquid-liquid phase separation, which is certainly not a conclusion of the manuscript. However, since all reviewers agreed that the word was misleading, we have replaced the term “condensates” with “foci” throughout the manuscript.

      3) The authors note the localization of Tfn as perinuclear. Although I agree the localization pattern in the kazKO cells is indeed distinct, it does not appear perinuclear to me. It might be useful to stain for a centrosomal marker (such as pericentrin, used in Figure 5B) to assess Tfn/EEA1 with respect to MT minus ends.

      We have now replaced the term perinuclear, which implies that endosomes surround the nucleus, with the term juxtanuclear, which more accurately defines what we wanted to indicate (close to the nucleus). We thank the reviewer for pointing out this inaccuracy. We also describe more clearly in the text that in fibroblasts, the Golgi apparatus and the Recycling Endosomes (REs) gather around the pericentriolar region ((Granger et al., 2014) and references therein), which is usually close to the nucleus ((Tang and Marshall, 2012) and references therein). Nevertheless, as suggested by the reviewer, we have included pictures of TxR-Tfn- and EEA1-labelled endosomes accumulating around pericentrin in wild-type mouse embryonic fibroblasts (MEF) (Figure 1–supplement figure 3) to illustrate these points.

      4) "Treatment with the microtubule depolymerizing drug nocodazole disrupted the perinuclear localization of GFP-kazrin C, as well as the concomitant perinuclear accumulation of EE (Fig. 5C & D), indicating that EEs and GFP-kazrin C localization at the pericentrosomal region required minus end-directed microtubule-dependent transport, mostly affected by the dynactin/dynein complex (Flores-Rodriguez et al., 2011)."

      • I don't agree that the nocodazole experiment indicates that minus end-directed motility is required for this perinuclear localization. In the absence of other experiments, it simply indicates that microtubules are required. It might, however, "suggest" the involvement of dynein. The same is true for the subsequent sentence ("Our observations indicated that kazrin C can be transported in and out of the pericentriolar region along microtubule tracks...").

      We agree with the reviewer. To reinforce the point that GFP-kazrin C localization and the pericentriolar accumulation of EEA1 rely on dynein-dependent transport, we have now added an experiment in figure 5E and F, in which we use ciliobrevin to inhibit dynein in cells expressing GFP-kazrin C. In the treated cells, GFP-kazrin C staining at the pericentrin foci is lost and EEs have a more dispersed distribution, similar to kazKO MEF. We have also completed and rearranged the in vivo fluorescence microscopy data to show more clearly that small GFP-kazrin C foci can be observed moving towards the cell centre (Figure 5-S1 and movies 6 and 7). Taking all these data together, we think we can now suggest that kazrin might travel into the pericentriolar region, possibly along microtubules and powered by dynein.

      5) Although I see a few examples of directed motion of Tfn foci in the supplemental movies, it would be more useful to see the kymographs used for quantitation (and noted by the authors on line 272). Also related to this analysis, by "centripetal trajectories", I assume the authors are referring to those moving in a retrograde manner. If so, it would be more consistent with common vernacular (and thus more clear to readers) to use 'retrograde' transport.

      We have now included more examples of the time projections used in the analysis in figure 6-S1 and 2, where we have coloured in blue the fairly straight, longer trajectories, as opposed to the more confined movements that appear as round dots in the time projections (coloured in red). We have also added more videos illustrating the differences observed in cells expressing endogenous kazrin or GFP-kazrin C versus kazKO cells or kazKO cells expressing GFP or GFP-kazrin C-Nt. Movies 8 and 11 show endosome motility in representative WT and kazKO cells (movie 8) and in kazKO cells expressing GFP, GFP-kazrin C or GFP-kazrin C-Nt (movie 11). Movies 9 and 10 show endosome motility in four magnified fields of different WT and kazKO cells, where longer and faster motility events can be observed when endogenous kazrin is expressed. Movies 12 to 14 show endosome motility in four magnified fields of different kazKO cells expressing GFP-kazrin C (movie 12), GFP (movie 13) and GFP-kazrin C-Nt (movie 14). Longer and faster movements can be observed in the different insets of movie 12 than in movies 13 and 14. Finally, as suggested by the reviewer, we have replaced "centripetal movement" with "retrograde movement" throughout the manuscript.
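      The distinction drawn above between fairly straight, longer trajectories and confined movements can be made quantitative with a straightness index (net displacement divided by total path length). The Python sketch below only illustrates that idea; it is not the procedure used in the manuscript, and the function names and the thresholds `min_displacement` and `min_straightness` are hypothetical values that would need calibration against real tracks.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def straightness(track: Sequence[Point]) -> float:
    """Net displacement / total path length for a 2D track.

    Values near 1 indicate a straight run; values near 0 indicate a
    particle that moves back and forth within a small area.
    """
    if len(track) < 2:
        raise ValueError("a track needs at least two points")
    path_length = sum(math.dist(p, q) for p, q in zip(track, track[1:]))
    if path_length == 0:
        return 0.0  # the particle never moved
    return math.dist(track[0], track[-1]) / path_length


def classify_track(track: Sequence[Point],
                   min_displacement: float = 2.0,   # hypothetical threshold, in track units
                   min_straightness: float = 0.8    # hypothetical threshold
                   ) -> str:
    """Label a track 'directed' (long and straight) or 'confined'."""
    net = math.dist(track[0], track[-1])
    if net >= min_displacement and straightness(track) >= min_straightness:
        return "directed"
    return "confined"


if __name__ == "__main__":
    run = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.2), (3.0, 0.2)]
    jiggle = [(0.0, 0.0), (0.2, 0.1), (0.0, 0.2), (0.1, 0.0)]
    print(classify_track(run), classify_track(jiggle))
```

      Requiring both a minimum net displacement and a high straightness separates long directed runs from short straight hops as well as from long but meandering tracks.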

      6) The error bars on most of the plots appear to be extremely small, especially in light of the accompanying data used for quantitation. The authors state that they used SEM instead of SD, but their reasoning is not stated. All the former does is lead to an artificial reduction in the real deviation (by dividing SD by the square root of whatever they define as 'n', which isn't clear to me) of the data which I find to be misleading and very nonrepresentative of biological data. For example, the error bars for cell migration speed in Figure 2B suggest that the speeds for WT cells ranged from ~1.7-1.9 µm/sec, which I'm assuming is largely underrepresenting the range of values. Although I'm not a statistician, as someone that studies biochemical and biological processes, I strongly urge the authors to use plots and error bars that more accurately describe the data to your readers (e.g., scatter plots with standard deviation are the most transparent way to display data).

      We have now changed all plots to scatter plots with standard deviations, as suggested.
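      The reviewer's point that SEM is simply SD divided by the square root of n can be checked with a few lines of Python. The sketch below uses only the standard library; the speed values are made-up numbers for illustration, not data from the study.

```python
import math
import statistics


def sd_and_sem(values):
    """Return (sample SD, SEM) for a list of measurements.

    SEM = SD / sqrt(n), so it shrinks as n grows even when the spread
    of the data itself does not change.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    sd = statistics.stdev(values)   # sample standard deviation (n - 1 denominator)
    sem = sd / math.sqrt(n)         # standard error of the mean
    return sd, sem


if __name__ == "__main__":
    # Hypothetical migration speeds (illustrative values only)
    speeds = [1.2, 1.5, 1.8, 2.0, 2.3, 1.6, 1.9, 2.4, 1.4]
    sd, sem = sd_and_sem(speeds)
    print(f"SD = {sd:.3f}, SEM = {sem:.3f}")
```

      Because SEM describes the precision of the estimated mean rather than the variability between cells, SD (or the individual points in a scatter plot) is the more informative choice for showing biological spread.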

    1. Author Response

      Reviewer #2 (Public Review):

      Wang et al. elegantly exploit single-cell RNA-seq datasets to question the putative involvement of lncRNAs in human germ cell development. In the first part of the study, the authors use computational approaches to identify and characterize, from existing data, lncRNAs expressed in the germline. Of note, the scRNA-seq data used were generated from polyA+ RNAs, and thus non-polyadenylated lncRNAs could not be retrieved. Most of the lncRNAs identified in the germ cells and in the somatic cells of the gonads were previously unannotated. While this increases the catalog of lncRNA genes in the human genome, further characterization is needed to determine which fraction of these newly identified lncRNAs represent bona fide transcripts or transcriptional noise.

      Differential expression analysis between developmental stages, sexes, or cell types led to several observations: (i) whatever the stage of development, the number of expressed lncRNAs is higher in fetal germ cells than in gonadal somatic cells; (ii) there is a continuous increase in the number of expressed lncRNAs during germline development; of note, a similar, although more subtle, trend is observed for protein-coding genes; (iii) the developmental stage with the highest number of expressed lncRNAs differs between male and female germ cells. While convincing, the significance of these observations is difficult to assess. However, the authors remain prudent in their conclusions and do not over-interpret their findings.

      We appreciate Reviewer #2's precise summary of our analysis and thank the reviewer for highlighting the significance of these datasets for other researchers and future studies.

      Interestingly, integrating lncRNA expression to classify cell types led to the identification of a novel population of cells in the female germline that had not been revealed by classification based on protein-coding genes only. The biological relevance of this population, which clusters with mitotic populations, remains to be demonstrated. Finally, by examining lncRNA biotypes, the authors could demonstrate an enrichment, in the germ cells, of the antisense head-to-head organization (in relation to the nearby protein-coding gene) compared to other biotypes. Whether this differs from the general distribution of lncRNAs should be discussed.

      We analyzed the lncRNAs in the NONCODEv5 database (human genome). The XH type accounted for 21.73% of the intragenic lncRNA-mRNA pairs in NONCODEv5, which is lower than the 26.58% observed in fGC and the 26.23% observed in mGC (Response Figure 1).

      Response Figure 1. Genomic distribution and biotypes of the lncRNAs in NONCODEv5 database and lncRNAs expressed in human gonad.

      In the second part of the manuscript, Wang et al. focus on one pair of divergent lncRNA-protein coding genes (LNC1845-LHX8). To document the choice of this particular pair, it would be informative to have its correlation score indicated in Figure 3C. The existence of this transcript was validated using female fetal ovaries, and its function was addressed in late primordial germ cell-like cells (PGCLCs) derived from human embryonic stem cells (hESCs). The authors have used an admirable set of orthogonal approaches that led them to conclude that LNC1845 regulates the nearby gene LHX8 in cis. They further went on to identify the underlying mechanisms, which involve modification of the chromatin landscape through direct interaction of LNC1845 with a histone modifier. Among the different strategies used (KO, stop transcription, overexpression), the shRNA-mediated knock-down is the only one to specifically address the function of the transcript itself, as opposed to active transcription. The result of this experiment led the authors to conclude that the LNC1845 RNA is functional, a conclusion that is reinforced by the demonstration of physical interaction between the LNC1845 RNA and WDR5, a component of MLL methyltransferase complexes. The result of the KD experiment is, however, puzzling, as RNAi has been shown not to be the method of choice for targeting nuclear lncRNAs (Lennox et al. NAR 2016).

      We thank Reviewer #2 for the suggestion to add the correlation score of the LNC1845-LHX8 pair; the Pearson correlation of this pair is 0.3268. We have added this value to Figure 4C, where the expression correlation of LNC1845 and LHX8 is first mentioned. Having compared many similar studies, we note that shRNA knockdown has been widely used to target nuclear lncRNAs (Guttman et al. Nature 2011; Luo et al. Cell Stem Cell 2016; Subhash et al. Nucleic Acids Res. 2018; Li et al. Genome Res 2021), and the knockdown efficiency we obtained appeared acceptable for this purpose. The knockdown results are consistent with the deletion and stop-transcription approaches: all three show that LNC1845 transcriptional expression is required for proper LHX8 expression in late PGCLCs.

      Overall, the functional investigation is convincing and strengthened by the inclusion of multiple clones for each approach, and by the convergence in the outcome of each individual approach. The depth of characterization is also remarkable. The analyses of the mechanisms at stake are somewhat less solid, as there is less evidence demonstrating the involvement of the LNC1845 RNA and its interaction with WDR5.

      We have added more experimental evidence to strengthen the model, especially regarding the interaction of LNC1845 and WDR5. In addition to the WDR5 RIP-qPCR results demonstrating enrichment of LNC1845 in the WDR5 pulldown (Figure S8D), we performed a chromatin isolation by RNA purification (ChIRP) assay using antisense oligos tiling the entire LNC1845 transcript sequence. ChIRP results confirmed that WDR5 protein was enriched when anti-LNC1845 oligo probes were used to isolate the complex, but not in the controls without probes or without overexpression of the LNC1845 transcript (Response Figure 2). Taken together, the findings of both approaches support the model that LNC1845 directly interacts with WDR5 to modulate H3K4me3 modification for LHX8 transcriptional activation (related to Supplementary Figure 8D and 8E).

      Response Figure 2. LNC1845 binding to WDR5 was verified by ChIRP-western blot.

      Altogether, this study provides a convincing demonstration of the role of a lncRNA on the regulation of a nearby gene in the context of the germline. However, to have a better understanding of the functionality of lncRNA genes in general, it would be interesting to know whether other pairs of lncRNA-PC genes have been functionally investigated in this context, where no function for the lncRNA gene could be demonstrated. Negative results are highly informative and if so, these could be included in the manuscript.

      We appreciate Reviewer #2's suggestion to add results for other lncRNA-PC gene pairs. In fact, we have analyzed and presented the results for two other pairs in Figure 7D. LncRNAs LNC3346 and LNC15266 were also transcriptionally regulated by FOXP3, and they may regulate their neighboring genes TMCO1 and MPP5, as Figure 7D shows. Our analysis suggests that other lncRNA-PC gene pairs may undergo transcriptional regulation similar to that of LNC1845-LHX8 during germ cell development.

    1. Author Response

      Reviewer #2 (Public Review):

      Charme is a long non-coding RNA reported by the authors in their previous studies. Their previous work, mainly using skeletal muscles as a model, showed the functional relevance of Charme and presented data demonstrating its nuclear role, primarily via modulating the sub-nuclear localization of Matrin 3 (MATR3). Their data from skeletal muscles suggested that loss of the intronic region of Charme affects the local 3D genome organization, affecting MATR3 occupancy and thereby gene expression. Loss of Charme in vivo leads to cardiac defects. In this manuscript, they characterize the cardiac developmental defects and present molecular data supporting how the loss of Charme affects the cardiac transcriptome repertoire. Specifically, by performing whole-transcriptome analysis in E12.5 hearts, they identify gene expression changes in developing hearts due to loss of Charme. Based on their previous study in skeletal muscles, they assume that Charme regulates cardiac gene expression primarily via MATR3 also in developing cardiomyocytes. They provide CLIP-seq data for MATR3 (transcriptome-wide footprinting of MATR3) in wild-type E15.5 hearts and connect the binding of MATR3 to gene expression changes observed in Charme knockout hearts. I credit the authors for providing CLIP-seq data from in vivo embryonic samples, which is technically demanding.

      Major strengths:

      As previously indicated by the authors in Charme knockout mice, the major strength is the effect of Charme on cardiac development. While the phenotype might be subtle, the functional data indicate that Charme is essential for cardiac development and function. The combinatorial analysis of MATR3 CLIP-seq and transcriptional changes in the absence of Charme suggests a role for Charme that could be dependent on MATR3.

      We thank this reviewer for appreciating our methodological efforts and the importance of the MATR3 CLIP-seq data from in vivo embryonic samples.

      Weakness:

      (i) Nuclear lncRNAs often affect local gene expression by influencing the local chromatin.

      The Charme locus is in close proximity to MYBPC2, which is essential for cardiac function, sarcomerogenesis, and sarcomere maintenance. It is important to rule out that the cardiac-specific developmental defects due to Charme loss are caused by (a) the influence of Charme on MYBPC2 or, for that matter, other neighboring genes, or (b) local chromatin changes or enhancer-promoter contacts of MYBPC2 and other immediate neighbors (both aspects in the developmental time window when Charme expression is prominent in the heart, ideally from E11 to E15.5).

      Although cis-activity represents a mechanism of action for several lncRNAs, our previous work did not reveal this kind of activity for pCharme. To add stronger evidence, we have now analysed the expression of pCharme neighbouring genes in cardiac muscle. Genes were selected by considering not only those in "linear" proximity but also those in chromatin contact with the locus, which may be candidates for in cis regulation. To this end, we made use of analyses of available Hi-C datasets (Rosa-Garrido et al. 2017) that were in progress in the meantime (to answer point iv). Starting from a 1 Mb region around the Charme locus, we found that most interactions with Charme occur in a region spanning from 240 kb upstream to 115 kb downstream of Charme, for a total of 370 kb (Rev#2_Capture Fig. 1A). This region includes 39 genes, 9 of which are expressed in the neonatal heart, but none shows significant deregulation (see Table S2). Of note, this genomic region also includes the MYBPC2 locus, for which we did not find decreased expression in the heart in our RNA-seq data (Revised Figure 2-figure supplement 1C and Table S2). This trend was confirmed by RT-qPCR analyses of several genes in E15.5 extracts, which revealed no significant difference in their abundance upon Charme ablation (Rev#2_Capture fig. 1B).

      Fig. 1. A) Contact map depicting Hi-C data from mouse left ventricle retrieved from GEO accession ID GSM2544836. Data for the 1 Mb region around the Charme locus were visualized using the Juicebox Web App (https://aidenlab.org/juicebox/). B) RT-qPCR quantification of Charme and its neighbouring genes in CharmeWT vs CharmeKO E15.5 hearts. Data were normalized to GAPDH mRNA and represent means ± SEM of WT and KO (n=3) pools. Data information: *p < 0.05; **p < 0.01; ***p < 0.001, unpaired Student's t test.

      For a better understanding, we also checked possible “local” Charme activities in skeletal muscle cells, from previous datasets (Ballarino et al., 2018). We found that in murine C2C12 cells treated with two different gapmers against Charme, three of its neighbouring genes were expressed (Josd2, Emc10 and Pold1), but none showed significant alterations in their expression levels in response to Charme knock-down (Rev#2_Capture Fig. 2).

      Taken together, these results would exclude the possibility of Charme in cis activity as responsible for the phenotype.

      Fig. 2: Average expression from RNA-seq (FPKM) quantification of Charme neighbouring genes in C2C12 differentiated myotubes treated with Gap-scr vs Gap-Charme. Values for Gap-Charme represent the average values of gene expression after treatment with two different gapmers (GAP-2 and GAP-2/3).

      (ii) The authors provide data indicating cardiac developmental defects in Charme knockouts. Detailed developmental phenotyping is missing, which is necessary to pinpoint the exact developmental milestones affected by Charme. This is critical when reporting the cell type/ organ-specific developmental function of a newly identified regulator.

      We did our best to answer this concern.

      Let us first emphasise that, since their generation, we have never observed any particular tissue alteration, morphological or physiological, other than in muscle when dissecting the CharmeKO animals. The high specificity of pCharme expression, as also shown here by ISH (Figure 1C-D, Figure 1-figure supplement 1A-B, Figure 3A), together with the minimal alteration applied to the locus for CRISPR-Cas-mediated KO (PolyA insertion), strongly excludes the presence of alterations in other tissues and their involvement in the development of the phenotype.

      Nevertheless, we now add more developmental details to the cardiac phenotype (see also Essential revision point 2).

      1- First, gene expression analyses performed at E12.5, E15.5, E18.5 and neonatal (PN2) stages allowed us to identify, at the molecular level, the developmental time point at which CharmeKO effects on the cardiac muscle can be detected. Our new results clearly indicate that pCharme-mediated regulation of morphogenic and cardiac differentiation genes is detectable from the E15.5 fetal stage onward (Rev#2_Capture Fig. 3/Revised Figure 2E). Together with the analysis of pCharme targets, and consistent with the altered cardiac maturation and performance, this evidence is also supported by the analysis of the Myh6/Myh7 myosin ratio, whose decrease in CharmeKO hearts starts from E15.5 up to 69% of control levels at PN stages (Revised Figure 2F).

      2- Hematoxylin-eosin staining of dorso-ventral cryosections from CharmeWT and CharmeKO hearts confirmed the fetal malformation at the E15.5 stage (Revised Figure 2G). Moreover, the hypotrabeculation phenotype of CharmeKO hearts, initially examined by immunofluorescence, is now confirmed by the analysis of key trabecular markers (Irx3 and Sema3a), whose expression significantly decreases upon pCharme ablation (Rev#1_Capture Fig. 3B/Revised Figure 2-figure supplement 1G).

      3- Finally, the gene expression analysis of Ki-67, Birc5 and Ccna2 (Revised Figure 2-figure supplement 1E) definitively rules out an influence of pCharme ablation on cell-cycle genes and cardiomyocyte proliferation, allowing a more careful interpretation of the embryonic phenotype. Note that, consistent with the involvement of the lncRNA at later stages of development, the expression of important cardiac regulators, such as Gata4, Nkx2-5 and Tbx5, is not altered by its ablation at any of the tested time points (Rev#2_Capture Fig.3), whereas pCharme absence mainly affects genes expressed downstream of these factors.

      These new results have been included in the revised version of the manuscript and are discussed in more detail.

      Fig. 3: RT-qPCR quantification of Gata4, Nkx2-5 and Tbx5 in CharmeWT and CharmeKO cardiac extracts at E12.5, E15.5 and E18.5 of embryonic development. Data were normalized to GAPDH mRNA and represent means ± SEM of WT and KO (n=3) pools.
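      The captions state only that RT-qPCR data were normalized to GAPDH mRNA; the exact calculation is not given. One widely used approach is the comparative Ct (2^-ΔΔCt) method, sketched below in Python as an assumption rather than as the authors' documented pipeline. It presumes near-100% amplification efficiency for both the target and the reference gene, and the function name and Ct values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def fold_change_ddct(target_ko: Sequence[float], ref_ko: Sequence[float],
                     target_wt: Sequence[float], ref_wt: Sequence[float]) -> float:
    """Relative expression (KO vs WT) by the comparative Ct (2^-ddCt) method.

    Each argument holds replicate Ct values; the reference gene (e.g. GAPDH)
    corrects for differences in input cDNA. Assumes ~100% PCR efficiency,
    i.e. one cycle corresponds to a two-fold difference in template.
    """
    d_ct_ko = mean(target_ko) - mean(ref_ko)   # normalize KO target to reference
    d_ct_wt = mean(target_wt) - mean(ref_wt)   # normalize WT target to reference
    dd_ct = d_ct_ko - d_ct_wt                  # KO relative to the WT calibrator
    return 2.0 ** (-dd_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: the target needs about one extra cycle in KO
    print(fold_change_ddct([25.1, 24.9], [18.0, 18.0], [24.0, 24.0], [18.0, 18.0]))
```

      If primer efficiencies differ appreciably from 100%, an efficiency-corrected model would be more appropriate than this simple form.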

      (iii) Along the same line, at the molecular level, the authors provide evidence indicating a change in the expression of genes involved in cardiogenesis and cardiac function. Based on changes in mRNA levels of the genes affected due to loss of Charme and based on immunofluorescence analysis of a handful of markers, they propose a role of Charme in cell cycle and maturation. Such claims could be toned down or warrant detailed experimental validation.

      See above, response to Reviewer #2 (Public Review) weakness (ii).

      (iv) The authors extrapolate the mechanistic findings they reported for Charme in skeletal muscle to the developing heart. While the data support this hypothesis, the work falls short of extending the mechanistic understanding of Charme beyond the papers previously published by the authors. CLIP-seq data is a step in the right direction. MATR3 is a relatively abundant RBP, binding transcriptome-wide, mainly in intronic regions, based on currently available CLIP-seq data, as well as shown by the authors' own CLIP-seq in cardiomyocytes. It has also been shown to regulate pre-mRNA splicing/alternative splicing along with PTB (PMID: 25599992) and 3D genome organization (PMID: 34716321). In addition, the authors propose a MATR3-dependent molecular function for Charme, primarily dependent on the intronic region of Charme and on MATR3 binding. Answering the following questions would enable a better mechanistic understanding of how Charme controls cardiac development.

      (i) what are the proximal genomic regions in the 3D space to Charme locus in embryonic cardiomyocytes? Authors can re-analysis published Hi-C data sets from embryonic cardiomyocytes or perform a 4-C experiment using Charme locus for this purpose.

      See above, response to Reviewer #2 (Public Review) weakness (i).

      (ii) does the loss of Charme affect the splicing landscape of MATR3 bound pre-mRNAs in E12.5 ventricles in general and those arising from the NCTC region specifically?

      This is an intriguing issue, as also highlighted by new evidence showing that the reactivation of fetal-specific RNA-binding proteins, including MATR3, in the injured heart drives transcriptome-wide switches through the regulation of early steps of RNA transcription and processing (D'Antonio et al., 2022).

      Using the rMATS software on our neonatal RNA-seq datasets, we then investigated the effect of pCharme depletion on splicing, with a focus on NCTC. As shown in Rev#2_Capture Fig.4A, all classical splicing alterations were investigated: exon skipping, alternative 5' splice site, alternative 3' splice site, mutually exclusive exons and intron retention. Intriguingly, we observed a slight alteration in splicing patterns, in particular for exon-skipping events (62%, corresponding to 381 events). Among them, the majority were exon-exclusion events (237 events = 209 genes), while a smaller fraction were exon-inclusion events (144 events = 133 genes). Moreover, by intersecting these genes with the MATR3-bound RNAs, we found a marginally significant enrichment (p = 0.038) for exon inclusion (Rev#2_Capture Fig.4B).

      Regarding the NCTC locus, we demonstrate that in hearts pCharme acts through different target genes. Indeed, none of the NCTC-arising transcripts are bound by MATR3 (see Table S4) or substrate for alternative splicing regulation.

      While these results are very interesting for further investigating the pCharme/MATR3 interplay, their biological significance needs to be assessed through one-by-one analysis of specific transcripts. As a continuation of the project, Nanopore sequencing of these samples on a MinION platform is currently under way in the lab to better characterize alternative splicing events in response to lncRNA ablation during development.

      Fig. 4: A) Left and middle panels: pie charts depicting the proportion of significantly altered (FDR < 0.05) splicing events detected by rMATS comparing neonatal CharmeWT and CharmeKO RNA-seq samples. All classical splicing alterations were investigated: exon skipping, alternative 3' splice site (A3SS), intron retention, alternative 5' splice site (A5SS) and mutually exclusive exons (MXE). Right panel: volcano plot depicting significant exon-skipping events in CharmeKO (FDR < 0.05, PSI<0 for excluded and included exons, FDR >= 0.05 for invariant exons). The x-axis represents the exon-inclusion ratio, or Percent Spliced In (PSI), while the y-axis represents –log10 of the p-value. B) Pie charts representing the fraction of transcripts with at least one significant excluded (left panel), invariant (middle panel) or included (right panel) exon that are bound by MATR3. P-values of MATR3 target enrichment for each comparison are shown below. Statistical significance was assessed with Fisher's exact test.
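      The caption names Fisher's exact test but does not say how the 2x2 table was built or whether the test was one- or two-sided. Assuming a one-sided test for enrichment, with rows for transcripts in a given exon-change class versus all other tested transcripts and columns for MATR3-bound versus unbound, the p-value is a hypergeometric tail probability. The standard-library Python sketch below shows that calculation; `scipy.stats.fisher_exact(table, alternative="greater")` should return the same p-value. The counts in the example are hypothetical.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided (enrichment) Fisher's exact test for the 2x2 table

                      MATR3-bound   not bound
        in class           a            b
        not in class       c            d

    Returns P(X >= a), where X is hypergeometric with the table margins fixed.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1 = a + b          # transcripts in the class of interest
    col1 = a + c          # MATR3-bound transcripts
    n = a + b + c + d     # all transcripts tested
    # Sum exact integer numerators of P(X = x) for x = a .. max possible,
    # then divide once by C(n, col1) to limit rounding error.
    numerator = sum(comb(row1, x) * comb(n - row1, col1 - x)
                    for x in range(a, min(row1, col1) + 1))
    return numerator / comb(n, col1)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study
    print(fisher_exact_greater(40, 93, 600, 3000))
```

      Working with exact integer numerators via `math.comb` avoids the underflow that multiplying many small floating-point probabilities would cause for large gene sets.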

      (iii) MATR3 binds DNA, as also shown by authors in previous studies. Is the MATR3 genomic binding altered by Charme loss in cardiomyocytes globally, as well as on the loci differentially expressed in Charme knockout heart? Overlapping MATR3 genomic binding changes and transcriptome binding changes to differentially expressed genes in the absence of Charme would better clarify the MATR3-centric mechanisms proposed here. Further connecting that to 3D genome changes due to Charme loss could provide needed clarity to the mechanistic model proposed here.

      Previous experience from our lab (Desideri et al., 2020) and others (Zeitz et al. 2009 J Cell Biochem) indicates that chromatin IP is not the most suitable approach for identifying specific MATR3 targets because of the broad distribution of MATR3 over the genome. Given the number of animals that would need to be sacrificed, we instead strengthened our MATR3 CLIP evidence by adding i) a CharmeKO MATR3 CLIP-seq control and ii) a combinatorial analysis of MATR3 CLIP-seq with the RNA-seq data.

      We have better explained the reasoning within the text, which now reads “The known ability of MATR3 to interact with both DNA and RNA and the high retention of pCharme on the chromatin may predict the presence of chromatin and/or specific transcripts within these MATR3-enriched condensates. In skeletal muscle cells, we have previously observed on a genome-wide scale, a global reduction of MATR3 chromatin binding in the absence of pCharme (Desideri et al., 2020). Nevertheless, the broad distribution of the protein over the genome made the identification of specific targets through MATR3-ChIP challenging.” (lines 274-279).

      Indeed, we found that MATR3 binding was significantly decreased at numerous peaks (434/626), whereas increased binding was observed at a smaller fraction of regions (192/626) (Revised Figure 5C). As a control, we performed MATR3 motif enrichment analysis on the differentially bound regions, which revealed that the motif lies close to the peak summit (+/- 50 nt), where MATR3 enrichment is strongest (Revised Figure 5-figure supplement 1D), further confirming a direct and highly specific binding of the protein to these sites. To better characterise the relationship between MATR3 and pCharme, we then intersected the newly identified regions with the MATR3-bound transcripts whose expression was altered by Charme depletion. While gain peaks were equally distributed across DEGs, loss peaks were significantly enriched in a subset of pCharme down-regulated DEGs (Revised Figure 5D), suggesting a crosstalk between the lncRNA and the protein in regulating the expression of this specific group of genes. Interestingly, these RNAs mainly distribute across the same GO categories as pCharme down-regulated DEGs and include genes, such as Cacna1c, Notch3, Myo18B and Rbm20, that are involved in embryo development and were validated as pCharme/Matr3 targets in primary cardiac cells (Revised Figure 5D, lower panel and 5E).

    1. Author Response

      Reviewer #1 (Public Review):

      The role of the parietal (PPC), the retrosplenial (RSP) and the visual cortex (S1) was assessed in three tasks corresponding to a simple visual discrimination task, a working-memory task and a two-armed bandit task, all based on the same sensory-motor requirements within a virtual reality framework. A differential involvement of these areas was reported in these tasks based on the effect of optogenetic manipulations. Photoinhibition of PPC and RSP was more detrimental than photoinhibition of S1, and more drastic effects were observed in presumably more complex tasks (i.e. the working-memory and bandit tasks). If mice were trained on these more complex tasks prior to training on the simple discrimination task, then the same manipulations produced large deficits, suggesting that switching from one task to the other was more challenging, resulting in the involvement of possibly larger neural circuits, especially at the cortical level. Calcium imaging also supported this view, with differential signaling in these cortical areas depending on the task considered and the order in which the tasks were presented to the animals. Overall the study is interesting, and the fact that all tasks relied on the same sensory-motor requirements is a plus, but the theoretical foundations of the study seem a bit loose, opening the way to alternative interpretations of the data other than "training history".

      1) Theoretical framework:

      The three tasks used by the authors should be better described at the theoretical level. While the simple task can indeed be considered a visual discrimination task, the other two tasks operationally correspond to a working-memory task (i.e. the delay condition, which is typically assessed in a Y- or T-maze in rodents) and a two-armed bandit task (i.e. the switching task), respectively. These three tasks are therefore qualitatively different and rely on at least partially dissociable neural circuits, and this should be clearly analyzed to explain the rationale for focusing on the three cortical regions of interest.

      We are glad to see that the reviewer finds our study interesting overall and sees value in the experimental design. We agree that in the previous version, we did not provide enough motivation for the specific tasks we employed and the cortical areas studied.

      Navigating to reward locations based on sensory cues is a behavior that is crucial for survival and amenable to a head-fixed laboratory setting in virtual reality for mice. In this context of goal-directed navigation based on sensory cues, we chose to center our study on posterior cortical association areas, PPC and RSC, for several reasons. RSC has been shown to be crucial for navigation across species, poised to enable the transformation between egocentric and allocentric reference frames and to support spatial memory across various timescales (Alexander & Nitz, 2015; Fischer et al., 2020; Pothuizen et al., 2009; Powell et al., 2017). It has furthermore been shown to be involved in cognitive processes beyond spatial navigation, such as temporal learning and value coding (Hattori et al., 2019; Todd et al., 2015), and is emerging as a crucial region for the flexible integration of sensory and internal signals (Stacho & Manahan-Vaughan, 2022). It is thus a prime candidate area for studying how cognitive experience may affect cortical involvement in goal-directed navigation.

      RSC is heavily interconnected with PPC, which is generally thought to convert sensory cues into actions (Freedman & Ibos, 2018) and has been shown to be important for navigation-based decision tasks (Harvey et al., 2012; Pinto et al., 2019). Specific task components involving short-term memory have been suggested to cause PPC to be necessary for a given task (Lyamzin & Benucci, 2019), so we chose such task components in our complex tasks to maximize the likelihood of large PPC involvement to compare the simple task to.

      One such task component is a delay period between the cue and the ultimate choice report, which is a common design in decision tasks (Goard et al., 2016; Harvey et al., 2012; Katz et al., 2016; Pinto et al., 2019). We agree with the reviewer that traditionally such a task would be referred to as a working-memory task. However, we refrain from using this terminology because it may lead readers to expect that mice solve the task with a working-memory-dependent strategy in its strictest and most traditional sense, that is, that mice show no overt behaviors indicative of the ultimate choice until the end of the delay period. If the ultimate choice is apparent earlier, mice may use what is sometimes referred to as an embodiment-based strategy, which some readers may see as precluding working memory. Indeed, in new choice-decoding analyses of the mice's running patterns, we show that mice already start running towards the side of the ultimate choice during the cue period (Figure 1—figure supplement 1). Regardless of these seemingly early choices, however, we crucially found much larger performance decrements from inhibition in mice performing the delay task than in mice performing the simple task, along with lower overall task performance in the delay task, indicating that the insertion of a delay period increased subjective task difficulty. As traditional working-memory versus embodiment-based strategies are not the focus of our study and do not seem to inform the performance decrements from inhibition, we chose to label the task descriptively by its crucial task parameter rather than by the supposedly underlying cognitive process.

      For the switching task, we appreciate that the reviewer sees similarities to a two-armed bandit task. However, in a two-armed bandit task, rewards are typically delivered probabilistically, whereas in our task, cue and action values are constant within each of the two rule blocks, and only the rule, i.e. the cue-choice association, reverses across blocks. This is a crucial distinction because in our design, blocks of Rule A in the switching task are identical to the simple task, with fixed cue-choice associations and guaranteed reward delivery if the correct choice is made, allowing a fair comparison of cortical involvement across tasks.

      We have now heavily revised the introduction, results, and discussion sections of the manuscript to better explain the motivation for the tasks and the investigated brain areas. These revisions cover all the points mentioned in this response.

      Furthermore, we agree with the reviewer that the three tasks are qualitatively different and likely depend on at least partially dissociable circuits. We consider the large differences in cortical inhibition effects between the simple and the complex tasks to be evidence for this notion. We also note that we performed task-specific optogenetic manipulations, presented in the Supplementary Material, to further understand the involvement of different areas in task-specific processes. In what is now Figure 1—figure supplement 4, we restricted inhibition in the delay task to either the cue period only or the delay period only, and found that PPC or RSC inhibition during either period caused larger performance drops than those observed in the simple task. We also performed epoch-specific inhibition of PPC in the switching task, specifically targeting the reward and inter-trial-interval (ITI) periods following rule switches, in what is now Figure 1—figure supplement 5. With PPC inhibition during the ITI, we observed no effect on performance recovery after rule switches and thus found PPC activity to be dispensable for rule updates.

      For the working-memory task we do not know the duration of the delay but this really is critical information; per definition, performance in such a task is delay-dependent, this is not explored in the paper.

      We thank the reviewer for pointing out the lack of information on delay duration and have now added this to the Methods section.

      We agree that in classical working memory tasks where the delay duration is purely defined by the experimenter and varied throughout a session, performance is typically dependent on delay duration. However, in our delay task, the delay distance is kept constant, and thus the delay is not varied by the experimenter. Instead, the time spent in the delay period is determined by the mouse, and the only source of variability in the time spent in the delay period is minor differences in the mice’s running speeds across trials or sessions. Notably, the differences in time in the delay period were greatest between mice because some mice ran faster than others. Within a mouse, the time spent in the delay period was generally rather consistent due to relatively constant running speeds. Also, because the mouse had full control over the delay duration, it could very well speed up its running if it started to forget the cue and run more slowly if it was confident in its memory. Thus, because the delay duration was set by the mouse and not the experimenter, it is very challenging or impossible to interpret the meaning and impact of variations in the delay duration. Accordingly, we had no a priori reason to expect a relationship between task performance and delay duration once mice have become experts at the delay task. Indeed, we do not see such a relationship in our data (see plot here, n = 85 sessions across 7 mice). In order to test the effect of delay duration on behavioral performance, we would have to systematically change the length of the delay period in the maze, which we did not do and which would require an entirely new set of experiments.

      Also, the authors heavily rely on "decision-making" but I am genuinely wondering if this is at all needed to account for the behavior exhibited by mice in these tasks (it would be more accurate for the bandit task) as with the perspective developed by the authors, any task implies a "decision-making" component, so that alone is not very informative on the nature of the cognitive operations that mice must compute to solve the tasks. I think a more accurate terminology in line with the specific task considered should be employed to clarify this.

      We acknowledge that the previous emphasis on decision-making may have created expectations that we demonstrate effects that are specific to the ‘decision-making’ aspect of a decision task. As we do not isolate the decision-making process specifically, we have substantially revised our wording around the tasks and removed the emphasis on decision-making, including in the title. Rather than decision-making, we now highlight the navigational aspect of the tasks employed.

      The "switching"/bandit task is particularly interesting. But because the authors only consider trials with highest accuracy, I think they are missing a critical component of this task which is the balance between exploiting current knowledge and the necessity to explore alternate options when the former strategy is no longer effective. So trials with poor performance are thus providing an essential feedback which is a major drive to support exploratory actions and a critical asset of the bandit task. There is an ample literature documenting how these tasks assess the exploration/exploitation trade-off.

      We completely agree with the reviewer that the periods following rule switches are an essential part of the switching task and of high interest. Indeed, ongoing work in the lab is carefully quantifying the mice’s strategy in this task and exploring how mice use errors after switches to update their belief about the rule. A detailed quantification of switching-task strategy was, however, beyond the scope of this project, because our focus was on training history and not on the specifics of each task. While we agree with the reviewer that the switching period is of great interest, it would be too much for a single paper to investigate the detailed mechanisms of each task on top of what we already report for training history. Instead, we have now added quantifications of performance recovery after rule switches in Figure 1—figure supplement 2, showing that rule switches initially cause below-chance performance, followed by recovery within tens of trials.

      2) Training history vs learning sets vs behavioral flexibility:

      The authors consider "training history" as the unique angle to interpret the data. Because the experimental setup is the same throughout all experiments, I am wondering if animals are just simply provided with a cognitive challenge assessing behavioral flexibility given that they must identify the new rule while refraining from responding using previously established strategies. According to this view, it may be expected for cortical lesions to be more detrimental because multiple cognitive processes are now at play.

      It is also possible that animals form learning sets during successive learning episodes which may interfere with or facilitate subsequent learning. Little information is provided regarding learning dynamics in each task (e.g. trials to criterion depending on the number of tasks already presented) to have a clear view on that.

      We thank the reviewer for raising these interesting ideas. We have now evaluated these ideas in the context of our experimental design and results. One of the main points to consider is that for mice transitioned from either of the complex tasks to the simple task, the simple task is not a novel task, but rather a well-known simplification of the previous tasks. Mice that are experts on the delay task have experienced the simple task, i.e. trials without a delay period, during their training procedure before being exposed to delay periods. Switching-task expert mice know the simple task as one rule of the switching task and have performed according to this rule in each session prior to the task transition. Accordingly, upon the transition to the simple task, both delay-task expert mice and switching-task expert mice perform at very high levels in the very first simple-task session. We now quantify and report this in Figure 2—figure supplement 1 (A, B). This is crucial to keep in mind when assessing ‘learning sets’ or ‘behavioral flexibility’ as possible explanations for the persistent cortical involvement after the task transitions. In classical learning-set paradigms, animals are exposed to a series of novel associations, and the learning of previous associations speeds up the learning of subsequent ones (Caglayan et al., 2021; Eichenbaum et al., 1986; Harlow, 1949). This is a distinct paradigm from ours because the simple task does not contain associations that are new to the mice already trained on the complex tasks. Relatedly, the simple task is unlikely to pose a behavioral-flexibility challenge to these mice, given our experimental design and the high simple-task performance in the first session after the task transition.

      We now clarify these points in the introduction, results, and discussion sections, also acknowledging that it will be of interest for future work to investigate how learning sets may affect cortical task involvement.

      3) Calcium imaging data versus interventions:

      The value of the calcium imaging data is not entirely clear. Does this approach bring a new point to consider to interpret or conclude on behavioral data or is it to be considered convergent with the optogenetic interventions? Very specific portions of behavioral data are considered for these analyses (e.g. only highly successful trials for the switching/bandit task) and one may wonder if considering larger or different samples would bring similar insights. The whole take on noise correlation is difficult to apprehend because of the same possible interpretation issue, does this really reflect training history, or that a new rule now must be implemented or something else? I don't really get how this correlative approach can help to address this issue.

      We thank the reviewer for pointing out that the relationship between the inhibition dataset and calcium imaging dataset is not clear enough. We restricted analyses of inhibition and calcium imaging data in the switching task to the identical cue-choice associations as present in the simple task (i.e. Rule A trials of the switching task). We did this because we sought to make the fairest and most convincing comparison across tasks for both datasets. However, we can now see that not reporting results with trials from the other rule causes concerns that the reported differences across tasks may only hold for a specific subset of trials.

      We have now added analyses of optogenetic inhibition effects and calcium imaging results considering Rule B trials. In Figure 1—figure supplement 2, we show that when considering only Rule B trials in the switching task, effects of RSC or PPC inhibition on task performance are still increased relative to the ones observed in mice trained on and performing the simple task. We also show that overall task performance is lower in Rule B trials of the switching task than in the simple task, mirroring the differences across tasks when considering Rule A trials only.

      We extended the equivalent comparisons to the calcium imaging dataset, only considering Rule B trials of the switching task in Figure 4—figure supplement 3. With Rule B trials only, we still find larger mean activity and trial-type selectivity levels in RSC and PPC, but not in V1, compared to the simple task, as well as lower noise correlations. We thus find that our conclusions about area necessity and activity differences across tasks hold for Rule B trials and are not due to only considering a subset of the switching task data.
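      For readers less familiar with the measure, one common way to compute noise correlations is sketched below in Python with NumPy. This is an assumption for illustration, not the authors' exact procedure, which is not described in this response: each neuron's mean response within each trial type is subtracted, and the remaining trial-to-trial fluctuations are correlated between neuron pairs. The function name and input layout (a trials × neurons array of per-trial activity) are hypothetical.

```python
import numpy as np


def mean_noise_correlation(activity, trial_types):
    """Mean pairwise noise correlation across neurons (illustrative sketch).

    activity: array of shape (n_trials, n_neurons), e.g. mean activity per trial
              (hypothetical input format).
    trial_types: array of shape (n_trials,) with one label per trial
                 (e.g. left vs. right cue).
    """
    activity = np.asarray(activity, dtype=float)
    trial_types = np.asarray(trial_types)
    residuals = np.empty_like(activity)
    # Remove the trial-type-driven (signal) component: subtract each neuron's
    # mean response within each trial type, leaving trial-to-trial fluctuations.
    for t in np.unique(trial_types):
        mask = trial_types == t
        residuals[mask] = activity[mask] - activity[mask].mean(axis=0)
    # Pearson correlation between neurons (columns) of the residuals.
    corr = np.corrcoef(residuals, rowvar=False)
    # Average over unique neuron pairs (upper triangle, diagonal excluded).
    iu = np.triu_indices(corr.shape[0], k=1)
    return corr[iu].mean()
```

      Subtracting the per-trial-type mean matters here: two neurons with opposite trial-type preferences would show a strongly negative raw correlation even if their trial-to-trial fluctuations were shared, whereas the residual-based measure isolates only the shared fluctuations.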

      In Figure 4—figure supplement 4, we further leverage the inclusion of Rule B trials and present new analyses of different single-neuron selectivity categories across rules in the switching task, reporting a prevalence of mixed selectivity in our dataset.

      Furthermore, to clarify the link between the optogenetic inhibition and the calcium imaging datasets, we have revised the motivation for the imaging dataset, as well as the presentation of its results and discussion. Investigating an area’s neural activity patterns is a crucial first step towards understanding how differential necessity of an area across tasks or experience can be explained mechanistically on a circuit level. We now elaborate on the fact that mechanistically, changes in an area’s necessity may or may not be accompanied by changes in activity within that area, as previous work in related experimental paradigms has reported differences in necessity in the absence of differences in activity (Chowdhury & DeAngelis, 2008; Liu & Pack, 2017). This phenomenon can be explained by differences in the readout of an area’s activity. We now make more explicit that in contrast to the scenario where only the readout changes, we find an intriguing correspondence between increased necessity (as seen in the inhibition experiments) and increased activity and selectivity levels (as seen in the imaging experiments) in cortical association areas depending on the current task and previous experience. Rather than attributing the increase in necessity solely to these observed changes in activity, we highlight that already in the simple task condition, cortical areas contain a high amount of task information, ruling out the idea that insufficient local information would cause the small performance deficits from inhibition. Our results thus suggest that differential necessity across tasks and experience may still require changes at the readout level, beyond the observed changes in local activity.
      We view our imaging results as an exciting first step towards a mechanistic understanding of how cognitive experience affects cortical necessity, but we stress that future work will need to test directly the relationship between cortical necessity and various specific features of the neural code.

      Reviewer #2 (Public Review):

      The authors use a combination of optogenetics and calcium imaging to assess the contribution of cortical areas (posterior parietal cortex, retrosplenial cortex, S1/V1) on a visual-place discrimination task. Head-fixed mice were trained on a simple version of the task where they were required to turn left or right depending on the visual cue that was present (e.g. X = go left; Y = go right). In a more complex version of the task the configurations were either switched during training or the stimuli were only presented at the beginning of the trial (delay).

      The authors found that inhibiting the posterior parietal cortex and retrosplenial cortex affected performance, particularly on the complex tasks. However, previous training on the complex tasks resulted in more pronounced impairments on the simple task than when behaviourally naïve animals were trained/tested on a simple task. This suggests that the more complex tasks recruit these cortical areas to a greater degree, potentially due to increased attention required during the tasks. When animals then perform the simple version of the task their previous experience of the complex tasks is transferred to the simple task resulting in a different pattern of impairments compared to that found in behaviorally naïve animals.

      The calcium imaging data showed a similar pattern of findings to the optogenetic study. There was overall increased activity in the switching tasks compared to the simple tasks consistent with the greater task demands. There was also greater trial-type selectivity in the switching task compared to the simple task. This increased trial-type selectivity in the switching tasks was subsequently carried forward to the simple task so that activity patterns were different when animals performed the simple task after experiencing the complex task compared to when they were trained on the simple task alone.

      Strengths:

      The use of optogenetics and calcium-imaging enables the authors to look at the requirement of these brain structures both in terms of necessity for the task when disrupted as well as their contribution when intact.

      The use of the same experimental set up and stimuli can provide a nice comparison across tasks and trials.

      The study nicely shows that the contribution of cortical regions varies with task demands and that longer-term changes in neuronal responses can transfer across tasks.

      The study highlights the importance of considering previous experience and exposure when understanding behavioural data and the contribution of different regions.

      The authors include a number of important controls that help with the interpretation of the findings.

      We thank the reviewer for pointing out these strengths in our work and for finding our main conclusions supported.

      Weaknesses:

      There are some experimental details that need to be clarified to help with understanding the paper in terms of behavior and the areas under investigation.

      The use of the same stimuli throughout is beneficial as it allows direct comparisons with animals experiencing the same visual cues. However, it does limit the extent to which you can extrapolate the findings. It is perhaps unsurprising to find that learning about specific visual cues affects subsequent learning and use of those specific cues. What would be interesting to know is how much of what is being shown is cue specific learning or whether it reflects something more general, for example schema learning which could be generalised to other learning situations. If animals were then trained on a different discrimination with different stimuli would this previous training modify behavior and neural activity in that instance. This would perhaps be more reflective of the types of typical laboratory experiments where you may find an impairment on a more complex task and then go on to rule out more simple discrimination impairments. However, this would typically be done with slightly different stimuli so you don't introduce transfer effects.

      We agree with the reviewer that investigating the effects of schema learning on cortical task involvement is an exciting future direction and have now explicitly mentioned this in the Discussion section. As the reviewer points out, however, our study was not designed to test this idea specifically. Because investigating schema learning would require developing and implementing an entirely new set of behavioral task variants, we feel this is beyond the scope of the current work. As to the question of how generalized the effects of cognitive experience are, our data in the run-to-target task suggest that if task settings are sufficiently distinct, cortical involvement can be similarly low regardless of complex task experience (now Figure 3—figure supplement 1). This finding is in line with recent work (Pinto et al., 2019) in which cortical involvement appears to change rapidly depending on major differences in task demands. However, work in MT has shown that previous motion discrimination training using dots can alter MT involvement in motion discrimination of gratings (Liu & Pack, 2017), highlighting that cortical involvement need not be tightly linked to the sensory cue identity.

      It is not clear whether length of training has been taken into account for the calcium imaging study given the slow development of neural representations when animals acquire spatial tasks.

      We apologize that the training duration and the temporal relationship between task acquisition and calcium imaging were not documented for the calcium imaging dataset. Please see our detailed reply to Reviewer 2’s ‘recommendations for the authors’ below.

      The authors are presenting the study in terms of decision-making, however, it is unclear from the data as presented whether the findings specifically relate to decision making. I'm not sure the authors are demonstrating differential effects at specific decision points.

      We understand that the previous emphasis on decision-making may have created expectations that we demonstrate effects that are specific to the ‘decision-making’ aspect of a decision task. As we do not isolate the decision-making process specifically, we have substantially revised our wording around the tasks and removed the emphasis on decision-making, including in the title. Rather than decision-making, we now highlight the navigational aspect of the tasks employed.

      While we removed the emphasis on the decision-making process in our tasks, we found the reviewer’s suggestion to measure ‘decision points’ a useful additional behavioral characterization across tasks. So, we quantified how soon a mouse’s ultimate choice can be decoded from its running pattern as it progresses through the maze towards the Y-intersection. We now show these results in Figure 1—figure supplement 1. Interestingly, we found that in the delay task, choice decoding accuracy was already very high during the cue period before the onset of the delay. Nevertheless, we had shown that overall task performance and performance with inhibition were lower in the delay task compared to the simple task. Also, in segment-specific inhibition experiments, we had found that inhibition during only the delay period or only the cue period decreased task performance substantially more than in the simple task, thus finding an interesting absence of differential inhibition effects around decision points. Overall, how early a mouse made its ultimate decision did not appear predictive of the inhibition-induced task decrements, which we also directly quantify in Figure 1—figure supplement 1.

    1. Author Response

      Reviewer 2 (Public Review):

      1) The authors developed a novel C.elegans model for studying extracellular amyloid beta aggregation and is therefore likely to be taken up broadly by the field. However, the new model should be fully characterized. Throughout the manuscript, the only method to detect amyloid deposition was the GFP fluorescence intensity and morphology, while direct characterization of amyloid aggregates is lacking.

      We thank the reviewer for the feedback and the foresight that this model might be taken up by the field. To strengthen our model, as the reviewer suggested, we confirmed that the GFP fluorescence indeed reflects amyloid aggregates. Please see point 3 above and the new Supporting Figure 1.1.

      2) A targeted RNA interference (RNAi) screen was used to identify the key regulators of Aβ aggregation and clearance, which is one of the strengths of the study. There should be evidence that RNAi works to knockdown the specific genes. Similarly, there should be evidence indicating that ADM-2 is indeed expressed in the overexpression experiments.

      We aimed to verify our main hits (cri-2 and adm-2) with mutations in these genes, as RNAi can have off-target effects. The adm-2(ok3178) allele is a 989 bp deletion that causes a splice-acceptor change, probably resulting in a truncated, out-of-frame protein.

      Author response image 1.

      The cri-2(gk314) allele is a 1213 bp deletion covering the whole cri-2 locus and is therefore likely a null allele.

      Author response image 2.

      For the overexpression, no ADM-2 antibody is available; our attempt to generate one was unsuccessful. Thus, we can only infer ADM-2 overexpression from the induction and increased red fluorescence of ADM-2::mScarlet (Supporting Figure 6.1).

      3) It remains unknown whether ADM-2 directly degrades Aβ or facilitates the clearance of Aβ by remodeling the ECM. The effect of ADM-2 on ECM remodeling should be examined.

      We addressed this in point 1 above and also in our discussion section.

    1. Author Response

      Reviewer #1 (Public Review):

      In this paper, Bai et al. investigate in experiments and simulations how cohesion is maintained in chemotactic travelling waves of bacteria. These waves emerge from the bacterial population consuming an attractant, thus carving a gradient which they follow chemotactically. This paper builds upon previous work of some of the authors (Fu et al, Nat Commun 2018), which found that in these waves bacteria with varying degree of chemotactic sensitivity organize spatially in the band, which allows for its cohesiveness despite varying phenotypes. The authors investigate here an additional element for the cohesiveness of the wave: because the sharpness of the gradient increases from the front to the back of the wave, 'late' cells catch up via a stronger chemotactic response, and front cells slow down via a weaker one. This had been already postulated in earlier work on the phenomenon (Saragosti et al. PNAS 2011), but here the authors investigate how this applies to cells with varying chemotactic sensitivity. They also performed agent-based simulations of the cells behavior in the gradient and developed a model of the motion in the gradient. The latter maps the spatial dependence of the gradient steepness onto an effective travelling potential which keeps the cells together in a group as the gradient and the wave propagate. Importantly, the effective potential is predicted to be tighter for cells with higher chemotactic sensitivity, in agreement with the cell behavior they observe in experiments where the chemotactic sensitivity is artificially modulated. This suggests that weakly chemotactic cells are more weakly bound to the group and have a higher chance of being left behind. This last part is interesting in the context of range extension in semi-solid agar, where bacteria are known to be spatially organized and selected according to their chemotactic motility (Ni et al, Cell reports 2017, Liu et al Nature 2019).

      This paper builds its strengths on the extensive experimental characterization of the system and a variety of modeling approaches and makes a fairly convincing case for the way of understanding the mechanism of cohesion maintenance they propose.

      In fact, we have addressed both the mechanism that maintains a coherent group and the mechanism that forms an ordered pattern of diverse phenotypes. Thanks to the reviewer, we noticed that the second point was not clearly presented in our previous version. We have therefore largely rewritten the text and reorganized the results to give both mechanisms prominence.

      From a methodological perspective, only a few points need to be addressed:

      Control experiments need to quantify the cell-to-cell variability of the induction level of Tar by tetracycline.

      We quantified the distribution of Tar induction levels in titrated cells using a ptet-Tar-GFP strain, in which GFP serves as a reporter of the expressed Tar protein. The results are shown below:

      Chemical attraction to cues released by other cells is a well-documented way to create cohesive large scale structures in E. coli (Budrene & Berg Nature 1995, Park et al PNAS 2003, Jani et al Microbiology 2017, Laganenka et al Nat commun 2016). The cohesion of the wave has never been analyzed in this light, despite this being a possible alternative explanation to the gradient shape. Since the authors' main claim is about the wave cohesion, they should provide evidence that such an explanation can be ruled out or considered secondary.

      We thank the reviewer for pointing out self-attractant secretion as a possible mechanism for maintaining group coherence. We argue that this mechanism is not necessary for the chemotactic group to remain coherent, because the migrating group stays coherent in our agent-based simulations, which do not include this effect.

      Moreover, as suggested by the reviewer, we used a Tar-only strain, which does not sense any chemoattractant other than aspartate, and showed that the migrating group remained coherent (see Fig S9). This experiment shows that the secretion of self-attractants is not essential for coherent group migration.

      Possible effects of physical interactions between cells on the chemotactic response are not accounted for. The consequences should be better discussed, because they are known to influence chemotactic motility at the densities encountered in the present experiments (Colin et al Nat commun 2019).

      As reported by Colin et al., the effective drift velocity and chemotactic ability decrease when cells are densely packed (volume fraction > 0.01). However, the cell density in our experiments is below this critical value (volume fraction < 0.01).
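      The volume-fraction criterion can be made concrete with the conversion φ = n·V_cell, where n is the number density of cells and V_cell the volume of one cell. The Python sketch below is only illustrative: it assumes a cell volume of about 1 µm³, a commonly quoted order of magnitude for E. coli and not a value measured in this study, and the function names are hypothetical.

```python
# Convert between bacterial number density and volume fraction (phi = n * V_cell).
UM3_PER_ML = 1e12  # 1 mL = 1 cm^3 = 1e12 um^3


def volume_fraction(cells_per_ml, cell_volume_um3=1.0):
    """Volume fraction occupied by cells.

    cell_volume_um3 defaults to ~1 um^3, an assumed typical E. coli volume.
    """
    return cells_per_ml * cell_volume_um3 / UM3_PER_ML


def critical_density(phi_c=0.01, cell_volume_um3=1.0):
    """Cell density (cells/mL) at which the volume fraction reaches phi_c."""
    return phi_c * UM3_PER_ML / cell_volume_um3
```

      Under this assumed cell volume, φ = 0.01 corresponds to roughly 10^10 cells/mL, so the claim above amounts to the cell density within the band staying below about that value; the measured densities themselves are not given in this response.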

      Additionally, the paper could better emphasize the new results and separate them from the confirmations of previous results.

      In the revised version, we highlight two new findings:

      1) The individual drift velocity decreases from the back to the front of the migrating bacterial group, which makes the chemotactic migration wave a pushed wave.

      2) Cells of diverse phenotypes follow the same behavior, i.e. they drift faster at the back and slower at the front, but with ordered mean positions, which produces the ordered pattern in the migrating group.

      Reviewer #2 (Public Review):

      The manuscript by Bai et al. explores the single-cell motility dynamics within a chemotactic soliton wave in E. coli. They tracked individual cells and measured their trajectory speed and orientation distributions behind and ahead of the wave. They showed cells behind the wave were moving in a more directed fashion towards the center of the wave compared to cells ahead of the wave. This behavior explains the stability of group migration, as confirmed by numerical simulations.

      I do not recommend this manuscript for publication in eLife since it basically reproduces and deepens previous published works. In particular, Saragosti et al (2011) already provided exactly what the authors claim to do here : "How individuals with phenotypic and behavioral variations manage to maintain the consistent group performance and determine their relative positions in the group is still a mystery." (Line 75-77) (See the last sentences from Saragosti et al : "This modulation of the reorientations significantly improves the efficiency of the collective migration. Moreover, these two quantities are spatially modulated along the concentration profile. We recover quantitatively these microscopic and macroscopic observations with a dedicated kinetic model.")

      Saragosti et al. discuss the modulation of the bacterial reorientation angle depending on the direction of motion. This is not equivalent to a spatial modulation of drift velocity. They report that cells moving along the gradient reorient less during a tumble than cells moving against it, which increases the migration efficiency of the group. In our paper, by contrast, we show that the drift velocity of bacteria is spatially modulated: cells at the back drift faster, while cells at the front drift slower. This is important because it makes the chemotactic migration front a pushed wave, which helps the group retain diverse phenotypes.

      Saragosti et al. also suggested a spatial modulation of the run-length bias to explain the coherence of the migrating group, but they neither quantified this bias nor explained the causes and consequences of its spatial modulation. Moreover, their model, which incorporates their proposed mechanism of directional persistence, cannot explain their own observation of a decreasing run-length bias (see their Figure 4A and C). We therefore cannot agree that they have already shown how cells with diverse phenotypes maintain a coherent group.

      Moreover, they did not address phenotypic diversity within the group.

      What is novel here is the titration of the behavior with chemo-receptor abundance, but I believe the scope is not wide enough for publication in eLife. I suggest that the authors submit to a more specialized journal.

      Titrating the chemo-receptor abundance of bacteria serves as a tool to explain how diverse individuals manage to form ordered patterns in a group. This question merits discussion because diversity is known to be important for the survival of a group. We found that the ordered pattern is key for a migrating group to maintain its diversity while migrating at a consistent speed. In this paper we successfully explain how individuals performing biased random walks are able to form an ordered structure.

      Reviewer #3 (Public Review):

      The authors present a study on the collective behaviour of E. coli during migration in a self-generated gradient. Taking into account phenotypic variation within a biological population, they performed experiments and complemented the study with a predictive model used for simulation to understand how bacteria can move as a group and how the individual bacterium defines its own position within the group.

      They observed experimentally that phenotype variation within the bacterial population causes a spatial distribution within the chemotactic band that is not continuous but formed by subpopulations with specific properties such as run length, run duration, angular distribution of trajectories, drift velocity. They attribute this behaviour to the chemotaxis ability, which varies between phenotypes and defines a potential well that anchors each bacterium in its own group. This was proven by the subdiffusive dynamics of the bacteria in each subgroup. Many cases were studied in the experiments and the authors present many controls to clearly demonstrate their hypothesis.

      These are interesting results that prove how a discretised distribution can produce continuous collective behaviour. It presents also an interesting example in the field of active matter about collective behaviour on a large scale that is generated by a different behaviour of individuals on a much smaller scale. However, it is not clear how the subpopulations can be held together in the group.

      The decreasing chemo-attractant gradient makes the migration wavefront a pushed wave. As a result, the balance position of the subpopulation with larger chemotactic ability is located at the front, where the gradient is small, and diverse phenotypes form an ordered pattern in which each achieves the same migration speed at its own balance position. This discussion was added to the revised text (see lines 268-277).

      Moreover, the link between bacterial dynamics and the underlying biological mechanism is not clear.

      The dynamics of individual bacteria are controlled by the bacterial chemotaxis pathway, which is well characterized by previous studies. Briefly, the biased random walk arises from modulation of the expected run length through a temporal comparison of the chemo-attractant concentrations that the cell experiences (Jiang et al. 2010, PLoS Comput. Biol.).
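      As a rough illustration of how temporal comparison alone can bias a random walk, the sketch below simulates one-dimensional run-and-tumble cells in a linear attractant profile. It is not the chemotaxis pathway model of Jiang et al. (2010). The function name `run_and_tumble`, the rule that a cell halves its tumble rate whenever the concentration it senses has just increased, and all parameter values are hypothetical simplifications.

```python
import random

def run_and_tumble(n_cells=500, n_steps=2000, dt=0.01, speed=1.0,
                   base_tumble_rate=1.0, suppression=0.5, seed=0):
    """1D run-and-tumble walk in a hypothetical linear attractant profile c(x) = x.

    Each cell compares the concentration it senses now with the one it sensed
    one time step earlier. If the concentration increased, its tumble rate is
    multiplied by (1 - suppression), which lengthens the current run.
    """
    rng = random.Random(seed)
    positions = []
    for _ in range(n_cells):
        x = 0.0
        direction = rng.choice((-1, 1))
        c_prev = x  # with c(x) = x, the sensed concentration equals the position
        for _ in range(n_steps):
            x += direction * speed * dt
            c_now = x
            if c_now > c_prev:
                rate = base_tumble_rate * (1.0 - suppression)
            else:
                rate = base_tumble_rate
            if rng.random() < rate * dt:
                direction = rng.choice((-1, 1))  # tumble: pick a random new direction
            c_prev = c_now
        positions.append(x)
    return positions

final_x = run_and_tumble()
total_time = 2000 * 0.01
mean_drift_velocity = sum(final_x) / len(final_x) / total_time
print(f"mean drift velocity: {mean_drift_velocity:.2f} (run speed 1.0)")
```

      With these assumed values, runs up the gradient last on average twice as long as runs down it. The population therefore drifts up the gradient at roughly one third of the run speed, even though every tumble chooses a direction at random.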

      They formulate a theoretical description based on the classical Keller-Segel model. Langevin dynamics was used to describe bacterial activity in terms of drift velocity for simulation, which agrees very well with experimental observations.

      One can appreciate the interesting results of the study describing E. coli chemotaxis as a mean-reversion process with an associated potential, but it is not clear to what extent the results can be generalised to all bacteria or rather relate to the strain the authors investigated.

      The mean-reversion process is a result of the decreasing drift velocity (i.e., the pushed wave). Although our study focuses on bacterial chemotactic migration, the ordering mechanism of diverse phenotypes follows an Ornstein-Uhlenbeck (OU)-type model, which is not limited to bacterial chemotaxis. We therefore argue that the ordering mechanism we propose is universal to all active particles that generate signals serving as a global cue for collective motion.
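      The sketch below illustrates the mean-reversion argument. It assumes that, near each balance position x*, the drift velocity relative to the co-moving wave can be linearised as -k (x - x*). Under that assumption positions follow an Ornstein-Uhlenbeck process, so each subpopulation stays localised around its own x* with stationary variance D/k instead of spreading diffusively. The function `simulate_ou`, the linearisation, and the values of k, D and x* are illustrative assumptions, not the authors' fitted model.

```python
import math
import random
import statistics

def simulate_ou(x_star, k, D, dt=0.01, n_steps=600, n_particles=1000, x_init=0.0, seed=0):
    """Euler-Maruyama integration of dx = -k (x - x_star) dt + sqrt(2 D) dW.

    x is a position in the frame co-moving with the wave. The linearised drift
    -k (x - x_star) is positive behind the balance position x_star and negative
    ahead of it, so cells are pushed back towards x_star from either side.
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * D * dt)
    xs = [x_init] * n_particles
    for _ in range(n_steps):
        xs = [x - k * (x - x_star) * dt + noise_scale * rng.gauss(0.0, 1.0) for x in xs]
    return xs

k, D = 1.0, 0.5
# Two hypothetical phenotypes with different balance positions (arbitrary units).
balance_positions = {"back (weaker chemotaxis)": -1.0, "front (stronger chemotaxis)": 1.0}
summary = {}
for name, x_star in balance_positions.items():
    xs = simulate_ou(x_star, k, D)
    summary[name] = (statistics.fmean(xs), statistics.pvariance(xs))
    print(f"{name}: mean={summary[name][0]:.2f}, "
          f"variance={summary[name][1]:.2f}, theory D/k={D / k:.2f}")
```

      The restoring drift grows linearly with distance from x*, so a displaced cell is pulled back within a time of order 1/k. In this sketch, a steeper decrease of drift velocity with position (larger k) therefore gives a tighter subpopulation.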

    1. Author Response

      Reviewer #2 (Public Review):

      The time-dependency of the model simulations was not analyzed, and the nature of the observed biphasic time-dependent APAP response remains elusive. It would be interesting to see how the model can explain the time course of the APAP stimulation experiment.

      In its current state, the alternative model can only describe steady-state conditions. We understand that the reviewer is interested in the dynamic behavior of the model. However, our approach provides a proof of principle that the alternative model can phenomenologically explain the changes in YAP localization in response to APAP treatment. Modeling the Hippo pathway in a time-dependent manner in response to APAP treatment is very challenging and would require further investigation and, most notably, further development of the PDE simulation algorithms and the SME software. Such a technical update of the software algorithms is beyond the scope of this manuscript.

      Nevertheless, we decided to share our first, preliminary analyses of the dynamic processes caused by APAP with the reviewer. For this, we simulated the steady-state model with APAP arbitrarily increasing (early time point) and decreasing (late time points) YAP phosphorylation in the nucleus (see Figure below).

      The simulated alternative model shows that increasing YAP phosphorylation by about 50% leads to cytoplasmic localization of YAP (Rebuttal Figure R5A/B). However, this shuttling is not detectable in our protein fractionation and live-cell imaging experiments (see also Rebuttal Figure R7C/D). At late time points, decreasing YAP phosphorylation (by about 60%) led to clear nuclear enrichment, consistent with the nuclear enrichment and dephosphorylation of YAP observed in our experiments. Thus, our mathematical model describes well the cellular events of Hippo pathway dynamics observed at later stages after APAP treatment (nuclear enrichment). However, early events cannot be completely explained (the predicted nuclear exclusion of YAP is not detectable).

      We suggest two explanations for this observation. First, other molecular mechanisms (not yet identified and therefore not part of the model topology) may oppose the nuclear exclusion of YAP that is expected at early time points. Second, the detection methods used in this study (Western blotting and live-cell imaging) cannot capture minimal changes and cellular heterogeneity in the chosen experimental setup. We now address this limitation of our study in the Discussion (page 12, lines 436-440).

      Time-dependency of YAP (orange) localization based on the simulated APAP treatment. (A): Simulated control (ctrl) and APAP treatment for 2 and 48h. The treatment was simulated by changing the phosphorylation coefficient of YAP in the nucleus. (B): Simulated pYAP/YAP ratio during control and APAP treatment for 2 and 48 hours at the steady state of the model. (C): Simulated NCR of the total YAP during control and APAP treatment for 2 and 48 hours at the steady state.

    1. Author Response

      Reviewer #1 (Public Review):

      Because of the importance of brain and cognitive traits in human evolution, brain morphology and neural phenotypes have been the subject of considerable attention. However, work on the molecular basis of brain evolution has tended to focus on only a handful of species (i.e., human, chimp, rhesus macaque, mouse), whereas work that adopts a phylogenetic comparative approach (e.g., to identify the ecological correlates of brain evolution) has not been concerned with molecular mechanism. In this study, Kliesmete, Wange, and colleagues attempt to bridge this gap by studying protein and cis-regulatory element evolution for the gene TRNP1, across up to 45 mammals. They provide evidence that TRNP1 protein evolution rates and its ability to drive neural stem cell proliferation are correlated with brain size and/or cortical folding in mammals, and that activity of one TRNP1 cis-regulatory element may also predict cortical folding.

      There is a lot to like about this manuscript. Its broad evolutionary scope represents an important advance over the narrower comparisons that dominate the literature on the genetics of primate brain evolution. The integration of molecular evolution with experimental tests for function is also a strength. For example, showing that TRNP1 from five different mammals drives differences in neural stem cell proliferation, which in turn correlate with brain size and cortical folding, is a very nice result. At the same time, the paper is a good reminder of the difficulty of conclusively linking macroevolutionary patterns of trait evolution to molecular function. While TRNP1 is a moderate outlier in the correlation between rate of protein evolution and brain morphology compared to 125 other genes, this result is likely sensitive to how the comparison set is chosen; additionally, it's not clear that a correlation with evolutionary rate is what should be expected. Further, while the authors show that changes in TRNP1 sequence have functional consequences, they cannot show that these changes are directly responsible for size or folding differences, or that positive selection on TRNP1 is because of selection on brain morphology (high bars to clear). Nevertheless, their findings contribute strong evidence that TRNP1 is an interesting candidate gene for studying brain evolution. They also provide a model for how functional follow-up can enrich sequence-based comparative analysis.

      We thank the reviewer for the positive assessment. Regarding our set of control genes and the interpretation of the correlation between the evolution of the TRNP1 protein sequence and the evolution of brain size and gyrification, we would like to mention the following. We agree that the set is small, but it comprises all similarly sized genes with one coding exon that we could find in all 30 species. Furthermore, the control genes are well comparable to TRNP1 with respect to alignment quality and average omega (Figure 1-figure supplement 3). Hence, we think that the selection procedure and the actual omega distribution make them a valid, unbiased set against which TRNP1’s co-evolution with brain phenotypes can be compared. Moreover, we want to point out that by using Coevol, we correlate evolutionary rates, that is, the rate of protein evolution of TRNP1 as measured by omega and the rate of brain size evolution, which Coevol models as a Brownian motion process. We think that this was unclear in the previous version of our manuscript and appreciate that the reviewer saw some merit in our analyses in spite of it.
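      To make concrete what it means to correlate rates rather than values, the sketch below simulates changes in log omega and log brain size on independent branches under a bivariate Brownian motion with correlation rho, then recovers rho from the per-branch changes. As we understand it, Coevol infers such a covariance jointly with the substitution process in a Bayesian MCMC framework and never observes branch-wise changes directly. The functions `branch_increments` and `pearson`, the 5000 branches and all parameter values here are hypothetical.

```python
import math
import random

def branch_increments(branch_lengths, sigma_omega, sigma_trait, rho, seed=1):
    """Per-branch changes in (log omega, log brain size) under a bivariate
    Brownian motion: over a branch of length t the change is Normal(0, t * Sigma),
    with Sigma = [[s_o^2, rho*s_o*s_t], [rho*s_o*s_t, s_t^2]] (2x2 Cholesky below)."""
    rng = random.Random(seed)
    increments = []
    for t in branch_lengths:
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        d_omega = sigma_omega * math.sqrt(t) * z1
        d_trait = sigma_trait * math.sqrt(t) * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        increments.append((d_omega, d_trait))
    return increments

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)
lengths = [rng.uniform(0.1, 2.0) for _ in range(5000)]
incs = branch_increments(lengths, sigma_omega=0.3, sigma_trait=0.8, rho=0.6)
# Standardise each change by sqrt(branch length) so that every branch contributes
# a draw with the same variance (the idea behind independent contrasts).
std_omega = [d_o / math.sqrt(t) for (d_o, _), t in zip(incs, lengths)]
std_trait = [d_t / math.sqrt(t) for (_, d_t), t in zip(incs, lengths)]
estimated_rho = pearson(std_omega, std_trait)
print(f"estimated correlation of changes: {estimated_rho:.2f} (simulated rho = 0.6)")
```

      The recovered correlation depends only on whether changes in the two variables co-occur on the same branches. It does not depend on the average level of either variable.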

      Finding conclusive evidence to link molecular evolution to concrete phenotypes is indeed difficult and necessarily inferential. This said, we still believe that correlating rates of evolution of phenotype and sequence across a phylogeny is one of the most convincing pieces of evidence available.

      Reviewer #2 (Public Review):

      In this paper, Kliesmete et al. analyze the protein and regulatory evolution of TRNP1, linking it to the evolution of brain size in mammals. We feel that this is very interesting and the conclusions are generally supported, with one concern.

      The comparison of dN/dS (omega) values to 125 control proteins is helpful, but an important factor was not controlled. The fraction of a protein that lies in intrinsically disordered regions (IDRs) is potentially even more important in affecting dN/dS than protein length or number of exons. We suggest comparing dN/dS of TRNP1 to another control set, preferably at least ~500 proteins, which have similar % IDR.

      Thank you for this interesting suggestion. As mentioned in the public response to Reviewer #1, we are sorry that we did not explain the rationale of the approach very well in the previous version of the manuscript. As also argued above, we think that our control proteins are an unbiased set, as they have comparable alignment quality and an average omega (dN/dS) similar to TRNP1 (Figure 1-figure supplement 3). While IDRs tend to have a higher omega than their respective non-IDR counterparts, we do not think that IDR content should be more relevant than omega itself, because we do not interpret this estimate on its own but rather its covariance with the rate of phenotypic change. Indeed, the proteins of our control set that have a higher IDR content (D2P2, Oates et al. 2013) do not show stronger evidence of coevolving with the brain phenotypes (IDR content vs. absolute brain size-omega partial correlation: Kendall's tau = 0.048, p-value = 0.45; IDR content vs. absolute GI-omega partial correlation: Kendall’s tau = -0.025, p-value = 0.68). Of the control proteins, 88 (71%) contain IDRs (>0%), and 8 have an IDR content above 62%, the IDR content of TRNP1.
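      Many proteins have 0% IDR content, so a rank correlation between IDR content and a per-protein coevolution statistic contains many ties. Kendall's tau-b corrects for such ties. The sketch below is a plain-Python tau-b for illustration only, and its example values are hypothetical. In practice `scipy.stats.kendalltau`, whose default variant is tau-b, also returns a p-value.

```python
import math

def kendall_tau_b(xs, ys):
    """Kendall's tau-b, corrected for ties in either variable.

    O(n^2) pairwise version; no p-value is computed here.
    """
    n = len(xs)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = (xs[i] > xs[j]) - (xs[i] < xs[j])  # sign of the x difference
            dy = (ys[i] > ys[j]) - (ys[i] < ys[j])  # sign of the y difference
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n_pairs = n * (n - 1) // 2
    return (concordant - discordant) / math.sqrt((n_pairs - ties_x) * (n_pairs - ties_y))

# Hypothetical values: IDR fraction per protein vs. its omega-phenotype partial correlation.
idr_fraction = [0.0, 0.0, 0.1, 0.4]
partial_corr = [0.05, 0.20, 0.30, 0.45]
print(kendall_tau_b(idr_fraction, partial_corr))
```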

      Reviewer #3 (Public Review):

      In this work, Z. Kliesmete, L. Wange and colleagues investigate TRNP1 as a gene of potential interest for the evolution of the mammalian cortex. Previous evidence suggests that TRNP1 is involved in self-renewal, proliferation and expansion in cortical cells in mouse and ferret, making this gene a good candidate for evolutionary investigation. The authors designed an experimental scheme to test two non-exclusive hypotheses: first, that evolution of the TRNP1 protein is involved in the emergence of larger and more convoluted brains; and second, that regulation of the TRNP1 gene also plays a role in this process alongside protein evolution.

      The authors report that the rate of TRNP1 protein evolution is strongly correlated to brain size and gyrification, with species with larger and more convoluted brains having more divergent sequences at this gene locus. The correlation with body mass was not as strong, suggesting a functional link between TRNP1 and brain evolution. The authors directly tested the effects of sequence changes by transfecting the TRNP1 sequences from 5 different species into mouse neural stem cells and quantifying cell proliferation. They show that both human and dolphin sequences induce higher proliferation, consistent with larger brain sizes and gyrifications in these two species. Then, the authors identified six potential cis-regulatory elements around the TRNP1 gene that are active in human fetal brain, and that may be involved in its regulation. To investigate whether sequence evolution at these sites results in changes in TRNP1 expression, the authors performed a massively parallel reporter assay using sequences from 75 mammals at these six loci. The authors report that one of the cis-regulatory elements drives reporter expression levels that are somewhat correlated with gyrification in catarrhine monkeys. Consistent with the activity of this cis-regulatory sequence in the fetal brain, the authors report that this element contains binding sites for TFs active in brain development, and contains stronger binding sites for CTCF in catarrhine monkeys than in other species. However, the specificity or functional relevance of this signal is unclear.

      Altogether, this is an interesting study that combines evolutionary analysis and molecular validation in cell cultures using a variety of well-designed assays. The main conclusions - that TRNP1 is likely involved in brain evolution in mammals - are mostly well supported, although the involvement of gene regulation in this process remains inconclusive.

      Strengths:

      • The authors have done a good deal of resequencing and data polishing to ensure that they obtained high-quality sequences for the TRNP1 gene in each species, which enabled a higher confidence investigation of this locus.

      • The statistical design is generally well done and appears robust.

      • The combination of evolutionary analysis and in vitro validation in neural precursor cells is interesting and powerful, and goes beyond the majority of studies in the field. I also appreciated that the authors investigated both protein and regulatory evolution at this locus in significant detail, including performing an MPRA across species, which is an interesting strategy in this context.

      Weaknesses:

      • The authors report that TRNP1 evolves under positive selection, however this seems to be the case for many of the control proteins as well, which suggests that the signal is non-specific and possibly due to misspecifications in the model.

      • The evidence for a higher regulatory activity of the intronic cis-regulatory element highlighted by the authors is fairly weak: correlation across species is only 0.07, consistent with the rapid evolution of enhancers in mammals, and the correlation in catarrhine monkeys seems to be driven by a couple of outlier data points across the 10 species. It is unclear whether false discovery rates were controlled for in this analysis.

      • The analysis of the regulatory content in this putative enhancer provides some tangential evidence but no reliable conclusions regarding the involvement of regulatory changes at this locus in brain evolution.

      We thank the reviewer for the detailed comments. Indeed, TRNP1 overall has a rather average omega value across the tree, and hence the proportion of sites under selection is also not hugely increased compared to the control proteins. This is desirable because we want comparable power to detect a correlation between the rate of protein evolution (omega) and the rate of brain size or GI evolution for TRNP1 and the control proteins. What makes TRNP1 special is the rather strong correlation between the rate of brain size change and omega, which was stronger in only 4% of our control proteins. Hence, we do not agree that model misspecification is a weakness of our analysis of TRNP1 protein evolution.

      We agree that the correlation of the activity induced by the intronic cis-regulatory element (CRE) with gyrification is weak, but we dispute that the correlation is due to outliers (see residual plot below) or violations of model assumptions (see new permutation analysis in the Results section). There are many reasons why we would expect such a correlation to be weak, including that an MPRA takes the CRE out of its natural genomic context. Our conclusions do not rest solely on those statistics but also on independent corroborating evidence: Reilly et al. (2015) found a difference in the activity of the TRNP1 intron between human and macaque samples during brain development. Furthermore, we used their and other public data to show that the intron CRE is indeed active in humans and bound by CTCF (new Figure 4 - figure supplement 2).
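      A generic permutation test of the kind mentioned above can be sketched as follows. Shuffle the gyrification values across species, recompute the correlation with reporter activity, and count how often the shuffled correlation is at least as strong as the observed one. This is an illustration with hypothetical data for 10 species, not the authors' analysis. Plain shuffling treats species as exchangeable and ignores their phylogenetic relatedness.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(activity, gyrification, n_perm=9999, seed=0):
    """Two-sided permutation p-value for corr(activity, gyrification).

    The +1 in numerator and denominator counts the observed labelling itself,
    so the p-value can never be exactly zero.
    """
    rng = random.Random(seed)
    observed = abs(pearson(activity, gyrification))
    shuffled = list(gyrification)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # small tolerance so ties with the observed |r| are not lost to rounding
        if abs(pearson(activity, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical reporter activity and gyrification index for 10 species.
activity = [0.8, 1.1, 0.9, 1.4, 1.2, 1.6, 1.5, 1.9, 2.1, 2.0]
gyrification = [1.0, 1.1, 1.2, 1.5, 1.4, 1.8, 1.7, 2.2, 2.4, 2.3]
p = permutation_p_value(activity, gyrification)
print(f"r = {pearson(activity, gyrification):.2f}, permutation p = {p:.4f}")
```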

      We believe that the combined evidence suggests a likely role for the intron CRE for the co-evolution of TRNP1 with gyrification.

    1. Author Response

      Reviewer #1 (Public Review):

      Trudel and colleagues aimed to uncover the neural mechanisms of estimating the reliability of the information from social agents and non-social objects. By combining functional MRI with a behavioural experiment and computational modelling, they demonstrated that learning from social sources is more accurate and robust compared with that from non-social sources. Furthermore, dmPFC and pTPJ were found to track the estimated reliability of the social agents (as opposed to the non-social objects). The strength of this study is to devise a task consisting of the two experimental conditions that were matched in their statistical properties and only differed in their framing (social vs. non-social). The novel experimental task allows researchers to directly compare the learning from social and non-social sources, which is a prominent contribution of the present study to social decision neuroscience.

      Thank you so much for your positive feedback about our work. We are delighted that you found that our manuscript provided a prominent contribution to social decision neuroscience. We really appreciate your time to review our work and your valuable comments that have significantly helped us to improve our manuscript further.

      One of the major weaknesses is the lack of a clear description about the conceptual novelty. Learning about the reliability/expertise of social and non-social agents has been of considerable concern in social neuroscience (e.g., Boorman et al., Neuron 2013; and Wittmann et al., Neuron 2016). The authors could do a better job in clarifying the novelty of the study beyond the previous literature.

      We understand the reviewer’s comment and have revised the manuscript in two ways. First, we now highlight the novelty of the current study more strongly. Second, and crucially, we have supplemented the data analyses with a new model-based analysis of the differences in behaviour in the social and non-social conditions, which we hope makes clearer, at a theoretical level, why participants behave differently in the two conditions.

      There has long been interest in investigating whether ‘social’ cognitive processes are special or unique compared to ‘non-social’ cognitive processes and, if they are, what makes them so. Differences between conditions could arise at the input stage (e.g. the type of visual input that is processed by social and non-social systems), at the algorithm stage (e.g. the type of computational principles that underpin social versus non-social processes) or, even if identical algorithms are used, social and non-social processes might depend on distinct anatomical brain areas or neurons within brain areas. Here, we conducted multiple analyses (in Figures 2, 3, and 4 in the revised manuscript and in Figure 2 – figure supplement 1, Figure 3 – figure supplement 1, Figure 4 – figure supplement 3, Figure 4 – figure supplement 4) that not only demonstrated basic similarities in mechanism generalised across social and non-social contexts, but also demonstrated important quantitative differences that were linked to activity in specific brain regions associated with the social condition. The additional analyses (Figure 4 – figure supplement 3, Figure 4 – figure supplement 4) show that differences are not simply a consequence of differences in the visual stimuli that are inputs to the two systems 1, nor does the type of algorithm differ between conditions. Instead, our results suggest that the precise manner in which an algorithm is implemented differs when learning about social or non-social information and that this is linked to differences in neuroanatomical substrates.

      The previous studies mentioned by the reviewer are, indeed, relevant ones and were, of course, part of the inspiration for the current study. However, there are crucial differences between them and the current study. In the case of the previous studies by Wittmann, the aim was a very different one: to understand how one’s own beliefs, for example about one’s performance, and beliefs about others, for example about their performance levels, are combined. Here, however, instead we were interested in the similarities and differences between social and non-social learning. It is true that the question resembles the one addressed by Boorman and colleagues in 2013 who looked at how people learned about the advice offered by people or computer algorithms, but the difference in the framing of that study perhaps contributed to the authors’ finding of little difference in learning. By contrast, in the present study we found evidence that people were predisposed to perceive stability in social performance and to be uncertain about non-social performance. By accumulating evidence across multiple analyses, we show that there are quantitative differences in how we learn about social versus non-social information, and that these differences can be linked to the way in which learning algorithms are implemented neurally. We therefore contend that our findings extend our previous understanding of how, in relation to other learning processes, ‘social’ learning has both shared and special features.

      We would like to emphasize the way in which we have extended several of the analyses throughout the revision. The theoretical Bayesian framework has made it possible to simulate key differences in behaviour between the social and non-social conditions. We explain in our point-by-point reply below how we have integrated a substantial number of new analyses. We have also more carefully related our findings to previous studies in the Introduction and Discussion.

      Introduction, page 4:

      [...] Therefore, by comparing information sampling from social versus non-social sources, we address a long-standing question in cognitive neuroscience, the degree to which any neural process is specialized for, or particularly linked to, social as opposed to non-social cognition 2–9. Given their similarities, it is expected that both types of learning will depend on common neural mechanisms. However, given the importance and ubiquity of social learning, it may also be that the neural mechanisms that support learning from social advice are at least partially specialized and distinct from those concerned with learning that is guided by non-social sources. It is less clear, however, at which level information is processed differently when it has a social or non-social origin. It has recently been argued that differences between social and non-social learning can be investigated at different levels of Marr’s information processing theory: differences could emerge at an input level (in terms of the stimuli that might drive social and non-social learning), at an algorithmic level or at a neural implementation level 7. It might be that, at the algorithmic level, associative learning mechanisms are similar across social and non-social learning 1. Other theories have argued that differences might emerge because goal-directed actions are attributed to social agents, which allows for very different inferences to be made about hidden traits or beliefs 10. Such inferences might fundamentally alter learning about social agents compared to non-social cues.

      Discussion, page 15:

      […] One potential explanation for the assumption of stable performance for social but not non-social predictors might be that participants attribute intentions and motivations to social agents. Even if the social and non-social evidence are the same, the belief that a social actor might have a goal may affect the inferences made from the same piece of information 10. Social advisors first learnt about the target’s distribution and accordingly gave advice on where to find the target. If the social agents are credited with goal-directed behaviour, then it might be assumed that the goals remain relatively constant; this might lead participants to assume stability in the performances of social advisors. However, such goal-directed intentions might not be attributed to non-social cues, thereby making judgments inherently more uncertain and changeable across time. Such an account, focussing on differences in attribution in social settings, aligns with a recent suggestion that any attempt to identify similarities or differences between social and non-social processes can be made at any one of several levels of Marr’s information processing theory 7. Here we found that the same algorithm was able to explain social and non-social learning (a qualitatively similar computational model could explain both). However, the extent to which the algorithm was recruited when learning about social compared to non-social information differed. We observed a greater impact of uncertainty on judgments about social compared to non-social information. We have shown evidence for a degree of specialization when assessing social advisors as opposed to non-social cues. At the neural level we focused on two brain areas, dmPFC and pTPJ, that have not only been shown to carry signals associated with belief inferences about others but, in addition, recent combined fMRI-TMS studies have demonstrated the causal importance of these activity patterns for the inference process […]

      Another weakness is the lack of justifications of the behavioural data analyses. It is difficult for me to understand why 'performance matching' is suitable for an index of learning accuracy. I understand the optimal participant would adjust the interval size with respect to the estimated reliability of the advisor (i.e., angular error); however, I am wondering if the optimal strategy for participants is to exactly match the interval size with the angular error. Furthermore, the definitions of 'confidence adjustment across trials' and 'learning index' look arbitrary.

      First, having read the reviewer’s comments, we realise that our choice of the term ‘performance matching’ may not have been ideal as it indeed might not be the case that the participant intended to directly match their interval sizes with their estimates of advisor/predictor error. Like the reviewer, our assumption is simply that the interval sizes should change as the estimated reliability of the advisor changes and, therefore, that the intervals that the participants set should provide information about the estimates that they hold and the manner in which they evolve. On re-reading the manuscript we realised that we had not used the term ‘performance matching’ consistently or in many places in the manuscript. In the revised manuscript we have simply removed it altogether and referred to the participants’ ‘interval setting’.

      Most of the initial analyses in Figure 2a-c aim to better understand the raw behaviour before applying any computational model to the data. We were interested in how participants make confidence judgments (decision-making per se), but also in how they adapt their decisions with additional information (changes or learning in decision making). In the revised manuscript we have made clear that these are used as simple behavioural measures and that they are complemented later by analyses derived from more formal computational models.

      In what we now refer to as the ‘interval setting’ analysis (Figure 2a), we tested whether participants select their interval settings differently in the social compared to non-social condition. We observe that participants set their intervals closer to the true angular error of the advisor/predictor in the social compared to the non-social condition. This observation could arise in two ways. First, it could be due to quantitative differences in learning despite general, qualitative similarity: mechanisms are similar but participants differ quantitatively in the way that they learn about non-social information and social information. Second, it could, however, reflect fundamentally different strategies. We tested basic performance differences by comparing the mean reward between conditions. There was no difference in reward between conditions (mean reward: paired t-test social vs. non-social, t(23) = 0.8, p = 0.4, 95% CI = [-0.007, 0.016]), suggesting that interval setting differences might not simply reflect better or worse performance in social or non-social contexts but instead might reflect quantitative differences in the processes guiding interval setting in the two cases.

      In the next set of analyses, in which we compared raw data, applied a computational model, and provided a theoretical account for the differences between conditions, we suggest that there are simple quantitative differences in how information is processed in social and non-social conditions, but that these have the important impact of making long-term representations (representations built up over a longer series of trials) more important in the social condition. This, in turn, has implications for the neural activity patterns associated with social and non-social learning. We therefore agree with the reviewer that no one manner of interval setting is more optimal than another. However, the differences that do exist in behaviour are important because they reveal something about social and non-social learning and their neural substrates. We have adjusted the wording and interpretation in the revised manuscript.

      Next, we analysed interval setting with two additional, related analyses: interval setting adjustment across trials and derivation of a learning index. We tested the degree to which participants adjusted their interval setting across trials and according to the prediction error (learning index, Figure 2f); the latter analysis is very similar to a trial-wise learning rate calculated in previous studies 11. In contrast to many other studies, the intervals set by participants provide information about the estimates that they hold in a simple and direct way and enable calculation of a trial-wise learning index. We therefore decided to call it a ‘learning index’ instead of a ‘learning rate’, as it is not estimated via a model applied to the data but is instead calculated directly from the data. Arguably, the directness of the approach, and its lack of dependence on a specific computational model, is a strength of the analysis.
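      A model-free learning index of this kind can be sketched as follows. The exact formula is not given in this section, so the definition used here is an assumption: the change in interval from one encounter with a predictor to the next, divided by the prediction error on the earlier encounter (observed angular error minus the interval set), in the spirit of a delta-rule learning rate. The function name and arguments are hypothetical.

```python
def learning_indices(intervals, errors, min_abs_pe=1e-9):
    """Trial-wise learning index for successive encounters with ONE predictor.

    Assumed definition (hypothetical, not taken from the manuscript):
        LI_t = (I_{t+1} - I_t) / (e_t - I_t)
    where I_t is the interval set on encounter t and e_t is the angular
    error observed on that encounter, both in the same units.
    """
    if len(intervals) != len(errors):
        raise ValueError("intervals and errors must have the same length")
    indices = []
    for t in range(len(intervals) - 1):
        prediction_error = errors[t] - intervals[t]
        if abs(prediction_error) < min_abs_pe:
            indices.append(None)  # undefined when the interval matched the error exactly
        else:
            indices.append((intervals[t + 1] - intervals[t]) / prediction_error)
    return indices


# A simulated delta-rule learner with rate 0.3 should yield an index of about 0.3 on every trial.
errors = [20.0, 35.0, 10.0, 25.0]
intervals = [15.0]
for e in errors[:-1]:
    intervals.append(intervals[-1] + 0.3 * (e - intervals[-1]))
print(learning_indices(intervals, errors))
```

      Because the index is computed directly from the intervals and the observed errors, no model needs to be fitted, which is the property emphasised in the paragraph above.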

      Subsequently in the manuscript, a new analysis (illustrated in new Figure 3) employs Bayesian models that can simulate the differences between the social and non-social conditions and demonstrates that a number of behavioural observations can arise simply from differences in the noise of each trial-wise Bayesian update (Figure 3 and specifically 3d; Figure 3 – figure supplement 1b-c). In summary, the descriptive analyses in Figure 2a-c aid an intuitive understanding of the differences in behaviour between the social and non-social conditions. We then repeated these analyses on behaviour simulated from Bayesian models with different noise levels and showed that, in this way, the differences in behaviour between the social and non-social conditions can be mimicked (please see the next section and the manuscript for details).

      We adjusted the wording in a number of sections of the revised manuscript, such as Figure 2 and Figure 4 (figures and legends).

      Main text, page 5:

      The confidence interval could be changed continuously to make it wider or narrower, by pressing buttons repeatedly (one button press resulted in a change of one step in the confidence interval). In this way participants provided what we refer to as an ’interval setting’.

      We also adjusted the following section in Main text, page 6:

      Confidence in the performance of social and non-social advisors

      We compared trial-by-trial interval setting in relation to the social and non-social advisors/predictors. When setting the interval, the participant’s aim was to minimize it while ensuring it still encompassed the final target position; points were won when it encompassed the target position but were greater when it was narrower. A given participant’s interval setting should, therefore, change in proportion to the participant’s expectations about the predictor’s angular error and their uncertainty about those expectations. Even though, on average, social and non-social sources did not differ in the precision with which they predicted the target (Figure 2 – figure supplement 1), participants gave interval settings that differed in their relationships to the true performances of the social advisors compared to the non-social predictors. The interval setting was closer to the angular error in the social compared to the non-social sessions (Figure 2a, paired t-test: social vs. non-social, t(23)= -2.57, p= 0.017, 95% confidence interval (CI)= [-0.36 -0.04]). Differences in interval setting might be due to generally lower performance in the non-social compared to social condition, or potentially due to fundamentally different learning processes utilised in either condition. We compared the mean reward amounts obtained by participants in the social and non-social conditions to determine whether there were overall performance differences. There was, however, no difference in the reward received by participants in the two conditions (mean reward: paired t-test social vs. non-social, t(23)= 0.8, p=0.4, 95% CI= [-0.007 0.016]), suggesting that interval setting differences might not simply reflect better or worse performance

      Discussion, page 14:

      Here, participants did not match their confidence to the likely accuracy of their own performance, but instead to the performance of another social or non-social advisor. Participants used different strategies when setting intervals to express their confidence in the performances of social advisors as opposed to non-social advisors. A possible explanation might be that participants have a better insight into the abilities of social cues – typically other agents – than non-social cues – typically inanimate objects.

      As the authors assumed simple Bayesian learning for the estimation of reliability in this study, the degree/speed of the learning should be examined with reference to the distance between the posterior and prior belief in the optimal Bayesian inference.

      We thank the reviewer for this suggestion. We agree with the reviewer that further analyses that aim to disentangle the underlying mechanisms that might differ between both social and non-social conditions might provide additional theoretical contributions. We show additional model simulations and analyses that aim to disentangle the differences in more detail. These new results allowed clearer interpretations to be made.

      In the current study, we showed that judgments made about non-social predictors were changed more strongly as a function of the subjective uncertainty: participants set a larger interval, indicating lower confidence, when they were more uncertain about the non-social cue’s accuracy to predict the target. In response to the reviewer’s comments, the new analyses were aimed at understanding under which conditions such a negative uncertainty effect might emerge.

      Prior expectations of performance

      First, we tested whether participants had different prior expectations in the social condition compared to the non-social condition. One way to compare prior expectations is to compare the first interval set for each advisor/predictor. This is a direct readout of the initial prior expectation with which participants approach our two conditions. In this way, we test whether prior beliefs before observing any social or non-social information differ between conditions. Even though this does not test the impact of prior expectations on subsequent belief updates, it does test whether participants have generally different expectations about the performance of social advisors or non-social predictors. There was no difference in this measure between social and non-social cues (Figure below; paired t-test social vs. non-social, t(23)= 0.01, p=0.98, 95% CI= [-0.067 0.068]).

      Figure. Confidence interval for the first encounter of each predictor in social and non-social conditions. There was no initial bias in predicting the performance of social or non-social predictors.

      Learning across time

      We have now seen that participants do not have an initial bias when predicting performance in the social or non-social condition. This suggests that differences between conditions might emerge over time, as predictors are encountered multiple times. We tested whether inherent differences in how beliefs are updated in response to new observations might result in different impacts of uncertainty on interval setting in the social and non-social conditions. More specifically, we tested whether the integration of new evidence differed between the social and non-social conditions; for example, recent observations might be weighted more strongly for non-social cues, while past observations might be weighted more strongly for social cues. This approach was inspired by the reviewer’s comments about potential differences in the speed of learning, as well as the reduction of uncertainty with increasing predictor encounters. Similar ideas were tested in previous studies comparing the learning rate (i.e. the speed of learning) in environments of different volatilities 12,13. In these studies, smaller learning rates were prevalent in stable environments, in which reward rates change slowly over time, while higher learning rates often reflected learning in volatile environments, in which recent observations have a stronger impact on behaviour. Even though most studies derived these learning rates with reinforcement learning models, similar ideas can be translated into a Bayesian model. For example, an established way of changing the speed of learning in a Bayesian model is to introduce noise during the update process 14. Adding this noise is equivalent to mixing some of the initial prior distribution back into the belief; it widens the belief distribution, making it more uncertain, and thereby makes the Bayesian updates more flexible in adapting to changing environments.

      Recent information carries more weight in a Bayesian belief update when beliefs are uncertain, which increases the speed of learning. In other words, a wide distribution (after adding noise) allows quick integration of new information. By contrast, a narrow distribution integrates new observations less strongly and instead relies more heavily on previous information; this corresponds to a small learning rate. We would therefore expect a steep decline of uncertainty to be associated with a smaller learning index, and a slower decline of uncertainty with a larger learning index. We hypothesized that participants reduce their uncertainty more quickly when observing social information, thereby anchoring more strongly on previous beliefs instead of integrating new observations flexibly. Conversely, we hypothesized a less steep decline of uncertainty when observing non-social information, indicating that new information can be flexibly integrated during the belief update (new Figure 3a).
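      The claim that a wider belief distribution integrates new observations more strongly can be illustrated with the simplest conjugate case: a Gaussian prior combined with Gaussian observation noise. This is a stand-in chosen for transparency, not the grid-based model used in the manuscript; it only illustrates the principle. The weight given to the new observation (the gain) plays the role of a learning rate, and all numbers are hypothetical.

```python
def gaussian_update(prior_mean, prior_var, obs, obs_var):
    """Conjugate Gaussian update; returns posterior mean, posterior variance and gain.

    The gain is the weight placed on the new observation, i.e. an effective
    learning rate: it is large when the prior is wide relative to the noise.
    """
    gain = prior_var / (prior_var + obs_var)
    post_mean = prior_mean + gain * (obs - prior_mean)
    post_var = prior_var * obs_var / (prior_var + obs_var)  # equals (1 - gain) * prior_var
    return post_mean, post_var, gain


# Same prior mean and observation, different prior widths.
narrow = gaussian_update(prior_mean=30.0, prior_var=25.0, obs=60.0, obs_var=100.0)
wide = gaussian_update(prior_mean=30.0, prior_var=400.0, obs=60.0, obs_var=100.0)
print("narrow prior:", narrow)
print("wide prior:  ", wide)

# Without added noise, repeated observations shrink the variance, so the gain
# (effective learning rate) falls from one encounter to the next.
mean, var = 30.0, 400.0
gains = []
for obs in [60.0, 55.0, 65.0, 58.0]:
    mean, var, gain = gaussian_update(mean, var, obs, obs_var=100.0)
    gains.append(gain)
print("gains across encounters:", [round(g, 3) for g in gains])
```

      With the narrow prior the gain is 0.2 and the mean moves from 30 to 36; with the wide prior the gain is 0.8 and it moves to 54. The falling gains in the second part mirror the link between a steep decline of uncertainty and a smaller learning index.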

      We modified the original Bayesian model (Figure 2d, Figure 2 – figure supplement 2) by adding a uniform distribution (equivalent to our prior distribution) to each belief update; we refer to this as noise addition to the Bayesian model 14,21. We varied the amount of noise δ ∈ [0, 1], where δ = 0 corresponds to the original Bayesian model and δ = 1 represents a very noisy Bayesian model. The uniform distribution was selected to match the first prior belief before any observation was made (equation 2). This range of δ resulted in a continuous increase of subjective uncertainty around the belief about the angular error (Figure 3b-c). The modified posterior distribution, denoted 𝑝′(σ|x), was derived at each trial as follows:
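      The equation for the modified posterior is not reproduced in this section, so the following is only a sketch of one way the noise addition could be implemented, under explicitly assumed choices: beliefs are held on a hypothetical grid of angular-error widths σ; the likelihood of an observed angular error is half-normal; and before each update the carried-over belief is mixed with the uniform initial prior, with weight δ on the uniform part. Under this assumed form, δ = 0 gives the plain Bayesian update and δ = 1 keeps only the current observation. The authors' equations 10 and 11 may differ in detail, and all names here are illustrative.

```python
import math

GRID = [float(s) for s in range(1, 181)]  # hypothetical widths sigma, in degrees


def half_normal_likelihood(x, sigma):
    """Assumed likelihood of an absolute angular error x given width sigma."""
    return math.sqrt(2.0 / math.pi) / sigma * math.exp(-x * x / (2.0 * sigma * sigma))


def noisy_update(belief, x, delta, grid=GRID):
    """One noisy Bayesian update (assumed form, see lead-in).

    prior'(sigma)     = (1 - delta) * belief(sigma) + delta * uniform(sigma)
    posterior(sigma) is proportional to likelihood(x | sigma) * prior'(sigma)
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    n = len(grid)
    mixed = [(1.0 - delta) * b + delta / n for b in belief]
    unnorm = [m * half_normal_likelihood(x, s) for m, s in zip(mixed, grid)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


uniform = [1.0 / len(GRID)] * len(GRID)  # first prior, before any observation
belief = uniform
for x in [12.0, 15.0, 9.0]:  # hypothetical angular errors of one predictor
    belief = noisy_update(belief, x, delta=0.3)
```

      Mixing before the likelihood is applied is one design choice among several; mixing after it would change which quantity is read out as the trial-wise belief, not the idea of re-injecting prior uncertainty.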

      We applied each noisy Bayesian model to participants’ choices within the social and nonsocial condition.

      The addition of a uniform distribution changed two key features of the belief distribution. First, the distribution remains wider as observations accumulate, which makes it possible to integrate new observations more flexibly. To show this more clearly, we extracted the model-derived uncertainty estimate across multiple encounters of the same predictor for the original model and the fully noisy Bayesian model (Figure 3 – figure supplement 1). The model-derived ‘uncertainty estimate’ of a noisy Bayesian model decays more slowly than the ‘uncertainty estimate’ of the original Bayesian model (upper panel). Second, the model-derived ‘accuracy estimate’ reflects more recent observations in a noisy Bayesian model than the ‘accuracy estimate’ derived from the original Bayesian model, which integrates past observations more strongly (lower panel). Hence, as mentioned above, a rapid decay of uncertainty implies a small learning index or, in other words, stronger integration of past compared to recent observations.
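      Both properties can be reproduced in a small simulation. As before, the grid, the half-normal likelihood and the mixing step are illustrative assumptions, not the authors' equations. Here the ‘accuracy estimate’ is taken to be the posterior mean of σ and the ‘uncertainty estimate’ its posterior standard deviation; these are stand-ins for the quantities defined in the Methods. The observation sequence is hypothetical.

```python
import math

GRID = [float(s) for s in range(1, 181)]  # hypothetical widths sigma, in degrees


def half_normal_likelihood(x, sigma):
    # Assumed likelihood of an absolute angular error x given width sigma.
    return math.sqrt(2.0 / math.pi) / sigma * math.exp(-x * x / (2.0 * sigma * sigma))


def noisy_update(belief, x, delta, grid=GRID):
    # Mix the carried-over belief with the uniform initial prior, then apply the likelihood.
    n = len(grid)
    mixed = [(1.0 - delta) * b + delta / n for b in belief]
    unnorm = [m * half_normal_likelihood(x, s) for m, s in zip(mixed, grid)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def summarize(belief, grid=GRID):
    """Posterior mean (stand-in 'accuracy estimate') and SD (stand-in 'uncertainty estimate')."""
    mean = sum(b * s for b, s in zip(belief, grid))
    var = sum(b * (s - mean) ** 2 for b, s in zip(belief, grid))
    return mean, math.sqrt(var)


def run(observations, delta):
    belief = [1.0 / len(GRID)] * len(GRID)
    history = []
    for x in observations:
        belief = noisy_update(belief, x, delta)
        history.append(summarize(belief))
    return history


# Hypothetical predictor: ten small angular errors, then three large ones.
observations = [10.0] * 10 + [60.0] * 3
plain = run(observations, delta=0.0)
noisy = run(observations, delta=0.5)
for k in (0, 4, 9, 12):
    print(f"encounter {k + 1:2d}: plain mean/sd = {plain[k][0]:5.1f}/{plain[k][1]:4.1f}, "
          f"noisy mean/sd = {noisy[k][0]:5.1f}/{noisy[k][1]:4.1f}")
```

      In this sketch the plain model's standard deviation collapses over the first ten encounters while the noisy model's stays wider, and after the three large errors the noisy model's mean sits closer to the recent observations.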

      In the following analyses, we tested whether an increasingly noisy Bayesian model mimics behaviour that is observed in the non-social compared to social condition. For example, we tested whether an increasingly noisy Bayesian model also exhibits a strongly negative ‘predictor uncertainty’ effect on interval setting (Figure 2e). In such a way, we can test whether differences in noise in the updating process of a Bayesian model might reproduce important qualitative differences in learning-related behaviour seen in the social and nonsocial conditions.

      We used these modified Bayesian models to simulate trial-wise interval setting for each participant according to the observations they made when selecting a particular advisor or non-social cue. We simulated interval setting at each trial and examined whether an increase in noise produced model behaviours that resembled participant behaviour patterns observed in the non-social condition as opposed to the social condition. At each trial, we used the accuracy estimate (Methods, equation 6), which represents a subjective belief about a single angular error, to derive an interval setting for the selected predictor. To do so, we first derived the point estimate of the belief distribution at each trial (Methods, equation 6) and multiplied it by the size of one interval step on the circle. The step size was derived by dividing the circle size by the maximum number of possible steps. Here is an example of transforming an accuracy estimate into an interval: let’s assume the belief about the angular error at the current trial is 50 degrees (Methods, equation 6). To obtain the size of one interval step, the circle size (360 degrees) is divided by the maximum number of interval steps (40 steps; note, 20 steps on each side), which gives nine degrees per interval step. Next, the accuracy estimate in radians (0.87) is multiplied by the step size in radians (0.1571), resulting in an interval of 0.137 radians or 7.85 degrees. The final interval size would therefore be 7.85 degrees.
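      The arithmetic of this worked example can be checked in a few lines. The function name is hypothetical and the defaults (a 360-degree circle, 40 steps) are taken from the example; the sketch reproduces the calculation as described rather than the authors' code.

```python
import math


def accuracy_to_interval(accuracy_deg, circle_deg=360.0, max_steps=40):
    """Convert an accuracy estimate (degrees) into an interval, following the worked example.

    Step size = circle / max_steps (360 / 40 = 9 degrees); the interval is the
    accuracy estimate in radians multiplied by the step size in radians.
    Returns (interval_rad, interval_deg).
    """
    step_deg = circle_deg / max_steps
    interval_rad = math.radians(accuracy_deg) * math.radians(step_deg)
    return interval_rad, math.degrees(interval_rad)


interval_rad, interval_deg = accuracy_to_interval(50.0)
print(f"{interval_rad:.3f} rad, {interval_deg:.2f} degrees")  # 0.137 rad, 7.85 degrees
```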

      Simulating Bayesian choices in this way, we repeated the behavioural analyses (Figure 2b,e,f) to test whether intervals derived from noisier Bayesian models mimic intervals set by participants in the non-social condition: greater changes in interval setting across trials (Figure 3 – figure supplement 1b), a negative ‘predictor uncertainty’ effect on interval setting (Figure 3d), and a higher learning index (Figure 3 – figure supplement 1c).

      First, we repeated the most crucial analysis, the linear regression analysis (Figure 2e), and hypothesized that intervals simulated from noisy Bayesian models would also show a greater negative ‘predictor uncertainty’ effect on interval setting. This was indeed the case: irrespective of social or non-social condition, the addition of noise (increased weighting of the uniform distribution in each belief update) led to an increasingly negative ‘predictor uncertainty’ effect on confidence judgment (new Figure 3d). In Figure 3d, we show the regression weights (y-axis) for ‘predictor uncertainty’ on confidence judgment with increasing noise (x-axis). This result is highly consistent with the idea that, in the non-social condition, task estimates are updated in a more uncertain and noisier manner. By contrast, social estimates appear relatively more stable, also according to this new Bayesian simulation analysis.

      This new finding extends the results and suggests a formal computational account of the behavioural differences between the social and non-social conditions. Increasing the noise of the belief update mimics behaviour that is observed in the non-social condition: an increasingly negative effect of ‘predictor uncertainty’ on confidence judgment. Notably, there was no difference in the impact that the noise had in the social and non-social conditions. This was expected because the Bayesian simulations are blind to the framing of the conditions. However, it means that the observed effects do not depend on the precise sequence of choices that participants made in these conditions. It therefore suggests that an increase in Bayesian noise leads to an increasingly negative impact of ‘predictor uncertainty’ on confidence judgments irrespective of the condition. Hence, we can conclude that different degrees of uncertainty within the belief update are a reasonable explanation for the differences observed between the social and non-social conditions.

      Next, we used these simulated confidence intervals and repeated the descriptive behavioural analyses to test whether interval settings derived from noisier Bayesian models mimic behavioural patterns observed in the non-social compared to the social condition. For example, more noise in the belief update should lead to more flexible integration of new information and hence potentially to a greater change of confidence judgments across predictor encounters (Figure 2b). Further, a greater reliance on recent information should make prediction errors feed more strongly into the next confidence judgment; hence, it should result in a higher learning index in the non-social condition, which we hypothesize is perceived as more uncertain (Figure 2f). We used the simulated confidence intervals from Bayesian models on a continuum of noise integration (i.e. different weightings of the uniform distribution in the belief update) and again derived both absolute confidence change and learning indices (Figure 3 – figure supplement 1b-c).
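      The ‘absolute confidence change’ measure can be sketched as the mean absolute difference between consecutive intervals set for the same predictor. The exact definition (for example, pooling across predictors rather than averaging per predictor first) is an assumption here, as are the function name and the (predictor, interval) input format.

```python
from collections import defaultdict


def absolute_confidence_change(trials):
    """Mean |I_{k+1} - I_k| over consecutive encounters with the same predictor.

    `trials` is a chronological list of (predictor_id, interval) pairs
    (hypothetical input format). Changes are pooled across predictors.
    Returns None if no predictor was encountered twice.
    """
    last_interval = {}
    changes = defaultdict(list)
    for predictor, interval in trials:
        if predictor in last_interval:
            changes[predictor].append(abs(interval - last_interval[predictor]))
        last_interval[predictor] = interval
    all_changes = [c for per_predictor in changes.values() for c in per_predictor]
    return sum(all_changes) / len(all_changes) if all_changes else None


# Hypothetical sequence: predictor A changes by 4 then 2 steps, predictor B by 1 step.
trials = [("A", 10), ("B", 6), ("A", 14), ("B", 5), ("A", 12)]
print(absolute_confidence_change(trials))
```

      Applying the same function to intervals simulated at each noise level gives the model-side counterpart of the behavioural measure, so both can be compared on the same scale.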

      ‘Absolute confidence change’ and ‘learning index’ increase with increasing noise weight, thereby mimicking the difference between the social and non-social conditions. Further, these analyses demonstrate the tight relationship between the descriptive and model-based analyses. They show that noise in the Bayesian updating process is a conceptual explanation that can account for both the differences in learning and the differences in uncertainty processing between the social and non-social conditions. The key insight conveyed by the Bayesian simulations is that a wider, more uncertain belief distribution changes more quickly. Correspondingly, in the non-social condition, participants express more uncertainty in their confidence estimate when they set the interval, and they also change their beliefs more quickly, as expressed in a higher learning index. Therefore, noisy Bayesian updating can account for key differences between the social and non-social conditions.

      We thank the reviewer for making this point, as we believe that these additional analyses allow theoretical inferences to be made in a more direct manner; we think that it has significantly contributed towards a deeper understanding of the mechanisms involved in the social and non-social conditions. Further, it provides a novel account of how we make judgments when being presented with social and non-social information.

      We made substantial changes to the main text, figures and supplementary material to include these changes:

      Main text, page 10-11 new section:

      The impact of noise in belief updating in social and non-social conditions

      So far, we have shown that, in comparison to non-social predictors, participants changed their interval settings about social advisors less drastically across time, relied on observations made further in the past, and were less impacted by their subjective uncertainty when they did so (Figure 2). Using Bayesian simulation analyses, we investigated whether a common mechanism might underlie these behavioural differences. We tested whether the integration of new evidence differed between the social and non-social conditions; for example, recent observations might be weighted more strongly for non-social cues, while past observations might be weighted more strongly for social cues. Similar ideas were tested in previous studies comparing the learning rate (i.e. the speed of learning) in environments of different volatilities 12,13. We tested these ideas using established ways of changing the speed of learning during Bayesian updates 14,21. We hypothesized that participants reduce their uncertainty more quickly when observing social information. Conversely, we hypothesized a less steep decline of uncertainty when observing non-social information, indicating that new information can be flexibly integrated during the belief update (Figure 3a).

      We manipulated the amount of uncertainty in the Bayesian model by adding a uniform distribution to each belief update (Figure 3b-c) (equations 10 and 11). Consequently, the distribution’s width increases and it is more strongly impacted by recent observations (see example in Figure 3 – figure supplement 1). We used these modified Bayesian models to simulate trial-wise interval setting for each participant according to the observations they made by selecting a particular advisor in the social condition or a particular predictor in the non-social condition. We simulated confidence intervals at each trial. We then used these to examine whether an increase in noise led to simulated behaviour resembling the patterns observed in the non-social condition, as distinct from those observed in the social condition.

      First, we repeated the linear regression analysis and hypothesized that interval settings simulated from noisy Bayesian models would also show a greater negative ‘predictor uncertainty’ effect on interval setting, resembling the effect we had observed in the non-social condition (Figure 2e). This was indeed the case when using the noisy Bayesian model: irrespective of social or non-social condition, the addition of noise (increasing weight of the uniform distribution in each belief update) led to an increasingly negative ‘predictor uncertainty’ effect on confidence judgment (new Figure 3d). The absence of a difference between the social and non-social conditions in the simulations suggests that an increase in Bayesian noise is sufficient to induce a negative impact of ‘predictor uncertainty’ on interval setting. Hence, we can conclude that different degrees of noise in the updating process are sufficient to cause the differences observed between the social and non-social conditions. Next, we used these simulated interval settings and repeated the descriptive behavioural analyses (Figure 2b,f). An increase in noise led to greater changes of confidence across time and a higher learning index (Figure 3 – figure supplement 1b-c). In summary, the Bayesian simulations offer a conceptual explanation that can account for both the differences in learning and the differences in uncertainty processing between the social and non-social conditions. The key insight conveyed by the Bayesian simulations is that a wider, more uncertain belief distribution changes more quickly. Correspondingly, in the non-social condition, participants express more uncertainty in their confidence estimate when they set the interval, and they also change their beliefs more quickly. Therefore, noisy Bayesian updating can account for key differences between the social and non-social conditions.

      Methods, page 23 new section:

      Extension of Bayesian model with varying amounts of noise

      We modified the original Bayesian model (Figure 2d, Figure 2 – figure supplement 2) to test whether the integration of new evidence differed between the social and non-social conditions; for example, recent observations might be weighted more strongly for non-social cues, while past observations might be weighted more strongly for social cues. [...] To obtain the size of one interval step, the circle size (360 degrees) is divided by the maximum number of interval steps (40 steps; note, 20 steps on each side), which gives nine degrees per interval step. Next, the accuracy estimate in radians (0.87) is multiplied by the step size in radians (0.1571), resulting in an interval of 0.137 radians or 7.85 degrees. The final interval size would therefore be 7.85 degrees.

      We repeated the behavioural analyses (Figure 2b,e,f) to test whether confidence intervals derived from noisier Bayesian models mimic behavioural patterns observed in the non-social condition: greater changes of confidence across trials (Figure 3 – figure supplement 1b), a more negative ‘predictor uncertainty’ effect on confidence judgment (Figure 3d) and a greater learning index (Figure 3 – figure supplement 1c).

      Discussion, page 14: […] It may be because we make just such assumptions that past observations are used to predict the performance levels that people are likely to exhibit next 15,16. An alternative explanation might be that participants experience a steeper decline of subjective uncertainty in their beliefs about the accuracy of social advice, resulting in a narrower prior distribution during the next encounter with the same advisor. We used a series of simulations to investigate how uncertainty about beliefs changed from trial to trial and showed that belief updates about non-social cues were consistent with a noisier update process that diminished the impact of experiences over the longer term. From a Bayesian perspective, greater certainty about the value of advice means that contradictory evidence will need to be stronger to alter one’s beliefs. In the absence of such evidence, a Bayesian agent is more likely to repeat previous judgments. Just as in a confirmation bias 17, such a perspective suggests that once we are more certain about others’ features, for example, their character traits, we are less likely to change our opinions about them.

      Reviewer #2 (Public Review):

      Humans learn about the world both directly, by interacting with it, and indirectly, by gathering information from others. There has been a longstanding debate about the extent to which social learning relies on specialized mechanisms that are distinct from those that support learning through direct interaction with the environment. In this work, the authors approach this question using an elegant within-subjects design that enables direct comparisons between how participants use information from social and non-social sources. Although the information presented in both conditions had the same underlying structure, participants tracked the performance of the social cue more accurately and changed their estimates less as a function of prediction error. Further, univariate activity in two regions, dmPFC and pTPJ, tracked participants' confidence judgments more closely in the social than in the non-social condition, and multivariate patterns of activation in these regions contained information about the identity of the social cues.

      Overall, the experimental approach and model used in this paper are very promising. However, after reading the paper, I found myself wanting additional insight into what these condition differences mean, and how to place this work in the context of prior literature on this debate. In addition, some additional analyses would be useful to support the key claims of the paper.

      We thank the reviewer for their very supportive comments. We have addressed their points below and have highlighted changes in our manuscript that we made in response to the reviewer’s comments.

      (1) The framing should be reworked to place this work in the context of prior computational work on social learning. Some potentially relevant examples:

      • Shafto, Goodman & Frank (2012) provide a computational account of the domain-specific inductive biases that support social learning. In brief, what makes social learning special is that we have an intuitive theory of how other people's unobservable mental states lead to their observable actions, and we use this intuitive theory to actively interpret social information. (There is also a wealth of behavioral evidence in children to support this account; for a review, see Gweon, 2021).

      • Heyes (2012) provides a leaner account, arguing that social and non-social learning are supported by a common associative learning mechanism, and what distinguishes social from non-social learning is the input mechanism. Social learning becomes distinctively "social" to the extent that organisms are biased or attuned to social information.

      I highlight these papers because they go a step beyond asking whether there is any difference between mechanisms that support social and non-social learning; they also provide concrete proposals about what that difference might be, and what might be shared. I would like to see this work move in a similar direction.

      References

      (In the interest of transparency: I am not an author on these papers.)

      Gweon, H. (2021). Inferential social learning: how humans learn from others and help others learn. PsyArXiv. https://doi.org/10.31234/osf.io/8n34t

      Heyes, C. (2012). What's social about social learning?. Journal of Comparative Psychology, 126(2), 193.

      Shafto, P., Goodman, N. D., & Frank, M. C. (2012). Learning from others: The consequences of psychological reasoning for human learning. Perspectives on Psychological Science, 7(4), 341-351.

      Thank you for this suggestion to expand our framing. We have now made substantial changes to the Introduction and Discussion to include additional background literature, including the references suggested by the reviewer, that addresses the differences between social and non-social learning. We further related our findings to other discussions in the literature arguing that differences between social and non-social learning might occur at the level of algorithms (the computations involved in social and non-social learning) and/or implementation (the neural mechanisms). Here, we describe behaviour with the same algorithm (a Bayesian model), but the weighting of uncertainty in decision-making differs between social and non-social contexts. This might be explained by ideas similar to those put forward by Shafto and colleagues (2012), who suggest that differences between social and non-social learning might be due to the attribution of goal-directed intentions to social agents, but not to non-social cues. Such an attribution might lead participants to assume that advisor performance will be relatively stable, given that advisors should have relatively stable goal-directed intentions. We also show differences at the implementational level in social and non-social learning in TPJ and dmPFC.

      Below we list the changes we have made to the Introduction and Discussion. Further, we would also like to emphasize the substantial extension of the Bayesian modelling which we think clarifies the theoretical framework used to explain the mechanisms involved in social and non-social learning (see our answer to the next comments below).

      Introduction, page 4:

      [...]

      Therefore, by comparing information sampling from social versus non-social sources, we address a long-standing question in cognitive neuroscience, the degree to which any neural process is specialized for, or particularly linked to, social as opposed to non-social cognition 2–9. Given their similarities, it is expected that both types of learning will depend on common neural mechanisms. However, given the importance and ubiquity of social learning, it may also be that the neural mechanisms that support learning from social advice are at least partially specialized and distinct from those concerned with learning that is guided by nonsocial sources.

      However, it is less clear on which level information is processed differently when it has a social or non-social origin. It has recently been argued that differences between social and non-social learning can be investigated on different levels of Marr’s information processing theory: differences could emerge at an input level (in terms of the stimuli that might drive social and non-social learning), at an algorithmic level or at a neural implementation level 7. It might be that, at the algorithmic level, associative learning mechanisms are similar across social and non-social learning 1. Other theories have argued that differences might emerge because goal-directed actions are attributed to social agents which allows for very different inferences to be made about hidden traits or beliefs 10. Such inferences might fundamentally alter learning about social agents compared to non-social cues.

      Discussion, page 15:

      […] One potential explanation for the assumption of stable performance for social but not non-social predictors might be that participants attribute intentions and motivations to social agents. Even if the social and non-social evidence are the same, the belief that a social actor might have a goal may affect the inferences made from the same piece of information 10. Social advisors first learnt about the target’s distribution and accordingly gave advice on where to find the target. If social agents are credited with goal-directed behaviour, then it might be assumed that their goals remain relatively constant; this might lead participants to assume stability in the performance of social advisors. However, such goal-directed intentions might not be attributed to non-social cues, thereby making judgments inherently more uncertain and changeable across time. Such an account, focussing on differences in attribution in social settings, aligns with a recent suggestion that similarities or differences between social and non-social processes can be identified at any one of several levels of Marr’s information processing theory 7. Here we found that the same algorithm was able to explain social and non-social learning (a qualitatively similar computational model could explain both). However, the extent to which the algorithm was recruited when learning about social compared to non-social information differed. We observed a greater impact of uncertainty on judgments about social compared to non-social information. We have shown evidence for a degree of specialization when assessing social advisors as opposed to non-social cues. At the neural level we focused on two brain areas, dmPFC and pTPJ, which have not only been shown to carry signals associated with belief inferences about others but whose activity patterns have also been shown, in recent combined fMRI-TMS studies, to be causally important for the inference process […]

      (2) The results imply that dmPFC and pTPJ differentiate between learning from social and non-social sources. However, more work needs to be done to rule out simpler, deflationary accounts. In particular, the condition differences observed in dmPFC and pTPJ might reflect low-level differences between the two conditions. For example, the social task could simply have been more engaging to participants, or the social predictors may have been more visually distinct from one another than the fruits.

      We understand the reviewer’s concern that low-level distinctions between the social and non-social conditions could confound the differences in neural activation observed between conditions in pTPJ and dmPFC. From the reviewer’s comments, we understand that there might be two potential confounds. First, there might be low-level differences, such that stimuli within one condition are more distinct from each other than stimuli within the other condition; the greater visual distinctiveness of stimuli in one condition might then by itself lead to learning differences between conditions. Second, stimuli in one condition might be more engaging and potentially lead to attentional differences between conditions. We used a combination of univariate and multivariate analyses to address both concerns.

      Analysis 1: Univariate analysis to inspect potential unaccounted variance between social and non-social condition

      First, we used the existing univariate analysis (exploratory whole-brain MRI analysis, see Methods) to test for neural activation that covaried with attentional differences (or any other unaccounted neural difference) between conditions. If there were neural differences between conditions that we are currently not accounting for with the parametric regressors included in the fMRI-GLM, then these differences should be captured by the constant of the GLM. For example, if there were attentional differences between conditions, we would expect neural differences between conditions in areas such as the inferior parietal lobe (or other areas commonly engaged during attentional processes).

      Importantly, inspection of the constant of the GLM should capture any unaccounted differences, whether they are due to attention or to alternative processes that might differ between conditions. When inspecting cluster-corrected differences in the constant of the fMRI-GLM during the setting of the confidence judgment, we found no cluster-significant activation that differed between social and non-social conditions (Figure 4 – figure supplement 4a; results were family-wise error cluster-corrected at p<0.05 using a cluster-defining threshold of z>2.3). For transparency, we show the sub-threshold activation map across the whole brain (z > 2) for the constant contrasted between social and non-social conditions (i.e. constant, contrast: social – non-social).
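      To make the logic of the constant concrete, here is a minimal single-voxel sketch, assuming ordinary least squares with one constant per condition and a single parametric confidence regressor. The function name `condition_constant_contrast` and the simulated data are hypothetical; the actual fMRI-GLM (see Methods) also involves HRF convolution, temporal autocorrelation modelling and further regressors. Any condition difference that the parametric regressor does not explain ends up in the contrast of the two constants.

```python
import numpy as np


def condition_constant_contrast(y, is_social, confidence):
    """Hypothetical helper: fit y = c_social*S + c_nonsocial*(1 - S) + beta*confidence
    by ordinary least squares and return c_social - c_nonsocial."""
    y = np.asarray(y, dtype=float)
    social = np.asarray(is_social, dtype=float)
    # One constant (intercept) column per condition plus a shared parametric regressor.
    X = np.column_stack([social, 1.0 - social, np.asarray(confidence, dtype=float)])
    betas, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Contrast "constant: social - non-social"; non-zero only if the conditions
    # differ in a way the parametric regressor does not explain.
    return betas[0] - betas[1]


# Simulated (hypothetical) data: 5 social and 5 non-social trials.
is_social = np.array([1] * 5 + [0] * 5)
confidence = np.linspace(10, 50, 10)
signal = 3.0 + 0.8 * confidence  # fully explained by the confidence regressor
print(condition_constant_contrast(signal, is_social, confidence))
print(condition_constant_contrast(signal + 0.5 * is_social, is_social, confidence))
```

      In this toy example, a signal driven only by confidence gives a constant contrast of (numerically) zero, whereas an extra offset added to social trials alone is recovered in full by the contrast.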

      For transparency, we additionally used an ROI approach to test for differences in activation patterns that correlated with the constant during the confidence phase; that is, we used the same ROI approach as in the paper to avoid any biased test selection. We compared activation patterns between social and non-social conditions in the same ROIs as before: dmPFC (MNI coordinates [x/y/z: 2,44,36] 16) and bilateral pTPJ (70% probability anatomical mask; for reference see manuscript, page 23), and we additionally compared activation patterns between conditions in bilateral IPLD (50% probability anatomical mask, 20). We did not find significantly different activation patterns between social and non-social conditions in any of these areas: dmPFC (confidence constant; paired t-test social vs non-social: t(23) = 0.06, p=0.96, 95% CI [-36.7, 38.75]), bilateral TPJ (confidence constant; paired t-test social vs non-social: t(23) = -0.06, p=0.95, 95% CI [-31, 29]), bilateral IPLD (confidence constant; paired t-test social vs non-social: t(23) = -0.58, p=0.57, 95% CI [-30.3, 17.1]).
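      The statistics reported above are paired t-tests across 24 participants (hence t(23)) with an interval around the mean social minus non-social difference. A minimal sketch of that computation, assuming one summary value per participant and condition; the function name is hypothetical. The default critical value 2.0687 is the two-tailed 95% quantile of the t distribution with 23 degrees of freedom and is only appropriate for n = 24; a p-value would additionally require the t cumulative distribution function (e.g. from a statistics library), which is omitted here.

```python
import math
import statistics

# Two-tailed 95% critical value of Student's t with 23 degrees of freedom (n = 24).
T_CRIT_DF23 = 2.0687


def paired_t_with_ci(social, nonsocial, t_crit=T_CRIT_DF23):
    """Hypothetical helper: paired t statistic, degrees of freedom and confidence
    interval for the mean social - non-social difference (one value per participant)."""
    if len(social) != len(nonsocial) or len(social) < 2:
        raise ValueError("need two equally long samples with at least 2 participants")
    diffs = [a - b for a, b in zip(social, nonsocial)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    t = mean_diff / se
    ci = (mean_diff - t_crit * se, mean_diff + t_crit * se)
    return t, n - 1, ci
```

      Pass a different `t_crit` when the number of participants differs from 24.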

      There were no meaningful activation patterns that differed between conditions, either in areas commonly linked to attention (e.g. IPL) or in the brain areas that were the focus of the study (dmPFC and pTPJ). Activation in dmPFC and pTPJ covaried with parametric effects such as the confidence set on the current and previous trials, and did not correlate with low-level differences such as attention. Hence, these results suggest that activation differences between conditions were better captured by parametric regressors such as the trial-wise interval setting (i.e. confidence) and are unlikely to be confounded by low-level processes that can be captured with univariate neural analyses.

      Analysis 2: RSA to test visual distinctiveness between social and non-social conditions

      We addressed the reviewer’s other comment directly by testing whether differences between conditions might arise from a different degree of visual distinctiveness in one stimulus set compared with the other. We used RSA to inspect potential differences in early visual processes that should be affected by greater stimulus similarity within one condition. In other words, we tested whether the visual distinctiveness of one stimulus set differed from that of the other. To this end, we compared the Exemplar Discriminability Index (EDI) between conditions in early visual areas. We compared the dissimilarity of neural activation related to the presentation of an identical stimulus across trials (diagonal of the RSA matrix) with the dissimilarity in neural activation between different stimuli across trials (off-diagonal of the RSA matrix). If stimuli within one set are very similar, then the difference between the diagonal and off-diagonal should be very small and less likely to be significant (i.e. similar diagonal and off-diagonal values). In contrast, if stimuli within one set are very distinct from each other, then the difference between the diagonal and off-diagonal should be large and likely to result in a significant EDI (i.e. different diagonal and off-diagonal values) (see Figure 4g for a schematic illustration). Hence, if the social and non-social stimulus sets differ in visual distinctiveness, the two conditions should yield different EDI values in early visual cortex, so comparing EDI values between conditions tests this directly. We used a Harvard-cortical ROI mask based on bilateral V1. Negative EDI values indicate that the same exemplars are represented more similarly in the V1 neural pattern than different exemplars.

      This analysis showed no significant difference in EDI between conditions (Figure 4 – figure supplement 4b; EDI paired-sample t-test: t(23) = -0.16, p=0.87, 95% CI [-6.7, 5.7]).
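      A minimal sketch of the EDI computation, assuming a cross-validated dissimilarity matrix in which rows and columns index the same exemplars measured in independent data partitions (e.g. different runs), so that the diagonal is not trivially zero. The sign convention follows the text above (mean diagonal minus mean off-diagonal, so negative values mean that repetitions of the same exemplar are more similar than different exemplars); some publications define EDI with the opposite sign. The function name and toy values are hypothetical. Per-participant EDIs for the social and non-social stimulus sets would then be compared with a paired t-test across participants.

```python
import numpy as np


def exemplar_discriminability_index(rdm):
    """Hypothetical helper. rdm[i, j] is the dissimilarity between the pattern for
    exemplar i in one data partition and exemplar j in an independent partition."""
    rdm = np.asarray(rdm, dtype=float)
    if rdm.ndim != 2 or rdm.shape[0] != rdm.shape[1]:
        raise ValueError("RDM must be a square matrix")
    same = np.diag(rdm)  # same exemplar across partitions (diagonal)
    different = rdm[~np.eye(rdm.shape[0], dtype=bool)]  # different exemplars (off-diagonal)
    # Sign convention of the text: negative => same exemplars represented more similarly.
    return same.mean() - different.mean()


# Toy 3-exemplar example (hypothetical values).
toy_rdm = np.array([[0.2, 0.9, 1.0],
                    [0.9, 0.3, 0.8],
                    [1.0, 0.8, 0.1]])
print(exemplar_discriminability_index(toy_rdm))
```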

      We have further replicated results in V1 with a whole-brain searchlight analysis, averaging across both social and non-social conditions.

      In summary, by using a combination of univariate and multivariate analyses, we tested whether neural activation differed when participants were presented with face or fruit stimuli, and whether such differences might confound the observed learning differences between conditions. We did not find meaningful neural differences that were unaccounted for by the regressors included in the GLM. Further, we did not find differences in visual distinctiveness between the stimulus sets. Hence, these control analyses suggest that differences between social and non-social conditions are unlikely to arise from differences in low-level processes and are instead more likely to develop when learning about social or non-social information.

      Moreover, we also examined behaviourally whether participants approached the social and non-social conditions differently. We tested whether there were initial biases prior to learning, i.e. before participants received any information from either social or non-social sources. Specifically, we tested whether participants had different prior expectations about the performance of social compared to non-social predictors by comparing the confidence judgments on the first trial of each predictor. Participants set confidence intervals very similarly in social and non-social conditions (Figure below). Hence, differences between conditions did not seem to arise from low-level differences in stimulus sets or from prior differences in expectations about the performance of social compared to non-social predictors. Instead, differences between conditions became apparent when participants updated their beliefs about social advisors or non-social cues and, as a consequence, in the way confidence judgments were set across time.

      Figure. Confidence interval for the first encounter of each predictor in social and non-social conditions. There was no initial bias in predicting the performance of social or non-social predictors.

      Main text page 13:

      […]
      Additional control analyses show that neural differences between social and non-social conditions were not due to the visually different stimulus sets used in the experiment but instead represent fundamental differences in processing social compared to non-social information (Figure 4 – figure supplement 4). These results hold in both ROI-based RSA and whole-brain searchlight analyses. Taken together, the univariate and multivariate analyses demonstrate that dmPFC and pTPJ represent beliefs about social advisors that develop over a longer timescale and encode the identities of the social advisors.

      References

      1. Heyes, C. (2012). What’s social about social learning? Journal of Comparative Psychology 126, 193–202. 10.1037/a0025180.
      2. Chang, S.W.C., and Dal Monte, O. (2018). Shining Light on Social Learning Circuits. Trends in Cognitive Sciences 22, 673–675. 10.1016/j.tics.2018.05.002.
      3. Diaconescu, A.O., Mathys, C., Weber, L.A.E., Kasper, L., Mauer, J., and Stephan, K.E. (2017). Hierarchical prediction errors in midbrain and septum during social learning. Soc Cogn Affect Neurosci 12, 618–634. 10.1093/scan/nsw171.
      4. Frith, C., and Frith, U. (2010). Learning from Others: Introduction to the Special Review Series on Social Neuroscience. Neuron 65, 739–743. 10.1016/j.neuron.2010.03.015.
      5. Frith, C.D., and Frith, U. (2012). Mechanisms of Social Cognition. Annu. Rev. Psychol. 63, 287–313. 10.1146/annurev-psych-120710-100449.
      6. Grabenhorst, F., and Schultz, W. (2021). Functions of primate amygdala neurons in economic decisions and social decision simulation. Behavioural Brain Research 409, 113318. 10.1016/j.bbr.2021.113318.
      7. Lockwood, P.L., Apps, M.A.J., and Chang, S.W.C. (2020). Is There a ‘Social’ Brain? Implementations and Algorithms. Trends in Cognitive Sciences, S1364661320301686. 10.1016/j.tics.2020.06.011.
      8. Soutschek, A., Ruff, C.C., Strombach, T., Kalenscher, T., and Tobler, P.N. (2016). Brain stimulation reveals crucial role of overcoming self-centeredness in self-control. Sci. Adv. 2, e1600992. 10.1126/sciadv.1600992.
      9. Wittmann, M.K., Lockwood, P.L., and Rushworth, M.F.S. (2018). Neural Mechanisms of Social Cognition in Primates. Annu. Rev. Neurosci. 41, 99–118. 10.1146/annurev-neuro-080317-061450.
      10. Shafto, P., Goodman, N.D., and Frank, M.C. (2012). Learning From Others: The Consequences of Psychological Reasoning for Human Learning. Perspect Psychol Sci 7, 341–351. 10.1177/1745691612448481.
      11. McGuire, J.T., Nassar, M.R., Gold, J.I., and Kable, J.W. (2014). Functionally Dissociable Influences on Learning Rate in a Dynamic Environment. Neuron 84, 870–881. 10.1016/j.neuron.2014.10.013.
      12. Behrens, T.E.J., Woolrich, M.W., Walton, M.E., and Rushworth, M.F.S. (2007). Learning the value of information in an uncertain world. Nature Neuroscience 10, 1214–1221. 10.1038/nn1954.
      13. Meder, D., Kolling, N., Verhagen, L., Wittmann, M.K., Scholl, J., Madsen, K.H., Hulme, O.J., Behrens, T.E.J., and Rushworth, M.F.S. (2017). Simultaneous representation of a spectrum of dynamically changing value estimates during decision making. Nat Commun 8, 1942. 10.1038/s41467-017-02169-w.
      14. Allenmark, F., Müller, H.J., and Shi, Z. (2018). Inter-trial effects in visual pop-out search: Factorial comparison of Bayesian updating models. PLoS Comput Biol 14, e1006328. 10.1371/journal.pcbi.1006328.
      15. Wittmann, M., Trudel, N., Trier, H.A., Klein-Flügge, M., Sel, A., Verhagen, L., and Rushworth, M.F.S. (2021). Causal manipulation of self-other mergence in the dorsomedial prefrontal cortex. Neuron.
      16. Wittmann, M.K., Kolling, N., Faber, N.S., Scholl, J., Nelissen, N., and Rushworth, M.F.S. (2016). Self-Other Mergence in the Frontal Cortex during Cooperation and Competition. Neuron 91, 482–493. 10.1016/j.neuron.2016.06.022.
      17. Kappes, A., Harvey, A.H., Lohrenz, T., Montague, P.R., and Sharot, T. (2020). Confirmation bias in the utilization of others’ opinion strength. Nat Neurosci 23, 130–137. 10.1038/s41593-019-0549-2.
      18. Trudel, N., Scholl, J., Klein-Flügge, M.C., Fouragnan, E., Tankelevitch, L., Wittmann, M.K., and Rushworth, M.F.S. (2021). Polarity of uncertainty representation during exploration and exploitation in ventromedial prefrontal cortex. Nat Hum Behav. 10.1038/s41562-020-0929-3.
      19. Yu, Z., Guindani, M., Grieco, S.F., Chen, L., Holmes, T.C., and Xu, X. (2022). Beyond t test and ANOVA: applications of mixed-effects models for more rigorous statistical analysis in neuroscience research. Neuron 110, 21–35. 10.1016/j.neuron.2021.10.030.
      20. Mars, R.B., Jbabdi, S., Sallet, J., O’Reilly, J.X., Croxson, P.L., Olivier, E., Noonan, M.P., Bergmann, C., Mitchell, A.S., Baxter, M.G., et al. (2011). Diffusion-Weighted Imaging Tractography-Based Parcellation of the Human Parietal Cortex and Comparison with Human and Macaque Resting-State Functional Connectivity. Journal of Neuroscience 31, 4087–4100. 10.1523/JNEUROSCI.5102-10.2011.
      21. Yu, A.J., and Cohen, J.D. Sequential effects: Superstition or rational behavior? 8.
      22. Nili, H., Wingfield, C., Walther, A., Su, L., Marslen-Wilson, W., and Kriegeskorte, N. (2014). A Toolbox for Representational Similarity Analysis. PLoS Comput Biol 10, e1003553. 10.1371/journal.pcbi.1003553.
      23. Lockwood, P.L., Wittmann, M.K., Nili, H., Matsumoto-Ryan, M., Abdurahman, A., Cutler, J., Husain, M., and Apps, M.A.J. (2022). Distinct neural representations for prosocial and self-benefiting effort. Current Biology 32, 4172-4185.e7. 10.1016/j.cub.2022.08.010.
    1. Author Response

      Reviewer #2 (Public Review):

      1) Although the images and videos were of great quality, the results derived from them provided little new knowledge and few conceptual insights into male reproductive tract biology and basically confirmed what has been published using traditional methods. For example, the high intensity of the vascular network in the initial segment was previously reported by Abe in 1984 and Suzuki in 1982; the pattern of the major lymphatic vessel and drainage was beautifully depicted by Perez-Clavier, 1982.

      We thank the reviewer for his/her appreciative comments regarding the quality of the images/videos we provide in this study. We do not fully agree with his/her assessment of the lack of novelty. Our work confirms earlier reports that are now dated (1980s), which in itself is worth mentioning for the interested community, especially when the confirmation uses the most advanced technologies available today. We have never said that nothing was done in the past, and we have acknowledged all past contributors (including those mentioned by the reviewer) by pointing out the limitations of the technical tools that were available at the time. In addition, our current work provides a more comprehensive and global view by extending our approach to the entire mouse epididymis, whereas previous work was much more limited.

      2) The authors were very cautious when interpreting the results of marker immunostaining; however, these markers were not specific for a definite cell type. For example, as the authors stated, VEGFR3 marks both lymphatic vessels and fenestrated blood vessels. How could the authors claim the VEGFR3+ network was lymphatic? The authors claimed that they used three markers for the lymphatic vessel, but the staining results of the networks were very different. How could the authors draw conclusions about the network of lymphatic vessels in the epididymis?

      We broadly agree with the reviewer and have made it clear that one cannot be 100% sure that all the VEGFR3+ structures we present are lymphatic. However, we used a total of four documented lymphatic markers (not three, as mentioned by the reviewer): VEGFR3, LYVE1, PROX1 and PDPN. Three of them give very similar profiles, while only PDPN shows some differences. We are currently studying the expression of PDPN in the mouse epididymis in more detail, because we speculate that this marker may target a population of pluripotent cells in this tissue. Therefore, given the three similar profiles and the subtraction of PLVAP+ structures, we are fairly confident that what we show corresponds to the different lymphatic structures.

      3) To understand the vascular network development in the epididymis, would the authors please look at the fetal stage when the vascular network is established in the first place? Wolffian duct tissues are much smaller and thinner and would be amenable for 3D imaging probably even without clearing.

      We generally agree with the reviewer that this could be an interesting addition. However, it represents a significant amount of additional work. Organ clearing would certainly be required, because the Wolffian duct is unlikely to be sufficiently transparent to allow light-sheet microscopy. In the literature, studies of the Wolffian duct rely primarily on whole mounts, inclusions and cryosections. Besides the extra work involved, we are not convinced that this would be very informative. A key reason is that the epididymis is an organ that completes its differentiation after birth (Robaire and Hinton, 2015). Differentiation of mouse caput segment 1 has been reported to occur around 19DPN (Xu et al., 2016) and is intimately related to the development of the vasculature (Lebarr et al., 1986). Regarding the lymphatic network, Swingen et al. (2012) report that lymphangiogenesis in the mouse testis and epididymis is initiated late in gestation, after 15DPC. Videos showing the external lymphatic vessels of the testis and epididymis at 17.5DPC can be seen at https://doi.org/10.1371/journal.pone.0052620.s002. The authors indicate that lymphangiogenesis occurs via sprouting from the adjacent mesonephros. We hypothesize that the more internal lymphatics develop between birth and 10DPN, which corresponds to the time when we observed LEPC Lyve1pos cells.

      4) Immunofluorescence staining of VEGF factors was not convincing. As a secreted factor, VEGF will be secreted out of the cells, would it be detected more in the interstitium? I am always skeptical about the results of immunostaining secreted growth factors. Would it be possible to perform in situ or RNAscope to confirm the spatial expression pattern of VEGFs?

      Active VEGF factors result from alternative mRNA splicing events and post-translational proteolytic cleavage. Therefore, in our opinion, studying VEGF mRNA by in situ hybridization or RNAscope would not be very informative about the actual presence of active forms of VEGF in the epididymis. If necessary, we can provide as supplementary material immunohistochemistry data showing the presence of VEGF-A in the epididymal principal cells. Our major objective with these data was to show that VEGF factors and their respective receptors were present in the epididymis. Nevertheless, in an attempt to convince the reviewer, we provide as accompanying data to this rebuttal letter new sets of figures (Figures VEGF-A-response editor & VEGF-C/VEGF-D-response editor) that we believe improve the interpretation of our data. If the editorial office feels it is necessary, these figures could be added to the supplementary figure set (as Figure 6-figure supplement 1 and Figure 6-figure supplement 2). For VEGF-A, data already exist in the literature, as we have indicated (Korpelainen, 1998). Ultimately, our goal was not to show which cell types of the epididymal epithelium produce VEGFs, but rather that VEGF factors and their receptors were present to support angiogenic or lymphangiogenic activity in the tissue. In addition, because septa have been reported to constitute barriers between segments that restrict passive diffusion of molecules (Turner et al., 2003; Stammler et al., 2015), we hypothesize that the VEGF factors are produced locally.

      Figure VEGF-A - response editor: Immunofluorescence of the angiogenic ligand VEGF-A in the epididymis. Figure 6 shows that this ligand is mainly found in the caput, and more precisely in S1. It is very strongly expressed in the peritubular microvasculature of S1, which expresses the VEGFR3:YFP transgene, whereas it is less expressed by intertubular blood vessels (asterisk). This suggests that the peritubular vessels are mainly responsible for the angiogenic activity measured in our study. Furthermore, VEGF-A is expressed by the epithelium in secretory vesicles (IS, S3 and enlargement), in agreement with the in situ hybridization work of Korpelainen et al. (J. Cell Biol., 1998). The enlargement shown in S3_Z shows the sagittal plane of the tubule, where cells that strongly express VEGFR3:YFP are also VEGF-A positive, indicating that the same epithelial cells express both the receptor and the ligand. Here, the transgene is detected directly, without an anti-GFP antibody, which would otherwise enhance the signal.

      Figure VEGF-C / VEGF-D - response editor: Immunofluorescence of the lymphangiogenic ligands VEGF-C and VEGF-D in the epididymis. This figure shows that these ligands are mainly found in the interstitial tissue throughout the organ, with a higher proportion in the caudal part. This expression may be largely driven by fibroblasts, which are widely represented in the interstitium, or by endothelial cells, since both cell types express these two ligands. However, as shown in the figures and in the enlargement of panel A, VEGF-C is also produced by epithelial cells, within what appear to be secretory vesicles. In contrast, for VEGF-D, we observe only a few weakly positive epithelial cells (panel B). These ligands are also detected in the lumen of epididymal tubules (visible for VEGF-C, panel A, S2). This presence may be explained by lumicrine transfer from the testis, in addition to secretion by epithelial cells. Here, the transgene is detected directly, without an anti-GFP antibody, which would otherwise enhance the signal.

      5) The study is descriptive and does not provide functional and mechanistic insights. Maybe, the combination of 3D imaging with lineage tracing of endothelium cells or ligation study (removal/ligation of the certain vessel) would help better understand how the vascular network is established and their functional significance.

      The technical approaches suggested by the reviewer could certainly improve our understanding of the rather complex epididymal vascular network. Taken together, they represent the body of a comprehensive follow-up study that is worth undertaking.

      6) Immune response is among many physiological processes in which vascular networks play significant roles. Discussion would be needed in other physiological processes, such as tissue metabolism and stem/progenitor cell niche microenvironment.

      We agree with the reviewer that the mammalian vasculature is involved in other physiological processes beyond immune/inflammatory responses. We have deliberately chosen to focus our discussion on the inflammatory and immune context of the epididymis, as we believe this is the most relevant aspect. It is also in full agreement with the research that our team has been conducting for 15 years to understand the complex orchestration of tolerance versus immune surveillance in this territory. This is a finely tuned process that, if properly understood, can help to understand and appropriately treat clinical situations of infertility and/or urological problems. As our Discussion section is already quite long, we felt that extending it further to other aspects was not justified. However, in response to the reviewer's suggestion, we now mention at the end of the first paragraph of the Discussion that the epididymal vascular network is likely to serve different processes in this tissue (page 9, lines 299 to 303).

      7) How could the author determine the Cd-A labeled vessel in Fig 1 was an artery, not a vein? This leads to another critical question. Would it be possible to stain with artery and vein markers to help illustrate the blood flow directions of the vessel?

      The reviewer is right that we arbitrarily called the Cd-A vessel in Figure 1 an artery. We no longer use the acronym Cd-A. Instead, we use the acronym SEA (superior epididymal artery) to indicate what we firmly believe to be an artery, as also suggested by previous literature (e.g., Suzuki, 1982; Abe et al., 1982), in which this same structure has been consistently referred to as an artery. For other blood vessels, we now use the acronym "Cd-BV" because, as the reviewer rightly points out, we do not know whether we are dealing with a vein or an artery. This is clearly stated in the legend of Figure 1.

    1. Author Response:

      Reviewer #1:

      The manuscript “A computationally designed fluorescent biosensor for D-serine" by Vongsouthi et al. reports the engineering of a fluorescent biosensor for D-serine using the D-alanine-specific solute-binding protein from Salmonella enterica (DalS) as a template. The authors engineer a DalS construct that has the enhanced cyan fluorescent protein (ECFP) and the Venus fluorescent protein (Venus) as terminal fusions, which serve as donor and acceptor fluorophores in resonance energy transfer (FRET) experiments. The reporters should monitor a conformational change induced by solute binding through a change of the FRET signal. The authors combine homology-guided rational protein engineering, in-silico ligand docking and computationally guided, stabilizing mutagenesis to transform DalS into a D-serine-specific biosensor through iterative mutagenesis experiments. Functionality and solute affinity of modified DalS are probed using FRET assays. Vongsouthi et al. assess the applicability of the finally generated D-serine selective biosensor (D-SerFS) in situ and in vivo using fluorescence microscopy.

      Ionotropic glutamate receptors are ligand-gated ion channels that are importantly involved in brain development, learning, memory and disease. D-serine is a co-agonist of ionotropic glutamate receptors of the NMDA subtype. The modulation of NMDA signalling in the central nervous system through D-serine is poorly understood. Optical biosensors that can detect D-serine are lacking, and the development of such sensors, as proposed in the present study, is an important target in biomedical research.

      The manuscript is well written and the data are clearly presented and discussed. The authors appear to have succeeded in the development of D-serine-selective fluorescent biosensor. But some questions arose concerning experimental design. Moreover, not all conclusions are fully supported by the data presented. I have the following comments.

      1) In the homology-guided design two residues in the binding site were mutated to the ones of the D-serine specific homologue NR1 (i.e. F117L and A147S), which lead to a significant increase of affinity to D-serine, as desired. The third residue, however, was mutated to glutamine (Y148Q) instead of the homologous valine (V), which resulted in a substantial loss of affinity to D-serine (Table 1). This "bad" mutation was carried through in consecutive optimization steps. Did the authors also try the homologous Y148V mutation? On page 5 the authors argue that Q instead of V would increase the size of the side chain pocket. But the opposite is true: the side chain of Q is more bulky than the one of V, which may explain the dramatic loss of affinity to D-serine. Mutation Y148V may be beneficial.

      Yes, we have previously tested the mutation of position 148 to valine (V). We have now included these data in the paper as Supplementary Information Figure 1 (below). The fluorescence titration showed that the 148V variant displayed poor D-serine specificity compared to Q at the same position (the sequence background of the variant was F117L/A147S/D216E/A76D). Thus, Q was superior to V at this position, and V was not taken forward for further engineering. In the text, we meant that Q would increase the size of the side chain pocket relative to the wild-type amino acid, Y. We can see that this is unclear and have updated this sentence.

      Supplementary Figure 1. Dose-response curves for F117L/A147S/Y148V/D216E/A76D (LSVED) with glycine, D-alanine and D-serine. Values are the (475 nm/530 nm) fluorescence ratio as a percentage of the same ratio for the apo sensor. No significant change is detected in response to glycine. The KD values for D-alanine and D-serine are estimated to be > 4000 mM based on fitting curves with the following equation:

      2) Stabilities of constructs were estimated from melting temperatures (Tm) measured by thermal denaturation, probed using the FRET signal of ECFP/Venus fusions. I am not sure this methodology is appropriate for determining the thermal stabilities of DalS and its mutants. Thermal unfolding of the fluorescent labels ECFP and Venus, and their intrinsic, presumably strongly temperature-dependent emission intensities, will interfere, and deconvolving the signals will be difficult. It would be helpful to see raw data from these measurements. All stabilities are reported in terms of ΔTm. What is the absolute Tm of the reference protein DalS? How does the thermal stability of DalS compare with those of ECFP and Venus? A more reliable probe of thermal stability would be the far-UV circular dichroism (CD) spectroscopic signal of DalS without fusions. DalS is a largely helical domain and will show a strong CD signal.

      We agree that raw data for the thermal denaturation experiments should be shown and have included them in the supporting information of an updated manuscript (Supplementary Data Figure 7). The data plot the ECFP/Venus fluorescence ratio against temperature. When the temperature is increased from 20 to 90 °C, we observe two transitions in the ECFP/Venus fluorescence ratio. The fluorescent proteins are more thermostable than the DalS binding protein, and their transition does not vary between variants (~90 °C); thus, the first transition corresponds to the unfolding of the binding protein and the second to the unfolding of, or loss of fluorescence from, the fluorescent proteins. This is an appropriate method for characterising the thermostability of the binding protein in the sensor for two main reasons. Firstly, the melting temperature calculated from the first sigmoidal transition changes upon mutation of the binding protein in a predictable way (e.g. mutations to the binding site/protein core are destabilising), while the second transition occurs consistently at ~90 °C. This supports the assignment of the first transition to the unfolding of the binding protein. Secondly, characterising the stability of the binding protein in the context of the full sensor is more relevant to the end-application. Excising the binding domain and testing it in isolation would result in data that are not directly relevant to the sensor. The absolute thermostabilities of all variants can be found in Table 1 of the manuscript.

      Supplementary Figure 7. The (475 nm/530 nm) fluorescence ratio as a function of increasing temperature (20 – 90 °C) for key variants in the engineering trajectory of D-serFS. Values are normalised as a percentage of the same ratio for the sensor at 20 °C and are represented as mean ± s.e.m. (n = 3). The first sigmoidal transition in the data changes upon mutation to the binding protein while the second transition begins at ~ 90 °C for all variants. The second transition is not observed in full as the upper temperature limit for the experiment is 90 °C.
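      One simple way to read a melting temperature off the first transition in data like these is sketched below. It takes the temperature at which the normalised ratio crosses half-way between its values at the start and end of a window that stops before the ~90 °C fluorescent-protein transition. This half-way interpolation is a simplification swapped in for a full sigmoid fit. The window bound `t_max` and the synthetic curve are assumptions for illustration, not values from the study.

```python
import math


def first_transition_tm(temps, ratio, t_max):
    """Estimate Tm of the first transition as the temperature where the signal
    crosses half-way between its values at the start and end of the window.

    temps, ratio: paired measurements (deg C, normalised 475/530 ratio).
    t_max: hypothetical upper bound (deg C) chosen to exclude the ~90 deg C
    fluorescent-protein transition.
    """
    pts = [(t, r) for t, r in zip(temps, ratio) if t <= t_max]
    start, end = pts[0][1], pts[-1][1]
    half = (start + end) / 2
    for (t0, r0), (t1, r1) in zip(pts, pts[1:]):
        # A sign change (or touch) of r - half brackets the midpoint.
        if r0 != r1 and (r0 - half) * (r1 - half) <= 0:
            return t0 + (half - r0) * (t1 - t0) / (r1 - r0)  # linear interpolation
    raise ValueError("no half-way crossing below t_max")


# Synthetic sigmoid with its midpoint at 60 deg C (illustration only).
temps = list(range(20, 85))
ratio = [100 + 40 / (1 + math.exp(-(t - 60) / 2)) for t in temps]
print(round(first_transition_tm(temps, ratio, t_max=84), 2))  # prints 60.0
```

      The function works whether the ratio rises or falls through the transition. A fitted sigmoid would additionally give an uncertainty on Tm, which this sketch does not.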

      3) The final construct D-SerFS has a dynamic range of only 7%, which is a low value. It seems that the FRET signal change caused by ligand binding to the construct is weak. Is it sufficient to reliably measure D-serine levels in-situ and in-vivo?

      First, we have modified the sensor, which now has a dynamic range of 14.7% (Figure 5, below). The magnitude of the change is reasonable for this sensor class: these sensors function with a relatively low dynamic range, but because they are ratiometric they remain accurate despite it. For example, the glycine sensor GlyFS published in 2018 (Nature Chem. Biol.) has one of the highest dynamic ranges in this sensor class, at only ~28%. The glutamate sensor described by Okumoto et al. (2005) (PNAS, 102, 8740) has a dynamic range of ~9%. So the FRET change is not low for ratiometric sensors of this class (which have been used very effectively for over a decade). Most importantly, the data from experiments with biological tissue and in vivo (Fig. 6) demonstrate a detectable (and statistically significant) response to changes in D-serine concentration in tissue.

      Figure 5. Characterization of full-length D-serFS. (A) Schematic showing the ECFP (blue), D-serFS binding protein (D-serFS BP; grey) and Venus (yellow) domains in D-serFS. The C-terminal residues of the Venus fluorescent protein sequence are labelled, showing the truncated (top) and full-length (bottom) C-terminal sequences. The underlined amino acids in truncated D-serFS represent residues introduced from the backbone vector sequence during cloning. Represents the STOP codon. (B) Sigmoidal dose response curves for truncated and full-length D-serFS with D-serine (n = 3). Values are the (475 nm/530 nm) fluorescence ratio as a percentage of the same ratio for the apo sensor. (C) Binding affinities (M) determined by fluorescence titration of truncated and full-length D-serFS, for glycine, D-alanine and D-serine (n = 3).*
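      The affinities in panel C come from fluorescence titrations. The sketch below shows one way a KD can be extracted from such a titration. It fits a single-site binding model by grid search over KD, and at each trial KD it solves for the baseline and amplitude by linear least squares. The model form, the assumption that free ligand equals added ligand, and the data are all assumptions for illustration. The fitting equation used in the study is not reproduced here.

```python
def occupancy(conc, kd):
    """Fraction of sensor bound for a single site with dissociation constant kd."""
    return [c / (kd + c) for c in conc]


def fit_one_site(conc, resp, kd_grid):
    """Grid-search kd; for each trial kd, solve r0 and dr by ordinary least squares.

    Assumed model: resp = r0 + dr * c / (kd + c), with free ligand ~ total ligand.
    Returns (kd, r0, dr) with the smallest sum of squared residuals.
    """
    best = None
    n = len(conc)
    for kd in kd_grid:
        f = occupancy(conc, kd)
        mf, mr = sum(f) / n, sum(resp) / n
        sff = sum((x - mf) ** 2 for x in f)
        sfr = sum((x - mf) * (y - mr) for x, y in zip(f, resp))
        dr = sfr / sff        # slope = amplitude of the ratio change
        r0 = mr - dr * mf     # intercept = apo ratio
        sse = sum((y - r0 - dr * x) ** 2 for x, y in zip(f, resp))
        if best is None or sse < best[0]:
            best = (sse, kd, r0, dr)
    return best[1:]


# Hypothetical titration (uM) generated with kd = 7, r0 = 100 %, dr = 15 %.
conc = [0, 0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500]
resp = [100 + 15 * c / (7 + c) for c in conc]
kd_grid = [10 ** (k / 200) for k in range(-400, 801)]  # 0.01 to 10,000 uM, log-spaced
kd, r0, dr = fit_one_site(conc, resp, kd_grid)
print(f"KD ~ {kd:.2f} uM, apo = {r0:.1f} %, change = {dr:.1f} %")
```

      Splitting the problem this way keeps the example dependency-free. A nonlinear optimiser over all three parameters is the more usual choice in practice.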

      In Figure 5H in-vivo signal changes show large errors and the signal of the positive sample is hardly above error compared to the signal of the control.

      We have removed the in vivo data. Regardless, the comment is incorrect. Statistical analysis confirms that there is no significant change in the control (P = 0.08411), whereas the change for the sample with D-serine was significant to P = 0.00998.

      “H) ECFP/Venus ratio recorded in vivo in control recordings (left panel, baseline recording first, control recording after 10 minutes; paired two-sided Student’s t-test vs. baseline, t(6) = -2.07, P = 0.08411; n = 6 independent experiments) and during D-serine application (right panel, baseline recording first, second recording after D-serine injection, 1 mM; paired two-sided Student’s t-test vs. baseline, t(3) = -5.85, P = 0.00998; n = 4 independent experiments). Values are mean ± s.e.m. throughout. **P < 0.01.”

      Figure 5G is unclear. What does the fluorescence image show?

      We have removed the in-vivo data from the manuscript. However, Figure 6 in the original manuscript shows a schematic of how the sensor is applied to the brain for in-vivo experiments (biotin injection, followed by sensor injection and then imaging). The fluorescence image shows the detected Venus fluorescence following pressure loading of the sensor into the brain.

      Work presented in this manuscript that assesses functionality and applicability of the developed sensor in-situ and in-vivo is limited compared to the work showing its design. For example, control experiments showing FRET signal changes of the wild-type ECFP-DalS-Venus construct in comparison to the designed D-SerFS would be helpful to assess the outcome.

      Indeed, the in situ and in vivo work was never the focus of the study, which is already a large paper. To avoid confusion, the in vivo work is now omitted, and the in situ work is presented as a proof of principle that the sensor can be used to image D-serine. We reiterate: this is a protein engineering paper, not a neuroscience paper.

      4) The FRET spectra shown in Supplementary Figure 2, which exemplify the measurement of fluorescence ratios of ECFP/Venus, are confusing. I cannot see a significant change of FRET upon application of ligand. The ratios of the peak fluorescence intensities of ECFP and Venus (scanned from the data shown in Supplementary Figure 2) are the same for apo states and the ligand-saturated states. Instead what happens is that fluorescence emission intensities of both the donor and the acceptor bands are reduced upon application of ligand.

      We thank the reviewer for bringing this to our attention. The spectra were not normalised to account for the effect of dilution when saturating with ligand, which gave rise to an apparent decrease in emission intensity from both ECFP and Venus. We also recognise that the figure is hard to interpret when both variants are displayed on the same axes, so we have separated them in an updated figure (shown below) and normalised the data as a percentage of the maximum emission intensity from ECFP at 475 nm. This has been changed in the supporting information of an updated manuscript. We hope it is now clear that there is a ratiometric change upon addition of ligand.

      Figure 3. Emission spectra (450 – 550 nm) of (A) LSQED and (B) LSQED-T197Y (LSQEDY) upon excitation of ECFP (λexc = 433 nm), normalised to the maximum emission intensity from ECFP (475 nm). For all sensor variants, the FRET efficiency decreases in response to saturation with D-serine (A, B; orange), leading to decreased emission from Venus (530 nm) relative to ECFP (475 nm). When comparing the apo states of LSQED and LSQEDY (A, B; dark green), it can be seen that the T197Y mutation results in decreased Venus emission (lower FRET efficiency). This suggests a shift in the apo population of the sensor towards the spectral properties of the saturated, closed state and explains the decreased dynamic range of LSQEDY compared with LSQED. Values are mean ± s.e.m. (n = 3).
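      The effect of dilution discussed in the response above can be shown with a short sketch. Scaling both emission channels by the same factor leaves the 475 nm/530 nm ratio unchanged, so a ratiometric readout is insensitive to uniform intensity changes. The intensities and the dilution factor below are hypothetical. "Ratio change" is taken here as the change on saturation expressed as a percentage of the apo ratio, in line with how the dose-response values are normalised to the apo sensor.

```python
def emission_ratio(ecfp_475, venus_530):
    """ECFP/Venus (475 nm / 530 nm) emission ratio."""
    return ecfp_475 / venus_530


def ratio_change_percent(ratio_apo, ratio_sat):
    """Ratio change on saturation, as a percentage of the apo ratio."""
    return 100.0 * (ratio_sat - ratio_apo) / ratio_apo


# Hypothetical peak intensities (arbitrary units), for illustration only.
apo = emission_ratio(100.0, 80.0)
sat = emission_ratio(100.0, 70.0)   # less FRET: Venus drops relative to ECFP
dilution = 0.9                      # e.g. volume added along with the ligand
sat_diluted = emission_ratio(100.0 * dilution, 70.0 * dilution)

print(round(ratio_change_percent(apo, sat), 1))  # 14.3
print(abs(sat_diluted - sat) < 1e-12)            # True
```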

      Reviewer #2:

      The authors describe the development and use of a D-serine sensor based on a periplasmic ligand-binding protein (DalS) from Salmonella enterica, combined with a FRET readout between enhanced cyan fluorescent protein and Venus fluorescent protein. They rationally identify point mutations in the binding pocket that make the binding protein somewhat more selective for D-serine over glycine and D-alanine. Ligand docking into the binding site, as well as algorithms for increasing stability, identified further mutants with higher thermostability and higher affinity for D-serine. The combined computational efforts led to a sensor with higher affinity for D-serine (Kd = ~7 µM), but which also showed affinity for the native ligand D-alanine (Kd = ~13 µM) and for glycine (Kd = ~40 µM). Molecular simulations were then used to explain how remote mutations identified in the thermostability screen could lead to the observed alteration of ligand affinity. Finally, D-SerFS was tested with 2P imaging in hippocampal slices and in anesthetized mice, using biotin-streptavidin to anchor exogenously applied purified protein sensor to the brain tissue and pipetting on saturating concentrations of D-serine.

      Although presented as the development of a sensor for biology, this work primarily focuses on the application of existing protein engineering techniques to alter the ligand affinity and specificity of a ligand-binding protein domain. The authors are somewhat successful in improving specificity for the desired ligand, but much context is lacking. For any such engineering effort, the end goals should be laid out as explicitly as possible. What sorts of biological signals do they desire to measure? On what length scale? On what time scale? What is known about the concentrations of the analyte and potential competing factors in the tissue? Since the authors do not demonstrate the imaging of any physiological signals with their sensor and do not discuss in detail the nature of the signals they aim to see, the reader is unable to evaluate what effect (if any) all of their protein engineering work had on their progress toward the goal of imaging D-serine signals in tissue.

      As a paper describing a combination of protein engineering approaches to alter the ligand affinity and specificity of one protein, it is a relatively complete work. In its current form trying to present a new fluorescent biosensor for imaging biology it is strongly lacking. I would suggest the authors rework the story to exclusively focus on the protein engineering or continue to work on the sensor/imaging/etc until they are able to use it to image some biology.

      Additional Major Points:

      1) There is no discussion of why the authors chose to use non-specific chemical labeling of the tissue with NHS-biotin to anchor their sensor vs. genetic techniques to get cell-type specific expression and localization. There is no high-resolution imaging demonstrating that the sensor is localized where they intended.

      We use non-specific chemical labelling for proof-of-concept experiments that show the sensor can respond to changes in D-serine concentration in the extracellular environment of brain tissue. Cell-type specific expression of the sensor is possible based on our previous development of a similar sensor for glycine (Zhang et al., 2018; doi: https://doi.org/10.1038/s41589-018-0108-2) where the sensor was expressed by HEK293 cells and neurons, and targeted to the membrane. However, this is beyond the scope of this manuscript. Figure 5G of the original manuscript shows that the sensor (identified by Venus fluorescence) is localized to the area where D-serFS is pressure-loaded into the brain.

      2) Why does the fluorescence of both the CFP and the YFP decrease upon addition of ligand (see e.g. Supplementary Figure 2)? Were these samples at the same concentration? Is this really a FRET sensor or more of an intensiometric sensor? Is this also true with 2P excitation? How does the Venus fluorescence change when Venus is excited directly? Perhaps fluorescence lifetime measurements could help inform what is happening.

      Please see response to major comments from reviewer #1 and Figure 3. We hope this clarifies that the sensor is ratiometric. The sensor behaves similarly under two-photon excitation (2PE) as shown in Figure 5A.

      3) How reproducible are the spectral differences between LSQED and LSQED-T197Y? Only one trace for each is shown in Supplementary Figure 2 and the differences are very small, but the authors use these data to draw conclusions about the protein open-closed equilibrium.

      We have updated this to show data points representing the mean ± s.e.m (n = 3).

      4) The first three mutations described are arrived upon by aligning DalS (which is more specific for D-Ala) with the NMDA receptor (which binds D-Ser). The authors then mutate two of the ligand pocket positions of DalS to the same amino acid found in NMDAR, but mutate the third position to glutamine instead of valine. I really can't understand why they don't even test Y148V if their goal is a sensor that hopefully detects D-Ser similar to the native NMDAR. I'm sure most readers will have the same confusion.

      Please see our response to the major comments from reviewer #1. Additionally, while the NR1 binding domain of the NMDAR was used as a structural guide for rational design of the DalS binding site, the high affinity of the NMDAR for both D-serine and glycine was not desirable in a D-serine-specific sensor.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors ask an interesting question: whether working memory contains more than one conjunctive representation of the multiple task features required for a future response, with one of these representations being more likely to become relevant at the time of the response. With RSA, the authors use a multivariate approach that seems to be becoming the standard in modern EEG research.

      We appreciate the reviewer’s helpful comments on the manuscript and their encouraging comments regarding its potential impact.

      I have three major concerns that are currently limiting the meaningfulness of the manuscript: For one, the paradigm uses stimuli with properties that could potentially influence involuntary attention and interfere in a Stroop-like manner with the required responses (i.e., 2 out of 3 cues involve the terms "horizontal" or "vertical" while the stimuli contain horizontal and vertical bars). It is not clear to me whether these potential interactions might bring about what is identified as conjunctive representations or whether they cause these representations to be quite weak.

      We agree it is important to rule out any effects of involuntary attention that might have been elicited by our stimulus choices. To address the Reviewer’s concern, we conducted control analyses to test if there was any influence of Stroop-like interference on our measures of behavior or the conjunctive representation. To summarize these analyses (detailed in our responses below and in the supplemental materials), we found no evidence of the effect of compatibility on behavior or on the decoding of conjunctions during either the maintenance or test periods. Furthermore, we found that the decoding of the bar orientation was at chance level during the interval when we observe evidence of the conjunctive representations. Thus, we conclude that the compatibility of the stimuli and the rule did not contribute to the decoding of conjunctive representations or to behavior.

      Second, the relatively weak conjunctive representations are making it difficult to interpret null effects such as the absence of certain correlations.

      The reviewer is correct that we cannot draw strong conclusions from null findings. We have revised the main text accordingly. In certain cases, we have also included additional analyses. These revisions are described in detail in response to the reviewer’s comments below.

      Third, if the conjunctive representations truly are reflections of working memory activity, then it would help to include a control condition where memory load is reduced so as to demonstrate that representational strength varies as a function of load. Depending on whether these concerns or some of them can be addressed or ruled out this manuscript has the potential of becoming influential in the field.

      This is a clever suggestion for further experimentation. We agree that observing an adverse effect of memory load is one robust way for future studies to assess the contribution of the working memory system. However, given that decoding is noisy during the maintenance period (particularly for the low-priority conjunctive representation) even with a relatively low set size, we expect that further manipulating load would require substantially altering the research design. Because the main goal of the current study is to examine prioritization and post-encoding selection of action-related information, we focused on the minimum set size required for this question (i.e., load 2). However, we now note this load manipulation as a direction for future research in the discussion (pg. 18).

      Reviewer #2 (Public Review):

      Kikumoto and colleagues investigate the way visual-motor representations are stored in working memory and selected for action based on a retro-cue. They make use of a combination of decoding and RSA to assess at which stages of processing sensory, motor, and conjunctive information (consisting of sensory and motor representations linked via an S-R mapping) are represented in working memory and how these mental representations are related to behavioral performance.

      Strengths

      This is an elaborate and carefully designed experiment. The authors are able to shed further light on the type of mental representations in working memory that serve as the basis for the selection of relevant information in support of goal-directed actions. This is highly relevant for a better understanding of the role of selective attention and prospective motor representations in working memory. The methods used could provide a good basis for further research in this regard.

      We appreciate these helpful comments and the Reviewer’s positive comments on the impact of the work.

      Weaknesses

      There are important points requiring further clarification, especially regarding the statistical approach and interpretation of results.

      • Why is there a conjunction RSA model vector (b4) required, when all information for a response can be achieved by combining the individual stimulus, response, and rule vectors? In Figure 3 it becomes obvious that the conjunction RSA scores do not simply reflect the overlap of the other three vectors. I think it would help the interpretation of results to clearly state why this is not the case.

      Thank you for the suggestion; we have now added the theoretical background that motivated us to include the RSA model of conjunctive representation (pg. 4 and 5). In particular, several theories of cognitive control have proposed that, over the course of action planning, the system assembles an event (task) file that binds all task features at all levels, including the rule (i.e., context), stimulus, and response, into an integrated, conjunctive representation that is essential for an action to be executed (Hommel 2019; Frings et al. 2020). Similarly, neural evidence from non-human primates suggests that cognitive tasks requiring context-dependency (e.g., flexible remapping of inputs to different outputs based on the context) recruit nonlinear conjunctive representations (Rigotti et al. 2013; Parthasarathy et al. 2019; Bernardi et al. 2020; Panichello and Buschman, 2021). Supporting these views, we previously observed that conjunctive representations emerge in the human brain during action selection and uniquely explain behavior such as action transition costs (Kikumoto & Mayr, 2020; see also Rangel, Hazeltine & Wessel, 2022) or the successful cancelation of actions (Kikumoto & Mayr, 2022). Using the same set of RSA models, the current study attempts to extend this role of conjunctive representations to the planning and prioritization of future actions. As in the previous studies (and as noted by the reviewer), the conjunction model makes a unique prediction about the similarity (or dissimilarity) pattern of the decoder outputs: each specific action instance is distinct from all other actions. This contrasts with the RSA models of lower-level features, which predict similar activity patterns for instances that share the same feature (e.g., S-R mappings 1 to 4 share the diagonal rule context). Here, we generally replicate the previous findings of unique trajectories of conjunctive representations (Figure 3) and their unique contribution to behavior (Figure 5).
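      The sketch below illustrates why the conjunction model is not redundant with the rule, stimulus, and response models. It assumes a hypothetical design of three rules and four stimuli, with a made-up S-R mapping (response = (stimulus + rule) mod 4). The real task mapping and the decoder are not reproduced. For each true class, each model marks which of the 12 decoder output classes it predicts to be similar. The conjunction vector is then regressed on the three feature vectors plus an intercept, and a substantial share of its variance is left unexplained, because the conjunction codes an interaction rather than a weighted sum of features.

```python
import itertools

import numpy as np

RULES = range(3)    # hypothetical: three rule contexts
STIMULI = range(4)  # hypothetical: four stimuli


def response(rule, stim):
    # Hypothetical S-R mapping for illustration; not the task's actual mapping.
    return (stim + rule) % 4


CLASSES = list(itertools.product(RULES, STIMULI))  # 12 rule-by-stimulus instances


def model_vectors():
    """One row per (true class, decoder output class) pair; columns are the
    rule, stimulus, response, and conjunction model predictions (1 = similar)."""
    rows = []
    for r_i, s_i in CLASSES:
        for r_j, s_j in CLASSES:
            rows.append([
                float(r_i == r_j),
                float(s_i == s_j),
                float(response(r_i, s_i) == response(r_j, s_j)),
                float((r_i, s_i) == (r_j, s_j)),  # conjunction: only the exact instance
            ])
    return np.array(rows)


rule, stim, resp, conj = model_vectors().T
design = np.column_stack([np.ones_like(rule), rule, stim, resp])
coef, *_ = np.linalg.lstsq(design, conj, rcond=None)  # best linear combination
fitted = design @ coef
r2 = 1 - np.sum((conj - fitted) ** 2) / np.sum((conj - conj.mean()) ** 2)
print(f"R^2 = {r2:.3f}")  # prints R^2 = 0.673
```

      Under this assumed design, about a third of the variance in the conjunction model cannot be written as a weighted sum of the feature models. The exact figure depends on the design and mapping.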

      • One of the key findings of this study is the reliable representation of the conjunction information during the preparation phase while there is no comparable effect evident for response representations. This might suggest that two potentially independent conjunctive representations can be activated in working memory and thereby function as the basis for later response selection during the test phase. However, the assumption of the independence of the high and low priority conjunction representations relies only on the observation that there was no statistically reliable correlation between the high and low priority conjunctions in the preparation and test phases. This assumption is not valid because non-significant correlations do not allow any conclusion about the independence of the two processes. A comparable problem appeared regarding the non-significant difference between high and low-priority representations. These results show that it was not possible to prove a difference between these representations prior to the test phase based on the current approach, but they do not unequivocally "suggest that neither action plan was selectively prioritized".

      We appreciate this important point. We have taken care in the revision to state that we find evidence of an interference effect for the high-priority action and do not find evidence of such an effect for the low-priority action. We do not intend to conclude that no such effect could exist. Further, although it is not our intention to draw a strong conclusion from the null effect (i.e., no correlations), we performed an exploratory analysis in which we tested the correlation in trials where we observed strong evidence of both conjunctions. Specifically, we split trials at the median within each time point and individual subject, and performed the multilevel model analysis using trials where both the high and low priority conjunctions were above their medians. Thus, we selected trials in a way that is independent of the effect we are testing. The figure below shows the coefficient associated with the low-priority conjunction predicting the high-priority conjunction (uncorrected). Even when we focus on trials where both conjunctions are detected (i.e., a high signal-to-noise ratio), we observed no tradeoff. Again, we cannot draw strong conclusions from the null result of this exploratory analysis. Yet we can rule out some causes of the absent correlation between high and low priority conjunctions, such as a poor signal-to-noise ratio for the low priority conjunctions. We have further clarified this point in the Results (pg. 14).

      Fig. 1. Trial-to-trial variability between high and low priority conjunctions, using above-median trials. Coefficients of the multilevel regression model predicting trial-to-trial variability in the high-priority conjunction from the low-priority conjunction.
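      The trial selection step of this exploratory analysis can be sketched as follows. This is a minimal illustration that assumes one list of high-priority and one list of low-priority conjunction RSA scores for a single subject and time point. The scores are hypothetical, and the subsequent multilevel regression is not shown.

```python
from statistics import median


def above_median_both(high, low):
    """Indices of trials where both the high- and low-priority conjunction
    scores exceed their own medians (one subject, one time point)."""
    m_high, m_low = median(high), median(low)
    return [i for i, (h, l) in enumerate(zip(high, low)) if h > m_high and l > m_low]


# Hypothetical RSA scores for six trials.
high = [0.10, 0.40, 0.20, 0.50, 0.05, 0.30]
low = [0.30, 0.20, 0.05, 0.40, 0.10, 0.50]
print(above_median_both(high, low))  # [3, 5]
```

      Each score is compared only with its own median, so the selection rule does not use the high/low relationship that the regression then tests.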

      • The experimental design used does not allow for a clear statement about whether pure motor representations in working memory only emerge with the definition of the response to be executed (test phase). It is not evident from Figure 3 that the increase in the RSA scores strictly follows the onset of the Go stimulus. It is also conceivable that the emergence of a pure motor representation requires a longer processing time. This could only be investigated through temporally varying preparation phases.

      We agree with the reviewer. Although we detected no evidence of response representations for either the high or the low priority action plan during the preparation phase, t(1,23) = -.514, beta = .002, 95% CI [-.010 .006] for high priority; t(1,23) = -1.57, beta = -.008, 95% CI [-.017 .002] for low priority, this may be limited by the relatively short duration of the delay period (750 ms) in this study. However, in our previous studies using a similar paradigm without a delay period (Kikumoto & Mayr, 2020; Kikumoto & Mayr, 2022), response representations were detected less than 300 ms after the response was specified, which corresponds to the onset of the delay period in this study. Further, participants in the current study were encouraged to prepare responses as early as possible, using adaptive response deadlines and performance-based incentives. Thus, we know of no reason why responses would take longer to prepare in the present study, but we agree that we cannot rule this out. We have added the caveat noted above, as well as this additional context, in the discussion (pg. 16-17).

      • Inconsistency of statistical approaches: In the methods section, the authors state that they used a cluster-forming threshold and a cluster-significance threshold of p < 0.05. In the results section (Figure 4) a cluster p-value of 0.01 is introduced. Although this concerns different analyses, varying threshold values appear as if they were chosen in favor of significant results. The authors should either proceed consistently here or give very good reasons for varying thresholds.

      We thank the reviewer for noting this oversight. All reported significant clusters with cluster P-value were identified using a cluster-forming threshold, p < .05. We fixed the description accordingly.
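      The thresholds described above can be made concrete with a generic one-sample cluster-based permutation test. This sketch uses sign flipping and cluster mass (summed t) and is not the analysis code used in the study. It assumes 24 participants, so the cluster-forming threshold of two-sided p < .05 becomes a critical t of about 2.069 for 23 degrees of freedom. It also tests the positive tail only and uses simulated scores.

```python
import numpy as np


def t_values(x):
    """One-sample t statistic against zero at each time point; x is (subjects, times)."""
    n = x.shape[0]
    return x.mean(axis=0) / (x.std(axis=0, ddof=1) / np.sqrt(n))


def clusters(t, thresh):
    """Contiguous runs [start, stop) where t exceeds the cluster-forming threshold."""
    runs, start = [], None
    for i, above in enumerate(t > thresh):
        if above and start is None:
            start = i
        elif not above and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(t)))
    return runs


def cluster_test(x, thresh=2.069, n_perm=1000, seed=0):
    """Sign-flip cluster-mass permutation test (positive tail only).

    thresh ~ two-sided p < .05 critical t for 23 df (assumes 24 subjects).
    Returns (start, stop, cluster p-value) for each observed cluster.
    """
    rng = np.random.default_rng(seed)
    t_obs = t_values(x)
    observed = [(a, b, t_obs[a:b].sum()) for a, b in clusters(t_obs, thresh)]
    null_max = np.empty(n_perm)
    for k in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=(x.shape[0], 1))  # flip whole subjects
        t_perm = t_values(x * signs)
        masses = [t_perm[a:b].sum() for a, b in clusters(t_perm, thresh)]
        null_max[k] = max(masses, default=0.0)  # largest cluster mass in this permutation
    return [(a, b, (np.sum(null_max >= m) + 1) / (n_perm + 1)) for a, b, m in observed]


rng = np.random.default_rng(1)
scores = rng.normal(0.0, 1.0, size=(24, 50))  # hypothetical RSA scores, 50 time points
scores[:, 20:30] += 1.5                       # injected effect for illustration
print(cluster_test(scores, n_perm=500))
```

      The cluster-forming threshold only decides which time points are grouped. The cluster p-value comes from comparing each observed cluster mass with the permutation distribution of the largest cluster mass. In this version, both thresholds are fixed before the data are seen.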

      • Interpretation of results: The significant time window for the high vs. low priority by test-type interaction appeared quite late for the conjunction representation. First, it does not seem reasonable that such an effect appears in a time window overlapping with the motor responses. But more importantly, why should it appear after the respective interaction for the response representation? When keeping in mind that these results are based on a combination of time-frequency analysis, decoding, and RSA (quite many processing steps), I find it hard to really see a consistent pattern in these results that allows for a conclusion about how higher-level conjunctive and motor representations are selected in working memory.

      Thank you for raising this important point. First, we fixed the reported methodological inconsistencies, such as the cluster P-value and cluster-forming threshold. Further, we fully agree that the difference in time course between the response and conjunctive representations in the low priority, tested condition is unexpected and would complicate the perspective that the conjunctive representation contributes to efficient response selection. However, additional analysis indicates that this apparent pattern in the stimulus-locked result is misleading and that there is a more parsimonious explanation. First, we wish to caution that the data are relatively noisy and are likely influenced by different frequency bands for different features. Thus, fine-grained temporal differences should be interpreted with caution in the absence of positive statistical evidence of an interaction over time. Indeed, although Figure 4 in the original submission shows a quantitative difference in the timing of the interaction effect (priority by test type) between the conjunctive and response representations, the direct test of this four-way interaction [priority x test type x representation type (conjunction vs. response) x time interval (1500 ms to 1850 ms vs. 1850 to 2100 ms)] is not significant, t(1,23) = 1.65, beta = .058, 95% CI [-.012 .015]). The same analysis using response-aligned data is also not significant, t(1,23) = -1.24, beta = -.046, 95% CI [-.128 .028]). These observations did not depend on the choice of time interval, as other time intervals were also not significant. Therefore, we do not have strong evidence of a true timing difference between these conditions and believe the apparent difference is likely driven by noise.

      Further, we believe that the apparent late emergence of a difference between the two conjunctions when the low priority action is tested is more likely due to a slow decline in the strength of the untested high priority conjunction than to a late emergence of the low priority conjunction. This pattern is clearer when the traces are aligned to the response. The low priority conjunction emerges early and is sustained when it is the tested action, and declines when it is untested (-226 ms to 86 ms relative to response onset, cluster-forming threshold, p < .05). These changes eventually result in a significant difference in strength between the tested and untested low priority conjunctions just before the response is committed (Figure 4 - figure supplement 1, right column of the middle row, black bars at the top of the panel). Importantly, the high priority conjunction also remains active when untested and declines later than the untested low priority conjunction does. Indeed, the untested high priority conjunction does not decline significantly relative to trials in which it is tested until after the response is emitted (Figure 4 - figure supplement 1, right column of the middle row, red bars at the top of the panel). This produces a late-emerging interaction effect of priority and test type, but it is not due to a late-emerging low priority conjunctive representation.

      In summary, we do not have statistical evidence of a time by effect interaction that would allow us to draw strong inferences about timing. Nonetheless, even the patterns we observe are inconsistent with a late-emerging low priority conjunctive representation; if anything, they support a late decline in the untested high priority conjunctive representation. This pattern, in which the high priority conjunction is sustained until late even when it is untested, is also notable in light of our observation that the strength of the high priority conjunctive representation interferes with behavior when the low priority item is tested, but not vice versa. We now address this point about timing directly in the results (pg. 15-16) and the discussion (pg. 21), and we include the response-locked results in the main text alongside the stimulus-locked results, including the exploratory analyses reported here.

      Reviewer #3 (Public Review):

      This study aims to address the important question of whether working memory can hold multiple conjunctive task representations. The authors combined a retro-cue working memory paradigm with their previous task design that cleverly constructed multiple conjunctive tasks with the same set of stimuli, rules, and responses. They used advanced EEG analyses to characterize the temporal dynamics of concurrent working memory representations of multiple tasks and task features (e.g., stimuli and responses), and how their representational strength changes as a function of priority and task relevance. The results generally support the authors' conclusion that multiple task representations can be simultaneously manipulated in working memory.

      We appreciate these helpful comments, and were pleased that the reviewer shares our view that these results may be broadly impactful.

    1. Author Response

      Reviewer #2 (Public Review):

      Reviewer #2 was critical of every aspect of our manuscript and we were disappointed that they failed to appreciate the significance of our findings. However, we have responded to each point as described below:

      1) The experiment displayed in Figure 5 is deeply flawed for multiple reasons and should be removed from the manuscript entirely. A Michaelis-Menten plot shows the initial rate of a reaction against substrate concentration. Instead, the authors plotted the fraction of SsrB that is phosphorylated after 10 minutes at various substrate concentrations. Such a plot must reach saturation because the enzyme is limiting, whereas it is not always possible to achieve saturation in a genuine Michaelis-Menten plot. Because no reaction rates were measured, it is not possible to derive kcat values from the data.

      Mea culpa. We now plot our phosphorylation data, describe the mid-point as a k0.5, and have removed Fig. 1g. When we directly compare the H12 mutant to wt at neutral pH, its phosphorylation level is lower than that of wt (see new Fig. 4a). Wt phosphorylation is reduced at acid pH (Fig 4b), but with H12Q there was no difference in phosphorylation between neutral and acid pH (Fig 4c). It is important to include these data because in RcsB, a close homolog of SsrB, an H12A mutant was not phosphorylated by acetyl phosphate and was incapable of binding to DNA, unlike what we show here for SsrB.

      (i) Increasing the concentration of the phosphoramidate substrate increased ionic strength. Response regulator active sites contain many charged moieties, and autophosphorylation of at least one response regulator (CheY) is inhibited by increasing ionic strength (PMID 10471801).

      The reviewer raises some interesting points, which are based on CheY phosphorylation by small molecules. We have a long history of studying OmpR and SsrB, as well as other RRs, and we know that they can all behave very differently from “canonical signaling”. We examined the effect of ionic strength on SsrB phosphorylation and found it to be relatively insensitive to changes in ionic strength (our original buffer ranged from 267 to 430 mOsm, and in each case we observed 90% phosphorylation). Nevertheless, we repeated all of the phosphorylation experiments while keeping ionic strength constant. These data are now presented in the revised manuscript.

      (ii) Autophosphorylation with phosphoramidate is pH dependent because the nitrogen on the donor must be protonated to form a good leaving group (PMID 9398221). The pKa of phosphoramidate is ~8. Therefore, the fraction of phosphoramidate that is reactive (i.e., protonated) will be very different at pH 6.1 and 7.4.

      We are aware of those findings, but in each case we are comparing the H12 mutant with the wt protein. There is no reason to believe that the presence of the mutant should alter the phosphoramidate substrate, so the comparison we draw is between wt and mutant phosphorylation (Fig 4b, c).
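
      To put the reviewer's point in numbers, the Henderson-Hasselbalch relation gives the fraction of the donor in its protonated form. This is a minimal sketch assuming the reviewer's pKa of ~8 and a single protonation equilibrium; it says nothing about the protein side of the reaction.

```python
def fraction_protonated(pH, pKa):
    """Henderson-Hasselbalch: fraction of an acid/base pair in the protonated form."""
    return 1.0 / (1.0 + 10 ** (pH - pKa))

# Assumed pKa of ~8 for the phosphoramidate donor, as stated by the reviewer
for pH in (6.1, 7.4):
    print(pH, round(fraction_protonated(pH, 8.0), 3))
```

      Under these assumptions the reactive fraction is about 0.99 at pH 6.1 and about 0.80 at pH 7.4. Because this factor depends only on pH, it is the same for wt and the mutant at a given pH; it does not cancel, however, when the same protein is compared across pH values.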

      (iii) Response regulator autophosphorylation absolutely depends on the presence of a divalent metal ion (usually Mg2+) in the active site (PMID 2201404). There is no guarantee that the 20 mM Mg2+ included in the reaction is sufficient to saturate SsrB. Furthermore, as the authors themselves note, the amino acid at SsrB position 12 is likely to affect the affinity of Mg2+ binding. Therefore, the fraction of SsrB that is reactive (i.e. has Mg2+ bound) may differ between wildtype and the H12Q mutant, and/or between wildtype at different pHs (because the protonation state of His12 changes).

      This is exactly the point that we are making, and it is why we varied the magnesium concentration (increasing it to 50-100 mM). There was a slight increase in phosphorylation at 50 mM MgCl2 compared with 20 mM, and only a slight further increase between 50 and 100 mM at pH 6.1. The revised phosphorylation experiments all contain 100 mM MgCl2.

      2) The data in Figures 1abcd and 3de are clearly sigmoidal rather than hyperbolic, indicating cooperativity. However, there are insufficient data points between the upper and lower bounds to accurately calculate the Hill coefficient or KD values. This limitation of the data means that comparisons of apparent Hill coefficient or KD values under different conditions cannot be the basis of credible conclusions.

      We respectfully disagree. Every curve that we provide has at least one data point in the transition between low and high binding. With the H12Q mutant, we obtained two data points in the transition, and the KD was the same as for wildtype (Fig. 2). We provide an analysis of the binding curve showing the range of KD values obtained from the lower and upper error bounds of the transition point (132-168 nM); this range does not significantly change the value (now shown in Fig. 1– figure supplement 1). For the very high affinity observed at pH 6.1 (KD ~5 nM), the corresponding range is 4-8 nM (i.e., still very high affinity). These ranges of affinity at neutral and acid pH are very reminiscent of the affinities we measured for OmpR and OmpR~P at the porin promoters, suggesting that acid pH puts SsrB in an activated state even in the absence of phosphorylation. A similar argument holds for the Hill coefficient (see Figure).
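
      The disagreement about data density can be made concrete with the Hill equation. A minimal sketch, assuming simple Hill binding with half-saturation concentration K (the apparent KD) and Hill coefficient n, computes how wide the 10-90% binding transition is in concentration units:

```python
def hill_fraction(L, K, n):
    """Fraction bound at ligand concentration L, half-saturation K, Hill coefficient n."""
    return L ** n / (K ** n + L ** n)

def conc_at_fraction(theta, K, n):
    """Invert the Hill equation: concentration giving fractional saturation theta."""
    return K * (theta / (1.0 - theta)) ** (1.0 / n)

def transition_width(n):
    """Ratio of concentrations giving 90% and 10% saturation; equals 81 ** (1 / n)."""
    return conc_at_fraction(0.9, 1.0, n) / conc_at_fraction(0.1, 1.0, n)

for n in (1, 2, 4):
    print(n, round(transition_width(n), 2))
```

      The 10-90% window spans an 81-fold concentration range for n = 1 but only 9-fold for n = 2 and 3-fold for n = 4. In a hypothetical two-fold titration series, about log2(81^(1/n)) points land inside that window: about three for n = 2, and one or two for n = 4. Steep (cooperative) curves therefore leave few points to constrain KD and n, which is why error-bound analyses such as the one described above matter.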

      3) There are hundreds of receiver domain structures in the PDB. There is some variation, but to a first approximation all receiver domain structures exhibit an (alpha/beta)5 fold. The structure of SsrB predicted by i-TASSER breaks the standard beta-2 strand into two parts, which throws off the numbering of subsequent beta strands. Given the highly conserved receiver domain fold, I am skeptical that the predicted i-TASSER structure is correct or adds any value to the manuscript. If the authors wish to retain the structure in the manuscript, they should point out this unusual feature and its consequence for strand numbering.

      We now include a new model based on the RcsB/DNA crystal structure that eliminates this problem (see new Fig.2– figure supplement 2). We have replaced the i-TASSER model with an AlphaFold prediction that was energy minimized to align with the RcsB dimer crystal structure (Fig.5– figure supplement 2). This model retains the canonical (beta/alpha)5 fold, so the classical strand numbering is preserved.

      4) The detailed predictions of active site structure in Supplementary Figure 5 are not physiologically relevant because Mg2+ was not included in the simulation. The presence of a divalent cation binding to Asp10 and Asp11 is likely to substantially alter interactions between Asp 10, Asp11, His12, and Lys109.

      See the response to 1(iii) above and new Fig.5– figure supplement 2. Author response image 1 is a zoomed-in snapshot of Supplementary Figure 8c, modelled on the RcsB dimer bound to BeF3 and Mg2+ (PDB 6ZIX). Both the i-TASSER and AlphaFold model receiver domains align well with this structure, and the polar contacts and pi-cation interactions made by His12 are maintained.

      Author response image 1.

      5) The authors present an AlphaFold model of an SsrB dimer and note that His12 is at the dimer interface. However, the authors also believe that a higher-order oligomer of SsrB binds to DNA in a pH-dependent manner. Do the authors have any suggestions or informed speculation about how His12 might affect higher-order oligomerization, as opposed to dimerization?

      As mentioned in response to point 3 above, we now include a new model of an SsrB dimer bound to DNA based on our NMR structure of the CTD and the RcsB/DNA structure. The RcsB study also provides evidence for a higher-order oligomer: the crystal structure of unphosphorylated (and BeF3-bound) RcsB showed an asymmetric unit containing 6 molecules of RcsB, which form 3 dimers arranged in a hexameric structure that resembles a cylinder. This configuration involves a crossed conformation in which the REC of one molecule interacts with the DBD of another, and, interestingly, His12 interacts with the DBD of another molecule. We modelled an SsrB oligomer using the RcsB hexamer as a template and have included it as a new figure (see Fig.5– figure supplement 3) and in the revised Discussion (lines 432-448).

    1. Author Response

      Reviewer #1 (Public Review):

      1) One nagging concern is that the category structure in the CNN reflects the category structure baked into color space. Several groups (e.g. Regier, Zaslavsky, et al) have argued that color category structure emerges and evolves from the structure of the color space itself. Other groups have argued that the color category structure recovered with, say, the Munsell space may partially be attributed to variation in saturation across the space (Witzel). How can one show that these properties of the space are not the root cause of the structure recovered by the CNN, independent of the role of the CNN in object recognition?

      We agree that there is overlap with previous studies on color structure. In our revision, we show that the color categories are directly linked to the CNN being trained on the object-recognition task, and not to the CNN per se. We repeated our analysis on a scene-trained network (using the same input set) and found that its color representation in the final layer deviates considerably from the one created for object classification. Because the input set is the same, this strongly suggests that any reflection of the structure of the input space serves object recognition (see the bottom of the “Border Invariance” section; Page 7). Furthermore, the new experiments with random hue shifts applied to the input images show that stable borders do not arise in that case, whereas they would be expected if border invariance were a consequence of the chosen color space alone.

      A crucial distinction from previous work is that, by replacing the final layer specifically, our analysis examines the representation that the network has built to perform the object classification task. As such, the current finding goes beyond the notion that the color category structure is already reflected in the color space.

      2) In Figure 1, it could be useful to illustrate the central observation by showing a single example, as in Figure 1 B, C, where the trained color is not in the center of the color category. In other words, if the category structure is immune to the training set, then it should be possible to set up a very unlikely set of training stimuli (ones that are as far away from the center of the color category while still being categorized most of the time as the color category). This is related to what is in E, but is distinctive for two reasons: first, it is a post hoc test of the hypothesis recovered in the data-driven way by E; and second, it would provide an illustration of the key observation, that the category boundaries do not correspond to the median distance between training colors. Figure 5 begins to show something of this sort of a test, but it is bound up with the other control related to shape.

      We have now added a post-hoc test in which we shift the training bands from likely to unlikely positions using the original paradigm: retraining the output layers while shifting the training bands from the left to the right category edge (in 9 steps) reveals the invariance of the category bounds specifically (see Supp. Inf.: Figure S11). The most extreme cases (top and bottom rows) place the training bands right at the edge of the border; these are the cases of interest the reviewer refers to. The 7 intermediate steps show how the borders shift with the bands.

      Similarly, if the claim is that there are six (or seven?) color categories, regardless of the number of colors used to train the data, it would be helpful to show the result of one iteration of the training that uses say 4 colors for training and another iteration of the training that uses say 9 colors for training.

      We have now included the figure presented in 1E for all the color iterations used (see SI: Figure S10). We are also happy to include a single iteration, but we believe this gives the most complete view of what the reviewer is asking for.

      The text asserts that Figure 2 reflects training on a range of color categories (from 4 to 9) but doesn’t break them out. This is an issue because the average across these iterations could simply be heavily biased by training on one specific number of categories (e.g. the number used in Figure 1). These considerations also prompt the query: how did you pick 4 and 9 as the limits for the tests? Why not 2 and 20? (the largest range of basic color categories that could plausibly be recovered in the set of all languages)?

      The number of output nodes was inspired by the number of basic color categories that English speakers observe in the hue spectrum (in which a number of the basic categories are not represented). We understand that this is not a strong reason; however, the lack of studies on color categories in CNNs forced us to approach this in an exploratory manner. We have adapted the text to better reflect this shortcoming (bottom of page 4). Naturally, had the data indicated that these numbers were not a good fit, we would have adapted the range (with more categories, we would have expected more noise and would have increased the number of training bands to test this). As indicated above, we have now also included the classification plots for all the different counts, so the reader can review this as well (SI: Section 9).

      3) Regarding the transition points in Figure 2A, indicated by red dots: how strong (transition count) and reliable (consistent across iterations) are these points? The one between red and orange seems especially willfully placed.

      To address consistency, we now include replications of the ResNet18 analysis with ResNet34, ResNet50 and ResNet101 in the SI (section 1). We have also added a new SI section presenting results from alternative CNNs (section S8). Despite small idiosyncrasies, the general pattern of results recurs.

      Concerning the red-orange border, it was not willfully placed, but we understand that in isolation it could look like the result of noise. Nevertheless, the recurrence of this border across several analyses makes us confident that it reflects a meaningful invariance. Notably:

      • We find a more robust peak between red and orange in the luminance control (SI section 3).

      • The evolutionary algorithm with 7 borders also places a border in this position.

      • We find that the peak recurs in the ResNet-18 replication as well as in several of the deeper ResNets and several of the other CNNs (SI section 1).

      • We also find that the peak is present throughout the different layers of the ResNet-18.
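
      The reviewer's question about how strong (transition count) and how reliable the border points are can be answered by tallying, across training iterations, how often neighboring hue bins receive different labels. A minimal sketch, assuming each iteration yields one predicted class per hue bin on a circular hue axis; the function name and the toy label arrays are hypothetical:

```python
import numpy as np

def transition_counts(labels):
    """Count category transitions between neighboring hue bins.

    labels : int array (n_iterations, n_bins); labels[r, i] is the class that
             the retrained output layer assigns to hue bin i in iteration r.
    The hue axis is circular, so bin n_bins - 1 neighbors bin 0.
    Entry i of the result counts, across iterations, how often bins i and
    (i + 1) % n_bins received different labels.
    """
    labels = np.asarray(labels)
    changed = labels != np.roll(labels, -1, axis=1)  # compare each bin with its right neighbor
    return changed.sum(axis=0)

# Toy example: three iterations over six hue bins
runs = np.array([[0, 0, 1, 1, 2, 2],
                 [0, 0, 1, 1, 1, 2],
                 [0, 0, 0, 1, 2, 2]])
counts = transition_counts(runs)        # [0, 2, 1, 2, 1, 3]
consistency = counts / runs.shape[0]    # fraction of iterations with a border at each position
```

      Peaks in `counts` mark candidate borders, and `consistency` gives a simple reliability score: a border found in every iteration scores 1.0.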

      4) Figure 2E and Figure 5B are useful tests of the extent to which the categorical structure recovered by the CNNs shifts with the colors used to train the classifier, and it certainly looks like there is some invariance in category boundaries with respect to the specific colors used to train the classifier, an important and interesting result. But these analyses do not actually address the claim implied by the analyses: that the performance of the CNN matches human performance. The color categories recovered with the CNN are not perfectly invariant, as the authors point out. The analyses presented in the paper (e.g. Figure 2E) test whether there is as much shift in the boundaries as there is stasis, but that is not quite the right test if the goal is to link the categorical behavior of the CNN with human behavior. To evaluate the results, it would be helpful to know what would be expected based on human performance.

      We understand that the lack of human data was a considerable shortcoming of the previous version of the manuscript. We have now collected human data in a match-to-sample task modeled on our CNN experiment. As with the CNN, we find that the degree of border invariance fluctuates considerably. While the categorical borders are not exact matches, we broadly find the same category prototypes and also see that categories in the red-to-yellow range are quite narrow in both humans and CNNs. Please see the new “Human Psychophysics” section (page 8) of the manuscript for more details.

      5) The paper takes up a test of color categorization invariant to luminance. There are arguments in the literature that hue and luminance cannot be decoupled: that luminance is essential to how color is encoded and to color categorization. Some discussion of this might help the reader who has followed this literature.

      We have added some discussion of the interaction between luminance and color categories (e.g., Lindsay & Brown, 2009) at the bottom of page 6 / top of page 7. The current analysis mainly aimed to exclude the possibility that the borders are based solely on luminance.
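
      Why luminance is a plausible confound along the hue spectrum can be seen from the relative luminance of the six fully saturated primary and secondary hues. A minimal sketch, assuming Rec. 709 / sRGB luminance weights applied to linear RGB; at these corners every channel is 0 or 1, so sRGB gamma decoding leaves the values unchanged. How a CNN actually weights the channels is not addressed here.

```python
# Rec. 709 / sRGB luminance weights for linear R, G, B
WEIGHTS = (0.2126, 0.7152, 0.0722)

# Fully saturated primary and secondary hues (corners of the RGB cube on the hue loop)
CORNERS = {
    "red": (1, 0, 0), "yellow": (1, 1, 0), "green": (0, 1, 0),
    "cyan": (0, 1, 1), "blue": (0, 0, 1), "magenta": (1, 0, 1),
}

def relative_luminance(rgb):
    """Weighted sum of linear RGB channels."""
    return sum(w * c for w, c in zip(WEIGHTS, rgb))

for name, rgb in CORNERS.items():
    print(f"{name:8s} {relative_luminance(rgb):.4f}")
```

      Luminance varies more than tenfold between blue and yellow, so a classifier trained on raw hues could partly separate categories by brightness, which a luminance-controlled stimulus set is designed to rule out.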

      Related, the argument that “neighboring colors in HSV will be neighboring colors in the RGB space” is not persuasive. Surely this is true of any color space?

      We removed the argument about “neighboring colors”. Our procedure requires the use of a hue spectrum that wraps around the color space while including many of the highly saturated colors that are typical prototypes for human color categories. We have elected to use the hue spectrum from the HSV color space at full saturation and brightness, which is represented by the edges of the RGB color cube. As this is the space in which our network was trained, it does not introduce any deformations into the color space. Other potential choices of color space either include strong non-linear transformations that stretch and compress certain parts of the RGB cube, or exclude a large portion of the RGB gamut (yellow in particular).

      We have adapted the text to better reflect our reasoning (page 6, top of paragraph 2).
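
      The claim that the full-saturation, full-brightness HSV hue spectrum runs along edges of the RGB cube can be checked directly with Python's standard `colorsys` module: every such color has one channel at 1 and one at 0, with the third channel varying. A minimal sketch; the helper names are illustrative:

```python
import colorsys

def hue_spectrum(n):
    """RGB triples for n evenly spaced hues at full saturation and value."""
    return [colorsys.hsv_to_rgb(i / n, 1.0, 1.0) for i in range(n)]

def on_cube_edge(rgb, tol=1e-9):
    """True if one channel is 1 and another is 0, i.e. the point lies on an edge of the RGB cube."""
    return max(rgb) > 1 - tol and min(rgb) < tol

print(all(on_cube_edge(c) for c in hue_spectrum(360)))
```

      Because only one channel changes between consecutive corners, sampling this loop introduces no nonlinear warping of the RGB space in which the network was trained.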

      6) The paper would benefit from an analysis and discussion of the images used to originally train the CNN. Presumably, there are a large number of images that depict man-made, artificially colored objects. To what extent do the present results reflect statistical patterns in the way the images were created, and/or the colors of the things depicted? How do results on color categorization that derive from images (e.g. trained with neural networks, as in Rosenthal et al and presently) differ (or not) from results that derive from natural scenes (as in Yendrikhovskij?).

      We initially hoped to analyze differences between the colors of objects and backgrounds, as in Rosenthal et al.; unfortunately, in ImageNet we did not find clear differences between pixels inside the object bounding boxes provided with ImageNet and pixels outside these boxes (most likely because the rectangular bounding boxes still contain many background pixels). However, the K-means analysis presented in Figure S6 of the supplemental materials, the color categorization throughout the layers of the object-trained network (end of the first experiment on page 7), and the color categorization in humans (Human Psychophysics, starting on page 8) all yield very similar border positions.

      7) It could be quite instructive to analyze what's going on in the errors in the output of the classifiers, as e.g. in Figure 1E. There are some interesting effects at the crossover points, where the two green categories seem to split and swap, the cyan band (hue % 20) emerges between orange and green, and the pink/purple boundary seems to have a large number of green/blue results. What is happening here?

      One issue with training the network on the color task is that we can never fully guarantee that the network uses color to solve the task, and we suspected that in some cases the network may also rely on other cues, such as luminance. When we look at the same type of plot for the luminance-controlled task presented in the supplemental materials (see below left), we do not see these transgressions. Likewise, in versions of the original training that use more bands, luminance becomes a less reliable cue, and again we do not see these transgressions (see right plot below).

      8) The second experiment using an evolutionary algorithm to test the location of the color boundaries is potentially valuable, but it is weakened because it pre-determines the number of categories. It would be more powerful if the experiment could recover both the number and location of the categories based on the "categorization principle" (colors within a category are harder to tell apart than colors across a color category boundary). This should be possible by a sensible sampling of the parameter space, even in a very large parameter space.

      The main point of the genetic algorithm was to see whether the border locations would be corroborated by an algorithm using the principle of categorical perception. Unfortunately, determining the number of borders exactly is difficult, because some border invariances are clearly stronger than others. Running the algorithm with the number of borders as a free parameter simply leads to a minimal number of borders, because 100% correct is always obtained when only one category is left. More generally, because the network can combine categories into a class at no cost (indeed, fewer borders reduce noise), fewer classes are expected to yield better performance. Estimating the optimal category count would therefore require introducing some subjective trade-off between accuracy and class count.

      9) Finally, the paper sets itself up as taking "a different approach by evaluating whether color categorization could be a side effect of learning object recognition", as distinct from the approach of studying "communicative concepts". But these approaches are intimately related. The central observation in Gibson et al. is not the discovery of warm-vs-cool categories (as the most basic color categories, these have been known for centuries), but rather the relationship of these categories to the color statistics of objects, those parts of the scene that we care about enough to label. This idea, that color categories reflect the uses to which we put our color-vision system, is extended in Rosenthal et al., where the structure of color space itself is understood in terms of categorizing objects versus backgrounds (u') and the most basic object categorization distinction, animate versus inanimate (v'). The introduction argues, rightly in our view, that "A link between color categories and objects would be able to bridge the discrepancy between models that rely on communicative concepts to incorporate the varying usefulness of color, on the one hand, and the experimental findings laid out in this paragraph on the other". This is precisely the link forged by the observation that the warm-cool category distinction in color naming correlates with object-color statistics (Gibson, 2017; see also Rosenthal et al., 2018). The argument in Gibson and Rosenthal is that color categorization structure emerges because of the color statistics of the world, specifically the color statistics of the parts of the world that we label as objects, which is the same approach adopted by the present work. The use of CNNs is a clever and powerful test of the success of this approach.

      We are sorry we did not properly highlight the importance of these two earlier papers in the previous version of the manuscript. We have now elaborated our description of Gibson’s work to better reflect the important relation between the usefulness of colors and color categories (middle of page 2, and page 19, paragraph above the Methods). We think our work nicely extends the earlier work by showing that their approach holds at a more general level, with more color categories.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Abdellatef et al. describe the reconstitution of axonemal bending using polymerized microtubules (MTs), purified outer-arm dyneins, and synthesized DNA origami. Specifically, the authors purified axonemal dyneins from Chlamydomonas flagella and combined the purified motors with MTs polymerized from purified brain tubulin. Using electron microscopy, the authors demonstrate that patches of dynein motors of the same orientation at both MT ends (i.e., with their tails bound to the same MT) result in pairs of MTs of parallel alignment, while groups of dynein motors of opposite orientation at both MT ends (i.e., with the tails of the dynein motors of both groups bound to different MTs) result in pairs of MTs with anti-parallel alignment. The authors then show that the dynein motors can slide MTs apart following photolysis of caged ATP, and using optical tweezers, demonstrate active force generation of up to ~30 pN. Finally, the authors show that pairs of anti-parallel MTs exhibit bidirectional motion on the scale of ~50-100 nm when both MTs are cross-linked using DNA origami. The findings should be of interest for the cytoskeletal cell and biophysics communities.

      We thank the reviewer for these comments.

      We might be misunderstanding this reviewer’s comment, but in most cases the complexes with both parallel and anti-parallel MTs had dynein molecules with their tails bound to two different MTs, as illustrated in Fig.2 – suppl.1. The two groups of dyneins produce opposing forces in a complex with parallel MTs, and the majority of our complexes had a parallel arrangement of the MTs. To clarify the point, we have modified the Abstract:

      “Electron microscopy (EM) showed pairs of parallel MTs crossbridged by patches of regularly arranged dynein molecules bound in two different orientations depending on which of the MTs their tails bind to. The oppositely oriented dyneins are expected to produce opposing forces when the pair of MTs have the same polarity.”

      Reviewer #2 (Public Review):

      Motile cilia generate rhythmic beating or rotational motion to propel cells or to produce extracellular fluid flow. A cilium is built from nine microtubule doublets forming a spoke-like structure, and it is known that dynein motor proteins, which connect adjacent microtubule doublets, are the driving force of ciliary motion. However, the molecular mechanism that generates motion is still unclear. The authors demonstrated that a pair of microtubules stably linked by DNA origami and driven by outer dynein arms (ODA) produces a beating motion. They employed an in vitro motility assay and negative-stain TEM to characterize this complex. They demonstrated that stable linking of the microtubules and ODAs anchored on both microtubules are essential for oscillatory motion and bending of the microtubules.

      Strength

      This is interesting work, addressing an important question in the motile cilia community: what is the minimum system needed to generate a beating motion? It is an established fact that the dynein power stroke on the microtubule doublet is the driving force of the beating motion. It was also known that the radial spokes and the central pair are essential for ciliary motion under physiological conditions, but cilia without radial spokes and the central pair can beat under some special conditions (Yagi and Kamiya, 2000). Therefore, from a mechanistic point of view, they are not prerequisites. It is generally thought that fixed connections between adjacent microtubules by nexin convert the sliding motion of dyneins into bending, but this was never experimentally investigated. Here the authors successfully built a simple system with nexin-like inter-microtubule linkage, using the DNA origami technique, that generates oscillatory and beating motions. In this system, ODAs form groups anchored on two microtubules, oriented oppositely, and therefore produce tug-of-war type force generation. The authors demonstrated that this system, under constraint by DNA origami, generates oscillatory and beating motions.

      The authors carefully coordinated the experiments to demonstrate oscillations using optical tweezers and sophisticated data analysis (Fourier analysis and a step-finding algorithm). They also showed, using negative-stain EM, that this system contains two groups of ODAs forming arrays with opposite polarity on the parallel microtubules. The manuscript is carefully organized with impressive movies. Geometrical and motility analyses of individual ODAs used for statistics are provided in the supplementary source files. They appropriately cited similar past work from the Kamiya and Shingyoji groups (who employed systems closer to the physiological axoneme to reproduce beating) and clarified the differences from this study.
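
      As an illustration of the kind of Fourier analysis mentioned above, the dominant oscillation frequency of a bead- or MT-position trace can be read off its amplitude spectrum. A minimal sketch, assuming a uniformly sampled trace and a single dominant oscillation; the function name, sampling rate, and synthetic trace are hypothetical, and the authors' actual parameters are not given here.

```python
import numpy as np

def dominant_frequency(x, fs):
    """Frequency (Hz) of the largest non-DC peak in the amplitude spectrum of x."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                                      # remove the DC offset
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))  # Hann window reduces leakage
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs[1 + np.argmax(spectrum[1:])]             # skip the 0 Hz bin

# Synthetic example: a 3 Hz, 40 nm oscillation with 10 nm noise, sampled at 1 kHz for 2 s
rng = np.random.default_rng(0)
fs = 1000.0
t = np.arange(2000) / fs
trace = 40 * np.sin(2 * np.pi * 3.0 * t) + rng.normal(0, 10, t.size)
print(dominant_frequency(trace, fs))
```

      The frequency resolution is fs / N (0.5 Hz here), so longer recordings are needed to resolve slow or closely spaced oscillation frequencies.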

      We thank the reviewer for these comments.

      Weakness

      The authors claim this system mimics two pairs of doublets on opposite sides of the 9+2 ciliary structure by having two groups of ODAs between two microtubules facing opposite directions within the pair. This is not exactly the case. In the real axoneme, ODAs form a continuous array along the entire length of the doublets, which means that at any point there are ODAs facing opposite directions. In their system, opposite ODAs cannot exist at the same point (therefore the scheme of the Dynein-MT complex in Fig.1B is slightly misleading).

      Actually, opposite ODAs can exist at the same point in our system as well, and previous work using much higher concentrations of dynein (e.g., Oda et al., J. Cell Biol., 2007) showed two continuous arrays of dynein molecules between a pair of microtubules. To observe the structures of individual dynein molecules, we used low concentrations of dynein and searched for areas where dynein could be observed without superposition, but there were some areas where opposite dyneins existed at the same point.

      We realize that we did not clearly explain this issue, so we have revised the text accordingly.

      In the 1st paragraph of Results: “In the dynein-MT complexes prepared with high concentrations of dynein, pairs of MTs in bundles are crossbridged by two continuous arrays of dynein, so that superposition of two rows of dynein molecules is observed in EM images (Haimo et al., 1979; Oda et al., 2007). On the other hand, when a low concentration of the dynein preparation (6.25–12.5 µg/ml (corresponding to ~3-6 nM outer-arm dynein)) was mixed with 20-25 µg/ml MTs (200-250 nM tubulin dimers), the MTs were only partially decorated with dynein, so that we were able to observe single layers of crossbridges without superposition in many regions.” Legend of Fig. 1(C): “Note that the geometry of dyneins in the dynein-MT complex shown in (B) mimics that of a combination of the dyneins on two opposite sides of the axoneme (cyan boxes), although the dynein arrays in (B) are not continuous.”
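
      The molar values quoted above follow from dividing mass concentration by molar mass. A minimal sketch of that conversion, assuming approximate molar masses of ~2,000 kDa for Chlamydomonas outer-arm dynein and ~100 kDa for the αβ-tubulin dimer (the values implied by the quoted numbers, not measured here):

```python
def ug_per_ml_to_nM(conc_ug_per_ml, mw_kda):
    """Convert a mass concentration (µg/ml) into a molar concentration (nM).

    1 µg/ml = 1 mg/l = 1e-3 g/l; dividing by the molar mass in g/mol gives mol/l.
    """
    grams_per_liter = conc_ug_per_ml * 1e-3
    molar = grams_per_liter / (mw_kda * 1e3)
    return molar * 1e9

# Assumed molar masses: ODA ~2,000 kDa, tubulin dimer ~100 kDa
for label, conc, mw in [("ODA, low", 6.25, 2000), ("ODA, high", 12.5, 2000),
                        ("tubulin, low", 20, 100), ("tubulin, high", 25, 100)]:
    print(f"{label:14s} {ug_per_ml_to_nM(conc, mw):7.2f} nM")
```

      With these assumptions, 6.25–12.5 µg/ml dynein corresponds to about 3.1–6.3 nM and 20–25 µg/ml tubulin to 200–250 nM dimers, matching the ranges in the text.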

      If the authors want to extrapolate their results to a model of ciliary beating, more insight or explanation is necessary. For example, dyneins at certain positions within the long array along one doublet could be activated and generate force, while dyneins at different positions are activated on another doublet on the opposite side of the axoneme. This would make the distribution and orientation of active dyneins similar to those in the system described in this work. Such localized activation, shown in physiological cilia by the Ishikawa and Nicastro groups, may require other regulatory proteins.

      We agree that the distributions of activated dyneins in 3D are extremely important in understanding ciliary beating, and that other regulatory proteins would be required to coordinate activation in different places in an axoneme. However, the main goal of this manuscript is to show the minimal components for oscillatory movements, and we feel that discussing the distributions of activated dyneins along the length of the MTs would be too complicated and beyond the scope of this study.

      They attempted to reveal the conformational change of ODAs induced by the power stroke using negative-stain EM images, which is less convincing than past cryo-ET work (Ishikawa, Nicastro and Pigino groups) and negative-stain EM of sea urchin outer-arm dyneins (Hirose group), in which the tail and head parts were clearly defined from the 3D map or from 2D averages of two-headed ODAs. The three heavy chains and associated proteins probably hinder detailed visualization of the tail structure. Because of this, Fig.2C is not clear enough to prove a conformational change of ODA. This reviewer suspects that refined subaveraging (probably with larger datasets) is necessary.

      As the reviewer suggests, one of the reasons for less clear averaged images compared to the past images of sea urchin ODA is the three-headed structure of Chlamydomonas ODA. Another and perhaps the bigger reason is the difficulty of obtaining clear images of dynein molecules bound between 2 MTs by negative stain EM: the stain accumulates between MTs that are ~25 nm in diameter and obscures the features of smaller structures. We used cryo-EM with uranyl acetate staining instead of negative staining for the images of sea urchin ODA-MT complexes we previously published (Ueno et al., 2008) in order to visualize dynein stalks. We agree with the reviewer that future work with larger datasets and by cryo-ET is necessary for revealing structural differences.

      That having been said, we did not mean to prove structural changes, but rather intended to show that our observation suggests structural changes and thus this system is useful for analyzing structural changes in future. In the revised manuscript, we have extensively modified the parts of the paper discussing structural changes (Please see our response to the next comment).

      It is not clear, from the inset of Fig.2 supplement3, how the end of the tail is defined for the length measurement, which is the basis for the authors' claim of a conformational change (lines 263–265). The appearance of the tail would be altered even by slightly different viewing angles. Comparison with 2D projections of apo- and nucleotide-bound three-headed ODA structures from the EM Data Bank would help.

      We agree with the reviewer that differences in viewing angle affect the apparent length of a dynein molecule, although the 2 MTs crossbridged by dyneins lie on the carbon membrane and thus the variation in viewing angle is expected to be relatively small. To examine how much the apparent length is affected by the viewing angle, we calculated 2D-projected images of the cryo-ET structures of the Chlamydomonas axoneme (emd_1696 and emd_1697; Movassagh et al., 2010) at different viewing angles, and measured the apparent length of the dynein molecule using the same method we used for our negative-stain images (Author response image 1). As shown in the plot, over the 40° range examined here, the effect of viewing angle on the apparent length is smaller than the difference between the two nucleotide states. Thus, we think that the length difference shown in Fig.2-suppl.4 reflects a real structural difference between the no-ATP and ATP states. In addition, it is reasonable to assume that the distributions of viewing angles in the negative-stain images are similar in the absence and presence of ATP, which again supports this conclusion.
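As rough geometric intuition for why a modest spread of viewing angles changes the apparent length only slightly, the dynein molecule can be idealised as a rigid rod of length L tilted by an angle θ out of the image plane, so that its projected length is L·cos θ. This rod model is a simplifying assumption for illustration only; the response itself measured 2D projections of the emd_1696/emd_1697 maps. The sketch also assumes that the 40° range spans ±20° around the in-plane orientation, which the response does not state.

```python
import math


def projected_length(true_length_nm: float, tilt_deg: float) -> float:
    """Projected length of a rigid rod tilted tilt_deg out of the image plane (toy model)."""
    return true_length_nm * math.cos(math.radians(tilt_deg))


def max_fractional_shortening(half_range_deg: float) -> float:
    """Largest relative shortening for tilts within +/- half_range_deg of the image plane."""
    return 1.0 - math.cos(math.radians(half_range_deg))


# Assumed: a 40-degree range centred on the in-plane orientation (-20 to +20 degrees).
print(f"max shortening within +/-20 deg: {max_fractional_shortening(20.0):.1%}")
```

Under this toy model, tilts within ±20° shorten the projection by at most about 6%, and the effect grows only with the square of small angles, which is consistent in spirit with the small view-angle effect reported above.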

      Nevertheless, since we agree with the reviewer that we cannot measure the precise length of the molecule from these 2D images, we have revised the corresponding parts of the manuscript, adding a description of the effect of viewing angle on the measured length.

      Author response image 1. Effects of viewing angles on apparent length. (A) and (B) 2D-projected images of cryo-electron tomograms of Chlamydomonas outer arm dynein in an axoneme (Movassagh et al., 2010) viewed from different angles. (C) Apparent length of the dynein molecule measured in 2D-projected images.

      In this manuscript, we discuss two structural changes: 1) a difference in the dynein length between the no-nucleotide and +ATP states (Fig.2-suppl.4), and 2) possible structural differences in the arrangement of the dynein heads (Fig.2-suppl.3). Although we realize that extensive analysis using cryo-ET is necessary to reveal the second structural change, we attempted to compare the structures of oppositely oriented dyneins, hoping that this would lead to future research. In the revised manuscript, we have added 2D projection images of emd_1696 and emd_1697 to Fig.2-suppl.3, so that readers can compare them with our negative-stain images. We had the impression that some of our 2D images in the presence of ATP resembled the cryo-ET structure with ADP.Vi, whereas others appeared closer to the no-nucleotide cryo-ET structure. We also attempted to calculate cross-correlations, but we could not obtain reliable results because of difficulties in removing the contribution of MTs, which sometimes overlapped part of the dynein, and in matching the magnification and contrast of different images.
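Cross-correlation comparisons of this kind are commonly scored with a zero-mean, unit-variance normalisation, so that differences in overall brightness and contrast between images drop out. The sketch below is a generic illustration, not the procedure used by the authors: it assumes two already aligned, equally sized 2D images, and the optional boolean `mask` is a hypothetical way to exclude pixels dominated by MT density. It does not address the magnification matching that the response identifies as a difficulty.

```python
import numpy as np


def zncc(img_a, img_b, mask=None):
    """Zero-normalised cross-correlation between two same-shape 2D images.

    Returns a value in [-1, 1] that is unchanged by a positive contrast
    scaling and an intensity offset of either image. `mask` (hypothetical)
    selects which pixels to compare, e.g. to leave out regions overlapped by MTs.
    """
    a = np.asarray(img_a, dtype=float)
    b = np.asarray(img_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    if mask is None:
        mask = np.ones(a.shape, dtype=bool)
    a = a[mask]
    b = b[mask]
    a = (a - a.mean()) / a.std()   # zero mean, unit variance
    b = (b - b.mean()) / b.std()
    return float(np.mean(a * b))
```

For example, an image compared with a contrast-stretched and offset copy of itself (`3 * img + 5`) scores 1, while its negative scores -1.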

      To address this and the previous comments, we have extensively modified the section titled ‘Structures of dynein in the dynein-MT-DNA-origami complex’.

      In Fig.5B (where the oscillation occurs), the microtubule was first driven >150 nm unidirectionally and returned to its original position before oscillation started. Is it always the case that a relatively long unidirectional motion and return precede oscillation? In Fig.7B, where the authors claim that no oscillation occurred, only one unidirectional motion was shown. Did oscillation not occur after the MT returned to its original position?

      Long unidirectional movement of ~150 nm was sometimes observed, but not necessarily before the start of oscillation. For example, in Figure 5 – figure supplement 1A, oscillation started soon after the UV flash, and then unidirectional movement occurred.

      With the dynein-MT complex in which dyneins are unidirectionally aligned (Fig.7B, Fig.7-suppl.2), the MTs kept moving and escaped from the trap or just stopped moving probably due to depletion of ATP, so we did not see a MT returning to the original position.

      Lines 284–290: More characterization of the bending motion is necessary (and should be possible). What is its frequency? Can the authors confirm that other systems (either without DNA-origami or without oppositely arrayed ODAs) cannot generate repetitive beating?

      The frequencies of the bending motions measured from the movies in Fig.8 and Fig.8-suppl.1 were 0.6–1 Hz, and the motions were rather irregular. Even if some complexes bent at higher frequencies, it would not have been possible to detect them because of the low time resolution of these fluorescence microscopy experiments (~0.1 s). Future studies at higher time resolution will be necessary for further characterization of bending motions.
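A dominant bending frequency such as the 0.6–1 Hz reported above can be estimated from a tracked time series (for example, a bending angle per frame) with a discrete Fourier transform. The sketch below is a generic illustration on synthetic data and assumes evenly spaced frames; the response does not say how the frequencies were measured. With a ~0.1 s frame interval, the Nyquist limit is 1/(2 × 0.1 s) = 5 Hz, so faster periodic bending could not be resolved from such data.

```python
import numpy as np


def dominant_frequency(signal, frame_interval_s):
    """Frequency (Hz) of the largest non-DC peak in the amplitude spectrum."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                                   # remove the DC offset
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=frame_interval_s)
    return float(freqs[1:][np.argmax(spectrum[1:])])   # skip the zero-frequency bin


def nyquist_hz(frame_interval_s):
    """Highest frequency that evenly spaced frames can represent."""
    return 0.5 / frame_interval_s


# Synthetic bending-angle trace: a 0.8 Hz oscillation, 200 frames at 0.1 s intervals.
t = np.arange(200) * 0.1
angle = 15 * np.sin(2 * np.pi * 0.8 * t)
print(dominant_frequency(angle, 0.1), nyquist_hz(0.1))
```

For irregular motions like those described, the spectral peak will be broad, and a 20 s record only gives a frequency resolution of 1/20 s = 0.05 Hz.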

      To observe bending motions, the dynein-MT complex should be fixed to the glass or a bead at one part of the complex while the other end is free in solution. With the dynein-MT-DNA-origami complexes, we looked for such complexes and found some showing bending motions, as in Fig. 8. To answer the reviewer’s question of whether we saw repetitive bending in other systems, we checked the movies of the complexes without DNA-origami or without oppositely arrayed ODAs but did not notice any repetitive bending motions. However, future studies with higher temporal resolution, and perhaps with an improved method for attaching the complex, would be necessary in these cases as well.

    1. Author Response

      Reviewer #1 (Public Review):

      Overall, this study is well designed with convincing experimental data. The following critiques should be considered:

      1) It is important to examine whether the phenotype of METTL18 KO is mediated through changes in RPL3 methylation. The functional link between METTL18 and RPL3 methylation in regulating translation elongation needs to be examined in detail.

      We truly thank the reviewer for the suggestion. Accordingly, we set up experiments combined with hybrid in vitro translation (Panthu et al. Biochem J 2015 and Erales et al. PNAS 2017) and the Renilla–firefly luciferase fusion reporter system (Kisly et al. NAR 2021) (see Figure 5A).

      To test the impact of RPL3 methylation on translation directly, we purified ribosomes from METTL18 KO or naïve HEK293T cells, supplemented them into ribosome-depleted rabbit reticulocyte lysate (RRL), and then conducted an in vitro translation assay (i.e., hybrid translation; Panthu et al. Biochem J 2015 and Erales et al. PNAS 2017) (see figure above and Figure 5A). Indeed, we observed that removal of the ribosomes from RRL decreased protein synthesis in vitro and that the addition of ribosomes from HEK293T cells efficiently restored the activity (see Figure 5 — figure supplement 1A).

      To test the effect on Tyr codon elongation, we harnessed a Renilla–firefly luciferase fusion; this system allows us to detect the delay or promotion of downstream firefly luciferase (Fluc) synthesis relative to upstream Renilla luciferase and thus to focus on elongation affected by the sequence inserted between the two luciferases (Kisly et al. NAR 2021) (see figure above and Figure 5A). For better detection of the effects on Tyr codons, we used a repeat of 39 Tyr codons (the number was set by cloning constraints in our hands). We note that the insertion of Tyr codon repeats reduced the elongation rate (or processivity), as we observed a reduced slope of downstream Fluc synthesis (see Figure 5 — figure supplement 1B).

      Using this setup, we observed that, compared to ribosomes from naïve cells, RPL3 methylation-deficient ribosomes led to faster elongation at Tyr repeats (see Figure 5B). Because these data directly reflect the activity of ribosomes carrying unmethylated RPL3, they provide solid evidence of a link between RPL3 methylation and translation elongation at Tyr codons.
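The reporter logic described above (downstream Fluc appears later, and rises more slowly, when the insert slows elongation) can be quantified from real-time luminescence traces. The sketch below shows one hypothetical way to do this, not necessarily the analysis behind Figure 5: fit a straight line to the linear phase of each trace, take its x-intercept as the time of first appearance, and report the Fluc lag relative to the upstream Renilla luciferase (called Rluc here) together with both slopes. The fit window and the synthetic traces are illustrative assumptions.

```python
import numpy as np


def first_appearance_time(t, signal, fit_start, fit_end):
    """x-intercept and slope of a straight line fitted to the linear phase [fit_start, fit_end]."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(signal, dtype=float)
    window = (t >= fit_start) & (t <= fit_end)
    slope, intercept = np.polyfit(t[window], y[window], 1)
    return -intercept / slope, slope


def fluc_lag(t, rluc, fluc, fit_start, fit_end):
    """Delay of downstream Fluc relative to upstream Rluc, plus both linear-phase slopes."""
    t_r, slope_r = first_appearance_time(t, rluc, fit_start, fit_end)
    t_f, slope_f = first_appearance_time(t, fluc, fit_start, fit_end)
    return t_f - t_r, slope_r, slope_f


# Synthetic traces (minutes, arbitrary units): Rluc appears at 2 min, Fluc at 5 min.
t = np.arange(0, 30, 0.5)
rluc = np.clip(t - 2.0, 0, None) * 100.0
fluc = np.clip(t - 5.0, 0, None) * 40.0
lag, s_r, s_f = fluc_lag(t, rluc, fluc, fit_start=8, fit_end=25)
```

On real data, the fit window must lie in the linear phase of both traces; a longer lag or a shallower Fluc slope for one ribosome source than another would be the kind of difference the text describes.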

      2) The obvious discrepancy between the recent NAR paper and this study lies in the ribosome profiling results (such as Fig.S5). Cell line-specific regulation in HAP1 cells (used in the NAR paper) vs. 293T cells (used in this study) needs to be explored. For example, would METTL18 KO in HAP1 cells cause a polysome profiling difference under the conditions of this study? Some of the negative findings in this study (such as Fig.S3B and Fig.S5A) would need some kind of positive control to make sure that the assay conditions were working.

      According to the reviewer’s suggestion, we conducted polysome profiling of the HAP1 cells with METTL18 knockout. For this assay, we used the same cell line (HAP1 METTL18 KO, 2-nt del.) as in the earlier NAR paper. As shown in Figure 9 — figure supplement 2A and 2B, we observed reduced polysomes in this cell line, as observed in the NAR paper.

      We did not find changes in the abundance of 40S and 60S subunits upon METTL18 KO in HAP1 cells, as assessed by rRNAs and by the complex mass in the sucrose gradient (see Figure 9 — figure supplement 2C-E). This observation was again consistent with earlier reports.

      Overall, our sucrose density gradient experiments (polysomes and 40S/60S ratio) were congruent with the NAR paper. In contrast, METTL18 deletion had only a limited effect on polysome formation in HEK293T cells (Figure 4 — figure supplement 1A and 1B). To provide a further control for this observation, we induced a 60S biogenesis delay, as requested by the Reviewer. Here, we treated cells with siRNA targeting RPL17, which is needed for proper 60S assembly (Wang et al. RNA 2015). Quantification of the sucrose density gradient (SDG) profiles showed a reduction in 60S (see figure below and Figure 3 — figure supplement 1D-F) and polysomes (see Figure 4 — figure supplement 1C and 1D), highlighting the weaker effects of METTL18 depletion on 60S and polysome formation in HEK293T cells. We note that all the sucrose density gradient experiments were repeated 3 times, quantified, and statistically tested.

      To further assess the difference between our data and those in the earlier NAR paper, we also performed ribosome profiling on 3 independent KO lines in HAP1 cells, including the one used in the NAR paper (METTL18 KO, 2-nt del.). Indeed, all METTL18 KO HAP1 cells showed a reduction in footprints on Tyr codons, as observed in HEK293 cells (see Figure 4H); thus, RPL3 methylation had a consistent effect on elongation irrespective of cell type. In contrast, reanalysis of the published data (Małecki et al. NAR 2021) did not reveal such a trend (see figure below).
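The Tyr-codon footprint comparison can be expressed as a relative A-site occupancy: the fraction of footprints whose A site sits on a given codon, divided by that codon's frequency in the translated sequences. The sketch below is a simplified illustration, not the pipeline used for Figure 4H. It assumes that each footprint has already been assigned an A-site codon (for example, from a read-length-specific offset), and the input format and toy counts are hypothetical.

```python
import math
from collections import Counter

TYR_CODONS = ("TAT", "TAC")


def relative_a_site_occupancy(a_site_codons, background_codons):
    """Observed A-site codon frequency divided by the codon's background frequency."""
    obs = Counter(a_site_codons)
    bg = Counter(background_codons)
    n_obs = sum(obs.values())
    n_bg = sum(bg.values())
    return {c: (obs[c] / n_obs) / (bg[c] / n_bg) for c in bg if bg[c] > 0}


def log2_change(occ_ko, occ_wt, codons=TYR_CODONS):
    """log2(KO / WT) relative occupancy for each codon in `codons`."""
    return {c: math.log2(occ_ko[c] / occ_wt[c]) for c in codons}


# Toy data only: equal codon usage, fewer Tyr A-site footprints in the KO sample.
background = ["TAT", "TAC", "GCT", "GCC"] * 25
wt = ["TAT"] * 30 + ["TAC"] * 30 + ["GCT"] * 20 + ["GCC"] * 20
ko = ["TAT"] * 15 + ["TAC"] * 15 + ["GCT"] * 35 + ["GCC"] * 35
change = log2_change(relative_a_site_occupancy(ko, background),
                     relative_a_site_occupancy(wt, background))
```

A negative log2 change at Tyr codons in KO samples corresponds to the reduced footprint density described above, which is the signature expected if ribosomes pass through those codons faster.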

      So far, we have not been able to identify the origin of the difference between our ribosome profiling results and those of the earlier paper. Culture conditions or other experimental conditions may affect the data. Given this, we amended the Discussion to address the possibility of context-dependent effects of RPL3 methylation.

      3) For loss-of-function studies of METTL18, it would be beneficial to have a second sgRNA to KO METTL18 to solidify the conclusion.

      We thank the reviewer for the constructive suggestion. Instead of screening additional METTL18 KO lines in HEK293T cells, we conducted additional ribosome profiling experiments in HAP1 cells with 3 independent KO lines. In addition to ensuring reproducibility, these experiments assess whether our results are specific to the HEK293T cells that we mainly used. As mentioned above, we observed faster elongation at Tyr codons upon METTL18 deficiency even in a different cell line.

      4) In addition to loss-of-function studies of METTL18, gain-of-function studies of METTL18 would help make this study more convincing.

      Again, we thank the reviewer for the constructive suggestion. To address this issue, we conducted RiboTag-IP and subsequent ribosome profiling. Here, we expressed C-terminally FLAG-tagged RPL3, either wild type (WT) or the His245Ala mutant that METTL18 cannot methylate (Figure 2A), in HEK293T cells, treated the lysate with RNase, immunoprecipitated the FLAG-tagged ribosomes, and then prepared a ribosome profiling library (see figure below, left). This experiment assessed translation driven by the tagged ribosomes. Indeed, we observed that His245Ala had a weaker impact on Tyr codon elongation than METTL18 KO had relative to naïve cells (see figure below, right). Given that METTL18 KO leaves an unmodified His, the enhanced Tyr elongation may be mediated by the bare His but not by Ala at that position. Since this point may be beyond the scope of this study, we omitted it from the manuscript. However, we are happy to add the data to the supplementary figures if requested.

      Reviewer #3 (Public Review):

      In this article, Matsuura-Suzuki et al provided strong evidence that the mammalian protein METTL18 methylates a histidine residue in the ribosomal protein RPL3 using a combination of Click chemistry, quantitative mass spectrometry, and in vitro methylation assays. They showed that METTL18 was associated with early sucrose gradient fractions preceding the 40S peak on a polysome profile and interpreted this as evidence that RPL3 is modified early in the 60S subunit biogenesis pathway. They performed cryo-EM of ribosomes from a METTL18-knockout strain and showed that the methyl group on the histidine, present in published cryo-EM data, was missing in their new cryo-EM structure. The missing methyl group caused minor changes in the residue conformation, in keeping with the minor effects observed on translation. They performed ribosome profiling to determine what is translated efficiently in cells with and without METTL18, and found decreased enrichment of tyrosine codons in the A site of ribosomes from cells lacking METTL18. They further showed that longer ribosome footprints, corresponding to ribosomes that have already bound A-site tRNA, contained fewer tyrosine codons in the A site when METTL18 was absent. This suggests that methylation normally slows elongation after tRNA loading but prior to EF-2 dissociation. They hypothesize that this decreased rate affects protein folding and followed up with fluorescence microscopy showing that EGFP aggregated more readily in cells lacking METTL18, suggesting that the slowdown of translation elongation mediated by METTL18 enhances folding. Finally, they performed SILAC on aggregated proteins to confirm that more tyrosine was incorporated into protein aggregates from cells lacking METTL18.

      The article is interesting and uses a large number of different techniques to present evidence that histidine methylation of RPL3 leads to decreased elongation rates at tyrosine codons, allowing time for effective protein folding.

      We thank the reviewer for the positive comments.

      I agree with the interpretation of the results, although I do have minor concerns:

      1) The magnitude of each effect observed by ribosome profiling is very small, which is not unusual for ribosome modifications or methylation. Methylation seems to occur on all ribosomes in the cell, since the modification is present in several cryo-EM structures. The authors suggest that the modification occurs during biogenesis, before folding renders the site inaccessible to METTL18, so it is unlikely to be removed. For that reason, I do not think it is warranted to claim that this is an example of a ribosome code or of translation tuning. Those terms would indicate regulated modifications that come on and off proteins, but the authors have not presented evidence that the activity is regulated (and do not really need to for this paper to be impactful).

      We thank the reviewer for making this point, and we agree that the nuance of the wording may not fit our results. We amended the corresponding sentences to avoid using the terms “ribosome code” and “translation tuning” throughout the manuscript.

      2) In Figure 4-supplement 1, there appear to be slightly more 80S and less 60S in the METTL18 knockout, with no change in 40S. This might be normal variability in this cell type, but quantitation of the peaks from 2 or more experiments is needed to claim that ribosome biogenesis is unaffected by METTL18 deletion. Likewise, the authors need to quantitate the area under the curve for 40S and 60S levels from several replicates and show the mean ± error for Figure 3, supplement 1, because that result is essential to the claim that ribosome biogenesis is unaffected.

      Accordingly, we repeated all the sucrose density gradient experiments 3 times, quantified the data, and statistically tested the results. Even after quantification, we found no significant change in either 40S or 60S levels upon METTL18 deletion in HEK293T cells (see Figure 3 — figure supplement 1B and 1C).
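The quantification discussed here (area under the 40S and 60S peaks of the gradient absorbance trace, summarised across replicates as mean ± error) can be sketched as follows. The peak limits, the synthetic Gaussian traces, and the use of the sample SD as the error measure are assumptions for illustration; the response does not state how peak boundaries were set or which error measure was reported.

```python
import numpy as np


def peak_area(position, absorbance, start, end):
    """Trapezoidal area of an absorbance trace between two gradient positions."""
    x = np.asarray(position, dtype=float)
    y = np.asarray(absorbance, dtype=float)
    sel = (x >= start) & (x <= end)
    xs, ys = x[sel], y[sel]
    return float(np.sum((xs[1:] - xs[:-1]) * (ys[1:] + ys[:-1]) / 2.0))


def mean_sd(values):
    """Mean and sample standard deviation across replicates."""
    v = np.asarray(values, dtype=float)
    return float(v.mean()), float(v.std(ddof=1))


# Hypothetical traces: 40S peak near 30 mm, 60S peak near 40 mm, three replicates.
x = np.linspace(0, 80, 801)
ratios = []
for height in (1.0, 1.1, 0.9):
    trace = np.exp(-((x - 30) / 3) ** 2) + 2 * height * np.exp(-((x - 40) / 3) ** 2)
    a40 = peak_area(x, trace, 24, 35)   # assumed 40S boundaries
    a60 = peak_area(x, trace, 35, 46)   # assumed 60S boundaries
    ratios.append(a60 / a40)
ratio_mean, ratio_sd = mean_sd(ratios)
```

Reporting a per-replicate ratio such as 60S/40S, rather than raw areas, reduces sensitivity to differences in total loaded material between gradients.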

      Moreover, as a positive control for delayed 60S biogenesis, we treated cells with siRNA targeting RPL17, which is needed for proper 60S assembly (Wang et al. RNA 2015). Quantification of the sucrose density gradient (SDG) profiles showed a reduction in 60S (see figure below and Figure 3 — figure supplement 1D-F) and polysomes (see Figure 4 — figure supplement 1C and 1D), highlighting the weaker effects of METTL18 depletion on 60S and polysome formation.

      3) The effect of methylation could be at any step after accommodation of tRNA in the A site and before dissociation of EF-2, including peptidyl transfer. More evidence is needed to claim strongly that methylation specifically slows translocation. This could be followed up in vitro in a new study.

      We truly thank the reviewer for the suggestion. Accordingly, we set up experiments combined with hybrid in vitro translation (Panthu et al. Biochem J 2015 and Erales et al. PNAS 2017) and the Renilla–firefly luciferase fusion reporter system (Kisly et al. NAR 2021) (see Figure 5A).

      To test the impact of RPL3 methylation on translation directly, we purified ribosomes from METTL18 KO or naïve HEK293T cells, supplemented them into ribosome-depleted rabbit reticulocyte lysate (RRL), and then conducted an in vitro translation assay (i.e., hybrid translation; Panthu et al. Biochem J 2015 and Erales et al. PNAS 2017) (see figure above and Figure 5A). Indeed, we observed that removal of the ribosomes from RRL decreased protein synthesis in vitro and that the addition of ribosomes from HEK293T cells efficiently restored the activity (see Figure 5 — figure supplement 1A).

      To test the effect on Tyr codon elongation, we harnessed a Renilla–firefly luciferase fusion; this system allows us to detect the delay or promotion of downstream firefly luciferase (Fluc) synthesis relative to upstream Renilla luciferase and thus to focus on elongation affected by the sequence inserted between the two luciferases (Kisly et al. NAR 2021) (see figure above and Figure 5A). For better detection of the effects on Tyr codons, we used a repeat of 39 Tyr codons (the number was set by cloning constraints in our hands). We note that the insertion of Tyr codon repeats reduced the elongation rate (or processivity), as we observed a reduced slope of downstream Fluc synthesis (see Figure 5 — figure supplement 1B).

      Using this setup, we observed that, compared to ribosomes from naïve cells, RPL3 methylation-deficient ribosomes led to faster elongation at Tyr repeats (see Figure 5B). Because these data directly reflect the activity of ribosomes carrying unmethylated RPL3, they provide solid evidence of a link between RPL3 methylation and translation elongation at Tyr codons.

    1. Author Response

      Reviewer #1 (Public Review):

      Adefuin and colleagues examined the interaction between components of binary odor mixtures in odor responses in mice. The authors used two-photon calcium imaging from the soma and apical dendrites of mitral/tufted cells in the olfactory bulb. Odor responses were measured in various conditions: under anesthesia (ketamine/xylazine), while well-trained mice were engaged in an odor discrimination task, or disengaged. The authors first show that mixture components interacted sublinearly in a large fraction of mitral/tufted cells (46%; Fig. 6D), consistent with previous studies. However, when odor responses were measured in awake animals, very few mitral/tufted cells showed sublinear responses at the soma (8-9%; Fig. 6D). Interestingly, sublinear interaction was evident in apical dendrites of mitral/tufted cells (45%). Whether mixture components are represented linearly or not in the olfactory system is an important question, related to the animal's ability to identify or segment mixture components. Somewhat contrary to previous studies, this study demonstrates largely linear interactions. Furthermore, this study compares various behavioral conditions. These results are important and of interest to those who study sensory systems. I have a few concerns regarding data analysis.

      Thank you for your helpful review, and for recognising the relevance of our work. We hope that the reviewer finds our point-by-point responses satisfactory.

      1) Non-linear interactions are detected as activity deviating from linearity by more than 2 standard deviations. Using this criterion, fewer non-linear interactions might be detected if the trial-by-trial activity becomes more variable. This is concerning because the activity might be less variable in the anesthetized condition, and the reduction in sublinear interactions in awake conditions may be due to a general increase in response variability. Can the authors exclude the possibility that the decrease in sublinear interactions is merely due to an increase in response variability in the awake conditions? This issue also applies to the comparison between apical dendrites and somata: are the signals in apical dendrites less variable (maybe due to some averaging across dendrites from multiple cells; see the following point 5)?
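The reviewer's concern can be illustrated with a toy calculation. Suppose (hypothetically; the exact criterion in the manuscript may differ) that a mixture response is called sublinear when its trial mean falls more than 2 SD of the mixture trials below the linear sum of the component means. The same mean deviation then passes or fails the criterion depending only on trial-to-trial variability.

```python
import numpy as np


def is_sublinear_2sd(mix_trials, a_trials, b_trials, n_sd=2.0):
    """Assumed criterion: mean(mixture) < mean(A) + mean(B) - n_sd * SD(mixture trials)."""
    mix = np.asarray(mix_trials, dtype=float)
    linear_sum = np.mean(a_trials) + np.mean(b_trials)
    return bool(mix.mean() < linear_sum - n_sd * mix.std(ddof=1))


# Identical means (A = B = 1.0, mixture = 1.5), different trial-to-trial variability.
a = b = [1.0, 1.0, 1.0, 1.0]
low_noise = [1.4, 1.5, 1.6, 1.5]    # SD ~0.08: flagged as sublinear
high_noise = [0.9, 1.5, 2.1, 1.5]   # SD ~0.49: not flagged, same mean deviation
```

In both cases the mixture is 25% below the linear sum; only the noise level differs, which is why a variability-independent measure is needed for comparisons across states.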

      Thank you for raising this valid point and for suggesting alternative analyses. We agree that the index we used previously is susceptible to noise, and not appropriate for comparing two datasets with different trial-by-trial variability. To quantify the deviation from the linear sum more robustly, we now use the “Median fractional deviation”, which expresses the deviation from the linear sum as a fraction of the predicted linear sum (without normalising by the standard deviation), and take the median of the distribution from each field of view. As we describe in the revised Figure 4, this measure is more robust to noise. Notably, our finding that mixture summation is generally less sublinear in awake mice still stands for the early phase.
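Written out, the fractional deviation for one cell is (R_mix - (R_A + R_B)) / (R_A + R_B), where R_A and R_B are the component responses and R_mix the mixture response, and the median over cells summarises a field of view. The sketch below assumes trial-averaged responses are already available as arrays. The optional `min_linear` cutoff is a hypothetical safeguard, not part of the described method, reflecting the point that the ratio amplifies noise when the predicted sum is small.

```python
import numpy as np


def median_fractional_deviation(r_mix, r_a, r_b, min_linear=None):
    """Median over cells of (R_mix - (R_A + R_B)) / (R_A + R_B).

    Negative values indicate sublinear and positive values supralinear summation.
    `min_linear` (hypothetical) drops cells whose predicted linear sum is too
    small for a stable ratio.
    """
    r_mix = np.asarray(r_mix, dtype=float)
    linear = np.asarray(r_a, dtype=float) + np.asarray(r_b, dtype=float)
    if min_linear is None:
        keep = np.ones(linear.shape, dtype=bool)
    else:
        keep = linear >= min_linear
    frac = (r_mix[keep] - linear[keep]) / linear[keep]
    return float(np.median(frac))
```

Because it is a ratio rather than a z-score, this measure does not shrink when trial-to-trial variability grows, and the median limits the influence of the few cells with very small predicted sums.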

      In the revised manuscript, we use the median fractional deviation whenever we compare linearity of summation across different conditions, which includes the comparison of anaesthetised vs. awake, behaving conditions (revised Fig. 4), comparison of dendrites vs. somata (revised Fig. 4-figure supplement 1), and comparisons of awake states (revised Fig. 6). This has also given us more confidence in our interpretation, so we are grateful for the reviewer’s suggestions.

      2) Related to the above issue, it would be useful to analyze the difference between conditions using different metrics to fully understand what really differs between conditions. The scatter plots shown in various figures do not show drastic differences between awake and anesthetized conditions, as the percentage of sublinear responses might suggest. It would be useful to characterize the magnitude of sublinear/supralinear effects. For example, one can calculate a fractional change in the mean response. Does this measure show a consistent difference between awake and anesthetized conditions?

      Thank you for suggesting this analysis. As described above, we now use the fractional deviation to quantify how mixture summations differ from linear sums, which turned out to be a very useful way to express the property of summation (N.B.: noise is amplified for small responses when the fractional deviation is used, which is another reason we now use the median).

      Reviewer #2 (Public Review):

      This study addresses how complex stimuli are represented in neural responses. This is particularly relevant to olfaction because the vast majority of stimuli are complex mixtures that, perceptually, are not easy to decompose into parts. Nonetheless, the ability to discern a relevant odor from background odors is essential. This process is easier when neural responses to mixtures reflect the linear sum of the responses to the individual components. The main conclusion of this study is that the linearity of olfactory bulb responses to two-component mixtures is greater in awake than in anesthetized states. The authors provide some evidence to support this claim. However, this could be better quantified, and there is a temporal aspect of linearization that is not addressed. Perhaps the most interesting aspect of the study is the difference in linearity between the dendrites and the somata of the mitral/tufted cells, but a statistical analysis of this finding was not evident. Overall, a mechanistic or functional approach to understanding these findings is lacking. The differences in linearity between the anesthetized and awake states are simply explained by response saturation in anesthetized animals. There are hints at a mechanism by which linearity is supported in the OB, from comparisons between soma and dendrite, but these are not well developed. There is a model that addresses the functional significance of linearity, but it is only supplemental and not well described.

      Thank you for appreciating the significance of our work, and for your constructive comments.

      Reviewer #3 (Public Review):

      Adefuin et al use multiphoton imaging of M/T cell responses to investigate whether neuronal representations of binary mixtures can be explained as a sum of the components. The current view in the field (built largely from studies in anesthetized animals) is that mixture summation is non-linear and that the non-linearity increases with the degree of glomerular response overlap elicited by the components. The authors reproduce these results and ask whether the same phenomenon is observed in the awake state, in particular when the animals are engaged in an odor discrimination task. Unlike in the anesthetized state, the authors find that mixture representations are linear in the awake brain. They use a series of systematic behavioral paradigms to show that the observed linearity in the awake state (compared to anesthetized) is not dependent on task engagement (reward is given randomly, post-odor) or stimulus relevance (reward is given before odor). While the experiments are well done and the data is presented clearly, I have several major concerns about the interpretation of their results.

      1) Given the data the authors present, it is unclear whether one can conclude that the olfactory system is more or less linear in the awake state than in the anaesthetised one. What seems to change most across the awake vs. anesthetized states is the response amplitude. Responses appear to be ~3× smaller in awake mice. In the anesthetized state, non-linearity seems most apparent for large response amplitudes (>5 dF/F), with mixture responses being sub-linear, most likely due to saturation effects. The authors themselves do an analysis in Figure 6 - supplement 1 to show that most of the observed non-linearity in the anesthetized animals can be explained away after accounting for amplitude normalisation. The authors use this analysis to comment that the level of linearity is the same across all three awake states, but the same figure shows that it is in fact the same even for the anaesthetized state.

      To put it differently, it is indeed true from the authors data that the OB response gain is significantly lower in the awake state, but it is unclear if the summation is more linear if measured at similar response amplitude regimes in both awake and anaesthetised mice.
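The saturation argument can be made concrete with a toy model, offered only as an illustration of the reviewer's point. Assume that each cell's response is a saturating function of its input drive, r(x) = R_max·x/(x + K), and that component inputs add linearly before this nonlinearity. Both assumptions are hypothetical and are not taken from the manuscript. Under them, summation looks almost linear for weak drive and strongly sublinear for strong drive, even though only the operating point changes.

```python
def saturating_response(x, r_max=1.0, k=1.0):
    """Toy saturating (Michaelis-Menten-like) response to input drive x (arbitrary units)."""
    return r_max * x / (x + k)


def fractional_deviation(x_a, x_b, r_max=1.0, k=1.0):
    """(r(xA + xB) - (r(xA) + r(xB))) / (r(xA) + r(xB)) under the toy model."""
    linear = saturating_response(x_a, r_max, k) + saturating_response(x_b, r_max, k)
    mixture = saturating_response(x_a + x_b, r_max, k)
    return (mixture - linear) / linear


for drive in (0.01, 0.1, 1.0, 10.0):
    print(drive, round(fractional_deviation(drive, drive), 3))
```

With equal drives, the deviation goes from about -1% at low drive to about -48% at high drive, approaching -50% as both components saturate. This is why comparing linearity at matched response amplitudes, as the reviewer suggests, matters.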

      Thank you for the valuable comments. We agree that many differences between the anaesthetised and awake states should have been taken into account when comparing the linearity of summation. We now address the reviewer’s concern by expressing the deviation as a fraction of the predicted linear sum of the component responses. Further, we also considered another factor that could influence the anaesthetised vs. awake comparison, namely, the trial-by-trial variability. The resulting comparison is shown below.

      Figure R1: comparison of mixture summation for the early phase of responses, expressed as the fractional deviation.

      2) The authors argue that keeping response amplitudes small in the awake brain prevents sub-linear summation and therefore may lead to better mixture decomposition. They do a decoding analysis in anaesthetised mice to show that linear mixture representations (instead of the observed sub-linear representations) make odor classification easier. However, I find this analysis uninformative and misleading. It is no surprise that decoders trained on single odor representations should perform better (or equally well) when using linear sums as input instead of observed sub-linear representations. The authors use this observation to suggest that this mechanism aids discrimination ability in the awake state. However, given that even the single odor responses are much weaker and noisier in the awake state, it is likely that even single odor discrimination ability is poorer in the awake state. By the same logic, mixture decomposition might also be much poorer in the awake brain than in the anesthetized brain, even though summation is more linear, simply because responses are weaker and noisier. In my opinion, the authors should compare decoding accuracy across awake vs. anesthetized responses if they want to assert that linearisation of responses in the awake brain leads to easier decomposition. Otherwise, while linearisation can in principle aid decomposition, at least in the form that the authors observe here it may come at a high cost in signal-to-noise ratio, which would undo the gain that linearity provides, in principle, for discrimination.

      Thank you very much for the insight and for the excellent suggestion to consider the discriminability of stimuli. We now include an analysis in which a decoder trained on single-odour responses is tested on observed mixture responses. Surprisingly, despite the substantial differences in response amplitudes and trial-by-trial variability, decoders using data from awake mice performed well, and even outperformed those using anaesthetised data for the late phase of responses. This is now described in the revised Fig. 5.
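      To illustrate the logic of training on single-odour responses and testing on mixtures, here is a minimal stand-in decoder. It assumes a simple correlation-based template matcher, not the classifier used for the revised Fig. 5 (which is not specified here): each odour's mean single-odour population response serves as a template, and a mixture is decoded as the k odours whose templates correlate best with the observed mixture response. All names and data are hypothetical.

```python
import numpy as np


def fit_templates(single_responses):
    """single_responses: dict odour -> array (trials x cells). Returns the mean template per odour."""
    return {odour: np.asarray(r, dtype=float).mean(axis=0) for odour, r in single_responses.items()}


def decode_mixture(templates, mixture_response, k=2):
    """Return (sorted) the k odours whose templates correlate best with one mixture response."""
    x = np.asarray(mixture_response, dtype=float)
    scores = {odour: np.corrcoef(x, t)[0, 1] for odour, t in templates.items()}
    return sorted(sorted(scores, key=scores.get, reverse=True)[:k])


# Hypothetical 8-cell population: each odour drives a different pair of cells (2 trials each).
singles = {
    "A": [[1, 1, 0, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 0]],
    "B": [[0, 0, 1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0, 0, 0]],
    "C": [[0, 0, 0, 0, 1, 1, 0, 0], [0, 0, 0, 0, 1, 1, 0, 0]],
    "D": [[0, 0, 0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 0, 0, 1, 1]],
}
templates = fit_templates(singles)
# A uniformly sublinear A+C mixture: each component at 60% of its single-odour amplitude.
mixture = 0.6 * (templates["A"] + templates["C"])
print(decode_mixture(templates, mixture))  # ['A', 'C']
```

      Because correlation is insensitive to a uniform rescaling, this particular stand-in is unaffected by uniformly sublinear summation; non-uniform distortions across cells and trial-to-trial noise would still degrade it, which is why decoding accuracy and linearity need not track each other.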

      Interestingly, though, the time course of the decoder performance does not correlate well with the linearity of summation. This observation is now described in the abstract (lines 19-21): “…decoding analyses indicated that the data from behaving mice was able to encode mixture responses well, though the time course of decoding accuracy did not correlate with the linearity of summation”.

      3) At a more philosophical level, to this Reviewer, it is unclear whether the anesthesia vs. awake difference in responses should constitute the main focus of the manuscript. The authors explore summation properties under four different brain states, one of which is anaesthesia (also the least behaviorally relevant). In three out of four states, they observe that summation is linear. In the fourth (anaesthesia), they observe that summation is sub-linear, but this happens at much larger response amplitudes than in the three awake states sampled, presumably due to saturation. To me, it seems that the authors show here that mixture summation in the OB is largely independent of brain state, since it is unaffected by whether the animal is task-engaged, motivated, etc.

      Thank you for this thoughtful comment, which has made us reflect on the essence of our study. We believe we make three main observations. First, summation differs between the anaesthetised and awake states, and this should be reported because of the large volume of prior work reporting sublinear summation. However, as the reviewer recommends, and as described next, this is no longer the sole focus of our study. Second, based on decoder performance, the linearity of summation does not necessarily correlate with the ability to analyse mixtures. We believe it is important to share this observation, since a number of previous studies speculated that nonlinear summation contributes to perceptual difficulty (Bell et al., 1987; Laing, 1994). Third, decoder performance, especially for decoders trained on single-odour responses and tested on mixtures, differs across awake states, with data from disengaged mice performing particularly poorly. This result is shown in the revised Figure 6. We have edited the abstract and results to ensure that these points are clearly communicated, and we hope that the manuscript is now more balanced and reflects the data better.

      4) It is unclear how to interpret the dendritic imaging comparison. First, the dendritic signal is pooled across many cells. If any of the pooled cells shows sub-linearity, the pooled population response will look sub-linear, albeit less so than at the single-cell level. Second, as for the anesthetized vs. awake comparison, there is a discrepancy in response amplitudes: dendritic responses are ~2x stronger than somatic responses, and sub-linear summation would be more apparent as one approaches the saturation regime. Third, dendritic responses pool both mitral and tufted cells, while the somatic data the authors present are predominantly from tufted cells.

      Thank you for commenting on ways to further understand the dendritic signal. Indeed, the early prevalence of sublinearity in the apical dendrites does seem to relate to the time course of responses. This is treated more directly in the revised Fig. 4 – supplement 1.

      To address the averaging effect, we tested what pooled signals might look like in terms of the linearity of summation. To roughly approximate pooled responses, we reasoned that neighbouring TC/MC somata are more likely to belong to the same glomerulus. Thus, we averaged signals from somatic ROIs (TCs and MCs) in each field of view and calculated the fractional deviation from the linear sum (Fig. R2). While simple averaging of neighbouring somata may not be perfectly accurate, this analysis indicates that the difference between the apical dendrites and the somata may not be explained simply by the averaging effect.

      Figure R2: Analysis of pooled somatic signals

      To approximate what dendritic signals might look like if they were simple averages of somatic responses, we pooled the signals from all TC/MC somata in each field of view and treated the result as “an approximate glomerular signal”. The plot above shows the fractional deviation from the linear sum. (MC somata data come from an additional set of experiments conducted for this rebuttal.)
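      The pooling behind Figure R2 can be sketched as follows, assuming responses are summarised as one amplitude per ROI and odour condition, and that pooling is an unweighted mean over all somatic ROIs in a field of view. The function and variable names are hypothetical. Because the pooled signal is formed before the ratio is taken, a strongly sublinear cell can be diluted by linear neighbours, which is the averaging effect the reviewer raises.

```python
from collections import defaultdict

import numpy as np


def pooled_fractional_deviation(fov_ids, mixture, resp_a, resp_b):
    """Average somatic responses within each field of view, then compare with the linear sum.

    fov_ids: one field-of-view label per ROI; the response arrays hold one amplitude per ROI.
    Returns dict fov -> (pooled mixture - pooled linear sum) / pooled linear sum
    (NaN if the pooled linear sum is zero).
    """
    mixture, resp_a, resp_b = (np.asarray(v, dtype=float) for v in (mixture, resp_a, resp_b))
    rois_by_fov = defaultdict(list)
    for roi, fov in enumerate(fov_ids):
        rois_by_fov[fov].append(roi)
    result = {}
    for fov, rois in rois_by_fov.items():
        pooled_mix = mixture[rois].mean()
        pooled_sum = resp_a[rois].mean() + resp_b[rois].mean()
        result[fov] = float("nan") if pooled_sum == 0 else float((pooled_mix - pooled_sum) / pooled_sum)
    return result


# Hypothetical data: in "fov1" one ROI sums linearly and one is strongly sublinear.
fov = ["fov1", "fov1", "fov2"]
print(pooled_fractional_deviation(fov, [2.0, 1.0, 3.0], [1.0, 1.0, 1.5], [1.0, 1.0, 1.5]))
# {'fov1': -0.25, 'fov2': 0.0}
```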

      In terms of the unmatched amplitude distributions and trial-by-trial variability across conditions, as the reviewer points out, the issue is similar to the anaesthetised vs. awake comparison. To address this, all comparisons are now presented in terms of median fractional deviations. Further, to test whether mitral cells contributed to the discrepancy in linearity between the dendritic and somatic signals, we now provide additional data from 137 MCs (5 fields of view, 3 trained mice performing the mixture task). These changes are described in the revised manuscript (Figure 4 – supplement 1).

    1. Author Response

      Reviewer #1 (Public Review):

      Using health insurance claims data (from 8M subjects), a retrospective propensity score matched cohort study was performed (450K in both groups) to quantify associations between bisphosphonate (BP) use and COVID-19 related outcomes (COVID-19 diagnosis, testing and COVID-19 hospitalization). The observation periods were from 1-1-2019 to 2-29-2020 for BP use and from 3-1-2020 to 6-30-2020 for the COVID endpoints. In primary and sensitivity analyses, BP use was consistently associated with lower odds of COVID-19 diagnosis, testing and COVID-19 hospitalization.

      The major strength of this study is the size of the study population, allowing a propensity-based matched-cohort study with 450K in both groups, with a sizeable number of COVID-19 related endpoints. Health insurance claims data were used, with the intrinsic risk of some misclassification of exposure. In addition, there is probably misclassification of endpoints, as testing for COVID-19 was limited during the study period. Furthermore, the retrospective nature of the study carries the risk of residual confounding, which has been addressed, to some extent, by sensitivity analyses.

      In all analyses there is a consistent finding that BP exposure is associated with reduced odds for COVID-19 related outcomes. The effect size is large, with high precision.

      The authors extensively discuss the (many) potential limitations inherent to the study design and conclude that these findings warrant confirmation, preferably in intervention studies. If confirmed, BP use could be a powerful adjunct in the prevention of infection and hospitalization due to COVID-19.

      We thank the reviewer for this very positive overall feedback. We appreciate the reviewer's comments regarding the potential risks associated with misclassification of exposure and other potential limitations, which we have sought to address in a number of sensitivity analyses and also address in the Discussion of our paper. In addition, as noted by the reviewer, the observed effect size of BP use on COVID-19 related outcomes is large, with high precision, which we feel is a strong argument for exploring this class of drugs in further prospective studies.

      Reviewer #2 (Public Review):

      The authors performed a retrospective cohort study using claims data to assess the causal relationship between bisphosphonate (BP) use and COVID-19 outcomes. They used propensity score matching to adjust for measured confounders. This is an interesting study and the authors performed several sensitivity analyses to assess the robustness of their findings. The authors are properly cautious in the interpretation of their results and justly call for randomized controlled trials to confirm a causal relationship. However, there are some methodological limitations that are not properly addressed yet.

      Strengths of the paper include:

      (A) Availability of a large dataset.

      (B) Using propensity score matching to adjust for confounding.

      (C) Sensitivity analyses to challenge key assumptions (although not all of them add value in my opinion, see specific comments)

      (D) Cautious interpretation of results, the authors are aware of the limitations of the study design.

      Limitations of the paper are:

      (A) This is an observational study using register data. Therefore, the study is prone to residual confounding and information bias. The authors are well aware of that.

      (B) The authors adjusted for the Charlson comorbidity index, whereas they had individual comorbidity data available and a dataset large enough to adjust for each comorbidity separately.

      (C) The primary analysis violates the positivity assumption (a substantial part of the population had no indication for bisphosphonates; see specific comments). I feel that sensitivity analysis 1 or 2 would be better suited as the primary analysis.

      (D) Some of the other sensitivity analyses have underlying assumptions that are not discussed and do not necessarily hold (see specific comments).

      In the paper's current form, these limitations hinder a sound interpretation of the results, which therefore, in my opinion, do not support the conclusion of the paper.

      The finding of a substantial risk reduction of (severe) COVID-19 in bisphosphonate users compared to non-users in this observational study may be of interest to other researchers considering setting up randomized controlled trials to evaluate repurposed drugs for the prevention of (severe) COVID-19.

      We thank the reviewer for the insightful comments and questions related to our manuscript. Our response to the concerns regarding limitations of our study is as follows:

      (A) We agree that there is likely residual confounding and information bias due to the use of US health insurance claims datasets, which do not include information on certain potentially relevant variables. Nonetheless, given the large effect size and precision of our analysis, we feel that our findings support our main conclusion that additional prospective trials appear warranted to further explore whether BPs might confer a measure of protection against severe respiratory infections, including COVID-19. We have added a sentence on the second page of our Discussion (lines 859-860) to emphasize this point: "Specifically, there is the potential that key patient characteristics impacting outcomes could not be derived from claims data."

      (B) The progression of this study mirrors how the analysis was actually performed: we initially used the CCI in matching to control for comorbidity burden on a broader scale. This was our a priori approach. After observing large effect sizes, we performed more stringent matching for sensitivity analyses 1 and 2. Irrespective of the matching strategy chosen, effect sizes remained similar for all outcome parameters. We therefore elected to include both the primary analysis and the sensitivity analyses with more stringent matching, in order to show transparently everything that was done during our analyses, as we feel this displays all of the efforts taken to identify sources of unmeasured confounding that could have impacted our results.
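      For readers less familiar with the procedure, a minimal sketch of 1:1 propensity score matching is shown below. It assumes greedy nearest-neighbour matching without replacement within a fixed caliper on the score scale; the manuscript's actual matching algorithm, caliper, and covariate sets (e.g. CCI vs. individual comorbidities) are not specified here, so the function name and all parameters are illustrative.

```python
def greedy_caliper_match(treated_scores, control_scores, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the propensity score, without replacement.

    Returns a sorted list of (treated_index, control_index) pairs; treated subjects with
    no unused control within `caliper` are left unmatched.
    """
    available = set(range(len(control_scores)))
    pairs = []
    # Match treated subjects in order of decreasing score (one common greedy convention).
    for t in sorted(range(len(treated_scores)), key=lambda i: -treated_scores[i]):
        best = min(available, key=lambda c: abs(control_scores[c] - treated_scores[t]), default=None)
        if best is not None and abs(control_scores[best] - treated_scores[t]) <= caliper:
            pairs.append((t, best))
            available.remove(best)
    return sorted(pairs)


# Hypothetical scores: the treated subject at 0.95 has no control within the caliper.
print(greedy_caliper_match([0.30, 0.60, 0.95], [0.31, 0.59, 0.10, 0.58]))  # [(0, 0), (1, 1)]
```

      Stricter matching (more covariates, narrower caliper) typically leaves more treated subjects unmatched, which is the sample-size trade-off described above.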

      (C) We agree that the positivity assumption is a key factor to consider when building comparable treatment cohorts. We also agree that it is important to perform separate analyses for all patients with an indication for BP use and for users of other anti-osteoporosis medications, as we have done in our analyses of the Osteo-Dx-Rx and Bone-Rx cohorts, respectively. However, we did not have sufficient data, a priori, to determine whether BP users would be more similar in their risk of COVID-19 outcomes to non-users or to users of other anti-resorptive medications. In addition, we believe that this specific limitation does not negate our findings in the primary analysis, for the following reasons: (1) ‘Type of Outcome’: the outcomes in this study are related to infectious disease and are not direct clinical outcomes of any known treatment benefits of BPs. The clinical benefits being assessed (the impact of BP use on COVID-19-related outcomes) were essentially unknown at the time covered by the study data; this mitigates the impact of any violation of the positivity assumption. (2) ‘Clinical Population’: after propensity score matching, both the BP user and the BP non-user groups in the primary analysis mainly consisted of older females (90.1% female, 97.2% age >50), which is the main population with clinical indications for BP use. According to NCHS Data Brief No. 93 (April 2012) released by the CDC, ~75% and 95% of US women aged 60-69 and 70-79, respectively, suffer from either low bone mass or osteoporosis, and essentially all women (and 70% of men) above age 80 suffer from these conditions, which often go undiagnosed (https://www.cdc.gov/nchs/data/databriefs/db93.pdf). Women aged 60 and older make up ~75% of our study population (Table 1). Although bone density measurements are not available for non-BP users in the matched primary cohort, there is a high probability that the incidence of osteoporosis and/or low bone mass in these patients was similar to the national average. This justifies the assumption that BP therapy was indicated for most non-BP users in the matched primary cohort. Arguably, for these patients the positivity assumption was not violated.

      (D) We discuss below in detail the specific issues raised by the reviewer regarding our sensitivity analyses. In general, we acknowledge that individual analytical and/or matching approaches may each have their own limitations, but the analyses performed herein were designed to test systematically the different critical threats to the validity of our initial results in the primary cohort analysis, which were based on a priori-defined methods and yielded a large and robust effect size. Thus, the individual sensitivity analyses should be considered in the broader context of the entire project.

      Specific comments (in order of manuscript):

      Methods:

      Line 158: it is unclear how the authors dealt with patients who died during the follow-up period. The wording suggests they were excluded which would be inappropriate.

      When this study was executed, we were unable to link the patient-level US insurance claims data with patient-level mortality data due to HIPAA concerns. Therefore, line 158 (now 177) defines continuous insurance coverage during the observation period as a verifiable eligibility criterion we used for patient inclusion. It was necessary to disqualify individuals who discontinued insurance coverage for a variety of reasons, e.g. due to loss or change of coverage, relocation etc., but our approach also eliminated patients who died. Appendix 3 (line 2449ff) describes methods we employed post hoc to assess how censoring due to death could have impacted our analyses. We discuss our conclusions from this post hoc analysis in the main text (lines 1053-1058) as follows: "An additional limitation is potential censoring of patients who died during the observation period, resulting in truncated insurance eligibility and exclusion based on the continuous insurance eligibility requirement. However, modelling the impact of censoring by using death rates observed in BP users and non-users in the first six months of 2020 and attributing all deaths as COVID-19-related did not significantly alter the decreased odds of COVID-19 diagnosis in BP users (see Appendix 3)."

      Why did the authors use CCI for propensity matching rather than the individual comorbid conditions? I presume using separate variables will improve the comparability of the cohorts. The authors discuss imbalances in comorbidities as a limitation but should rather have avoided this.

      CCI was the a priori approach defined at the study outset and was chosen due to the widespread use and understanding of this score. The general CCI score was originally planned for matching in order to have the largest possible study population since we did not know how many patients would meet all criteria as well as have an event of interest. After realizing we had adequate sample size to power matching using stricter criteria, we proceeded to perform subsequent sensitivity analyses on more stringently matched cohorts (sensitivity analysis 2).

      Lines 301-10: it seems unnecessary to me to adjust for the given covariates when these were already used for propensity score matching (except comorbidities, but see previous comment). The manuscript doesn't give a rationale for why the authors chose this 'double correction'.

      The following language was added to the methods section (lines 325-327): “Demographic characteristics used in the matching procedure were also included in the final outcome regressions to control for the impact of those characteristics on outcomes modelled.”

      The following language was added to the Discussion section regarding the potential limitations of our study (lines 1078-1085): “Another limitation in the current study is related to a potential ‘double correction’ of patient characteristics that were included in both the propensity score matching procedure as well as the outcome regression modelling, which could lead to overfitting of the regression models and an overestimation of the measured treatment effect. Covariates were included in the regression models since these characteristics could have differential impacts on the outcomes themselves, and our results show that the adjusted ORs were in fact larger (showing a decreased effect size) when compared to the unadjusted ORs, which show the difference in effect sizes of the matched populations alone.”

      In causal research, a very important assumption is the 'positivity assumption', which means that none of the individuals has a probability of zero or one of being exposed. Including everyone would therefore not be appropriate. My suggestion is to include either all patients with an indication (based on diagnosis) or all who use an anti-osteoporosis (AOP) drug (or one as the primary and the other as the sensitivity analysis), instead of using these cohorts as sensitivity analyses. In my opinion, the choice should be based on two aspects: whether it is likely that other AOP drugs have an effect on the COVID-19 outcomes, and whether BP users are deemed to be more similar (in their risk of COVID-19 outcomes) to non-users or to other AOP drug users. Alternatively, the authors might have discussed the positivity assumption and argued why it is not applicable to their primary analysis.

      The following text has been added to the Discussion section addressing potential limitations of our study (lines 987-1009): "Another potential limitation of this study relates to the positivity assumption, which, when building comparable treatment cohorts, is violated when the comparator population does not have an indication for the exposure being modelled (56). This limitation is present in the primary cohort comparisons between BP users and BP non-users, as well as in the sensitivity analyses involving other preventive medications. This limitation, however, is mitigated by the fact that the outcomes in this study are related to infectious disease and are not direct clinical outcomes of known treatment benefits of BPs. The fact that the clinical benefits being assessed (the impact of BPs on COVID-related outcomes) were essentially unknown clinically at the time covered by the study data minimizes the impact of violation of the positivity assumption. Furthermore, our sensitivity analyses involving the “Bone-Rx” and “Osteo-Dx-Rx” cohorts did not suffer this potential violation, and the results from those analyses support those from the primary analysis cohort comparisons. Moreover, we note that the propensity score matched BP users and BP non-users in the primary analysis cohort mainly consisted of older females. According to the CDC, ~75% and 95% of US women between 60-69 and 70-79 suffer from either low bone mass or osteoporosis, respectively (https://www.cdc.gov/nchs/data/databriefs/db93.pdf). Essentially all women (and 70% of men) above age 80 suffer from these conditions, which often go undiagnosed. Women aged 60 and older represent ~75% of our study population (Table 1). Although bone density measurements are not available for non-BP users in the matched primary cohort, there is a high probability that the incidence of osteoporosis and/or low bone mass in these patients was similar to the national average. Thus, BP therapy would have been indicated for most non-BP users in the matched primary cohort, and arguably, for these patients the positivity assumption was not violated."

      Sensitivity Analysis 3: Association of BP-use with Exploratory Negative Control Outcomes: what is the implicit assumption in this analysis? I think the assumption here is that any residual confounding would be of the same magnitude for these outcomes. But that depends on the strength of the association between the confounder and the outcome, which need not be the same. Here, risk-avoiding behavior (social distancing) is the most obvious unmeasured confounder, which may not have a strong effect on other health outcomes. Also, it is unclear to me why acute cholecystitis and acute pancreatitis-related inpatient/emergency-room visits were selected as negative controls. Do the authors have convincing evidence that BPs have no effect on these outcomes? Yet, if the authors believe that this is indeed a valid approach to measure residual confounding, I think the authors might have taken a step further and presented ORs for BP → COVID-19 outcomes that are corrected for the unmeasured confounding (e.g. if OR BP → COVID-19 is ~0.2 and OR BP → acute cholecystitis is ~0.5, then the 'corrected' OR of BP → COVID-19 would be ~0.4).
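      The reviewer's example corresponds to the calculation below. It rests on the strong assumption that residual confounding multiplies the odds ratio by the same bias factor for the COVID-19 outcome and for the negative-control outcome, which is exactly the assumption questioned in the comment. The function name is hypothetical.

```python
def negative_control_corrected_or(or_exposure, or_negative_control):
    """Divide out a multiplicative bias factor estimated from a negative-control outcome.

    Assumes the negative-control OR reflects bias only (no true effect) and that the same
    bias factor applies to the outcome of interest.
    """
    return or_exposure / or_negative_control


print(round(negative_control_corrected_or(0.2, 0.5), 3))  # 0.4
```

      Equivalently, on the log scale the corrected log-OR is the difference of the two log-ORs; if the negative-control outcome is itself affected by BPs, as the gallbladder literature cited in the response suggests, the correction is biased.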

      We appreciate the reviewer’s thoughtful comments regarding the differential strength of the association between unmeasured confounders and outcome. We had initially selected acute cholecystitis and pancreatitis-related inpatient and emergency room visits as negative controls because we deemed them to be emergent clinical scenarios that should not be impacted by risk-avoiding behavior. However, upon further search, we identified several publications that suggest a potential impact of osteoporosis and/or BPs on gallbladder diseases (https://doi.org/10.1186/s12876-014-0192-z; http://dx.doi.org/10.1136/annrheumdis-2017-eular.3900), thus calling the validity of our strategy into question. We therefore agree that the designation of negative control outcomes is problematic and adds relatively little to the overall story. Accordingly, we have removed these analyses from the revised manuscript.

      Sensitivity Analysis 4: Association of BP-use with Exploratory Positive Control Outcomes: this doesn't help me be convinced of the lack of bias. If previous researchers suffered from residual confounding, the same type of mechanisms apply here. (It might still be valuable to replicate the previous findings, but not as a sensitivity analysis of the current study).

      We agree that the same residual confounding in previous research papers could be present in our study. Nonetheless, it was important to assess whether our analysis would be potentially subject to additional (or different) confounding due to the nature of insurance claims data as compared to the previous electronic record-based studies. Therefore, it was relevant to see if previous findings of an association between BP use and upper respiratory infections are observable in our cohort.

      The second goal of sensitivity analysis #4 (now #3) was to see whether associations could be found for different sets of respiratory infection-based conditions, both during the pandemic/study period and during the pre-pandemic period, i.e. before medical care in the US was significantly impacted by the pandemic. In light of these considerations, we feel that this sensitivity analysis adds value by showing consistency in our core findings.

      Sensitivity Analysis 5: Association of Other Preventive Drugs with COVID-19-Related Outcomes: the same applies here as for sensitivity analysis 3, namely the assumption that the association of unmeasured confounders with other drugs is as strong as for BPs. The authors should explicitly state the assumptions of the sensitivity analyses and argue why they are reasonable.

      The following sentence was added to the Discussion section (lines 1019-1020): "These analyses were based on the assumption that the association of unmeasured confounders with other drugs is comparable in magnitude and quality as for BPs."

      Results: The data are clearly presented. The C-statistic / ROC-AUC of the propensity model is missing.

      Unfortunately, a significant amount of time has passed since our co-authors at Cerner Enviza executed the original analysis of the Komodo dataset. Our ability to perform follow-up studies with the Komodo dataset (which is housed exclusively on Komodo's secure servers) has since become limited, because the business arrangements between these companies have been terminated and the pertinent statistical software is no longer active. This prevents us from obtaining the original C-statistic and ROC-AUC information; however, we were able to extract the actual propensity scores for the base cohort matching (BP users versus non-users). The table below illustrates that the propensity scores for the base cohort match ranged from <0.01 to a maximum of 0.49, with 81.4% of patients having a propensity score of 10-49% and 52.9% having a propensity score of 20-49%. This distribution is unlikely to reflect patients with propensity scores of exactly 0 or 1.
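      For reference, the C-statistic of a propensity model is the ROC-AUC of the fitted scores against exposure status, and a rough positivity check counts subjects whose scores lie very close to 0 or 1. A minimal sketch of both follows, in plain Python with a naive pairwise loop (scikit-learn's `roc_auc_score` computes the same AUC efficiently for large cohorts). The function names, the `eps` threshold, and the example data are illustrative assumptions.

```python
def c_statistic(scores, exposed):
    """ROC-AUC of propensity scores against exposure status (ties count one half).

    Equals the probability that a randomly chosen exposed subject has a higher score
    than a randomly chosen unexposed subject. The pairwise loop is O(n_exposed * n_unexposed),
    fine for a sketch but too slow for millions of subjects.
    """
    pos = [s for s, e in zip(scores, exposed) if e]
    neg = [s for s, e in zip(scores, exposed) if not e]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def count_near_extreme(scores, eps=0.01):
    """Number of subjects whose score is within eps of 0 or 1 (a rough positivity check)."""
    return sum(1 for s in scores if s <= eps or s >= 1 - eps)


scores = [0.45, 0.30, 0.20, 0.25, 0.05, 0.40]
exposed = [1, 1, 1, 0, 0, 0]
print(c_statistic(scores, exposed))  # 0.6666666666666666
print(count_near_extreme(scores))    # 0
```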

      Discussion:

      When discussing other studies, the authors reduce their results to 'did' or 'did not find an association'. Although this is common practice, it does not do justice to the statistical uncertainty of both positive and negative findings. Instead, I encourage the authors to include effect estimates and confidence intervals. This is particularly relevant for studies that are inconclusive (i.e. the lower bound of the confidence interval does not exclude a clinically relevant reduction while the upper bound does not exclude a null effect).

      We appreciate the reviewer’s suggestion and have added this information on p.21/22 in the Discussion.

      Line 1145 "These retrospective findings strongly suggest that BPs should be considered for prophylactic and/or therapeutic use in individuals at risk of SARS-CoV-2 infection." I agree for prophylactic use but do not see how the study results suggest anything for therapeutic use.

      We have removed “and/or therapeutic use” from this sentence (line 1088-1090).

      The authors should discuss the acceptability of using BPs as preventive treatment (long-term use in persons without osteoporosis or another indication for BPs). This is not my expertise, but I reckon there will be little experience with long-term inhibition of osteoclasts in people with healthy bones. The authors should also discuss what prospective study design would be suitable and what sample size would be needed to demonstrate a reasonable reduction (say, 50%, accounting for some residual confounding being present in the current study).

      Although BPs are also used in pediatric populations and in patients without osteoporosis (for example, patients with malignancy), we recognize the lack of long-term safety data on the use of BPs as preventive treatment. We tried to partially address this concern in our sub-stratified analysis of COVID-19 related outcomes by time of exposure to BPs. Reassuringly, we observed that patients newly prescribed alendronic acid in February 2020 also had decreased odds of COVID-19 related outcomes (Figure 3B), suggesting that BP treatment may not need to be long-term. This is further discussed in the last paragraph of our Discussion, where we state that "BP use at the time of infection may not be necessary for protection against COVID-19. Rather, our results suggest that prophylactic BP therapy may be sufficient to achieve a potentially rapid and sustained immune modulation resulting in profound mitigation of the incidence and/or severity of infections by SARS-CoV-2."

      We agree that a future prospective study on the effect of BPs on COVID-19 related outcomes will require careful consideration of the study design, sample size, statistical power etc. However, we feel that a detailed discussion of these considerations is beyond the scope of the present study.

      The authors should discuss the fact that confounders were based on registry data which is prone to misclassification. This can result in residual confounding.

      Some potential sources of misclassification have been discussed on lines 932-948. In addition, the following language was added (lines 970-985): "Additionally, limitations may be present due to misclassification bias of study outcomes due to the specific procedure/diagnostic codes used, as well as the potential for residual confounding occurring for patient characteristics related to study outcomes that are unable to be operationalized in claims data, which would impact all cohort comparisons. For SARS-CoV-2 testing, procedure codes were limited to those testing for active infection, and therefore observations could be missed if they were captured via antibody testing (CPT 86318, 86328). These codes were excluded a priori due to the focus on the symptomatic COVID-19 population. Furthermore, for the COVID-19 diagnosis and hospitalization outcomes, all events were identified using the ICD-10 code for lab-confirmed COVID-19 (U07.1), and therefore events with an associated diagnosis code for suspected COVID-19 (U07.2) were not included. This was done to have a more stringent algorithm when identifying COVID-19-related events, and any impact of events identified using U07.2 is considered minimal, as previous studies of the early COVID-19 outbreak have found that U07.1 alone has a positive predictive value of 94% (55), and for this study U07.1 captured 99.2%, 99.0%, and 97.5% of all COVID-19 patient-diagnoses for the primary, “Bone-Rx”, and “Osteo-Dx-Rx” cohorts, respectively."

    1. Author Response:

      Reviewer #1:

      In this paper, the authors did a fine job of combining phylogenetics and molecular methods to demonstrate parallel evolution across vRNA segments in two seasonal influenza A virus subtypes. They first estimated phylogenetic relationships between vRNA segments using Robinson-Foulds distance and identified the possibility that parallel evolution of RNA-RNA interactions drives genome assembly. This is indeed an interesting mechanism, in addition to the traditional role of proteins in this process. Subsequently, they used molecular biology to validate such RNA-RNA driven interactions by demonstrating co-localization of vRNA segments in infected cells. They also showed that parallel evolution between vRNA segments might vary across subtypes and virus lineages isolated from distinct host origins. Overall, I find this to be excellent work with major implications for the genome evolution of infectious viruses and for the emergence of new strains with altered genome combinations.

      Comments:

      I am wondering whether leaving out sequences that do not resolve well in the phylogenetic analysis distorts the true picture of the proposed associations. What if they represent evolutionary intermediates, with important implications for pathogen evolution that are lost in the analyses?

      We fully appreciate this concern and have explored it extensively. One principal assumption underlying the approach we outline in this manuscript is that the trees analyzed are robust and well-resolved. We use tree similarity as a correlate for relationships between genomic segments, so the trees must be robust enough to support our claims, as we have clarified in lines 128-131. We initially set out to examine a broader range of viral isolates in each set of trees, but larger trees containing more isolates consistently failed to be supported by bootstrapping. Bootstrapping is by far the most widely used methodology for demonstrating support for tree nodes. For comparison, we constructed the closest possible analogue to the trees presented in this manuscript. We took all 84 H3N2 strains from 2005-2014 analyzed in replicate trees 1-7 and collapsed these sequences into one tree for each vRNA segment. Figure X-A, specifically provided for the reviewers, illustrates the resultant collapsed PB2 tree, with bootstrap values of 70 or higher shown in red and individual strains coded by cluster and replicate. As expected, most internal nodes on such a tree are unsupported by bootstrapping, indicating that relaxing our constraint of 97% sequence identity increases the uncertainty in our trees.

      Because we agree with Reviewers #1 and #3 on the critical importance of validating our approach, we determined the distances between these new collapsed trees using a complementary approach, Clustering Information Distance (CID), that is independent of tree size (Supplemental Figure 4B and Figure X-B & X-C). Larger trees containing all sequences yielded pairwise vRNA relationships that are largely similar to those we report in the manuscript (R² = 0.6408; P = 3.1E-07; Figure X-B vs. X-C), including higher tree similarity of PB2 with NA than with NS. This observation strengthens the rationale for focusing on these segments for molecular validation and for correlating parallel evolution with intracellular localization in our manuscript (Figure 7). However, tree distances are generally higher in Figure X-C than in Figure X-B, which we might expect if poorly supported nodes in larger trees artificially inflate phylogenetic signal. Given the overall similarity between Figures X-B and X-C, both approaches yield largely comparable results, and we ultimately relied upon the more robust replicate trees with stronger bootstrap support.

      Lines 50-51: Can you please elaborate? I think this might help the reader better understand the context. Also, a brief description of the functional associations between the known segments might spark readers' curiosity from the very beginning. At present, it largely caters to people already familiar with the biology of influenza virus.

      We have added additional information to reflect the complexity of intersegmental interactions and the current standing of the field (lines 49-52).

      Lines 95-96: Were these strains all swine-origin? More details on these lineages would be useful for the readers.

      We have clarified that all strains analyzed were isolated from humans, but were of different lineages (lines 115-120).

      Lines 128-132: I think it would be nice to introduce these hypotheses well in advance, maybe in the Introduction, with more functional details of the viral segments.

      We incorporated our hypotheses regarding tree similarity into the existing discussion of epistasis in the Introduction (lines 74-75 and 89-106).

      Lines 134-136: Please rephrase this sentence to make it more direct and explain the why. E.g. "... parallel evolution between PB1 and HA is likely to be weaker than that of PB1 and PA".

      The text has been modified (lines 165-168).

      Lines 222-223: Please include a set of hypotheses to explain your results. Please add a perspective in the Discussion on how this might contribute to the pandemic potential of H1N1.

      We have added in our interpretation of the results (lines 259-264) and expanded upon this in the Discussion (lines 418-422).

      Lines 287-288: I wonder how likely this is to be true for H1N1.

      We have expanded on this in the Discussion (lines 409-410).

      Reviewer #2:

      The influenza A genome is made up of eight viral RNAs. Despite being segmented, many of these RNAs are known to evolve in parallel, presumably due to similar selection pressures, and to influence each other's evolution. Viral protein-protein interactions have been found to be the mechanism driving this genomic evolution. Employing a range of phylogenetic and molecular methods, Jones et al. investigated the evolution of the seasonal influenza A virus genomic segments. They found that the evolutionary relationships between different RNAs varied between two subtypes, namely H1N1 and H3N2. The evolutionary relationships in H1N1 were also temporally more diverse than in H3N2. They also reported molecular evidence indicating that RNA-RNA interactions drive genomic coevolution, in addition to protein interactions. These results not only provide additional support for the presence of parallel evolution and genetic interactions in the influenza A genome but also advance the current knowledge of the field by providing novel evidence in support of RNA-RNA interactions as a driver of genomic evolution. This work is an excellent example of hypothesis-driven scientific investigation.

      The communication of the science could be improved, particularly for viral evolutionary biologists who study emergent evolutionary patterns but do not specialise in the underlying molecular mechanisms. This can easily be achieved by explaining jargon (e.g., deconvolution) and methodological logic that is not immediately clear to a non-specialist.

      We have clarified or eliminated jargon wherever possible throughout the text.

      The introduction section could be better structured. The crux of this study is the parallel molecular evolution in influenza genome segments and interactions (epistasis). The authors spent the majority of the introduction section leading to those two topics and then treated them summarily. This structure, in my opinion, is diluting the story. Instead, introducing the two topics in detail at the beginning (right after introducing the system) then discussing their links to reassortments, viral emergence etc. could be a more informative, easily understandable and focused structure. The authors also failed to clearly state all the hypotheses and predictions (e.g., regarding intracellular colocalisation) near the end of the introduction.

      We restructured the Introduction with more background on genomic assembly in influenza viruses, as requested by two reviewers (lines 43-52), more discussion of epistasis (lines 58-63) and provided a more thorough discussion of all hypotheses (lines 74-77, 88-92, 94-95, 97-106).

      The authors used the Robinson-Foulds (RF) metric to quantify topological distance between phylogenetic trees, a key variable of the study. But they did not justify using the metric despite its well-known drawbacks, including its lack of biological rationale and lack of robustness, particularly when more robust measures, such as generalised RF, are available.

      We agree that RF has drawbacks. To address this, we performed a companion analysis using the Clustering Information Distance (CID) recently described by Smith (2020). The mean CID can be found in Figure S4, the standard error of the mean in Figure S5, and networks depicting overall relationships between segments by CID in Figure S7E-S7H. To better assess how well RF and CID correlate with each other across influenza virus subtypes and lineages, we reanalyzed all data from both sets of distance measures by linear regression (Figure 3B, 4B-C, 5B, S6 and S9). Our results from both methods are highly comparable, which we believe strengthens our conclusions. Both analyses are included in the resubmission (lines 86-89; 162; 164; 187-188; 199-200; 207-208; 231-234; 242-244; 466-470).
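      For readers less familiar with the metric, the Robinson-Foulds distance can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the manuscript: it assumes rooted trees written as nested tuples of strain names (a hypothetical representation chosen for brevity; real analyses would parse Newick files with a phylogenetics package) and counts the clades present in one tree but not the other. The Clustering Information Distance (Smith, 2020) is not reproduced here.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical representation: a leaf is a strain name, an internal node
# is a tuple of subtrees. Trees are treated as rooted in this sketch.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of strain names below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the non-trivial clusters (clades) of a rooted tree.

    Single leaves and the full leaf set are skipped because every tree on
    the same strains contains them, so they carry no topological signal.
    """
    all_taxa = leaves(tree)
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        if 1 < len(below) < len(all_taxa):
            found.add(below)
        return below

    walk(tree)
    return found


def robinson_foulds(a: Tree, b: Tree) -> int:
    """Count clusters found in exactly one of the two trees."""
    if leaves(a) != leaves(b):
        raise ValueError("trees must contain the same strains")
    # Symmetric difference: clades unique to either tree.
    return len(clusters(a) ^ clusters(b))


if __name__ == "__main__":
    # Two hypothetical segment trees over the same four strains.
    pb2_tree = (("A", "B"), ("C", "D"))
    na_tree = (("A", "C"), ("B", "D"))
    print(robinson_foulds(pb2_tree, pb2_tree))  # 0: identical topologies
    print(robinson_foulds(pb2_tree, na_tree))   # 4: no clade is shared
```

      Because RF only records whether each clade is present or absent, moving a single strain can alter several clades at once, which is one motivation for pairing it with an information-based measure such as CID.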

      Figure 1 of the paper is extremely helpful to understand the large number of methods and links between them. But it could be more useful if the authors could clearly state the goal of each step and also included the molecular methods in it. That would have connected all the hypotheses in the introduction to all the results neatly. I found a good example of such a schematic in a paper that the authors have cited (Fig. 1 of Escalera-Zamudio et al. 2020, Nature communications). Also this methodological scheme needs to be cited in the methods section.

      We provided the molecular methods in a schematic in Figure 1D and the figure is cited in the Methods (lines 310; 440; 442; 456; 501).

      Finally, I found the methods section difficult to navigate, not because it lacked detail. The authors have been excellent in providing a considerable amount of methodological detail. The difficulty arose from the lack of a chronological structure. Ideally, the methods should be grouped under research aims (for example, data mining and subsampling, analysis of phylogenetic concordance between genomic segments, identifying RNA-RNA interactions, etc.), which would clearly link methods to specific results on the one hand and to the hypotheses on the other. This structure would make the article more accessible, for a general audience in particular. The results section appeared to achieve this goal and thus often repeats or explains methodological detail, which ideally should have been restricted to the methods section.

      We organized the Methods section by research aims as suggested. However, some discussion of the methods was retained in the Results section to ensure that the manuscript is accessible to audiences without formal training in phylogenetics.

      Reviewer #3:

      The authors sought to show how the segments of influenza viruses co-evolve in different lineages. They use phylogenetic analysis of a subset of the complete genomes of H3N2 or the two H1N1 lineages (pre and post 2009), and use a method - Robinson-Foulds distance analysis - to determine the relationships between the evolutionary patterns of each segment, and find some that are non-random.

      1) The phylogenetic analysis used leaves out sequences that do not resolve well in the phylogenetic analysis, with the goal of achieving higher bootstrap values. It is difficult to understand how that gives the most accurate picture of the associations: those sequences represent real evolutionary intermediates, and their inclusion should not alter the relationships between the more distantly related sequences. Does this not create an incomplete picture that artificially emphasizes differences among the clades for each segment analyzed?

      Reviewer #1 raised the same concern. Please refer to our response at the beginning of this letter where we address this issue in depth.

      2) It is not clear what the significance is of finding sequences that share branching patterns in the phylogeny, and how that informs our understanding of the likelihood of genetic segments having some functional connection. What mechanism is being suggested? Is this a proxy for the gene segments having been present in the same viruses, thereby revealing the favored gene segment combinations? Is there some association suggested between the RNA sequences of the different segments? The frequently evoked HA:NA associations may not be a directly relevant model, as those are thought to relate to the balance of sialic acid binding and cleavage associated with mutations focused around the receptor binding site and active site, the length of the NA stalk, and the HA stalk. Does that show up in the overall phylogeny of the HA and NA segments? Is there co-evolution of the polymerase gene segments, or has that been revealed in previous studies, as is suggested?

      We clarified our working hypotheses in the Introduction (lines 89-106) and what is known about the polymerase subunits (lines 92-93). Our data do suggest that polymerase subunits share similar evolutionary trajectories that are more driven by protein than RNA (lines 291-293; Figure 2A and 6). The point about epistasis between HA and NA arising from indirect interactions is entirely fair, but these studies are nonetheless the basis for our own work. We have clarified the distinction between these prior studies and our own in the text (lines 60-63 and 74-75). Moreover, our protein trees built from HA and NA recapitulate what has been shown previously, which we highlight in the text (lines 293-296; Figure 6 and Figure S10). We also clarified our interpretation of tree similarity throughout the text (lines 165-168; 190-191; 261-264; 323-326; 419-423).

      The mechanisms underlying the genomic segment associations described here are not clear. By definition they would be related to the evolution of the entire RNA segment sequence, since that is what is being analyzed. Is this (1) because of a shared function (which seems unlikely, but might point to a new activity), (2) because of some RNA sequence-associated function (inter-segment hybridization, common association of RNA with some cellular or viral protein), or (3) related to specific functions in RNA packaging? Please tell us whether the current RNA packaging models inform a possible process. Is there a known packaging assembly process based on RNA sequences, where the association leads to co-transport and packaging? In that case, the co-evolution should be seen more strongly in the region involved in that function and not elsewhere. The apparent increased association of the subset of genes examined for the single virus appears to occur mainly in the cytoplasm close to the nucleus, suggesting function (2) and/or (3)?

      It is difficult to figure out how the data found correlates with the known data on reassortment efficiency or mechanisms of systems for RNA segment selection for packaging or transport - if that is not obvious, maybe you can suggest processes that might be involved.

      We provided more context on genomic packaging in the Introduction, including the current model in which direct RNA interactions are thought to drive genomic assembly (lines 43-53). Although genomic segments are bound by viral nucleoprotein (NP), accurate genomic assembly is theorized to result from intersegment hybridization rather than from interactions with viral or cellular proteins. We further clarified our hypotheses regarding the colocalization data in the Results section to make the proposed mechanism clearer (lines 313-326).

    1. Author Response:

      Evaluation Summary:

      The study provides evidence that specific transcriptional responses may underpin the observation that metabolic rates often scale inversely with body mass. The conclusions are supported by direct measurement of metabolic fluxes in mouse and rat livers, although generalizations to other settings remain to be rigorously tested. The study has broad implications for researching and studying animal metabolism and physiology.

      We thank the reviewers and editors for this summary. We are pleased that they agree that the conclusions “are supported by direct measurements of metabolic fluxes in mouse and rat livers,” and that “the study has broad implications for researching and studying animal metabolism and physiology.” While we fully agree that “generalizations to other settings remain to be rigorously tested,” we have now added a comment comparing our measured liver fluxes in rodents to those recently measured in people:

      “While we did not have the capacity to measure liver fluxes in larger mammals in the current study, endogenous glucose production, VPC, and VCS previously measured using PINTA were 50-60% lower in overnight fasted humans than in rats (Petersen et al., 2019), assuming a liver size of 1,500 g in humans.”

      Reviewer #1 (Public Review):

      It is well established that the mass-specific energy expenditure and metabolic rate of metazoan organisms scale inversely with body mass, based on the measurement of oxygen consumption and caloric intake. However, the underlying regulatory mechanisms for this observation are poorly defined. To investigate whether metabolic scaling is associated with reduced levels of transcription of metabolic genes in larger animals, the authors reviewed existing transcriptional datasets from liver tissues of five animals (mice, rats, monkeys, humans and cattle) with a 30,000-fold range in average adult body weights. They identified a number of metabolic genes in different pathways of central carbon metabolism whose expression inversely scaled with body size, a majority of which required oxygen, NAD/H or ATP/ADP. Metabolic flux studies on intact liver sections, as well as in live animals, also revealed decreased liver metabolic fluxes in rats compared to mice. Interestingly, these differences were not observed in primary hepatocyte cultures, indicating that metabolic scaling is primarily regulated by cell-extrinsic factors and tissue context. These are interesting findings and highlight the importance of measuring metabolic processes in vivo. The measurement of cellular metabolic fluxes in different contexts (cultured, ex vivo tissue sections and live animals) is a major strength of this study. The lack of direct evidence that enzyme levels correlate with mRNA, and the absence of both transcriptional and enzyme activity measurements in cultured cells, are potential weaknesses.

      We thank Reviewer #1 for stating that “These are interesting findings and highlight the importance of measuring metabolic processes in vivo” and that “The measurement of cellular metabolic fluxes in different contexts (cultured, ex vivo tissue sections and live animals) is a major strength of this study.” In addition, we sincerely thank the reviewer for raising important weaknesses related to proteomics and to transcriptional and enzyme activity measurements in cultured cells, and we are pleased to have had the opportunity to add data addressing each of these points.

      Reviewer #2 (Public Review):

      Akingbesote et al. aim to determine the molecular basis of metabolic scaling: the phenomenon that whole-body metabolic rates scale with body mass to the 0.75 power, so that metabolic rates per unit mass decrease as body mass increases. More specifically, they test the hypothesis that expression of genes involved in the regulation of oxygen consumption and substrate metabolism, as well as their respective fluxes, provides a molecular basis for metabolic scaling across five species: mice, rats, monkeys, humans, and cattle. To this end, Akingbesote et al. use publicly available transcriptomics data and identify genes that show decreasing (normalized) expression with increasing body mass. This descriptive analysis is followed by a discussion of a few relevant examples and a (KEGG) pathway enrichment analysis. The authors then used their published PINTA approach with data from their experiments with mice and rats to provide estimates of selected cytosolic and mitochondrial fluxes in vitro, ex vivo, and in vivo; these estimates are then used to determine whether metabolic fluxes scale. The conclusion drawn from these analyses is that estimates of selected fluxes do not differ in vitro between plated hepatocytes of mice and rats, but that differences can be detected using metabolic flux analysis in vivo. As a result, in vivo flux profiling is more relevant to assessing metabolic scaling.
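      As a worked illustration of the scaling relationship summarized above, assume the classic Kleiber form, in which whole-body metabolic rate $B$ scales as $M^{3/4}$ with body mass $M$ ($B_0$ is a normalization constant introduced here only for illustration). Dividing by mass gives the per-unit-mass rate; applied to the roughly 30,000-fold range in adult body weight cited by Reviewer #1, this idealized law would predict a per-gram rate about 13-fold lower in the largest species than in the smallest. This is a back-of-the-envelope sketch, not a result from the manuscript.

```latex
B = B_0 M^{3/4}
\quad\Longrightarrow\quad
\frac{B}{M} = B_0 M^{-1/4},
\qquad
\frac{(B/M)_{\text{largest}}}{(B/M)_{\text{smallest}}}
  = \left(3 \times 10^{4}\right)^{-1/4}
  \approx \frac{1}{13.2}
  \approx 0.076
```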

      The conclusions are only in part supported by the data and clarifications are needed both with respect to the analysis of transcriptomics data as well as flux estimates:

      1. In looking for scaling in gene expression, the authors rely on the assumption that mRNA expression correlates well with protein abundance (citing Schwanhäusser et al., 2011); however, transcripts explain about 40% of variance in protein abundance (this observation holds across multiple species). Hence, the identified patterns based on the transcript data may have little implications for protein abundance or flux.

      We agree that, despite the data in the cited publication, gene expression should not be assumed to correlate directly with protein expression, and that neither can be assumed, without supporting data, to equate to metabolic flux. We have removed the citation and replaced it with proteomics data. Half of the genes available in the proteomics analysis that were found to correlate negatively with body size in our liver transcriptomics analysis also correlated negatively with body size at the level of liver protein expression:

      Author Response Figure 1

      Additionally, we analyzed available proteomics data on left ventricular expression of the three proteins observed to correlate negatively with body mass in the liver proteomics analysis. One of these three genes, GLUL, was also shown to correlate negatively with body mass when its expression was assessed in the heart:

      Author Response Figure 2

      However, as discussed in our response to the editor’s point 1, we are limited by the available data, and fully acknowledge that without the capacity to statistically compare groups, we cannot make conclusive statements regarding the proteomics data.

      Additionally, we have substantially softened the description of the implications of the transcriptomics data in the Abstract, Introduction, and Discussion, including:
      - Edited “Together, these data reveal that metabolic scaling extends beyond oxygen consumption to numerous other metabolic pathways, and is likely regulated at the level of gene and protein expression, enzyme activity, and substrate supply” to add the parameters in red.
      - Removed “Considering that mRNA expression correlates well with protein expression under basal conditions, especially for metabolic genes (Schwanhäusser et al., 2011), we used mRNA expression as a proxy for the relative abundance of metabolic enzymes.”
      - Added “Further analysis of liver proteomics revealed that approximately half of the genes in liver that scaled at the transcriptional level also scaled at the level of protein expression,” now linking gene expression to protein expression to metabolic flux.
      - Edited “Numerous metabolic genes…followed the pattern of metabolic scaling, and informed our isotope tracer based in vitro and in vivo metabolic flux studies” to “Numerous metabolic genes…followed the pattern of metabolic scaling. Further analysis of liver proteomics revealed that approximately half of the genes in liver that scaled at the transcriptional level also scaled at the level of protein expression. To determine if gene and protein expression would correlate with scaling at the level of metabolic flux, we performed a comprehensive assessment of liver metabolism in vivo and in vitro using modified Positional Isotopomer NMR Tracer Analysis (PINTA)…”
      - Edited “Taken together, this study demonstrates systems regulation of metabolic scaling: gene expression in livers showed that scaling occurs to regulate oxygen consumption and substrate supply, isotope-based tracer studies in mice and rats demonstrated the mechanistic function of these enzymes in vivo which was only apparent in the living organism rather than plated cells” to “Taken together, this study demonstrates systems regulation of the ordering of metabolic fluxes according to body size, and provides unique insight into the regulation of metabolic flux across species.”
      - Removed “Interestingly, the scaling of GPT and ADIPOR1 further suggest that there is dependence on extra-hepatic organs in the scaling of in vivo gluconeogenesis and fatty acid oxidation: that is, skeletal muscle supply of alanine for the liver mediated glucose-alanine cycle and adipose tissue-derived adiponectin signaling. These findings also suggests that the scaling of mitochondrial mass (Porter and Brand, 1995) or mitochondrial proton leak (Porter and Brand, 1993) cannot fully explain metabolic scaling.”
      - Added “However, it should be noted that metabolic scaling cannot fully be explained at the transcriptional level, because many rate-limiting enzymes in the metabolic processes measured in vivo did not scale at the transcriptional level, and only approximately half of genes that scaled at the level of mRNA scaled at the level of protein. Thus, it is likely that both transcriptional and other mechanisms – such as enzyme activity – are responsible for variations in metabolic flux per unit mass, inversely proportionally to body size. Additionally, the currently available data do not allow us to assess whether expression of certain isoforms of key metabolic enzymes scale differentially across species.”

      2. While the procedure used to identify transcripts whose expression scale is clearly described, focusing the enrichment on KEGG pathways can only identify metabolic genes that scale. It would be informative and instructive to investigate if and to what extent genes involved in non-metabolic processes, that affect metabolic rates, also scale.

      We acknowledge that focusing the enrichment on KEGG pathways does enrich for the identification of metabolic processes that scale. However, we would respectfully submit that because this manuscript focuses on metabolic scaling, this seems to be the most appropriate setting in which to conduct the analysis. New data added in this revision demonstrate that three metabolic enzymes that scaled in the transcriptomics analysis also scale relative to β-actin, further suggesting that the inverse correlation of gene expression with body weight is primarily confined to metabolic processes:

      Author Response Figure 3

      In addition, we measured the expression of two structural proteins (collagenase 3 [Mmp3] and Larp6) outside of metabolic pathways, relative to β-actin (Actb), and found that neither was differentially expressed relative to actin in mice versus rats:

      Author Response Figure 4

      We recognize that these data may be confounded by the fact that Actb expression could differ between mice and rats; however, the fact that metabolic genes scale relative to β-actin (Actb) expression suggests that global mRNA scaling is unlikely to be the sole cause of the metabolic scaling phenotype.

      3. The result on flux ratios and absolute fluxes, based on the equations in Table S1, rely on certain assumptions (e.g. metabolic and isotopic steady state, among the others listed in PINTA); the current presentation does not ensure that all assumptions of PINTA are met in the present setting, so the estimates may be biased, leading to alternative explanations for the observed differences in vivo or the lack thereof in vitro.

      We fully agree with the reviewer that it is critical to ensure that key assumptions are met when presenting tracer data, and we thank them for raising this important point. Thus, we have now added data demonstrating that plasma m+1, m+2, and m+7 glucose are in steady state at 100 min of the 120 min in vivo tracer infusion:

      Author Response Figure 5

      Additionally, we now show that blood glucose and plasma lactate concentrations have reached steady state as well:

      Author Response Figure 6

      With these data, we validate that the mice and rats are at metabolic and isotopic steady state by the end of the 120 min tracer infusion. We recognize that we have not validated that liver m+1 and m+2 glucose are at steady state, as that would require two additional groups of mice and rats (to sacrifice at 100 and 110 min, compared to the animals euthanized after 120 min of infusion) and introduce additional variability. Additionally, plasma m+1 and m+2 glucose come from endogenous glucose production from 13C tracer, so if m+1 and m+2 glucose are in steady state in plasma, they must be in steady state in liver.

      An additional assumption is that liver glycogen is effectively depleted after the overnight fast utilized in these studies. We have now verified this assumption by comparing fed and overnight fasted liver glycogen concentrations, and detect negligible glycogen after the fast in both rats and mice:

      Author Response Figure 7

      Additionally, we validated isotopic steady state in our hepatocytes incubated in [3-¹³C]lactate. As expected in plated-cell studies, cells reached steady state in both [¹³C]lactate enrichment and m+1 and m+2 glucose enrichment within 60 min. Because net glucose production is measured from the accumulation of glucose, we neither expected nor measured a steady-state glucose concentration, but we did confirm that glucose accumulation is linear throughout the 6 h incubation (thus confirming that 6 h is a reasonable endpoint):

      Author Response Figure 8

      We very respectfully submit that, after eight prior publications that used PINTA by name (PMID 28986525, 29307489, 29483297, 31545298, 31578240, 32610084, 32132708, 32179679), in addition to several prior publications that used the method without the acronym, it would not be the most responsible use of animals to try to prove in this manuscript that PINTA is a legitimate means of assessing substrate fluxes. However, we thank the reviewer for raising the important point regarding the assumptions of the method, which has allowed us to add data verifying that the key assumptions are met.

      4. The findings regarding the flux estimates seem to be fully determined by observed differences in gluconeogenesis (as demonstrated in Fig. 4). Usage of more involved approaches for metabolic flux analysis may provide wider-reaching conclusions beyond selected fluxes that appear fully coupled.

      Fluxes are back-calculated from total glucose production, so they are methodologically “coupled”, but this does not mean that glucose production will always mirror other fluxes. For example, in our 2015 manuscript using PINTA (although we had not yet named the method “PINTA”), we measured decreased endogenous glucose production (EGP) simultaneously with increased citrate synthase flux (mitochondrial oxidation, VTCA, which we have subsequently begun to call VCS in recognition of the fact that different reactions in the TCA cycle can proceed at different rates, although the calculation is the same) (Perry et al. Science 2015).

      Similarly, another study demonstrated that the same mitochondrial uncoupler (CRMP) increased VCS while EGP decreased in nonhuman primates (Goedeke et al. Sci. Transl. Med. 2019).

      These data demonstrate that, while fluxes are back-calculated from EGP with PINTA, the method is fully capable of detecting differences in oxidative fluxes without, or in the opposite direction of, changes in EGP. We very respectfully submit that we are not aware of what a more “involved” approach for metabolic flux analysis would entail, and that, given the eight prior publications listed in response to the previous point, we are not trying to validate PINTA in the current manuscript.

      Reviewer #3 (Public Review):

      This manuscript addresses a fundamental aspect of mammalian biology referred to as scaling, in which metabolic processes calibrate to the size of the organism. Longstanding observations related to scaling have been established based on rates of oxygen consumption. This manuscript extends these observations to gene expression and metabolic fluxes in order to discover the metabolic pathways that scale with body mass. The analyses are focused on the liver, which is the metabolic hub of the organism. Gene expression levels gleaned from available databases for organisms of varied sizes are analyzed and queried for scaling based on body mass. This analysis reveals that scaling is mainly a characteristic of metabolic genes. These data inform metabolic flux studies in cultured cells, liver slices and whole organisms. These studies demonstrate that scaling of metabolic fluxes occurs, but not outside the context of the whole organism or intact liver (in the form of liver slices). Scaling of metabolic fluxes is not observed in cultured hepatocytes. Overall, this is an interesting line of inquiry. The data are largely correlative in nature but add important texture to the traditional characterization of oxygen consumption rates. The application of flux studies is a particular strength because these reflect the true metabolic processes. Enthusiasm was tempered by certain claims that extend beyond the data (e.g., the title suggests that metabolic scaling applies to tissues other than the liver, which was the tissue studied), as well as by low numbers of biological replicates in some experiments, studies conducted in a single sex, and a writing style that includes excessive technical jargon.

      We thank the reviewer for their time spent evaluating the paper, and for their very helpful comments. We agree that “the application of flux studies is a particular strength because these reflect the true metabolic processes.” We agree that the study was focused on liver, although the previous iteration did include a small amount of white adipose tissue flux data, and have edited the manuscript to make clear that this is a liver-focused manuscript. We have now added specific numbers to each figure legend, and have also added in vivo flux measurements in female rats and mice. Additionally, the manuscript has been edited extensively. We have further detailed these modifications in our point-by-point responses to the reviewer.

    1. Author Response

      Reviewer #1 (Public Review):

      Bornstein and colleagues address an important question regarding the molecular makeup of the different cellular compartments contributing to the muscle spindle. While work focusing on single components of the spindle in isolation (proprioceptors, gamma motor neurons, and intrafusal muscle fibres) has recently been published, a comprehensive analysis of the transcriptome and proteome of the spindle was missing. This study fills an important gap, considering how local translation and protein synthesis can affect the development and function of such a specialised organ.

      The authors combine bulk transcriptome and proteome analysis and identify new markers for neuronal, intrafusal, and capsule compartments that are validated in vivo and are shown to be useful for studying aspects of spindle differentiation during development. The methodology is sound and the conclusions are in line with the results.

      We thank the reviewer for highlighting the importance of our study.

      I feel a bit more analysis regarding the specificity and developmental expression profiles of the identified markers would be a great addition. In particular:

      • Are any of the proprioceptive sensory neurons markers specific for fibres innervating the muscle spindles or also found in Golgi tendon organs?

      We thank the reviewer for the important question, following which we performed two additional analyses. First, in order to study the specificity of the spindle afferent genes we identified, we examined the overlap between our list of 260 potential proprioceptive neuron genes and markers for the three proprioceptive neuron subtypes (Ia, II and Ib) identified by Wu and colleagues (Wu et al. 2021). As shown in the newly added Figure 1- figure supplement 2F, while we found many genes that are common to all subtypes, 69 genes exclusively overlapped with subtype markers (22 genes with type Ia neurons, 45 genes with type II neurons and 2 genes with both; lists are shown in Supplementary File 4). These results suggest that the 69 genes are expressed by muscle spindle afferents and not by GTO afferents.
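      The specificity filter described above can be sketched as plain set operations. This is a minimal illustration under our reading of the text: a candidate gene counts as spindle-afferent specific if it appears among the Ia and/or II subtype markers but not among the Ib (GTO afferent) markers. The function name and gene identifiers are hypothetical; the three returned groups correspond to the 22 Ia-only, 45 II-only and 2 shared genes (22 + 45 + 2 = 69).

```python
def spindle_specific(candidates, ia_markers, ii_markers, ib_markers):
    """Partition candidate genes matching Ia and/or II markers but no Ib (GTO) marker.

    Hypothetical helper: the inclusion rule is an assumption, not the authors' code.
    """
    ia, ii, ib = set(ia_markers), set(ii_markers), set(ib_markers)
    pool = set(candidates) - ib  # drop anything that also marks Ib (GTO) afferents
    ia_only = (pool & ia) - ii
    ii_only = (pool & ii) - ia
    both = pool & ia & ii
    return ia_only, ii_only, both


# Toy example with made-up gene identifiers.
ia_only, ii_only, both = spindle_specific(
    candidates={"g1", "g2", "g3", "g4", "g5", "g6"},
    ia_markers={"g1", "g3"},
    ii_markers={"g2", "g3", "g4"},
    ib_markers={"g4", "g5"},
)
print(sorted(ia_only), sorted(ii_only), sorted(both))  # ['g1'] ['g2'] ['g3']
```

      Genes that match no subtype marker (g6 here) fall outside all three groups, mirroring the genes common to all subtypes that were not counted among the 69.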

      Second, to study the specificity of our validated markers, we examined the expression of ATP1a3, VCAN and GLUT1, marking proprioceptive neurons, extracellular matrix and outer capsule, respectively, in GTOs. Results showed that all three markers were also detected in the different tissues composing the GTOs (newly added Figure 3 – figure supplement 3, below). As ATP1a3 is not in the list of 69 unique markers, this analysis verified that it is expressed by all proprioceptive neurons. The expression of both VCAN and GLUT1 in GTO capsules highlights the similarity between the capsules of the two proprioceptors.

      • Along the same lines, are any of the gamma motor neuron markers also found in alpha motor neurons?

      We thank the reviewer for raising this issue. Following the reviewer’s question, we conducted a detailed analysis of the expression of potential γ motor neuron genes. To this end, we first generated a list of α motor neuron genes in our data by performing ranked GSEA using published expression profiles of these neurons (Blum et al., 2021). Then, we compared the three lists of neuronal genes, i.e. γ motor neurons, α motor neurons and proprioceptive neurons (newly added Figure 1 – figure supplement 2G), and found an overlap between the three lists. Nonetheless, we also identified 40 spindle genes that are specific to γ motor neurons (Figure 1 – figure supplement 2G and Supplementary File 4) and are, therefore, potential markers for these neurons.
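      Comparing the three neuronal gene lists amounts to reading off the regions of a three-way Venn diagram. A minimal sketch follows; the list names and contents are toy placeholders rather than the actual gene lists, and the ranked GSEA step used to derive the α motor neuron list is not reproduced.

```python
def unique_to_each(gene_lists):
    """Map each list name to the genes found in that list and in no other list."""
    sets = {name: set(genes) for name, genes in gene_lists.items()}
    result = {}
    for name, genes in sets.items():
        others = set().union(*(g for other, g in sets.items() if other != name))
        result[name] = genes - others
    return result


def shared_by_all(gene_lists):
    """Genes present in every list (the centre of the Venn diagram)."""
    return set.intersection(*(set(g) for g in gene_lists.values()))


# Hypothetical toy lists.
lists = {
    "gamma_mn": ["a", "b", "c", "x"],
    "alpha_mn": ["b", "d", "x"],
    "proprioceptive": ["c", "e", "x"],
}
print(sorted(unique_to_each(lists)["gamma_mn"]))  # ['a']
print(sorted(shared_by_all(lists)))               # ['x']
```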

      • How early is ATP1A3 expression found in neurons at the spindle, or in fibres starting to innervate the muscle? A couple of late embryonic time points would be great.

      We thank the reviewer for this suggestion. We performed late embryonic (E15.5-E17.5) staining for ATP1a3, which showed its expression as early as E15.5 (new Figure 4 – figure supplement 1).

      • Given that this approach can provide insight into whether local translation plays a major role in the differentiation of the spindle, it would be interesting to assess whether the proprioceptor and gamma motor neuron markers identified are also found in the cell body or exclusively at the spindle.

      The reviewer raises an interesting question about local translation of the neuronal genes. Several lines of evidence in the literature indicate that the genes expressed at the neuronal ends are also expressed in the neuron soma. In a study of the retinal ganglion cell translatome, Holt and colleagues found that the axonal translatome is a subset of the significantly larger somal translatome (Shigeoka et al., Cell, 2016). Similarly, a study by Schuman and colleagues that compared the translatomes of neuronal cell bodies, dendrites, and axons of rat hippocampal neurons showed that many common genes are translated, albeit at different levels (Glock et al., PNAS, 2021). Finally, following the reviewer’s suggestion, we studied the expression of ATP1a3 in the DRG and found it to be expressed there as well (Figure L1). Thus, we predict that the markers we found in the neuronal ends are likely also expressed in the soma. While this issue is very interesting, we believe that further validation of our assumption exceeds the scope of this study.

      Figure L1. ATP1a3 expression in the DRG. Confocal images of DRG sections from adult PValb-Cre;tdTomato mice stained for ATP1a3 (magenta). Scale bars represent 50 μm.

      Altogether, this is a novel and important work that will benefit scientists studying the neuromuscular and musculoskeletal systems by pushing the field toward a holistic understanding of the muscle spindle. These datasets in combination with the previous ones can be used to develop new genetic and viral strategies to study muscle spindle development and function in healthy and pathological states by analysing the roles and relative contributions of different components of this fascinating and still mysterious organ.

      We again thank the reviewer for highlighting the importance of our study.

      Reviewer #2 (Public Review):

      The data presented are of high quality. Through complementary experiments involving the isolation of masseter muscle spindles, the authors perform RNA-seq and proteomic analysis, and identify genes and proteins that are differentially expressed in the muscle spindle versus the adjacent muscle fiber, and proteins that accumulate specifically in capsule cells and nerve endings. These data, while essentially descriptive, provide important information about the developmental framework of the sensory apparatus present in each muscle that accounts for its tension/contraction state. The data presented thus allow for a better characterization of muscle spindles and provide the community with a set of new markers for better identification of these structures. Analysis of the expression pattern of the Tomato reporter in transgenic animals under the control of Piezo2-CRE, Gli1-CRE and Thy1-YFP reporter reinforces the findings and the specificity of the expression pattern of the specific genes and proteins identified by the multi-omics approach and further validated by immunohistochemistry.

      We thank the reviewer for the positive and encouraging feedback.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Marmor and colleagues reanalyze a previously published dataset of chronic widefield Ca2+ imaging from the dorsal cortex of mice as they learn a go/no-go somatosensory discrimination task. Comparing hit trials that have a distinct history (i.e. are preceded by distinct trial types), the authors find that hit trials preceded by correct rejections of the nontarget stimulus are associated with larger subsequent neural responses than trials preceded by other hits, across the cortex. The authors analyze the time course over which this effect emerges in the barrel cortex (BC) and the rostrolateral visual area (RL), and find that its magnitude increases as the animals become expert task performers. Although the findings are potentially interesting, I, unfortunately, believe that there are important methodological concerns that could put them into question. I also disagree with the rationale that singles out BC and RL as being especially important for the emergence of trial history effects on neural responses during decision-making. I detail these points below.

      1) The authors did not perform correction for hemodynamic contamination of GCaMP fluorescence. In widefield imaging, blood vessels divisively decrease neural signals because they absorb green-wavelength photons, which could lead to crucial confounds in the interpretation of the main results because of neurovascular coupling, which lags neural activity by seconds. For example, if a reward response from the previous trial is associated with a lagged hemodynamic contamination that artificially decreases the signal in the following trial, one could get artificially higher activity in trials that were not preceded by a reward (i.e. CR), which is what the authors observed. Ideally, the experiments would be repeated with proper hemodynamic correction, but at the very least the authors should try to address this with control analyses.

      Done. We have essentially redone the experiment with proper hemodynamic correction, and the trial history results were maintained. Please see point 1 above for more details (Figures S4 and S5). In addition to hemodynamic controls, we also present novel two-photon single-cell data with similar results in Figure S6. We also added a dedicated section for this in the Methods section (pg. 12).

      For example, what is the time course of reward-related responses in BC and elsewhere?

      In general, and specifically in BC, reward-related responses return to baseline within 5 seconds after the start of the reward period and at least 5 seconds before the stimulus presentation of the next trial. In the novel experiments we extended the baseline period by an additional 2 seconds as a further precaution. Trial history information was still present with an extended inter-trial interval.

      The text now reads (pg. 4): "We further report that responses during the reward period in cortex and specifically in BC went back to baseline 4-5 seconds after the start of the reward period and 6-8 seconds before the presentation of the next stimulus (total inter-trial interval ranged between 10-12 seconds)."

      Do hemodynamics artifacts have a trial-by-trial correlation with the subsequent trial history effect?

      We have now done the proper hemodynamic control (Figure 2) and we did not find a strong effect of hemodynamic responses on trial history information.

      What is the learning time course of reward responses?

      Responses during the reward period as a function of learning were not significantly modulated. We further show the whole learning profile for BC response during the reward period in Author response image 1.

      Author response image 1.

      Response in BC averaged during the reward period (2-4 sec after texture stop) as a function of learning for each mouse separately.

      The text now reads (pg. 4): "In addition, responses in BC during the reward period were not consistently modulated as a function of learning (p>0.05; Wilcoxon signed-rank test between naïve and expert, BC response averaged during the reward period, 2-4 seconds after stimulus onset; n=7 mice). Taken together, we find that direct responses from the reward period do not affect history-related responses during the next trial."

      Note that I don't believe the FA-Hit condition analysis that the authors have already presented provides adequate control, as punishment responses are also pervasive in the cortex and therefore suffer from the same interpretational caveat. Unfortunately, I believe this is a serious methodological issue given the above. However, I will proceed to take the reported results at face value.

      We hope that our additional hemodynamic control analyses are satisfactory.

      2) The statistics used to assess the effect of trial history over learning are inadequate (e.g., Fig 2b). The existence of a significant effect in one condition (e.g., CR-Hit vs. Hit-Hit in expert) but not in another (e.g., same comparison in naive) does not imply that these two conditions are different. This needs to be tested directly. Moreover, the present analysis does not account for the fact that measures across learning stages are taken from the same animals. Thus, the appropriate analysis for these cases would be to first use a two-way ANOVA with repeated measures with factors of trial history and learning stage (or equivalent non-parametric test) and then derive conclusions based on post hoc pairwise tests, corrected for multiple comparisons.

      Done. We performed a two-way ANOVA as suggested and found significant history and learning effects, along with a significant interaction effect, for BC.

      The text now reads (pg. 4): "This difference was significant during the stim period in learning and expert phases across mice (Fig. 2b; 2-way ANOVA with repeated measures; DF (1-6) F=51 p<0.001, DF (2-12) F=18 p<0.001, DF(2-12) F=5 p<0.05 for trial history, learning and the interaction between trial history and learning; Post hoc Tukey analysis p<0.05 for trial history in learning and expert phases; p>0.05 in the naïve phase)."
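      For reference, the degrees of freedom reported above follow from the textbook decomposition for a fully within-subject two-factor design, in which each effect is tested against its own interaction with subjects. The sketch below computes the F statistics for 7 mice × 2 trial histories × 3 learning stages and reproduces DF (1, 6), (2, 12) and (2, 12). It assumes one value per mouse per cell, a balanced design and no sphericity correction; the data are simulated, and converting F to a p-value needs an F-distribution survival function (e.g., scipy.stats.f.sf), which the Python standard library does not provide.

```python
import random


def rm_anova_2way(y):
    """Two-way repeated-measures ANOVA, balanced design, both factors within subject.

    y[s][i][j]: one value for subject s (a mouse), level i of factor A
    (e.g. trial history) and level j of factor B (e.g. learning stage).
    Returns {effect: (df_effect, df_error, F)}; no sphericity correction.
    """
    n, a, b = len(y), len(y[0]), len(y[0][0])
    cells = [(s, i, j) for s in range(n) for i in range(a) for j in range(b)]
    g = sum(y[s][i][j] for s, i, j in cells) / (n * a * b)  # grand mean

    # Marginal means.
    m_s = [sum(y[s][i][j] for i in range(a) for j in range(b)) / (a * b) for s in range(n)]
    m_a = [sum(y[s][i][j] for s in range(n) for j in range(b)) / (n * b) for i in range(a)]
    m_b = [sum(y[s][i][j] for s in range(n) for i in range(a)) / (n * a) for j in range(b)]
    m_sa = [[sum(y[s][i][j] for j in range(b)) / b for i in range(a)] for s in range(n)]
    m_sb = [[sum(y[s][i][j] for i in range(a)) / a for j in range(b)] for s in range(n)]
    m_ab = [[sum(y[s][i][j] for s in range(n)) / n for j in range(b)] for i in range(a)]

    # Sums of squares; the three-way (A x B x subject) residual is what is left over.
    ss_tot = sum((y[s][i][j] - g) ** 2 for s, i, j in cells)
    ss_s = a * b * sum((m - g) ** 2 for m in m_s)
    ss_a = n * b * sum((m - g) ** 2 for m in m_a)
    ss_b = n * a * sum((m - g) ** 2 for m in m_b)
    ss_ab = n * sum((m_ab[i][j] - m_a[i] - m_b[j] + g) ** 2 for i in range(a) for j in range(b))
    ss_as = b * sum((m_sa[s][i] - m_s[s] - m_a[i] + g) ** 2 for s in range(n) for i in range(a))
    ss_bs = a * sum((m_sb[s][j] - m_s[s] - m_b[j] + g) ** 2 for s in range(n) for j in range(b))
    ss_abs = ss_tot - ss_s - ss_a - ss_b - ss_ab - ss_as - ss_bs

    def f_test(ss_eff, df_eff, ss_err, df_err):
        return df_eff, df_err, (ss_eff / df_eff) / (ss_err / df_err)

    df_a, df_b = a - 1, b - 1
    return {
        "A": f_test(ss_a, df_a, ss_as, df_a * (n - 1)),
        "B": f_test(ss_b, df_b, ss_bs, df_b * (n - 1)),
        "AxB": f_test(ss_ab, df_a * df_b, ss_abs, df_a * df_b * (n - 1)),
    }


# Simulated data: 7 mice x 2 histories x 3 stages, history effect growing with stage.
rng = random.Random(0)
y = [[[1.0 + 0.5 * i * j + rng.gauss(0, 0.2) for j in range(3)] for i in range(2)] for _ in range(7)]
for effect, (df1, df2, f) in rm_anova_2way(y).items():
    print(f"{effect}: DF ({df1}, {df2}) F = {f:.1f}")
```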

      3) I am not convinced that BC and RL are especially important for trial-history-dependent effects. Figures 4 and 5 suggest that this modulation is present across the cortex, and in fact, the difference between CR-Hit and Hit-Hit in some learning stages appears stronger in other areas. BC and RL do have the highest absolute activity during the epochs in Figs 4 and 5, but I would argue that this is likely due to other aspects of the task (e.g., touch) and therefore is not necessarily relevant to the issue of trial history.

      Done. First, we would like to point out that RL during the pre period displays the largest difference between the CR-Hit and Hit-Hit conditions (Fig. 5c bottom). Second, we now show difference maps (i.e., activity in CR-Hit minus Hit-Hit), which clearly show a positive activity patch in BC during the stim period for 5 out of the 7 mice (Fig. S10a). Example maps also highlight RL during the pre period (Fig. S10b). We note that activity patches somewhat spread over to other areas and also vary slightly across mice. This is why the grand average may partly average out trial history information. Taken together, we strongly feel that during the pre period, trial history information emerges in RL (and adjacent posterior association areas) and then shifts towards BC during the stim period.

      Nevertheless, we agree with the reviewer that other areas (that do not necessarily display high activity) may encode trial history information, and we now clearly report this in the text (pg. 5): "We note that other areas, e.g., different association areas, also encoded history-dependent information, especially during the learning and expert phases. In addition, we present activity difference maps between CR-Hit and Hit-Hit conditions during the stim period (Fig. S10a). These maps clearly show the highest trial history information (i.e., difference in activity) in BC. Taken together, these results indicate that BC encodes history-dependent information that emerges during the stim period and just after learning."

      And also in (pg. 6): "In addition, we present activity difference maps between CR-Hit and Hit-Hit conditions during the pre period (Fig. S10b). These maps localize trial history information to RL, which also spreads to other adjacent association areas. Moreover, activity patches vary slightly across mice, which may affect the grand average (averaged across mice) of each area."

      4) Because of similar arguments to the above, and because this was not directly assessed, I do not believe the conclusion that history information emerges in RL and is transferred to BC is warranted. For instance, there is no direct comparison between areas, but inspection of the ROC plots in Fig 6b suggests that history information emerges concomitantly across cortical areas. I suggest directly comparing the time course between these and other areas.

      Done. We have now added example history AUC maps and quantified history AUC for all 25 areas during the pre and stim periods. During the pre period (Fig. 6), AUC values are concentrated around RL (and other PPC areas), whereas during the stim period AUC values shift to BC. Again, due to the inter-mouse variability, these differences are partly averaged out, which also makes it difficult to obtain strong statistical tests (with only 7 mice).

      The text now reads (pg. 7): "We next calculated the history AUC for each pixel during either the pre or stim period. The history AUC maps during the pre period display AUC values around the RL areas (Fig. 6f). In contrast, the history AUC maps during the stim period display AUC values mostly in BC (Fig. 6g). Quantified across 25 areas and averaged across mice, RL displays the highest history AUC during the pre period, whereas BC displays the highest history AUC values during the stim period (Fig. 6h). We note that other cortical areas such as other association areas also display high history AUC values. Taken together, we find that trial history emerges in RL before the texture arrives and then shifts to BC during stimulus presentation. "
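      A per-pixel history AUC can be illustrated as a ROC area comparing single-trial responses in the two conditions. The sketch assumes that, for each pixel, AUC is the probability that a randomly chosen CR-Hit response exceeds a randomly chosen Hit-Hit response (ties counted as one half), so 0.5 means no history information and values above 0.5 mean larger CR-Hit responses; this equals the Mann-Whitney U statistic divided by the product of the two trial counts. The exact definition used in the manuscript is not stated here, and the array layout and function names are hypothetical.

```python
def roc_auc(pos, neg):
    """P(random pos value > random neg value), counting ties as 1/2."""
    score = 0.0
    for p in pos:
        for q in neg:
            score += 1.0 if p > q else 0.5 if p == q else 0.0
    return score / (len(pos) * len(neg))


def history_auc_map(cr_hit, hit_hit):
    """cr_hit[k][r][c], hit_hit[k][r][c]: response of pixel (r, c) on trial k, averaged over one period."""
    rows, cols = len(cr_hit[0]), len(cr_hit[0][0])
    return [
        [roc_auc([trial[r][c] for trial in cr_hit], [trial[r][c] for trial in hit_hit]) for c in range(cols)]
        for r in range(rows)
    ]


# Two trials per condition on a hypothetical 1 x 2 pixel map.
cr_hit = [[[2.0, 0.0]], [[3.0, 1.0]]]
hit_hit = [[[1.0, 1.0]], [[0.0, 0.0]]]
print(history_auc_map(cr_hit, hit_hit))  # [[1.0, 0.5]]
```

      Averaging such maps within each of the 25 area masks would give per-area values of the kind summarized in Fig. 6h.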

      5) How much is task performance itself modulated by trial history? How does this change over the course of learning? These behavioral analyses would greatly help interpret the neural findings and how this trial history might be used behaviorally.

      Done. We have now calculated d-prime for Hit-Hit and CR-Hit trials separately and find no significant differences between conditions, either within or across mice (see Fig. S2 below).

      The text now reads (pg. 3): "We note that learning curves that are calculated separately for each pair (i.e., either a preceding Hit or CR trial) were not significantly different (Fig. S2)."
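      One way to compute history-conditioned performance is to split trials by the outcome of the preceding trial and compute d' = z(hit rate) - z(false-alarm rate) within each split. The sketch below is a hypothetical illustration: the outcome labels, the log-linear correction (adding 0.5 to counts so that rates of 0 or 1 stay finite) and the conditioning scheme are assumptions, not details given in the response.

```python
from statistics import NormalDist


def dprime(hits, go_trials, false_alarms, nogo_trials):
    """d' = z(hit rate) - z(false-alarm rate), with a log-linear correction."""
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (go_trials + 1)
    fa_rate = (false_alarms + 0.5) / (nogo_trials + 1)
    return z(hit_rate) - z(fa_rate)


def dprime_by_previous(outcomes, prev_label):
    """d' over trials whose preceding trial had outcome prev_label ('Hit', 'CR', ...)."""
    counts = {"Hit": 0, "Miss": 0, "FA": 0, "CR": 0}
    for prev, cur in zip(outcomes, outcomes[1:]):
        if prev == prev_label:
            counts[cur] += 1
    return dprime(counts["Hit"], counts["Hit"] + counts["Miss"],
                  counts["FA"], counts["FA"] + counts["CR"])


# Hypothetical session.
session = ["Hit", "Hit", "CR", "Hit", "CR", "CR", "Hit", "Miss"]
after_hit = dprime_by_previous(session, "Hit")
after_cr = dprime_by_previous(session, "CR")
print(round(after_hit, 2), round(after_cr, 2))
```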

      Reviewer #2 (Public Review):

      Marmor et al. mine a previously published dataset to examine whether recent reward/stimulus history influences responses in sensory (and other) cortices. Bulk L2/3 calcium activity is imaged across all of the dorsal cortex in transgenic mice trained to discriminate between two textures in a go/no-go behavior. The authors primarily focus on comparing responses to a specific stimulus given that the preceding trial was or was not rewarded. There are clear differences in activity during stimulus presentation in the barrel cortex along with other areas, as well as differences even before the second stimulus is presented. These differences only emerge after task learning. The data are of high quality and the paper is clear and easy to follow. My only major criticism is that I am not completely convinced that the observed difference in response is not due to differences in movement by the animal on the two trial types. That said, the demonstration of differences in sensory cortices is relatively novel, as most of the existing literature on trial history effects demonstrates such differences only in higher-order areas.

      Major:

      1a) The claim that body movements do not account for the results is in my view the greatest weakness of the paper - if the difference in response simply reflects a difference in movement, perhaps due to "excitement" in anticipation of reward after not receiving one on CR-H vs. HH trials, then this should show up in movement analysis. The authors do a little bit of this, but to me, more is needed.

      Done. We have now extensively and carefully analyzed body and whisker movements for CR-Hit and Hit-Hit conditions. First, in the figure below we decomposed body movements into 22 different body parts using DeepLabCut. In short, we find no significant difference between CR-Hit and Hit-Hit conditions in each body part separately (Fig. S7 below). This was true for the naïve, learning and expert phases. Please see additional analyses in the points below.

      This is now reported in the text (pg. 4): “In addition, we performed a more detailed body and whisker analysis, e.g., decomposing the movement into different body parts and obtaining single whisker dynamics. These analyses did not find significant differences in movement parameters between CR-Hit and Hit-Hit conditions (Figs. S7 and S8).”

      First, given the small sample size and use of non-parametric tests, you will only get p<.05 if at least 6 of the 7 mice perform in the same way. So getting p>.05 is not surprising even if there is an underlying effect. This makes it especially important to do analyses that are likely to reveal any differences; using whisker angle and overall body movement, which is poorly explained, is in my opinion insufficient. An alternative approach would be to compare movements within animals; small as the dataset is, it is feasible to do an animal-by-animal analysis, and then one could leverage the large trial count to get much greater statistical power, foregoing summary analyses that pool over only n=7.

      We agree with this point and have now substantially improved our statistical analysis.

      1) We now perform within-mouse statistics for responses in BC during the naïve, learning and expert phases (see Fig. S4 below). In short, we find statistical significance for 7 out of 7 mice during the expert phase, 6 out of 7 mice in the learning phase and 0 out of 7 in the naïve phase. For RL during the pre period, we find a significant difference in 5 out of 7 expert mice.

      This is now reported in the text (pg. 4): "In addition, a statistical comparison between CR-Hit and Hit-Hit responses within each mouse separately maintained significance for expert (7/7 mice; Mann-Whitney U-test p<0.05) and learning (6/7 mice) but not for naïve (0/7 mice; Fig. S3)."

      And also in (pg. 5): "In addition, a statistical comparison between CR-Hit and Hit-Hit responses in RL within each mouse separately maintained significance for expert (5/7 mice; Mann-Whitney U-test p<0.05)."

      2) We would like to point out that we have now added 3 additional mice (with hemodynamics control) and performed within mouse statistics in BC and RL (Fig. S5), adding to our initial observations.

      3) In terms of body movements, we now performed within-mouse statistics and compared body movements between CR-Hit and Hit-Hit conditions. In general, most mice did not show a significant difference in body movements or whisker envelope.

      This is now reported in the text (pg. 4): "A within mouse statistical comparison between body or whisker parameters in CR-Hit and Hit-Hit maintained a non-significant difference in expert (1/7 mice displayed a significant difference; Mann-Whitney U-test p>0.05), learning (2/7 mice) and naïve (0/7 mice)."

      And also in (pg. 4): "Body movements and whisker parameters did not significantly differ between CR-Hit and Hit-Hit conditions during the pre period (similar to the stim period; across and within mice; p>0.05; Mann-Whitney U-test)."

      In summary, we have now substantially improved our statistical analysis and further decomposed the body movements, maintaining the trial history results.

      The authors only consider a simple parametrization of movement (correlation across successive frames), and given the high variability in movement across animals, it is likely that different mice adopt different movements during the task, perhaps altering movement in specific ways. Aggregating movement across different body parts after an analysis where body parts are treated separately seems like an odd choice - perhaps it is fine, but again, supporting evidence for this is needed. As it stands, it is not clear if real differences were averaged out by combining all body parts, or what averaging actually entails.

      Please see the above point where we decomposed body movements (Fig. S7 and Methods section in Pg. 14).

      If at all possible, I would recommend examining curvature and not just the whisker angle, since the angle being the same is not too surprising given that the stimulus is in the same place. If the animal is pressing more vigorously on CR-H trials, this should result in larger curvature changes.

      Done. We now decompose whisker dynamics (i.e., curvature) using DeepLabCut (Fig. S8 see below). In general, we find no significant differences in whisker parameters between Hit-Hit and CR-Hit conditions.

      This is now reported in the text (pg. 4): "In addition, we performed a more detailed body and whisker analysis, e.g., decomposing the movement into different body parts. This analysis did not find significant differences between CR-Hit and Hit-Hit conditions (Figs. S7 and S8)."

      Finally, the authors presumably have access to lick data. Are reaction times shorter on CR-H trials? Is lick count or lick frequency shorter?

      Done. We have now calculated lick reaction time and lick rate, and find a significant difference in lick reaction time but not in lick rate. We show a figure below for the reviewer and report this in the text.

      The text now reads (pg. 3): "In addition, the lick reaction time (but not the lick rate) differed significantly between Hit-Hit and CR-Hit trials (p<0.05; Wilcoxon signed-rank test), possibly indicating a more considered response after a previous stop signal."

      If movement differs across trial types, it is entirely plausible that at least barrel cortex activity differences reflect differences in sensory input due to differences in whisker position/posture/etc. This would mitigate the novelty of the present results.

      As detailed above, we have now meticulously analyzed the whisker parameter differences between both conditions and did not find any significant differences.

      1b) Given the importance of this control to the story, both whisker and body movement tracking frames should be explicitly shown either in the primary paper or as a supplement. Moreover, in the methods, please elaborate on how both whisker and body tracking were performed.

      Done. Please see Figs. S7 and S8 for tracking frames. This is now detailed in the points above and in the revised Methods section.

      2) Did streak length impact the response? For instance, in Fig. 1f "Learning", there is a 6-trial "no-go" streak; if the data are there, it would be useful to plot CR-H responses as a function of preceding unrewarded trials.

      Done. We have now calculated the CR-Hit response as a function of the number of preceding CRs. In general, we obtain inconsistent results across mice, which may be due to the small number of trials that have more than one preceding CR. Nevertheless, some mice show a trend, sometimes significant, in which CR-Hit responses are higher for longer preceding CR streaks. This is especially true during the learning phase. We have decided not to include this in the manuscript and present this figure only to the reviewer.
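      The streak analysis amounts to counting, for every Hit trial, how many consecutive CR trials immediately preceded it, and then grouping responses by that count. A minimal sketch with hypothetical outcome labels and response values follows. Note that a streak of 0 pools every Hit not preceded by a CR (Hit-Hit, FA-Hit, Miss-Hit and the first trial), so a real analysis would likely keep those separate.

```python
def preceding_cr_streaks(outcomes):
    """For each Hit, return (trial index, number of consecutive CRs immediately before it)."""
    streaks, run = [], 0
    for i, outcome in enumerate(outcomes):
        if outcome == "Hit":
            streaks.append((i, run))
        run = run + 1 if outcome == "CR" else 0  # any non-CR trial resets the streak
    return streaks


def mean_response_by_streak(outcomes, responses):
    """Average Hit-trial response grouped by preceding CR streak length."""
    groups = {}
    for i, k in preceding_cr_streaks(outcomes):
        groups.setdefault(k, []).append(responses[i])
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}


# Hypothetical session: outcomes and one response value per trial.
outcomes = ["Hit", "CR", "CR", "Hit", "Hit", "CR", "Hit"]
responses = [1.0, 0.0, 0.0, 3.0, 1.5, 0.0, 2.0]
print(mean_response_by_streak(outcomes, responses))  # {0: 1.25, 1: 2.0, 2: 3.0}
```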

    1. Author response:

      Reviewer #1 (Public Review):

      This is an important and very well conducted study providing novel evidence on the role of zinc homeostasis for the control of infection with the intracellular bacterium S. typhimurium also disentangling the underlying mechanisms and providing clear evidence on the importance of spatio-temporal distribution of (free) zinc within the cell.

      We thank the reviewer for the positive comments.

      1) It would be important to provide more information on the genotype of mice.

      As suggested by the reviewer, we have added the detailed genotype of Slc30a1flagEGFP/+ and Slc30a1fl/flLysMCre mice to the revised supplementary Figure supplement 10.

      2) It is rather unlikely that C57Bl6 mice survive up to two weeks after i.p. injection of 1x10E5 bacteria.

      Following the reviewer's comment, we tested the survival rate in a group of our experimental animals and in C57BL/6 wild-type mice.

      The Salmonella strain was a gift from our friend, Professor Ge Bao-xue. We sent this strain for genetic characterisation and found 100% identity to Salmonella enterica Typhimurium, including many strains originating from poultry. One of them is Salmonella enterica subsp. enterica serovar Typhimurium strain MeganVac1 (Accession: CP112994.1), a live attenuated strain. We hope that this explains why the mice survived despite the high infectious dose.

      Author response image 1.

      (A) Survival rate of Slc30a1fl/fl and Slc30a1fl/flLysMCre (n = 14-15/group) and (B) survival rate of C57BL/6 wild type (n = 8) after Salmonella infection for two weeks. (C) A full-length sequence (1,478 bases) of the 16S rDNA gene of the Salmonella strain and (D) the sequencing electropherogram.

      3) To be sure that macrophages from Slc30A1 fl/fl LysMcre mice really have an impaired clearance of bacteria, it would be important to rule out an effect of Slc30A1 deletion on bacterial phagocytosis and containment (e.g., evaluation of bacterial numbers after 30 min of infection).

      As the reviewer advised, we repeated the experiment and measured the bacterial numbers after 30 min of infection (dashed line in A). The results show no statistically significant difference in bacterial numbers after 30 min between Slc30a1fl/flLysMCre and Slc30a1fl/fl BMDMs. Therefore, the difference in bacterial numbers after 24 hours reflects an impairment of intracellular pathogen-killing capacity, as the reviewer pointed out.

      Author response image 2.

      (A) Time course of the intracellular pathogen-killing capacity of Salmonella-infected Slc30a1fl/flLysMCre and Slc30a1fl/fl BMDMs measured in colony-forming units per ml (n = 5). (B) Fold change in Salmonella survival (CFU/mL) at different time points from A. (C) Representative images of Salmonella colonies on solid agar medium at 24 hours. Data are represented as mean ± SEM. P values were determined using 2-tailed unpaired Student’s t-test. *P<0.05, **P<0.01, and ns, not significant.
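      The quantities in this legend can be sketched as follows. The sketch assumes that fold change is each replicate's CFU/mL at a given time divided by its own 30-min (post-uptake) CFU/mL, which the legend does not state, and computes the pooled-variance Student's t statistic named in the legend. The two-tailed p-value additionally requires the t distribution (e.g., scipy.stats.ttest_ind, whose default equal_var=True gives Student's test). All replicate values are invented.

```python
from math import sqrt
from statistics import mean, stdev


def fold_change(cfu_t, cfu_30min):
    """Per-replicate survival at time t relative to the same replicate at 30 min (assumed baseline)."""
    return [t / ref for t, ref in zip(cfu_t, cfu_30min)]


def student_t(x, y):
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, nx + ny - 2


# Hypothetical CFU/mL for three replicates per genotype.
knockout = fold_change([6.0e5, 7.0e5, 8.0e5], [2.0e5, 2.0e5, 2.0e5])  # [3.0, 3.5, 4.0]
control = fold_change([2.0e5, 3.0e5, 4.0e5], [2.0e5, 2.0e5, 2.0e5])   # [1.0, 1.5, 2.0]
print(student_t(knockout, control))
```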

      4) Does the addition of zinc to macrophages negatively affect iNOS transcription as previously observed for the divalent metal iron and is a similar mechanism also employed (CEBPß/NF-IL6 modulation) (Dlaska M et al. J Immunol 1999)?

      The reviewer has raised an important point here, since free zinc also plays a role at multiple levels of cellular signaling (Kambe et al., 2015). Dlaska and colleagues reported that NF-IL6, a protein responsible for iNOS transcription, is negatively regulated by iron perturbation under IFN-γ/LPS stimulation in macrophages (Dlaska and Weiss, 1999). In line with the reviewer's suggestion, our results showed that zinc supplementation decreases iNOS expression in macrophages after Salmonella infection, suggesting that free zinc might play a role in iNOS regulation.

      However, in Slc30a1fl/flLysMCre macrophages, despite increased intracellular free zinc, the lack of Slc30a1 also induces Mt1, a zinc reservoir that might negatively affect NO production (Schwarz et al., 1995) or, alternatively, inhibit iNOS through the NF-κB pathway (Cong et al., 2016), as reported by previous studies. Therefore, we cannot rule out the possibility that defects in Salmonella clearance due to iNOS/NO inhibition are caused by a complex combination of excess free zinc and overexpression of the zinc reservoir. To test this hypothesis, further studies using specific targets, for example an Mtfl/fliNOSfl/flLysMCre model, might be needed to investigate the precise mechanism.

      Author response image 3.

      RT-qPCR analysis of Nos2 mRNA in BMDMs infected with Salmonella alone or with Salmonella plus ZnSO4 (20 μM) for 4 h.

      Reference:

      Dlaska M, Weiss G. 1999. Central role of transcription factor NF-IL6 for cytokine and iron-mediated regulation of murine inducible nitric oxide synthase expression. The Journal of Immunology. 162:6171-6177, PMID: 10229861

      Kambe T, Tsuji T, Hashimoto A, Itsumura N. 2015. The physiological, biochemical, and molecular roles of zinc transporters in zinc homeostasis and metabolism. Physiological Reviews. 95:749-784. doi: 10.1152/physrev.00035.2014, PMID: 26084690

      Schwarz MA, Lazo JS, Yalowich JC, Allen WP, Whitmore M, Bergonia HA, Tzeng E, Billiar TR, Robbins PD, Lancaster JR Jr, et al. 1995. Metallothionein protects against the cytotoxic and DNA-damaging effects of nitric oxide. Proceedings of the National Academy of Sciences of the United States of America. 92:4452-4456. doi: 10.1073/pnas.92.10.4452, PMID: 7538671

      Cong W, Niu C, Lv L, Ni M, Ruan D, Chi L, Wang Y, Yu Q, Zhan K, Xuan Y, Wang Y, Tan Y, Wei T, Cai L, Jin L. 2016. Metallothionein prevents age-associated cardiomyopathy via inhibiting NF-κB pathway activation and associated nitrative damage to 2-OGD. Antioxidants & Redox Signaling. 25:936-952. doi: 10.1089/ars.2016.6648, PMID: 27477335

      5) How does zinc or TPEN supplementation to bacteria in LB medium affect the log-phase growth of Salmonella?

      We found that zinc supplementation at both low (20 µM) and high (640 µM) concentrations negatively affects Salmonella growth in broth culture, especially during the log and stationary phases, whereas TPEN (20 µM) supplementation does not. This indicates that high-zinc conditions such as those occurring at the cellular level within phagosomes (Botella et al., 2011) can limit bacterial growth.

      Author response image 4.

      Growth curves (optical density at 600 nm, OD600) of Salmonella in LB medium with different concentrations of ZnSO4 and/or TPEN. The bar graph shows Salmonella growth at specific time points. Each value is the mean of triplicates for each condition; P values were determined using a 2-tailed unpaired Student’s t-test. *P<0.05, **P<0.01, ***P<0.001; ns, not significant.

      Reference:

      Botella H, Peyron P, Levillain F, Poincloux R, Poquet Y, Brandli I, Wang C, Tailleux L, Tilleul S, Charrière GM, Waddell SJ, Foti M, Lugo-Villarino G, Gao Q, Maridonneau-Parini I, Butcher PD, Castagnoli PR, Gicquel B, de Chastellier C, Neyrolles O. 2011. Mycobacterial p(1)-type ATPases mediate resistance to zinc poisoning in human macrophages. Cell Host Microbe. 10:248-259. doi: 10.1016/j.chom.2011.08.006, PMID: 21925112

      Reviewer #2 (Public Review):

      This paper explores the importance of zinc metabolism in host defense against the intracellular pathogen Salmonella Typhimurium. Using conditional mice with a deletion of the Slc30a1 zinc exporter, the authors show a critical role for zinc homeostasis in the pathogenesis of Salmonella. Specifically, mice deficient in the Slc30a1 gene in LysM+ myeloid cells are hypersusceptible to Salmonella infection, and their macrophages show altered phenotypes in response to Salmonella. The study adds important new information on the role metal homeostasis plays in microbe-host interactions. Despite these strengths, the manuscript has some weaknesses. The authors conclude that lack of Slc30a1 in macrophages impairs Nos2-dependent anti-Salmonella activity. However, this idea is not tested experimentally. In addition, the research presented on Mt1 is preliminary. The text related to Figure 7 could be deleted without affecting the overall impact of the findings.

      We thank the reviewer for his/her positive comments and constructive suggestions.

      Reviewer #3 (Public Review):

      Na-Phatthalung et al observed that transcripts of the zinc transporter Slc30a1 were upregulated in Salmonella-infected murine macrophages and in human primary macrophages; therefore, they sought to determine if, and how, Slc30a1 could contribute to the control of bacterial pathogens. Using a reporter mouse, the authors show that Slc30a1 expression increases in a subset of peritoneal and splenic macrophages of Salmonella-infected animals. Specific deletion of Slc30a1 in LysM+ cells resulted in a significantly higher susceptibility of mice to Salmonella infection which, counter to the authors’ conclusions, is not explained by the small differences in the bacterial burden observed in vivo and in vitro. Although loss of Slc30a1 resulted in reduced iNOS levels in activated macrophages, the study lacks experiments that mechanistically link loss of NO-mediated bactericidal activity to Salmonella survival in Slc30a1-deficient cells. The additional deletion of Mt1, another zinc-binding protein, resulted in even lower nitrite levels in activated macrophages but only modest effects on Salmonella survival. By combining genetic approaches with molecular techniques that measure variables in macrophage activation and the labile zinc pool, Na-Phatthalung et al successfully demonstrate that Slc30a1 and metallothionein 1 regulate zinc homeostasis in order to modulate effective immune responses to Salmonella infection. The authors have done a lot of work, and the finding that Slc30a1 expression in macrophages contributes to control of Salmonella infection in mice is new and will be of interest to the field. Whether the mechanism by which SLC30A1 controls bacterial replication and/or lethality of infection involves nitric oxide production by macrophages remains to be shown.

      We very much appreciate the reviewer’s detailed evaluation and suggestions. The manuscript has been revised thoroughly according to the reviewer’s advice.

    1. Author Response

      Reviewer #1 (Public Review):

      This work focuses on the mechanisms that underlie a previous observation by the authors that the type VI secretion system (T6SS) of a Pseudomonas chlororaphis (Pchl) strain can induce sporulation in Bacillus subtilis (Bsub). The authors bioinformatically characterize the T6SS in Pchl and identify all the core components of the T6SS, as well as 8 putative effectors and their domain structures. They then show that the Pchl T6SS, and in particular its effector Tse1, is necessary to induce sporulation in Bsub. They demonstrate that Tse1 has peptidoglycan hydrolase activity and causes cell wall and cell membrane defects in Bsub. Finally, the authors also study the signaling pathway in Bsub that leads to the induction of sporulation, and their data suggest that cell wall damage may lead to the degradation of the anti-sigma factor RsiW, leading to activation of the extracytoplasmic function (ECF) sigma factor σW that causes increased levels of ppGpp. Sensing of high ppGpp levels by the kinases KinA and KinB may lead to phosphorylation of Spo0F and induction of the sporulation cascade.

      The findings add to the field's understanding of how competitive bacterial interactions work mechanistically and provide a detailed example of how bacteria may antagonize their neighbors, how this antagonism may be sensed, and the resulting defensive measures initiated.

      While several of the conclusions of this paper are supported by the data, additional controls would bolster some aspects of the data, and some of the final interpretations are not substantiated by the current data.

      • The Bsub signaling pathway that is proposed is intricate and extensive as shown in Fig 5A. However, the data supporting that is very sparse:

      a) The authors show no data showing that the proteases PrsW and/or RasP, or the ECF sigma factor σW, are necessary, or that the cleavage of RsiW is needed, for induction of sporulation; this could presumably be tested using mutants of those genes.

      It has been previously demonstrated that the proteases PrsW and RasP cleave RsiW under certain conditions, such as alkaline shock (Heinrich et al., 2009). First, PrsW cleaves RsiW, and the truncated RsiW then serves as a substrate for RasP. In the previous version of the manuscript, we demonstrated that treatment with Tse1 causes damage to PG and delocalization of RsiW; however, as the reviewer notes, we did not show the participation of either protease in the proposed signaling pathway. We have now generated single mutants in rsiW and prsW and treated them with Tse1. We observed no change in sporulation levels compared with untreated strains (Figure 1), a finding consistent with the proposed involvement of these genes in the sporulation signaling pathway activated by Tse1. Positive controls, that is, the single mutants grown at 37 °C, were still able to sporulate. These data have been added to Figure 6B in the new version of the manuscript.

      As suggested by other reviewers, we have generated a sister plot of this figure showing the raw CFUs in each case. These data are included in Supplementary file 3. This experiment and the related figure have been incorporated into the new version of the manuscript.

      Figure 1. A) Quantification of the percentage of sporulated Bsub, ΔrsiW and ΔprsW cells after treatment with purified Tse1, showing that the rsiW and prsW single mutants do not respond to Tse1. B) Cell density (CFUs/mL) of the total (blue bars) and sporulated (brown bars) populations of different Bacillus strains (Bsub, ΔrsiW and ΔprsW), untreated and treated with Tse1. Sporulation at 37 °C is shown as a positive control for each strain. Statistical significance was assessed via t-tests. *p value < 0.1, **p value < 0.001, ***p value < 0.0001.

      b) Similarly, they don't demonstrate that the levels of ppGpp increase in the cell upon exposure to Pchl.

      We have not been able to measure ppGpp levels. However, given that the levels of several nucleotides are altered in the same proposed sporulation cascade (Kriel et al., 2013; Tojo et al., 2013; López and Kolter, 2010), we instead analyzed ATP levels using an ATP Determination Kit (Thermo, A22066). ATP levels increased 3-fold in Bsub cells treated with Tse1 compared with untreated control cells. Consistently, no increase in ATP levels was observed in rsiW or prsW mutants treated with Tse1. All raw luminescence data obtained for each sample and treatment are provided in Figure 6-source data 1. This experiment, the corresponding figure (Figure 6A) and its description in “Materials and Methods” have been added to the new version of the manuscript.
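
      The 3-fold figure above is a ratio of ATP amounts derived from luminescence. The Python sketch below illustrates that arithmetic under stated assumptions: a linear ATP standard curve fitted to blank-subtracted luminescence (a common approach for luciferase-based ATP assays, not a detail given in the text) and made-up replicate readings. None of the numbers are data from this study, and the function names are hypothetical.

```python
from statistics import mean


def fit_standard_curve(atp_nm, rlu):
    """Least-squares line RLU = slope * ATP + intercept (blank-subtracted RLU)."""
    x_bar, y_bar = mean(atp_nm), mean(rlu)
    sxx = sum((x - x_bar) ** 2 for x in atp_nm)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(atp_nm, rlu))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def atp_from_rlu(raw_rlu, blank_rlu, slope, intercept):
    """Interpolate ATP (nM) for one well from its raw luminescence."""
    return (raw_rlu - blank_rlu - intercept) / slope


# Hypothetical ATP standards (nM) and their blank-subtracted luminescence.
slope, intercept = fit_standard_curve([0, 10, 50, 100], [0, 1000, 5000, 10000])

blank = 100  # hypothetical reagent-only background reading
untreated = [atp_from_rlu(r, blank, slope, intercept) for r in (2100, 2300)]
treated = [atp_from_rlu(r, blank, slope, intercept) for r in (4300, 4500)]

# Ratio of mean ATP in treated versus untreated wells.
fold_change = mean(treated) / mean(untreated)
print(f"ATP fold change (treated / untreated): {fold_change:.2f}")
```

      Averaging replicates before taking the ratio is one reasonable choice; comparing per-replicate ratios on a log scale would be an equally defensible alternative.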

      c) There is some data showing that kinA and kinB mutants don't induce sporulation (Fig supplement 7A), but that is lacking the 'no attacker' control that would demonstrate an induction.

      We have now included the ‘no attacker’ control in the sporulation (%) data. The figure shows that the presence of Pchl strains induces sporulation in all kinase mutants. These new data have been incorporated in Figure 6-figure supplement 1A in the new version of the manuscript.

      d) There is some data showing that RsiW may be cleaved (Fig 5C, D), but that data would benefit from a positive control showing that the lack of YFP foci is seen in a condition where RsiW is known to be cleaved, as well as from a time-course showing that the foci are present prior to the addition of Tse1, and then disappear. As it is shown now, it is possible that the addition of Tse1 just blocks the production of RsiW or its insertion into the membrane (especially given the membrane damage seen). Further, there are no data showing that the disappearance of the YFP foci requires the proteases PrsW and/or RasP; such data would also support the idea that the disappearance is due to cleavage of RsiW.

      Thank you for this useful suggestion. Note that our transcriptomic analysis showed no repression of the genes encoding either protease in cells treated with Tse1. However, we agree that additional experiments would strengthen our findings. We have repeated the whole experiment, including a positive control to demonstrate that YFP foci disappear under a condition in which RsiW is known to be degraded by PrsW and RasP: Bacillus cells were incubated in medium at pH 10, which causes an alkaline shock that triggers RsiW cleavage (Asai, 2017; Heinrich et al., 2009). As shown in Figure 6D, YFP foci also disappeared under this condition. We have also provided additional images, with quantification of the average YFP-foci signal, in Figure 6-figure supplement 2.

      • The entire manuscript suggests that T6SS is solely responsible for the induction of sporulation. While T6SS does appear to play a major part in explaining the sporulation induction seen, in the absence of 'no attacker' controls for Fig. 2A, it is impossible to see this. From the data shown in Fig. 2C, and figure supplement 2A, the 'no attacker' sporulation rate seems to be ~20%, while the rate is ~40% with Pchl strains lacking T6SS, suggesting that an additional factor may be playing a role.

      This appears to be a misunderstanding of the message of this manuscript. The conceptual foundation of this study was laid in our previous manuscript (Molina-Santiago et al., 2019), in which we demonstrated that B. subtilis sporulated in the presence of P. chlororaphis. Interestingly, the overgrowth of P. chlororaphis over the B. subtilis colony did not eliminate B. subtilis cells, given that most of them had sporulated. Those data strongly suggested that a functional T6SS was involved in the cellular response of Bacillus during close cell-to-cell contact. In this new manuscript, we have explored this idea and found that the T6SS of P. chlororaphis indeed mobilizes at least one effector, Tse1, that is able to trigger sporulation in Bacillus. Thus, we did not conclude, either previously or in this new study, that the T6SS is the only factor expressed by P. chlororaphis responsible for activating sporulation in Bacillus. We have rephrased some sentences of the manuscript accordingly to clarify the proposed role of the T6SS in B. subtilis sporulation.

      In addition, as mentioned above, we have included data of sporulation percentages in the absence of an attacker to better compare the induction of sporulation observed in the presence of the different Pchl strains and in the presence of Tse1.

      Reviewer #2 (Public Review):

      In a previous study, the authors showed that cell-cell contact with Pseudomonas chlororaphis induces sporulation in Bacillus subtilis. Here, the authors build on this finding and elucidate the mechanism behind this observation. They describe the enzymatic activity of a protein (Tse1) secreted by the type VI secretion system (T6SS) of P. chlororaphis (Pch), which partially degrades the peptidoglycan (PG) of targeted B. subtilis cells and triggers a signal cascade culminating in sporulation.

      Most of the key conclusions of this paper (Tse1 being secreted by the T6SS and inducing sporulation in targeted cells) are well supported by the data. One conclusion (sporulation response being an anti-T6SS "defense" strategy) is not well supported by the data and should be removed or rephrased.

      The authors elucidate the enzymatic activity of Tse1, a T6SS effector protein, in a genus (Pseudomonas) of great interest to microbiologists, and to researchers studying the T6SS specifically. They also carefully dissect the cellular response (signal cascade and sporulation) of an important model organism (B. subtilis; Bsub) specifically to exposure to Tse1. The results describing this cellular response contribute substantially to our understanding of how T6SS effector proteins interact with cells of Gram-positive species.

      My only major concerns regard the interpretation of these results as sporulation being an adaptive and/or specific response to attacks by the T6SS. I outline my reasoning below.

      • Interpretation of sporulation as a "defense" mechanism/strategy against the T6SS. In order for a phenotype X to be regarded as a "defense against Y" mechanism, it has to be shown that phenotype X (sporulation in response to Tse1) evolved - at least in part - for the purposes of increasing survival in the presence of Y (T6SS attacker). There are no experiments in this study comparing e.g. a sporulating Bsub with a non-sporulating Bsub, that would allow testing if sporulation increases survival. The experiments carefully describe the cellular response to Tse1, but no inference can be made with regards to this being adaptive for Bsub, or if it helps the cells survive against T6SS attacks, etc. A more parsimonious explanation would be that Tse1 happens to target the PG and causes envelope stress, triggering sporulation. So, it would be a general stress response that also happens to be triggered by T6SS. Now, some general (cell envelope) stress responses are known to be very effective at protecting against the T6SS. But in those instances, a beneficial effect for survival in the face of T6SS attacks has been shown in dedicated experiments. Purely observing a response to a T6SS effector, as this study does (very well), is not evidence that the response has evolved for the purpose of surviving T6SS attacks. Tucked away in the supplement (and briefly mentioned in the main text) is data on Bsub and Bacillus cereus, showing that i) cell densities of the sporulating Bsub and a sporulating B. cereus strain are not affected by an active T6SS, and ii) cell densities of an asporogenic B. cereus are slightly reduced by an active T6SS. However, the effect sizes of density reduction by the T6SS in the asporogenic B. cereus are minute (20x10^6 vs. ~50x10^6). In typical killing assays against e.g. gram-negative strains, a typical effect size for T6SS killing would be a several order of magnitude reduction in survival of the target strain when exposed to a T6SS attacker. 
      Based on this dataset alone (Figure Suppl. 8), I would say that none of the three Bacillus strains is experiencing any "fitness-relevant" killing by the T6SS, which is in line with the T6SS often being useless against Gram-positive bacteria when it comes to killing. Hence, no claims about fitness benefits of sporulation in response to a T6SS attack, or about this being a "defense mechanism/strategy", should be made in the manuscript.

      We thank the reviewer for these insightful general and specific comments. We agree with the reviewer: sporulation is not an adaptive or specific response of Bacillus to the T6SS but, as the reviewer states, a general stress response. Because some passages of the manuscript may have given the wrong impression, we have rephrased them. In addition, we made a mistake when generating Figure supplement 8 (Figure 6-figure supplement 3 in the new version of the manuscript). We have repeated this experiment and generated a new, corrected chart that shows a three-order-of-magnitude reduction in survival of the asporogenic B. cereus strain in competition with the Pchl WT strain compared with the Pchl mutant strains. These new findings show that the inability to sporulate leads to a severe reduction in survival of the Bacillus cereus DSM 2302 population in competition with Pchl carrying an active T6SS, compared with its survival in competition with the Pchl hcp mutant. This figure also shows that the Bacillus population decreased in competition with the tse1 mutant, demonstrating that Tse1 is responsible for killing Bacillus. However, there is a statistically significant difference in the survival of Bacillus competing with the hcp or tse1 mutants. The increased survival of Bacillus in the interaction with the tse1 strain compared with the Bacillus-hcp competition suggests that this strain is able to deliver additional T6SS-dependent toxins. This observation is consistent with the data presented in Fig. 2B, which indicated that the tse1 mutant has an active T6SS able to kill E. coli.

      • Data supporting baseline "no competitor" sporulation rates being no different from those triggered by T6SS mutants are not convincing. For the data shown in Fig. 2A, a key comparison here would be to show baseline Bsub sporulation rates in absence of a competitor. This measurement is shown in Fig supplement 2A, and the value shown there (roughly 22% on average) appears to be much lower than the average T6SS mutant shown in Fig. 2A. The main text states that sporulation rates induced by the different T6SS mutants are "statistically" similar to the no-competitor baseline (L206/207). I am not convinced by this, since i) overall sporulation rates (including WT Pch) appear to have been lower in the experiment shown in supplement 2A, so a direct comparison between the no-competitor baseline and the data shown in Fig. 2A is not possible; and ii) hcp and tse1 mutants were tested in different experiments throughout the study, and sporulation rates appear to consistently hover around 30-40%, which is higher than the roughly 22% for "no competitor" depicted in Supplement Fig2A. I am focussing on this because, for the interpretation of the results and the main narrative of the paper, knowing if "simply interacting with a T6SS-negative P. chlororaphis" induces some sporulation would make a big difference. One sentence in the discussion adds to my confusion about this: L464/465, "... a strain lacking paar (Δpaar) had an active T6SS that triggered sporulation comparably to Δhcp, ΔtssA, and Δtse1 strains", suggesting that the authors claim that even strains lacking an active T6SS trigger increased sporulation (which I would agree with, based on the data).

      We agree with the reviewer that a direct comparison between the two figures is not appropriate because baseline sporulation percentages fluctuate between experiments. To solve this issue, we have added the baseline "no competitor" sporulation percentages to the experiments shown in Figure 2B in the new version of the manuscript.

      Regarding the sporulation induced by a T6SS-negative P. chlororaphis, the reviewer is right. Bacillus sporulation is triggered by many external factors (abiotic and biotic stresses), so the presence of P. chlororaphis in the competition already affects the sporulation percentage of B. subtilis. Accordingly, we have removed the statement that the sporulation percentages induced by the different T6SS mutants are "statistically" similar to the no-competitor baseline. However, our previous data (Molina-Santiago et al., 2019) and current findings convincingly demonstrate the relevance of the T6SS, and specifically of the Tse1 toxin, in the induction of sporulation, at least during close cell-to-cell contact.

      • Claim regarding "bacteriolytic activity" when tse1 is heterologously expressed in E. coli. The data supporting this claim (Fig2-supplement 2C) only shows a lower net population growth rate after induction of tse1 (truncated vs. non-truncated) expression. This could be caused by: slower growth (but no death), equal growth (with some death), or a combination of the two. The claim of "bacteriolytic" activity in E. coli is therefore not supported by this dataset.

      We agree with the reviewer and we have decided to remove this figure and the experiment of “bacteriolytic activity” given that it does not contribute conceptually to the message of the manuscript.

      I cannot comment in more detail on the validity of the biochemistry/enzymatic activity assays as these are not my area of expertise.

      Reviewer #3 (Public Review):

      The authors identify tse1, a gene located in the type 6 secretion system (T6SS) locus of the bacterium Pseudomonas chlororaphis, as necessary and sufficient for induction of Bacillus subtilis sporulation. The authors demonstrate that Tse1 is a hydrolase that targets peptidoglycan in the bacterial cell wall, triggering activation of the regulatory sigma factor σW. The sporulation-inducing effects of σW are dependent on the downstream presence of the sensor histidine kinases KinA and KinB. Overall, this is a well-structured paper that uses a combination of methods including bacterial genetics, HPLC, microscopy, and immunohistochemistry to elucidate the mechanism of action of Tse1 against B. subtilis peptidoglycan. There are some concerns regarding a few experimental controls that were not included/discussed, and (in a few figures) the visual representation of the data could be improved. The structure of the manuscript and experiments is such that key questions are addressed in a logical flow that demonstrates the mechanisms described by the authors.

      To begin, we have concerns regarding the sporulation assays and their results. The data should be presented as "Percent sporulation" or "Sporulation (%)" - not as a "sporulation rate": there is no kinetic element to any of these measurements, so no rate is being measured (be careful of this in the text as well, for instance near lines 204). More importantly, there is no data provided to indicate that changes in percent spores are not instead just the death of non-sporulated cells. For example, imagine that within a population of B. subtilis cells, 85% of the cells are vegetative and 15% are spores. If, upon exposure to tse1, a large proportion of the vegetative cells are killed (say, 80% of them), this could lead to an apparent increase in sporulation: from 15% for the untreated population to ~50% of the treated, but the difference would be entirely due to a change in the vegetative population, not due to a change in sporulation. The authors need to clearly describe how they conducted their sporulation assays (currently there is no information about this in the methods) as well as provide the raw data of the counts of vegetative cells for their assays to eliminate this concern.
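
      The reviewer's hypothetical example can be made concrete with a few lines of arithmetic. In the Python sketch below (the counts are the reviewer's illustrative numbers, not data from the study, and the function name is hypothetical), killing 80% of vegetative cells raises the apparent spore percentage from 15% to roughly 47% even though the number of spores does not change, which is why absolute counts are needed alongside percentages.

```python
def percent_spores(vegetative, spores):
    """Spores as a percentage of all viable cells (vegetative + spores)."""
    return 100.0 * spores / (vegetative + spores)


# Reviewer's illustrative population: 85 vegetative cells and 15 spores.
untreated = percent_spores(vegetative=85, spores=15)

# Suppose a treatment kills 80% of vegetative cells but creates no new spores.
surviving_vegetative = 85 * 0.20
treated = percent_spores(vegetative=surviving_vegetative, spores=15)

print(f"untreated: {untreated:.1f}% spores")
print(f"treated:   {treated:.1f}% spores (same 15 spores)")
```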

      Thank you for the suggestion. We have replaced “sporulation rate” with “sporulation (%)” or “sporulation percentage” in all titles and data labels. As also suggested by reviewer 2, we have included the raw CFU counts of the total population and of sporulated cells to show that there is no substantial change in the rate of death. We have also added a section to Materials and Methods specifying how the sporulation assays were performed. Quote text:

      “Sporulation assays

      Spots of bacteria were resuspended in 1 mL of sterile distilled water. Serial dilutions were then made and plated on LB agar to obtain total (unheated) CFU counts. The same serial dilutions were heated at 80 °C for 10 minutes to kill vegetative cells and immediately plated again on LB agar. Plates were incubated overnight at 28 °C, and the resulting colonies were counted to calculate the Bsub sporulation percentage (%). A list of raw CFUs (total and spore populations) from all figures showing sporulation percentages is provided in Supplementary file 3.”
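
      The calculation described in this protocol can be sketched as follows. This Python illustration assumes a plated volume of 0.1 mL and uses made-up colony counts; neither value is stated in the text, and the function names are hypothetical. Total CFU/mL comes from the unheated plates and spore CFU/mL from the heat-treated (80 °C, 10 min) plates.

```python
def cfu_per_ml(colonies, dilution, plated_volume_ml=0.1):
    """Back-calculate CFU/mL of the 1 mL resuspension from one countable plate.

    dilution is the dilution of the plated sample, e.g. 1e-5 for the 10^-5 tube.
    plated_volume_ml = 0.1 is an assumption, not a value from the protocol.
    """
    return colonies / (dilution * plated_volume_ml)


def sporulation_percent(total_cfu_ml, spore_cfu_ml):
    """Heat-resistant CFU as a percentage of total (unheated) CFU."""
    if total_cfu_ml <= 0:
        raise ValueError("total CFU/mL must be positive")
    return 100.0 * spore_cfu_ml / total_cfu_ml


# Hypothetical plate counts for one resuspended spot.
total = cfu_per_ml(colonies=45, dilution=1e-5)    # unheated plate
spores = cfu_per_ml(colonies=120, dilution=1e-4)  # heated plate
print(f"total: {total:.2e} CFU/mL, spores: {spores:.2e} CFU/mL")
print(f"sporulation: {sporulation_percent(total, spores):.1f}%")
```

      Reporting total and spore CFU/mL next to the percentage, as Supplementary file 3 does, lets a reader check whether a higher percentage reflects more spores or fewer vegetative cells.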

      A related concern is regarding the analysis of the kinases and the effects of their deletions on the impact of Tse1. Previous literature shows that the basal levels of sporulation in a B. subtilis kinA or a kinB mutant are severely defective relative to a wild-type strain; these mutants sporulate poorly on their own. Therefore, the data presented on Lines 394+ and the associated Supplemental Figure regarding the sporulation defects of these two mutants are not compelling for showing that these kinases are required for this effector to act. It is likely that simply missing these kinases would severely impact the ability of these strains to sporulate at all, irrespective of the presence of Tse1, and no discussion of this confounding concern is discussed.

      Previous literature shows that mutation of the kinases affects B. subtilis sporulation. The histidine kinases KinA and KinB are primarily responsible for initiating the sporulation cascade by phosphorylating Spo0F. However, as shown in Figure 6-figure supplement 1A, single mutants in these kinases (ΔkinA, ΔkinB) still sporulate, because the phosphorylation cascade is controlled by numerous intermediaries and by other histidine kinases that form a multicomponent phosphorelay (KinA-E). In this context, B. subtilis sporulation can also be triggered by KinC or KinD in the absence of KinA or KinB, as KinC and KinD can act directly on the master regulator of sporulation, Spo0A (Burbulys et al., 1991; Wang et al., 2017).

      In addition, as suggested by reviewer 1, we have added the 'no competitor' sporulation percentage for each kinase mutant and for B. subtilis WT to Figure 6-figure supplement 1A of the new version of the manuscript. As the reviewer notes, and as supported by the literature, these mutants sporulate poorly on their own in the absence of an attacker (none). However, as shown in the figure, all kinase mutants show an increased sporulation percentage in the presence of a competitor.

      Another concern is regarding the statistical tests used in Figure 2. For statistical tests in A, B, and D, it should be stated whether a post-test was used to correct for multiple comparisons, and, if so, which post-test was used. For C, we suggest the inclusion of a mock control in addition to the two conditions already included (i.e., an extraction from an E. coli strain expressing the empty vector) to provide a stronger control comparison.

      We have clarified the statistical tests used in Figure 2. Briefly, for Figure 2A, B and D we used one-way ANOVA followed by Dunnett's test to analyze the sporulation percentage of Bsub, with Bsub in competition with Pchl as the control group. Regarding Figure 2C, it is not possible to add a mock control with a strain carrying the empty vector, because this is a suicide plasmid (pDEST17) unable to replicate in E. coli without chromosomal integration.

      An additional concern regarding controls is that there is an absence of loading controls for the immunoblot assays. In Figure 5D and all immunoblot assays, there is no mention of a loading control, which is a critical control that should be included.

      In the previous version of the manuscript, we already included a loading control for Figure 5D in Figure supplement 7B, for both the cell and supernatant fractions. In the new version of the manuscript, the loading control for Figure 6E (Figure 5D in the previous version) is shown in Figure 6-figure supplement 2C. We have also included the original unedited gels and blots (Figure 6-figure supplement 2-source data 1 and Figure 6-figure supplement 2-source data 2).

      Some of the visualizations could be improved to help the reader understand and appropriately interpret the data presented. For instance, in Figures 3 and 4 the scale bars are different across each of the Figure's imaging panels. These should be scaled consistently for better comparison. Additionally, the red false colorization makes the printed images difficult to see. Black-and-white would be easier to see and would not subtract from the images.

      The reviewer is right. The scale bars in Figure 3A all equaled 2, but they were not drawn at the same length. We have edited the images to have the same magnification for better comparison.

      In relation to Figure 4, we have changed the magnifications and now all the figures have the same scale bars and magnifications. In addition, we have added more images of broader fields in Figure 4-figure supplement 1 which were used to measure the percentage of permeabilized cells and to obtain the fluorescence intensity measures shown in Figure 4.

      An additional weakness of the paper is that the RNA-seq data is not fully investigated, and there is an absence of methods included regarding the RNA-seq differential abundance analysis (it is mentioned on L379-380 but no information is provided in the methods). As stated by the authors, 58% of differentially regulated genes belonged to the σW regulon, but the other 42% of genes are not discussed, and will hopefully be a target of future investigations.

      We have modified the Methods section to better explain the RNA-seq differential abundance analysis. The new text reads: “The raw reads were pre-processed with SeqTrimNext (Falgueras et al., 2010) using the specific NGS technology configuration parameters. This pre-processing removes low-quality, ambiguous and low-complexity stretches, linkers, adapters, vector fragments, and contaminated sequences while keeping the longest informative parts of the reads. SeqTrimNext also discarded sequences below 25 bp. Subsequently, clean reads were aligned and annotated against the Bsub reference genome with Bowtie2 (Langmead and Salzberg, 2012), and the resulting BAM files were sorted and indexed using SAMtools v1.4 (Li et al., 2009). Uniquely localized reads were used to calculate the read count for each gene via Sam2counts (https://github.com/vsbuffalo/sam2counts). Differentially expressed genes (DEGs) were analyzed with DEgenes Hunter, which provides a combined p value calculated (based on Fisher’s method) from the nominal p values provided by edgeR (Robinson et al., 2010) and DESeq2. This combined p value was adjusted using the Benjamini-Hochberg (BH) procedure (false discovery rate approach) and used to rank all the obtained DEGs. For each gene, a combined p value < 0.05 and a log2-fold change > 1 or < −1 were considered the significance threshold.”
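      The combination and filtering steps in the quoted Methods text can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the DEgenes Hunter implementation: it assumes one edgeR and one DESeq2 p value per gene (both strictly positive), combines them with Fisher's method, ranks genes by the BH-adjusted combined p value, and applies the quoted thresholds (combined p < 0.05 and |log2FC| > 1). The function names and gene records are hypothetical. Fisher's method assumes independent p values, which two tests run on the same counts do not strictly satisfy.

```python
import math


def fisher_combine(pvalues):
    """Combine p values with Fisher's method (all p values must be > 0).

    X = -2 * sum(ln p_i) is chi-square with 2k degrees of freedom under the
    null; for even degrees of freedom the survival function has the closed
    form exp(-X/2) * sum_{j=0}^{k-1} (X/2)**j / j!.
    """
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    term, total = 1.0, 1.0
    for j in range(1, len(pvalues)):
        term *= half_x / j  # builds (X/2)**j / j!
        total += term
    return min(1.0, math.exp(-half_x) * total)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(genes):
    """genes: list of (name, p_edger, p_deseq2, log2fc) tuples (hypothetical)."""
    combined = [fisher_combine([p_e, p_d]) for _, p_e, p_d, _ in genes]
    adjusted = benjamini_hochberg(combined)
    hits = [
        (name, c, a, lfc)
        for (name, _, _, lfc), c, a in zip(genes, combined, adjusted)
        if c < 0.05 and abs(lfc) > 1
    ]
    return sorted(hits, key=lambda hit: hit[2])  # rank by BH-adjusted p


genes = [
    ("geneA", 0.01, 0.02, 2.0),    # small p values, large fold change
    ("geneB", 0.50, 0.60, 3.0),    # large fold change but not significant
    ("geneC", 0.001, 0.001, 0.5),  # significant but |log2FC| <= 1
]
print([hit[0] for hit in call_degs(genes)])  # ['geneA']
```

Only geneA passes both thresholds here; geneB fails on the combined p value and geneC on the fold change.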

      Regarding the RNA-seq analysis, we are aware of the amount of information that can be extracted. Before filtering the information shown in the manuscript, we performed bioinformatic analyses seeking a connection with the cellular response, namely increased sporulation. Beyond this, we made some observations with no direct connection to sporulation, which would be interesting to pursue in future studies but would detract from the clarity of this story (Figure 2 below). In any case, we include here the whole picture of the transcriptomic changes occurring in Bsub after treatment with Tse1. KEGG pathway analysis of the differentially expressed genes showed induction of flagellar assembly, aminobenzoate degradation, and nitrogen and amino acid metabolism. Interestingly, the fatty acid degradation and cationic antimicrobial peptide (CAMP) resistance pathways were also induced, probably related to changes in the cell wall after the action of the Tse1 toxin. On the other hand, the synthesis and degradation of ketone bodies pathway was mostly repressed.

      Figure 2. KEGG pathway analysis of genes differentially expressed in Bsub after treatment with Tse1.

      Another methodological concern in this paper is the limited details provided for the calculation of the permeabilization rate (Figure 4, L359, L662-664). It is not clear how, or if, cell density was controlled for in these experiments.

      We agree with the reviewer and now explain in more detail how the permeabilization rate was calculated. The new text reads: “N=3 for Bsub treated with Tse1 and N=3 for untreated Bsub. N refers to the number of CLSM fields analyzed to calculate the proportion of permeabilized cells among the total cells in each field.”
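      A minimal sketch of the per-field calculation described in the quoted text, assuming each CLSM field yields a count of permeabilized cells and a count of all cells (the counts and the function name are hypothetical). Because each field's rate is normalized to the number of cells in that field, the per-field rates do not scale with how densely a field is populated; the pooled rate, by contrast, weights denser fields more heavily.

```python
from statistics import mean, stdev


def permeabilization_summary(fields):
    """fields: list of (permeabilized_cells, total_cells), one tuple per CLSM field."""
    rates = [perm / total for perm, total in fields]  # fraction per field
    pooled = sum(p for p, _ in fields) / sum(t for _, t in fields)  # density-weighted
    return {"per_field": rates, "mean": mean(rates), "sd": stdev(rates), "pooled": pooled}


# Hypothetical counts for N = 3 fields of Tse1-treated cells
treated = [(12, 80), (30, 150), (9, 60)]
print(permeabilization_summary(treated))
```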

      Finally, one weakness of the paper is the broad conclusions that they draw. The authors claim that the mechanism of sporulation activation is conserved across Bacilli when the authors only test one B. subtilis and one B. cereus strain. They further argue (lines 469+) that Tse1 requires a PAAR repeat for its targeting, but do not provide direct evidence for this possibility.

      We have toned down the final conclusion to state that sporulation activation is a mechanism found in different Bacillus species, such as Bsub and Bcer. Regarding the second point, we have expanded the explanation of this argument. The new text reads: “As shown in Figure 2B, a paar mutant has an active T6SS able to kill E. coli. However, as shown in Figure 2A, we noticed that a paar mutant (which encodes tse1) is not able to trigger B. subtilis sporulation to the same level as the Pchl WT strain. Given that paar deletion apparently abolishes Tse1 secretion, we suggest that Tse1 is a PAAR-associated effector that requires a PAAR repeat domain protein to be targeted for secretion, thereby increasing Bacillus sporulation during contact with Pseudomonas cells (Cianfanelli et al., 2016; Hachani et al., 2014; Whitney et al., 2014).”

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Elkind et al. use a deep learning segmentation algorithm trained to detect putative cell nuclei in mouse brains to count cells in the Allen Mouse Brain Connectivity Atlas. The Allen Mouse Brain Connectivity Atlas is a dataset comprising hundreds of mouse brains. The authors use this increased statistical power to detect differences in volume, cell count, and cell density between strains (C57BL/6J and FVB.CD1), as well as sex differences.

      Volume, cell count, and cell density are all regularly used in neuroanatomy to normalize or benchmark results, so a large available dataset against which others can compare their data would be a useful resource. The trained segmentation algorithm might also find utility in assays where investigators, for one reason or another, can't dedicate an entire labeled channel to counting cell nuclei.

      Nevertheless, for technical reasons, I find the current work problematic.

      We thank the Reviewer for acknowledging the potential usefulness of our work, and for the insightful, helpful comments. We believe that addressing them has made our revised manuscript much stronger than the initial submission. We hope our revised version will also clear the Reviewer’s remaining doubts.

      Major:

      The authors make use of the "red" channel from the Allen Mouse Brain Connectivity Project (AMBCP). The AMBCP was acquired using two-photon tomography with the TissueCyte 1000 system (http://help.brain-map.org/download/attachments/2818171/Connectivity_Overview.pdf?version=2&modificationDate=1489022310670&api=v2). The sample is illuminated at 925 nm wavelength and the channel the authors describe as autofluorescence is collected through a 593/40 nm bandpass filter. The authors go on to describe their rationale for using this channel for quantifying cell nuclei:

      "We noticed that the red (background) channel of STPT images, taken for the purpose of atlas alignment, typically features dark, round-like objects resembling cell nuclei. We had observed this phenomenon in our own imaging of mouse brains but found little more than anecdotal mentions of it in the literature8,9,10,11".

      The authors here cite a Scientific Reports paper from 2021 with 11 citations, a Journal of Clinical Pathology paper from 2005 with 87 citations, and lastly a paper in Laboratory Investigation from 2016 with 41 citations. The authors completely fail to cite the work from Watt Webb's group (co-inventor of 2p microscopy) in PNAS from 2003 that thoroughly described the phenomenon of native fluorescence by multiphoton excitation (https://www.pnas.org/doi/10.1073/pnas.0832308100 ), which has 1959 citations so far. This is either indicative of poor scholarship or an attempt to describe something as novel. Either way, the native fluorescence and second harmonic generation from multiphoton illumination are perfectly characterized by Webb and colleagues, and they clearly show the differential effect on nucleosides, retinol, indoleamines, and collagen. This is also where the authors should have paid more attention to discrepancies in their own data when correlated with well-established cell nuclei markers (Murakami et al). The authors will note "black large spots" in the data at specific anatomical regions and structures, like the fornix and stria medullaris: https://connectivity.brain-map.org/projection/experiment/siv/263780729?imageId=263780960&imageType=TWO_PHOTON,SEGMENTATION&initImage=TWO_PHOTON&x=15702&y=18833&z=5

      which is not reproduced in, for example, the Allen Reference Atlas H&E staining: http://atlas.brain-map.org/atlas?atlas=1&plate=100960284#atlas=1&plate=100960284&resolution=4.19&x=5507.4000244140625&y=5903.39990234375&zoom=-2

      Relatedly, notice the poor signal in the 2p "autofluorescence" within the paraventricular nucleus: https://connectivity.brain-map.org/projection/experiment/siv/263780729?imageId=263780960&imageType=TWO_PHOTON,SEGMENTATION&initImage=TWO_PHOTON&x=15702&y=17833&z=6

      and then compare it to the H&E staining: http://atlas.brain-map.org/atlas?atlas=1&plate=100960280#atlas=1&plate=100960276&resolution=1.50&x=5342.476283482143&y=5368.023856026786&zoom=0

      These multiphoton-specific signals are especially pronounced in the pons and medulla, which makes quantification there especially dubious; this is apparent simply from looking at Figure 1c in the manuscript.

      We thank the Reviewer for the comments and sincerely apologize for missing the seminal work of Webb’s group. We had included the former references for their specific mention or illustration of non-autofluorescent nuclei, but we entirely failed to address the underlying chemistry that Webb’s group beautifully characterized. We have added the following sentence in the Results section “Autofluorescence of STPT images displays cell nuclei” (red font for new sentence; Reference #15 corresponds to Zipfel et al.):

      “We noticed that the red (background) channel of STPT images, taken for the purpose of atlas alignment, typically features dark, round-like objects resembling cell nuclei. This phenomenon was described in previous literature11,12,13,14. In particular, Zipfel et al. characterized the use of multiphoton-excited native fluorescence and second harmonic generation for the purpose of staining-free tissue imaging15.”

      We also mention the dependence of our method on the presence of intrinsically fluorescent molecules in the Discussion:

      “The study has several limitations. First, the model is sensitive to the contrast between dark nuclei and autofluorescent surroundings, which can be limited by image quality and tissue composition. In particular, the staining-free approach depends on the presence of intrinsic molecular indicators such as NADH, retinol or collagen15, which may vary between cell or tissue components, even within the brain.”

      More generally, we understand the Reviewer’s major concern above to be the technical validity of our approach, namely whether the segmented small objects lacking autofluorescence in the STPT dataset in fact correspond to cells/nuclei.

      Our initial Supplemental Figure 1 (Figure 1—figure supplement 1 in the current version) provides technical validation of the method by showing nuclear staining and autofluorescence side by side, using epifluorescence microscopy. In our revision, we now report appropriate statistical measures for this analysis (true positives, false positives, false negatives).

      In addition, we performed the following two sets of validations:

      (i) Technical validation of our staining-free quantification approach by nuclear staining. We performed nuclear staining (Hoechst 33342) followed by STPT imaging of 9 female brains and trained a new deep neural network (DNN) to segment the resulting images (STPT was performed by TissueVision). Unfortunately, in STPT it is not technically possible to analyze nuclear staining and autofluorescence in the very same tissue. Therefore, we compared per-region density, cell count and volume of the nuclei-stained validation brains with our original DNN-based analysis of AMBCA brains. We show a correlation coefficient >0.99 for per-region cell count between AMBCA autofluorescence and our nuclear staining (and a similar correlation coefficient for volume). However, the whole-brain number of cells detected by nuclear staining is 56% larger than that detected by autofluorescence. Although we currently have no technically feasible way to prove this, one likely explanation for this discrepancy is the different nature of the two signals the imaging detects: a positive signal (the Hoechst fluorophore) versus the absence of autofluorescence. Further, discrepancies between the two methods were notably higher in glia-rich tissues (e.g., CTX L1, midbrain, brainstem), leading to the speculation that counts of low-autofluorescence objects may be biased toward neurons rather than glia.

      (ii) Independent validation of the biological findings, discussed further below. Regarding the specific concern about “black large spots” in the fornix and stria medullaris, we would like to emphasize that our DNN does not identify or segment dark regions such as ventricles and tracts. In Author Response Image 1, we provide three examples featuring “black large spots” of different shapes and sizes, with the segmentation results as shown in Figures 1 and 2 of the manuscript. Note that colored circles, which appear as dots depending on magnification, are the objects detected and segmented by the DNN. The figure demonstrates that (1) fiber tracts (incl. fornix, stria medullaris) are not segmented; (2) striatal patches (which are smaller still than the fiber tracts in question) are not segmented; and (3) putative blood vessels, appearing as elongated black structures, are ignored by our DNN.

      Author Response Image 1. How does the DNN deal with large black spots? Examples for fiber tracts, striatal patches, and blood vessels; adapted from Figures 1 and 2 in the manuscript. Note that dots/outlines represent segmented putative “nuclei” as detected by the model, colored by assigned region according to the Allen Mouse Brain hierarchy. Example (1): fiber tracts (incl. fornix, stria medullaris) are not segmented. Example (2): striosomes (patches in the striatum, which are smaller still than the fiber tracts in question) are not segmented; the much smaller objects detected as putative nuclei are indicated by arrows. Example (3): putative blood vessels, appearing as elongated black structures, are ignored by our DNN. The segmentation images were adapted from the manuscript’s Figure 1 to correspond to the STPT image featuring fiber tracts (and striosomes/patches) pointed out by the Reviewer.

      Retrieved from: https://connectivity.brain-map.org/projection/experiment/siv/263780729?imageId=263780960&imageType=TWO_PHOTON,SEGMENTATION&initImage=TWO_PHOTON&x=15702&y=18833&z=5.

      Regarding the claim of problematic counting in brainstem regions, we agree, and had addressed this limitation in the manuscript’s Discussion (see below). We believe that our counting is valuable even if there is a significant systematic error in some regions: most of the analyses in this study compare brain regions across individuals, so systematic error has less impact. In the revision, we nevertheless took care to validate and quantify the size of this effect. Briefly, we compared counts based on nuclear staining (Hoechst) from 9 STPT-imaged brains with our quantifications of non-autofluorescent objects. As expected, the ratio between these counts depends on the brain region, and accuracy is better in regions with high brightness that are not on the border of the section (Figure 2—figure supplement 2). For the pons and medulla, the densities in our Hoechst quantifications are 43% and 60% higher than in our AMBCA analysis, respectively, yet the rank order is preserved in both.

      We have revised the relevant sentences in the Discussion:

      Original sentences: The study has several limitations. … In the hindbrain (pons, medulla), contrast was exceedingly weak, and we expect our quantifications in this region to strongly underestimate real cell densities, to an extent we cannot quantify.

      Revised sentences: The study has several limitations. … In the hindbrain (pons, medulla), contrast was exceedingly weak, and we expect our quantifications in this region to be 66% of the value estimated by nuclear staining (Figure 2—figure supplement 2).
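      The pons and medulla figures given earlier can be related to the 66% in the revised sentence with simple arithmetic: if the Hoechst-based density is 43% and 60% higher than the autofluorescence-based density, autofluorescence recovers 1/1.43 and 1/1.60 of the Hoechst value. The unweighted average of the two regions below is an illustrative assumption; the authors may have derived the figure differently.

```python
# Hoechst density in excess of the AMBCA autofluorescence density (from the text)
hoechst_excess = {"pons": 0.43, "medulla": 0.60}

# Fraction of the Hoechst-based density recovered by autofluorescence counting
recovered = {region: 1 / (1 + excess) for region, excess in hoechst_excess.items()}

# Unweighted mean across the two hindbrain regions (illustrative assumption)
mean_recovered = sum(recovered.values()) / len(recovered)

for region, fraction in recovered.items():
    print(f"{region}: {fraction:.1%}")  # pons: 69.9%, medulla: 62.5%
print(f"hindbrain mean: {mean_recovered:.1%}")  # hindbrain mean: 66.2%
```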

      The authors here use the correlation on log-log coordinates between their data and that of Murakami et al. to argue that the method is valid. However, the variance explained here is R^2 = 0.74, which is very poor given the log-log coordinates. A more valid approach would use linear coordinates, compute the ICC, and interpret it according to established guidelines (e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4913118/).

      As mentioned by the Reviewer, Figure 2D compares the Murakami et al. cell counts with ours across all brain regions. The value “r=0.869” is the correlation coefficient between the two vectors on a log scale, not R^2. We now also display the correlation coefficient on the linear scale, which is 0.98. As suggested by the Reviewer, we added ICC values between the two vectors on the linear scale. Using six different forms (ICC(1,1), ICC(1,k), ICC(C,1), ICC(C,k), ICC(A,1), ICC(A,k)), the ICC values were 0.98-0.99, corresponding to excellent agreement (ICC values are given in the legend of Figure 2).
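      The six ICC forms named above follow the McGraw and Wong notation (one-way "1", two-way consistency "C", two-way absolute agreement "A"; single "1" or average "k" measures), and all can be computed from the mean squares of a table with regions as rows and methods as columns. The sketch below is an illustration in plain Python, not the authors' analysis code; the toy data, in which the second method reads exactly one unit higher than the first, are hypothetical and chosen to show how the consistency and absolute-agreement forms differ.

```python
def icc_forms(rows):
    """Six ICC forms from an n x k table (n subjects/regions, k raters/methods)."""
    n, k = len(rows), len(rows[0])
    grand = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_r = ss_rows / (n - 1)                   # between subjects
    ms_c = ss_cols / (k - 1)                   # between raters/methods
    ms_e = ss_err / ((n - 1) * (k - 1))        # two-way residual
    ms_w = (ss_cols + ss_err) / (n * (k - 1))  # within subjects (one-way model)

    return {
        "ICC(1,1)": (ms_r - ms_w) / (ms_r + (k - 1) * ms_w),
        "ICC(1,k)": (ms_r - ms_w) / ms_r,
        "ICC(C,1)": (ms_r - ms_e) / (ms_r + (k - 1) * ms_e),
        "ICC(C,k)": (ms_r - ms_e) / ms_r,
        "ICC(A,1)": (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n),
        "ICC(A,k)": (ms_r - ms_e) / (ms_r + (ms_c - ms_e) / n),
    }


# Hypothetical: method B reads exactly one unit higher than method A
toy = [[1, 2], [2, 3], [3, 4], [4, 5]]
for form, value in icc_forms(toy).items():
    print(f"{form}: {value:.3f}")
```

With a constant offset between methods, the consistency forms ignore the shift (ICC(C,1) = 1), whereas the absolute-agreement forms penalize it (ICC(A,1) = 10/13, about 0.769).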

      Author Response Image 2 displays the revised Figure 2D (left) and the log of the ratio between the AMBCA-based cell count and the Murakami-based value (right), as a function of region volume. The mean value across regions is zero, corresponding to similar cell counts in both methods. Some outlier regions do exist; these may be attributed to registration errors or different experimental protocols, or may stem from the fact that the Murakami values are based on 3 brains, compared with hundreds of AMBCA brains.

      Author Response Image 2. Correlation with cell counts in Murakami et al. Left, revised Figure 2D; Right, ratio between AMBCA-based cell counts and Murakami et al. counts, as a function of region volume

      In addition to the above concern, the authors argue that the large sample size of the AMBCP is what enables them to find statistically significant small effects that might have gone undetected in the literature. However, this argument falls flat once we examine some of the main findings the authors report. Although the authors do not directly report measures of dispersion, we can estimate them from the figures and then arrive at the sample size needed to detect the reported effect size. For example, the finding that ORBvl2/3 volume is larger in female than in male mice would require only n=13 mice at the desired power of 0.8. Likewise, the sample size needed to detect the increased BST volume in male mice appears to be roughly n=16 mice at the desired power of 0.8. Both estimates are well within a reasonable sample size for an ordinary study. This raises the question: why did the authors not simply verify some of their main findings in an independent sample, using traditional methods to quantify volume and cell density, since this is well within reach? Such validation would strengthen the arguments of the paper.
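      The Reviewer's sample-size reasoning can be approximated with the standard normal-approximation formula for a two-sided, two-sample comparison with equal group sizes, n per group ≈ 2 * (z_(1-alpha/2) + z_(1-beta))^2 / d^2, where d is the standardized effect size (Cohen's d). This is a sketch under that assumption: the Reviewer does not state which method or effect sizes were used, and the d values below are hypothetical. An exact t-test calculation gives a slightly larger n for small samples.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.8):
    """Approximate n per group for a two-sided two-sample test (normal approximation)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


for d in (2.0, 1.5, 1.0, 0.5):  # hypothetical standardized effect sizes
    print(d, n_per_group(d))
```

Halving the effect size quadruples the required sample, which is why the small dimorphisms need far larger cohorts than the large ones.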

      We thank the reviewer for this comment and apologize for the omission. The revised version now reports measures of dispersion.

      We would like to emphasize that, owing to limited time and resources, we chose to focus our experimental validation on the technical comparison of nuclear staining vs. autofluorescence-based segmentation outlined above.

      We then verified the biological findings from the initial cohort using C57BL/6J volume data from an additional 663 males vs. 166 females in the AMBCA. This independent cohort showed similar sexual dimorphism in the volumes of MEA, BST and ORBvl2/3, as depicted in the following figure (panels A-D, also shown as new Figure 4—figure supplement 1).

      We fully acknowledge the interesting issue raised regarding the sample sizes required to detect our reported effect sizes. We therefore also present the average p-value for sexual dimorphism in the volumes of MEA, BST and ORBvl2/3 as a function of sample size (panel E in Figure 4—figure supplement 1 of the revised manuscript). The Reviewer will note that the regions with the largest effect sizes (MEA, BST) can be detected with more ordinary sample sizes, and indeed, MEA and BST dimorphism is evident in the literature. ORB dimorphism required a much larger sample size, and our analysis (Figure 4) systematically detected many more dimorphic regions, in volume, density and count.

      Reviewer #2 (Public Review):

      This report describes a large-scale analysis of cell counts in mouse brains. The authors found that the Allen Mouse Connectivity project has a rich dataset for cell counting that is yet to be analyzed, and they developed methods to quantify cells in different nuclei. They go on to compare males vs females and two different strains. From this analysis, they found specific differences between male versus female brains, left versus right hemispheres, and C57BL/6 versus FVB.CD1 mice, especially with regard to cell counts and density.

      Overall, the methodology is sound and the quality of the data seems high. In fact, this study uses >100 brains for the statistics, and this is one of the major strengths of this study. For researchers who are interested in interrogating the differences at the macroscopic level in brain structures, this study will be a great resource. For example, the manuscript contains an interesting finding that for most brain areas, females have larger volumes but fewer cells.

      We thank the Reviewer for these comments. We would like to mention that the revised version of the manuscript no longer includes a statement regarding BL6 female volume. We found a batch effect in the AMBCA experiments, mostly affecting volume in the first batch (Figure 2—figure supplement 1B). That batch consisted mostly of males and, for unknown reasons, had lower volumes than all later experiments, which caused the apparent volume differences. We emphasize that (1) the total number of cells did not show any batch effect (Figure 2—figure supplement 1C); and (2) we normalized the volume and repeated the analysis. Apart from the finding that females did not in fact have larger volumes, the other main findings remained unchanged.

      Reviewer #3 (Public Review):

      Elkind et al. have devised a strategy to detect cells in whole brain samples of the large, publicly accessible Allen Mouse Brain Connectivity database. They put together an analysis pipeline to quantify cell numbers and density as well as volumes for all annotated brain areas in these samples. This allowed them to make several important discoveries, such as (1) strain-, sex- and hemisphere-specific differences in cell densities, (2) a large interindividual variability in cell numbers, and (3) an absence of linear scaling of cell count with volume, among others. The key strength of this work lies in its comprehensive analysis, the large sample size that the authors have drawn from (making their conclusions particularly robust), and the fact that they have made their analysis tools accessible. A weakness of the current manuscript is the dense layout and overplotting of several of the figures, and the lack of the information needed to understand them easily. Another, conceptual weakness of using the autofluorescence channel for cell detection is that the identity (neuronal vs non-neuronal) of the underlying cells remains unresolved. Overall, however, I believe that this study has the potential to serve as a valuable reference point, and I would expect this work to have a lasting impact on quantitative studies of mouse brain cytoarchitecture.

      We thank the Reviewer for these valuable comments. We have tried to minimize overplotting and to add all necessary information. For example, the revised manuscript presents more pared-down figures, with data labels omitted where they crowded the graphic. Instead, we provide the full data in Supplemental tables and in our online GUI. We hope readers will feel encouraged to zoom into the presented data and to explore the additional tables and our online tool in more depth.

      Regarding the question of cell types, we were unfortunately not able to provide a definitive answer, but our validation experiments provided some potential clues. For example, nuclear staining (Hoechst) uniformly detected 65% more cells than AMBCA autofluorescence quantification. Moreover, the correspondence between nuclear staining and AMBCA autofluorescence was notably better in neuron-rich regions than in glia-rich regions (e.g., CTX L1, midbrain, medulla). These discrepancies between the techniques may therefore point to an underlying difference in cell-type composition, such that counting low-autofluorescence nuclei is biased toward neurons.

      However, the methods also differ in their physical basis: one detects the presence of a fluorescent signal (e.g., the nuclear stain is detected beyond its focal plane), whereas the other detects the absence of a signal (which, in turn, depends on the presence of surrounding intrinsically fluorescent molecules). It is technically non-trivial to assess the extent to which these factors apply. We have added a clarification along these lines in the Discussion (below). We would further like to emphasize that our study is a comparative, systematic analysis within this interesting cohort, rather than a source of definitive cell counts, which we found to vary greatly across the population.

      “We further attempted to estimate the region-specific accuracy of our cell counting by comparing autofluorescence STPT with brain-wide imaging of nuclear-stained STPT. However, this comparison is technically nontrivial because of the native physical properties of direct staining vs. autofluorescence. For example, stained nuclei located off the focal plane may appear in the image, yet remain undetected by autofluorescence. In addition, tissue composition (e.g., cell types, extracellular matrix) may affect the imaged region. Indeed, in regions rich in non-neuronal cells, the error of autofluorescence-based counting relative to nuclear staining was larger. Hence, one may speculate that autofluorescence-based detection is biased toward neurons.”

    1. Author Response:

      Reviewer #1 (Public Review):

      Chakrabarti et al. study inner hair cell synapses using electron tomography of tissue rapidly frozen after optogenetic stimulation. Surprisingly, they find a nearly complete absence of docked vesicles at rest and after stimulation, but upon stimulation vesicles rapidly associate with the ribbon. Interestingly, no changes in vesicle size were found along or near the ribbon. Such changes would have indicated a process of compound fusion prior to plasma membrane fusion, as proposed for retinal bipolar cell ribbons. This lack of compound fusion is used to argue against MVR at the IHC synapse. However, that is only one form of MVR. Another form, coordinated and rapid fusion of multiple docked vesicles at the bottom of the ribbon, is not ruled out. Therefore, I agree that the data set provides good evidence for rapid replenishment of the ribbon-associated vesicles, but I do not find the evidence against MVR convincing. The work provides fundamental insight into the mechanisms of sensory synapses.

      We thank the reviewer for the appreciation of our work and the constructive comments. As detailed below, we have now included this discussion (from line 679 onwards).

      We wrote:

      “This might reflect spontaneous univesicular release (UVR) via a dynamic fusion pore (i.e. ‘kiss and run’; Ceccarelli et al., 1979), which was suggested previously for IHC ribbon synapses (Chapochnikov et al., 2014; Grabner and Moser, 2018; Huang and Moser, 2018; Takago et al., 2019), and/or rapid undocking of vesicles (e.g. Dinkelacker et al., 2000; He et al., 2017; Nagy et al., 2004; Smith et al., 1998). In the UVR framework, stimulation by ensuing Ca2+ influx triggers the statistically independent release of several SVs. Coordinated multivesicular release (MVR) has been indicated to occur at hair cell synapses (Glowatzki and Fuchs, 2002; Goutman and Glowatzki, 2007; Li et al., 2009) and retinal ribbon synapses (Hays et al., 2020; Mehta et al., 2013; Singer et al., 2004) during both spontaneous and evoked release. We could not observe structures which might hint towards compound or cumulative fusion, neither at the ribbon nor at the AZ membrane under our experimental conditions. Upon short and long stimulation, RA-SVs as well as docked SVs even showed a slightly reduced size compared to controls. However, since some AZs harbored more than one docked SV per AZ in stimulated conditions, we cannot fully exclude the possibility of coordinated release of few SVs upon depolarization.”

      Reviewer #2 (Public Review):

      Chakrabarti et al. aimed to investigate exocytosis from ribbon synapses of cochlear inner hair cells using high-resolution electron tomography. Current methods to capture the ultrastructure of the dynamics of synaptic vesicle release in IHCs rely on the application of potassium for stimulation, which constrains temporal resolution to minutes rather than the millisecond resolution required to analyse synaptic transmission. Here the authors implemented a high-pressure freezing method relying on optogenetics for stimulation (Opto-HPF), granting them both high spatial and high temporal resolution. They provide an extremely well-detailed and rigorously controlled description of the method, in line with previous "Opto-HPF" studies. They successfully applied Opto-HPF to IHCs and made several findings at this highly specialised ribbon synapse. They observed a stimulation-dependent accumulation of docked synaptic vesicles at IHC active zones, and a stimulation-dependent reduction in the distance of non-docked vesicles to the active zone membrane, while the total number of ribbon-associated vesicles remained unchanged. Finally, they did not observe increases in the diameter of synaptic vesicles proximal to the active zone, or other potential correlates of compound fusion, a potential mode of multivesicular release. The conclusions of the paper are mostly well supported by data, but some aspects of their findings and pitfalls of the methods should be better discussed.

      We thank the reviewer for the appreciation of our work and the constructive comments.

      Strengths:

      While a few different groups have now used "Opto-HPF" methods (also referred to as "Flash and Freeze") in different ways and at different synapses, the current study implemented the method with rigorous controls in a novel way, specifically adapted to cochlear IHCs, a different sample preparation from the neuronal cultures, brain slices or C. elegans used so far. The analysis of IHC exocytosis dynamics by electron microscopy has so far been limited to stimulation by potassium application, which is not physiological. While much has been learned from these methods, they lacked time resolution. With Opto-HPF, the authors were able to investigate synaptic transmission with millisecond precision, combined with electron tomography analysis of active zones. I have no overall questions regarding the methodology, as it was very thoroughly described. The authors also employed electrophysiology with optogenetics to characterise the optical stimulation parameters and provided a well-described analysis of the results with different pulse durations and irradiances, which is crucial for Opto-HPF.

      Thank you very much.

      Further, the authors did a superb job of providing several tables with data and information across all mouse lines used, experimental conditions, and statistical tests, including source code for the diverse analyses performed. The figures are overall clear and the manuscript is well written. Such a clear representation of data makes the manuscript easier to review.

      Thank you very much.

      Weaknesses:

      There are two main points that I think need to be better discussed by the authors.

      The first refers to the pitfalls of using optogenetics to analyse synaptic transmission. While ChR2 provides better time resolution than potassium application, one cannot discard the possibility that calcium influx through ChR2 alters neurotransmitter release. This important limitation of the technique should be properly acknowledged by the authors and the consequences discussed, specifically in the context in which they applied it: a single sustained pulse of light of ~20 ms (ShortStim) and of ~50 ms (LongStim). While longer, sustained stimulation is characteristic of IHCs, these are quite long pulses by optogenetic standards, with potential consequences for intrinsic or synaptic properties.

      We thank the reviewer for pointing this out. We would like to mention that upon 15 min of high-potassium depolarization, the number of docked SVs increased only slightly (Chakrabarti et al., 2018, EMBO Rep; Kroll et al., 2020, JCS), and the increase was not statistically significant. In the current study, we report a similar phenomenon, but here light-induced depolarization resulted in a more robust increase in the number of docked SVs.

      To compare the data from the previous studies with the current study, we have now included an additional Table 3 (line 676) in the discussion, listing the total counts (and averages per AZ) of docked SVs.

      Furthermore, in response to the reviewer’s concern, we now discuss the Ca2+ permeability of ChR2, in addition to the above comparison with our previous studies, which demonstrated very few docked SVs in IHCs in the absence of K+ channel blockers and of ChR2 expression. We are not entirely certain whether the reviewer refers to potential dark currents of ChR2 (e.g. as an explanation for a depletion of docked vesicles under non-stimulated conditions) or to photocurrents, i.e. the influx of Ca2+ through ChR2 itself and its contribution to the Ca2+ concentration at the active zone.

      Regardless of this, however, we consider it unlikely that Ca2+ influx via ChR2 contributes to evoking SV fusion at the hair cell active zone.

      First of all, we note that the Ca2+ affinity of IHC exocytosis is very low. As first shown by Beutner et al., 2001 and confirmed thereafter (e.g. Pangrsic et al., 2010), there is little if any IHC exocytosis at release-site Ca2+ concentrations below 10 µM. Two studies using CatCh, a ChR2 mutant with higher Ca2+ permeability than wildtype ChR2 (Kleinlogel et al., 2011; Mager et al., 2017), estimated a maximal intracellular Ca2+ increase below 10 µM, even at very negative potentials that promote Ca2+ influx along the electrochemical gradient, or at a high extracellular Ca2+ concentration of 90 mM. In our experiments, IHCs were instead depolarized to potentials for which extrapolation of the data of Mager et al., 2017 indicates a submicromolar Ca2+ concentration. In addition, we and others have demonstrated powerful Ca2+ buffering and extrusion in hair cells (e.g. Tucker and Fettiplace, 1995; Issa and Hudspeth, 1996; Frank et al., 2009; Pangrsic et al., 2015). As a result, hair cells efficiently clear even massive synaptic Ca2+ influx and establish a low bulk cytosolic Ca2+ concentration (Beutner and Moser, 2001; Frank et al., 2009). We reason that these clearance mechanisms efficiently counter any Ca2+ influx through ChR2. This will likely limit potential effects of ChR2-mediated Ca2+ influx on Ca2+-dependent replenishment of synaptic vesicles during ongoing stimulation.

      We have now added the following in the discussion (starting in line 620):

      “We note that ChR2, in addition to monovalent cations, is also permeable to Ca2+ ions, raising the question of whether optogenetic stimulation of IHCs could trigger release through direct Ca2+ influx via ChR2. We do not consider such Ca2+ influx to trigger exocytosis of synaptic vesicles in IHCs. Optogenetic stimulation of HEK293 cells overexpressing ChR2 (wildtype version) raises the intracellular Ca2+ concentration only up to 90 nM, even with an extracellular Ca2+ concentration of 90 mM (Kleinlogel et al., 2011). IHC exocytosis shows a low Ca2+ affinity (~70 µM, Beutner et al., 2001), and there is little if any IHC exocytosis for Ca2+ concentrations below 10 µM, a level far beyond what could be achieved even by the highly Ca2+-permeable ChR2 mutant (CatCh: Ca2+ translocating channelrhodopsin, Mager et al., 2017). In addition, we reason that the powerful Ca2+ buffering and extrusion by hair cells (e.g., Frank et al., 2009; Issa and Hudspeth, 1996; Pangršič et al., 2015; Tucker and Fettiplace, 1995) will efficiently counter Ca2+ influx through ChR2 and thereby limit potential effects on Ca2+-dependent replenishment of synaptic vesicles during ongoing stimulation.”

      The second refers to the finding that the authors did not observe evidence of compound fusion (or homotypic fusion) in their data. This is an interesting finding in the context of multivesicular release in general, as well as specifically for IHCs. While the authors discussed the potential for "kiss-and-run" and/or "kiss-and-stay", it would be valuable if they could discuss their findings further in the context of the field of multivesicular release, for example the evidence supporting multiple independent release events. Further, with such function-structure optical quick-freezing methods it is not unusual not to capture fusion events (so-called omega shapes or vesicles with fusion pores); this is largely because these are very fast events (less than 10 ms) that are not easily captured with optical stimulation.

      We agree with the reviewer that the discussion of MVR and UVR should be extended. We have now added the following paragraph to the discussion (from line 679 on):

      “This might reflect spontaneous univesicular release (UVR) via a dynamic fusion pore (i.e. ‘kiss and run’; Ceccarelli et al., 1979), which was suggested previously for IHC ribbon synapses (Chapochnikov et al., 2014; Grabner and Moser, 2018; Huang and Moser, 2018; Takago et al., 2019), and/or rapid undocking of vesicles (e.g. Dinkelacker et al., 2000; He et al., 2017; Nagy et al., 2004; Smith et al., 1998). In the UVR framework, the Ca2+ influx ensuing upon stimulation triggers the statistically independent release of several SVs. Coordinated multivesicular release (MVR) has been indicated to occur at hair cell synapses (Glowatzki and Fuchs, 2002; Goutman and Glowatzki, 2007; Li et al., 2009) and retinal ribbon synapses (Hays et al., 2020; Mehta et al., 2013; Singer et al., 2004) during both spontaneous and evoked release. We did not observe structures that might hint towards compound or cumulative fusion, either at the ribbon or at the AZ membrane, under our experimental conditions. Upon short and long stimulation, RA-SVs as well as docked SVs even showed a slightly reduced size compared to controls. However, since some AZs harbored more than one docked SV per AZ in stimulated conditions, we cannot fully exclude the possibility of coordinated release of a few SVs upon depolarization.”

      Reviewer #3 (Public Review):

      Precise methods were developed to validate the expression of channelrhodopsin in inner hair cells of the Organ of Corti, to quantify the relationship between blue light irradiance and auditory nerve fiber depolarization, to control light stimulation within the chamber of a high-pressure freezing device, and to measure with good precision the delay between stimulation and freezing of the specimen. These methods represent a clear advance over previous experimental designs used to study this synaptic system and are an initial application of rapid high-pressure freezing with freeze substitution, followed by high-resolution electron tomography (ET), to sensory cells that operate via graded potentials.

      Short-duration stimuli were used to assess the redistribution of vesicles among pools at hair cell ribbon synapses. The number of vesicles linked to the synaptic ribbon did not change, but vesicles redistributed within the membrane-proximal pool to docked locations. No evidence was found for vesicle-to-vesicle fusion prior to vesicle fusion to the membrane, which is an important, ongoing question for this synapse type. The data for quantifying numbers of vesicles in membrane-tethered, non-tethered, and docked vesicle pools are compelling and important.

      We thank the reviewer for the appreciation of our work and the constructive comments.

      These quantifications would benefit from additional presentation of raw images so that the reader can better assess their generality and variability across synaptic sites.

      The images shown for each of the two control and two experimental (stimulated) preparation classes should be more representative. Variation in synaptic cleft dimensions and in the numbers of ribbon-associated and membrane-proximal vesicles does not track the averaged data. Since the preparation has novel stimulus features, additional images (as the authors employed in previous publications) exhibiting tethered vesicles, non-tethered vesicles, docked vesicles, several sections through individual ribbons, and the segmentation of these structures, will provide greater confidence that the data reflect the images.

      Thank you very much for pointing this out. We have now included more details in supplemental figures and in the text.

      Specifically, we added:

      • More details about the morphological sub-pools (analysis and images):

        - We now show a sequence of images with different tethering states of membrane-proximal SVs, together with examples of docked and non-tethered SVs, for each condition, as we did in Chakrabarti et al., 2018 (Fig. 6-figure supplement 2, line 438). Moreover, to provide additional information for each condition, we selected one further tomogram per condition and depict two additional virtual sections (Fig. 6-figure supplement 2).

        - Moreover, we present a more detailed quantification of the different morphological sub-pools. For the MP-SV pool, we analyzed the SV diameters and the distances to the AZ membrane and PD of the different SV sub-pools separately; this information is now included in Fig. 7. For the RA-SVs, we additionally analyzed the morphological sub-pools and the SV diameters in the distal and proximal parts of the ribbon, as done in Chakrabarti et al., 2018. We have added a new supplementary figure (Fig. 7-figure supplement 2, line 558) and Supplementary file 2.

      • We replaced the virtual section in panel 6D. In the old version, the ribbon appeared to contact the membrane, and we realized that this virtual section was not representative: the ribbon was in fact not directly contacting the AZ membrane, and a presynaptic density was still visible adjacent to the docked SVs. To avoid potential confusion, we selected a different virtual section of the same tomogram and now also indicate the presynaptic density as a graphical aid in Fig. 6.

      The introduction raises questions about the length of membrane tethers in relation to vesicle movement toward the active zone, but this topic was not addressed in the manuscript.

      We apologize for not stating this sufficiently clearly and have rephrased the sentence, which now reads:

      “…and seem to be organized in sub-pools based on the number of tethers and to which structure these tethers are connected.”

      Quantification of this metric, and of the number of tethers especially for vesicles near the membrane, seems straightforward. The topic of EPSC amplitude as representing unitary events due to variation in vesicle volume, size of the fusion pore, or vesicle-vesicle fusion was partially addressed. Membrane fusion events were not evident in the few images shown, but these presumably occurred and could be quantified. Likewise, sites of membrane retrieval could also be marked. These analyses would broaden the scope of the presentation and also contribute to a more complete story.

      Regarding the presence or absence of membrane fusion events, we agree with the reviewer that this should be clearly addressed in the MS. We would like to point out that we

      (i) did not observe any omega shapes at the AZ membrane, which we also mention in the MS. Nor did we see them in datasets from previous publications (Vogl et al., 2015, JCS; Jung et al., 2015, PNAS).

      (ii) To be clear about our observations on potential SV-SV fusion events, we now point out in the discussion (from line 688):

      “We did not observe structures that might hint towards compound or cumulative fusion, either at the ribbon or at the AZ membrane, under our experimental conditions. Upon short and long stimulation, RA-SVs as well as docked SVs even showed a slightly reduced size compared to controls. However, since some AZs harbored more than one docked SV per AZ in stimulated conditions, we cannot fully exclude the possibility of coordinated release of a few SVs upon depolarization.”

      Furthermore, we agree with the reviewer that a complete presentation of the structural correlates of endo- and exocytosis is very important. However, we focused our study on exocytosis events and therefore mainly analyzed membrane-proximal SVs at active zones.

      Nonetheless, in response to the reviewer’s comment, we have now included a quantification of clathrin-coated (CC) structures. We determined the occurrence of CC vesicles (CCVs) and CC invaginations within 0-500 nm of the PD, and measured the diameter of the CCVs and their distance to the membrane and to the PD. We found only very few CC structures in our tomograms (now added as a table to the Results section; Supplementary file 1). Sites of endocytic membrane retrieval are likely located in the peri-active zone area or even beyond. We did not observe obvious bulk endocytosis events connected to the AZ membrane. However, we do observe large endosome-like vesicles, which we did not quantify in this study. More details were presented in two of our previous studies (Kroll et al., 2019 and 2020), albeit under different stimulation conditions.

      Overall, the methodology forms the basis for future studies by this group and others to investigate rapid changes in synaptic vesicle distribution at this synapse.

      Reviewer #4 (Public Review):

      This manuscript investigates the process of neurotransmitter release from hair cell synapses using electron microscopy of tissue rapidly frozen after optogenetic stimulation. The primary finding is that in the absence of a stimulus very few vesicles appear docked at the membrane, but upon stimulation vesicles rapidly associate with the membrane. In contrast, the number of vesicles associated with the ribbon and within 50 nm of the membrane remains unchanged. Additionally, the authors find no changes in vesicle size that might be predicted if vesicles fuse to one-another prior to fusing with the membrane. The paper claims that these findings argue for rapid replenishment and against a mechanism of multi-vesicular release, but neither argument is that convincing. Nonetheless, the work is of high quality, the results are intriguing, and will be of interest to the field.

      We thank the reviewer for the appreciation of our work and the constructive comments.

      1) The abstract states that their results "argue against synchronized multiquantal release". While I might agree that the lack of larger structures is suggestive that homotypic fusion may not be common, this is far from an argument against any mechanisms of multi-quantal release. At least one definition of synchronized multiquantal release posits that multiple vesicles are fusing at the same time through some coordinated mechanism. Given that they do not report evidence of fusion itself, I fail to see how these results inform us one way or the other.

      We agree with the reviewer that the discussion of MVR and UVR should be extended. It is important to point out that we do not claim that evoked release is mediated by a single SV. As discussed in the paper (line 672), we consider that our optogenetic stimulation of IHCs triggers the release of more than 10 SVs per AZ. This is in line with previous reports of several SVs fusing upon stimulation. This type of evoked MVR is probably mediated by the opening of Ca2+ channels in close proximity to the Ca2+ sensor of each SV. We indeed sometimes observed more than one docked SV per AZ upon long optogenetic stimulation, which could reflect this possibility. However, given the absence of large structures directly at the ribbon or the AZ membrane that would suggest compound fusion of several SVs prior to or during fusion with the membrane, we argue against compound MVR at IHCs. As mentioned above, we have extended the discussion accordingly (from line 679 onwards).

      We wrote:

      “This might reflect spontaneous univesicular release (UVR) via a dynamic fusion pore (i.e. ‘kiss and run’; Ceccarelli et al., 1979), which was suggested previously for IHC ribbon synapses (Chapochnikov et al., 2014; Grabner and Moser, 2018; Huang and Moser, 2018; Takago et al., 2019), and/or rapid undocking of vesicles (e.g. Dinkelacker et al., 2000; He et al., 2017; Nagy et al., 2004; Smith et al., 1998). In the UVR framework, the Ca2+ influx ensuing upon stimulation triggers the statistically independent release of several SVs. Coordinated multivesicular release (MVR) has been indicated to occur at hair cell synapses (Glowatzki and Fuchs, 2002; Goutman and Glowatzki, 2007; Li et al., 2009) and retinal ribbon synapses (Hays et al., 2020; Mehta et al., 2013; Singer et al., 2004) during both spontaneous and evoked release. We did not observe structures that might hint towards compound or cumulative fusion, either at the ribbon or at the AZ membrane, under our experimental conditions. Upon short and long stimulation, RA-SVs as well as docked SVs even showed a slightly reduced size compared to controls. However, since some AZs harbored more than one docked SV per AZ in stimulated conditions, we cannot fully exclude the possibility of coordinated release of a few SVs upon depolarization.”

      2) The complete lack of docked vesicles in the absence of a stimulus followed by their appearance with a stimulus is a fascinating result. However, since there are no docked vesicles prior to a stimulus, it is really unclear what these docked vesicles represent - clearly not the RRP. Are these vesicles that are fusing or recently fused or are they ones preparing to fuse? It is fine that it is unknown, but it complicates their interpretation that the vesicles are "rapidly replenished". How does one replenish a pool of docked vesicles that didn't exist prior to the stimulus?

      In response to the reviewer’s comment, we would like to note that we indeed reported very few docked SVs in wild-type IHCs under resting conditions without K+ channel blockers in Chakrabarti et al., EMBO Rep 2018 and in Kroll et al., 2020, JCS. In both studies, a solution without TEA and Cs was used for the experiments (resting solution Chakrabarti: 5 mM KCl, 136.5 mM NaCl, 1 mM MgCl2, 1.3 mM CaCl2, 10 mM HEPES, pH 7.2, 290 mOsmol; control solution Kroll: 5.36 mM KCl, 139.7 mM NaCl, 2 mM CaCl2, 1 mM MgCl2, 0.5 mM MgSO4, 10 mM HEPES, 3.4 mM L-glutamine, and 6.9 mM D-glucose, pH 7.4). Similarly, our current study shows very few docked SVs in the resting condition, even in the presence of TEA and Cs. Based on the results presented in ‘Response to reviewers Figure 1’, we assume that the scarcity of docked SVs under control conditions is not due to depolarization induced by a solution containing 20 mM TEA and 1 mM Cs, but is rather representative of the physiological resting state of IHC ribbon synapses. Upon 15 min of high-potassium depolarization, the number of docked SVs increased only slightly (Chakrabarti et al., 2018; Kroll et al., 2020), and the increase was not statistically significant. In the current study, we report a similar phenomenon, but here depolarization resulted in a more robust increase in the number of docked SVs.

      To compare the data from the previous studies with the current study, we have now included an additional Table 3 (line 676) in the discussion, listing the total counts (and averages per AZ) of docked SVs.

    1. Author Response

      eLife assessment:

      This study addresses whether the composition of the microbiota influences the intestinal colonization of encapsulated vs unencapsulated Bacteroides thetaiotaomicron, a resident micro-organism of the colon. This is an important question because factors determining the colonization of gut bacteria remain a critical barrier in translating microbiome research into new bacterial cell-based therapies. To answer the question, the authors develop an innovative method to quantify B. theta population bottlenecks during intestinal colonization in the setting of different microbiota. Their main finding that the colonization defect of an acapsular mutant is dependent on the composition of the microbiota is valuable, and this observation suggests that interactions between gut bacteria explain why the mutant has a colonization defect. The evidence supporting this claim is currently insufficient. Additionally, some of the analyses and claims are compromised because the authors do not fully explain their data and the number of animals is sometimes very small.

      Thank you for this frank evaluation. Based on the reviewers’ comments, the points raised have been addressed by improving the writing (apologies for insufficient clarity) and by adding data that to a large extent already existed or could be rapidly generated. In particular, the following data have been added:

      1. Increase to n>=7 for all fecal time-course experiments

      2. Microbiota composition analysis for all mouse lines used

      3. Data elucidating the SPF microbiota and host immune mechanisms that restrict acapsular B. theta

      4. Short- versus long-term recolonization of germ-free mice with a complete SPF microbiota and assessment of the effect on B. theta colonization probability.

      5. Challenge of B. theta monocolonized mice with avirulent Salmonella to disentangle effects of the host inflammatory response from other potential explanations of the observations.

      6. Details of all inocula used

      7. Resequencing of all barcoded strains

      Additionally, we have improved the clarity of the text, particularly the description of the mathematical modeling in the methods section of the main text. Major changes in the text, particularly those responding to the reviewers’ comments, have been highlighted here and in the manuscript.

      Reviewer #1 (Public Review):

      The study addresses an important question - how the composition of the microbiota influences the intestinal colonization of encapsulated vs unencapsulated B. theta, an important commensal organism. To answer the question, the authors develop a refurbished WITS with extended mathematical modeling to quantify B. theta population bottlenecks during intestinal colonization in the setting of different microbiota. Interestingly, they show that the colonization defect of an acapsular mutant is dependent on the composition of the microbiota, suggesting (but not proving) that interactions between gut bacteria, rather than with host immune mechanisms, explains why the mutant has a colonization defect. However, it is fairly difficult to evaluate some of the claims because experimental details are not easy to find and the number of animals is very small. Furthermore, some of the analyses and claims are compromised because the authors do not fully explain their data; for example, leaving out the zero values in Fig. 3 and not integrating the effect of bottlenecks into the resulting model, undermines the claim that the acapsular mutant has a longer in vivo lag phase.

      We thank the reviewer for taking the time to give this detailed critique of our work, and apologize that the experimental details were insufficiently explained. This criticism is well taken. Exact inoculum details for each experiment are now presented in each figure (or as a supplement when multiple inocula are included). Exact microbiome composition analysis for the OligoMM12, LCM and SPF microbiota is now included in Figure 2 – figure supplement 1.

      Of course, the models could be expanded to include more factors, but we think this comment rather stems from the data being insufficiently clearly explained by us. There are no “zero values missing” from Fig. 3: this is visible in the submitted raw data table (Excel file Source Data 1), but the points fully overlap in the graph shown and are therefore not easily discernible from one another. Time points where no CFU were recovered were plotted at the detection limit (50 CFU/g) and are included in the curve fitting. However, on re-examination we noticed that the curve fit had been carried out on the raw data rather than the log-transformed data, which resulted in over-weighting of the higher values. Re-fitting the data does not change the conclusions but provides a better fit. These experiments have now been repeated such that we now have >=7 animals in each group. The new data are presented in Fig. 3C and D and Figure 3 – figure supplement 2.
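      To make the fitting point concrete, the sketch below (Python with NumPy) shows one way to fit a time course on a log10 scale with non-detects placed at the 50 CFU/g detection limit. It is a hypothetical illustration only: the function names and example values are invented, and a simple log-linear (exponential growth) fit stands in for the growth/lag model actually used in Fig. 3, which is not reproduced here. Fitting in log space gives each time point comparable weight, whereas least squares on raw CFU is dominated by the largest values.

```python
import numpy as np

DETECTION_LIMIT = 50.0  # CFU/g; non-detects are plotted and fitted at this value


def to_log10_cfu(cfu):
    """Replace values below the detection limit (e.g. 0 CFU) by the limit, then log10."""
    cfu = np.asarray(cfu, dtype=float)
    return np.log10(np.maximum(cfu, DETECTION_LIMIT))


def fit_log_linear(t_hours, cfu):
    """Least-squares line through log10(CFU) versus time.

    Returns (slope, intercept); slope is in log10 CFU per hour.
    Hypothetical exponential-growth stand-in, not the lag-phase model of Fig. 3.
    """
    t = np.asarray(t_hours, dtype=float)
    slope, intercept = np.polyfit(t, to_log10_cfu(cfu), 1)
    return slope, intercept


# Hypothetical time course (hours, CFU/g); the first point is a non-detect.
t = [0, 6, 12, 24, 48]
cfu = [0, 2e3, 5e4, 3e6, 8e8]
slope, intercept = fit_log_linear(t, cfu)
print(f"slope = {slope:.3f} log10 CFU/h, intercept = {intercept:.2f}")
```

      Because the non-detect is kept at the detection limit rather than dropped, early time points with no recovered CFU still constrain the fit.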

      Limitations:

      1) The experiments do not allow clear separation of effects derived from the microbiota composition and those that occur secondary to host development without a microbiota or with a different microbiota. Furthermore, the measured bottlenecks are very similar in LCM and Oligo mice, even though these microbiotas differ in complexity. Oligo-MM12 was originally developed and described to confer resistance to Salmonella colonization, suggesting that it should tighten the bottleneck. Overall, an add-back experiment demonstrating that conventionalizing germ-free mice imparts a similar bottleneck to SPF would strengthen the conclusions.

      These are excellent suggestions and have been followed. Additional data are now presented in Figure 2 – figure supplement 8, showing short- versus long-term recolonization of germ-free mice with an SPF microbiota and recovering values of beta very similar to those of our standard SPF mouse colony. These data demonstrate a larger total niche size for B. theta at 2 days post-colonization, which normalizes by 2 weeks post-colonization. Independently of this, the colonization probability is already equivalent to that observed in our SPF colony at day 2 post-colonization. Therefore, the mechanisms causing early clonal loss are established very rapidly upon colonization of a germ-free mouse with an SPF microbiota. We have additionally demonstrated that SPF mice do not have detectable intestinal antibody titers specific for acapsular B. theta (Figure 2 – figure supplement 7), such that this is unlikely to be part of the reason why acapsular B. theta struggles to colonize at all in the context of an SPF microbiota. Experiments were also carried out to detect bacteriophage capable of inducing lysis of B. theta and acapsular B. theta in SPF mouse cecal content (Figure 2 – figure supplement 7). No lytic phage plaques were observed. However, plaque assays are not sensitive for the detection of weakly lytic phage, or of phage that may require expression of surface structures that are not induced in vitro. We can therefore conclude that the restrictive activity of the SPF microbiota a) is reconstituted very rapidly in germ-free mice, b) is very likely not related to the activity of intestinal IgA, and c) cannot be attributed to a high abundance of strongly lytic bacteriophage. The simplest explanation is that a large fraction of the restriction is due to metabolic competition with a complex microbiota, but we cannot formally exclude other factors such as antimicrobial peptides or changes in intestinal physiology.

      2) It is often difficult to evaluate results because important parameters are not always given. Dose is a critical variable in bottleneck experiments, but it is not clear if total dose changes in Figure 2 or just the WITS dose? Total dose as well as n0 should be depicted in all figures.

      We apologize for the lack of clarity in the figures. We have added panels depicting the exact inoculum to each figure (or a supplementary figure where many inocula were used). Additionally, the methods section describing how barcoded CFU were calculated has been rewritten and is hopefully now clearer.

      3) This is in part a methods paper but the method is not described clearly in the results, with important bits only found in a very difficult supplement. Is there a difference between colonization probability (beta) and inoculum size at which tags start to disappear? Can there be some culture-based validation of "colonization probability" as explained in the mathematics? Can the authors contrast the advantages/disadvantages of this system with other methods (e.g. sequencing-based approaches)? It seems like the numerator in the colonization probability equation has a very limited range (from 0.18-1.8), potentially limiting the sensitivity of this approach.

      We apologize for the lack of clarity in the methods. This criticism is well taken, and we have rewritten large sections of the methods in the main text to include all relevant details currently buried in the extensive supplement.

      On the question of the colonization probability and the inoculum size: we kept the inoculum size at 10⁷ CFU/mouse in all experiments (except those in Fig. 4, where this is explicitly stated), only changing the fraction of spiked-in barcoded strains. We verified the accuracy of our barcode recovery rate by serial dilution over 5 logs (new figure added: Figure 1 – figure supplement 1). “The CFU of barcoded strains in the inoculum at which tags start to disappear” is by definition closely related to the colonization probability, as this value (n0) appears in the calculation. Note that this is not the total inoculum size, which (unless otherwise stated in Fig. 4) is kept constant at 10⁷ CFU by diluting the barcoded B. theta with untagged B. theta. Again, this is now better explained in all figure legends and the main text.

      We have added an experiment using peak-to-trough ratios (PTR) in metagenomic sequencing to estimate the B. theta growth rate. This could be usefully employed for wildtype B. theta at a relatively early timepoint post-colonization, when growth was rapid. However, this metagenomics-based technique requires the examined strain to be present at an abundance of over 0.1-1% for accurate quantification, such that we could not analyze the acapsular B. theta strain in cecum content at the same timepoint. These data have been added (Figure 3 – figure supplement 3). Note that the information gleaned from these techniques differs: PTR reveals relative growth rates at a specific time (if the strain is abundant enough), whereas neutral tagging reveals average population values over quite large time windows. We believe that both approaches are valuable, and a few sentences comparing them have been added to the discussion.

      The actual numerator is the fraction of lost tags, which is obtained from the total number of tags lost across the experiment (summed over all mice) divided by the total number of tags used (number of mice times the number of tags used per mouse). Very low tag recovery (less than one per mouse) starts to stray into very noisy data, while close to zero loss is also associated with a low information-to-noise ratio. Therefore, the range of this numerator is necessarily constrained by our setting up the experiments to achieve close to optimal information recovery from the WITS abundances. Robustness of these analyses is provided by the high “n” of between 10 and 17 mice per group.
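      As an illustration of how a pooled fraction of lost tags can be turned into a colonization probability, here is a minimal Python sketch. It assumes, as an illustrative model rather than the exact equation used in the manuscript, that each of the n0 inoculated cells carrying a given tag independently founds a lineage with probability beta, so that a tag is lost with probability (1 - beta)^n0 ≈ exp(-beta·n0), giving beta ≈ -ln(f_lost)/n0. The function names and example counts are hypothetical. Under this assumption, values of -ln(f_lost) between 0.18 and 1.8 correspond to losing roughly 84% down to 17% of tags, i.e. an informative middle range between near-complete loss and near-zero loss.

```python
import math


def fraction_tags_lost(tags_lost_per_mouse, tags_per_mouse):
    """Pooled fraction of lost tags: total tags lost over total tags used.

    tags_lost_per_mouse: number of tags lost in each mouse (hypothetical input)
    tags_per_mouse: number of distinct tags in each inoculum
    """
    total_lost = sum(tags_lost_per_mouse)
    total_used = len(tags_lost_per_mouse) * tags_per_mouse
    return total_lost / total_used


def colonization_probability(f_lost, n0):
    """Per-cell colonization probability beta under an assumed Poisson model.

    Assumes P(tag lost) = exp(-beta * n0), with n0 the inoculated CFU per tag.
    """
    if not 0.0 < f_lost < 1.0:
        # All or no tags lost: the estimate is undefined (uninformative data).
        raise ValueError("f_lost must lie strictly between 0 and 1")
    return -math.log(f_lost) / n0


# Hypothetical example: 10 mice, 7 tags each, about 10 CFU per tag.
lost = [3, 4, 2, 5, 3, 4, 3, 2, 4, 3]
f = fraction_tags_lost(lost, tags_per_mouse=7)
beta = colonization_probability(f, n0=10)
print(f"fraction lost = {f:.3f}, beta = {beta:.3f}")
```

      Pooling losses across mice before taking the logarithm avoids undefined per-mouse estimates when an individual mouse loses all or none of its tags.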

      4) Figure 3 and the associated model is confusing and does not support the idea that a longer lag-phase contributes to the fitness defect of acapsular B. theta in competitive colonization. Figure 3B clearly indicates that in competition acapsular B. theta experiences a restrictive bottleneck, i.e., in competition, less of the initial B. theta population is contributed by the acapsular inoculum. There is no need to appeal to lag-phase defects to explain the role of the capsule in vivo. The model in Figure 3D should depict the acapsular population with less cells after the bottleneck. In fact, the data in Figure 3E-F can be explained by the tighter bottleneck experienced by the acapsular mutant resulting in a smaller acapsular founding population. This idea can be seen in the data: the acapsular mutant shedding actually dips in the first 12-hours. This cannot be discerned in Figure 3E because mice with zero shedding were excluded from the analysis, leaving the data (and conclusion) of this experiment to be extrapolated from a single mouse.

      We of course completely agree that this would be a correct conclusion if only the competitive colonization data is taken into account. However, we are also trying to understand the mechanisms at play generating this bottleneck and have investigated a range of hypotheses to explain the results, taking into account all of our data.

      Hypothesis 1) Competition is due to increased killing prior to reaching the cecum and commencing growth: Note that the probability of colonization for single B. theta clones is very similar for OligoMM12 mouse single-colonization by the wildtype and acapsular strains. For this hypothesis to explain the outcompetition of the acapsular strain, the presence of wildtype would have to increase the killing of acapsular B. theta in the stomach or small intestine. The bacteria are at low density at this stage and stomach acid/small intestinal secretions should be similar in all animals. Therefore, this explanation seems highly unlikely.

      Hypothesis 2) Competition between wildtype and acapsular B. theta occurs at the point of niche competition before commencing growth in the cecum (similar to the proposal of the reviewer). It is possible that the wildtype strain has a competitive advantage in colonizing physical niches (for example proximity to bacteria producing colicins). On the basis of the data, we cannot exclude this hypothesis completely and it is challenging to measure directly. However, from our in vivo growth-curve data we observe a similar delay in CFU arrival in the feces for acapsular B. theta on single colonization as in competition, suggesting that the presence of wildtype (i.e., initial niche competition) is not the cause of this delay. Rather, it is an intrinsic property of the acapsular strain in vivo.

      Hypothesis 3) Competition between wildtype and acapsular B. theta is mainly attributable to differences in growth kinetics in the gut lumen. To investigate growth kinetics, we carried out time-courses of fecal collection from OligoMM12 mice single-colonized with wildtype or acapsular B. theta, i.e., in a situation where we observe identical colonization probabilities for the two strains. These data, now shown in Figure 3 C and D and Figure 3 – figure supplement 2, show that even without competition, acapsular B. theta CFU appear later and with a lower net growth rate than the wildtype. As these single-colonizations show no measurable difference in colonization probability between the two strains, the delayed appearance of acapsular B. theta in feces is unlikely to be due to increased killing (this would be clearly visible as barcode loss in the single-colonizations). Rather, the simplest explanation for this observation is a bona fide lag phase before growth commences in the cecum. Interestingly, using only the lower net growth rate (assuming a similar growth rate but an increased clearance rate) produces a good fit to our data on both competitive index and colonization probability in competition (Figure 3 – figure supplement 5). The fit is slightly improved by adding in the observed lag phase (Figure 3). It is very difficult to manipulate the lag phase experimentally in order to test directly how much it contributes, and its contribution is therefore carefully described in the new text.
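      To make the growth-kinetics argument concrete, the sketch below assumes the simplest possible model: a strain stays at its founding size during a lag τ and then grows exponentially at a net rate r (growth minus clearance), N(t) = N0·exp(r·max(0, t - τ)). The competitive index is the mutant/wildtype ratio at time t normalized to the inoculum ratio. All parameter values and names are hypothetical placeholders, not the fitted model from Figure 3; the point is only that a lower net growth rate alone already depresses the competitive index, and an extended lag depresses it further.

```python
import math

def population(t: float, n0: float, r: float, lag: float) -> float:
    """Cells at time t (h): constant during the lag, then exponential at net rate r (1/h).

    Simplifying assumptions: no carrying capacity, and r is the net rate
    (growth minus clearance), so higher clearance simply lowers r.
    """
    return n0 * math.exp(r * max(0.0, t - lag))

def competitive_index(t: float, mutant: dict, wildtype: dict) -> float:
    """(mutant/wildtype at time t) normalized to (mutant/wildtype in the inoculum)."""
    ratio_t = population(t, **mutant) / population(t, **wildtype)
    ratio_0 = mutant["n0"] / wildtype["n0"]
    return ratio_t / ratio_0

# Hypothetical parameters, not fitted values from the manuscript.
wt = {"n0": 1e3, "r": 0.5, "lag": 2.0}
slow = {"n0": 1e3, "r": 0.4, "lag": 2.0}      # lower net growth rate only
slow_lag = {"n0": 1e3, "r": 0.4, "lag": 6.0}  # lower net growth rate plus a longer lag

for label, mutant in [("rate only", slow), ("rate + lag", slow_lag)]:
    print(label, competitive_index(12.0, mutant, wt))
```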

      Please note that all data were plotted and used in fitting in Fig 3E, but “zero-shedding” values are plotted at the detection limit and overlaid, making it look as if only one point was present when in fact several were used. This was clear in the submitted raw data tables. To shore up these observations, we have repeated all time-courses and now have n ≥ 7 mice per group.

      5) The conclusions from Figure 4 rely on assumptions not well-supported by the data. In the high fat diet experiment, a lower dose of WITS is required to conclude that the diet has no effect. Furthermore, the authors conclude that Salmonella restricts the B. theta population by causing inflammation, but do not demonstrate inflammation at their timepoint or disprove that the Salmonella population could cause the same effect in the absence of inflammation (through non-inflammatory direct or indirect interactions).

      We of course agree that we would expect to see some loss of B. theta in HFD. However, for these experiments the inoculum was a ~10⁹ CFU/100 μL dose of untagged strain spiked with approximately 30 CFU of each tagged strain. Decreasing the number of each WITS below 30 CFU leads to very high mouse-to-mouse variation in the starting inocula, which massively complicates the analysis. To clarify this point, we have added a detection-limit calculation showing that the neutral tagging technique is not very sensitive to population contractions of less than 10-fold, which is likely in line with what would be expected for high-fat diet feeding in monocolonized mice over a short time-span.
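      Why modest contractions are hard to see can be illustrated with an assumed Poisson founding model: if each of the ~30 inoculated CFU per tag founds a lineage with probability β, a tag is lost with probability exp(-β·n0), and a k-fold contraction of the founding population divides the exponent by k. The value of β below is hypothetical and chosen only for illustration; how large a contraction becomes detectable depends on the true baseline number of founders per tag, which is what the detection-limit calculation in the manuscript has to pin down.

```python
import math

def expected_fraction_lost(beta: float, n0: float, contraction: float = 1.0) -> float:
    """Expected fraction of tags lost when the founding population shrinks `contraction`-fold.

    Assumed model: founders per tag ~ Poisson(beta * n0 / contraction),
    so P(tag lost) = exp(-beta * n0 / contraction).
    """
    return math.exp(-beta * n0 / contraction)

N0_PER_TAG = 30   # approximate CFU of each tagged strain in the inoculum (from the text)
BETA = 0.5        # hypothetical per-cell founding probability, for illustration only

for fold in (1, 2, 5, 10, 100):
    lost = expected_fraction_lost(BETA, N0_PER_TAG, fold)
    print(f"{fold:>3}-fold contraction -> expected fraction of tags lost: {lost:.2g}")
```

      When many cells found each tagged lineage, small contractions leave essentially every tag present, so the loss fraction only starts to move once the contraction is large.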

      This is a very good observation regarding our Salmonella infection data. We have now added the fecal lipocalin 2 values, as well as a group infected with a ssaV/invG double mutant of S. Typhimurium that does not cause clinical grade inflammation (“avirulent”). This shows 1) that the attenuated S. Typhimurium is causing intestinal inflammation in B. theta colonized mice and 2) that a major fraction of the population bottleneck can be attributed to inflammation. Interestingly, we do observe a slight bottleneck in the group infected with avirulent Salmonella which could be attributable either to direct toxicity/competition of Salmonella with B. theta or to mildly increased intestinal inflammation caused by this strain. As we cannot distinguish these effects, this is carefully discussed in the manuscript.

      6) Several of the experiments rely on very few mice/groups.

      We have increased the n to over 5 per group in all experiments (most critically those shown in Figure 3 – figure supplement 5). See the figure legends for the specific number of mice per experiment.

      Reviewer #2 (Public Review):

      The goal of this study was to understand population bottlenecks during colonization in the context of different microbial communities. Capsular polysaccharide mutants, diet, and enteric infection were also used, paired with short-term monitoring of overall colonization and the levels of specific strains. The major strength of this study is the innovative approach and the significance of the overall research area.

      The first major limitation is the lack of clear and novel insight into the biology of B. theta or other gut bacterial species. The title is provocative, but the experiments as is do not definitively show that the microbiota controls the relative fitness of acapsular and wild-type strains or provide any mechanistic insights into why that would be the case. The data on diet and infection seem preliminary. Furthermore, many of the experiments conflict with prior literature (i.e., lack of fitness difference between acapsular and wild-type strain and lack of impact of diet) but satisfying explanations are not provided for the lack of reproducibility.

      In line with suggestions from Reviewer 1, the paper has undergone quite extensive rewriting to better explain the data presented and their consequences. We now also explicitly comment on apparent discrepancies between our reported data and the literature. For example, the colonization defect of acapsular B. theta has only been published for competitive colonizations, where we also observe a fitness defect, so there is no actual conflict. In addition, we have calculated detection limits for the effect of high-fat diet and demonstrate that a 10-fold reduction in the effective population size would not be robustly detected with the neutral tagging technique; we are therefore probably just underpowered to detect small effects, and we believe it is important to point out the numerical limits of the technique we present here. Finally, for the Figure 4 experiments, we have added data on colonization/competition with an avirulent Salmonella challenge, giving some mechanistic data on the role of inflammation in the B. theta bottleneck.

      Another major limitation is the lack of data on the various background gut microbiotas used. eLife is a journal for a broad readership. As such, describing what microbes are in LCM, OligoMM, or SPF groups is important. The authors seem to assume that the gut microbiota will reflect prior studies without measuring it themselves.

      All gnotobiotic lines are bred as gnotobiotic colonies in our isolator facility. This is now better explained in the methods section. Additionally, 16S sequencing of all microbiotas used in the paper has been added as Figure 2 – figure supplement 1.

      I also did not follow the logic of concluding that any differences between SPF and the two other groups are due to microbial diversity, which is presumably just one of many differences. For example, the authors acknowledge that host immunity may be distinct. It is essential to profile the gut microbiota by 16S rRNA amplicon sequencing in all these experiments and to design experiments that more explicitly test the diversity hypotheses vs. alternatives like differences in the membership of each community or other host phenotypes.

      This is an important point. We have carried out a number of experiments to potentially address some issues here.

      1) We carried out B. theta colonization experiments in germ-free mice that had been colonized by gavage of SPF feces either 1 day or 2 weeks prior to B. theta colonization. While the shorter pre-colonization allowed B. theta to colonize to a higher population density in the cecum, the colonization probability was already reduced to the levels observed in our SPF colony even with the short pre-colonization. Therefore, the factors limiting B. theta establishment in the cecum are already established 1-2 days post-colonization with an SPF microbiota (Figure 2 - figure supplement 8).

      2) We checked for the presence of secretory IgA capable of binding to the surface of live B. theta, compared with a positive control from a mouse orally vaccinated against B. theta (Fig. 2, Supplement 7), and found no evidence of specific IgA targeting B. theta in the intestinal lavages of our SPF mouse colony.

      3) We isolated bacteriophage from the intestine of SPF mice and used them to infect lawns of wildtype and acapsular B. theta in vitro. We could not detect any plaque-forming phage coming from the intestine of SPF mice (Figure 2 – figure supplement 7).

      We can therefore exclude strongly lytic phage and host IgA as dominant driving mechanisms restricting B. theta colonization. It remains possible that rapidly upregulated host factors such as antimicrobial peptide secretion could play a role, but metabolic competition from the microbiota is also a very strong candidate hypothesis. The text regarding these experiments has been slightly rewritten to point out that colonization probability inversely correlates with microbiota complexity, and the mechanisms involved may involve both direct microbe-microbe interactions as well as host factors.

      Given the prior work on the importance of capsule for phage, I was surprised that no efforts are taken to monitor phage levels in these experiments. Could B. theta phage be present in SPF mice, explaining the results? Alternatively, is the mucus layer distinct? Both could be readily monitored using established molecular/imaging methods.

      See above: no plaque-forming phage could be recovered from the SPF mouse cecum content. The main replicative site that we have studied here, in mice, is the cecum, which does not have true mucus layers in the same way as the distal colon and is upstream of the colon, so it is unlikely to be affected by colon geography. Rather, mucus is well mixed with the cecum content and may behave as a dispersed nutrient source. There is certainly a higher availability of mucus in the gnotobiotic mice due to less competition for mucus degradation by other strains. However, this would be challenging to link directly to the B. theta colonization phenotype, as Muc2-deficient mice develop intestinal inflammation.

      The conclusion that the acapsular strain loses out due to a difference of lag phase seems highly speculative. More work would be needed to ensure that there is no difference in the initial bottleneck; for example, by monitoring the level of this strain in the proximal gut immediately after oral gavage.

      This is an excellent suggestion and has been carried out. At 8h post-colonization with a high inoculum (allowing easy detection) there were identical low levels of B. theta in the upper and lower small intestine, but more B. theta wildtype than B. theta acapsular in the cecum and colon, consistent with commencement of growth for B. theta wildtype but not the acapsular strain at this timepoint. We have additionally repeated the single-colonization time-courses using our standard inoculum and can clearly see the delayed detection of acapsular B. theta in feces even in the single-colonization state, when no increased bottleneck is observed. This can only be reasonably explained by a bona fide lag-phase extension for acapsular B. theta in vivo. These data also reveal a decreased net growth rate of acapsular B. theta. Interestingly, our model can be fitted quite well to the data obtained for both competitive index and colonization probability using only the difference in net growth rate. Adding the (clearly observed) extended lag-phase generates a model that is still consistent with our observations.

      Another major limitation of this paper is the reliance on short timepoints (2-3 days post colonization). Data for B. theta levels over 2 weeks or longer is essential to put these values in context. For example, I was surprised that B. theta could invade the gut microbiota of SPF mice at all and wonder if the early time points reflect transient colonization.

      It should be noted that “SPF” defines the microbiota only by the absence of specified pathogens and not by its absolute composition. Therefore, the rather efficient B. theta colonization in our SPF colony is likely due to a permissive composition, which is likely not at all reproducible between different SPF colonies (a major confounder in the reproducibility of mouse experiments between institutions; in contrast, the gnotobiotic colonies are highly reproducible). We do consistently see colonization of our SPF colony by wildtype B. theta out to at least 10 days post-inoculation (the latest time-point tested) at loads similar to those observed in this work, indicating that this is not just transient “flow-through” colonization. Data included below:

      For this paper we were very specifically quantifying the early stages of colonization, also because the longer we run the experiments for, the more confounding features of our “neutrality” assumptions appear (e.g., host immunity selecting for evolved/phase-varied clones, within-host evolution of individual clones etc.). For this reason, we have used timepoints of a maximum of 2-3 days.

      Finally, the number of mice/group is very low, especially given the novelty of these types of studies and uncertainty about reproducibility. Key experiments should be replicated at least once, ideally with more than n=3/group.

      For all barcode quantification experiments we have between 10 and 17 mice per group. Experiments for the in vivo time-courses of colonization have been expanded to an “n” of at least 7 per group.

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors report the generation of a mesoscale excitatory projectome from the ventrolateral prefrontal cortex (vlPFC) in the macaque brain by using AAV2/9-CaMKIIa-Tau-GFP labeling and imaging with high-throughput serial two-photon tomography. They present a novel data pipeline that integrates the STP data with macroscopic dMRI data from the same brain in a common 3D space, achieving a direct comparison of the two tracing methods. The analysis of the data revealed an interesting discrepancy between the high resolution STP data and the lower resolution dMRI data with respect to the extent of the frontal lobe projection through the inferior fronto-occipital fasciculus (IFOF) - the longest associative axon bundle in the human brain.

      Overall, the paper can serve as a how-to example for analyzing large non-human primate brain data, though some parts of the paper can be improved and the interpretation of the data should also be further strengthened.

      We thank the reviewer for his positive evaluation of our manuscript.

      The methodological part should include more detail on image acquisition - speed of imaging, pixel residence time, total time for data acquisition of a single brain and data sizes. Also the time and hardware needed for the computational analysis should be included, including the registration to the common reference and the running time for the machine learning predictions - this should also include the F score for the axon detection.

      We thank the reviewer for pointing out these vital issues. We have added these technical details in the resubmitted manuscript.

      “High x-y resolution (0.95 μm/pixel) serial 2D images were acquired in the coronal plane at a z-interval of 200 μm across the entire macaque brain. The scanning time of a single field-of-view containing 1024 by 1024 pixels was 1.629 s (i.e., the pixel residence time was ~1.6 μs), which resulted in ~1 month of continuous scanning and ~5 TB of STP tomography data for a single monkey brain.”
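      The quoted pixel residence time follows from dividing the scan time of one field of view by its pixel count. The quick check below ignores line flyback and other scanner overheads (an assumption), which is why it lands slightly below the rounded ~1.6 μs; it also gives the physical width of one field of view at 0.95 μm/pixel.

```python
FOV_SCAN_TIME_S = 1.629   # scan time per field of view (from the quoted Methods text)
PIXELS_PER_SIDE = 1024    # field of view is 1024 x 1024 pixels
PIXEL_SIZE_UM = 0.95      # x-y resolution (um/pixel)

# Dwell time per pixel, assuming the whole scan time is spent on pixels.
dwell_time_us = FOV_SCAN_TIME_S / PIXELS_PER_SIDE**2 * 1e6
# Physical width of one field of view.
fov_width_um = PIXELS_PER_SIDE * PIXEL_SIZE_UM

print(f"pixel residence time: {dwell_time_us:.2f} us")
print(f"field-of-view width: {fov_width_um:.1f} um")
```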

      “The data analysis was undertaken on a compute cluster with 248 CPU cores (3.1-3.3 GHz), 2.8 TB of RAM, and 17472 CUDA cores.”

      “The total computational time for the machine learning predictions in one macaque brain was ~1.5 months.”

      “To evaluate overall classifier performance, the precision–recall F measure, also called the F-score, was computed using four additional labeled images as test sets. A higher F-score indicates better classifier performance; the classifier achieved an F-score of 94.41% ± 1.99% (mean ± S.E.M.).”
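      For readers unfamiliar with the metric, the F-score is the harmonic mean of precision P and recall R, F = 2PR/(P + R). The sketch below computes it from pixel-level true positive, false positive and false negative counts and summarizes several test images as mean ± S.E.M. (sample standard deviation divided by √n). The function names and counts are hypothetical; the authors' evaluation tooling is not specified.

```python
import math
import statistics

def f_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from pixel-level counts."""
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n)); needs n >= 2."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical (tp, fp, fn) counts for four test images.
counts = [(900, 40, 60), (850, 70, 50), (920, 30, 45), (880, 55, 65)]
scores = [f_score(*c) for c in counts]
print(mean_sem(scores))
```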

      “Registration to the 3D common space took approximately half an hour.”

      The discrepancy between the high resolution STP data and the lower resolution dMRI data with respect to the extent of the frontal lobe projection through the inferior fronto-occipital fasciculus seems puzzling. One would expect that the STP data would reveal more detail, not less. One possibility is that the Tau-GFP does not diffuse throughout the full axon arborization of the PFC neurons, resulting in a technical artifact. Can this be excluded to support the functional significance of the current data?

      We thank the reviewer for raising this important issue. We apologize for the confusion caused by not providing sufficient details of the IFOF debate due to limited space. We have added literature background on the IFOF debate to the Introduction (as also recommended by Reviewer #2). In light of the comments by Reviewer #2, the present finding provides direct support for the hypothesis that the IFOF of macaque monkeys may not exist as a monosynaptic pathway.

      The AAV construct encoding cytoskeletal GFP (Tau-GFP) was used here to label all processes of the infected neuron, including axons and synaptic terminals. About 3 weeks of post-surgery survival time are usually sufficient to label intracerebral circuits in rodents (Lanciego and Wouterlood, 2020). We have extended the survival time to 2-3 months in order to achieve adequate labeling of axonal fibers and terminals in macaques.

      Regarding the extent of Tau-GFP diffusion, the STP images and high-resolution confocal microscopic analysis further showed differences in morphology between the axon fibers along the route and those at their terminals. Consistent with previous reports (Fuentes-Santamaria et al., 2009; Watakabe and Hirokawa, 2018), the axon fibers were thin and formed bouton-like varicosities in the terminal regions (MD, Figure 2—figure supplement 7D; caudate, Figure 2—figure supplement 7J; PFC, Figure 1—figure supplement 5A-D). These results indicate that the Tau-GFP has reached the axonal terminals.

      References:

      Fuentes-Santamaria V, Alvarado JC, McHaffie JG, Stein BE (2009) Axon Morphologies and Convergence Patterns of Projections from Different Sensory-Specific Cortices of the Anterior Ectosylvian Sulcus onto Multisensory Neurons in the Cat Superior Colliculus. Cereb Cortex 19:2902-2915.

      Lanciego JL, Wouterlood FG (2020) Neuroanatomical tract-tracing techniques that did go viral. Brain Struct Funct 225:1193-1224.

      Watakabe A, Hirokawa J (2018) Cortical networks of the mouse brain elaborate within the gray matter. Brain Struct Funct 223:3633-3652.

      Reviewer #2 (Public Review):

      The authors utilized viral vectors as neural tracers to delineate the connectivity map of the macaque vlPFC at the axonal level. There are three main goals of this study: 1) determine an effective viral vector for tract-tracing in the macaque brain, 2) delineate the detailed map of excitatory vlPFC projections to the rest of the brain, and 3) compare vlPFC connectivity between tracing and tractography results.

      We thank the reviewer for his/her constructive comments, to which we respond below.

      Accordingly, my comments are organized around each aim:

      1) This study demonstrates the advantage of viral tracing technique in targeting neuron type-specific pathways. The authors conducted injection experiments with three types of viral vectors and found success of AAV in labeling long-distance connections without causing fatal neurotoxicity in the monkey. This success extends the application of AAV from rodents to nonhuman primates. The fact that AAV specifically targets glutamatergic neurons makes it advantageous for mapping excitatory projections.

      Although the labeling efficacy of each viral vector type is described in the text, Fig. 2 does not present a clear comparison across viral vectors, despite such comparison for a thalamic injection in Fig. 2S. Without a comparable graph to Fig. 2E, it is unclear to what extent the VSV and lentivirus failed in labeling long-distance pathways.

      We thank the reviewer for the helpful suggestion. As suggested, we have added three new figures as Supplementary materials in the revised manuscript.

      Figure 2—figure supplement 2. Expression of GFP using VSV-ΔG injected into MD thalamus of the macaque brain. (A) GFP-labeled neurons were found in the MD thalamus ~5 days after injection of VSV-ΔG encoding Tau-GFP. (B) A magnified view illustrating the morphology of GFP-labeled neurons in the area outlined with a white box in (A). (C) Higher magnification view of GFP-positive axons.

      Figure 2—figure supplement 3. Expression of GFP using lentivirus injected into MD thalamus of the macaque brain. (A) Lentivirus construct was injected into the macaque thalamus and examined for transgene expression after ~9 months. (B) High power views of the dotted rectangle in panel A. (C) Magnified view of panel B. Note the presence of GFP-positive cells.

      Figure 2—figure supplement 4. Expression of GFP using AAV2/9 injected into MD thalamus of the macaque brain. (A) GFP-labeled axons were observed in the subcortical regions ~42 days after injection of AAV2/9 encoding Tau-GFP in MD thalamus. The inset shows the injection site in MD thalamus. Two dashed line boxes enclose the regions of interest, frontal white matter and ALIC, whose GFP signals are magnified in (B) and (C), respectively. (D) Higher magnification view of GFP-positive axons.

      2) The authors quantified connectivity strength by the GFP signal intensity using a machine-learning algorithm. Both the quantitative approach and the resulting excitatory projection map are important contributions to advancing our knowledge of vlPFC connectivity.

      However, several issues with the analysis lead to concerns about the connectivity result. First, the strength measure is based on axonal patterns in the terminal fields (which the authors refer to as "axon clusters"), detected by a machine-learning algorithm (page 25, lines 11-13). However, the actual synaptic connections are the small dot-looking signals in the background. These "green dots" are boutons on the dendritic trees. The density of boutons rather than the passing fibers reflects the density of synapses. The brief method description does not mention how the boutons are quantified, and it is unclear whether the signal was treated as the background noise and filtered out. Second, it is difficult for the reader to assess the robustness of the vlPFC connectivity patterns, due to these issues: i) It is unclear how many injection cases were used to generate the result reported in the subsection "Brain-wide excitatory projectome of vlPFC in macaques". The text mentions a singular "injection site" (page 8, line 12) and Fig. 4 shows a single site. However, there are three cases listed in Table 1. Is the result an average of all three cases? ii) Relatedly, it is unclear in which anatomical area the injection was placed for each case. Table 1 lists the site as "vlPFC" for all three cases, while the vlPFC contains areas 44, 45 and 12l. These areas have different projection patterns documented in the tract tracing literature. If different areas were injected in the three cases, they should be reported separately. iii) It is hard to compare the projection patterns with those reported in the literature. Conventionally, tract tracing studies report terminal fields by showing original labeling patterns in both cortical and subcortical regions without averaging within divided areas (see e.g. Petrides & Pandya, 2007, J Neurosci). It is hard to compare Fig. 3 with previous tract tracing studies to assess its robustness.

      We thank the reviewer for his/her constructive comments, to which we respond below.

      1) We appreciate the reviewer’s comment and sincerely apologize for not explaining this point clearly in our previous submission. The major concern is whether the axonal varicosities might have been treated as background noise and removed by mistake. In fact, the machine-learning segmentation reduced the dot-looking autofluorescence rather than the axonal varicosities. We have therefore provided new results and updated the “Materials and Methods” and “Discussion” sections of the revision accordingly.

      “Fluorescent images of the primate brain (Abe et al., 2017) often contain a high-intensity, dot-looking background signal caused by the accumulation of lipofuscin. Owing to the broad emission spectrum of lipofuscin, the dot-looking background and GFP-positive axonal varicosities are easily distinguishable from each other. For instance (Figure 1—figure supplement 4), axonal varicosities can be selectively excited in the green channel, while the dot-looking background lipofuscin is usually present in both the green and red channels. During quantitative analysis, a machine learning algorithm was adopted to reliably segment the GFP-labelled axonal fibers, including axonal varicosities, and to remove the lipofuscin background (Arganda-Carreras et al., 2017; Gehrlach et al., 2020).”

      “One recent study compared the results of terminal labelling using a Synaptophysin-EGFP-expressing AAV (specifically labelling synaptic endings) with a cytoplasmic EGFP AAV (labelling axon fibers and synaptic endings). There was high correspondence between synaptic EGFP and cytoplasmic EGFP signals in target regions (Oh et al., 2014). Thus, we relied on quantifying GFP-positive pixels (containing signals from both axonal fibers and terminals) rather than the number of synaptic terminals, as done in recent reports (Oh et al., 2014; Gehrlach et al., 2020).”

      Figure 1—figure supplement 4. Difference between axonal varicosities and dot-looking background. STP images (A-D) and high-resolution confocal images (E-H) were acquired in the green and red channels. Synaptic terminals (indicated by white arrows) can be specifically excited in the green channel, while dot-looking background lipofuscin (indicated by yellow arrows) can be visualized in both the green and red channels. (C and G) No colocalization was found between axonal varicosities and the dot-looking background; axonal varicosities were easily distinguished from the dot-looking background in the merged image. (D and H) The dot-looking autofluorescence, rather than the axonal varicosities, was reduced by a machine-learning algorithm.
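      The two-channel argument above can be illustrated with a deliberately simple rule: GFP-positive varicosities are bright in the green channel only, whereas broad-spectrum lipofuscin is bright in both channels. The sketch below is a thresholding stand-in for illustration, not the trainable machine-learning segmentation (Arganda-Carreras et al., 2017) that was actually used; the function name and thresholds are hypothetical and would need calibration for each dataset.

```python
import numpy as np

def classify_puncta(green, red, green_thr: float, red_thr: float):
    """Return boolean masks (gfp, lipofuscin) from two co-registered channels.

    gfp:        bright in green, dim in red  -> candidate axon/varicosity pixel
    lipofuscin: bright in green AND red      -> broad-spectrum autofluorescence
    """
    green = np.asarray(green, dtype=float)
    red = np.asarray(red, dtype=float)
    if green.shape != red.shape:
        raise ValueError("channels must be co-registered and the same shape")
    bright_green = green > green_thr
    bright_red = red > red_thr
    gfp = bright_green & ~bright_red
    lipofuscin = bright_green & bright_red
    return gfp, lipofuscin

# Toy 1 x 3 image: a GFP varicosity, a lipofuscin granule, and dim background.
gfp, lipo = classify_puncta([[100, 100, 5]], [[5, 100, 5]], green_thr=50, red_thr=50)
print(gfp.tolist(), lipo.tolist())
```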

      References:

      Abe H, Tani T, Mashiko H, Kitamura N, Miyakawa N, Mimura K, Sakai K, Suzuki W, Kurotani T, Mizukami H, Watakabe A, Yamamori T, Ichinohe N (2017) 3D reconstruction of brain section images for creating axonal projection maps in marmosets. J Neurosci Methods 286:102-113.

      Arganda-Carreras I, Kaynig V, Rueden C, Eliceiri KW, Schindelin J, Cardona A, Sebastian Seung H (2017) Trainable Weka Segmentation: a machine learning tool for microscopy pixel classification. Bioinformatics 33:2424-2426.

      Gehrlach DA, Weiand C, Gaitanos TN, Cho E, Klein AS, Hennrich AA, Conzelmann KK, Gogolla N (2020) A whole-brain connectivity map of mouse insular cortex. Elife 9.

      Oh SW et al. (2014) A mesoscale connectome of the mouse brain. Nature 508:207-214.

      2.1) We apologize for the confusion caused by the insufficient description in the main text. We have now revised the “Materials and Methods” section accordingly. Furthermore, we have made both the whole-brain serial two-photon data and the high-resolution diffusion MRI data freely available to the community, which allows researchers in the field to perform further analyses that we have not done in the current study.

      “Three samples were injected with AAV in vlPFC, and two of them could be imaged with STP. Unfortunately, one sample became “loose” and detached from the agar block after several weeks of imaging, so its quantitative results are not shown in Figure 3.”

      2.2) We apologize for insufficient description of the precise location of the injection sites. We have revised the description of “Materials and Methods” section and provided a new figure to clarify the exact location of the injection sites.

      “Figures 3-4 and Figure 4—figure supplements 2-4 were derived from sample #8, with the infected area in areas 45, 12l and 44 of vlPFC. Figure 1—figure supplement 6 was derived from sample #7, with the infected area in areas 12l and 45 of vlPFC.”

      Figure 1—figure supplement 6. Representative fluorescent images showing the injection site and major tracts of sample #7. (A) STP image of the injection site in vlPFC, overlaid with the monkey brain template (left), mainly spanning areas 12l and 45a. (B) Confocal image of the AAV-infected neurons (white arrows). (C-F) Representative confocal images of major tracts originating from vlPFC.

      2.3) We agree with the reviewer that most tract tracing studies report terminal fields by showing original labeling patterns. Several recent studies report the total volume of segmented GFP-positive pixels (Oh et al., 2014) or the percentage of total labeled axons (Do et al., 2016; Gehrlach et al., 2020) to represent connectivity strength, and other studies also provide the projection density (Hunnicutt et al., 2016). We have provided the percentage of total labeled axons (Figure 3C, right panel), the projection density (Figure 3C, left panel) and representative original fluorescent images (Figure 4, Figure 4—figure supplement 2 and Figure 4—figure supplement 4) to present our projection data from these different perspectives.
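      As a rough illustration of how the two quantitative measures differ, the Python sketch below computes both from a binary mask of segmented GFP-positive pixels and an integer map of region labels. It is not the pipeline used in the study: the function name, the pixel-based definitions and the toy arrays are assumptions made for illustration (projection density is taken as the fraction of a region's pixels that are labeled; percentage of total labeled axons is taken as the region's share of all labeled pixels).

```python
import numpy as np


def projection_metrics(gfp_mask: np.ndarray, region_labels: np.ndarray) -> dict:
    """Per-region projection density and percentage of all labeled pixels.

    Hypothetical helper. gfp_mask: boolean array, True where a pixel was
    segmented as GFP-positive axon. region_labels: integer array of the same
    shape, where 0 marks background / unassigned pixels.
    """
    if gfp_mask.shape != region_labels.shape:
        raise ValueError("mask and label map must have the same shape")
    # Total labeled pixels inside any assigned region
    total_labeled = int(gfp_mask[region_labels > 0].sum())
    metrics = {}
    for region in np.unique(region_labels):
        if region == 0:
            continue
        in_region = region_labels == region
        labeled = int(gfp_mask[in_region].sum())
        metrics[int(region)] = {
            # fraction of this region's pixels occupied by labeled axons
            "density": labeled / int(in_region.sum()),
            # share of all labeled pixels that fall in this region
            "percent_of_total": 100.0 * labeled / total_labeled if total_labeled else 0.0,
        }
    return metrics
```

      A small, strongly innervated region can thus have a high density but a low share of the total, which is why the two panels can rank regions differently.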

      References:

      Do JP, Xu M, Lee SH, Chang WC, Zhang S, Chung S, Yung TJ, Fan JL, Miyamichi K, Luo L, Dan Y (2016) Cell type-specific long-range connections of basal forebrain circuit. Elife 5.

      Gehrlach DA, Weiand C, Gaitanos TN, Cho E, Klein AS, Hennrich AA, Conzelmann KK, Gogolla N (2020) A whole-brain connectivity map of mouse insular cortex. Elife 9.

      Hunnicutt BJ, Jongbloets BC, Birdsong WT, Gertz KJ, Zhong H, Mao T (2016) A comprehensive excitatory input map of the striatum reveals novel functional organization. Elife 5.

      Oh SW et al. (2014) A mesoscale connectome of the mouse brain. Nature 508:207-214.

      3) Using the ground truth from tract tracing to validate tractography results is a timely problem, and this study showed promising consistency, as well as discrepancies, between the two modalities. In particular, the discrepancy between tracing and tractography data on the IFOF termination brings critical insights into a potential cross-species difference. The finding that IFOF does not reach the occipital cortex provides important support for the speculation that IFOF may not exist in monkeys (for a context of the IFOF debate see Schmahmann & Pandya, 2006, pp 445-446).

      I have minor concerns regarding the statistical robustness of the tracing-tractography comparison. The authors compared the vlPFC-CC-contralateral tract instead of a global connectivity pattern without justification. Why were other major tracts that connect with vlPFC omitted? In addition, the results are shown for only one monkey, while two monkeys went through both tracer injection and dMRI scans. It is unclear how the results were chosen or whether the data were averaged.

      We apologize for not describing this clearly. The STP images were acquired in the coronal plane with high x-y resolution (0.95 μm/pixel) but relatively low z resolution (200 μm). Because of this large step size, axonal information along the z axis may be lost, which makes it technically demanding to reconstruct axonal density maps in the sagittal or horizontal planes. Therefore, when quantifying the similarity coefficients along the anterior-posterior axis of the whole macaque brain, we focused on the vlPFC-CC-contralateral tract, which travels within the coronal plane, and omitted the tracts that appear as dots in the coronal plane. We have revised this in the resubmitted manuscript.

      “GFP projections and probabilistic tracts were plotted together with the Dice coefficients and Pearson coefficients (R) along the anterior-posterior axis of the whole macaque brain. The Dice coefficients and Pearson coefficients were higher in dense projection regions, especially for the vlPFC-CC-contralateral tract (Figure 6A). To carry out a proof-of-principle investigation, we focused on the vlPFC-CC-contralateral tract, which was reconstructed in 3D space using STP and dMRI data, respectively.”
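      For readers unfamiliar with the two slice-wise similarity measures, a minimal sketch follows, assuming both maps are co-registered volumes indexed as (slice, y, x) with slices ordered anterior to posterior. The binarization threshold used for Dice, the use of raw intensities for Pearson R and the function name are assumptions; the manuscript's exact preprocessing is not described here.

```python
import numpy as np


def slice_similarity(tracing: np.ndarray, tract: np.ndarray, threshold: float = 0.0):
    """Dice and Pearson R between two co-registered volumes, one value per coronal slice.

    Hypothetical helper; both volumes are indexed (slice, y, x).
    """
    dice, pearson = [], []
    for a, b in zip(tracing, tract):
        # Dice on binarized maps: 2|A ∩ B| / (|A| + |B|)
        ma, mb = a > threshold, b > threshold
        denom = ma.sum() + mb.sum()
        dice.append(2.0 * np.logical_and(ma, mb).sum() / denom if denom else np.nan)
        # Pearson R on the raw voxel values of the slice
        fa, fb = a.ravel().astype(float), b.ravel().astype(float)
        if fa.std() == 0 or fb.std() == 0:
            pearson.append(np.nan)  # correlation is undefined for a constant slice
        else:
            pearson.append(float(np.corrcoef(fa, fb)[0, 1]))
    return np.array(dice), np.array(pearson)
```

      Dice only asks whether voxels are labeled in both maps, whereas Pearson R also rewards agreement in relative intensity, so the two curves need not track each other in sparsely labeled slices.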

      With regard to the presentation of the dMRI data, we apologize for not making this clear in the previous version. We have revised Figure 6 and Figure 7 so that dMRI scans from different macaque monkeys are shown separately.

      Figure 6. Comparison of vlPFC connectivity profiles by STP tomography and diffusion tractography. (A) Percentage of projection, probabilistic tracts, Dice coefficients and Pearson coefficients (R) plotted along the anterior-posterior axis of the macaque brain. Blue and red indicate results from two dMRI data sets acquired from different macaque monkeys. (B, C) 3D visualization of the fiber tracts running from the injection site in vlPFC through the corpus callosum to the contralateral vlPFC, by STP tomography and diffusion tractography. (D-F) Representative coronal slices of the diffusion tractography map and the axonal density map along the vlPFC-CC-contralateral tract, overlaid on the corresponding anatomical MR images. (G-J) Magnified views of the GFP-labeled axons marked in Figure 6F. (H, J) High-magnification images of the white boxes in G and I, both showing axonal morphology in great detail.

      Figure 7. Illustration of the inferior fronto-occipital fasciculus by diffusion tractography and STP. (A) Fiber tractography of the IFOF (lateral view). Two inclusion ROIs, at the external capsule (pink) and at the anterior border of the occipital lobe (purple), were used and are shown on the coronal plane. The IFOF stems from the frontal lobe, travels along the lateral border of the caudate nucleus and the external/extreme capsule, forms a bowtie-like pattern and anchors in the occipital lobe. (B) The traveling course of the IFOF reconstructed from the vlPFC projectome, shown in 3D space. (C) Szymkiewicz-Simpson overlap coefficients between 2D coronal brain slices of the dMRI-derived IFOF tract and the vlPFC projections, plotted along the anterior-posterior axis of the macaque brain. Blue and red indicate results from two dMRI data sets acquired from different macaque monkeys. Four cross-sectional slices (D-G) along the IFOF tract were arbitrarily chosen to demonstrate the spatial correspondence between diffusion tractography and axonal tracing in STP images. (D-G) The detected GFP signals (green) of the vlPFC projectome and the IFOF tracts (red) obtained by diffusion tractography, overlaid on anatomical MRI images, with a magnified view of the boxed area. No fluorescent signal was detected in the superior temporal area through which the dMRI-derived IFOF tract passes (G).
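      The Szymkiewicz-Simpson coefficient divides the intersection of two sets by the size of the smaller one, so, unlike the Dice coefficient, it reaches 1 whenever one mask lies entirely inside the other. A minimal sketch for one pair of binary coronal masks is below; the function name and the NaN convention for empty masks are assumptions, not taken from the manuscript.

```python
import numpy as np


def overlap_coefficient(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Szymkiewicz-Simpson coefficient |A ∩ B| / min(|A|, |B|) of two binary masks.

    Hypothetical helper for a single coronal slice.
    """
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    smaller = min(int(a.sum()), int(b.sum()))
    if smaller == 0:
        return float("nan")  # undefined when either mask is empty
    return int(np.logical_and(a, b).sum()) / smaller
```

      Applied slice by slice (e.g. `[overlap_coefficient(t, g) for t, g in zip(ifof_masks, gfp_masks)]`, with hypothetical volume names), this yields an anterior-posterior profile of the kind plotted in panel C.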

    1. Author Response

      Reviewer #1 (Public Review):

      This is a well-executed study using cutting-edge proteomics analysis to characterize muscle tissue from a genetically diverse mouse population. The use of only females in the study is a serious limitation that the authors acknowledge. The statistical methods, including protein quantification, QTL mapping, and trait correlation analysis are appropriate and include corrections for multiple testing. One concern is that missense variants, if they occur in peptides used to quantify proteins, could lead to false-positive signatures of low abundance (see lines 123-127). The experimental validation and deep dive into UFMylation provide some confidence in the reliability of other associations that can be mined from these data. The authors have provided a web-based tool for exploring the data.

      We thank the reviewer for these very positive comments and for reviewing the manuscript.

      We agree that the quantification of peptides containing missense variants could confound quantification at the protein level. This is an important consideration when only a few peptides are identified for a specific protein. However, in our data, the average number of peptides used to quantify the 14 proteins containing missense-associated pQTLs was ~68 peptides/protein (lowest: 5 peptides for FGB; highest: 703 peptides for NEB).

      In the case of EPHX1, we quantified 15 peptides (Figure R1A), including a peptide adjacent to R338 that spans amino acids 339-347. Because trypsin cleaves C-terminal to arginine, the R338C substitution would abolish this cleavage site, so the peptide spanning 339-347 would not be identified in the mutant, which may lead to a false-positive signature of low abundance, as suggested by the reviewer. To investigate this, we re-quantified the relative abundance of EPHX1 for each genotype with and without the peptide spanning 339-347 (Figure R1B). This made little difference to protein quantification, and EPHX1 abundance remained significantly lower in the presence of the R338C mutation (AA genotype). In fact, peptide-level quantification revealed that 12 of the remaining 14 peptides were also significantly lower in the AA genotype (data not shown).
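      The cleavage argument can be illustrated with a toy in-silico digest. The sketch applies the simple trypsin rule (cleave after K or R unless the next residue is P) to a short made-up sequence, not the real EPHX1 sequence; substituting one arginine with cysteine removes a cleavage site, so the downstream peptide is no longer produced as a separate species. Real search engines also handle missed cleavages and other enzyme rules, which this sketch ignores.

```python
import re
from typing import List


def tryptic_peptides(seq: str) -> List[str]:
    """Idealized full tryptic digest: cleave C-terminal to K or R, except before P."""
    # Zero-width split after K/R when the next residue is not P
    # (splitting on empty matches requires Python 3.7+)
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]


# Toy sequence (hypothetical); the R at index 6 plays the role of R338
wild_type = "AVGKLERDSTQWK"
mutant = wild_type[:6] + "C" + wild_type[7:]  # R -> C substitution
```

      With these inputs, `tryptic_peptides(wild_type)` yields `AVGK`, `LER` and `DSTQWK`, whereas the mutant yields only `AVGK` and `LECDSTQWK`: the analog of the 339-347 peptide disappears.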

      Although we agree that this is a very important consideration, we are mindful of the length of the article and feel that including these data would not significantly improve the manuscript. We therefore request not to include these data, as they would detract from the main findings of the paper, which focus on phenotypic associations and the validation of UFMylation as a regulator of muscle function.

      Figure R1. (A) Identified peptides from EPHX1 mapped onto the primary amino acid sequence, highlighting the missense mutation induced by SNP rs32746574, which was associated with EPHX1 protein levels by pQTL analysis. (B) Relative quantification of EPHX1 between the two genotypes of SNP rs32746574, with and without the peptide neighboring the missense mutation (amino acids 339-347) (**p<0.001, Student's t-test).

    1. Author Response

      Reviewer #1 (Public Review):

      Building upon previous evidence that auditory cortex VIP interneurons are activated by non-classical stimuli such as reward and punishment, Szadai et al. extended the investigation to multiple cortical regions. The use of three-dimensional acousto-optical two-photon microscopy together with the 3D chessboard scanning method allowed high-speed signal acquisition from numerous VIP interneurons in a large brain volume. Additionally, the activity of VIP interneurons in deep cortical regions was recorded using fiber photometry. With these two imaging methods, the authors were able to extract and analyze VIP cell signals from different cortical regions. Recording VIP interneuron activity during an auditory go/no-go task revealed that more than half of the recorded cortical VIP interneurons responded to both reward and punishment with high reliability. Fiber photometry data revealed similar observations; however, the temporal dynamics of reinforcement-related responses in mPFC were slower than in the auditory cortex. The authors performed a detailed analysis of individual cell activity dynamics, which revealed five categories of VIP cells based on their temporal profiles. Further, animals with higher performance on the discrimination task showed stronger VIP responses to 'go trials', possibly suggesting a role for VIP interneurons in discrimination learning. The authors found that the reinforcement-related responses of VIP interneurons in visual cortex were not correlated with their sensory tuning, supporting the interesting idea that VIP interneurons take part in both local and global processing. These observations draw attention to the possible involvement of VIP interneurons in reinforcement-associated global signaling that could regulate local connectivity and information processing, leading to learning.

      The state-of-the-art imaging technique allowed the authors to image VIP interneurons in several cortical regions. Advanced analyses revealed the nuances, similarities and differences in VIP activity across regions. The authors' conclusions about reinforcement-related activity of VIP interneurons are well supported by the results; however, some claims and interpretations require more attention and clarification.

      We thank Reviewer #1 for the positive general comments.

      Reviewer #2 (Public Review):

      In recent years the activity of cortical VIP+ interneurons in relation to learning and sensory processing has raised great interest and has been intensely investigated. The ability of VIP+ interneurons in the auditory cortex to respond to both reward and punishment was already reported a few years ago by some of the authors (Pi et al., 2013, Nature). However, this work importantly adds to their previous study demonstrating a largely similar and synchronous response of a large fraction of these interneurons across the neocortex to salient stimuli of different valence during the performance of an auditory discrimination task.

      An additional strength of this study is the analysis and identification of the general pattern of VIP+ interneuron responses associated with specific behaviors across the different layers of the neocortex.

      Interestingly, using cluster analysis, the authors also identified five classes of VIP+ interneurons, based on the dynamics of their responses, that were unequally distributed across cortical areas.

      This is a well performed study that took advantage of a cutting-edge imaging approach with high recording speed and good signal-to-noise ratio. Experiments are well performed and the data are properly analyzed and nicely illustrated. However, one shortcoming of this paper, in my opinion, is the "case report" structure of the data. Essentially for each neocortical area the activity of VIP+ interneurons was analyzed only in one animal. This limits the assessment of the stability of the response/recruitment of these interneurons. I appreciate the high number of recorded VIP+ interneurons per area/animal and I do understand that it would be excessively laborious to perform 3D random-access two-photon microscopy in several mice for each cortical area. On the other hand, it would be important to have some knowledge of the general variability of the responses of these neurons among animals.

      In conclusion, despite the findings described in this manuscript being generally sound, additional experiments are recommended to further substantiate the conclusions.

      Thank you for pointing out this potential misunderstanding. Although we had stated the number of animals from which recordings were obtained (n=22 in total), we now repeat this information at several places to avoid confusion. The data recorded with the two-photon microscope come from 16 animals, and fiber photometry was performed on a separate group of 6 animals. Each animal was recorded in one area (14 mice) or two areas (8 mice: 2 with AOD imaging, 6 with photometry). We aimed to acquire at least 3 recordings per area (4 in the primary somatosensory cortex, 6 in the primary and secondary motor cortices, 4 in the lateral and medial parietal cortices, 3 in the primary visual cortex, and 6 in the auditory and medial prefrontal cortices). In the revised manuscript, this information can be found at the beginning of the Results section and in the figure legends:

      “To probe the behavioral function of VIP interneurons, we trained head-fixed mice (n=22 in total, n=16 for 2-photon microscopy and n=6 for fiber photometry) on a simple auditory discrimination task (Figure 1A).”

      “Among the 811 neurons imaged in 18 imaging sessions from 16 mice,”

      “Ca2+ responses of individual VIP interneurons recorded separately from 18 different cortical regions from 16 mice using fast 3D AO imaging were averaged for Hit (thick green), FA (thick red), Miss (dark blue), and CR (light blue). Fiber photometry data were recorded simultaneously from mPFC and ACx regions and are shown in gray boxes. Functional map (Kirkcaldie, 2012) used with the permission of the author. Speaker symbols represent the average time of tone onset, and gray triangles mark the reinforcement onset for Hit and FA. Averages of Miss and CR trials were aligned according to the expected reinforcement delivery calculated on the basis of the average reaction time. mPFC: medial prefrontal cortex (n=6 mice), ACx: auditory cortex (n=6), S1Hl/S1Tr/S1Bf/S1Sh: primary somatosensory cortex, hindlimb/trunk/barrel field/shoulder region (n=4), M1/M2: primary/secondary motor cortex (n=6), Mpta/Lpta: medial/lateral parietal cortex (n=4), V1: primary visual cortex (n=3).”

      “This approach allowed us to simultaneously measure bulk calcium-dependent signals from VIP interneurons located in the right medial prefrontal (mPFC) and left auditory cortices (ACx) by implanting two 400 µm optical fibers at these locations (n=6 sessions from n=6 mice, Figure 1–figure supplement 1C).”

      “Raster plot of the trial-to-trial activation of the responsive VIP neurons in Hit and FA trials during the two-photon imaging sessions (n=18 sessions, n=16 mice, n=746 cells).”

      Subregional labels (for example, in Figure 2) are intended only as additional information to orient readers, even though they were precisely defined on the basis of the coordinates. All analyses of regional differences were conducted at the level of the main functional areas of the dorsal cortex (motor, somatosensory, parietal, and visual). Despite some location-dependent heterogeneity in the late response phase (Figures 2G and H), even these main dorsal cortical regions were all similar in their responsiveness to reinforcers and auditory cues.

      Reviewer #3 (Public Review):

      In this study, Szadai et al. show reliable, relatively synchronous activation of VIP neurons across different areas of dorsal cortex in response to reward and punishment in mice performing an auditory discrimination task. The authors use both relatively fast two-photon imaging and fiber photometry for some deeper areas. They cluster neurons according to their temporal response profiles and show that these profiles differ across areas and cortical depths. Task performance, running behavior and arousal are all related to VIP response magnitude, as has been shown previously.

      Methodologically, this paper is strong: the described imaging technique allows for fairly fast sampling rates, they sample VIP cells from many different areas and the analyses are sophisticated and touch on the most relevant points. The figures are of high quality.

      However, as the manuscript is now, the presentation could be clearer, the methods more complete and it is not clear whether their conclusions are entirely supported by the data.

      The main issue is that reinforcement and arousal are hard to distinguish in this study. It is well known that VIP activity is correlated with arousal. And it is fairly clear that the reinforcement they use in this study - air puffs to the eye, as well as water rewards - cause arousal. It is possible that the reinforcer responses they observe in VIP neurons throughout all areas merely reflect the increases in arousal caused by these behaviorally salient events. They do discuss this caveat (albeit not fully convincingly) and in their abstract even state that the arousal state was not predictive of reinforcer responses. However their data clearly shows the tight relationship of the VIP reinforcer responses to both arousal (as measured by pupil diameter), as well as running speed of the animal. Both of these variables are well known to be tightly coupled to VIP activity.

      Although barely mentioned, the authors do appear to sometimes present uncued reward (Figure S2F). If responses were noticeably different from the same events in the task context (as actual reinforcers) this could at least hint towards the reinforcement signal being distinct from mere arousal. However, this data is only mentioned in one supplementary figure in a different context (comparison with PV cells) and neither directly compared to cued reward, nor is this discussed at all. Were uncued air puffs also presented? How do the responses compare to cued air puffs/punishment?

      Our original approach to distinguishing reinforcement-related from arousal-related responses aimed:

      1) to show that VIP cells with both low and high correlation coefficients with arousal produce large signals upon reinforcement presentation (Figure 3B), and

      2) to show that the large differences between low and high arousal changes were reflected only to a limited extent in VIP activity (Figures 3C and D). As highlighted in Figure R1, where we also added bars showing ∆P/P in the high and low pupil-change conditions, the difference in ∆P/P is ~5-fold, while it is only ~1.5-fold for ∆F/F. This disproportionality suggests that a large part of the signal below the dashed blue line is independent of arousal. We have added these modifications to the new version of Figure 3 for clarity.

      Figure R1 = Figure 3C-D with modification. Comparison of pupil changes and corresponding calcium averages.

      We collected further evidence to support our claims. In Figure 3–figure supplement 2, we depicted Hit and FA trials in which the reinforcement did not elevate the arousal level any further. Many of these trials were associated with locomotion prior to the reinforcement, but it was also common for the animals to remain still during the whole trial. Trials with increased locomotion upon reinforcement presentation were excluded. Reinforcement-related calcium signals were still present under these conditions, indicating that these signals are not simple reflections of arousal. Moreover, we systematically estimated the distinct contributions of arousal, locomotion, and reinforcers with a generalized linear model (Figure 3–figure supplement 2D). This model also confirmed our view of reinforcement-related coding.

      We now say in the results:

      “Finally, to assess the motor- and reinforcement-related contributions to VIP interneuronal activity, we built a generalized linear model using the behavior and imaging data of the SS and Mtr recordings (Figure 3–figure supplement 2D, n=3 mice). This model was able to explain 18.8 ± 11.1% of the variance of the VIP population calcium signal, and highlighted that arousal was the best predictor, followed by reward, punishment, locomotion velocity, and auditory cue (weights = 0.055, 0.031, 0.028, 0.020, 0.018 respectively; all predictors, except the auditory cue in the case of one animal, contributed significantly, p<0.001). These observations indicate that running and arousal changes alone cannot fully explain the recruitment of VIP interneurons by reinforcers.”
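      For readers who want to see what such a model looks like, note that a Gaussian GLM with an identity link reduces to ordinary least squares; a NumPy sketch follows. This is an illustration under assumptions: the authors' link function, regressor construction (e.g. temporal kernels or lags) and significance tests are not described here, and the function name is hypothetical.

```python
import numpy as np


def fit_linear_glm(predictors: np.ndarray, signal: np.ndarray):
    """Gaussian GLM with identity link, fitted by ordinary least squares.

    predictors: (n_samples, n_predictors), e.g. z-scored pupil, reward,
    punishment, velocity and cue regressors (hypothetical layout).
    signal: (n_samples,) population dF/F trace.
    Returns (weights without intercept, fraction of variance explained).
    """
    X = np.column_stack([np.ones(len(signal)), predictors])  # add intercept column
    coef, *_ = np.linalg.lstsq(X, signal, rcond=None)
    residual = signal - X @ coef
    r2 = 1.0 - residual.var() / signal.var()
    return coef[1:], r2
```

      On synthetic data generated as `y = 1.0 + X @ w + noise`, the fitted weights recover `w` and `r2` reports the variance explained, the quantity quoted above as 18.8 ± 11.1%. When predictors are z-scored, weight magnitudes can be compared to rank them, as in the quoted passage.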

      We apologize for not describing the rationale for, and the results of, the uncued reward experiments. Briefly, while recording reinforcement-related signals in the auditory cortex during our task, we realized that cue delivery, and the resulting purely sensory response, could alter the measurement of reward-related responses. Hence, to disentangle reward- and sensory-related responses, we presented the animals with simple, uncued rewards and observed a similar and robust recruitment of VIP interneurons. Based on the same rationale, we made similar measurements for PV neurons.

      We now say in the results:

      “We did not further analyze the FA responses in auditory cortex, as those responses also had a sensory component linked to the white noise-like sound created by the air puff delivery. Because cue delivery could act as a confound when measuring reward-mediated responses of VIP interneurons in auditory cortex (see also Methods), we delivered random rewards in separate sessions. Water droplet delivery recruited VIP interneurons in both auditory and medial prefrontal cortex in a similar fashion to water delivery during the discrimination task (Figure 2–figure supplement 1G). Consistent with our single-cell results, the PV-expressing neuronal population in ACx did not show any significant change in activity upon similar random reward delivery (Figure 2–figure supplement 1G).”

      We agree with the reviewer that the difference between cued and uncued responses is an important point. However, the goal of this manuscript is to study how reward and punishment are represented by VIP interneurons in the cortex.

      The imaging method appears well suited for their task; however, the improvements listed in Table S1 make the method appear far superior to existing methods in many aspects. Published or preprinted papers with two-photon imaging of VIP populations (e.g. from the Scanziani lab (Keller et al.), Carandini lab (Dipoppa et al.), de Vries lab (Millman et al.) and Adesnik lab (Mossing et al.)), which use the much more common resonant scanning, seem able to image 4-7 layers at 4-8 Hz with good enough SNR and a potentially larger neuronal yield of approximately 100-200 VIP cells, depending on the field of view. While these studies would not capture every single cell in a volume, the only main advantage of the technique used here appears to be its superior temporal resolution.

      We thank the reviewer for the positive comment and agree that the interpretation should be improved. We agree that the imaging methods in the papers listed above have good SNR and were well suited to the scientific questions they addressed. As the reviewer points out, 3D AOD imaging allows fast 3D measurements that cannot be achieved otherwise. We used these advantages to address the critical question of layer specificity in the response of VIP interneurons to reinforcer presentation (Figure 2–figure supplement 1F; see also the new Figure 1–figure supplement 1B). For a comparison and quantification of the advantages of AOD microscopy over other imaging methods, the reviewer and readers can refer to the Methods section (3D AO microscopy), Table S1 and Szalay et al., 2016. We agree with the reviewer that one of the main advantages is the superior temporal resolution. The second main advantage is the improved SNR, which arises because the entire measurement time is spent on regions of interest; no time is spent measuring unnecessary background areas. More specifically, even in the case of 2D imaging, SNR is improved by a factor of:

      $$\text{SNR gain} = \left(\frac{\text{area of the entire frame}}{\text{area of the recorded VIP cells}}\right)^{1/2}$$

      which is about $100^{1/2} = 10$, as VIP interneurons represent about 1% of the brain. We used this second advantage of AO scanning when we determined the activation ratio (e.g. see Figure 2D).
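      The square-root factor follows if detection is shot-noise limited: at a fixed total measurement time, restricting scanning to the somata multiplies the dwell time per recorded pixel by the frame-to-ROI area ratio, and SNR scales with the square root of the collected photons. A one-function sketch under that assumption (the function name is hypothetical):

```python
import math


def ao_snr_gain(frame_area: float, roi_area: float) -> float:
    """SNR gain of ROI-only scanning over full-frame scanning at equal total time.

    Assumes shot-noise-limited detection: the dwell time per ROI pixel grows by
    frame_area / roi_area, and SNR grows with the square root of collected photons.
    """
    if roi_area <= 0 or roi_area > frame_area:
        raise ValueError("roi_area must be in (0, frame_area]")
    return math.sqrt(frame_area / roi_area)
```

      For example, `ao_snr_gain(100.0, 1.0)` returns 10.0, matching the estimate for VIP somata occupying about 1% of the frame.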

      As resolving single or a few action potentials is challenging in behaving mice labeled with the GCaMP6 sensor, any improvement in SNR lowers the detection threshold. The higher SNR achieved here therefore also explains the relatively high activation ratio in our work.

      In the case of asynchronous activity patterns, individual small neuropil structures contribute negligibly to somatic signals because of the large volume ratio between a soma and a given small neuropil structure; this minimizes the error in the ∆F/F calculation of somatic responses. However, reinforcement, arousal, and running can generate highly synchronous neuronal activity, which can synchronize the neuropil activity around a given soma and therefore effectively and systematically modulate the somatic ∆F/F responses. To avoid this error, we used a high-NA objective with adequate neuropil resolution and combined it with motion correction. The high NA also reduced the total scanning volume to about 689 µm × 639 µm × 580 µm and therefore limited the maximum number of VIP cells that could be recorded. It is also possible to use a low-NA objective with a much larger FOV and scanning volume and record over 1000 VIP cells, but the axial extent of the PSF is inversely proportional to the square of the NA of the objective, so neuropil resolution would be at least partially lost. In summary, by using the high-NA Olympus objective we maximized the two-photon resolution, which, in combination with offline motion-artifact elimination, allowed precise recording of somatic signals without any neuropil contamination; this provided correct activation ratio values.
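      The NA trade-off can be made concrete with the scaling stated above. The sketch assumes the simple paraxial relation (axial PSF extent proportional to 1/NA², with wavelength and refractive index held fixed so that they cancel in the ratio); the function name is hypothetical, and the NA values in the usage note are illustrative, not those of the objectives used in the study.

```python
def relative_axial_extent(na_low: float, na_high: float) -> float:
    """How many times longer the axial PSF is with the lower-NA objective.

    Uses the paraxial scaling axial_extent ∝ 1 / NA**2; wavelength and
    refractive index are assumed identical for both objectives and cancel.
    """
    if not (0 < na_low <= na_high):
        raise ValueError("expect 0 < na_low <= na_high")
    return (na_high / na_low) ** 2
```

      For example, `relative_axial_extent(0.5, 1.0)` returns 4.0: halving the NA roughly quadruples the axial extent of the PSF, enlarging the neuropil volume that contributes to each somatic ROI.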

      Even though this is not mentioned at all, it certainly appears possible that the acousto-optical scanning emits audible noise. In this case, it would be good to know the frequency range and level of this background noise, whether there are auditory responses to the scanning itself, and whether it interferes with the performance of the animals in the auditory task in any way. If this is not the case, this should probably simply be mentioned for non-experts.

      While the name “acousto-optical deflector” may suggest acoustic noise, these devices are driven in the 55-120 MHz range, about three orders of magnitude above the upper frequency limit of mouse hearing, so mice cannot hear them. Moreover, we developed water-cooled AODs ten years ago, so ventilators are not required either, and AOD-based scanning can therefore be performed with zero noise emission. In contrast, galvo, resonant, and piezo scanners operate in the kHz range, in the middle of the hearing range of mice. Furthermore, these scanners cannot be operated in a vacuum and sit just a few tens of centimeters away from the mouse, so their acoustic noise cannot be canceled, only partially masked with white noise. We thank the reviewer for the helpful comment and have added one sentence about the absence of acoustic noise during acousto-optical scanning:

      “The deflectors are driven in the 55-120 MHz frequency range, therefore the noise emitted does not interfere with the auditory cues, as mice can’t hear it. This, in combination with the water cooling of the deflectors, makes the AOD-based scanning the quietest technology for in-vivo imaging.”

      The authors show a strong correlation between task performance (hit rate) and the response to the auditory cue on hit trials. Were there any other significant correlations between VIP cells' responses in other trial types and behavior? Were reinforcer responses correlated with behavioral variables at all?

      We have not found any remarkable correlations between VIP cell activity and behavioral variables except the one mentioned above.

      For example, the correlation between discrimination rate (hit rate/FA rate) and tone-evoked ∆F/F in Hit trials was not significant (R²=0.03, F=0.49, p=0.69), and neither were those of hit rate vs. tone-evoked ∆F/F in FA trials (R²=0.19, F=3.8, p=0.07) or discrimination rate vs. tone-evoked ∆F/F in FA trials (R²=0.07, F=1.1, p=0.31).
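      The R² and F values above come from simple linear regressions of one behavioral measure against tone-evoked ∆F/F. A dependency-free sketch of how R² and the F statistic (with 1 and n − 2 degrees of freedom) relate for such a fit is given below. It is illustrative only; the p-value would additionally require the F distribution, for example `scipy.stats.f.sf(F, 1, n - 2)`.

```python
def simple_regression_stats(x, y):
    """R^2 and F statistic (df = 1, n - 2) for an ordinary least-squares fit y ~ a + b*x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired samples with n >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    r2 = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    if r2 >= 1.0:
        return r2, float("inf")  # perfect fit: F is unbounded
    return r2, r2 * (n - 2) / (1.0 - r2)
```

      For instance, `simple_regression_stats([1, 2, 3, 4], [1, 3, 2, 4])` gives R² = 0.64 and F ≈ 3.56.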

    1. Author Response

      Reviewer #2 (Public Review):

      The manuscript by Carrasquilla and colleagues applied Mendelian Randomization (MR) techniques to study causal relationship of physical activity and obesity. Their results support the causal effects of physical activity on obesity, and bi-directional causal effects of sedentary time and obesity. One strength of this work is the use of CAUSE, a recently developed MR method that is robust to common violations of MR assumptions. The conclusion reached could potentially have a large impact on an important public health problem.

      Major comments:

      (1) While the effect of physical activity on obesity is in line with earlier studies, the finding that BMI has a causal effect on sedendary time is somewhat unexpected. In particular, the authors found this effect only with CAUSE, but the evidence from other MR methods do not reach statistical significance cutoff. The strength of CAUSE is more about the control of false positive, instead of high power. In general, the power of CAUSE is lower than the simple IVW method. This is also the case in this setting, of high power of exposure (BMI) but lower power of outcome (sedentary time) - see Fig. 2B of the CAUSE paper.

      It does not necessarily mean that the results are wrong. It is possible, for example, that by better modeling pleiotropic effects, CAUSE better captures the causal effects and has higher power. Nevertheless, it would be helpful to better understand why CAUSE gives high statistical significance while the other methods do not. Two suggestions here:

      (a) It is useful to visualize the MR analysis with scatter plot of the effect sizes of variants on the exposure (BMI) and outcome (sedentary time). In the plot, the variants can be colored by their contribution to the CAUSE statistics, see Fig. 4 of the CAUSE paper. This plot would help show, for example, whether there are outlier variants; or whether the results are largely driven by just a small number of variants.

      We agree and have now added a scatter plot of the expected log pointwise posterior density (ELPD) contributions of each variant to BMI and sedentary time, and the contributions of the variants to selecting either the causal model or the shared model (Figure 2-figure supplement 1 panel A). We identified one clear outlier variant (red circle) that we thus decided to remove before re-running the CAUSE analysis (panel B). We found that the causal effect of BMI on sedentary time remained of similar magnitude before and after the removal of this outlier variant (beta=0.13, P=6×10⁻⁴ and beta=0.13, P=3×10⁻⁵, respectively) (Supplementary File 1 and 2).

      We have added a paragraph in the Results section to describe these new findings:

      Lines 204-210: “We checked for outlier variants by producing a scatter plot of expected log pointwise posterior density (ELPD) contributions of the variants to BMI and sedentary time (Supplementary File 1), identifying one clear outlier variant (rs6567160 in MC4R gene) (Figure 2, Appendix 1—figure 2). However, the causal effect of BMI on sedentary time remained consistent even after removing this outlier variant from the CAUSE analysis (Supplementary File 1 and 2).”

      (b) CAUSE is susceptible to false positives when the value of q, a measure of the proportion of shared variants, is high. The authors stated that q is about 0.2, which is pretty small. However, it is unclear if this is q under the causal model or the sharing model. If q is small under the sharing model, the result would be quite convincing. This needs to be clarified.

      We thank the reviewer for a very relevant question. We have now clarified in the manuscript that all of the reported q values (~0.2) were under the causal model (lines 202-203). We applied the strict parameters for the priors in CAUSE in all of our analyses, which leads to high shared model q values (q=0.7-0.9). To examine whether our bidirectional causal findings for BMI and sedentary time may represent false positive results, we performed a further analysis to identify and exclude outlier variants, as described in our response to Question 7. That is, we produced a scatter plot of expected log pointwise posterior density (ELPD) contributions of each variant to BMI and sedentary time, and the contributions of the variants to selecting either the causal model or the shared model (Supplementary Figure 2 panel A, shown above). We identified one clear outlier variant (red circle), which we removed (panel B), but the magnitude of the causal estimates was not affected by the exclusion of the variant (Supplementary File 1 and 2).

      (2) Given the concern above, it may be helpful to strengthen the results using an additional strategy. Note that the biggest worry with the BMI-sedentary time relation is that the two traits are both affected by an unobserved heritable factor. This hidden factor likely affects some behavioral component, so it most likely acts through the brain. On the other hand, BMI may involve multiple tissue types, e.g. adipose tissue. So the idea is: suppose we can partition BMI variants by tissue, e.g. those acting via the brain and those acting via adipose tissue; then we can test MR using only the BMI variants acting in a certain tissue. If there is a causal effect of BMI on sedentary time, we expect to see similar results from MR with different tissues. If the two are affected by the hidden factor, then the MR analysis using BMI variants acting in adipose tissue would not show significant results.

      While I think this strategy is feasible conceptually, I realize that it may be difficult to implement. BMI heritability was found to be primarily enriched in brain regulatory elements [PMID:29632380], so even if there are other tissue components, their contribution may be small. One paper does report that BMI is enriched in CD19 cells [PMID: 28892062], though. A second challenge is to figure out the tissue of origin of GWAS variants. This probably requires fine-mapping analysis to pinpoint causal variants and overlapping them with tissue-specific enhancer maps, which is not a small task. So I'd strongly encourage the authors to pursue some analysis along this line, but it would be understandable if the results of this analysis are negative.

      We thank the reviewer for a very interesting point to address. We cannot exclude the possibility of an unobserved heritable factor acting through the brain, and tissue-specific MR analyses would be one possible way to investigate this possibility. However, we agree with the reviewer that partitioning BMI variants into different tissues is not currently feasible as the causal tissues and cell types of the GWAS variants are not known. Nevertheless, we have now implemented a new analysis where we tried to stratify genetic variants into “brain-enriched” and “adipose tissue-enriched” groups, using a simple method based on the genetic variants’ effect sizes on BMI and body fat percentage.

      Our rationale for stratifying variants by comparing their effect sizes on BMI and body fat percentage is the following:

      BMI is calculated based on body weight and height (kg/m²) and it thus does not distinguish between body fat mass and body lean mass. Body fat percentage is calculated by dividing body fat mass by body weight (fat mass / weight × 100%) and it thus distinguishes body fat mass from body lean mass. Thus, higher BMI may reflect both increased fat mass and increased lean mass, whereas higher body fat percentage reflects that fat mass has increased more than lean mass.

      If a genetic variant influences BMI through the CNS control of energy balance, its effect on body fat mass and body lean mass would be expected to follow the usual correlation between the traits in the population, where higher fat mass is strongly correlated with higher lean mass. In such a scenario, the variant would show a larger standardized effect size on BMI than on body fat percentage. If a genetic variant more specifically affects adipose tissue, the variant would be expected to have a more specific effect on fat mass and less effect on lean mass. In such a scenario, the variant would show a larger standardized effect size on body fat percentage than on BMI.

      We therefore stratified BMI variants into brain-specific and adipose tissue-specific variants by comparing their standardized effect sizes on BMI and body fat percentage. Of the 12,790 variants included in the BMI-sedentary time CAUSE analysis, 12,266 had stronger effects on BMI than on body fat percentage and were thus classified as “brain-specific”. The remaining 524 variants had stronger effects on body fat percentage than on BMI (“adipose tissue-specific”). To assess whether the stratification of the variants led to biologically meaningful groups, we performed DEPICT tissue-enrichment analyses. The analyses showed that the genes expressed near the “brain-specific” variants were enriched in the CNS (figure below, panel A), whereas the genes expressed near the “adipose tissue-specific” variants did not reach significant enrichment in any tissue but showed the strongest evidence of being linked to adipocytes and adipose tissue (figure below, panel B).

      Figure legend: DEPICT cell, tissue and system enrichment bar plots for BMI-sedentary time analysis.
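      The rationale and stratification rule described above can be sketched as follows. The first part uses hypothetical body compositions to show that a weight gain with an unchanged fat share raises BMI but leaves body fat percentage unchanged, whereas a fat-only gain raises body fat percentage as well. The second part classifies variants by comparing the magnitudes of their effects on the two traits; it assumes the per-allele effects are already on a standardized (SD) scale and puts ties, which the response does not address, in the adipose group. All names and numbers are illustrative.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def body_fat_percentage(fat_mass_kg, weight_kg):
    """Fat mass as a percentage of total body weight."""
    return fat_mass_kg / weight_kg * 100.0


def stratify_variants(variants):
    """Split (variant_id, beta_bmi_sd, beta_body_fat_sd) tuples into two groups.

    'Brain-enriched' when the standardized effect on BMI is larger in magnitude
    than the effect on body fat percentage; 'adipose-enriched' otherwise
    (ties included, as an arbitrary choice for this sketch).
    """
    brain, adipose = [], []
    for variant_id, beta_bmi, beta_bf in variants:
        if abs(beta_bmi) > abs(beta_bf):
            brain.append(variant_id)
        else:
            adipose.append(variant_id)
    return brain, adipose


# Hypothetical person: 1.80 m, 81 kg, of which 16.2 kg fat (20% body fat)
height = 1.80
baseline = (bmi(81.0, height), body_fat_percentage(16.2, 81.0))
# +9 kg with the same 20% fat share: BMI rises, body fat % is unchanged
proportional = (bmi(90.0, height), body_fat_percentage(18.0, 90.0))
# +9 kg gained entirely as fat: same BMI as above, body fat % rises
fat_only = (bmi(90.0, height), body_fat_percentage(25.2, 90.0))

brain, adipose = stratify_variants(
    [("var1", 0.030, 0.010), ("var2", 0.010, 0.020), ("var3", -0.040, -0.020)]
)
print(baseline, proportional, fat_only, brain, adipose)
```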

      Having established that the two groups of genetic variants likely represent tissue-specific groups, we re-estimated the causal relationship between BMI and sedentary time using CAUSE, separately for the two groups of variants. We found that the 12,266 “brain-specific” genetic variants showed a significant causal effect on sedentary time (P=0.003), but the effect was attenuated compared to the CAUSE analysis in which all 12,790 variants (i.e. also including the 524 “adipose tissue-specific” variants) were included (P=6.3×10⁻⁴). The statistical power was much more limited for the “adipose tissue-specific” variants, and we did not find a statistically significant causal relationship between BMI and sedentary time using the 524 “adipose tissue-specific” variants only (P=0.19). However, the direction of the effect suggested the possibility of a causal effect if a stronger genetic instrument were available. Taken together, our analyses suggest that both brain-enriched and adipose tissue-enriched genetic variants are likely to show a causal relationship between BMI and sedentary time, which would suggest that the causal relationship between BMI and sedentary time is unlikely to be driven by an unobserved heritable factor.

      Minor comments

      The term "causally associated" is confusing, e.g. in l32. If it is causal, then use the term "causal".

      We have now changed the term “causally associated” to “causal” throughout the manuscript.

      Reviewer #3 (Public Review):

      Given previous reports of an observational relationship between physical inactivity and obesity, Carrasquilla and colleagues aimed to investigate the causal relationship between these traits and establish the direction of effect using Mendelian Randomization. In doing so, the authors report strong evidence of a bidirectional causal relationship between sedentary time and BMI, where genetic liability for longer sedentary time increases BMI, and genetic liability for higher BMI causally increases sedentary time. The authors also give evidence of higher moderate and vigorous physical activity causally reducing BMI. However, they do note that in the reverse direction there was evidence of horizontal pleiotropy, where higher BMI causally influences lower levels of physical activity through alternative pathways.

      The authors have used a number of methods to investigate and address potential limiting factors of the study. A major strength of the study is the use of the CAUSE method. This allowed the authors to investigate all exposures of interest, in spite of a low number of suitable genetic instruments (associated SNPs with P-value < 5E-08) being available, which may not have been possible with the use of the more conventional MR methods alone. The authors were also able to overcome sample overlap with this method, and hence obtain strong causal estimates for the study. The authors have compared causal estimates obtained from other MR methods, including IVW, MR Egger, the weighted median and weighted mode methods. In doing so, they were able to demonstrate consistent directions of effects for most causal estimates when comparing with those obtained from the CAUSE method. This helps to increase confidence in the results obtained and supports the conclusions made. This study is limited by the fact that the findings are not generalizable across different age-groups or populations, although the authors do state that similar results have been found in childhood studies. As the authors also mention, due to the nature of the BMI genetic instruments used, the findings of this study can only inform on the lifetime impact of higher BMI, and not the effect of a short-term intervention.

      The findings of this study will be of interest to those in the field of public health, and support current guidelines for the management of obesity.
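      Because several comments compare CAUSE with the conventional inverse-variance weighted (IVW) estimator, a minimal sketch of the fixed-effect IVW estimate may help. It is equivalent to a weighted regression of variant-outcome effects on variant-exposure effects through the origin, with weights 1/se² of the outcome effects (first-order weights). The function name is hypothetical, and this is not the CAUSE model, which additionally models a shared (correlated pleiotropic) factor.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and standard error.

    Weighted regression of beta_outcome on beta_exposure through the origin,
    with weights 1 / se_outcome**2 (first-order weights).
    """
    num = 0.0
    den = 0.0
    for bx, by, sy in zip(beta_exposure, beta_outcome, se_outcome):
        w = 1.0 / sy ** 2
        num += w * bx * by
        den += w * bx ** 2
    beta = num / den
    se = math.sqrt(1.0 / den)
    return beta, se


# Toy example: every variant has a Wald ratio (beta_outcome / beta_exposure) of 0.5
beta, se = ivw_estimate([0.10, 0.20], [0.05, 0.10], [0.01, 0.01])
print(beta, se)
```

      With a single variant this reduces to the Wald ratio; MR Egger, by contrast, adds an intercept to the same weighted regression to absorb directional pleiotropy.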

      We thank the Reviewer for the valuable feedback and insights. We agree that the lack of generalizability of the findings across age groups and populations is an important limitation. We have now mentioned this in lines 341-342 of the manuscript:

      “The present study is also limited in the fact that the findings are not generalizable across different age-groups or populations.”

    1. Author Response

      Reviewer #1 (Public Review):

      As far as I can tell, the inputs to the model are raw diffusion data plus a couple of maps extracted from T2 and MT data. While this is OK for the kind of models used here, it means that the trained networks will not generalise to other diffusion protocols (e.g. with different bvecs). This greatly reduces the usefulness of this model and hinders transfer to, e.g., human data. Why not use summary measures from the data as inputs? There are a number of rotationally invariant summary measures that one can extract. I suspect that the first layers of the network may be performing operations such as averaging that are akin to calculating summary measures, so the authors should consider doing that before feeding the data to the network.

      We agree with the reviewer that using summary measures will make the tool less dependent on particular imaging protocols and more translatable than using raw data as inputs. We have experimented with using a set of five summary measures (T2, magnetization transfer ratio (MTR), mean diffusivity, mean kurtosis, and fractional anisotropy) as inputs. The predictions based on these summary measures, although less accurate than predictions based on raw data in terms of RMSE and SSIM (Figure 2A), still outperformed polynomial fitting up to 2nd order. The result, while promising, also highlights the need for finding a more comprehensive collection of summary measures that match the information available in the raw data. Further experiments with existing or new summary measures may lead to improved performance.
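      As an illustration of the rotationally invariant summary measures discussed here, the sketch below computes mean diffusivity (MD) and fractional anisotropy (FA) from the three eigenvalues of a fitted diffusion tensor, and a magnetization transfer ratio (MTR) from signals acquired without and with saturation. It assumes the tensor has already been fitted; mean kurtosis requires a diffusional kurtosis fit and is omitted. Function names are hypothetical, and the authors' exact definitions (e.g. whether MTR is expressed in percent) are not stated.

```python
import math


def mean_diffusivity(l1, l2, l3):
    """Mean of the three diffusion tensor eigenvalues."""
    return (l1 + l2 + l3) / 3.0


def fractional_anisotropy(l1, l2, l3):
    """FA = sqrt(3/2) * ||lambda - MD|| / ||lambda||, from 0 (isotropic) to 1."""
    md = mean_diffusivity(l1, l2, l3)
    dev = math.sqrt((l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2)
    norm = math.sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2)
    return math.sqrt(1.5) * dev / norm if norm > 0 else 0.0


def magnetization_transfer_ratio(m0, msat):
    """MTR = (M0 - Msat) / M0, from unsaturated and saturated signals."""
    return (m0 - msat) / m0


# Eigenvalues in um^2/ms for a hypothetical white-matter voxel
print(mean_diffusivity(1.7, 0.3, 0.3), fractional_anisotropy(1.7, 0.3, 0.3))
```

      Because MD and FA depend only on the eigenvalues, they do not change when the same tissue is sampled with a different set of gradient directions, which is what makes them candidates for protocol-independent inputs.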

      The noise sensitivity analysis is misleading. The authors add noise to each channel and examine the output to find which inputs are important. They find that T2/MT are more important for the prediction of the AF data, but the majority of the channels are diffusion data, where there is a lot of redundant information across channels. So it is not surprising that these channels are more robust to noise. In general, the authors make the point that they can not only predict histology but also interpret their model, but I am not sure what to make of either the t-SNE plots or the rose plots. I am not sure that these plots help with understanding the model and the contribution of the different modalities to the predictions.

      We agree that there is redundant information across channels, especially among diffusion MRI data. In the revised manuscript, we focused on using the information derived from noise-perturbation experiments to rank the inputs in order to accelerate image acquisition instead of interpreting the model. We removed the figure showing t-SNE plots with noisy inputs because it does not provide additional information.
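      A minimal sketch of a noise-perturbation ranking, assuming a trained model that maps one voxel's input vector to a scalar prediction: each channel in turn receives Gaussian noise, and channels are ranked by how much the prediction moves (RMSE against the clean prediction). The noise level, repeat count, function names and toy linear model are assumptions, not the authors' settings. As the reviewer notes, a channel whose information is duplicated in other channels can look unimportant under this test, so the ranking reflects sensitivity rather than unique information content.

```python
import math
import random


def rmse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def rank_inputs_by_noise_sensitivity(model, samples, noise_sd=0.1, n_repeats=20, seed=0):
    """Return channel indices sorted from most to least noise-sensitive."""
    rng = random.Random(seed)
    clean = [model(s) for s in samples]
    n_channels = len(samples[0])
    scores = []
    for c in range(n_channels):
        total = 0.0
        for _ in range(n_repeats):
            noisy_preds = []
            for s in samples:
                perturbed = list(s)
                perturbed[c] += rng.gauss(0.0, noise_sd)  # perturb channel c only
                noisy_preds.append(model(perturbed))
            total += rmse(noisy_preds, clean)
        scores.append((total / n_repeats, c))
    scores.sort(reverse=True)
    return [c for _, c in scores]


def toy_model(v):
    # Hypothetical stand-in for a trained network: channel 2 is ignored
    return 3.0 * v[0] + 0.5 * v[1] + 0.0 * v[2]


samples = [[0.1 * i, 0.2 * i, 0.3 * i] for i in range(10)]
ranking = rank_inputs_by_noise_sensitivity(toy_model, samples)
print(ranking)
```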

      Is deep learning really required here? The authors are using a super deep network, mostly doing combinations of modalities. Is the mapping really highly nonlinear? How does it compare with a linear or close-to-linear mapping (e.g. regression of the output onto the inputs and quadratic combinations of the inputs)? How many neurons are actually doing any work and how many are silent (this can happen a lot with ReLU nonlinearities)? In general, not much is done to convince the reader that such a complex model is needed and whether a much simpler regression approach can do the job.
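      The question about silent units can be checked empirically by passing a batch of inputs through a layer and counting ReLU units that never activate. The sketch below does this for a single dense layer, and also shows a generic residual block (two weight layers with a ReLU in between plus an identity skip) together with the layer count implied by the architecture described in the response (2 layers per block plus 4 input/output layers). It is a simplified, fully connected stand-in for convolutional blocks; the names, weights and ReLU placement are assumptions.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def matvec(w, v):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def residual_block(x, w1, w2):
    """y = x + W2 * relu(W1 * x): two weight layers plus an identity skip."""
    h = relu(matvec(w1, x))
    return [xi + fi for xi, fi in zip(x, matvec(w2, h))]


def silent_fraction(w1, inputs):
    """Fraction of ReLU units in relu(W1 * x) that are zero for every input."""
    n_units = len(w1)
    ever_active = [False] * n_units
    for x in inputs:
        for i, a in enumerate(relu(matvec(w1, x))):
            if a > 0.0:
                ever_active[i] = True
    return ever_active.count(False) / n_units


def count_layers(n_blocks, layers_per_block=2, io_layers=4):
    """Total depth for a stack of residual blocks plus input/output layers."""
    return n_blocks * layers_per_block + io_layers


w1 = [[1.0, 0.0], [-1.0, 0.0]]  # second unit is negative for positive inputs
batch = [[1.0, 2.0], [0.5, 0.1]]
print(silent_fraction(w1, batch), count_layers(30), count_layers(10))
```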

      The deep learning network used in the study is indeed quite deep, and there are two main reasons for choosing it over simpler approaches.

      The primary reason to pick the deep learning approach is to accommodate complex relationships between MRI and histology signals. In the revised Figure 2A-B, we have demonstrated that the network can produce better predictions of tissue auto-fluorescence (AF) signals than 1st and 2nd order polynomial fitting. For example, the predicted AF image based on 5 input MR parameters shared more visual resemblance with the reference AF image than images generated by 1st and 2nd order polynomial fittings, as confirmed by the RMSE and SSIM values. The training curves shown in Fig. R1 below demonstrate that, for learning the relationship between MRI and AF signals, at least 10 residual blocks (~24 layers) are needed. Later, when learning the relationship between MRI and Nissl signals, 30 residual blocks (~64 layers) were needed, as the relationship between MRI and Nissl signals appears less straightforward than the relationship between MRI and AF/MBP/NF signals, which have a strong myelin component. In the revised manuscript, we have clarified this point, and the provided toolbox allows users to select the number of residual blocks based on their applications.

      Fig. R1: Training curves of MRH-AF with number of residual blocks ranging from 1 to 30 showing decreasing RMSEs with increasing iterations. The curves in the red rectangular box on the right are enlarged to compare the RMSE values. The training curves of 10 and 30 residual blocks are comparable, both converged with lower RMSE values than the results with 1 and 5 residual blocks.

      In addition, the deep learning approach can better accommodate residual mismatches between co-registered histology and MRI than polynomial fitting. Even after careful co-registration, residual mismatches between histology and MRI data can still be found, which pose a challenge for polynomial fittings. We have tested the effect of mismatch by introducing voxel displacements to perfectly co-registered diffusion MRI datasets and demonstrated that the deep learning network used in this study can handle the mismatches (Figure 1 – figure supplement 1).
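      For comparison with the network, the polynomial baselines mentioned above can be sketched as ordinary least squares on per-voxel 1st- or 2nd-order polynomial terms of the MR inputs, scored by RMSE. This is a generic, pure-Python illustration with hypothetical names (a practical version would use a numerical library), not the authors' fitting code. A degree-1 fit on several MR measures is the same kind of linear-fitting baseline discussed for Figure 3.

```python
import itertools
import math


def poly_features(x, degree):
    """Intercept, linear terms and (for degree 2) all pairwise products incl. squares."""
    feats = [1.0] + [float(v) for v in x]
    if degree >= 2:
        feats += [x[i] * x[j] for i, j in itertools.combinations_with_replacement(range(len(x)), 2)]
    return feats


def solve_linear_system(a, b):
    """Solve a square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):
        coef[i] = (m[i][n] - sum(m[i][k] * coef[k] for k in range(i + 1, n))) / m[i][i]
    return coef


def fit_polynomial(xs, ys, degree):
    """Least-squares fit via the normal equations (fine for small, well-conditioned problems)."""
    rows = [poly_features(x, degree) for x in xs]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve_linear_system(ata, aty)


def predict(coef, x, degree):
    return sum(c * f for c, f in zip(coef, poly_features(x, degree)))


def rmse(pred, ref):
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


# Toy voxels with two MR inputs; the target contains an interaction term
xs = [(float(a), float(b)) for a in range(4) for b in range(4)]
ys = [1.0 + 2.0 * a - b + 0.5 * a * b for a, b in xs]
for degree in (1, 2):
    coef = fit_polynomial(xs, ys, degree)
    print(degree, rmse([predict(coef, x, degree) for x in xs], ys))
```

      The degree-1 fit cannot represent the interaction term and leaves a residual error, while the degree-2 fit recovers it exactly; a network is only worth its complexity if it beats such baselines on held-out data.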

      Relatedly, the comparison between the MRH approach and some standard measures such as FA, MD, and MTR is unfair. Their network is trained to match the histology data, but the standard measures are not. How does the MRH approach compare to e.g. simply combining FA/MD/MTR to map to histology? This to me would be a more relevant comparison.

      This is a good idea. We have added maps generated by linear fitting of five MR measures (T2, MTR, FA, MD, and MK) to MBP for a proper comparison. Please see the revised Figure 3A-B. The MRH approach provided better prediction than linear fitting of the five MR measures, as shown by the ROC curves in Figure 3C.
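      The ROC comparison can be summarised by the area under the curve (AUC). A minimal sketch, assuming voxel-wise binary labels obtained by thresholding the reference histology (the thresholding rule is not stated here), computes the AUC as the probability that a randomly chosen positive voxel scores higher than a randomly chosen negative one, which equals the area under the empirical ROC curve. The function name is hypothetical; the quadratic loop is for clarity, and a rank-based version scales better to whole images.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Counts, over all (positive, negative) pairs, how often the positive
    scores higher; ties count one half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted values and thresholded reference labels for four voxels
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```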

      • Not clear if there are 64 layers or 64 residual blocks. Also, is the convolution only doing something across channels? i.e. do we get the same performance by simply averaging the 3x3 voxels?

      We have revised the paragraph on the network architecture to clarify this point in the Figure 1 caption as well as in the Methods section. We used 30 residual blocks, each consisting of 2 layers. There are an additional 4 layers at the input and output ends, giving 64 layers in total.

      The convolution mostly works across channels, which is what we intended as we are interested in finding the local relationship between multiple MRI contrasts and histology. With inputs from modified 3x3 patches, in which all voxels were assigned the same values as the center voxel, the predictions of MRH-AF did not show apparent loss in sensitivity and specificity, and the voxel-wise correlation with reference AF data remained strong (See Fig. R2 below). We think this is an important piece of information and added it as Figure 1 – figure supplement 3. Averaging the 3x3 voxels in each patch produced similar results.

      Fig. R2: Evaluation of MRH-AF results generated using modified 3x3 patches with 9 voxels assigned the same MR signals as the center voxel as inputs. A: Visual inspection showed no apparent differences between results generated using original patches and those using modified patches. B: ROC analysis showed a slight decrease in AUC for the MRH-AF results generated using modified patches (dashed purple curve) compared to the original (solid black curve). C: Correlation between MRH-AF using modified patches as inputs and reference AF signals (purple open circles) was slightly lower than the original (black open circles).
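      The two input manipulations described in this response can be sketched on a single 3x3 patch stored as rows of per-voxel channel vectors (an assumed layout): one copies the centre voxel's channels to all nine positions, the other replaces every voxel with the channel-wise patch mean. Either way, spatial variation inside the patch is removed, so any remaining predictive power comes from the combination of channels. Function names are hypothetical.

```python
def centre_replicated_patch(patch):
    """Return a 3x3 patch in which every voxel carries the centre voxel's channels."""
    centre = patch[1][1]
    return [[list(centre) for _ in range(3)] for _ in range(3)]


def averaged_patch(patch):
    """Return a 3x3 patch in which every voxel carries the channel-wise patch mean."""
    n_channels = len(patch[1][1])
    mean = [
        sum(patch[r][c][k] for r in range(3) for c in range(3)) / 9.0
        for k in range(n_channels)
    ]
    return [[list(mean) for _ in range(3)] for _ in range(3)]


# Hypothetical single-channel patch
patch = [[[1.0], [2.0], [3.0]], [[4.0], [5.0], [6.0]], [[7.0], [8.0], [10.0]]]
centre_version = centre_replicated_patch(patch)
mean_version = averaged_patch(patch)
print(centre_version[0][0], mean_version[0][0])
```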

      The result in the shiverer mouse is most impressive. Were the shiverer mice data included in the training? If not, this should be mentioned/highlighted as it is very cool.

      Data from shiverer mice and littermate controls were not included in the training. We have clarified this point in the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      This study used GWAS and RNAseq data of TCGA to show a link between telomere length and lung cancer. Authors identified novel susceptibility loci that are associated with lung adenocarcinoma risk. They showed that longer telomeres were associated with being a female nonsmoker and early-stage cancer with a signature of cell proliferation, genome stability, and telomerase activity.

      Major comments:

      1) It is not clear how the signatures captured by PC2 are specific to lung adenocarcinoma compared to other lung cancer subtypes. In other words, why is the association with long telomeres specific to lung adenocarcinoma?

      We thank the reviewer for raising this point (similarly mentioned by reviewer #2). Indeed, it is unclear why genetically predicted LTL appears more relevant to lung adenocarcinoma. We used a LASSO approach to select important features of PC2 in lung adenocarcinoma and inferred PC2 in lung squamous cell carcinoma tumours to better explore the differences between histological subtypes. The new results are presented in Figure 5 and described in the Methods and Results sections. In addition, we have expanded upon this point in the discussion with the following paragraph (page 11, lines 229-248):

      ‘An explanation for why long LTL was associated with increased risk of lung cancer might be that individuals with longer telomeres have lower rates of telomere attrition compared to individuals with shorter telomeres. Given a very large population of histologically normal cells, even a very small difference in telomere attrition would change the probability that a given cell is able to escape the telomere-mediated cell death pathways (24). Such inter-individual differences could suffice to explain the modest lung cancer risk observed in our MR analyses. However, it is not clear why longer TL would be more relevant to lung adenocarcinoma than to other lung cancer subtypes. A clue may come from our observation that longer LTL is related to genomically stable lung tumours (such as lung adenocarcinomas in never smokers and tumours with lower proliferation rates) but not to genomically unstable lung tumours (such as heavy-smoking-related, highly proliferating lung squamous cell carcinomas). One possible hypothesis is that histologically normal cells exposed to highly genotoxic compounds, such as those in tobacco smoke, might require an intrinsic activation of telomere length maintenance at early steps of carcinogenesis in order to survive, and therefore genetic differences in telomere length are less relevant in these cells. By contrast, in more genomically stable lung tumours, where the TL attrition rate is more modest, differences in TL may be more relevant, potentially explaining the heterogeneity in genetic effects between lung tumours (Figure 2). Alternatively, the cell of origin may also differ: lung adenocarcinoma is postulated to derive mostly from alveolar type 2 cells and squamous cell carcinoma from bronchiolar epithelial cells (19), possibly suggesting that LTL might be more relevant to the former.’
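      The LASSO feature selection mentioned in this response can be illustrated with a minimal coordinate-descent sketch, which minimises (1/2n)·||y − Xb||² + λ·||b||₁ and sets weak coefficients exactly to zero. It assumes centred inputs without an intercept and columns that are not all zero; the function names, λ and data are hypothetical, and the authors' actual implementation (e.g. how λ was chosen) is not described here.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning exactly 0 inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """LASSO via cyclic coordinate descent.

    X: list of n rows with p centred feature values; y: list of n centred
    responses. Minimises (1 / (2n)) * sum of squared residuals + lam * sum |b_j|.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that leaves b_j out
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    return b


# Toy data: y depends only on the first of two orthogonal features
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
coef = lasso_coordinate_descent(X, y, lam=0.5)
print(coef)
```

      The informative feature is kept but shrunk by λ, and the uninformative one is set exactly to zero, which is what makes LASSO usable for selecting a small set of genes that define a component.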

      2) The manuscript lacks specific comparisons of gene expression changes across lung cancer subtypes for identified genes such as telomerase, since all the data are presented as associations embedded within PCs.

      The genes associated with telomere maintenance, such as TERT and TERC, are expressed at very low levels in these tumours (Barthel et al. NG 2017). In this context, no sample within the TCGA lung cohorts (TCGA-LUSC, TCGA-LUAD) has more than 5 normalised RNA-sequencing read counts for TERT. As such, we have not explored differences in individual telomere-related genes. Nevertheless, we explored an inferred telomerase activity gene signature, developed by Barthel et al., in the context of lung adenocarcinoma tumours. We have added a note in the Results section to explain why we did not directly test TERT/TERC expression (page 9, lines 184-187).

      3) It is not clear how novel the findings are, given that most of these observations, i.e. the genetic component of the association between telomere length and cancer, have been made previously.

      Others, including ourselves, have studied TL and lung cancer. We have built on that work using the most up-to-date TL genetic instrument and the largest lung cancer study available. In addition, we provide insights into the possible mechanisms by which telomere length might affect lung adenocarcinoma development. Using colocalisation analyses, we report novel shared genetic loci between telomere length and lung adenocarcinoma (MPHOSPH6, PRPF6, and POLI), genes/loci that have not previously been linked to lung adenocarcinoma susceptibility. For the MPHOSPH6 locus, we showed that the risk allele of rs2303262 (a missense variant annotated to the MPHOSPH6 gene) colocalised with increased lung adenocarcinoma risk, lower lung function (FEV1 and FVC), and increased MPHOSPH6 gene expression in lung, as highlighted in the discussion section of the revised manuscript.

      In addition, we used a PRS analysis to identify a gene expression component associated with genetically predicted telomere length in lung adenocarcinoma but not in the squamous cell carcinoma subtype. The aspects of this gene expression component associated with longer telomere length are also associated with molecular characteristics related to genome stability (lower accumulation of DNA damage, fewer copy number alterations, and lower proliferation rates), with being female, with early-stage tumours, and with never smoking, an interesting but not completely understood lung cancer stratum. As far as we are aware, this is the first report of an association between a PRS for an etiological factor, such as telomere length, and a particular expression component in the tumour.

      We have adjusted the discussion section of the revised manuscript to further highlight the novel aspects.
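      For readers less familiar with polygenic risk scores (PRS), a minimal sketch: each individual's score is the sum of their allele dosages weighted by per-allele effect sizes, and two instruments can be compared by the Pearson correlation of the resulting scores across individuals (as in the comparison with the Codd et al. instrument in the response to Reviewer #2). Dosages, weights and function names below are made up; real pipelines also handle allele alignment, missing genotypes and scaling.

```python
import math


def polygenic_score(dosages, weights):
    """Sum over variants of allele dosage (0-2) times per-allele effect size."""
    if len(dosages) != len(weights):
        raise ValueError("one weight per variant is required")
    return sum(d * w for d, w in zip(dosages, weights))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical genotypes (rows: individuals, columns: three variants)
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2]]
weights_a = [0.05, 0.03, 0.02]  # instrument A
weights_b = [0.04, 0.03, 0.00]  # instrument B gives the third variant no weight
scores_a = [polygenic_score(g, weights_a) for g in genotypes]
scores_b = [polygenic_score(g, weights_b) for g in genotypes]
print(pearson(scores_a, scores_b))
```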

      Reviewer #2 (Public Review):

      The manuscript of Penha et al performs genetic correlation, Mendelian randomization (MR), and colocalization analyses to determine the role of genetically determined leukocyte telomere length (LTL) in susceptibility to lung cancer. They develop an instrument from the most recent published association study of LTL (Codd et al), which here is based on n=144 genetic variants, and use the largest association study of lung cancer (including ~29K cases and ~56K controls). They observed no significant genetic correlation between LTL and lung cancer; in MR, however, they observed a strong association that persisted after accounting for smoking status. They performed colocalization to identify a subset of loci where LTL and lung cancer risk signals coincided, mainly around TERT but also at other loci. They also utilized RNA-Seq data from TCGA lung adenocarcinoma, noting that a particular gene expression profile (identified by a PC analysis) seemed to correlate with LTL. This expression component was associated with some additional patient characteristics, genome stability, and telomerase activity.

      In general, the MR analysis was performed reasonably (with some suggestions and comments below), but it seems that most of this analysis has been performed, and the major observations made, in previous work. That said, the instrument is better powered and some sub-analyses are performed, which adds further robustness to this observation. While perhaps beyond the scope here, the mechanism by which longer LTL is associated with (lung) cancer seems like one of the key observations and is mechanistically interesting, but nothing is added to the discussion on this point to clarify or refute the previous speculations listed in the discussion (or in other work they cite).

      Some broad comments:

      1) The observation that lung adenocarcinoma carries the lion's share of risk from LTL (relative to other cancer subtypes) could be interesting but is not particularly highlighted. This could potentially be explored or discussed in more detail. Are there specific aspects of the biology of the substrata that could explain this (or lead to testable hypotheses)?

      We thank the reviewer for these comments. A similar point was raised by reviewer #1. Please see our response above, as well as the additional analysis described in Figure 5 that considers the differences by histological subtype.

      2) Given that LTL is genetically correlated (and MR evidence suggests possibly also causally related in some cases) with a range of traits (e.g., adiposity) that may also be associated with lung cancer, a larger genetic correlation analysis might be in order, followed by a larger set of multivariable MR (MVMR) analyses beyond smoking as a risk factor. Basically, can the observed relationship be explained by another trait (beyond smoking)? For example, there is previous MR literature on adiposity measures (BMI, WHR, or WHRadjBMI) and telomere length, plus literature on adiposity and lung cancer, and on smoking and BMI. A somewhat more comprehensive set of MVMR analyses within this space would elevate the significance and interpretation compared to previous literature.

      Indeed, there are important effects related to BMI and lung cancer (Zhou et al., 2021. Doi:10.1002/ijc.33292; Mariosa et al., 2022. Doi: 10.1093/jnci/djac061). We have tested their potential influence on our findings using MVMR, modelling LTL and BMI with a BMI genetic instrument of 755 SNPs obtained from UKBB (feature code: ukb-b-19953). This multivariable approach did not result in any meaningful changes in the associations between LTL and lung cancer risk.
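      The MVMR adjustment can be sketched as a multivariable IVW regression: variant-outcome effects are regressed, without an intercept and with weights 1/se², on the variant effects for both exposures (here LTL and BMI), giving a direct effect for each exposure. This is a generic two-exposure illustration with hypothetical names and toy numbers, not the authors' pipeline.

```python
def mvmr_ivw_two_exposures(beta_x1, beta_x2, beta_y, se_y):
    """Weighted least squares of beta_y on (beta_x1, beta_x2) through the origin.

    Returns the direct-effect estimates (theta_1, theta_2).
    """
    s11 = s12 = s22 = t1 = t2 = 0.0
    for x1, x2, y, s in zip(beta_x1, beta_x2, beta_y, se_y):
        w = 1.0 / s ** 2
        s11 += w * x1 * x1
        s12 += w * x1 * x2
        s22 += w * x2 * x2
        t1 += w * x1 * y
        t2 += w * x2 * y
    # Solve the 2x2 normal equations by Cramer's rule
    det = s11 * s22 - s12 * s12
    if abs(det) <= 1e-12 * s11 * s22:
        raise ValueError("exposure effects are (nearly) collinear")
    theta_1 = (s22 * t1 - s12 * t2) / det
    theta_2 = (s11 * t2 - s12 * t1) / det
    return theta_1, theta_2


# Toy data generated with direct effects 0.5 (exposure 1) and 0.2 (exposure 2)
bx1 = [0.10, 0.20, 0.00, 0.05]
bx2 = [0.00, 0.10, 0.30, 0.20]
by = [0.5 * a + 0.2 * b for a, b in zip(bx1, bx2)]
theta = mvmr_ivw_two_exposures(bx1, bx2, by, [0.01] * 4)
print(theta)
```

      The collinearity check matters in practice: if the variants' effects on the two exposures are nearly proportional, the direct effects cannot be separated.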

      3) In the initial LTL paper, the authors constructed an IV for MR analyses, which appears different than what the authors selected here. For example, Codd et al. proposed an n=130 SNP instrument from their n=193 sentinel variants, after filtering for LD (n=193 >>> n=147) and then for multi-trait association (n=147 >> n=130). I don't think this will fundamentally change the author's result, but the authors may want to confirm robustness to slightly different instrument selection procedures or explain why they favor their approach over the previous one.

      We appreciate the reviewer’s suggestion. Our study was designed within a Mendelian randomization framework, and we chose to be conservative in the construction of our instrumental variable (IV). We therefore applied more stringent filters to the LTL variants than Codd et al. did. We applied a wider LD window (10 Mb vs. 1 Mb) centred on the LTL variants that were significant at the genome-wide level (p<5e-08), and we restricted our analyses to biallelic common SNPs (MAF>1% and r²<0.01 in the European population from 1000 Genomes). Nevertheless, the LTL genetic instrument based on our study (144 LTL variants) is highly correlated with the PRS based on the 130 variants described by Codd et al. (correlation estimate=0.78, p<2.2e-16). MR analyses based on the 130-variant LTL instrument described by Codd et al. showed results similar to those of our study.
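      The instrument construction described above can be sketched as greedy LD clumping: genome-wide significant variants are processed from the smallest p value upward, and a variant is kept only if it is not in LD (r² ≥ 0.01) with an already-kept index variant within the window. Here "a 10 Mb window centred on the variant" is read as ±5 Mb (an assumption), and the LD lookup is a user-supplied function (for example, computed from the 1000 Genomes European panel); MAF and biallelic filtering are assumed to have been applied beforehand. Names and toy data are hypothetical.

```python
def clump(variants, ld_r2, p_threshold=5e-8, r2_threshold=0.01, half_window_bp=5_000_000):
    """Greedy LD clumping.

    variants: iterable of (variant_id, chromosome, position_bp, p_value).
    ld_r2: function (id_a, id_b) -> r^2 between two variants.
    Returns the ids of the retained index variants, most significant first.
    """
    candidates = sorted((v for v in variants if v[3] < p_threshold), key=lambda v: v[3])
    kept = []
    for variant_id, chrom, pos, _p_value in candidates:
        tagged = any(
            k_chrom == chrom
            and abs(k_pos - pos) <= half_window_bp
            and ld_r2(variant_id, k_id) >= r2_threshold
            for k_id, k_chrom, k_pos in kept
        )
        if not tagged:
            kept.append((variant_id, chrom, pos))
    return [k_id for k_id, _, _ in kept]


toy = [
    ("a", 1, 1_000, 1e-10),
    ("b", 1, 2_000, 1e-9),        # in LD with "a": removed
    ("c", 1, 30_000_000, 1e-9),   # same chromosome but outside the window: kept
    ("d", 2, 1_000, 1e-12),
    ("e", 1, 3_000, 0.01),        # not genome-wide significant: excluded
]
pairs_in_ld = {frozenset({"a", "b"}): 0.5}
selected = clump(toy, lambda x, y: pairs_in_ld.get(frozenset({x, y}), 0.0))
print(selected)
```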

      4) Colocalization analysis suggests that a /subset/ of LTL signals map onto lung cancer signals. Does this mean that the MR relationships are driven entirely by this small subset, or is there (polygenic) evidence from other loci? Rather than do a "leave one out", the authors could stratify their instrument into "coloc +ve / coloc -ve" and redo the MR analyses.

      The main goal here is to determine whether the subset of signals at the top (apparently n=14, the bump with non-trivial PP4 > 0.6, say), which map predominantly to TERT, TERC, and OBFC1, explains the observed effect; that is, whether the effect reflects biology specific to these mechanisms, or LTL in general (polygenicity) exemplified by extreme examples (TERT, etc.). I appreciate that statistical power is a consideration to keep in mind when interpreting this.

      We appreciate the reviewer’s comment; indeed, we considered this idea. However, our analytical approach used the lung cancer GWAS itself to identify colocalising variants. To test the hypothesis that a subset of colocalised variants drives all the MR associations, we would need an independent lung cancer case-control study as an out-of-sample validation set, which is not available to us at this point. Nevertheless, we slightly reworded the discussion to highlight that the colocalised loci tend to lie near genes related to telomere length biology, and we are also exploring the colocalisation approach to select variants for PRS analysis elsewhere.
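      For illustration, the stratification proposed by the reviewer could be sketched as splitting the instrument by colocalisation posterior (PP4) and computing a fixed-effect IVW estimate in each stratum. This is a minimal sketch that assumes per-variant summary statistics stored under hypothetical keys (`bx`, `by`, `se_y`, `pp4`). The 0.6 cut-off follows the reviewer's "PP4 > 0.6, say". As the response notes, selecting variants using the same lung cancer GWAS that supplies `by` makes such a split non-independent.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW MR estimate and its first-order standard error."""
    # Each variant contributes bx*by/se^2 to the numerator and bx^2/se^2 to the weight.
    num = sum(bx * by / se ** 2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


def stratified_ivw(variants, pp4_threshold=0.6):
    """Run IVW separately on colocalising ("coloc+") and non-colocalising ("coloc-") variants."""
    strata = {"coloc+": [], "coloc-": []}
    for v in variants:
        strata["coloc+" if v["pp4"] > pp4_threshold else "coloc-"].append(v)
    results = {}
    for name, vs in strata.items():
        if vs:  # skip an empty stratum instead of dividing by zero
            results[name] = ivw_estimate(
                [v["bx"] for v in vs], [v["by"] for v in vs], [v["se_y"] for v in vs]
            )
    return results
```

      A non-null estimate in the "coloc-" stratum would point towards a polygenic LTL effect. An estimate confined to "coloc+" would point towards the TERT/TERC/OBFC1 loci.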

    1. Author Response

      Reviewer #1 (Public Review):

      The authors examine the role of the K700E mutation in the Sf3b1 splicing factor in PDAC and report that this Sf3b1 mutation promotes PDAC by decreasing sensitivity to TGF-β, which results in decreased EMT and decreased apoptosis. They propose that the Sf3b1 K700E mutant causes decreased expression of Map3k7, a known mediator of TGF-β signaling that is also alternatively spliced in other systems as a result of the Sf3b1 K700E mutation. The role of splicing defects in cancer is relatively understudied and could reveal novel targets for therapeutic intervention, so this work is of potential significance. However, the data are over-interpreted in many instances, and it is not clear that the authors can make the claims they do based on the data shown. In particular, the data showing that decreased Map3k7 underlies the effects of the Sf3b1K700E mutant are very weak. Does over-expression of Map3k7 promote the EMT signature and induce apoptosis? Do the Map3k7-expressing organoids form tumors more effectively when transplanted into mice? Also, the novelty of the work is a concern, since aberrant Map3k7 splicing due to SF3B1 mutation was seen previously in other systems. The authors also do not address the apparent conundrum of the Sf3b1 K700E mutation promoting tumorigenesis despite causing less EMT, which is also required for progression to metastasis in PDAC.

      Major Concerns.

      1) The analysis of the effect of Sf3b1K700E expression on normal pancreas and on PanINs in KC mice and PDAC in KPC mice is superficial and could be enhanced by staining for amylase, cytokeratin-19 and insulin. In particular, the data quantified in figure 1L should be accompanied by staining for CK19, Mucin5AC or some other marker of ductal transformation. Also, are any effects seen at older ages in normal mice?

      We stained normal and cancerous mouse pancreata using CK19, MUC5AC and b-amylase antibodies. In line with our hypothesis that Sf3b1K700E mainly plays a role in early stages of PDAC formation, we observed significant differences in CK19 (increase), MUC5AC (increase) and b-amylase (decrease) expression in early-stage KPC-Sf3b1K700E vs. KPC tumors (Fig. 1G-J), but not in late-stage tumors (see Figure 1-figure supplement 1F-I). In addition, no differences were observed in normal mice. We added these data to the revised manuscript (see Figure 1-figure supplement 1D, E).

      2) The invasion assays used are limited and should be complemented by more routine quantification of cell migration and invasion including such assays as a scratch assay, Boyden chamber assays and use of the IncuCyte system to quantify. As it stands the image in Figure 3B is difficult to interpret since it is very poorly described in the figure legend. Additional evidence is needed to make the claims made by the authors.

      During the revision we performed wound healing/scratch assays using PANC-1 cells with inducible SF3B1 WT/K700E overexpression. We observed a significant difference in migratory capacity between SF3B1 WT- and SF3B1 K700E-overexpressing cells stimulated with TGF-β. We added these data to the revised manuscript (Fig. 2I, J). We also describe the above-mentioned Figure 3B in more detail (revised manuscript Fig. 2G, H; line 759-767).

      3) The authors should show the actual CC3 staining quantified in Suppl. Figure 2G.

      We added a representative image of CC3 staining (see Figure 3-figure supplement 1A) for the quantified data (see Figure 3-figure supplement 1B in the revised manuscript).

      4) The graph in Figure 3L should show the number of WT and Sf3b1K700E-expressing organoids both with and without TGF-β.

      Since organoids grown without TGF-β supplementation have to be split in a 1:3 ratio every 5 days, we could not follow the same passaging regimen as in experiments with TGF-β supplementation (split in a 1:2 ratio every 20 days, Fig. 3I). However, we assessed the number of organoids grown in control medium without TGF-β for 4 passages (20 days) in a 1:3 ratio and observed no difference in organoid number between WT and Sf3b1K700E-expressing organoids (Author response image 1). In the revised manuscript we show with a highly quantitative read-out (CellTiterGlo) that Sf3b1K700E-expressing organoids do not grow faster than Sf3b1 WT-expressing organoids in the absence of TGF-β (see Figure 3-figure supplement 1E). Taken together, we can exclude that Sf3b1K700E organoids outgrow Sf3b1 WT organoids in medium with TGF-β supplementation because of a general growth advantage.

      Author response image 1.

      Author response image 1. WT and Sf3b1K700E expressing organoids were cultured without TGF-β supplementation. Organoids were split in a 1:3 ratio every 5 days. Data points show organoid number before splitting, assessed for 4 passages.

      Reviewer #2 (Public Review):

      The manuscript has several areas of strength; it functionally explores a mutant that is detected in a portion of pancreatic cancers; it conducts mechanistic investigation and it uses human cell lines to validate the findings based on mouse models. Some areas for improvement are described below.

      1) TGF-β is known to act as a tumor suppressor early in carcinogenesis and as a tumor promoter later. The authors should extend their analysis of mouse models to determine whether the effect of SF3B1K700E is specific to promoting initiation (e.g., more or earlier acinar-to-ductal metaplasia) or to faster progression of PanINs following their formation. Another way to address this could be acinar cultures, to determine whether an increased propensity to ADM exists.

      To further disentangle the effect of Sf3b1K700E on tumor progression, we analyzed our autochthonous model at an early and a late stage of tumor progression. Histological examination at 5 weeks revealed an increased propensity to ADM (see Figure 1-figure supplement 1J, K), increased PanIN formation (shown by MUC5AC and CK19 immunofluorescence staining, Fig. 1G, I, J) and a concomitant decrease of acinar cells (shown by b-amylase staining) in KPC-Sf3b1K700E vs. KPC tumors (Fig. 1G, H). Tumors analyzed at 9 weeks of age showed no differences in CK19 staining or fibrosis. We added these data to the revised manuscript (see Figure 1-figure supplement 1F-I).

      2) Given that the effect of SF3B1K700E expression is more prominent in KC mice, rather than in KPC mice, the authors should explain the rationale for using the latter for RNA sequencing.

      In KC mice, pre-invasive PanIN lesions only infrequently progress to PDAC (spontaneous progression; see Gabriel et al., Pancreatology, 2020). Therefore, it would have been difficult to collect enough material for cell sorting and downstream RNA sequencing of tumor cells. The KPC mouse model develops PDAC with 100% penetrance, allowing the collection of sufficient material.

      3) Given that this mutation is found in about 3% of human pancreatic cancers, it would be interesting to know whether these tumors have any unique features, and specifically any characteristic that could be harnessed therapeutically.

      Unfortunately, the size of published datasets is too small for a meaningful differential gene expression analysis of SF3B1-WT vs. SF3B1-K700E PDAC tumors (due to the low occurrence of SF3B1-K700E PDAC). However, harnessing the K700E mutation therapeutically by increasing missplicing through splicing inhibitors has previously been suggested, and SF3B1-K700E mutated cancer cells were shown to be more prone to apoptosis than SF3B1-WT cells when splicing is chemically targeted. We tested a similar approach in murine pre-cancerous organoids, demonstrating that Sf3b1-WT organoids show higher survival than Sf3b1K700E-expressing organoids when treated with the splicing inhibitor Pladienolide B (Author response image 2). However, since this concept is not novel and is beyond the scope of our manuscript, we would prefer not to integrate these data into our manuscript.

      Author response image 2.

      Author response image 2. 33 nM of the splicing inhibitor Pladienolide B was added to the cell culture medium for 48 hours and the viability was assessed by normalizing organoid numbers to untreated control organoids. The line indicates WT and Sf3b1K700E organoids assessed in the same replicate.

      4) It would be interesting to know whether this mutation is mutually exclusive with other mutations affecting the response to TGF-β. Further, while the data might not be widely available, it would be interesting to know whether in human patients the mutation occurs in precursor lesions (PanIN might be difficult to assess, but IPMN might be doable) or at later stages.

      We performed a mutual exclusivity analysis on PDAC samples available at www.cbioportal.org but did not find mutual exclusivity between SF3B1-K700E and genes of the TGF-β pathway. Of note, the value of this analysis is limited by the small sample size of SF3B1-K700E PDAC (n=7). Moreover, to our knowledge there is no public PDAC tissue biobank that would allow us to assess the stage of SF3B1-K700E mutated PDAC tumors. Thus, unfortunately, we cannot histologically assess whether the mutations already occur in early stages of human tumor development.

      Author response table 1.

      Author response table 1: Mutual exclusivity analysis of public PDAC databases (ICGC, CPTAC, QCMG, TCGA, UTSW), including 910 patients. Mutation frequency is 25% for SMAD4, 5% for TGFBR2, 3% for SMAD2, 2.6% for TGFBR1, 1.4% for SMAD3, 0.7% for SF3B1-K700E, 0.7% for TGFBR3 and 0.4% for SMAD1. Analysis was performed on cbioportal.org.
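      To make the idea of a mutual exclusivity test concrete, here is a minimal sketch of a one-sided Fisher's exact test on a 2x2 table of samples altered in gene A, gene B, both, or neither. A test of this kind is commonly used for this purpose. It is an assumption that it mirrors what cbioportal.org computes, and the function name is hypothetical. The p-value is the hypergeometric probability of observing as few co-altered samples as seen, or fewer, given the per-gene alteration counts. With only a handful of altered samples (e.g. n=7 for SF3B1-K700E), even complete exclusivity can be non-significant, which matches the limitation noted above.

```python
from math import comb


def exclusivity_p_value(both, a_only, b_only, neither):
    """One-sided Fisher's exact test for mutual exclusivity (hypothetical helper).

    Returns P(overlap <= observed) under the hypergeometric null with fixed margins.
    """
    n = both + a_only + b_only + neither
    k_a = both + a_only  # samples altered in gene A
    k_b = both + b_only  # samples altered in gene B
    lowest = max(0, k_a + k_b - n)  # smallest overlap compatible with the margins
    total = comb(n, k_b)
    # Sum hypergeometric probabilities for overlaps from the minimum up to the observed one.
    tail = sum(comb(k_a, x) * comb(n - k_a, k_b - x) for x in range(lowest, both + 1))
    return tail / total
```
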

      Reviewer #3 (Public Review):

      Alternative splicing as a result of mutations in different components of the splicing machinery has been associated with a variety of cancer types, including hematological malignancies, where this has been most extensively studied, but also solid tumors such as breast cancer and pancreatic ductal adenocarcinoma (PDAC). Here the authors analyze genome sequencing data in human PDAC samples and identify a recurring mutation in the SF3B1 subunit that replaces lysine with glutamate at residue 700 (SF3B1K700E) in PDACs. This mutation has been identified, and its molecular role in disease progression has been studied in other diseases, but the mechanism by which it promotes disease progression in pancreatic cancer has not been as well characterized.

      To study how SF3B1K700E contributes to PDAC pathology, the authors generate a novel genetically modified mouse model with a pancreas-specific SF3B1K700E mutation and explore its oncogenicity and tumor-promoting potential. The authors find that SF3B1K700E is not oncogenic on its own, but potentiates the oncogenic potential of the Kras and p53 (KP) driver mutations commonly found in PDAC tumors. The authors then proceed to characterize the molecular mechanisms that might drive this phenotype. By transcriptomic analysis, the authors find that KP-SF3B1K700E tumors show downregulation of epithelial-to-mesenchymal transition (EMT) genes compared to KP tumors. The cytokine TGFβ has previously been found to limit PDAC initiation and progression by causing lethal EMT in PDAC and PDAC precursor cells. Thus, the authors propose that SF3B1K700E-mediated inhibition of EMT blocks the tumor-suppressive activity of TGFβ and that this underpins the tumor-promoting role of the SF3B1K700E mutation in PDAC. Consistent with this, the SF3B1K700E mutation blocks TGFβ-induced toxicity in a variety of cell culture models of PDAC and PDAC precursors.

      Lastly, the authors seek to identify how altered splicing reduces EMT activity in PDAC cells. The authors identify genes that are consistently misspliced in both KP and human SF3B1K700E mutant cancer samples and find Map3k7 as one of 11 such genes. MAP3K7 has previously been identified as a positive regulator of EMT. Thus, the authors speculated that Map3k7 missplicing would lead to reduced MAP3K7 activity and reduced EMT, and that this underpins the TGFβ resistance of SF3B1K700E mutant PDAC cells. Consistent with this, the authors find that inhibition of MAP3K7 reduces TGFβ toxicity in SF3B1 WT cells and that overexpression of MAP3K7 in SF3B1K700E mutant PDAC cells induces TGFβ toxicity. Altogether, this suggests that Map3k7 activity is responsible for the altered EMT activity and TGFβ sensitivity of SF3B1K700E mutant PDAC.

      Altogether, the authors generate a valuable model to study the role of a recurring splicing mutation in PDAC and provide compelling evidence that this mutation accelerates disease. The authors then perform both (1) an open-ended investigation of how this mutation alters PDAC cell biology, in which they identify altered EMT activity, and (2) rigorous mechanistic studies showing that suppressed EMT provides PDAC cells with resistance to TGFβ, which has previously been shown to be tumor suppressive in PDAC. This suggests a possible mechanism by which the SF3B1K700E mutation is oncogenic in PDAC that future animal studies can confirm. This work generates valuable models and datasets to advance the understanding of how mutations in the splicing machinery can promote PDAC progression, and suggests that alternative splicing of MAP3K7 is one possible mechanism by which altered splicing promotes PDAC progression in vivo.

      • One major concern about the manuscript is that the proposed mechanism by which the SF3B1K700E mutation accelerates PDAC progression (MAP3K7 inhibition -> EMT inhibition -> reduced TGF-β toxicity) is only tested in ex vivo culture models, and there are only limited, correlative data to suggest that this is the operative mechanism by which SF3B1K700E mutant tumors are accelerated. This is especially important because of recent findings that IFN-α signaling, which the authors also found to be high in SF3B1K700E mutant tumors, also promotes PDAC progression (https://www.biorxiv.org/content/10.1101/2022.06.29.497540v1). Thus, while I am thoroughly convinced by the rigorous ex vivo work that SF3B1K700E does lead to MAP3K7 inhibition -> EMT inhibition -> reduced TGF-β toxicity, further in vivo experiments would be needed to convince me that this mechanism is critical to tumor progression. For example, would forced expression of MAP3K7 slow orthotopic KP-SF3B1K700E tumor growth while leaving IFN-α signaling unperturbed?

      We thank the reviewer for raising these important points. To first test whether the upregulation of IFN-α signaling seen in our RNA-seq data of sorted KPC-Sf3b1K700E cells was directly caused by the Sf3b1-K700E mutation, we assessed the 5 most deregulated genes of the IFN-α signature in in-vitro activated KPC and KPC-Sf3b1K700E organoids (analogous to the experiments on the EMT gene signature; see Figure 2-figure supplement 1D). However, in contrast to EMT marker genes, IFN-α signature genes were not differentially expressed in KPC-Sf3b1K700E vs. KPC organoids (Author response image 3). Thus, increased IFN-α signaling in KPC-Sf3b1K700E tumors in mice is likely an indirect consequence of more advanced cancers rather than a direct effect of Sf3b1K700E-mediated missplicing.

      Author response image 3.

      Author response image 3. Expression of the 5 most deregulated genes of the IFN-α gene set identified in sorted KPC-Sf3b1K700E cells in in-vitro activated KPC-Sf3b1K700E and KPC organoids. 4 biological replicates were performed. For analysis, Ct-values of the indicated genes were normalized to Actb and a two-tailed unpaired t-test was used to compute the indicated p-values.
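      The normalization and test described in this caption can be sketched as follows. This is a minimal sketch under stated assumptions: ΔCt is taken as target Ct minus Actb Ct, linear relative expression as 2^-ΔCt (which assumes roughly 100% PCR efficiency), and the Student's t statistic is computed with a pooled variance. The caption does not specify these details, and the function names are hypothetical. A two-tailed p-value can then be read from a t distribution with the returned degrees of freedom, for example with a statistics package.

```python
import math
import statistics


def delta_ct(ct_gene, ct_ref):
    """Per-replicate ΔCt: target-gene Ct minus reference-gene (e.g. Actb) Ct."""
    return [g - r for g, r in zip(ct_gene, ct_ref)]


def relative_expression(dct):
    """Linear relative expression 2^-ΔCt, assuming ~100% amplification efficiency."""
    return [2.0 ** (-d) for d in dct]


def students_t(x, y):
    """Unpaired two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, nx + ny - 2
```

      Running the test on ΔCt values (a log scale) rather than on 2^-ΔCt is a common choice, because ΔCt values tend to be closer to normally distributed.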

      To next examine the effect of Map3k7 on tumors in vivo, we established orthotopic transplantation models with KPC and KPC-Sf3b1K700E cells with overexpression or knockdown of Map3k7 (Author response image 4). However, in contrast to the autochthonous mouse model, even without Map3k7 manipulation, orthotopically transplanted KPC and KPC-Sf3b1K700E cells did not differ in tumor size (see Figure 1-figure supplement 1M, N). These data support our hypothesis that Sf3b1-K700E plays an important role mainly during early stages of PDAC (KPC cells are isolated from fully developed PDAC tumors, so orthotopic KPC transplantation represents a late-stage PDAC model).

      Unfortunately, these data also demonstrate that orthotopic transplantation of KPC cells is not a suitable model for studying the impact of Map3k7 on PDAC development. As expected, neither Map3k7 overexpression in transplanted KPC-Sf3b1K700E cells nor shRNA-mediated knockdown of Map3k7 (shMap3k7) in transplanted KPC cells led to differences in growth compared to the respective control groups (Author response image 4). In line with these results, the EMT genes that were differentially expressed in our autochthonous mouse model (KPC vs. KPC-Sf3b1K700E) were expressed at similar levels upon Map3k7 downregulation or overexpression.

      Since establishing an autochthonous KPC PDAC mouse model with a knockdown of MAP3K7 is beyond the scope of a revision, in the revised manuscript we discuss the limitation of our study that the molecular link between Sf3b1K700E, Map3k7 and TGF-β resistance has only been studied in vitro in organoids and cell lines. We also adapted the abstract and the title of the manuscript accordingly (formerly “Mutant SF3B1 promotes PDAC malignancy through TGF-β resistance”, now “Mutant SF3B1 promotes malignancy in PDAC”).

      Author response image 4.

      Author response image 4. (A) Relative gene expression of Map3k7 in KPC cells transduced with shRNA targeting Map3k7 (shMap3k7), normalized to KPC cells transduced with scrambled control shRNA (shCtrl). 3 biological replicates are shown. (B) Weight of tumors derived by orthotopic transplantation of shMap3k7 and shCtrl KPC cells. 5 biological replicates are shown. (C) Relative gene expression of EMT genes in tumors derived by orthotopic transplantation of shCtrl and shMap3k7 cells. 4 biological replicates are shown. (D) Relative gene expression of Map3k7 in KPC-Sf3b1K700E cells transduced with an overexpression vector of Map3k7 (OE Map3k7), normalized to control KPC cells without Map3k7 overexpression. 3 biological replicates are shown; a two-sided Student’s t-test was used to calculate significance. (E) Weight of tumors derived by orthotopic transplantation of Map3k7-overexpressing KPC-Sf3b1K700E cells (n=5) and control KPC-Sf3b1K700E cells (n=4). (F) Relative gene expression of EMT genes in tumors derived by orthotopic transplantation of KPC-Sf3b1K700E cells with and without overexpression of Map3k7. 4 biological replicates are shown. A two-sided Student’s t-test was used to calculate significance in Fig. 2A-F.

    1. Author Response:

      Reviewer #1:

      Zappia et al investigate the function of E2F transcriptional activity in the development of Drosophila, with the aim of understanding which targets the E2F/Dp transcription factors control to facilitate development. They follow up two of their previous papers (PMID 29233476, 26823289), which showed that the critical functions of Dp for viability during development reside in the muscle and the fat body. They use Dp mutants and tissue-targeted RNAi against Dp to deplete both activating and repressive E2F functions, focussing primarily on functions in larval muscle and fat body. They characterize changes in gene expression by proteomic profiling, bypassing the typical RNA-seq experiments, and characterize Dp loss phenotypes in muscle, fat body, and the whole body. Their analysis revealed a consistent, striking effect on carbohydrate metabolism gene products. Using metabolite profiling, they found that these effects extended to carbohydrate metabolism itself. Considering that most of the literature on E2F/Dp targets is focused on the cell cycle, this paper conveys a new discovery of considerable interest. The analysis is very good, and the data provided support the authors' conclusions quite definitively. One interesting phenotype they show is low levels of glycolytic intermediates and circulating trehalose, which is traced to loss of Dp in the fat body. Strikingly, this phenotype and the resulting lethality during the pupal stage (metamorphosis) could be rescued by increasing dietary sugar. Overall the paper is quite interesting. Its main limitation, in my opinion, is a lack of mechanistic insight at the gene regulation level. This is due to the authors' choice to profile protein rather than mRNA effects, and their omission of any DNA binding (chromatin profiling) experiments that could define direct E2F1/Dp or E2F2/Dp targets.

      We appreciate the reviewer’s comment. Based on previously published chromatin profiling data for E2F/Dp and Rbf in thoracic muscles (Zappia et al 2019, Cell Reports 26, 702–719), we found that both Dp and Rbf are enriched upstream of the transcription start sites of both cell cycle genes and metabolic genes (Figure 5 in Zappia et al 2019, Cell Reports 26, 702–719). Thus, our data are consistent with the idea that E2F/Rbf binds the canonical target genes in addition to a new set of target genes encoding proteins involved in carbohydrate metabolism. We think that E2F takes on a new role rather than being re-targeted away from cell cycle genes. We agree that further mechanistic insight would be worth exploring.

      Reviewer #2:

      The study sets out to determine which tissue-specific mechanisms in fat and muscle regulated by the transcription factor E2F are central to organismal function. The study also tries to address which of these roles of E2F are cell intrinsic and which are systemic. The authors look into the mechanisms of E2F/Dp through knockdown experiments in both the fat body* (see weakness) and muscle of Drosophila. They find that muscle E2F contributes to fat body development, but fat body KD of E2F does not affect muscle function. To then dissect the cause of adult lethality in flies, the authors use proteomic and metabolomic profiling of fat and muscle. While in the muscle the cause seems to be an as yet undetermined systemic change, the authors do conclude that adult lethality in fat body-specific Dp knockdown is the result of decreased trehalose in the hemolymph and defects in lipid production in these flies. The authors then test this model by feeding fat body-specific Dp knockdown flies a high-sugar diet and showing that adult survival is rescued. This study concurs with and adds to the emerging idea from human studies that E2F/Dp is critical for more than just its role in the cell cycle and functions as a metabolic regulator in a tissue-specific manner. This study will be of interest to scientists studying inter-organ communication between muscle and fat.

      The conclusions of this paper are partially supported by data. The weaknesses can be mitigated by specific experiments and will likely bolster conclusions.

      1) This study relies heavily on the tissue specificity of the Gal4 drivers to study fat-muscle communication by E2F. The authors have convincingly confirmed that the cg-Gal4 driver is never turned on in the muscle, and vice versa for Dmef2-Gal4. However, the cg-Gal4 driver turns on expression in the fat body cells and is also highly expressed in hemocytes (macrophage-like cells in flies). In fact, cg-Gal4 is used in numerous studies (e.g., https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4125153/) to study hemocytes and fat in combination. Hence, it is difficult to assess what contribution hemocytes make to the conclusions on fat-muscle communication. To mitigate this, the authors could test Lpp-Gal4>Dp-RNAi (Lpp-Gal4 drives expression exclusively in fat body at all stages) or use ppl-Gal4 (which is expressed in the fat, gut, and brain, but is a weaker driver than cg). It would be good if they could replicate their findings in a subset of the experiments performed in Figures 1-4.

      This is indeed an important point. We apologize for previously not including this information. Reference is now on page 7.

      Another fat body driver, which unlike cg-GAL4 is expressed specifically in fat body and not in hemocytes, was tested in previous work (Guarner et al Dev Cell 2017). The driver FB-GAL4 (FBti0013267), and more specifically the stock yw; P{w[+mW.hs]=GawB}FB P{w[+m*] UAS-GFP 1010T2}#2; P{w[+mC]=tubP-GAL80[ts]}2, was used to induce the loss of Dp in fat body in a time-controlled manner using tubGAL80ts. The phenotype induced in the larval fat body of FB>DpRNAi,gal80TS recapitulates the findings related to the DNA damage response characterized in both Dp -/- and cg>Dp-RNAi (see Figure 5A-B, Guarner et al Dev Cell 2017). The activation of the DNA damage response upon the loss of Dp was thoroughly studied in Guarner et al Dev Cell 2017. The appearance of binucleates in cg>DpRNAi is presumably the result of the abnormal transcription of multiple G2/M regulators in cells that have been able to repair DNA damage and to resume S-phase (see discussion in Guarner et al Dev Cell 2017). More details on the fully characterized DNA damage response phenotype were added on pages 6 and 7 of the manuscript.

      Additionally, r4-GAL4 was also used to drive Dp-RNAi specifically to fat body. But since this driver is weaker than cg-GAL4, the occurrence of binucleated cells in r4>DpRNAi fat body was mild (see Figure R1 below).

      As suggested by the reviewer, Lpp-GAL4 was used to knock down the expression of Dp specifically in fat body. All Lpp>DpRNAi animals died at the pupal stage. New viability data were included in Figure 1-figure supplement 1. Also, larval fat bodies were dissected and stained with phalloidin and DAPI to visualize overall tissue structure. Binucleated cells were present in Lpp>DpRNAi fat body but not in the control Lpp>mCherry-RNAi (Figure 2-figure supplement 1B). These results were added to the manuscript on page 7.

      Furthermore, Dp expression was knocked down using a hemocyte-specific driver, hml-GAL4. No defects in animal viability were detected (data not shown).

      Thus, overall, we conclude that hemocytes do not seem to contribute to the formation of binucleated-cells in cg>Dp-RNAi fat body.

      Finally, since no major phenotype was found in muscles when E2F was inactivated in fat body (please see point 3 for more details), we consider that the inactivation of E2F in both fat body and hemocytes did not alter the overall muscle morphology. Thus, exploring the contribution of cg>Dp-RNAi hemocytes to muscles would not be very informative.

      2) The authors perform a proteomics analysis on both fat body and muscle of control or the respective tissue specific knockdown of Dp. However, the authors denote technical limitations to procuring enough third instar larval muscle to perform proteomics and instead use thoracic muscles of the pharate pupa. While the technical limitations are understandable, this does raise a concern of comparing fat body and muscle proteomics at two distinct stages of fly development and likely contributes to differences seen in the proteomics data. This may impact the conclusions of this paper. It would be important to note this caveat of not being able to compare across these different developmental stage datasets.

      We appreciate the suggestion of the reviewer. This caveat was noted and included in the manuscript. Please see page 11.

      3) The authors show that E2F signaling in the muscle controls whether binucleate fat body nuclei appear; in other words, the endocycling process in the fat body is affected if muscle E2F function is impaired. However, they conclude that impairing E2F function in fat does not affect muscle. While muscle organization seems fine, nuclear levels of Dp appear to be higher in muscles during fat-specific knockdown of Dp (Figure 1A, column 2, row 3, for cg>Dp-RNAi). There is also an increase in muscle area when fat body E2F function is impaired, which is reflected in the quantification of DLM area in Figure 1B. But the authors don't say much about the elevated Dp levels in muscle or the increased DLM area upon fat-specific Dp KD. Would the authors not expect Dp staining in cg>Dp-RNAi muscle to be normal and similar to the mCherry-RNAi control? The authors could consider discussing and contextualizing this, as opposed to making a broad statement that muscle function is entirely normal. Perhaps muscle function is different, possibly even better, when E2F function in fat is impaired.

      The overall muscle structure was examined in animals staged at third instar larva (Figure 1A-B). No defects were detected in muscle size between cg>Dp-RNAi animals and controls. In addition, the expression of Dp was not altered in cg>Dp-RNAi muscles compared to control muscles. The best developmental stage to compare the muscle structure between Mef2>Dp-RNAi and cg>Dp-RNAi animals is actually third instar larva, prior to their lethality at pupa stage (Figure 1- figure supplement 1).

      Based on the reviewer’s comment, we set up a new experiment to further analyze the phenotype at the pharate stage. However, when we repeated this experiment, we did not recover any cg>Dp-RNAi pharate adults, even though 2/3 of Mef2>Dp-RNAi animals survived up to the late pupal stage. We think that this is likely due to a change in fly food provider. Since most cg>DpRNAi animals die at the early pupal stage (>75% of animals, Figure 1-figure supplement 1), the pharate stage is not a representative developmental stage at which to examine phenotypes. Therefore, these panels were removed.

      Text was revised accordingly (page 6).

      4) In lines 376-380, the authors make the argument that muscle-specific knockdown can impair the ability of the fat body to regulate storage, but evidence for this is not robust. While the authors refer to a decrease in lipid droplet size in figure S4E this is not a statistically significant decrease. In order to make this case, the authors would want to consider performing a triglyceride (TAG) assay, which is routinely performed in flies.

      Our conclusions were revised and adjusted to match our data. The paragraph was reworded to highlight the outcome of the triglyceride assay, which was previously done. We realized the reference to Figure 6H that shows the triglyceride (TAG) assay was missing on page 17. Please see page 17 and page 21 of discussion.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Hekselman et al presents analyses linking cell types to monogenic disorders, using over-expression of monogenic disease genes as the signal. The manuscript analyses data from 6 tissues (bone marrow, lung, muscle, spleen, tongue and trachea) together with ~1,000 rare diseases from OMIM (with ~2,000 associated genes) to identify cell types of interest for a specific disease of choice. The signal used by the approach is the expression of OMIM genes in a particular cell type relative to their expression in the tissue of interest; the resulting cell type-disease pairs are then investigated through literature review and recapitulated using mouse expression data. A potentially interesting finding is that disease genes manifesting in multiple tissues seem to hit the same cell types. Overall, this important study combines multiple data analyses to quantify the connection between cell types and human disorders. However, whereas some of the analyses are compelling, the statistical analyses are incomplete, as they do not provide a full treatment of type I error.

      Statistical analyses were changed to include permutation testing and a different threshold (Results, page 6, 1st paragraph; Methods, page 21-22, ‘PrEDiCT score calculation and significance assessment’; Figure 1–figure supplement 2). Assessments of type I error were based on literature text-mining and expert curation, and showed that false-positive rates were low in both (0.01 and 0.07, respectively; Figure 1F and Figure 1–figure supplement 4A).

      Reviewer #2 (Public Review):

      This study identifies 110 disease-affected cell types for 714 Mendelian diseases, based on preferential expression of known disease-associated genes in single-cell data. It is likely that many or most of the results are real, and the results are biologically interesting and provide a valuable resource. However, updates to the method are needed to ensure that inference of statistical significance is appropriately stringent and rigorous.

      Strengths: a systematic evaluation of disease-affected cell types across Mendelian diseases is a valuable addition to the literature, complementing systematic evaluations of common disease and targeted analyses of individual Mendelian diseases. The validation via excess overlap with disease–cell-type pairs from literature co-appearance provides compelling evidence that many or most of the results are real. In addition, many of the results are biologically interesting. In particular, it is interesting that diseases with multiple affected tissues tend to affect similar cell types in the respective tissues.

      Limitations: the main limitation of the study is that, although many or most of the results are likely to be real, the criterion for statistical significance is probably not stringent enough, and is not well-justified. For diseases with only 1 disease-associated gene, the threshold is a z-score > 2 for preferential expression in the cell type, but this threshold is likely to be exceeded often by chance. (For diseases with many disease-associated genes, the threshold is a median (across genes) z-score > 2 for preferential expression in the cell type, which is less likely to occur by chance but is still an arbitrary threshold.) Thus, there is a good chance that a sizable proportion of the reported disease-affected cell types are false positives. The best solution would be to assess statistical significance via empirical comparison with results for non-disease-associated control genes, and to assess the statistical significance of the resulting P-values using FDR.
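      To make the point about chance exceedances concrete, the short sketch below computes the probability that a single z-score exceeds 2, and the probability that the median of an odd number of z-scores exceeds 2, under an idealized null in which preferential-expression z-scores are independent standard-normal draws. The independence and normality assumptions are ours, not the manuscript's; z-scores computed across a handful of cell types in one tissue will not follow this distribution exactly.

```python
import math

def p_z_exceeds(threshold=2.0):
    """Upper-tail probability P(Z > threshold) for a standard normal Z."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))

def p_median_exceeds(n_genes, threshold=2.0):
    """P(median of n_genes i.i.d. standard-normal z-scores > threshold).

    Assumes n_genes is odd, so the median is the ((n+1)/2)-th order statistic
    and exceeds the threshold exactly when at least (n+1)/2 of the scores do.
    """
    if n_genes % 2 == 0:
        raise ValueError("this closed form assumes an odd number of genes")
    p = p_z_exceeds(threshold)
    k_min = (n_genes + 1) // 2
    return sum(math.comb(n_genes, k) * p**k * (1 - p) ** (n_genes - k)
               for k in range(k_min, n_genes + 1))

for n in (1, 3, 5):
    print(n, round(p_median_exceeds(n), 5))
```

      Under these assumptions a single-gene disease clears z > 2 by chance in roughly 2.3% of cell types, so over many disease–cell-type pairs the expected number of chance calls is not negligible, while the median over several genes rarely does.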

      We thank the reviewer for the valuable insights and suggestions. We revised the method to assess statistical significance by using empirical comparison followed by FDR correction, as suggested by the reviewer (Results, page 6, 1st paragraph; Methods, page 21-22, ‘PrEDiCT score calculation and significance assessment’; Figure 1–figure supplement 2).

      The re-analysis using mouse single-cell data adds an interesting additional dimension to the study, with the small caveat that mouse single-cell data does not provide statistically independent information across genes (for the same reason that adding data from independent human individuals would not provide statistically independent information across genes, given that human and mouse expression are partially correlated).

      We acknowledge this caveat in the text (Discussion, page 17, 2nd paragraph, lines 8-11).

      Reviewer #3 (Public Review):

      The authors describe the method, PrEDiCT, which helps identify disease-affected cell types based on gene sets. As I understand it, the method is based on finding which "disease genes" (from an annotation) are relatively highly expressed. The idea is nice; however, I have concerns about how "significance" is assessed and the relevant controls.

      Overall, I find the idea interesting, but the execution raises some concerns.

      1) From a causal perspective, there is an association of high expression of these genes within these cell types, but without also assessing individuals with those specific diseases, I do not think it is fair to say "disease affected" cell types. It is possible that these genes function normally in those cell types, where they are highly expressed, while being affected in other cell types.

      We agree with the reviewer. We changed the terminology to "likely disease-affected cell types” and added this caveat to the Discussion, page 16, 2nd paragraph.

      2) It is unclear to me what the "null" comparison is in the method and if there is one. For example, by chance, would I expect this gene to be highly expressed because other genes are also highly expressed in this cell type? Some way to assess "significance" or "enrichment" beyond simply using ranks and thresholds would be helpful in deciding whether these associations are robust.

      We revised the procedure for assessing statistical significance to include permutation tests. Specifically, given a disease D with n disease-associated genes, the null hypothesis was that the PrEDiCT score of these genes does not differ from the PrEDiCT score of a random set of n genes. To test this, we randomly selected n genes expressed in any cell type, and computed the PrEDiCT score for this random gene set in each cell type of the disease-affected tissue (referred to as ‘random score’). We repeated this procedure 1,000 times, resulting in 1,000 random scores per disease and cell type. The p-value of the PrEDiCT score of disease D in cell type c was set to the fraction of random scores in c that were at least as high as the original PrEDiCT score of D in c. The resulting p-values were adjusted for multiple hypothesis testing per disease using the Benjamini-Hochberg procedure. To increase stringency, we treated only statistically significant disease–cell-type pairs with PrEDiCT score ≥ 1 as 'likely affected'. The procedure is detailed in Results, page 6, 1st paragraph; Methods, page 21-22, ‘PrEDiCT score calculation and significance assessment’; Figure 1–figure supplement 2. Additionally, we estimated type I error by using literature text-mining or expert curation (Results, page 7, 2nd paragraph; Methods, page 22, ‘Text-mining of PubMed records’, and page 23, ‘Expert curation and assessment of disease-affected cell types’; Figure 1F and Figure 1–figure supplement 4A).
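      The permutation and correction steps described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the preferential-expression z-scores are already available as a hypothetical nested dict `z_scores[gene][cell_type]` restricted to genes expressed in at least one cell type, takes the PrEDiCT score of a gene set to be the median z-score, and uses a hypothetical FDR level of 0.05, which the response does not state.

```python
import random
from statistics import median

def permutation_pvalue(observed_score, z_scores, n_genes, cell_type,
                       n_perm=1000, seed=0):
    """Fraction of random n-gene sets whose median z-score in cell_type is
    at least as high as the observed PrEDiCT score."""
    rng = random.Random(seed)
    genes = list(z_scores)
    hits = 0
    for _ in range(n_perm):
        random_set = rng.sample(genes, n_genes)              # random n-gene set
        random_score = median(z_scores[g][cell_type] for g in random_set)
        if random_score >= observed_score:
            hits += 1
    return hits / n_perm

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up, made monotone)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def likely_affected(score, q_value, fdr=0.05, min_score=1.0):
    """Significant after correction AND score >= 1 (the fdr level is assumed)."""
    return q_value <= fdr and score >= min_score
```

      Per disease, one would call `permutation_pvalue` for each cell type of the affected tissue and pass those p-values to `benjamini_hochberg`. Because the p-value here is the plain fraction of random scores at least as high as the observed one, it can be exactly 0 with 1,000 permutations; some analyses use (hits + 1) / (n_perm + 1) instead to avoid this.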

      3) Additionally, it is unclear to me, but I suspect that there are unequal cell numbers in the scores computed as well as between relevant tissues. This is related to point (2) above, but as a result, the estimates of the scores will inherently have different variances, thus making comparisons between them difficult/unreliable unless accounted for. If I understand correctly, the score is first the average expression within a tissue, then, the Z-score? If so, my comment applies.

      To clarify, the PrEDiCT score of a disease D in cell type c was set to the median preferential expression P of its disease genes (Equation 1). The preferential expression of each gene in c was computed as a Z-score: the difference between the average expression of the gene in c and its average expression across all cell types of the tissue, divided by the standard deviation (SD; Equation 2). Tissues indeed had unequal numbers of cell types; however, the distributions of PrEDiCT scores were similar between tissues (now in Supplementary File 13). We revised this part of the Methods, adding Equations 1 and 2 (Methods, page 21-22, ‘PrEDiCT score calculation and significance assessment’), and added Supplementary File 13.
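      A minimal sketch of the two-step score described above, assuming average expression per gene and cell type is already available for one tissue as a hypothetical nested dict `avg_expr[gene][cell_type]`. Two details are our assumptions because the text does not pin them down: the tissue-wide reference is taken as the unweighted mean of the cell-type averages, and the SD is the population SD across those averages.

```python
from statistics import mean, median, pstdev

def preferential_expression(expr_by_cell_type):
    """Z-score of one gene's average expression in each cell type, relative
    to its mean and (population) SD across all cell types of the tissue."""
    values = list(expr_by_cell_type.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:                              # gene expressed identically everywhere
        return {c: 0.0 for c in expr_by_cell_type}
    return {c: (v - mu) / sd for c, v in expr_by_cell_type.items()}

def predict_score(disease_genes, avg_expr, cell_type):
    """Median preferential expression of a disease's genes in one cell type."""
    return median(preferential_expression(avg_expr[g])[cell_type]
                  for g in disease_genes)

avg_expr = {                                 # made-up values for illustration
    "g1": {"A": 3.0, "B": 1.0, "C": 1.0, "D": 1.0},
    "g2": {"A": 2.0, "B": 2.0, "C": 1.0, "D": 1.0},
    "g3": {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0},
}
print(predict_score(["g1", "g2", "g3"], avg_expr, "A"))
```

      Taking the median rather than the mean keeps a single extreme gene from dominating the score of a multi-gene disease.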

      4) There is a large body of work on gene-set enrichment that does not appear to be mentioned (e.g. GSEA and other work by the Price group). It would be helpful for the authors to summarize these methods and explain how their method differs.

      We now cite work on gene-set enrichment (including two relevant and recent studies from the Price group) and summarize these methods in the Introduction (pages 2-3).

      5) Additionally, it should be noted that a caveat of this analysis is that the comparisons are all done only relative to the cell types sampled and the diseases which have Mendelian genes associated with them. I would expect these results to change, possibly drastically, if the sampled cell types and diseases were to be changed.

      We agree with the reviewer and now discuss the generalizability of our results, relating to the extent of the sampled cell types (Discussion, page 18, 1st paragraph).

      6) Finally, I would appreciate a more detailed explanation in the methods of how the score is computed. Some equations and the data they are calculated from would be helpful here.

      We now provide a detailed explanation of how the score and its statistical significance were computed and added Equations 1 and 2 (Methods, page 21-22, ‘PrEDiCT score calculation and significance assessment’).

      In summary, the general idea is an interesting one, but I do think the issues above should be addressed to make the results convincing.

      We thank the reviewer for the important feedback which helped us strengthen our analyses.

    1. Author Response:

      Reviewer #1:

      Chen et al. trained male and female animals on an explore/exploit (2-armed bandit) task. Despite similar levels of accuracy in these animals, the authors report higher levels of exploration in males than in females. The patterns of exploration were analyzed in fine-grained detail: males are less likely to stop exploring once exploring is initiated, whereas female mice stop exploring once they learn. The authors find that both learning rate (alpha) and noise parameter (beta) increase in exploration trials in a hidden Markov model (HMM). When reinforcement learning (RL) models were fitted to animal data, they report that females had a higher learning rate, which increased over days of testing, suggesting higher meta-learning in females. They also report that, of the RL models they fit, the model incorporating a choice kernel updating rule was found to fit both male and female learning. The results do suggest one should pay greater attention to the influence of sex in learning and exploration. Another important takeaway from this study is that similar levels of accuracy do not imply similar strategies. Essential revisions include a request to show more primary behavioral data, to provide a rationale for the different RL models and their parameters, to clarify the difference between learning and 'steady state,' and to qualify how these experiments uniquely identify latent cognitive variables not previously explored with similar methods.

      We appreciate the reviewer’s thorough reading of the paper and hope that the changes we detail below will address these concerns.

      Reviewer #2:

      The authors investigated sex differences in explore-exploit tradeoff using a drifting binary bandit task in rodents. The authors tried to claim that males and females use different means to achieve similar levels of accuracy in making explore-exploit decisions. In particular, they argue that females explore less but learn more quickly during exploration. The topic is very interesting, but I am not yet convinced on the conclusions.

      Here are my major points:

      1) This paper showed that males explore more than females, and through computational modeling, they showed that females have a higher learning rate compared to males. The fact that males explore more and have lower learning rates compared to females can be an interesting finding, as the paper tried to claim, but it can also be that female rats simply learn the task better than male rats in the task used.

      We have revised the manuscript to better demonstrate that male mice did not acquire fewer rewards than females, and included all analyses and plots requested in this review. Ultimately, there was no evidence that they learned the task any less well than the females did. We appreciated this comment because it has strengthened the evidence we were able to present that males and females take different paths to the same outcome. Completing these analyses has also allowed us to clarify the relationship between RL learning rates and performance in this classic dynamic decision-making task.

      (a) First, from Figure 1B, it looks like p(reward, chance) is similar between sexes, but visually the female rats' performances, p(reward, obtained), look slightly better than males'. It would be nice if the authors could show a bar plot comparison like in Figure 1C and 1E. A non-significant test here only fails to show sex differences in performance, but it cannot be concluded that there are no sex differences in performance here. Further evidence needs to be reported here to help readers see whether there are qualitative differences in performance at all.

      The requested bar plot has been added as Figure 1C and illustrates our central point: male mice did not acquire fewer rewards than females, so there is no evidence that they learned the task any less well than the females did. The t-test we originally reported found no significant difference in the mean percent reward obtained by males and females, but we take the reviewer’s point that a non-significant difference in means does not rule out differences between the male and female distributions in other, more subtle ways. Therefore, we conducted an additional statistical test. The Kolmogorov-Smirnov (KS) test takes into account not only the means of the distributions but also their shapes. The null hypothesis is that both groups were sampled from populations with identical distributions, and the test is sensitive to any violation of that null hypothesis: different medians, different variances, or differently shaped distributions. The KS test likewise found no significant difference between males and females in the distribution of reward acquisition performance (Kolmogorov-Smirnov D = 0.1875, p = 0.94).
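      For readers less familiar with the two-sample KS statistic, the sketch below computes D as the largest vertical gap between the two empirical cumulative distribution functions. It is illustrative only: p-values are usually obtained from a library routine such as SciPy's `scipy.stats.ks_2samp`, and the sample values in the usage lines are made up.

```python
def ecdf(sample, x):
    """Fraction of observations in sample that are <= x."""
    return sum(v <= x for v in sample) / len(sample)

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D = max |F_a(x) - F_b(x)|.

    Both ECDFs are step functions that only change at observed values,
    so checking the pooled observations is enough to find the maximum.
    """
    pooled = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in pooled)

males = [0.61, 0.58, 0.64, 0.60]      # hypothetical p(reward, obtained) values
females = [0.62, 0.59, 0.63, 0.61]
print(ks_statistic(males, females))
```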

      New text from the manuscript (page 5, line 119-128):

      “There was no significant sex difference in the probability of rewards acquired above chance (Figure 1C, main effect of sex, F(1, 30) = 0.05, p = 0.83). While the mean percent reward obtained did not differ across sexes, we considered the possibility that the distributions of reward acquisition in males and females might differ. We conducted the Kolmogorov-Smirnov (KS) test, which takes into account not only the means of the distributions but also their shapes. The KS test found no significant difference between the male and female distributions of reward acquisition performance (Kolmogorov-Smirnov D = 0.1875, p = 0.94). This result indicates comparably strong understanding and performance of the task in males and females.”

      (b) The exploration and exploitation states are defined by fitting a hidden Markov model. In the exploration phase, the agent chooses left and right randomly. From Figure 1E and 1F, it looks like male rats choose completely randomly 70% of the time (around 50% for females). The exploration state here is confounded with the state of pure guessing (poor performance).

      This comment seems to confuse our descriptive HMM with a generative model. The HMM does not imply that choices are being made randomly. Instead, exploratory choices are modeled as a uniform distribution over choices. This was done only because this is the maximum entropy distribution for a categorical variable: the distribution that makes the fewest assumptions about the true underlying distribution and thus does not bias the model towards or away from any particular pattern of choices during exploration. For example, Ebitz et al. (2019) have shown that the HMM can recover periods of exploration that are highly structured and information-maximizing, despite being modeled in exactly this way.

      Because the model does not imply or require that exploratory choices are random, we could, in the future, ask whether these choices reflect random exploration or instead more directed forms of exploration. However, for various reasons, this task is not the ideal testbed for isolating random and directed exploration, though this is a direction we hope to go in the future. To clarify our model and address these issues for future research, we have added the following text (page 31, line 745-756):

      “The emissions model for the explore state was uniform across the options:

      $$p(c_t = k \mid z_t = \text{explore}) = \frac{1}{N},$$

      where c_t is the choice on trial t, z_t is the latent state, and N = 2 is the number of options.

      This is simply the maximum entropy distribution for a categorical variable: the distribution that makes the fewest assumptions about the true distribution and thus does not bias the model towards or away from any particular type of high-entropy choice period. This does not require, imply, impose, or exclude the possibility that decisions made during exploration are random. Ebitz et al. (2019) have shown that exploration was highly structured and information-maximizing, despite being modeled as a uniform distribution over choices (Ebitz et al., 2020, 2019). Because exploitation involves repeated sampling of a single option, exploit states only permitted choice emissions that matched one option.”
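      The emission structure quoted above can be illustrated with a small three-state HMM (explore, exploit-left, exploit-right) decoded with the forward-backward algorithm. This is a sketch only: the transition probabilities are hypothetical placeholders rather than fitted values, and the choice to forbid direct switches between the two exploit states is our assumption. Exploration emits either option with probability 1/2; each exploit state emits only its own option.

```python
EXPLORE, EXPLOIT_LEFT, EXPLOIT_RIGHT = 0, 1, 2

# Emission probabilities p(choice | state); choices are 0 = left, 1 = right.
EMIT = [
    [0.5, 0.5],   # explore: uniform over the two options
    [1.0, 0.0],   # exploit-left: only left choices
    [0.0, 1.0],   # exploit-right: only right choices
]
# Hypothetical transition matrix p(next state | current state).
TRANS = [
    [0.8, 0.1, 0.1],
    [0.1, 0.9, 0.0],
    [0.1, 0.0, 0.9],
]
INIT = [1 / 3, 1 / 3, 1 / 3]

def hmm_posteriors(choices, trans, emit, init):
    """Posterior p(state_t | all choices) via scaled forward-backward."""
    n = len(init)
    # Forward pass, normalised at each step to avoid numerical underflow.
    fwd, scales = [], []
    for t, y in enumerate(choices):
        if t == 0:
            a = [init[s] * emit[s][y] for s in range(n)]
        else:
            prev = fwd[-1]
            a = [emit[s][y] * sum(prev[r] * trans[r][s] for r in range(n))
                 for s in range(n)]
        c = sum(a)
        fwd.append([x / c for x in a])
        scales.append(c)
    # Backward pass using the same scaling factors.
    bwd = [[1.0] * n for _ in choices]
    for t in range(len(choices) - 2, -1, -1):
        y_next = choices[t + 1]
        bwd[t] = [sum(trans[s][r] * emit[r][y_next] * bwd[t + 1][r] for r in range(n))
                  / scales[t + 1] for s in range(n)]
    # Combine and renormalise to obtain per-trial state posteriors.
    posts = []
    for a, b in zip(fwd, bwd):
        g = [x * y for x, y in zip(a, b)]
        z = sum(g)
        posts.append([x / z for x in g])
    return posts
```

      With these placeholder settings, an alternating run such as left, right, left, right is assigned mostly to the explore state, because neither exploit state can emit both options, while a long run of one option is assigned to the matching exploit state. The uniform emission only says which sequences exploration can produce; it does not model how those choices were generated.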

      (c) Figure 2 basically says that you can choose randomly for two reasons, to be more "noisy" in your decisions (have a higher temperature term), or to ignore the values more (by having a learning rate of 0, you are just guessing). It would be nice to show a simulation of p(reward, obtained) by learning rate x inverse temperature (like in Figure 2C). From Figure 2B, it looks like higher learning rates means better value learning in this task. It seems to me that it's more likely the male rats simply learn the task more poorly and behave more randomly which show up as more exploration in the HMM model.

      This is an important comment, and addressing it gave us a chance to show the complicated, nonlinear relationship between the learning rate term and performance in this task. Per the reviewer’s request, we now include a plot showing how learning rate (α) and inverse temperature (β) affect reward acquisition (Figure 3F). However, this figure demonstrates that a higher learning rate does not mean better performance in this task. Performing well in this task requires both the ability to learn new information and the ability to hang onto the information that has already been learned. That can only happen when learning rates are moderate, not maximal. When the learning rate is maximal, behavior is reduced to a win-stay lose-shift policy, where only the outcome of the previous trial is taken into account for choice. This actually results in a lower percent of reward obtained. We have addressed the difference between the learning rate parameter in the reinforcement learning (RL) model and actual learning performance in the comment above. We believe that this new figure illustrates an essential point: different strategies can result in the same learning performance.

      This result shows that the male strategy was a valid one that doesn’t perform worse than the female strategy. Not only did they have identical performance (Figure 1C), but their optimized RL parameters put them both within the same predicted performance gradient in this new plot (Figure 3F). That’s exactly why we believe that it is important to understand differences in how individuals approach the same task, even as they may achieve the same overall levels of performance.

      New text from the manuscript (page 14, line 368-385):

      “While females had a significantly higher learning rate (α) than males, they did not obtain more rewards than males. This is because the learning rate parameter in an RL model does not equate to learning performance, which is better measured by the number of rewards obtained. The learning rate parameter reflects the rate of value updating from past outcomes. Performing well in this task requires both the ability to learn new information and the ability to hang onto previously learned information. That occurs when the learning rate is moderate but not maximal. When the learning rate is maximal (α = 1), only the outcome of the immediately preceding trial is taken into account for the current choice. This essentially reduces the strategy to a win-stay lose-shift strategy, where choice is fully dependent on the previous outcome. A higher learning rate in an RL model therefore does not translate to better reward acquisition performance. To illustrate that different combinations of learning rate and decision noise can result in the same reward acquisition performance, we conducted computer simulations of 10,000 RL agents defined by different combinations of learning rate (α) and inverse temperature (β) and plotted their reward acquisition performance for the restless bandit task (Figure 3F). This figure demonstrates that 1) different learning rate and inverse temperature combinations can result in similar performance, and 2) optimal reward acquisition is achieved when the learning rate is moderate. This result suggests that not only did males and females have identical performance, but their optimized RL parameters put them both within the same predicted performance gradient in this plot.”
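      The kind of simulation described in the quoted passage can be sketched as follows: a delta-rule Q-learning agent with softmax choice plays a two-armed restless bandit, and the mean reward rate is tabulated over a grid of α and β. The drift model (independent Gaussian random walks clipped to [0, 1]), its step size, the trial count, the number of agents, and the grid values are all hypothetical choices of ours, not the task parameters used in the paper, so the numbers it produces should not be compared directly with Figure 3F.

```python
import math
import random

def simulate_restless_bandit(alpha, beta, n_trials=300, drift_sd=0.1, seed=0):
    """Fraction of rewarded trials for one delta-rule Q-learner with softmax choice.

    Hypothetical task model: two arms whose reward probabilities follow
    independent Gaussian random walks clipped to [0, 1].
    """
    rng = random.Random(seed)
    p_reward = [rng.random(), rng.random()]   # initial arm reward probabilities
    q = [0.5, 0.5]                            # initial value estimates
    n_rewarded = 0
    for _ in range(n_trials):
        # Softmax over two options; beta is the inverse temperature.
        p_choose_0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        choice = 0 if rng.random() < p_choose_0 else 1
        reward = 1 if rng.random() < p_reward[choice] else 0
        n_rewarded += reward
        # Delta-rule update of the chosen option only; alpha is the learning rate.
        q[choice] += alpha * (reward - q[choice])
        # Restless drift: each arm's reward probability takes a clipped random step.
        p_reward = [min(1.0, max(0.0, p + rng.gauss(0.0, drift_sd))) for p in p_reward]
    return n_rewarded / n_trials

def performance_grid(alphas, betas, n_agents=20):
    """Mean p(reward) per (alpha, beta) pair, averaged over simulated agents."""
    return [[sum(simulate_restless_bandit(a, b, seed=s) for s in range(n_agents)) / n_agents
             for b in betas]
            for a in alphas]

alphas = [0.1, 0.3, 0.5, 0.7, 1.0]
betas = [1.0, 3.0, 10.0]
for a, row in zip(alphas, performance_grid(alphas, betas)):
    print(a, [round(v, 3) for v in row])
```

      Reusing the same seeds across grid cells gives every (α, β) pair the same starting reward probabilities, which reduces noise when comparing neighbouring cells.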

      (d) From figure 3E, it looks like female rats learn better across days but male rats do not, but I am not sure. If you plot p(reward, obtained) vs times(days), do you see an improvement in female rats as opposed to males? Figure 4 also showed that females show more win-stay-lose-shift behavior and use past information more, both are indicators of better learning in this task.

      Taking the above together, I am not convinced about the strategic sex differences in exploration; it looks more like the female rats simply learn better in this task.

      Unfortunately, there was no change in performance across days in either males or females. At the reviewer's request, we now include a new plot illustrating p(reward, obtained) over days in Supplemental Figure 1. Ultimately, this resonated with the points we clarified above and demonstrated in this figure: males and females had identical performance in this task.

      To the other points raised here, about sex differences in win-stay lose-shift and mutual information: these are the strategic differences at the heart of the paper, but again they did not alter overall performance, for the reasons detailed above. Figure 4 did show that females were doing more win-stay. However, after further examining win-stay behavior by explore-exploit state, we found that females were only doing more win-stay during exploratory trials (Figure 5E). There was no difference in win-stay during exploitative trials. Figure 5F also demonstrated that females did more win-stay lose-shift in the exploration state, indicating that females learned better only during exploration. Although males learned more slowly during exploration, they compensated for that by exploring for longer. Both male and female strategies are equally effective and may be differentially advantageous in different tasks.
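      One way to compute win-stay and lose-shift probabilities, optionally restricted to trials carrying one HMM-decoded state label, is sketched below. The exact definitions used in the paper are not given in this response, so conditioning on the state of the current trial (rather than the next one) is our assumption.

```python
def win_stay_lose_shift(choices, rewards, states=None, state=None):
    """Return (p(win-stay), p(lose-shift)) from trial-by-trial sequences.

    If `states` and `state` are given, only transitions that start on trials
    whose state label equals `state` are counted (e.g. decoded explore trials).
    Returns NaN for a probability whose conditioning event never occurs.
    """
    win_stay = wins = lose_shift = losses = 0
    for t in range(len(choices) - 1):
        if states is not None and states[t] != state:
            continue
        stayed = choices[t + 1] == choices[t]
        if rewards[t]:
            wins += 1
            win_stay += stayed
        else:
            losses += 1
            lose_shift += not stayed
    p_ws = win_stay / wins if wins else float("nan")
    p_ls = lose_shift / losses if losses else float("nan")
    return p_ws, p_ls
```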

      Finally, to address the meta-learning: in developing our response to this comment and looking for any other signs of adaptation across days (whether sex-dependent or not), we revisited this result and decided to rewrite some passages to be more circumscribed in our interpretations. Figure 3E showed increased learning rate parameters across days in females. We were initially excited about this idea of meta-learning; however, we found no other evidence of adaptation over time in multiple behavioral measures, including reward acquisition, response time, and retrieval time (Supplemental Figure 1). Changes in learning rate parameters over sessions from the RL model were marginally significant, and we feel that they are worth mentioning for completeness, but they were only a small contributor to the overall sex differences in the behavioral profile. As a result, we have toned down the conclusion we drew from this result accordingly.

      New text from the manuscript (page 4, line 93-113):

      “It is worth noting that, unlike other versions of bandit tasks such as the reversal learning task, in the restless bandit task animals were encouraged to continuously learn about the most rewarding choice(s). There is no asymptotic performance within the task because the reward probability of each choice constantly changes. Performance is best measured by the amount of reward obtained. Prior to data collection, both male and female mice had learned to perform this task in the touchscreen operant chamber. To examine whether mice had learned the task, we first calculated the average probability of reward acquisition across sessions in males and females (Supplemental Figure 1A). There were no significant changes in reward acquisition performance across sessions in either sex, demonstrating that both males and females had learned to perform the task and had reached an asymptotic level of performance across sessions (two-way repeated measures ANOVA, main effect of session, p = 0.71). We then examined two other primary behavioral metrics associated with learning across sessions: response time and reward retrieval time (Supplemental Figure 1B, C). Response time was calculated as the time elapsed between display onset and completion of the nose-poke response. Reward retrieval time was measured as the time elapsed between the nose-poke response and magazine entry for reward collection. There was no significant change in response time (two-way repeated measures ANOVA, main effect of session, p = 0.39) or reward retrieval time (main effect of session, p = 0.71) across sessions in either sex, which again demonstrated that both sexes had learned how to perform the task. Since both sexes had learned to perform the task prior to data collection, variability in task performance reflects how animals learned and adapted their choices in response to the changing reward contingencies.”

      page 14, line 386-390:

      “One interesting finding is that, when we compared learning rates across sessions within each sex, females, but not males, showed an increased learning rate with experience on the task (Figure 3G, repeated measures ANOVA, female: main effect of time, F(2.26, 33.97) = 5.27, p = 0.008; male: main effect of time, F(2.5, 37.52) = 0.23, p = 0.84). This points to potential sex differences in meta-learning that could contribute to the differential strategies across sexes.”

      2) I do like how the authors define exploration states vs exploitation states via HMM using choices alone. It would be interesting to see how the sex differences in reaction time are modulated by exploration vs exploitation state. As the authors showed, RT in exploration state is longer. Hence, it would make a conceptual difference whether the sex difference in reaction times is due to different proportions of time spent on exploration vs exploitation across sex.

      That is a very interesting idea. We tested for this possibility by calculating a two-way ANOVA (with interaction) between explore-exploit state and sex in predicting RT. There was a significant main effect of state (RT is longer in the explore state than in the exploit state, main effect of state: F(1,30) = 13.07, p = 0.0011), but males were slower than females during both exploitation and exploration (main effect of sex, F(1,30) = 14.15, p = 0.0007) and there was no significant interaction (F(1,30) = 0.279, p = 0.60). Unfortunately, this means that we cannot interpret the response time difference between males and females as a consequence of the greater male tendency to explore. Response time is a fairly noisy primary behavioral metric, especially in males, and many other factors might be at play here, some of which we plan to follow up on in the future. We report this result as follows (page 10, line 248-254):

      “Since males had more exploratory trials, which took longer, we tested the possibility that the sex difference in response time was due to prolonged exploration in males by calculating a two-way ANOVA between explore-exploit state and sex in predicting response time. There was a significant main effect of state (main effect of state: F(1,30) = 13.07, p = 0.0011), but males were slower than females during both exploitation and exploration (main effect of sex, F(1,30) = 14.15, p = 0.0007) and there was no significant interaction (F(1,30) = 0.279, p = 0.60).”

      Reviewer #3:

      In the manuscript 'Sex differences in learning from exploration', Chen and colleagues investigated sex differences in decision-making behavior during a two-armed spatial restless bandit task. Sex differences and exploration dysregulation have been observed in various neuropsychiatric disorders. Yet, it has been unclear whether sex differences in exploration and exploitation contribute to sex-linked vulnerabilities in neuropsychiatric disorders.

      Chen and colleagues applied comprehensive modeling (model free Hidden Markov model (HMM), and various reinforcement learning (RL) models) and behavioral analysis (analysis of choice behavior using the latent variables extracted from HMM), to answer this question. They found that male mice explored more than female mice and were more likely to spend an extended period of their time exploring before committing to a favored choice. In contrast, female mice were more likely to show elevated learning during the exploratory period, making exploration more efficient and allowing them to start exploiting a favored choice earlier.

      Overall, I find the question studied in this work interesting and compelling. Also, the results were convincing and the analysis thorough. However, the assumptions in the proposed HMM are not fully justified, and additional analyses are needed to strengthen the authors' claims. To be more specific, the effect of obtained reward on state transitions, and biased exploitation, should be further explored.

      Thank you for your feedback. We have included two more complex versions of the Hidden Markov model (HMM) that account for the effect of obtained reward on state transitions and for biased exploitation. Although the additional parameters slightly improve the model fit, model comparison tests suggested that this improvement was not significant. We decided to keep the HMM from the original manuscript because it is the simplest model that fits the data adequately, and it provides the most reliable parameter estimates given the amount of data we have. We do appreciate the comments and believe that the inclusion of the two new HMMs and the justification of the original HMM have strengthened our claims.
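      The response does not say which model-comparison test was used. As one common option for nested models such as these HMM variants, a likelihood-ratio test can be sketched as below, assuming the maximized log-likelihoods of the simpler and the extended model and the number of extra free parameters are known. The chi-square reference distribution is only an approximation, particularly when extra parameters can sit at the boundary of their allowed range.

```python
import math

def chi2_sf(x, df):
    """Survival function P(X > x) of a chi-square distribution, integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form finite Poisson-type sum.
        return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))
    # Odd df: normal tail plus a finite correction series.
    tail = math.erfc(math.sqrt(half))
    return tail + math.exp(-half) * sum(half ** (i - 0.5) / math.gamma(i + 0.5)
                                        for i in range(1, (df - 1) // 2 + 1))

def likelihood_ratio_test(loglik_simple, loglik_complex, extra_params):
    """Return the LR statistic and its asymptotic chi-square p-value."""
    stat = 2.0 * (loglik_complex - loglik_simple)
    return stat, chi2_sf(max(stat, 0.0), extra_params)
```

      A non-significant result means the data do not support the extra parameters; information criteria such as AIC or BIC are common alternatives when the candidate models are not nested.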

    1. Author Response

      Reviewer #2 (Public Review):

      I believe the authors succeeded in finding neural evidence of reactivation during REM sleep. This is their main claim, and I applaud them for that. I also applaud their efforts to explore their data beyond this claim, and I think they included appropriate controls in their experimental design. However, I found other aspects of the paper to be unclear or lacking in support. I include major and medium-level comments:

      Major comments, grouped by theme with specifics below:

      Theta.

      Overall assessment: the theta effects are either over-emphasized or unclear. Please either remove the high/low theta effects or provide a better justification for why they are insightful.

      Lines ~ 115-121: Please include the statistics for low-theta power trials. Also, without a significant difference between high- and low-theta power trials, it is unclear why this analysis is being featured. Does theta actually matter for classification accuracy?

      Lines 123-128: What ARE the important bands for classification? I understand the point about it overlapping in time with the classification window without being discriminative between the conditions, but it still is not clear why theta is being featured given the non-significant differences between high/low theta and the lack of its involvement in classification. REM sleep is high in theta, but other than that, I do not understand the focus given this lack of empirical support for its relevance.

      Lines 232-233: "In our data, trials with higher theta power show greater evidence of memory reactivation." Please do not use this language without a difference between high and low theta trials. You can say there was significance using high theta power and not with low theta power, but without the contrast, you cannot say this.

      Thank you, we have taken this point on board. We thought the differences observed between classification in high and low theta power trials were interesting, but we can see why the reviewer feels there is a need for a stronger hypothesis here before reporting them. We have therefore removed this approach from the manuscript, and no longer split trials into high and low theta power.

      Physiology / Figure 2.

      Overall assessment: It would be helpful to include more physiological data.

      It would be nice, either in Figure 2 or in the supplement, to see the raw EEG traces in these conditions. These would be especially instructive because, with NREM TMR, the ERPs seem to take a stereotypical pattern that begins with a clear influence of slow oscillations (e.g., in Cairney et al., 2018), and it would be helpful to show the contrast here in REM.

      We thank the reviewer for these comments. We have now performed ERP and time-frequency analyses following an approach similar to that of Cairney et al. (2018). We have added a section to the results describing these analyses, as follows:

      “Elicited response pattern after TMR cues

      We examined the TMR-elicited response with both time-frequency and ERP analyses, using a method similar to that of Cairney et al. (2018) (see Methods). As shown in Figure 2a, the EEG response showed a rapid increase in the theta band followed by an increase in the beta band starting about one second after TMR onset. REM sleep is dominated by theta activity, which is thought to support the consolidation process (Diekelmann & Born, 2010), and increased theta power has previously been shown to occur after successful cueing during sleep (Schreiner & Rasch, 2015). We therefore analysed the TMR-elicited theta in more detail. Focussing on the first second after TMR onset, we found that theta power was significantly higher than in the pre-cue baseline period [-300, -100] ms for both adaptation (Wilcoxon signed rank test, n = 14, p < 0.001) and experimental nights (Wilcoxon signed rank test, n = 14, p < 0.001). The absence of any difference in theta power between experimental and adaptation conditions (Wilcoxon signed rank test, n = 14, p = 0.68) suggests that this response is related to processing of the sound cue itself, not to memory reactivation. Turning to the ERP analysis, we found a small increase in ERP amplitude immediately after TMR onset, followed by a decrease in amplitude 500 ms after the cue. Comparison of ERPs from experimental and adaptation nights showed no significant difference (n = 14, p > 0.1). Similar to the time-frequency result, this suggests that the ERPs observed here relate to the processing of the sound cues rather than to any associated memory.”

      And we have updated Figure 2.
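The theta analysis above relies on paired Wilcoxon signed rank tests (n = 14) comparing post-cue theta power with the [-300, -100] ms baseline. As an illustration only, the sketch below computes an exact two-sided Wilcoxon signed-rank test in plain Python. It assumes that each participant's theta power has already been averaged within the post-cue window and within the baseline window; the function names and example values are hypothetical, and this is not the analysis code used in the study. With n = 14 the null distribution can be enumerated exactly (2^14 = 16,384 sign patterns).

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank_exact(post, baseline):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Zero differences are dropped. The null distribution of W+ (the sum of
    ranks of positive differences) is built by enumerating every sign
    assignment, which is practical only for small n.
    Returns (W+, two-sided p-value).
    """
    diffs = [p - b for p, b in zip(post, baseline) if p != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2  # mean of W+ under the null hypothesis
    observed_dev = abs(w_plus - expected)
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        # Count sign patterns at least as far from the null mean as observed.
        if abs(w - expected) >= observed_dev - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n
```

For example, if all 14 hypothetical participants had higher post-cue than baseline theta power with distinct differences, every rank would be positive, W+ would reach its maximum of 105, and only 2 of the 16,384 sign patterns would be as extreme, giving p ≈ 0.00012. For real analyses, a tested library routine such as `scipy.stats.wilcoxon` is preferable.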

      Also, please expand the classification window beyond 1 s for wake and 1.4 s for sleep. It seems the wake axis stops at 1 s and it would be instructive to know how long that lasts beyond 1 s. The sleep signal should also go longer. I suggest plotting it for at least 5 seconds, considering prior investigations (Cairney et al., 2018; Schreiner et al., 2018; Wang et al., 2019) found evidence of reactivation lasting beyond 1.4 s.

      Regarding the classification window, this is an interesting point. TMR cues in sleep were spaced 1.5 s apart, which is why we included only this window in our classification: extending the window beyond 1.5 s would mean including the time at which the next TMR cue was presented. Similarly, wake trials lasted 1.1 s, so the next tone was presented at 1.1 s.

      Following the reviewer’s comment, we have extended our window as requested, even though this means encroaching on the next trial, because there may be a transitional period between trials. When we extended the window in wake and looked at reactivation in the range 0.5 s to 1.6 s, we found that the effect continued to ~1.2 s relative to both adaptation and chance, i.e. it persisted 100 ms beyond the end of the trial. Results are shown in the figures below.

      Temporal compression/dilation.

      Overall assessment: This could be cut from the paper. If the authors disagree, I am curious how they think it adds novel insight.

      Line 179 section: In my opinion, this does not show evidence for compression or dilation. If anything, it argues that reactivation unfolds on a similar scale, as the numbers are clustered around 1. I suggest the authors scrap this analysis, as I do not believe it supports any main point of their paper. If they do decide to keep it, they should expand the window of dilation beyond 1.4 in Figure 3B (why cut off the graph at a data point that is still significant?). And they should later emphasize that the main conclusion, if any, is that the scales are similar.

      Line 207 section on the temporal structure of reactivation, 1st paragraph: Once again, in my opinion, this whole concept is not worth mentioning here, as there is not really any relevant data in the paper that speaks to this concept.

      We thank the reviewer for these frank comments. On consideration, we have now removed the compression/dilation analysis.

      Behavioral effects.

      Overall assessment: Please provide additional analyses and discussion.

      Lines 171-178: Nice correlation! Was there any correlation between reactivation evidence and pre-sleep performance? If so, could the authors show those data, and also test whether this relationship holds while covarying out pre-sleep performance? The logic is that intact reactivation may rely on intact pre-sleep performance; conversely, there could be an inverse relationship if sleep reactivation is greater for initially weaker traces, as some have argued (e.g., Schapiro et al., 2018). This analysis will either strengthen their conclusion or change it -- either outcome is good.

      Thanks for these interesting points. We have now performed a new analysis to check if there was a correlation between classification performance and pre-sleep performance, but we found no significant correlation (n = 14, r = -0.39, p = 0.17). We have included this in the results section as follows:

      “Finally, we wanted to know whether the extent to which participants learned the sequence during training might predict the extent to which we could identify reactivation during subsequent sleep. We therefore tested for a correlation between classification performance and pre-sleep performance, but found no significant correlation (n = 14, r = -0.39, p = 0.17).”

      Note that we calculated the behavioural improvement by subtracting pre-sleep performance and then normalising by it, for both the cued and un-cued sequences, as follows:

      [(random blocks after sleep - the best 4 blocks after sleep) – (random blocks pre-sleep – the best 4 blocks pre-sleep)] / (random blocks pre-sleep – the best 4 blocks pre-sleep).
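The normalised improvement formula above can be written as a small function. This is a minimal sketch that assumes each block term is a mean reaction time (for example in ms) and that "the best 4 blocks" is a single summary value per session; the function and argument names are hypothetical and not taken from the authors' code.

```python
def sequence_specific_skill(random_rt, best4_rt):
    """Skill = RT on random blocks minus RT on the best 4 sequence blocks (assumed mean RTs)."""
    return random_rt - best4_rt


def overnight_improvement(random_pre, best4_pre, random_post, best4_post):
    """Normalised overnight change in skill, following the formula in the text:
    [(random_post - best4_post) - (random_pre - best4_pre)] / (random_pre - best4_pre).
    """
    skill_pre = sequence_specific_skill(random_pre, best4_pre)
    skill_post = sequence_specific_skill(random_post, best4_post)
    if skill_pre == 0:
        raise ZeroDivisionError("pre-sleep skill is zero; improvement is undefined")
    return (skill_post - skill_pre) / skill_pre
```

With hypothetical values of 500 ms and 420 ms before sleep (skill 80 ms) and 490 ms and 390 ms after sleep (skill 100 ms), `overnight_improvement(500, 420, 490, 390)` returns 0.25, i.e. a 25% gain relative to pre-sleep skill.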

      Unlike Schönauer et al. (2017), they found a strong correspondence between REM reactivation and memory improvement across sleep; however, there was no benefit of TMR cues overall. These two results in tandem are puzzling. Could the authors discuss this more? What does it mean to have the correlation without the overall effect? Or else, is there anything else that may drive the individual differences they allude to in the Discussion?

      We have now added a discussion of this point as follows:

      “We are at a very early phase in understanding what TMR does in REM sleep; however, we do know that the connection between hippocampus and neocortex is inhibited by the high levels of acetylcholine that are present in REM (Hasselmo, 1999). This means that the reactivation which we observe in the cortex is unlikely to be linked to corresponding hippocampal reactivation, so any consolidation which occurs as a result of this is also unlikely to be linked to the hippocampus. The SRTT is a sequencing task which relies heavily on the hippocampus, and our primary behavioural measure (Sequence Specific Skill) specifically examines the sequencing element of the task. Our own neuroimaging work has shown that TMR in non-REM sleep leads to extensive plasticity in the medial temporal lobe (Cousins et al., 2016). However, if TMR in REM sleep has no impact on the hippocampus, then it is quite possible that it elicits cortical reactivation and leads to cortical plasticity but provides no measurable benefit to Sequence Specific Skill. Alternatively, because we only measured behavioural improvement right after sleep, we may have missed behavioural improvements that would have emerged several days later, as we know can occur in this task (Rakowska et al., 2021).”

      Medium-level comments

      Lines 63-65: "We used two sequences and replayed only one of them in sleep. For control, we also included an adaptation night in which participants slept in the lab, and the same tones that would later be played during the experimental night were played."

      I believe the authors could make a stronger point here: their design allowed them to show that they are not simply decoding SOUNDS but actual memories. The null finding on the adaptation night is definitely helpful in ruling this possibility out.

      We agree and thank the reviewer for this point. We have now included this in the text as follows: “This provided an important control, as a null finding on this adaptation night would indicate that we are decoding actual memories, not just sounds.”

      Lines 129-141: Does reactivation evidence go down (like in their prior study, Belal et al., 2018)? All they report is theta activity rather than classification evidence. Also, I am unclear why the Wilcoxon comparison was performed rather than a simple correlation in theta activity across TMR cues (though again, it makes more sense to me to investigate reactivation evidence across TMR cues instead).

      Thanks a lot for the interesting point. In our prior study (Belal et al., 2018), the classification model was trained on wake data and then tested on sleep data, which enabled us to examine its performance at different timepoints in sleep. However, in the current study the classifier was trained on sleep and tested on wake, so we can only test for differential replay at different times during the night by dividing the training data. We fear that dividing sleep trials into smaller blocks in this way will lead to weakly trained classifiers with inaccurate weight estimates due to the small number of training trials, and that these will not generalise to the testing data. Nevertheless, following your comment, we tried this by dividing our sleep trials into two blocks, i.e. the first half and the second half of stimulation during the night. When we ran the analysis on these blocks separately, no clusters were found for either the first or second half of stimulation compared to adaptation, probably for the reasons cited above. Hence, the differences in design between the two studies mean that the current study does not lend itself to this analysis.

      Line 201: It seems unclear whether they should call this "wake-like activity" when the classifier involved training on sleep first and then showing it could decode wake rather than vice versa. I agree with the authors' logic that wake signals that are specific to wake will be unhelpful during sleep, but I am not sure "wake-like" fits here. I'm not going to belabor this point, but I do encourage the authors to think deeply about whether this is truly the term that fits.

      We agree that better terminology is needed, and have now changed this: “In this paper we demonstrated that memory reactivation after TMR cues in human REM sleep can be decoded using EEG classifiers. Such reactivation appears to be most prominent about one second after the sound cue onset.”

      Reviewer #3 (Public Review):

      The authors investigated whether reactivation of wake EEG patterns associated with left- and right-hand motor responses occurs in response to sound cues presented during REM sleep.

      The question of whether reactivation occurs during REM is of substantial practical and theoretical importance. While some rodent studies have found reactivation during REM, it has generally been more difficult to observe reactivation during REM than during NREM sleep in humans (with a few notable exceptions, e.g., Schönauer et al., 2017), and the nature and function of memory reactivation in REM sleep is much less well understood than the nature and function of reactivation in NREM sleep. Finding a procedure that yields clear reactivation in REM in response to sound cues would give researchers a new tool to explore these crucial questions.

      The main strength of the paper is that the core reactivation finding appears to be sound. This is an important contribution to the literature, for the reasons noted above.

      The main weakness of the paper is that the ancillary claims (about the nature of reactivation) may not be supported by the data.

      The claim that reactivation was mediated by high theta activity requires a significant difference in reactivation between trials with high theta power and trials with low theta, but this is not what the authors found (rather, they have a "difference of significances", where results were significant for high theta but not low theta). So, at present, the claim that theta activity is relevant is not adequately supported by the data.

      The authors claim that sleep replay was sometimes temporally compressed and sometimes dilated compared to wakeful experience, but I am not sure that the data show compression and dilation. Part of the issue is that the methods are not clear. For the compression/dilation analysis, what are the features that are going into the analysis? Are the feature vectors patterns of power coefficients across electrodes (or within single electrodes?) at a single time point? or raw data from multiple electrodes at a single time point? If the feature vectors are patterns of activity at a single time point, then I don't think it's possible to conclude anything about compression/dilation in time (in this case, the observed results could simply reflect autocorrelation in the time-point-specific feature vectors - if you have a pattern that is relatively stationary in time, then compressing or dilating it in the time dimension won't change it much). If the feature vectors are spatiotemporal patterns (i.e., the patterns being fed into the classifier reflect samples from multiple frequencies/electrodes / AND time points) then it might in principle be possible to look at compression, but here I just could not figure out what is going on.

      Thank you. We have removed the analysis of temporal compression and dilation from the manuscript, but we would like to answer the question anyway. In this analysis, raw data were smoothed and used as time-domain features, organised as trials x channels x timepoints. We then segmented each trial in time according to the compression factor being tested. For instance, to test whether sleep replay is 2x faster than wake, we took the wake trial length (1.1 s) and halved it (0.55 s). We then split each sleep trial into multiple shorter segments of 0.55 s, added those segments as new trials labelled with the label of the trial they came from, and resized them temporally to match the length of wake trials. Next, we reshaped the data from trials x channels x timepoints to trials x (channels x timepoints), aggregating channels and timepoints into a single dimension, and used PCA to reduce this dimension to a set of principal components. The resulting features were fed to an LDA classifier. This whole process was repeated for every scaling factor, within participants, in the same fashion as the main classification, and error bars show standard errors. We compared the results from the experimental night to those of the adaptation night.
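The segmentation and resizing steps described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' pipeline: segments are assumed to be non-overlapping consecutive windows, "resizing" is assumed to be linear interpolation, trial lengths are given in samples (e.g. a hypothetical 100 Hz sampling rate would make a 1.1 s wake trial 110 samples, so a factor of 2 gives 55-sample windows), and the smoothing step is omitted. All function names are hypothetical.

```python
def resample_linear(signal, new_len):
    """Linearly interpolate a 1-D list of samples onto new_len evenly spaced points."""
    old_len = len(signal)
    if new_len < 1 or old_len < 1:
        raise ValueError("signal and target length must be non-empty")
    if new_len == 1 or old_len == 1:
        return [float(signal[0])] * new_len
    out = []
    for k in range(new_len):
        pos = k * (old_len - 1) / (new_len - 1)  # fractional index into the original signal
        lo = int(pos)
        hi = min(lo + 1, old_len - 1)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out


def segment_and_rescale(sleep_trials, labels, wake_len, factor):
    """Cut each sleep trial into windows of wake_len / factor samples, stretch each
    window back to wake_len samples, and flatten channels x timepoints.

    sleep_trials: list of trials, each a list of channels, each a list of samples.
    Returns (feature_vectors, segment_labels); each segment inherits its trial's label.
    """
    seg_len = round(wake_len / factor)
    if seg_len < 1:
        raise ValueError("factor too large for the wake trial length")
    features, seg_labels = [], []
    for trial, label in zip(sleep_trials, labels):
        n_samples = len(trial[0])
        # Non-overlapping consecutive windows; a leftover tail shorter than seg_len is dropped.
        for start in range(0, n_samples - seg_len + 1, seg_len):
            rescaled = [resample_linear(ch[start:start + seg_len], wake_len) for ch in trial]
            # Channel-major flattening: all samples of channel 0, then channel 1, and so on.
            features.append([x for ch in rescaled for x in ch])
            seg_labels.append(label)
    return features, seg_labels
```

The flattened feature vectors would then go through dimensionality reduction and classification, for example with scikit-learn's `PCA` followed by `LinearDiscriminantAnalysis`, repeated for each scaling factor.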

      For the analyses relating to classification performance and behavior, the authors presently show that there is a significant correlation for the cued sequence but not for the other sequence. This is a "difference of significances" but not a significant difference. To justify the claim that the correlation is sequence-specific, the authors would have to run an analysis that directly compares the two sequences.

      Thanks a lot. We have now followed this suggestion by examining sequence-specific improvement after removing the effect of the un-cued sequence from the cued sequence. This was done by subtracting the improvement for the un-cued sequence from the improvement for the cued sequence, and then normalising the result by the improvement for the un-cued sequence. The resulting values, which we term ‘cued sequence improvement’, showed a significant correlation with classification performance (n = 14, r = 0.56, p = 0.04). We have therefore updated this section of the manuscript as follows: “We therefore set out to determine whether there was a relationship between the extent to which we could classify reactivation and overnight improvement on the cued sequence. This revealed a positive correlation (n = 14, r = 0.56, p = 0.04), Figure 3b.”
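The cued sequence improvement measure and its correlation with classification performance can be illustrated with a short sketch. It assumes that the per-participant improvements for the cued and un-cued sequences have already been computed; the function names are hypothetical, and only Pearson's r is computed here (the p-value would additionally require a t-test with n - 2 degrees of freedom, which is not shown).

```python
from math import sqrt


def cued_sequence_improvement(cued, uncued):
    """(cued - uncued) / uncued, as described in the response.

    Note: when the un-cued improvement is negative, the division flips the sign.
    """
    if uncued == 0:
        raise ZeroDivisionError("un-cued improvement is zero; measure is undefined")
    return (cued - uncued) / uncued


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)
```

In use, one would compute `cued_sequence_improvement` for each participant and pass the resulting list, together with the per-participant classification performance, to `pearson_r`.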

    1. Author response:

      Reviewer #1 (Public Review):

      In this study, Girardello et al. use proteomics to reveal the membrane tension sensitive caveolin-1 interactome in migrating cells. The authors use EM and surface rendering to demonstrate that caveolae formed at the rear of migrating cells are complex membrane-linked multilobed structures, and they devise a robust strategy to identify caveolin-1 associated proteins using APEX2-mediated proximity biotinylation. This important dataset is further validated using proximity ligation assays to confirm key interactions, and is followed up with an interrogation of a surprising relationship between caveolae and RhoGTPase signalling, where caveolin-1 recruits ROCK1 under high membrane tension conditions, and ROCK1 activity is required to reform caveolae upon reversion to isotonic solution. However, caveolin-1 recruits the RhoA inactivator ARHGAP29 when membrane tension is low, and ARHGAP29 overexpression leads to disassembly of caveolae and reduced cell motility. This study builds on previous findings linking caveolae to positive feedback regulation of RhoA signalling, and provides further evidence that caveolae serve to drive rear retraction in migration but also possess an intrinsic brake to limit RhoA activation, leading the authors to suggest that cycles of caveolae assembly and disassembly could be central to establishing a stable cell rear for persistent cell migration.

      A major strength of the manuscript is the robust proteomic dataset. The experimental set-up is well defined and mostly well controlled, and there is good internal validation: the high abundance of core caveolar proteins under low membrane tension (isotonic) conditions, and their absence under high membrane tension (brief hypo-osmotic shock) conditions, correlates very well with previous findings. The data could, however, be better presented to show where statistically robust changes occur, and the supplementary information should include a table showing abundance. It's very good to see a link to PRIDE, providing a useful resource for the community.

      We thank the reviewer for the positive feedback. We have included the outputs from the search engine in Supplementary File 1.

      The authors detail several known interactions and their mechanosensitivity, but also report new interactors of caveolin-1. Several mechanosensitive interactions of caveolin-1 take place at the cell rear, but others are more diffuse across the cell, judging from the PLA data (e.g. FLN1, CTTN, HSPB1; Figure 4A-F and Figure 4 supplement 1). It is interesting to speculate that those at the cell rear are involved in caveolae, whilst others are linked specifically to caveolin-1 (e.g. dolines). PLA or localisation analysis with Cavin1/PTRF may be able to resolve this and further distinguish caveolae from non-caveolae mechanosensitive interactions.

      We thank the reviewer for this interesting idea. It is true that many, if not most, proteins we identified to be associated with Cav1 are not restricted to the cell rear. To analyse to what extent the identified proteins interact with Cav1 at the rear, we reanalysed our PLA data for some of the antibody combinations we looked at. This new analysis is now shown in Fig 5G. As expected, for Cav1/PTRF and Cav1/EHD2 most PLA dots (70-80%) were found at the rear. This rear bias is also evident from the representative images shown in Figure panels 5A and 5E. By contrast, far fewer PLA dots (~40%) were rear-localised for the Cav1/CTTN and Cav1/FLNA antibody combinations. This reflects the much broader cellular distribution of these proteins compared to the core caveolae proteins, and might suggest that there are generally few links between caveolae and cortical actin. However, it is also possible that such links/interactions are more difficult to detect using PLA (because of the extended distance between caveolae and the actin cortex, or because of steric constraints).

      The Cav1/ARHGAP29 influence on YAP signalling is interesting, but appears to be quite isolated from the rest of the manuscript. Does overexpression of ARHGAP29 influence YAP signalling and/or caveolar protein expression/Cav1pY14?

      Our data and published work originally prompted us to speculate that there is a potential functional link between Cav1, YAP, and ARHGAP29. In an attempt to address this we have performed several Western blots on cell lysates from cells overexpressing ARHGAP29. We did not see major changes in Cav1 Y14 phosphorylation levels in cells overexpressing ARHGAP29, and YAP and pYAP levels also remained unchanged (not shown). In addition, based on previous literature 1,2 we expected to see an effect on ARHGAP29 mRNA levels and YAP target gene transcripts in Cav1 siRNA transfected cells. To our surprise, the mRNA levels of three independent YAP target genes and ARHGAP29 were unchanged in Cav1 siRNA treated cells (this is now shown in Figure 6 Figure Supplement 1). Our data therefore suggest that in RPE1 cells, the connection between Cav1 and ARHGAP29 is independent of YAP signalling, and that the increase in ARHGAP29 protein levels observed in Cav1 siRNA cells is due to some unknown post-translational mechanism.

      ARHGAP29 and RhoA/ROCK1 related observations are very interesting and potentially really important. However, the link between ARHGAP29 and caveolae is not well established (other than in proteomic data). PLA or FRET could help establish this.

      We agree that the physical and functional link between caveolae (or Cav1) and ARHGAP29 was not well worked out in the original manuscript. In an attempt to address this we have performed PLA assays in GFP-ARHGAP29 transfected cells (as we did not find a suitable ARHGAP29 antibody that works reliably in IF) using anti-Cav1 and anti-GFP antibodies. The PLA signal we obtained for Cav1 and ARHGAP29 was not significantly different to control PLA experiments. There was very little PLA signal to start with. This is not surprising given that ARHGAP29 localisation is mostly diffuse in the cytoplasm, whilst Cav1 is concentrated at the rear. In addition, in cases where we do see ARHGAP29 localisation at the cell cortex, Cav1 tends to be absent (this is now shown in Figure 6 – Figure Supplement 2E). In other words, with the tools we have available, we see little colocalization between Cav1 and ARHGAP29 at steady state. Altogether we speculate that ARHGAP29, through its negative effect on RhoA, flattens caveolae at the membrane or interferes with caveolae assembly at these sites.

      This of course raises the question of why ARHGAP29 was identified in the Cav1 proteome with such specificity and reproducibility in the first place. This can be explained by the way APEX2 labelling works. Proximity biotinylation with APEX2 is extremely sensitive and restricted to a labelling radius of ~20 nm 3. The labelling reaction is conducted on live, intact cells at room temperature for 1 min. Although 1 min appears short, dynamic cellular processes occur on a time scale of seconds and are ongoing during the labelling reaction. It is conceivable that within this 1 min time frame, ARHGAP29 cycles on and off the rear membrane (kiss and run). This would allow ARHGAP29 to be biotinylated by Cav1-APEX2, resulting in its identification by MS. We have included this in the discussion section.

      The relationship between ARHGAP29 and RhoA signalling is not well defined. Is GAP activity important in determining the effect on migration and caveolae formation? What is the effect on RhoA activity? Alternatively, the authors could investigate YAP dependent transcriptional regulation downstream of overexpression.

      We have addressed this point using overexpression and siRNA transfections. We overexpressed ARHGAP29 or ARHGAP29 lacking its GAP domain and performed WB analysis against pMLC (which is a commonly used and reliable readout for RhoA and myosin-II activity). Much to our surprise, overexpression of ARHGAP29 increased (rather than decreased) pMLC levels, partially in a GAP-dependent manner (see Author response image 1). This is puzzling, as ARHGAP29 is expected to reduce RhoA-GTP levels, which in turn is expected to reduce ROCK activity and hence pMLC levels. In addition, and also surprisingly, siRNA-mediated silencing of ARHGAP29 did not significantly change pMLC levels. By contrast, pMLC levels were strongly reduced in Cav1 siRNA treated cells (this is shown in Fig. 6A and 6B in the revised manuscript). These new data underscore the important role of caveolae in the control of myosin-II activity, but do not allow us to draw any firm conclusions about the role of ARHGAP29 at the cell rear.

      Author response image 1.

      Overexpression of ARHGAP29 increases, rather than reduces, pMLC in RPE1 cells.

      We are uncertain how to interpret the ARHGAP29 overexpression data presented in Author response image 1 and therefore decided not to include them in the manuscript. One possibility is that inactivation of RhoA below a certain critical threshold causes other mechanisms to compensate. For instance, the activity of alternative MLC kinases such as MLCK could be enhanced under these conditions. Another possibility is that ARHGAP29 controls MLC phosphorylation indirectly. For instance, it has been shown that ARHGAP29 promotes actin destabilization through inactivating LIMK/cofilin signalling 1. In agreement with this, we find that overexpression of ARHGAP29 reduces p-cofilin (serine 3) levels (see Author response image 2). Since cofilin and MLC signalling cross-talk 4, it is possible that increased pMLC levels are the result of a feedback loop that compensates for the effect of actin depolymerisation. This is now discussed in the discussion section. Whatever the case, we hope the reviewers understand that deeper mechanistic insight into the intricate mechanisms of Rho signalling at the cell rear is beyond the scope of this manuscript.

      Author response image 2.

      Overexpression of ARHGAP29 reduces p-cofilin levels in RPE1.

      Reviewer #2 (Public Review):

      Girardello et al. investigated the composition of the molecular machinery of caveolae governing their mechano-regulation in migrating cells. Using live cell imaging and RPE1 cells, the authors provide a spatio-temporal analysis of cavin-3 distribution during cell migration and reveal that caveolae are preferentially localized at the rear of the cell in a stable manner. They further characterize these structures using electron tomography and reveal an organization into clusters connected to the cell surface. Using a proteomic approach, they address the interactome of caveolin-1 upon mechanical stimulation by exposing RPE1 cells to hypo-osmotic shock (which aims to increase cell membrane tension), with unshocked cells as a control condition. The authors identify over 300 proteins, notably proteins related to the actin cytoskeleton and cell adhesion. These results were further validated in cellulo by interrogating protein-protein interactions using proximity ligation assays and hypo-osmotic shock. These experiments confirmed previous data showing that high membrane tension induces caveolae disassembly in a reversible manner. Finally, based on the literature and on the results of the proteomic analysis, the authors investigated more deeply the molecular signaling pathway controlling caveolae assembly upon mechanical stimuli. First, they confirm the association of ROCK1 with caveolin-1 and the involvement of its kinase activity in caveolae formation (at the rear of the cell). Then, they show that the RhoGAP ARHGAP29, a factor newly identified by the proteomic analysis, is also implicated in caveolae mechano-regulation, likely through YAP, and find that overexpression of ARHGAP29 affects cell motility. Overall, this paper interrogated the role of membrane tension in caveolae located at the rear of the cell and identified a new pathway controlling cell motility.

      Strengths:

      Using a proximity-based proteomic assay, the authors reveal the protein network interacting with caveolae upon mechanical stimuli. This approach is elegant and allows the identification of a substantial new set of factors involved in the mechano-regulation of caveolin-1, some of which have been verified directly in the cell by PLA. This study provides a compelling set of data on the interactions between caveolae and their cortical network, which was so far ill-characterized.

      We thank the reviewer for this positive feedback.

      Weaknesses:

      The methodology demonstrating an impact of membrane tension is not precise enough to assess a direct role on caveolae at a subcellular scale, that is, between the front and the rear of the cell. First, a better characterization of the "front-rear" cellular model is encouraged.

      We agree with the reviewer that a quantitative analysis of the caveolae front-rear polarity would strengthen our conclusions. To address this, we have analysed the localisation of Cav1 and cavins in detail and in a large pool of cells, both in fixed and live cells. Our quantification clearly shows that Cav1 and cavins are enriched at the cell rear. This is now shown in Figure 1 and Figure 1 - Figure Supplement 1. To demonstrate that Cav1/cavins are truly rear-localised we analysed live migrating cells expressing tagged Cav1 or cavins. This analysis, which was performed on several individual time lapse movies, showed that caveolae rear localisation is remarkably stable (e.g. Figure 1C and 1D). We also present novel data panels and movies showing caveolae dynamics during rear retractions, in dividing cells, and in cells that polarise de novo. This new data is now described in the first paragraph of the results section.

      Secondly, the authors frequently present osmotic shock as a "high membrane tension" stimulus. While osmotic shock is widely used in the field, this study is focused only on caveolae localized at the rear of the cell, and it remains unclear how a global mechanical stimulus triggered by an osmotic shock could mimic a local stimulus.

      We agree with the reviewer that osmotic shock will cause a global increase in membrane tension and is therefore of only limited value for understanding how membrane tension is regulated at the rear, and how caveolae respond to such a local stimulus. It was neither our aim nor within our expertise to address such questions. Answering them requires sophisticated optogenetic approaches or localised membrane tension measurements (e.g. through the use of the Flipper-TR probe), and such experiments are beyond the scope of this manuscript. However, given the strong enrichment of caveolae at the cell rear, we believe it is justified to propose that the changes we observe in the proteome (mostly) reflect changes in caveolae at the rear. We have now included several quantifications on fixed cells, live cells, and PLA assays to support that caveolae are highly enriched at the rear. In addition, and importantly, a recent preprint by the Roux lab shows that membrane tension gradients indeed exist in many migrating and non-migrating cells (5). Using very similar hypotonic shock assays, the Caswell lab also showed that low membrane tension at the rear is required for caveolae formation (6). We have included a section in the discussion in which we elaborate on how membrane tension is controlled in migrating cells, and how it might regulate caveolae rear localisation.

      In the present case, it remains unknown to what extent this mechanical stress physiologically mimics the mechanical forces applied at the rear of a migrating cell.

      This is true. Our study does not address the nature of mechanical forces at the cell rear. This is a complex subject that is technically challenging to address, and it is therefore beyond the scope of this manuscript.

      Some images are not sufficient to fully support the conclusions of the article.

      We agree that some of the images, in particular the ones presented for the PLA assays, do not always show a clear rear localisation of caveolae. We have explained above why this is the case. We hope that our new quantitative measurements, movies and figure panels address the reviewer’s concern.

      At this stage, an unbiased quantitative analysis of the spatio-temporal dynamics of caveolae upon well-defined mechanical stimuli is also needed.

      These are all very good points that were previously addressed beautifully by the Caswell group (6). To address this in part in our RPE1 cell system, we imaged RPE1 cells exposed to the ROCK inhibitor Y27632 (see Author response image 3). The data show that cell rear retraction is impeded in response to ROCK inhibition, which is in line with several previous reports. Cavin1 remained mostly associated with the cell rear, although its distribution appeared more diffuse. We believe these data do not add much new insight into how caveolae function at the rear, and hence they were not included in the manuscript.

      Author response image 3.

      Effect of ROCK inhibition on cavin1 rear localisation and rear retraction. Cells were imaged one hour after the addition of Y27632.

      Cells in the images, in particular in Figure 1, are difficult to see. Differences in signal-to-noise ratio between cell areas could introduce a bias. Since caveolae density and localization are inconsistent between figures, more solid illustrations are needed, along with quantitative analysis.

      As mentioned above, we have carefully analysed the localisation of caveolae in fixed cells (using Cav1 and cavin1 antibodies as well as Cav1 and cavin fusion proteins) and in live cells transfected with various caveolae proteins. The analysis clearly demonstrates an enrichment of caveolae at the rear (Figure 1 and Figure 1 – Figure Supplement 1). Our tomography and TEM data support this as well (Figure 2).

      References:

      1. Qiao Y, Chen J, Lim YB, et al. YAP Regulates Actin Dynamics through ARHGAP29 and Promotes Metastasis. Cell reports. 2017;19(8):1495-1502.

      2. Rausch V, Bostrom JR, Park J, et al. The Hippo Pathway Regulates Caveolae Expression and Mediates Flow Response via Caveolae. Curr Biol. 2019;29(2):242-255 e246.

      3. Hung V, Udeshi ND, Lam SS, et al. Spatially resolved proteomic mapping in living cells with the engineered peroxidase APEX2. Nat Protoc. 2016;11(3):456-475.

      4. Wiggan O, Shaw AE, DeLuca JG, Bamburg JR. ADF/cofilin regulates actomyosin assembly through competitive inhibition of myosin II binding to F-actin. Dev Cell. 2012;22(3):530-543.

      5. García-Arcos JM, et al. Actin dynamics sustains spatial gradients of membrane tension in adherent cells. bioRxiv 2024.07.15.603517. 2024.

      6. Hetmanski JHR, de Belly H, Busnelli I, et al. Membrane Tension Orchestrates Rear Retraction in Matrix-Directed Cell Migration. Dev Cell. 2019;51(4):460-475 e410.

      7. Tsai TY, Collins SR, Chan CK, et al. Efficient Front-Rear Coupling in Neutrophil Chemotaxis by Dynamic Myosin II Localization. Dev Cell. 2019;49(2):189-205 e186.

      8. Mueller J, Szep G, Nemethova M, et al. Load Adaptation of Lamellipodial Actin Networks. Cell. 2017;171(1):188-200 e116.

      9. De Belly H, Yan S, Borja da Rocha H, et al. Cell protrusions and contractions generate long-range membrane tension propagation. Cell. 2023.

      10. Matthaeus C, Sochacki KA, Dickey AM, et al. The molecular organization of differentially curved caveolae indicates bendable structural units at the plasma membrane. Nat Commun. 2022;13(1):7234.

      11. Sinha B, Koster D, Ruez R, et al. Cells respond to mechanical stress by rapid disassembly of caveolae. Cell. 2011;144(3):402-413.

      12. Lieber AD, Schweitzer Y, Kozlov MM, Keren K. Front-to-rear membrane tension gradient in rapidly moving cells. Biophysical journal. 2015;108(7):1599-1603.

      13. Shi Z, Graber ZT, Baumgart T, Stone HA, Cohen AE. Cell Membranes Resist Flow. Cell. 2018;175(7):1769-1779 e1713.

      14. Grande-Garcia A, Echarri A, de Rooij J, et al. Caveolin-1 regulates cell polarization and directional migration through Src kinase and Rho GTPases. The Journal of cell biology. 2007;177(4):683-694.

      15. Grande-Garcia A, del Pozo MA. Caveolin-1 in cell polarization and directional migration. Eur J Cell Biol. 2008;87(8-9):641-647.

      16. Ludwig A, Howard G, Mendoza-Topaz C, et al. Molecular composition and ultrastructure of the caveolar coat complex. PLoS biology. 2013;11(8):e1001640.

    1. Author Response

      Reviewer #1 (Public Review):

      The study presented by AL Seufert et al. follows the trajectory of trained immunity research in the context of sterile inflammatory diseases such as gout, cardiovascular disease and obesity. Previous studies in mice have shown that a 4-week Western-type diet is sufficient to induce systemic trained immunity, with gross reorganization of the bone marrow to support a potentiated inflammatory response [PMID: 29328911]. The current study demonstrates that 2 weeks of feeding mice a Western-type diet (WD) or the more extreme ketogenic diet (KD; in which carbohydrates are essentially eliminated from the diet) results in a state of increased monocyte-driven immune responsiveness compared to standard chow (SC). This increased immune responsiveness after high-fat diet resulted in a deadly hyper-inflammatory response in the mice upon endotoxin (LPS) challenge in vivo.

      These initial findings, as displayed in Figure 1, are difficult to interpret because the authors use a mix of male and female mice coupled with very small sample sizes (n = 5 - 9). Male and female mice have been shown to have dimorphic responses to LPS exposure in vivo, with males having elevated cytokine levels (TNF, IL-6, IL-1β, and, interestingly, IL-10) and increased rates of severe outcomes following LPS challenge [PMID: 27631979]. As a reader, it is impossible to discern from the methodological description what the proportion of each sex was in each group, and therefore to determine whether the data are skewed or biased by sexually dimorphic responses to LPS rather than by diet. Additionally, due to the very small sample sizes, the authors cannot perform an analysis stratified by sex to determine whether the diets have their greatest effects on LPS-induced inflammation.

      The Reviewer brings up an important point. All studies with endotoxemia in wild-type conventional mice were carried out in 6–8-week-old female BALB/c mice, as mentioned in the Methods under the “Ethical approval of animal studies” and “Endotoxin-induced model of sepsis” sections. This is extremely important to state more clearly in the Results, because Reviewer 1 is correct: sexual dimorphism and age differences can have very large effects on LPS treatment outcome. This was not stated clearly enough in the Results, and the age, sex, and background of the mice have now been explicitly stated in each Results and Figure Legend section for each experiment.

      When comparing SC to KD, the authors identify large changes in the distribution of fatty acids circulating in the blood. The majority of these were saturated fatty acids (SFA). Although lauric, myristic, and myristovaccenic acid were the most altered after KD, the authors focus their research on the more thoroughly studied palmitic acid (PA).

      We followed up on multiple saturated fatty acids (SFAs; myristic, lauric, and behenic acid) that were identified in the lipidomic data, and found no robust or repeatable phenotypes in vitro at physiologically relevant concentrations. The inability to reproduce some of the findings with these SFAs may be due to the instability of some of these fats in solution, and we plan to troubleshoot these assays in order to understand the complexity of SFA-dependent control of inflammation in macrophages. Please see Fig. R1 in this document for data showing LPS-stimulated BMDMs pre-treated with myristic (Fig R1 A-C), lauric (Fig R1 D-F), or behenic (Fig R1 G-I) acid. The physiological concentrations used in these studies were taken from Perreault et al., 2014.

      Figure R1. The effect of myristic acid, lauric acid, and behenic acid on the response to LPS in macrophages. Primary bone marrow-derived macrophages (BMDMs) were isolated from age-matched (6-8 wk) C57BL/6 female and male mice. BMDMs were plated at 1 × 10⁶ cells/mL and treated with ethanol (EtOH; media with 0.05% or 0.35% ethanol to match the MA and LA solutions, respectively), media (Ctrl), LPS (10 ng/mL) for 24 h, or myristic or lauric acid (MA, LA stock diluted in 0.05% or 0.35% EtOH; conjugated to 2% BSA) for 24 h, with and without a secondary challenge with LPS (10 ng/mL). At the indicated time points, RNA was isolated and expression of (A, B) tnf, (D, E) il-6, and (G, H) il-1β was measured via qRT-PCR. RAW 264.7 macrophages were thawed and cultured for 3-5 days, pelleted and resuspended in DMEM containing 5% FBS and 2% BSA, and treated identically to BMDMs, with behenic acid (BA stock diluted in 1.7% EtOH) used as the primary stimulus; (C) tnf, (F) il-6, and (I) il-1β were measured via qRT-PCR. For all plates, all treatments were performed in triplicate. For all panels, Student’s t-test was used for statistical significance: *p < 0.05; **p < 0.01; ***p < 0.001. Error bars show mean ± SD.

      PA was shown to increase inflammatory cytokine gene expression and protein production of TNF, IL-6 and IL-1β in bone marrow-derived macrophages (BMDMs). The authors tie these effects to ceramide synthesis through a pharmacological blockade, as well as through the use of oleic acid, which allegedly sequesters ceramide synthesis. The authors’ claim that oleic acid supplementation reverses the inflammatory signaling induced by PA is invalid, as oleic acid was shown to induce a high level of cytokines in their model. When PA was added along with oleic acid, the cytokine levels returned to the levels produced by BMDMs stimulated with PA alone (see Figure 4 panels D-F).

      This was an unfortunate oversight in our revisions of this manuscript: original Figure 5A-C was mislabeled (although the colors were correct), and the OA-12h → LPS-24h label should have been switched with the PA-12h → LPS-24h label. These data were labeled correctly in the source file Source_data_Fig5 and have since been updated with the correct labels in Figure 5 of the manuscript. In light of new data collected, the corrected graphs have been split up in the resubmission. Please see Fig 3K-M and Fig 5A-C.

      Finally, the authors test whether injection of PA into mice can recapitulate the systemic inflammatory response seen with WD and KD feeding followed by LPS exposure. They were able to demonstrate that injecting 1 mM PA, waiting 12 h, and then exposing the mice to LPS for 24 h could similarly result in a hyper-inflammatory state with greater mortality. The reviewer is skeptical that 1 mM PA truly represents post-prandial PA levels, as one would expect to see after a single fatty meal, and whether this injection is generally well tolerated by mice. The paper by Eguchi et al. that the authors cite to inform their methods continuously infused an emulsified ethyl palmitate solution (containing 600 mM) at a rate of 0.2 µL/min. As far as I can read, Eguchi et al. only managed to reach a serum PA concentration of 0.5 mM. This is hardly the same as a single i.p. injection of 1 mM PA, which represents a single bolus of double the serum PA concentration achieved by Eguchi et al.

      The reviewer brings up an important point: Eguchi et al. did use infusions. From their data (Fig 1A), we calculated that after i.v. infusion of 600 mM (total = 267 µL within 14 h; 0.2 µL/min) there was ~420 µM absolute PA in the blood. They used C57BL/6 mice that weighed 23 g on average. Using these results, we extrapolated that a single 200 µL injection of a 750 mM PA solution in 6–8-week-old female BALB/c mice (~15-18 g) would equate to ~0.5–1 mM PA in the blood. Because total blood PA concentrations vary widely in obese healthy and unhealthy humans (0.3–4.1 mM) (1, 2), we moved forward with these calculations. We nevertheless thank the reviewer for this advice, and we agree that we had not definitively shown that we increase systemic levels of PA. We therefore ran a lipidomic analysis of serum from SC-fed mice treated with Veh or PA for 12 h. We show that a 750 mM i.p. injection of ethyl palmitate raises free PA levels in the serum to 173-425 μM at 2 h post-injection, which is within the range reported for humans on high-fat diets (0.3–4.1 mM). We have added these new data to Fig. S7A of the main manuscript.

      Importantly, the concentration in the PA-treated mice is greater than that in the Veh-treated mice; however, we believe the value shown underestimates the maximum serum PA level reached after i.p. injection, because free PA is known to be packaged into chylomicrons within enterocytes and to travel through the circulation with a half-life of less than an hour (3, 4). Thus, serum concentrations of free PA are only transiently increased by i.p. injection, as PA is quickly taken up by adipose tissue, skeletal muscle, heart, and liver. These complex lipid transport processes make it difficult to determine the maximum concentration of free PA in the serum.

      While all of the details concerning PA circulation following an i.p. injection are unknown, we suggest that this method of “force-feeding” is similar to dietary intake in that uptake of PA into the circulation occurs within the peritoneal space prior to traveling to the blood via the thoracic duct and right lymphatic duct (5).

      PA is known to induce inflammation in monocytes and macrophages, so the findings certainly make sense in the context of previously published literature. However, the authors have made some poor methodological decisions in their mouse studies, namely haphazardly switching between groups of young and old mice (4-6 weeks, 8-9 weeks, and 14-23 weeks), using different LPS injection protocols (6, 10, and 50 mg/ml of LPS), and including mice of both sexes. All of these factors can drastically alter the interpretation of the data, preventing solid conclusions from being drawn.

      We appreciate this review and suggest that:

      1) For the LPS models, mice were all female and age-matched at 6-8 weeks. We are aware of sex differences in the endotoxemia model, which is why we specifically use female mice in our studies (6, 7). This is mentioned twice in the Methods, under the sections “Endotoxin-induced model of sepsis” and “Ethical approval of animal studies”. We have added these specifics of our model to all Results and Figure Legend sections for clarification.

      2) For germ-free models: it is notoriously difficult to breed C57BL/6 germ-free mice, and it was inherently difficult to obtain enough mice of the same sex and age to carry out these experiments. However, since we have previously published in this model with mixed sex and age, we knew that our WD phenotype is robust in these backgrounds (7). Further, we believe that observing this robust phenotype in germ-free mice independently of age or sex provides further evidence of its strength. It is important to note that we induce endotoxemia in germ-free mice with 50 mg/kg instead of the 6 mg/kg used in conventional mice, because this is our reported LD50 for mixed-sex germ-free C57BL/6 mice, as we have previously published in detail (7). This difference is due to the presence of the microbiota (8, 9), and germ-free mice also have an immature immune system that correlates with hyporesponsiveness to microbial products (10-12). We agree with the reviewer that the C57BL/6 germ-free mice are significantly older than our conventional 6-8-week-old mice; we therefore confirmed that WD- and KD-fed conventional C57BL/6 female mice aged 20–21 weeks still show enhanced disease severity and mortality in an LPS-induced endotoxemia model, compared to mice fed SC (Fig. S1G-H).

      Figure R2. PA treatment enhances survival in both female and male RAG-/- mice. Age-matched (8-9 wk) RAG-/- mice were injected i.v. with ethyl palmitate (PA, 750mM) or vehicle (Veh) solutions 12 h before C. albicans infection. Survival was monitored for 40h post-infection.

      3) In our preliminary results, we stratified survival during C. albicans infection by sex in C57BL/6 mice and found no notable difference between males and females in survival at 40 h after i.p. infection with Candida albicans (Fig R2 A-B). However, the CFU data presented in the manuscript reflect kidney fungal burden in female mice, and we do not have fungal burden data for male mice. These data are important for understanding sex differences in the PA-dependent enhanced resistance to systemic C. albicans infection. We are currently addressing this question in the lab, as well as elucidating the cell type and mechanism responsible for PA-dependent enhanced fungal resistance.

    1. Author Response

      Reviewer #1 (Public Review):

      Esmaily and colleagues report two experimental studies in which participants make simple perceptual decisions, either in isolation or in the context of a joint decision-making procedure. In this "social" condition, participants are paired with a partner (in fact, a computer), they learn the decision and confidence of the partner after making their own decision, and the joint decision is made on the basis of the most confident decision between the participant and the partner. The authors found that participants' confidence, response times, pupil dilation, and CPP (i.e. the increase of centro-parietal EEG over time during the decision process) are all affected by the overall confidence of the partner, which was manipulated across blocks in the experiments. They describe a computational model in which decisions result from a competition between two accumulators, and in which the confidence of the partner would be an input to the activity of both accumulators. This model qualitatively produced the variation in confidence and RTs across blocks.
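      To make the mechanism summarized above concrete, here is a minimal Python sketch; it is not the authors' model or code. It uses a simplified two-accumulator race (no leak or mutual inhibition, chosen for brevity) in which a single hypothetical `top_down` parameter, standing in for the partner's communicated confidence, is added to the input of both accumulators. The drift rates, bound, noise level and the confidence readout (balance of evidence at the moment of decision) are all illustrative assumptions.

```python
import random
import statistics


def simulate_trial(drift_correct, drift_error, top_down, rng, bound=1.0,
                   noise_sd=0.5, dt=0.01, max_steps=5000):
    """One trial of a two-accumulator race with a shared top-down input.

    Returns (correct, reaction_time, confidence), or None if no bound is hit.
    All parameter values are hypothetical.
    """
    x_correct = x_error = 0.0
    scale = noise_sd * dt ** 0.5  # Euler-Maruyama noise per time step
    for step in range(1, max_steps + 1):
        # The same top_down term drives both accumulators; here it stands in
        # for the partner's communicated confidence (assumed mapping).
        x_correct += (drift_correct + top_down) * dt + rng.gauss(0.0, scale)
        x_error += (drift_error + top_down) * dt + rng.gauss(0.0, scale)
        # Activations are floored at zero, as in many accumulator models.
        x_correct = max(x_correct, 0.0)
        x_error = max(x_error, 0.0)
        if x_correct >= bound or x_error >= bound:
            correct = x_correct >= x_error
            confidence = abs(x_correct - x_error)  # balance-of-evidence readout
            return correct, step * dt, confidence
    return None


def simulate_block(top_down, n_trials=500, seed=1, **kwargs):
    """Mean accuracy, RT and confidence for one block with a fixed top_down."""
    trials = []
    for i in range(n_trials):
        # Seed each trial separately so that blocks differ only in top_down.
        out = simulate_trial(1.0, 0.6, top_down, random.Random(seed + i), **kwargs)
        if out is not None:
            trials.append(out)
    accuracy = statistics.mean(1.0 if c else 0.0 for c, _, _ in trials)
    mean_rt = statistics.mean(rt for _, rt, _ in trials)
    mean_conf = statistics.mean(conf for _, _, conf in trials)
    return accuracy, mean_rt, mean_conf


low = simulate_block(top_down=0.0)   # e.g. low-confidence partner block
high = simulate_block(top_down=0.5)  # e.g. high-confidence partner block
print("low-confidence partner: ", low)
print("high-confidence partner:", high)
```

      Because each trial uses the same noise sequence in both blocks, a larger shared input can only push each accumulator's trajectory higher, so every decision is reached no later than with a smaller input; whether accuracy or confidence also change depends on the assumed parameters.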

      The major strength of this work is that it puts together many ingredients (behavioral data, pupil and EEG signals, computational analysis) to build a picture of how the confidence of a partner, in the context of joint decision-making, would influence our own decision process and confidence evaluations. Many of these effects are well described already in the literature, but putting them all together remains a challenge.

      We are grateful for this positive assessment.

      However, the construction is fragile in many places: the causal links between the different variables are not firmly established, and it is not clear how pupil and EEG signals mediate the effect of the partner's confidence on the participant's behavior.

      We have modified the language of the manuscript to avoid the implication of a causal link.

      Finally, one limitation of this setting is that the situation being studied is very specific, with a joint decision that is not the result of an agreement between partners, but the automatic selection of the most confident decision. Thus, whether the phenomenon of confidence matching also occurs outside of this very specific setting is unclear.

      We have now acknowledged this caveat in the discussion in lines 485–504. The final paragraph of the discussion now reads as follows:

      “Finally, one limitation of our experimental setup is that the situation being studied is confined to the design choices made by the experimenters. These choices were made in order to operationalize the problem of social interaction within the psychophysics laboratory. For example, the joint decisions were not made through verbal agreement (Bahrami et al., 2010, 2012). Instead, following a number of previous works (Bang et al., 2017, 2020), joint decisions were automatically assigned to the most confident choice. In addition, the partner’s confidence and choice were random variables drawn from a distribution prespecified by the experimenter and were therefore, by design, unresponsive to the participant’s behaviour. In this sense, one may argue that the interaction partner’s behaviour was not “natural”, since the partner did not react to the participant’s confidence communications (note, however, that the partner’s confidence and accuracy were not entirely random but were matched carefully to the participant’s behaviour prerecorded in the individual session). To what extent the findings are specific to this experimental setting, and whether the behaviour observed here would transfer to real-life settings, is an open question. For example, it is plausible that participants may show some behavioural reaction to a human partner’s response time variations, since there is some evidence indicating that for binary choices such as those studied here, response times also systematically communicate uncertainty to others (Patel et al., 2012). Future studies could examine the degree to which the results might be paradigm-specific.”

      Reviewer #2 (Public Review):

      This study is impressive in several ways and will be of interest to behavioral and brain scientists working on diverse topics.

      First, from a theoretical point of view, it very convincingly integrates several lines of research (confidence, interpersonal alignment, psychophysical, and neural evidence accumulation) into a mechanistic computational framework that explains the existing data and makes novel predictions that can inspire further research. It is impressive to read that the corresponding model can account for rather non-intuitive findings, such as that information about high confidence by your collaborators means people are faster but not more accurate in their judgements.

      Second, from a methodological point of view, it combines several sophisticated approaches (psychophysical measurements, psychophysical and neural modelling, electrophysiological and pupil measurements) in a manner that draws on their complementary strengths and that is most compelling (but see further below for some open questions). The appeal of the study in that respect is that it combines these methods in creative ways that allow it to answer its specific questions much more convincingly than if it had used any one of these approaches alone.

      Third, from a computational point of view, it proposes several interesting ways in which biologically realistic models of perceptual decision-making can incorporate socially communicated information about others' confidence, to explain and predict the effects of such interpersonal alignment on behavior, confidence, and neural measurements of the processes related to both. It is nice to see that explicit model comparison favors one of these ways (top-down driving inputs to the competing accumulators) over others that may a priori have seemed more plausible but are mechanistically less interesting and impactful (e.g., effects on response boundaries, non-decision times, or evidence accumulation).

      Fourth, the manuscript is very well written and provides just the right amount of theoretical introduction and balanced discussion for the reader to understand the approach, the conclusions, and the strengths and limitations.

      Finally, the manuscript takes open science practices seriously and employed preregistration, a replication sample, and data sharing in line with good scientific practice.

      We are grateful to the reviewer for their positive assessment of our work.

      Having said all these positive things, there are some points where the manuscript is unclear or leaves some open questions. While the conclusions of the manuscript are not overstated, there are unclarities in the conceptual interpretation, the descriptions of the methods, some procedures of the methods themselves, and the interpretation of the results that make the reader wonder just how reliable and trustworthy some of the many findings are that together provide this integrated perspective.

      We hope that our modifications and revisions in response to the criticisms listed below will be satisfactory. To avoid redundancies, we have combined each numbered comment with the corresponding recommendation for the Authors.

      First, the study employs rather small sample sizes of N=12 and N=15 and some of the effects are rather weak (e.g., the non-significant CPP effects in study 1). This is somewhat ameliorated by the fact that a replication sample was used, but the robustness of the findings and their replicability in larger samples can be questioned.

      Our study brings together questions from two distinct fields of neuroscience: perceptual decision making and social neuroscience. Each of these two fields has its own traditions and practical conventions. Typically, studies in perceptual decision making employ a small number of extensively trained participants (approximately 6 to 10 individuals). Social neuroscience studies, on the other hand, recruit larger samples (often more than 20 participants) without extensive training protocols. We therefore needed to strike a balance in this trade-off between the number of participants and the number of data points (e.g. trials) obtained from each participant. Note, for example, that each of our participants underwent around 4000 training trials. Strikingly, our initial study (N=12) yielded robust results that showed the hypothesized effects nearly completely, supporting the adequacy of our power estimate. However, we decided to replicate the findings because, like the reviewer, we believe in the importance of adequate sampling. We increased our sample size to N=15 participants to enhance the reliability of our findings. We nevertheless acknowledge the limitation of generalizing to larger samples, which we now discuss in our revised manuscript, together with a cautionary note regarding further generalizations.

      To complement our results and add a measure of their reliability, here we provide the results of a power analysis that we applied to the data from study 1 (i.e. the discovery phase). These results demonstrate that the sample size of study 2 (i.e. the replication) was adequate when conditioned on the results from study 1 (see table and graph pasted below). The results showed that N=13 would be an adequate sample size for 80% power for the behavioural and eye-tracking measurements. Power analysis for the EEG measurements indicated that we needed N=17. Taking these power analyses together, our sample size of N=15 for Study 2 was reasonably justified.

      We have now added a section to the discussion (Lines 790-805) that communicates these issues as follows:

      “Our study brings together questions from two distinct fields of neuroscience: perceptual decision making and social neuroscience. Each of these two fields has its own traditions and practical conventions. Typically, studies in perceptual decision making employ a small number of extensively trained participants (approximately 6 to 10 individuals). Social neuroscience studies, on the other hand, recruit larger samples (often more than 20 participants) without extensive training protocols. We therefore needed to strike a balance in this trade-off between the number of participants and the number of data points (e.g. trials) obtained from each participant. Note, for example, that each of our participants underwent around 4000 training trials. Importantly, our initial study (N=12) yielded robust results that showed the hypothesized effects nearly completely, supporting the adequacy of our power estimate. However, we decided to replicate the findings in a new sample of N=15 participants to enhance the reliability of our findings and to examine our hypothesis in a stringent discovery-replication design. In Figure 4-figure supplement 5, we provide the results of a power analysis that we applied to the data from study 1 (i.e. the discovery phase). These results demonstrate that the sample size of study 2 (i.e. the replication) was adequate when conditioned on the results from study 1.”

      We conducted Monte Carlo simulations to determine the sample size required to achieve sufficient statistical power (80%) (Szucs & Ioannidis, 2017). In these simulations, we used the data from study 1. For each sample size (N, x-axis), we randomly selected N participants, with replacement, from the 12 participants in study 1. We then applied the same GLMM used in the main text to assess the dependence of EEG signal slopes on social condition (HCA vs LCA). To obtain an accurate estimate, we repeated this random sampling 1000 times for each sample size N, yielding 1000 statistical tests on randomly generated datasets. The proportion of statistically significant tests among these 1000 represents the statistical power (y-axis). We gradually increased the sample size until the power reached the 80% threshold, as illustrated in the figure. The number indicated by the red circle on the x-axis of this graph is the designated sample size.
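      The resampling procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the manuscript: the GLMM on EEG slopes is replaced by a one-sample t-test on hypothetical per-participant HCA-minus-LCA slope differences, and the function names and example values are invented for illustration.

```python
import random
import statistics

# Two-sided alpha = 0.05 critical values of Student's t, indexed by df - 1
# (df = 1..30), taken from standard t tables.
T_CRIT_05 = (12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306,
             2.262, 2.228, 2.201, 2.179, 2.160, 2.145, 2.131, 2.120,
             2.110, 2.101, 2.093, 2.086, 2.080, 2.074, 2.069, 2.064,
             2.060, 2.056, 2.052, 2.048, 2.045, 2.042)


def is_significant(diffs):
    """One-sample t-test of mean(diffs) against zero at alpha = 0.05.

    Stand-in (assumption) for the GLMM test of HCA vs LCA used in the study.
    """
    n = len(diffs)
    if not 2 <= n <= len(T_CRIT_05) + 1:
        raise ValueError("sample size must be between 2 and 31")
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        # All resampled values identical: any non-zero mean gives an infinite t.
        return mean != 0
    t = mean / (sd / n ** 0.5)
    return abs(t) > T_CRIT_05[n - 2]


def estimate_power(pilot_diffs, n, n_sims=1000, seed=0):
    """Fraction of simulated studies of size n that reach significance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Draw n participants from the pilot sample with replacement.
        sample = rng.choices(pilot_diffs, k=n)
        hits += is_significant(sample)
    return hits / n_sims


def required_sample_size(pilot_diffs, target=0.8, n_min=3, n_max=31, **kwargs):
    """Smallest n whose estimated power reaches the target (None if never)."""
    power = 0.0
    for n in range(n_min, n_max + 1):
        power = estimate_power(pilot_diffs, n, **kwargs)
        if power >= target:
            return n, power
    return None, power


# Hypothetical per-participant slope differences (HCA - LCA) for a
# 12-person pilot; these numbers are invented for illustration only.
pilot = [0.8, 1.1, -0.2, 0.5, 0.9, 0.3, 1.4, 0.1, 0.6, -0.4, 0.7, 0.2]
print(required_sample_size(pilot, n_sims=1000))
```

      Replacing `is_significant` with a mixed-model fit would bring the sketch closer to the described analysis, at a considerably higher computational cost per simulated dataset.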

      Second, the manuscript interprets the effects of low-confidence partners as an impact of the partner's communicated "beliefs about uncertainty". However, it appears that the experimental setup also leads to greater outcome uncertainty (because the trial outcome is determined by the joint performance of both partners, which is normally reduced for low-confidence partners) and response uncertainty (because subjects need to consider not only their own confidence but also how that will impact on the low-confidence partner). While none of these other possible effects is conceptually unrelated to communicated confidence and the basic conclusions of the manuscript are therefore valid, the reader would like to understand to what degree the reported effects relate to slightly different types of uncertainty that can be elicited by communicated low confidence in this setup.

      We appreciate the reviewer’s advice to remain cautious about the possible sources of uncertainty in our experiment. In the Discussion (lines 790-801) we have now added the following paragraph.

      “We have interpreted our findings to indicate that social information, i.e. the partner’s confidence, impacts the participants’ beliefs about uncertainty. It is important to underscore here that, as in real life, there are other sources of uncertainty in our experimental setup that could affect the participants' beliefs. For example, under joint conditions, the group choice is determined by comparing the choices and confidences of the two partners. As a result, the participant faces the more complex task of matching their response not only to their perceptual experience but also to the partner, in order to achieve the best possible outcome. For the same reason, there is greater outcome uncertainty under joint than under individual conditions. These other sources of uncertainty are, of course, conceptually related to communicated confidence, but our experimental design aimed to remove them, as much as possible, by comparing the impact of social information under high versus low confidence of the partner.”

      In addition to the above, we would like to clarify one point here with specific respect to the comment. Note that the computer-generated partner’s accuracy was identical under high and low confidence. In addition, our behavioral findings did not show any difference in accuracy between the HCA and LCA conditions. As a consequence, the argument that “the trial outcome is determined by the joint performance of both partners, which is normally reduced for low-confidence partners” is not valid here, because the low-confidence partner’s performance is identical to that of the high-confidence partner. It is possible, of course, that we have misunderstood the reviewer’s point, and we would be happy to discuss this further if necessary.
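      The joint decision described above ("the group choice is determined through the comparison of the choices and confidences of the partners") can be made concrete with a small sketch. The exact rule is an assumption here: we take the commonly used maximum-confidence rule, in which agreeing partners keep their shared choice, the more confident partner wins a disagreement, and ties are broken at random. The function name is hypothetical.

```python
import random


def joint_choice(choice_a, conf_a, choice_b, conf_b, rng=random):
    """Group decision under an ASSUMED maximum-confidence rule."""
    if choice_a == choice_b:
        return choice_a  # partners agree: nothing to arbitrate
    if conf_a != conf_b:
        # disagreement: the more confident partner determines the group choice
        return choice_a if conf_a > conf_b else choice_b
    return rng.choice([choice_a, choice_b])  # tie in confidence: pick one at random


# Usage: a participant choosing "L" with confidence 2 against a partner choosing "R" with confidence 5
group = joint_choice("L", 2, "R", 5)
```

      Under this rule a low-confidence partner simply hands the decision to the participant more often when the two disagree, which is one way the participant's task can become one of coordination rather than pure perception.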

      Third, the methods used for measurement, signal processing, and statistical inference in the pupil analysis are questionable. For a start, the methods do not give enough detail as to how the stimuli were calibrated in terms of luminance, etc., so that the pupil signals are interpretable.

      Here we provide in Author response image 1 the calibration plot for our eye-tracking setup, describing the relationship between pupil size and display luminance. The luminance of the random-dot motion stimuli (i.e., white dots on a black background) was cd/m² and, importantly, identical across the two critical social conditions. We hope that this additional detail satisfies the reviewer’s concern. For the sake of brevity, we have decided against adding this part to the manuscript and supplementary material.

      Author response image 1.

      Calibration plot for the experimental setup. Average pupil size (arbitrary units from the EyeLink device) is plotted against display luminance. The plot was obtained by presenting the participant with uniform full-screen displays at 10 different luminance levels covering the entire range of the monitor RGB values (0 to 255), whose luminance was separately measured with a photometer. Each display lasted 10 seconds. Error bars are standard deviations across sessions.

      Moreover, while the authors state that the traces were normalized to a value of 0 at the start of the ITI period, the data displayed in Figure 2 do not show this normalization but different non-zero values. Are these data not normalized, or was a different procedure used? Finally, the authors analyze the pupil signal averaged across a wide temporal ITI interval that may contain stimulus-locked responses (there is not enough information in the manuscript to clearly determine which temporal interval was chosen and averaged across, and how it was made sure that this signal was not contaminated by stimulus effects).

      We have now added the following details to the Methods section (lines 1106-1135).

      “In both studies, eye movements were recorded with an EyeLink 1000 (SR Research) at a sampling rate of 1000 Hz, controlled by a dedicated host PC. The device was set to the desktop configuration and pupil-corneal reflection mode, and data from the left eye were recorded. At the beginning of each block, the system was recalibrated and then validated with a 9-point scheme presented on the screen. For one subject, a 3-point scheme was used because of repeated calibration difficulty. Once a detection error of less than 0.5° had been reached, the participants proceeded to the main task. The acquired pupil-size data were used for further analysis. Data from one subject in the first study were removed from further analysis due to storage failure.

      Pupil data were divided into separate epochs, and data from the inter-trial interval (ITI) were selected for analysis. The ITI was defined as the time between the offset of the feedback screen of trial t and the stimulus presentation of trial t+1. Blinks and jitters were then detected and removed using linear interpolation, based on the pupil-size values before and after each blink. The data were also band-pass filtered using a Butterworth filter (second order, [0.01, 6] Hz) [50]. The pupil data were z-scored and then baseline-corrected by subtracting the mean signal in the [-1000, 0] ms interval before ITI onset. For the statistical analysis (GLMM) in Figure 2, we used the average of the pupil signal in the ITI period; because the ITI ends at the onset of the next stimulus, no pupil value is contaminated by the upcoming stimulus. Importantly, trials with ITI > 3 s were excluded from analysis (365 out of 8800 for study 1 and 128 out of 6000 for study 2; also see Table S7 and Selection criteria for data analysis in Supplementary Materials).”
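      The preprocessing steps quoted above can be sketched in a few lines. This is a hedged illustration, not the authors' pipeline: the blink mask is assumed to be given (blink detection is not shown), z-scoring is assumed to be over the whole recording, the filter is applied zero-phase with SciPy's `sosfiltfilt`, and dropping trials that lack a full 1 s baseline is our own added safeguard. Function names are hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def interpolate_blinks(pupil, blink_mask):
    """Linearly interpolate over blink/jitter samples using the valid samples on either side."""
    pupil = np.asarray(pupil, dtype=float).copy()
    bad = np.asarray(blink_mask, dtype=bool)
    idx = np.arange(pupil.size)
    pupil[bad] = np.interp(idx[bad], idx[~bad], pupil[~bad])
    return pupil


def bandpass(pupil, fs, low_hz=0.01, high_hz=6.0, order=2):
    """Zero-phase Butterworth band-pass, built as second-order sections for numerical stability."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, pupil)


def iti_pupil_means(pupil, blink_mask, iti_on, iti_off, fs=1000, max_iti_s=3.0):
    """Per-trial mean of the z-scored, baseline-corrected pupil signal within the ITI.

    iti_on / iti_off are sample indices (feedback offset of trial t, stimulus onset of t+1).
    Excluded trials (ITI > max_iti_s, or no full 1 s baseline) are returned as NaN.
    """
    clean = bandpass(interpolate_blinks(pupil, blink_mask), fs)
    z = (clean - clean.mean()) / clean.std()
    base_len = int(fs)  # number of samples in the [-1000, 0] ms baseline window
    out = np.full(len(iti_on), np.nan)
    for k, (on, off) in enumerate(zip(iti_on, iti_off)):
        if (off - on) / fs > max_iti_s or on < base_len:
            continue
        baseline = z[on - base_len:on].mean()
        out[k] = z[on:off].mean() - baseline
    return out


# Usage on a synthetic 20 s trace with one simulated blink (zeros) at 5.00-5.05 s
fs = 1000
t = np.arange(20000) / fs
raw = 5 + 0.5 * np.sin(2 * np.pi * 0.5 * t)
mask = np.zeros(raw.size, dtype=bool)
mask[5000:5050] = True
raw[mask] = 0.0
res = iti_pupil_means(raw, mask, [2000, 10000], [4000, 14000], fs=fs)  # second ITI lasts 4 s
```

      Because the averaging window stops at the next stimulus onset, no stimulus-evoked samples enter the per-trial value; the second synthetic trial is returned as NaN because its ITI exceeds 3 s.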

      Fourth, while the EEG analysis in general provides interesting data, the link to the well-established CPP signal is not entirely convincing. CPP signals are usually identified and analyzed in a response-locked fashion, to distinguish them from other types of stimulus-locked potentials. One crucial feature here is that the CPPs in the different conditions reach a similar level just prior to the response. This is either not the case here, or the data are not shown in a format that allows the reader to identify these crucial features of the CPP. It is therefore questionable whether the reported signals indeed fully correspond to this decision-linked signal.
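      The response-locked format the reviewer refers to amounts to re-epoching the continuous signal around each response rather than around stimulus onset. A minimal NumPy sketch, assuming a single (e.g. centro-parietal) channel and response times given as sample indices; the window lengths and function name are hypothetical choices, not taken from the manuscript.

```python
import numpy as np


def response_locked_average(eeg, response_samples, fs, pre_ms=500, post_ms=100):
    """Average a single-channel signal in a window locked to each response.

    Epochs that would run past either end of the recording are skipped.
    Returns (times in ms relative to the response, mean epoch).
    """
    pre = int(round(pre_ms * fs / 1000))
    post = int(round(post_ms * fs / 1000))
    epochs = [eeg[r - pre:r + post] for r in response_samples
              if r - pre >= 0 and r + post <= len(eeg)]
    epochs = np.asarray(epochs, dtype=float)
    times_ms = np.arange(-pre, post) * 1000.0 / fs
    return times_ms, epochs.mean(axis=0)


# Usage: a toy signal that ramps to the same level just before each of two responses
eeg = np.zeros(1000)
for r in (300, 700):
    eeg[r - 100:r] = np.linspace(0.0, 1.0, 100)
times, avg = response_locked_average(eeg, [300, 700], fs=1000, pre_ms=100, post_ms=20)
```

      In this format, a CPP-like signal should reach a similar level in every condition just before the response (here, 1.0 at -1 ms), which is the feature the reviewer asks to see.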

      Fifth, the authors present some effective connectivity analysis to identify the neural mechanisms underlying the possible top-down drive due to communicated confidence. It is completely unclear how they select the "prefrontal cortex" signals here that are used for the transfer entropy estimations, and it is in fact even unclear whether the signals they employ originate in this brain structure. In the absence of clear methodical details about how these signals were identified and why the authors think they originate in the prefrontal cortex, these conclusions cannot be maintained based on the data that are presented.
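      For reference, the quantity at issue can be written down directly. The sketch below is a plug-in transfer entropy (TE) estimate with history length 1 for already-discretised sequences; how continuous EEG would be discretised (for example by a median split) is an assumption, and this is not the estimator used in the manuscript, which is not described here. It only illustrates that TE is directional: it measures how much the past of x improves prediction of y beyond y's own past.

```python
import random
from collections import Counter
from math import log2


def transfer_entropy(x, y):
    """Plug-in TE from x to y in bits, history length 1, discrete symbols."""
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))  # (y_{t+1}, y_t, x_t)
    pairs_yx = Counter(zip(y[:-1], x[:-1]))        # (y_t, x_t)
    pairs_yy = Counter(zip(y[1:], y[:-1]))         # (y_{t+1}, y_t)
    singles = Counter(y[:-1])                      # y_t
    n = len(y) - 1
    te = 0.0
    for (y1, y0, x0), c in triples.items():
        p_joint = c / n
        p_given_both = c / pairs_yx[(y0, x0)]          # p(y_{t+1} | y_t, x_t)
        p_given_self = pairs_yy[(y1, y0)] / singles[y0]  # p(y_{t+1} | y_t)
        te += p_joint * log2(p_given_both / p_given_self)
    return te


# Usage: y copies x with a one-step lag, so information flows from x to y only
rng = random.Random(1)
x = [rng.randint(0, 1) for _ in range(5000)]
y = [0] + x[:-1]
te_xy = transfer_entropy(x, y)  # close to 1 bit
te_yx = transfer_entropy(y, x)  # close to 0 bits
```

      Note that a large TE says nothing about where a signal originates anatomically, which is the separate issue the reviewer raises about the "prefrontal" source.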

      Sixth, the description of the model fitting procedures and the parameter settings are missing, leaving it unclear for the reader how the models were "calibrated" to the data. Moreover, for many parameters of the biophysical model, the authors seem to employ fixed parameter values that may have been picked based on any criteria. This leaves the impression that the authors may even have manually changed parameter values until they found a set of values that produced the desired effects. The model would be even more convincing if the authors could for every parameter give the procedures that were used for fitting it to the data, or the exact criteria that were used to fix the parameter to a specific value.

      Seventh, on a related note, the reader wonders about some of the decisions the authors took in the specification of their model. For example, why was it assumed that the parameters of interest in the three competing models could only be modulated by the partner's confidence in a linear fashion? A non-linear modulation appears highly plausible, so extreme values of confidence may have much more pronounced effects. Moreover, why were the confidence computations assumed to be finished at the end of the stimulus presentation, given that for trials with RTs longer than the stimulus presentation, the sensory information almost certainly reverberated in the brain network and continued to be accumulated (in line with the known timing lags in cortical areas relative to objective stimulus onset)? It would help if these model specification choices were better justified and possibly even backed up with robustness checks.

      Eighth, the fake interaction partners showed several properties that were highly unnatural (they did not react to the participant's confidence communications, and their response times were random and thus unrelated to confidence and accuracy). This raises the question of how much the findings from this specific experimental setting would transfer to other real-life settings, and whether participants showed any behavioral reactions to the random response-time variations as well (since several studies have shown that for binary choices like here, response times also systematically communicate uncertainty to others). Moreover, it is also unclear how the confidence convergence simulated in Figure 3d can conceptually apply to the data, given that the fake subjects did not react to the subject's communicated confidence as in the simulation.

    1. Author Response

      Reviewer #1 (Public Review):

      This work by Shen et al. demonstrates a single molecule imaging method that can track the motions of individual protein molecules in dilute and condensed phases of protein solutions in vitro. The authors applied the method to determine the precise locations of individual molecules in 2D condensates, which show heterogeneity inside condensates. Using the time-series data, they could obtain the displacement distributions in both phases, and by assuming a two-state model of trapped and mobile states for the condensed phase, they could extract diffusion behaviors of both states. This approach was then applied to 3D condensate systems, and it was shown that the estimates from the model (i.e., mobile fraction and diffusion coefficients) are useful to quantitatively compare the motions inside condensates. The data can also be used to reconstruct the FRAP curves, which experimentally quantify the mobility of the protein solution.

      This work introduces an experimental method to track single molecules in a protein solution and analyzes the data based on a simple model. The simplicity of the model gives a clear understanding of the situation in a test tube, and I think that the model is quite useful in analyzing condensate behaviors and will benefit the field greatly. However, the manuscript in its current form fails to situate the work in the right context; many previous works are omitted in this manuscript, exaggerating the novelty of the work. Also, the two-state model is simple and useful, but I am concerned about the limits of the model. The authors extract the parameters from the experimental data by assuming the model. It is also likely that the molecules have a continuum between fully trapped and fully mobile states, and that this continuum model can also explain the experimental data well.

      We thank the reviewer for the warm overview of our work and the insightful comments on the areas that need to be improved. We are very encouraged by the reviewer’s general positive assessment of our approach. We have addressed these comments in the revised manuscript.

      Reviewer #2 (Public Review):

      In this paper, Shen and co-workers report the results of experiments using single particle tracking and FRAP combined with modeling and simulation to study the diffusion of molecules in the dense and dilute phases of various kinds of condensates, including those with strong specific interactions as well as weak specific interactions (IDR-driven). Their central finding is that molecules in the dense phase of condensates with strong specific interactions tend to switch between a confined state with low diffusivity and a mobile state with a diffusivity that is comparable to that of molecules in the dilute phase. In doing so, the study provides experimental evidence for the effect of molecular percolation in biomolecular condensates.

      Overall, the experiments are remarkably sophisticated and carefully performed, and the work will certainly be a valuable contribution to the literature. The authors' inquiry into single particle diffusivity is useful for understanding the dynamics and exchange of molecules and how they change when the specific interaction is weak or strong. However, there are several concerns regarding the analysis and interpretation of the results that need to be addressed, and some control experiments that are needed for appropriate interpretation of the results, as detailed further below.

      We thank the reviewer for the warm support of our work (assessing that our work is “remarkably sophisticated and carefully performed” and “will certainly be a valuable contribution”) and for the constructive comments/critiques, which we have now addressed in the revised manuscript (please refer to our detailed responses below).

      (1) The central finding that the molecules tend to experience transiently confined states in the condensed phase is remarkable and important. This finding is reminiscent of transient "caging"/"trapping" dynamics observed in diverse other crowded and confined systems. Given this, it is very surprising to see the authors interpret the single-molecule motion as being 'normal' diffusion (within the context of a two-state diffusion model), instead of analyzing their data within the context of continuous time random walks or anomalous diffusion, which is generally known to arise from transient trapping in crowded/confined systems. It is not clear that interpreting the results within the context of simple diffusion is appropriate, given their general finding of the two confined and mobile states. Such a process of transient trapping/confinement is known to lead to transient subdiffusion at short times and then diffusive behavior at sufficiently long times. There is a hint of this in the inset of Fig 3, but these data need to be shown on log-log axes to be clearly interpreted. I encourage the authors to think more carefully and critically about the nature of the diffusive model to be used to interpret their results.

      We thank the reviewer for the insightful comments and suggestions, which have been very helpful for us to think more deeply about the experimental data and the possible mechanism underlying our findings. Indeed, the phase-separated systems studied here resemble previously studied crowded and confined systems with transient caging/trapping dynamics (see, e.g., Akimoto et al., 2011; Bhattacharjee and Datta, 2019; Wong et al., 2004; these references have been added to the revised manuscript). In our PSD system in Figure 3, the caging/trapping of NR2B in the condensed phase is likely due to its binding to the percolated PSD network. Thus, NR2B molecules in the condensed phase should undergo subdiffusive motion. Indeed, our single-molecule tracking data show that the motion of NR2B fits well with the continuous time random walk (CTRW) model, as surmised by this reviewer. We have now fitted the MSD curve of all tracks of NR2B in the condensed phase with an anomalous diffusion model, $\mathrm{MSD}(t) = 4Dt^{\alpha}$ (see Response Figure 1 below). The fitted α is 0.74 ± 0.03, indicating that NR2B molecules in the condensed phase indeed undergo subdiffusive motion. The fitted diffusion coefficient D is 0.014 ± 0.001 μm²/s. In the revised manuscript, we have replaced the Brownian motion fit in Figure 3E of the original manuscript with this subdiffusive model fit, to highlight the complexity of the NR2B diffusion we observed in the PSD condensed phase.

      Response Figure 1: Fit of the MSD curve in the condensed phase (mean values as red dots, standard errors as error bars) with an anomalous diffusion model (blue curve, $\mathrm{MSD} = 4Dt^{\alpha}$). The fit gives D = 0.014 ± 0.001 μm²/s and α = 0.74 ± 0.03.
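      One simple way to fit the anomalous diffusion model MSD(t) = 4Dt^α is ordinary least squares on its logarithm, ln MSD = ln(4D) + α ln t, which makes α the slope of a log-log plot. This is a swapped-in technique for illustration: the response does not state how the fit was performed, and log-log regression weights the lags differently from a nonlinear fit on the raw MSD. The frame interval of 0.03 s is taken from the Methods quoted later in this response; the function name is hypothetical.

```python
import math


def fit_anomalous_msd(lags_s, msd_um2):
    """Fit MSD(t) = 4 * D * t**alpha by linear regression of ln MSD on ln t.

    Returns (D in um^2/s**alpha, alpha).
    """
    xs = [math.log(t) for t in lags_s]
    ys = [math.log(m) for m in msd_um2]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    alpha = sxy / sxx            # slope on log-log axes
    log_4d = my - alpha * mx     # intercept = ln(4D)
    return math.exp(log_4d) / 4.0, alpha


# Usage: noiseless synthetic MSD at multiples of a 0.03 s frame interval
lags = [0.03 * k for k in range(1, 11)]
msd = [4 * 0.014 * t ** 0.74 for t in lags]
D_fit, alpha_fit = fit_anomalous_msd(lags, msd)
```

      On log-log axes a transiently trapped process shows a slope below 1 at short lags that approaches 1 at long lags, which is why the reviewer asks for that presentation.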

      We find it useful to interpret the apparent diffusion coefficient (D = 0.014 ± 0.001 μm²/s) derived from this particular anomalous diffusion model as containing information on NR2B motions in a broadly construed mobile state (i.e., corresponding to the network-unbound form) as well as in a broadly construed confined state (i.e., corresponding to NR2B molecules bound to percolated PSD networks). The global fitting using the subdiffusive model does not pin down the motion properties of NR2B in these different motion states. This is why we used, at least as a first approximation, the two-state motion switch model (HMM model) to analyse our data (please refer also to our detailed response to comment #7 from reviewer 1 and the corresponding additional analyses made during the revision, as highlighted in Response Figure 4).

      As described in our responses to comment points #4 and #7 from reviewer 1, the two-state model is most likely a simplification of NR2B motions in the condensed phase. Both the mobile state and the confined state in our simplified interpretative framework likely represent ensemble averages of their respective motion states. However, the currently available tracking data do not allow us to further distinguish the substates, although further analysis using more refined models may provide more physical insight in the future, as we now emphasize in the revised “Discussion” section: “With this in mind, the two motion states in our simple two-state model for condensed-phase dynamics should be understood to consist of multiple sub-states. For instance, one might envision that the percolated molecular network in the condensed phase is not uniform (e.g., existence of locally denser or looser networks) and dynamic (i.e., local network breaking and forming). Therefore, individual proteins binding to different sub-regions of the network will have different motion properties/states. … In light of this basic understanding, the “confined state” and “mobile state” as well as the derived diffusion coefficients in this work should be understood as reflections of ensemble-averaged properties arising from such an underlying continuum of mobilities. Further development of experimental techniques in conjunction with more refined models of anomalous diffusion (Joo et al., 2020; Kuhn et al., 2021; Muñoz-Gil et al., 2021) will be necessary to characterize these more subtle dynamic properties and to ascertain their physical origins” (p.23 of the revised manuscript).

      A practical reason for using the two-state motion switch HMM model to analyse our tracking data in the condensed phase is that the lifetime of the putative mobile state (when the per-frame molecular displacements are relatively large) is very short, and such relatively fast, short trajectories are interspersed with long confined states (see Response Figure 4C for an example). Statistically, ascertaining a particular anomalous diffusion model by fitting to such short tracks is unlikely to be reliable. Therefore, in the present manuscript (which is quite an extensive study already) we opted for a semi-quantitative interpretative framework, using fitted diffusion coefficients in a two-state HMM as well as the new correlation-based approach for demarcating a low-mobility state and a high-mobility state (see our detailed response to reviewer 1’s point #7), while leaving refinements of our computational modelling to future effort.
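      To make the two-state motion-switch idea concrete, the sketch below decodes the most likely state sequence (Viterbi) for a track of 2D step lengths, given fixed parameters. Assumptions, all for illustration: each state is Brownian with the HMM diffusion coefficients reported in this response (0.013 and 0.17 μm²/s), so a 2D step length r over Δt follows a Rayleigh density r/(2DΔt)·exp(-r²/(4DΔt)); the frame interval is 0.03 s; the per-frame stay probability (0.9) and equal initial probabilities are hypothetical. The authors' HMM also estimates these parameters, which is not shown here.

```python
import math


def log_rayleigh(r, D, dt):
    """Log-density of a 2D step length r for Brownian motion with coefficient D over dt."""
    var = 2.0 * D * dt  # per-axis variance
    return math.log(r / var) - r * r / (2.0 * var)


def viterbi_two_state(steps, D=(0.013, 0.17), dt=0.03, p_stay=0.9):
    """Most likely state path (0 = confined, 1 = mobile) for a list of step lengths (um)."""
    log_stay, log_switch = math.log(p_stay), math.log(1.0 - p_stay)
    score = [math.log(0.5) + log_rayleigh(steps[0], D[s], dt) for s in (0, 1)]
    back = []
    for r in steps[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            cands = [score[p] + (log_stay if p == s else log_switch) for p in (0, 1)]
            best = max((0, 1), key=lambda p: cands[p])
            new_score.append(cands[best] + log_rayleigh(r, D[s], dt))
            ptr.append(best)
        score = new_score
        back.append(ptr)
    state = max((0, 1), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):  # trace the best predecessors backwards
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Usage: five short steps (20 nm) followed by five long ones (150 nm)
path = viterbi_two_state([0.02] * 5 + [0.15] * 5)
```

      With these parameters the short steps are assigned to the confined state and the long ones to the mobile state, which is the kind of segmentation that becomes unreliable when mobile episodes last only a few frames.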

      Even in the context of the 'normal' two-state diffusion model they present (if they wish to stick with that, although it seems inappropriate to do so), can the authors provide some physical intuition for what exactly sets the diffusivities they extract from their data (0.17 and 0.013 μm²/s for the mobile and confined states)? Can these be understood using, e.g., the Stokes-Einstein or Ogston models?
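      A Stokes-Einstein reading of a measured diffusivity can be sketched as follows: D = k_BT/(6πηr) for a sphere of hydrodynamic radius r in a fluid of viscosity η, or, inverted, the effective viscosity that a given D would imply. The radius used in the usage line is a hypothetical value, not a measured NR2B radius, and the temperature default of 298.15 K is an assumption.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def stokes_einstein_D(radius_m, viscosity_pa_s, temp_k=298.15):
    """Diffusion coefficient (m^2/s) of a sphere of radius r in a fluid of viscosity eta."""
    return K_B * temp_k / (6 * math.pi * viscosity_pa_s * radius_m)


def effective_viscosity(D_m2_s, radius_m, temp_k=298.15):
    """Invert Stokes-Einstein: the viscosity that would give diffusion coefficient D."""
    return K_B * temp_k / (6 * math.pi * D_m2_s * radius_m)


# Usage: effective viscosity implied by the mobile-state D (0.17 um^2/s = 0.17e-12 m^2/s)
r_hypothetical = 5e-9  # HYPOTHETICAL hydrodynamic radius, 5 nm
eta_eff = effective_viscosity(0.17e-12, r_hypothetical)
ratio_to_water = eta_eff / 0.89e-3  # water is about 0.89 mPa*s at 25 C
```

      Such a number is only a convenient summary: as the response below argues, binding to a percolated network rather than bulk viscosity is the more likely origin of the slow motion.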

      As stated above, we are in general agreement with this reviewer that the motion of NR2B in the condensed phase is more complex than the simple two-state picture we adopted as a semi-quantitative interpretation that is adequate for our present purposes. Within the multi-pronged analysis we have performed thus far, NR2B molecules clearly undergo anomalous diffusion in solution containing dense, percolated, and NR2B-binding molecular networks. As a first approximation, our simple two-state HMM analysis yielded two simple diffusion coefficients (0.17 μm²/s for the mobile state and 0.013 μm²/s for the confined state). We regard the mobile-state diffusion coefficient as providing a time scale for relatively fast diffusive motions (which may be further classified into various motion substates in the future) that are not bound, or are only weakly associated with, the percolated network of strong interactions in the PSD condensed phase. Molecules in the confined or low-mobility state of our present formulation are likely bound relatively tightly to the percolated networks, so by the Stokes-Einstein relation their diffusion coefficient should be much smaller than that of the unbound form (i.e., the mobile state). However, because of the detection limit of the super-resolution imaging method (resolution of ~20 nm), we could not definitively determine the actual diffusivity below the resolution limit. The confined-state diffusion coefficient can therefore also be interpreted as a Gaussian-distributed microscope detection error, $x \sim N(0, \sigma^2)$ with density $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-x^2/(2\sigma^2)}$, where σ, the standard deviation of the Gaussian distribution, is viewed as the resolution of localization-based microscopy, and x is the detection error between the recorded localization and the molecule's actual position. In the confined state, the displacement r between localizations in consecutive frames (along each axis) is the difference of two independent Gaussian errors, so its distribution is $r \sim N(0, 2\sigma^2)$. To link the detection error with the fitted diffusion coefficient, we calculated the log-likelihood of this Gaussian-distributed localization error, $\ln L(r) = -\tfrac{1}{2}\ln(4\pi\sigma^2) - \frac{r^2}{4\sigma^2}$, for the maximum likelihood estimation used to fit the HMM model. A random walk contributes a log-likelihood term of the same form, $\ln L(r) = -\tfrac{1}{2}\ln(4\pi D t) - \frac{r^2}{4Dt}$, in the maximum likelihood estimation.

      These two log-likelihood functions produce the same fitting results when their variances match, $2\sigma^2 = 2Dt$, i.e. $\sigma^2 = Dt$. In this way, the diffusion coefficient yielded by our HMM analyses for the confined state (0.0127 μm²/s) can be interpreted in terms of the standard deviation of the localization detection error (or microscope resolution limit), $\sigma = \sqrt{Dt} = 19.5$ nm for t = 0.03 s. We have included this consideration as an alternative interpretation of the confined-state or low-mobility motions, with the results now provided in the “Materials and Methods” section in the sentence, viz., “… the L-component distribution may be reasonably fitted (albeit with some deviations, see below) to a simple-diffusion functional form with a parameter s = 13.6 ± 3.7 nm, where s may be interpreted as a microscope detection error due to imaging limits or alternately expressed as $s = \sqrt{D_L t}$ with $D_L$ = 0.006149 μm²/s being the fitted confined-state diffusion coefficient and t = 0.03 s the time interval between experimental frames. (The HMM-estimated confined-state $D_c$ = 0.0127 μm²/s corresponds to s = 19.5 nm)” (p.32 of the revised manuscript).
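      The conversion between a confined-state diffusion coefficient and an equivalent localization error can be checked numerically. Matching the per-axis variance of the frame-to-frame difference of two localization errors (2σ²) with that of a random walk (2DΔt) gives σ = √(DΔt); with Δt = 0.03 s this reproduces the 19.5 nm and 13.6 nm values quoted in the text. The function name is hypothetical.

```python
import math


def localization_sigma_nm(D_um2_s, dt_s):
    """Localization error sigma = sqrt(D * dt), converted from micrometres to nanometres."""
    return math.sqrt(D_um2_s * dt_s) * 1000.0


sigma_hmm = localization_sigma_nm(0.0127, 0.03)    # HMM confined-state D_c
sigma_fit = localization_sigma_nm(0.006149, 0.03)  # fitted L-component D_L
```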

      (2) Equation 1 (and hence equation 2) is concerning. Consider the limit $P_m = 1$, that is, there are no confined particles in the condensed phase. The model then becomes a diffusion equation with spatially dependent diffusivity, $\partial c/\partial t = \nabla \cdot (D(x) \nabla c)$, where the molecules' diffusivity $D(x)$ is $D_d$ in the dilute phase and $D_m$ in the condensed phase. No matter what values $D_d$ and $D_m$ take, at equilibrium the concentration should always be uniform everywhere. According to Equation 1, the concentration ratio will be $D_d/D_m$, so if $D_d/D_m \neq 1$, a concentration gradient is generated spontaneously, which violates the second law of thermodynamics. Can the authors please justify the use of this equation?

      Indeed, the derivation of Equation 1 appears to be concerning. The flux J is proportional to D * dc/dx (not kDc as in the manuscript). At equilibrium dc/dx = 0 on both sides and c is constant everywhere. Can the authors please comment?

      So then another question is, why does the Monte Carlo simulation result agree with Equation 1? I suspect this has to do with the behavior of particles crossing the boundary. Consider another limit where D_m = 0, that is, particles freeze in the condensed phase. If once a particle enters the condensed phase, it cannot escape, then eventually all particles will end up in the condensed phase and EF=infty. The authors likely used this scheme. But as mentioned above this appears to violate the second law.
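      The reviewer's point can be illustrated with a 1D finite-volume sketch that contrasts two boundary-crossing rules on the same piecewise diffusivity. This is not the authors' Monte Carlo scheme. Fick's law, J = -D dc/dx, always relaxes to a uniform profile. A "departure-site" hopping rule, J = -d(Dc)/dx, in which the hop rate is set by D at the cell being left, is one assumed way a simulation can instead produce c ∝ 1/D, i.e. the $D_d/D_m$ partitioning of Eq. 1, without any free-energy difference. All numbers (10 fast cells with D = 1, 10 slow cells with D = 0.25, dx = 1, dt = 0.4) are hypothetical.

```python
def relax(D, c0, flux, dt=0.4, steps=5000, dx=1.0):
    """Explicit finite-volume update with no-flux outer walls.

    flux(D, c, i, dx) returns the flux from cell i to cell i+1.
    """
    c = list(c0)
    for _ in range(steps):
        J = [flux(D, c, i, dx) for i in range(len(c) - 1)]
        for i, j in enumerate(J):
            c[i] -= dt * j / dx
            c[i + 1] += dt * j / dx
    return c


def fick_flux(D, c, i, dx):
    """J = -D dc/dx, using the harmonic mean of D at the interface (Fick's law)."""
    d_face = 2 * D[i] * D[i + 1] / (D[i] + D[i + 1])
    return -d_face * (c[i + 1] - c[i]) / dx


def departure_flux(D, c, i, dx):
    """J = -d(D c)/dx: hop rate set by D at the departure cell (assumed alternative rule)."""
    return (D[i] * c[i] - D[i + 1] * c[i + 1]) / dx


D = [1.0] * 10 + [0.25] * 10  # hypothetical: fast 'dilute' half, slow 'condensed' half
c0 = [2.0] * 10 + [0.0] * 10  # start with all material on the dilute side
fick = relax(D, c0, fick_flux)
dep = relax(D, c0, departure_flux)
enrichment = (sum(dep[10:]) / 10) / (sum(dep[:10]) / 10)
```

      Under Fick's law the profile ends flat, as the reviewer argues; under the departure-site rule the slow side ends 4 (= 1/0.25) times more concentrated, so agreement between a simulation and Eq. 1 may reflect the crossing rule rather than thermodynamics.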

      Thanks for the incisive comment. After in-depth consideration, we agree with the reviewer that Eq. 1 should not be presented as a relation that is generally applicable to diffusive motions of molecules in all phase-separated systems. There are cases in which this relation can lead to unphysical outcomes, as correctly pointed out by the reviewer.

      Nonetheless, based on our theoretical/computational modeling, it is also clear, empirically, that Eq.1 holds approximately for the NR2B/PSD system we studied, and as such it is a useful approximate relation in our analysis. We have therefore provided a plausible physical perspective for Eq.1’s applicability as an approximate relation based upon a schematic consideration of diffusion on an underlying rugged (free) energy landscape (Zhang and Chan, 2012) of a phase-separated system (See Figure 3G in the revised manuscript), while leaving further studies of such energy landscape models to future investigations.

      This additional perspective is now included in the following added passage under a new subheading in the revised manuscript:

      "Physical picture and a two-state, two-phase diffusion model for equilibrium and dynamic properties of PSD condensates"

      (3) Despite the above two major concerns described in (1) and (2), the enrichment due to the presence of a "confined state", is reasonable. The equilibrium between "confined" and "mobile" states is determined by its interaction with the other proteins and their ratio at equilibrium corresponds to the equilibrium constant. Therefore EF=1/Pm is reasonable and comes solely from thermodynamics. In fact, the equilibrium partition between the dilute and dense phases should solely be a thermodynamic property, and therefore one may expect that it should not have anything to do with diffusivity. Can the authors please comment on this alternative interpretation?

      Thanks for this thought-provoking comment. We agree with the reviewer that the relative molecular densities in the condensed versus dilute phases are governed by thermodynamics unless there is energy input into the system. However, in our formulation, the mobile ratio is not the only parameter determining the enrichment fold in a phase-separated system. In fact, the approximate relation (Eq. 1) is $EF \approx D_d/(P_m D_m)$, and thus $EF \approx 1/P_m$ only when $D_d \approx D_m$. But the mobile-state diffusion in the condensed phase is found to be appreciably slower than diffusion in the dilute phase ($D_d > D_m$). In general, a hallmark of a phase separation system is to enrich the involved molecules in the condensed phase, regardless of whether the molecule is a driver (or scaffold) or a client of the system. Such enrichment is expected to result from the net free energy gain due to increased molecular interactions in the condensed phase (as envisioned in Response Figure 9). For example, in the phase separation systems containing PrLD-SAMME (Figure 4 of the manuscript), $P_m$ is close to 1, but the enrichment of PrLD-SAMME in the condensed phase is much greater than 1 (estimated to be ~77, based on the fluorescence intensity of the protein in the dilute and condensed phases; Figure 5-figure supplement 1). As far as Eq. 1 is concerned, this is mathematically correct because the diffusion coefficient of PrLD-SAMME in the condensed phase (D ~0.2 μm²/s) is much smaller than the diffusion coefficient of a monomeric molecule with a similar molecular mass in dilute solution (D ~100 μm²/s, measured by a FRAP-based assay; the mobility of the molecules in the dilute solution in 3D is too fast to be tracked). Physically, it is most likely that the slower molecular motion in the condensed phase is caused by favorable intermolecular interactions, and the same favorable interactions underpinning the dynamic effects also lead to a larger equilibrium Boltzmann population.

    1. Author Response

      Reviewer #1 (Public Review):

      Nandan et al. attempt to demonstrate how a phenomenology in the molecular signaling network inside a cell could translate to changes in the behavior of the cell and its ability to respond/adapt to changes in the environment over time and space. While this investigation is performed in the context of mammalian cells, the result holds significance for eukaryotic cells at large and demonstrates a mechanism by which cells may use transient memory states to respond robustly to complex environmental cues. To study such mechanisms, it is important to show how the cell may encode such transient memory, how this memory is generated from environmental cues, how it translates to cellular motion, and how it enables cells to have persistent directional motion in the case of transient disruptions in the signal while responding to significant and long-lasting disruptions. The authors attempt to answer all of these questions.

      Strengths:
      The manuscript attempts to combine mathematical theory, mechano-chemical models, numerical simulations, and experimental evidence. Thus, the investigation spans diverse methods and spatio-temporal scales (from receptors to continuum mechanical models to whole-cell motion) to answer a unified question. The mathematical theory of dynamic states and bifurcation theory provides the basis for the generation of "ghost" states that can encode transient memory; the mechano-chemical models show how such dynamical states can be realized in the EGFR signaling network; the numerical simulations show both how cells can respond to environmental cues by generating polarised states, and by navigating complex environmental cues, and experiments provide evidence that this may be the case for epithelial cells in the presence of growth factors. The manuscript is well-structured with the main conclusions clearly identified and separated from each other in the different sections. The theoretical investigation is thorough and the main text provides an intuition as to what the authors are trying to convey, while the Methods reveal the calculations performed and the approximations made. The modeling and numerical simulations are detailed and provide a baseline expectation for the system in different parameter regimes. The experiments and the analysis extensively characterize the system. I commend the authors for having delved into so many methods to answer this problem, and the authors demonstrate significant knowledge of the different methods with many novel contributions.

      Weaknesses:
      The key weakness of the results is in establishing clear distinctions between what would be expected (naively and based on results from other groups) from alternate explanations, and what is realized in the experimental results that support the hypothesis put forward by the authors. For example, the authors quote a relatively long time scale of persistence of polarisation, but it is unclear if this is longer than is expected from slow dephosphorylation to provide evidence for the existence of the "ghost" state from the saddle-node bifurcation. Further, key experimental results regarding the persistence of motion following gradient washout seem to differ from the authors' own predictions from simulations.

      There are several other models that attempt to describe eukaryotic chemotactic motion that persists despite brief disruptions and is able to adapt to changes in the environment over longer timescales. In my opinion, the main strength of the paper does not lie in providing another such model, but in providing a mechanistic understanding that bridges several scales. However, this places the burden on the authors to justify the link between the different scales.

      This is an ambitious manuscript and the authors are clearly very bold for attempting such a comprehensive treatment of such a complex system. The authors provide an excellent framework to understand mammalian cellular chemotaxis on multiple scales and attempt to justify the framework using several experiments and extensive analysis. However, they require further analysis and characterization to demonstrate that their experimental results provide the necessary justification for their conclusions as opposed to alternate possibilities.
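      The distinction the reviewer draws can be illustrated with the textbook normal form of a saddle-node bifurcation, dx/dt = μ + x², just past the bifurcation (μ > 0). No fixed point remains, but trajectories linger near x ≈ 0 (the "ghost"), and the passage time grows as π/√μ as μ → 0; a slow first-order dephosphorylation would instead have a time scale set by a rate constant. This normal form is an assumption for illustration, not the EGFR model of the manuscript; the integration bounds and step size are arbitrary choices.

```python
import math


def ghost_passage_time(mu, x0=-10.0, x1=10.0, dt=1e-3):
    """Time for dx/dt = mu + x**2 to travel from x0 to x1 (forward Euler)."""
    x, t = x0, 0.0
    while x < x1:
        x += dt * (mu + x * x)
        t += dt
    return t


def ghost_passage_time_exact(mu, x0=-10.0, x1=10.0):
    """Closed form: integral of dx / (mu + x**2) = atan(x / sqrt(mu)) / sqrt(mu)."""
    s = math.sqrt(mu)
    return (math.atan(x1 / s) - math.atan(x0 / s)) / s


t_near = ghost_passage_time(0.01)  # closer to the bifurcation: longer passage
t_far = ghost_passage_time(0.04)
```

      Quartering μ roughly doubles the passage time, the inverse-square-root scaling that is characteristic of a ghost rather than of a simple exponential decay.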

      We thank the referee for his/her in-depth suggestions and valuable comments on how to improve the manuscript, which we implemented in detail in the amended version. We have especially focused on providing the necessary justification for working memory emerging from a “ghost” signaling state as opposed to a slow dephosphorylation mechanism. For this, we fitted the single-cell EGFRp temporal profiles after gradient wash-out, with and without Lapatinib inhibition, with an inverse sigmoid function and quantified the respective half-life and Hill coefficient. The analysis included in the new Figure 2 – figure supplement 2 shows that under Lapatinib treatment, which inhibits the kinase activity of the receptor so that the dynamics of the system are governed by the dephosphorylating activity of the phosphatases, the system relaxes to the basal state in an almost exponential process (half-life ~10 min, Hill coefficient ~1.3). In contrast, under normal conditions EGFR phosphorylation relaxes to the basal state in ~30 min, corroborating that the system remains trapped in the “ghost” state. Moreover, the transition from the memory state to the basal state is rapid, as reflected in an estimated Hill coefficient of ~3. Additionally, we discuss how the identified slow time scale that emerges from the “ghost” state serves as a possible mechanistic link between the rapid phosphorylation/dephosphorylation events and the ~40 min of memory in cell-shape polarization/directional cell migration after growth factor removal.
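
      A half-relaxation time and a Hill coefficient of the kind quoted above can be estimated, for example, by fitting a Hill-type inverse sigmoid to a normalized relaxation profile. The exact functional form and fitting routine used in the manuscript are not stated here, so the sketch below assumes $y(t) = 1/(1 + (t/t_{1/2})^{n})$ for a profile already normalized between its polarized plateau (1) and basal level (0), and estimates $t_{1/2}$ and $n$ by linear regression of $\log(1/y - 1)$ on $\log t$ instead of nonlinear least squares. The function names and the synthetic data are hypothetical.

```python
import math

def inverse_hill(t, t_half, n):
    """Assumed inverse-sigmoid relaxation from 1 (polarized plateau) to 0 (basal)."""
    return 1.0 / (1.0 + (t / t_half) ** n)

def fit_inverse_hill(times, values):
    """Estimate (t_half, n) from a normalized profile using the linearization
    log(1/y - 1) = n*log(t) - n*log(t_half). Points with t <= 0 or y outside
    (0, 1) are skipped because the transform is undefined there."""
    xs, ys = [], []
    for t, y in zip(times, values):
        if t > 0 and 0.0 < y < 1.0:
            xs.append(math.log(t))
            ys.append(math.log(1.0 / y - 1.0))
    if len(xs) < 2:
        raise ValueError("need at least two usable points")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    n = sxy / sxx                    # slope of the linearized data = Hill coefficient
    intercept = my - n * mx          # intercept = -n * log(t_half)
    t_half = math.exp(-intercept / n)
    return t_half, n

# Synthetic, noise-free profiles sampled every 2 min (illustrative values only).
times = [2.0 * k for k in range(1, 40)]
slow = [inverse_hill(t, 30.0, 3.0) for t in times]   # delayed, switch-like return
fast = [inverse_hill(t, 10.0, 1.3) for t in times]   # close to a simple decay
print(fit_inverse_hill(times, slow))
print(fit_inverse_hill(times, fast))
```

      With real, noisy single-cell traces a nonlinear least-squares fit on the untransformed data is usually preferable, because the log transform amplifies noise near the plateau and near the basal level.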

      Moreover, we include additional quantification of memory in single-cell directional motility with and without the EGFR inhibitor (Figure 3 – figure supplement 3), and relate these results to previously proposed mechanisms of memory in directional migration arising from cytoskeletal asymmetries, while also highlighting the importance of memory in polarized receptor signaling as a necessary means to couple cellular processes that occur on different time scales. We have further expanded the manuscript by providing theoretical predictions of how organization at criticality uniquely enables resolving simultaneous signals. We address the referee’s comments as outlined below:

      Reviewer #2 (Public Review):

      Nandan, Das et al. set out to study the mechanism by which single cells follow extracellular signals in variable environments and generate persistent directional migration in the presence of changing chemoattractant fields. Importantly, cells are able to (1) maintain the orientation acquired during the initial signal despite disruptions or noise while still (2) adapting migrational direction in response to newly-encountered signals. Previous models have accounted for either of these properties, but not both simultaneously. To reconcile these observations, this work proposes an underlying mechanism in which cells utilize a form of working memory.

      The authors present a dynamical systems framework in which the presence of dynamical 'ghosts' in an underlying signaling network allows the cell to retain a memory of previously encountered signals. These are generated as follows: a pitchfork bifurcation confers a symmetry-breaking transition from a non-polarised to a polarised signaling state/direction-oriented cell shape. After a subsequent saddle-node bifurcation, a 'ghost' of the stable attractor emerges. This 'ghost' state is metastable, however, which is what allows cells to integrate new signals as well as to adapt their direction of migration.
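
      Why a 'ghost' yields a long but finite transient can be seen in the textbook normal form of a saddle-node bifurcation, $\dot{x} = r + x^{2}$. For $r < 0$ there are two fixed points; just past the bifurcation ($r$ small and positive) they have vanished, yet trajectories still slow down near $x = 0$, and the time spent passing this bottleneck scales as $\pi/\sqrt{r}$. The sketch below uses this generic one-dimensional normal form, not the EGFR model of the manuscript; the start/end bound and the step size are arbitrary choices.

```python
import math

def passage_time(r, bound=10.0, dt=1e-3):
    """Integrate dx/dt = r + x**2 (r > 0) with classical RK4 from x = -bound
    until x >= bound, and return the elapsed time."""
    def f(x):
        return r + x * x

    x, t = -bound, 0.0
    while x < bound:
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        x += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += dt
    return t

def analytic_passage_time(r, bound=10.0):
    """Exact passage time (2/sqrt(r)) * atan(bound/sqrt(r)), which tends to pi/sqrt(r)."""
    s = math.sqrt(r)
    return 2.0 / s * math.atan(bound / s)

# Moving closer to the bifurcation (smaller r) lengthens the transient.
for r in (0.04, 0.01, 0.0025):
    print(r, round(passage_time(r), 2), round(math.pi / math.sqrt(r), 2))
```

      Each fourfold decrease in $r$ roughly doubles the time spent near the ghost, which is the sense in which proximity to the saddle-node point sets the duration of a transient memory.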

      The authors demonstrate these dynamics in the Epidermal Growth Factor Receptor (EGFR) signaling network. This pathway is central in many embryonic and adult processes conserved in most animal groups, making it an ideal choice to characterise a phenomenon observed in such a diverse range of cells. The authors couple a mechanical model of the cell with the biochemical signaling model for EGFR, which nicely allows them to thoroughly simulate cellular deformations that they predict will occur during polarization and motility.

      Key features of the model are well-supported by empirical data from experiments: (1) quantitative live-cell imaging of polarised EGFR signaling shows the existence of a distinct polarised 'ghost' state after removal of extracellular signals and (2) motility experiments confirm the manifestation of this memory in allowing for persistent cell migration upon loss of a signal. In an extension of the latter experiment, the authors also show that cells displaying this working memory are still able to respond to changes in the chemoattractant field as necessary.

      The experiments using Lapatinib to disrupt the EGFR dynamics are less convincing. The authors show that subjecting cells to this inhibitor results in the absence of memory and removes the ability of cells to maintain their orientation after the gradient was disrupted. Clarifying which aspect(s) of the EGFR network within the context of the model are precisely disrupted by Lapatinib would help strengthen the authors' claim that it is the mechanism of working memory, and not other features of the EGFR network, that is responsible for the results shown.

      We thank the referee for the detailed comments and suggestions that helped us to improve the manuscript. In the amended version of the manuscript, we describe that Lapatinib hinders EGFR kinase activity; in the model, this mainly affects the autocatalytic rate constant. We performed numerical simulations in which the autocatalytic rate constant is decreased after gradient removal, and show that the EGFRp temporal profile exhibits a slow decay after gradient removal, whereas the state-space trajectory transits directly from the polarized to the basal state without intermediate trapping in state space, thereby qualitatively resembling the experimental observations under Lapatinib treatment (compare Figure 2 – figure supplement 2C, D with Figure 2G in the amended version of the manuscript).

      Reviewer #3 (Public Review):

      Cell navigation in chemoattractant fields is important to many physiological processes, including in development and immunity. However, the mechanisms by which cells break symmetry to navigate up concentration gradients, while also adapting to new gradient directions, remain unclear. In this study, the authors propose a new theoretical model for this process: cells are poised near a subcritical pitchfork bifurcation, which allows them to simultaneously maintain the memory of a polarized state over intermediate timescales and respond to new cues. They show analytically that a model of EGFR phosphorylation dynamics has a subcritical pitchfork bifurcation, and use simulations of in silico cells to demonstrate both memory and adaptability in this system. They further measure EGFR phosphorylation profiles, as well as migration tracks under external gradients, in real cells.

      This work contributes an interesting new theoretical framework, bolstered by substantial analysis and simulations, as well as valuable measurements of cell behavior and polarization. Both the modeling and the measurements are careful and thorough, and each represents a substantial contribution to decoding the complex problem of cell navigation. The measurements support and quantify the phenomenon of directional memory. The main weakness is that it is not clear that they also support the mechanism proposed by the model.

      Theoretical framework

      One of the main strengths of this work is the thorough theoretical analysis of a model of symmetry breaking in EGFR phosphorylation. The authors perform linear stability analysis and a weakly nonlinear amplitude equation analysis to characterize the transition. Additionally, they convincingly demonstrate in simulations that this model can generate robust polarization, with memory over intermediate timescales and responsiveness to new gradient directions. However, the relationship between the full dynamical system and the bifurcation diagrams shown in Figure 1A and Figure 1-Figure Supplement 1B is not clear. In particular, there is an implicit reduction from an infinite-dimensional system (continuous in space) to an ODE system.

      From Methods 5.15, it appears that this was accomplished by approximating the continuous cell perimeter as a diffusively-coupled two-component system, representing the left and right halves of the cell (Methods 5.15 Equation 18 to Equation 19). However, this is not stated explicitly in the methods, and not at all in the main text, making the argument difficult to follow. Additionally, the main text and methods describe the emergence of an unstable odd spatial eigenmode as the key requirement for the pitchfork bifurcation. It is not clear why it is sufficient to show this emergence in the two-component system.

      We thank the referee for the detailed and insightful comments, which we implemented in detail in the amended version of the manuscript. Indeed, as the referee noted, we assumed a simplified one-dimensional geometry composed of two compartments (front and back), resembling a projection of the membrane along the main diagonal of the cell. The standard approach to modeling diffusion along the membrane in this case is a simple exchange of the diffusing components between the compartments. As demonstrated in the analysis, the one-dimensional projection preserves all of the main features of the PDE model. The numerical bifurcation analysis was performed only for comparative purposes. In the amended version of the manuscript, we therefore extend the description of this simplification and of the purpose of its implementation. Additionally, one of our reasons for developing this theoretical analysis was to provide a general method for identifying a subcritical PB in PDE models.
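
      For a two-compartment reduction of the kind described above (front and back halves exchanging each species at a rate set by its diffusion constant), the linear stability of a symmetric steady state splits into two independent problems. Writing the local Jacobian as $J$ and the diagonal exchange matrix as $D$, the even mode $\delta u_1 + \delta u_2$ evolves with $J$, while the odd mode $\delta u_1 - \delta u_2$ evolves with $J - 2D$, so an odd-mode (polarizing) instability can appear while the uniform state stays stable to homogeneous perturbations. The sketch below checks this for a hypothetical two-species Jacobian; the numbers are illustrative and are not taken from the EGFR model.

```python
import cmath

def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 matrix [[a, b], [c, d]] from its trace and determinant."""
    (a, b), (c, d) = m
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + disc) / 2.0, (tr - disc) / 2.0

def mode_growth_rates(jac, diff):
    """Largest real eigenvalue part of the even mode (J) and of the odd mode
    (J - 2D) for two identical compartments coupled by linear exchange with
    diagonal rates diff = (D_u, D_v)."""
    odd = [[jac[0][0] - 2.0 * diff[0], jac[0][1]],
           [jac[1][0], jac[1][1] - 2.0 * diff[1]]]
    even_rate = max(ev.real for ev in eigenvalues_2x2(jac))
    odd_rate = max(ev.real for ev in eigenvalues_2x2(odd))
    return even_rate, odd_rate

# Hypothetical local Jacobian, stable to uniform perturbations (trace < 0, det > 0).
J = [[1.0, -1.0], [3.0, -2.0]]
# Hypothetical exchange rates: first species slow, second species fast.
D = (0.01, 1.0)
print(mode_growth_rates(J, D))
```

      With equal exchange rates, $J - 2D$ only shifts every eigenvalue of $J$ by $-2D$ and the odd mode stays stable, so in this generic linear picture it is differential exchange of the species that allows the odd mode to destabilize.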

      The schematic of the bifurcation in Figure 1A (now in Figure 1 – figure supplement 1A), as well as the numerical bifurcation analysis of the EGFR model in Figure 1 – figure supplement 1C, represent a subcritical pitchfork bifurcation, although the alignment of the IHSS branches is slightly different in the EGFR model. This, however, has no influence on the full dynamics of the system or on the proposed hypothesis. Moreover, to explain in detail the dynamical transitions (how the unfolding of the PB results in robust polarization, and how the organization at criticality enables temporal memory of polarization to be maintained), we included a revised schematic in Figure 1 – figure supplement 1A that shows the signal-induced transitions previously depicted in a compact way in Figure 1A, and included a corresponding description in Methods, Section 5.15. The corresponding transitions for the one-dimensional projection of the EGFR model are also included in the detailed response (Figure 2) for comparison.

      Relationship between the measurements and model

      The second main strength of this work is the contribution of controlled measurements of cell motility, polarization, and phosphorylated EGFR profiles. The measurements of cell migration presented here support the claim that the cells have a memory of past gradients. Additionally, the authors contribute very nice quantifications of the memory timescale. The Lapatinib experiments also support the claim that this memory is related to EGFR activity. However, there are a number of ways in which the real cells appear not to behave like the in silico cells. Polarization in phosphorylated EGFR is present only some of the time in the data, and if present, appears to be weak and/or variable, in magnitude and direction (phosphorylated EGFR profiles, figure 2C, Figure 2-Figure supplement 1D, E). Even for the subset of cells that display polarized EGFR phosphorylation profiles, the average profile is shown after aligning to the peak for each cell (Figure 2-Figure Supplement 1C), so it is not clear that they polarize in the direction of the gradient.

      We thank the referee for these comments, which we used as a basis to improve the presentation of the results in the amended version of the manuscript. To demonstrate that cells polarize in the direction of the maximal EGF concentration, we used the EGF647 intensity to quantify the growth factor distribution around each cell and calculated the angle between the maximum of the EGF647 distribution and the projection of the EGFRp spatial distribution (summarized in Figure 2 – figure supplement 1F and Methods). In brief, to quantify the EGF647 distribution outside each cell, the cell masks were extended by 23 pixels, and the outer rim of 15 pixels was used for the quantification. A radial histogram of the obtained angles confirms that EGFRp polarizes in the direction of maximal EGF647, with the variability arising from the positioning of the cells within the gradient chamber. That cells polarize in the direction of the gradient can also be inferred indirectly from the migration data (Fig. 3C), where we estimated the projection of the relative displacement angles with respect to the gradient direction. The cos θ values remain around 1 during stimulation and for ~50 min after gradient removal (cells migrate in the direction of the gradient), before resetting to 0, which is characteristic of the no-stimulus case.
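
      The angular comparison described above can be sketched as follows, assuming each profile (EGF647 in the outer rim, EGFRp along the membrane) has already been resampled at equally spaced angles around the cell centroid. Here the direction of a profile is taken as the angle of its first circular moment, and a track is scored by the cosine between each displacement step and the gradient direction; these are common choices, but the exact definitions in the manuscript's Methods may differ, and all helper names and profiles are hypothetical.

```python
import math

def polarization_angle(intensities):
    """Angle (radians) of the first circular moment of a profile sampled at
    equally spaced angles 0, 2*pi/n, ... around the cell."""
    n = len(intensities)
    cx = sum(v * math.cos(2 * math.pi * k / n) for k, v in enumerate(intensities))
    cy = sum(v * math.sin(2 * math.pi * k / n) for k, v in enumerate(intensities))
    return math.atan2(cy, cx)

def angular_difference(a, b):
    """Signed difference a - b wrapped to the interval (-pi, pi]."""
    d = (a - b) % (2 * math.pi)
    return d - 2 * math.pi if d > math.pi else d

def displacement_cosines(track, gradient_angle):
    """cos(theta) between each step of an (x, y) track and the gradient direction."""
    gx, gy = math.cos(gradient_angle), math.sin(gradient_angle)
    cosines = []
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        cosines.append((dx * gx + dy * gy) / length if length > 0 else 0.0)
    return cosines

# Illustrative profiles: EGF647 peaking at 30 degrees, EGFRp peaking at 45 degrees.
n = 36
egf = [1.0 + math.cos(2 * math.pi * k / n - math.radians(30)) for k in range(n)]
egfrp = [1.0 + 0.5 * math.cos(2 * math.pi * k / n - math.radians(45)) for k in range(n)]
offset = angular_difference(polarization_angle(egfrp), polarization_angle(egf))
print(math.degrees(offset))
```

      Collecting `offset` over many cells gives the values for a radial histogram, and averaging `displacement_cosines` over cells at each time point gives a cos θ time course.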

      The length of the memory in EGFRp polarization is indeed variable in single cells, being on average ~40–50 min. The length of the memory is directly related to the total EGFR concentration on the plasma membrane: the closer EGFRt is to the value at which the SNPB is exhibited, the longer the memory lasts, and in theory

      $\text{memory duration} \propto \mathrm{EGFR}_t^{1/2}$.

      From the experimental measurements we have indeed observed a correlation between these two quantities, which we include here for the referee’s perusal (Figure 1). However, a direct fit of this dependency to the experimental data could not be performed, for the following reason. In general, the fitting function is $f(\mathrm{EGFR}_T) = c\,(c_{\mathrm{EGFR}_T,\mathrm{SN}} - \mathrm{EGFR}_T)^{n}$, where $c$ is a constant and $c_{\mathrm{EGFR}_T,\mathrm{SN}}$ is the total EGFR concentration at the plasma membrane that marks the position of the SNPB. This value, however, cannot be identified with certainty from the experiments. We therefore chose a fixed value based on the spread of the data; in this case, the fit yielded $n = 0.49$, which approximates the theoretical value well. However, since one of the parameters must be chosen arbitrarily, we refrain from presenting the fit.
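
      Why the arbitrary choice matters can be illustrated with a log-log regression of the fitting function stated above, $f = c\,(c_{\mathrm{EGFR}_T,\mathrm{SN}} - \mathrm{EGFR}_T)^{n}$, in which $c_{\mathrm{EGFR}_T,\mathrm{SN}}$ has to be fixed beforehand. The data below are synthetic (generated with $c = 2$, $n = 0.5$, $c_{\mathrm{EGFR}_T,\mathrm{SN}} = 10$ in arbitrary units) and the function name is hypothetical; the only point is that the fitted exponent $n$ shifts with the value chosen for $c_{\mathrm{EGFR}_T,\mathrm{SN}}$.

```python
import math

def fit_power_law(x_values, durations, c_sn):
    """Fit durations = c * (c_sn - x)**n by least squares on the log-log scale,
    with c_sn fixed in advance. Points with x >= c_sn are excluded because the
    base of the power must be positive."""
    xs, ys = [], []
    for x, d in zip(x_values, durations):
        if x < c_sn and d > 0:
            xs.append(math.log(c_sn - x))
            ys.append(math.log(d))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    n = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    c = math.exp(my - n * mx)   # back-transform the intercept
    return c, n

# Synthetic memory durations (arbitrary units) generated with c = 2, n = 0.5, c_sn = 10.
egfr_t = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
memory = [2.0 * (10.0 - x) ** 0.5 for x in egfr_t]
for c_sn in (9.0, 10.0, 12.0):
    print(c_sn, fit_power_law(egfr_t, memory, c_sn))
```

      Only the value of `c_sn` used to generate the data recovers the exponent; assuming it slightly too low or too high pulls the fitted $n$ well below or above 0.5.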

      *Figure 1: Correlation between single-cell transient memory duration and plasma membrane abundance of EGFR-mCitrine.*

      The real cells also appear to track the gradient far less reliably than the in silico cells (e.g. Figure 4B vs. 4C). Thus the measurements demonstrate and quantify the phenomenon of directional memory, but it is not clear that they support the mechanism proposed by the model, i.e. a symmetry-breaking transition in phosphorylated EGFR.

      We would like to emphasize here that the symmetry-breaking transition via a subcritical pitchfork bifurcation gives rise to robust polarization in the direction of the growth factor signal, whereas critical organization at the SNPB gives rise to temporal memory of the polarized state, as well as the capability to integrate signals that change over both time and space. The analytical as well as the numerical analysis of the experimentally identified EGFR network verifies that this network exhibits a subcritical PB. In the amended version of the manuscript, we have also included a quantification of the directionality of polarization (Figure 2 – figure supplement 1F).

      We would like to note, however, that the difference between the simulations and the experiments in Figure 4 arises because directional migration in the physical model of the cell is realized as a ballistic movement (owing to the complexity of connecting the signaling model with the physical model), whereas experimentally we have found that cells perform a persistent biased random walk (Figure 3D). In the amended version of the manuscript, we discuss these differences in relation to Fig. 4.
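
      The difference between the two movement modes can be made concrete by comparing the mean cosine between heading and gradient direction for a ballistic mover and for a persistent biased random walk. In the sketch below each new heading is the direction of a weighted sum of the previous heading (persistence), the gradient direction along +x (bias) and a random unit vector (noise); the ballistic case is the limit with no persistence term and no noise. The weights, step count, seed and function name are hypothetical and are not fitted to the experimental tracks.

```python
import math
import random

def mean_gradient_cosine(persistence, bias, noise, steps=2000, seed=1):
    """Mean cos(theta) between heading and a gradient along +x for a
    persistent biased random walk; the heading is updated once per step."""
    rng = random.Random(seed)
    heading = rng.uniform(-math.pi, math.pi)
    total = 0.0
    for _ in range(steps):
        phi = rng.uniform(-math.pi, math.pi)   # direction of the random kick
        vx = persistence * math.cos(heading) + bias + noise * math.cos(phi)
        vy = persistence * math.sin(heading) + noise * math.sin(phi)
        heading = math.atan2(vy, vx)
        total += math.cos(heading)
    return total / steps

print(mean_gradient_cosine(persistence=0.0, bias=1.0, noise=0.0))  # ballistic limit
print(mean_gradient_cosine(persistence=1.0, bias=0.3, noise=1.0))  # biased walk
print(mean_gradient_cosine(persistence=1.0, bias=0.0, noise=1.0))  # unbiased walk
```

      The ballistic limit returns exactly 1, whereas the biased walk gives a mean cosine between 0 and 1 and the unbiased walk stays near 0, which is one way to see why ballistic in silico tracks look more tightly aligned with the gradient than experimental ones.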

      Moreover, in the experiments, the EGF647 gradient is established from the top of the microfluidic chamber, and there will therefore be variability due to the position of cells within the chamber, disruption of the gradient by neighboring cells, etc. The single-cell trajectories (several examples shown in Figure 4 – figure supplement 1F) and the quantification of the relative displacement angles (Figure 4D,E), however, clearly show that cells migrate in the gradient direction and rapidly adapt to changes in the external cues.

      Additionally, in the authors' model, the features of memory and adaptability in cell navigation depend on the system being poised near a critical point. Thus, in silico, the sensing system 'breaks' when the system parameters are moved away from this point. In particular, cells with increased receptor concentration on their surface cannot adapt to new gradient directions (Section 1, final paragraph; Figure 1-Figure Supplement 1E-G). Based on this, the authors' theoretical framework makes a nonintuitive prediction: overexpression of the surface receptor EGFR in real cells should render them insensitive to changes in the concentration gradient. The fact that the model suggests a surprising, testable prediction is a strength of the framework. A weakness is that the consistency of this prediction with empirical data is not discussed (though the authors note similarities between this regime and unrealistic features of previous models).

      The organization at criticality is indeed dependent on the total concentration of receptors at the plasma membrane. The trafficking of epidermal growth factor receptors has previously been characterized in detail, demonstrating that ligandless receptors continuously recycle to the plasma membrane, whereas ligand-bound receptors are unidirectionally removed and trafficked to the lysosome, where they await degradation [5]. Thus, how quickly the system moves away from criticality depends directly on the dose and duration of the EGF stimulus, since these set the fraction of liganded receptors, whereas the return of the system to criticality subsequently depends on the time scale of biosynthesis of new receptors [17].

      Overexpression of EGFR will cause the system to display either permanent polarization (organization in the stable IHSS state) or uniform activation (high HSS branch). We have numerically tested the features of the system when it displays permanent memory (Figure 4 – figure supplement 1C,D) and demonstrated that in this case cells are not able to resolve signals from opposite directions, and migration will therefore be halted. Additionally, we have now numerically tested the capability of cells to resolve simultaneous signals of different amplitudes from opposite directions, and demonstrate that permanent memory resulting from this receptor organization hinders the cells in this comparison task, in contrast to organization at criticality (Figure 4 – figure supplement 2). In the amended version of the manuscript, we included a discussion of these points raised by the referee and hope that this allows for a clearer presentation of our findings and their implications.
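
      The dependence on stimulus dose and duration can be illustrated with a deliberately minimal surface-receptor balance: receptors are supplied by synthesis and turned over slowly at baseline, and while ligand is present the liganded fraction is internalized and not returned. Recycling of ligandless receptors is treated as net-neutral and omitted. All rate constants and the bound fraction are hypothetical placeholders, not values from the manuscript or from [5] or [17].

```python
def surface_receptors(stim_minutes, bound_fraction=0.5, k_syn=0.01, k_turn=0.01,
                      k_int=0.1, total_minutes=600.0, dt=0.1):
    """Forward-Euler integration of a toy surface-receptor balance (hypothetical rates):
        dR/dt = k_syn - k_turn*R - k_int*bound_fraction*R   while ligand is present
        dR/dt = k_syn - k_turn*R                             afterwards
    Starts at the baseline R = k_syn/k_turn and returns a list of (time, R) samples."""
    r, t = k_syn / k_turn, 0.0
    trace = [(t, r)]
    while t < total_minutes:
        loss = k_turn * r                        # slow baseline turnover
        if t < stim_minutes:
            loss += k_int * bound_fraction * r   # liganded receptors removed, not recycled
        r += dt * (k_syn - loss)                 # resupply only through synthesis
        t += dt
        trace.append((t, r))
    return trace

for duration in (30.0, 60.0):
    trace = surface_receptors(duration)
    lowest = min(r for _, r in trace)
    print(duration, round(lowest, 3), round(trace[-1][1], 3))
```

      In this toy balance a longer (or stronger, via `bound_fraction`) stimulus drives the surface level further from baseline, while the return is paced by the slow synthesis and turnover rates.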

    1. Author Response

      Reviewer #1 (Public Review):

      The authors set out to extend modeling of bispecific engager pharmacology through explicit modelling of the search of T cells for tumour cells, the formation of an immunological synapse and the dissociation of the immunological synapse to enable serial killing. These features have not been included in prior models and their incorporation may improve the predictive value of the model.

      Thank you for the positive feedback.

      The model provides a number of predictions that are of potential interest: that loss of CD19, the target antigen, to 1/20th of its initial expression will lead to escape, and that the bone marrow is a site where the tumour cells may have the best opportunity to develop loss variants due to the limited pressure from T cells.

      Thank you for the positive feedback.

      A limitation of the model is that adhesion is only treated as a 2D implementation of the blinatumomab-mediated bridge between T cells and B cells; there is no distinct parameter related to the distinct adhesion systems that are critical for immunological synapse formation. For example, CD58 loss from tumours is correlated with escape, but it is not related to the target, CD19. While they begin to consider the immunological synapse, they don't incorporate adhesion as distinct from the engager, which is almost certainly important.

      We agree that adhesion molecules play critical roles in cell-cell interaction. In our model, we assumed that these adhesion molecules are constant (i.e., they do not differ across cell populations). This assumption allowed us to focus on the BiTE-mediated interactions.

      Revision: To clarify this point, we added a couple of sentences in the manuscript.

      “Adhesion molecules such as CD2-CD58, integrins and selectins, are critical for cell-cell interaction. The model did not consider specific roles played by these adhesion molecules, which were assumed constant across cell populations. The model performed well under this simplifying assumption”.

      In addition, we acknowledged the fact that “synapse formation is a set of precisely orchestrated molecular and cellular interactions. Our model merely investigated the components relevant to BiTE pharmacologic action and can only serve as a simplified representation of this process”.

      While the random search is a good first approximation, T cell behaviour is actually guided by stroma and extracellular matrix, which are non-isotropic. In a lymphoid tissue the stroma is optimised for a search that can be approximated as Brownian or, more accurately, as a correlated random walk, but in other tissues, particularly tumours, the Brownian search is not a good approximation and other models have been applied. It would be interesting to look at observations from bone marrow or other sites to determine the best approximation for the search related to BiTE targets.

      We agree that the tissue stromal factors greatly influence the patterns of T cell searching strategy. Our current model considered Brownian motion as a good first approximation for two reasons: 1) we define tissues as homogeneous compartments to attain unbiased evaluations of factors that influence BiTE-mediated cell-cell interaction, such as T cell infiltration, T: B ratio, and target expression. The stromal factors were not considered in the model, as they require spatially resolved tissue compartments to represent the gradients of stromal factors; 2) our model was primarily calibrated against in vitro data obtained from a “well-mixed” system that does not recapitulate specific considerations of tissue stromal factors. We did not obtain tissue-specific data to support the prediction of T cell movement. This is under current investigation in our lab. Therefore, we are cautious about assuming different patterns of T cell movement in the model when translating into in vivo settings. We acknowledged the limitation of our model for not considering the more physiologically relevant T-cell searching strategies.

      Revision: In the Discussion, we added a limitation of our model: “We assumed Brownian motion in the model as a good first approximation of T cell movement. However, T cells often take other more physiologically relevant searching strategies closely associated with many stromal factors. Because of these stromal factors, the cell-cell encounter probabilities would differ across anatomical sites.”
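
      To make the distinction between the two search models concrete, the sketch below compares the mean squared displacement (MSD) of an uncorrelated random walk (a fresh random heading at every step, a lattice-free stand-in for Brownian motion) with a correlated random walk in which each turn is small. Both walkers use the same fixed step length; the step count, turning-angle spread, number of walkers and seed are hypothetical.

```python
import math
import random

def mean_squared_displacement(turn_sd, steps=100, walkers=2000, step=1.0, seed=0):
    """MSD after `steps` fixed-length steps, averaged over `walkers` walkers.
    turn_sd=None draws every heading uniformly (uncorrelated walk); otherwise the
    heading changes by a Gaussian turning angle with standard deviation turn_sd
    (correlated random walk)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(walkers):
        x = y = 0.0
        heading = rng.uniform(-math.pi, math.pi)
        for _ in range(steps):
            if turn_sd is None:
                heading = rng.uniform(-math.pi, math.pi)
            else:
                heading += rng.gauss(0.0, turn_sd)
            x += step * math.cos(heading)
            y += step * math.sin(heading)
        total += x * x + y * y
    return total / walkers

uncorrelated = mean_squared_displacement(None)
correlated = mean_squared_displacement(0.3)
print(uncorrelated, correlated)   # uncorrelated is close to steps * step**2 = 100
```

      With the same step length, persistence inflates the MSD by more than an order of magnitude over this horizon in this example, which is why the assumed search model can shift cell-cell encounter probabilities between tissues.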

      Reviewer #3 (Public Review):

      Liu et al. combined mechanistic modeling with in vitro experiments and data from a clinical trial to develop an in silico model describing the response of T cells against tumor cells when bispecific T cell engager (BiTE) antibodies, a standard immunotherapeutic drug, are introduced into the system. The model predicted responses of T cell and target cell populations in vitro and in vivo in the presence of BiTEs, linking molecular-level interactions between BiTE molecules, CD3 receptors, and CD19 receptors to the population kinetics of the tumor and the T cells. Furthermore, the model predicted tumor killing kinetics in patients and offered suggestions for optimal dosing strategies in patients undergoing BiTE immunotherapy. The conclusions drawn from this combined approach are interesting and are reasonably well supported by experiments and modeling. However, the conclusions could be tightened further by making some moderate to minor changes in the approach. In addition, there are several limitations in the model which deserve some discussion.

      Strengths

      A major strength of this work is the ability of the model to integrate processes from the molecular scales to the populations of T cells, target cells, and the BiTE antibodies across different organs. A model of this scope has to contain many approximations and thus the model should be validated with experiments. The authors did an excellent job in comparing the basic and the in vitro aspects of their approach with in vitro data, where they compared the numbers of engaged target cells with T cells as the numbers of BiTE molecules, the ratio of effector and target cells, and the expression of the CD3 and CD19 receptors were varied. The agreement of the model with the data was excellent in most cases, which led to several mechanistic conclusions. In particular, the study found that target cells with lower CD19 expression escape T cell killing.

      The in vivo extension of the model showed reasonable agreements with the kinetics of B cell populations in patients where the data were obtained from a published clinical trial. The model explained differences in B cell population kinetics between responders and non-responders and found that the differences were driven by the differences in the T cell numbers between the groups. The ability of the model to describe the in vivo kinetics is promising. In addition, the model leads to some interesting conclusions, e.g., the model shows that the bone marrow harbors tumor growth during the BiTE treatment. The authors then used the model to propose an alternate dosage scheme for BiTEs that needed a smaller dose of the drug.

      Thank you for the positive comments.

      Weaknesses

      There are several weaknesses in the development of the model. Multiscale models of this nature contain parameters that need to be estimated by fitting the model to data. Some of these parameters are associated with model approximations or are not measured in experiments. Thus, a common practice is to estimate parameters with some 'training data' and then test model predictions using 'test data'. Though Supplementary file 1 provides values for some of the parameters that appear to be estimated, it was not clear which datasets were used for training and which for testing. The confidence intervals of the estimated parameters and the sensitivity of the proposed in vivo dosage schemes to parameter variations were unclear.

      We agree with the reviewer on the model validation.

      Revision: To ensure reproducibility, we summarized model assumptions and parameter values/sources in Supplementary file 1. To mimic tumor heterogeneity and the evolution process, we applied stochastic agent-based models, which are difficult to optimize globally against the data. The majority of key parameters were obtained or derived from the literature. Details have been provided in the response to Reviewer 3 - Question 1. In our modeling process, we manually optimized the sensitive coefficient (β) for the base model using pilot in vitro data, and the sensitive coefficient (β) for the in vivo model by re-calibrating against the in vitro data at a low BiTE concentration. BiTE concentrations in patients (mostly < 2 ng/ml) are relevant only to the lower bound of the concentration range we investigated in vitro (0.65-2000 ng/ml). We have added some clarification/limitation of this approach in the text (details are provided in the following question). We understand the concerns, but the agent-based nature of the model prevents us from performing global optimization.

      The model appears to show a few unreasonable behaviors and does not agree with experiments in several cases, which could point to missing mechanisms in the model. Here are some examples. The model shows a surprising decrease in T cell-target cell synapse formation when the affinity of the BiTEs to CD3 was increased; the opposite would have been more intuitive. The authors suggest degradation of CD3 could be a reason for this behavior. However, this could probably be tested easily by removing CD3 degradation in the model. Another example: the increase in the % of engaged effector cells in the model with increasing CD3 expression does not agree well with experiments (Fig. 3d), whereas a similar fold increase in the % of engaged effector cells in the model agrees better with experiments for increasing CD19 expression (Fig. 3e). It is unclear how this can be explained, given that CD3 and CD19 appear to be present in similar copy numbers per cell (~10⁴ molecules/cell), and both receptors bind the BiTE with high affinities (e.g., k_off < 10⁻⁴ s⁻¹).

      Thank you for pointing this out. The bidirectional effect of CD3 affinity on IS formation is counterintuitive. In a hypothetical situation when there is no CD3 downregulation, the bidirectional effect disappears (as shown below), consistent with our view that CD3 downregulation accounts for the counterintuitive behavior. We have included the simulation to support our point. From a conceptual standpoint, the inclusion of CD3 degradation means the way to maximize synapse formation is for the BiTE to first bind tumor antigen, after which the tumor-BiTE complex “recruits” a T cell through the CD3 arm.

      We agree that the model did not adequately capture the effect of CD3 expression at the highest BiTE concentration, 100 ng/ml, while the effects at other BiTE concentrations were well captured (as shown below, left). The model predicted a much more moderate effect of CD3 expression on IS formation at the highest concentration. This is partly because the model assumed rapid CD3 downregulation upon antibody engagement. We ran a similar simulation as above with moderate CD3 downregulation (as shown below, right). This increases the effect of CD3 expression at the highest BiTE concentration, consistent with experiments. Interestingly, a rapid CD3 downregulation rate, as we concluded, is required to capture the data profiles at all other conditions. Considering that a BiTE concentration of 100 ng/ml is much higher than therapeutically relevant levels in circulation (< 2 ng/ml), we did not investigate the mechanism underlying this inconsistent model prediction, but we acknowledged that the model under-predicted IS formation in Figure 3d. Notably, this discrepancy should rarely arise in our clinical predictions, as CD3 expression is low and blood BiTE concentrations are very low (< 2 ng/ml).

      Revision: we have made text adjustment to increase clarity on these points. In addition, we added: “The base model underpredicted the effect of CD3 expression on IS formation at 100 ng/ml BiTE concentration, which is partially because of the rapid CD3 downregulation upon BiTE engagement and assay variation across experimental conditions.”

      The model does not include signaling and activation of T cells as they form the immunological synapse (IS) with target cells. The formation of the IS leads to aggregation of different receptors, adhesion molecules, and kinases, which modulate signaling and activation. Thus, it is likely that variations in the copy numbers of CD3 and of CD19-BiTE-CD3 complexes will lead to variations in the cytotoxic responses, and presumably in CD3 degradation as well. Perhaps some of these missing processes are responsible for the disagreements between the model and the data shown in Fig. 3. In addition, the in vivo model does not contain any development of the T cells as they are stimulated by the BiTEs. Differences in T cell development, such as the generation of dysfunctional/exhausted T cells, could lead to differences in responses to BiTEs in patients. In particular, the in vivo model does not agree with the kinetics of B cells after day 29 in non-responders (Fig. 6d); could the kinetics of T cell development play a role in this?

      We agree that intracellular signaling is critical to T cell activation and cytotoxic effects. IS formation, T cell activation, and cytotoxicity are a cascade of events with highly coordinated molecular and cellular interactions. Compared with T cell activation and cytotoxicity, IS formation occurs relatively early: as shown in our study, IS formation can occur in 2-5 min, while the other events often take hours to be observed. We found that IS formation is primarily driven by two intercellular processes: cell-cell encounter and cell-cell adhesion. Intracellular signaling would be initiated during cell-cell adhesion or at the late stage of IS formation. We think these intracellular events are relevant but may not be the reason why our model did not adequately capture the profiles in Figure 3d at the highest BiTE concentrations. Therefore, we did not include intracellular signaling in the models. Another reason is that we simulated our models at the agent level to mimic the process of tumor evolution, which is computationally demanding; adding intracellular events for each cell would increase the computational burden further.

      T cell activation and exhaustion during BiTE treatment are complex, time-varying, and influenced by multiple factors such as T cell status, tumor burden, BiTE concentration, immune checkpoints, and the tumor environment. T cell proliferation and death rates are challenging to estimate, as their quantitative relationships with those factors are unknown. Therefore, T cell abundance (expansion) was treated as an independent variable in our model. T cell counts are measured in BiTE clinical trials, and we included these data in our model to represent the expanded T cell population. Patients with high T cell expansion are often those with better clinical responses. Notably, the T cell decline due to rapid redistribution after administration was excluded from the model. T cell abundance was included in the simulations in Figure 6 but not in the proof-of-concept simulations in Figure 7.

      In Figure 6d, the kinetics of T cell abundance had been included in the simulations for responders and non-responders in the MT103-211 study. Thus, the kinetics of T cell development cannot explain the disagreement between model prediction and observation after day 29 in non-responders. The observed data are median B-cell kinetics in non-responders (N = 27), with very large inter-subject variation (baseline from 10 to 10000/μL), which makes them very challenging to capture perfectly with the model. Many non-responders with severe progression dropped out of the treatment at the end of cycle 1, which resulted in a “more potent” efficacy in the 2nd cycle. This might be the main reason for the disagreement.

      Variation in cytotoxic response was not included in our models. Tumor cells were assumed to be eradicated after engagement with effector cells; no killing rate or killing probability was implemented. This assumption reduced the model complexity and aligned well with our in vitro and clinical data. Cytotoxic response in vivo is affected by multiple factors such as CD3 copy number, cytokine/chemokine release, the tumor microenvironment, and T cell activation/exhaustion. For example, the cytotoxic responses and killing rates mediated by the 1:1 synapse (ET) and other variants (ETE, TET, ETEE, etc.) are expected to differ as well. Our model did not differentiate the killing rates of these synapse variants, but it does quantify the variants themselves, providing a framework for addressing these questions in the future. We agree that differentiating the cytotoxic responses under different scenarios may improve model prediction, and more exploration is needed in the future.
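
      The difference between the current eradication-on-engagement rule and a synapse-specific killing rule can be sketched as a single agent-level update step. The function name, the synapse-type keys, and the probabilities in `SYNAPSE_SPECIFIC` are hypothetical illustrations, not parameters of the published model.

```python
import random

# Rule described in the response: an engaged tumor cell is always eradicated,
# which is equivalent to a kill probability of 1.0 for every synapse type.
ALWAYS_KILL = {"ET": 1.0, "ETE": 1.0, "TET": 1.0, "ETEE": 1.0}

# Hypothetical synapse-specific probabilities (illustrative numbers only).
SYNAPSE_SPECIFIC = {"ET": 0.6, "ETE": 0.8, "TET": 0.4, "ETEE": 0.9}


def surviving_targets(engaged, kill_prob, rng):
    """Return IDs of engaged tumor cells that survive one update step.

    engaged: list of (target_id, synapse_type) pairs.
    kill_prob: mapping synapse_type -> probability that the target is killed.
    rng: a random.Random instance, so runs are reproducible.
    """
    survivors = []
    for target_id, synapse in engaged:
        # rng.random() is in [0, 1), so p = 1.0 always kills and p = 0.0 never does.
        if rng.random() >= kill_prob[synapse]:
            survivors.append(target_id)
    return survivors


engaged = [(1, "ET"), (2, "ETE"), (3, "TET"), (4, "ETEE")]
print(surviving_targets(engaged, ALWAYS_KILL, random.Random(0)))  # prints: []
print(surviving_targets(engaged, SYNAPSE_SPECIFIC, random.Random(0)))
```

      Keeping the rule as a lookup table means the deterministic assumption and any future synapse-specific estimates can be swapped without changing the simulation loop.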

      Revision: We added a discussion of the limitations which we believe is informative to future studies.

      “Our models did not include intracellular signaling processes, which are critical for T cell activation and cytotoxicity. However, our data suggest that encounter and adhesion are more relevant to initial IS formation. To make more clinically relevant predictions, the models should consider these intracellular signaling events that drive T cell activation and cytotoxic effects. Of note, we did consider the T cell expansion dynamics in organs as an independent variable during treatment for the simulations in Figure 6. T cell expansion in our model is case-specific and time-varying.”

      References:

      Chen W, Yang F, Wang C, Narula J, Pascua E, Ni I, Ding S, Deng X, Chu ML, Pham A, Jiang X, Lindquist KC, Doonan PJ, Blarcom TV, Yeung YA, Chaparro-Riggers J. 2021. One size does not fit all: navigating the multi-dimensional space to optimize T-cell engaging protein therapeutics. MAbs 13:1871171. DOI: 10.1080/19420862.2020.1871171, PMID: 33557687

      Dang K, Castello G, Clarke SC, Li Y, Balasubramani A, Boudreau A, Davison L, Harris KE, Pham D, Sankaran P, Ugamraj HS, Deng R, Kwek S, Starzinski A, Iyer S, Schooten WV, Schellenberger U, Sun W, Trinklein ND, Buelow R, Buelow B, Fong L, Dalvi P. 2021. Attenuating CD3 affinity in a PSMAxCD3 bispecific antibody enables killing of prostate tumor cells with reduced cytokine release. Journal for ImmunoTherapy of Cancer 9:e002488. DOI: 10.1136/jitc-2021-002488, PMID: 34088740

      Gong C, Anders RA, Zhu Q, Taube JM, Green B, Cheng W, Bartelink IH, Vicini P, Wang B, Popel AS. 2019. Quantitative Characterization of CD8+ T Cell Clustering and Spatial Heterogeneity in Solid Tumors. Frontiers in Oncology 8:649. DOI: 10.3389/fonc.2018.00649, PMID: 30666298

      Mejstríková E, Hrusak O, Borowitz MJ, Whitlock JA, Brethon B, Trippett TM, Zugmaier G, Gore L, Stackelberg AV, Locatelli F. 2017. CD19-negative relapse of pediatric B-cell precursor acute lymphoblastic leukemia following blinatumomab treatment. Blood Cancer Journal 7:659. DOI: 10.1038/s41408-017-0023-x, PMID: 29259173

      Samur MK, Fulciniti M, Samur AA, Bazarbachi AH, Tai YT, Prabhala R, Alonso A, Sperling AS, Campbell T, Petrocca F, Hege K, Kaiser S, Loiseau HA, Anderson KC, Munshi NC. 2021. Biallelic loss of BCMA as a resistance mechanism to CAR T cell therapy in a patient with multiple myeloma. Nature Communications 12:868. DOI: 10.1038/s41467-021-21177-5, PMID: 33558511

      Xu X, Sun Q, Liang X, Chen Z, Zhang X, Zhou X, Li M, Tu H, Liu Y, Tu S, Li Y. 2019. Mechanisms of relapse after CD19 CAR T-cell therapy for acute lymphoblastic leukemia and its prevention and treatment strategies. Frontiers in Immunology 10:2664. DOI: 10.3389/fimmu.2019.02664, PMID: 31798590

      Yoneyama T, Kim MS, Piatkov K, Wang H, Zhu AZX. 2022. Leveraging a physiologically-based quantitative translational modeling platform for designing B cell maturation antigen-targeting bispecific T cell engagers for treatment of multiple myeloma. PLOS Computational Biology 18:e1009715. DOI: 10.1371/journal.pcbi.1009715, PMID: 35839267

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, the authors present a new technique for analysing low complexity regions (LCRs) in proteins- extended stretches of amino acids made up from a small number of distinct residue types. They validate their new approach against a single protein, compare this technique to existing methods, and go on to apply this to the proteomes of several model systems. In this work, they aim to show links between specific LCRs and biological function and subcellular location, and then study conservation in LCRs amongst higher species.

      The new method presented is straightforward and clearly described, generating comparable results with existing techniques. The technique can be easily applied to new problems and the authors have made code available.

      This paper is less successful in drawing links between their results and their biological importance. The introduction does not clearly position this work in the context of previous literature, uses relatively specialised technical terms without defining them, and leaves the reader unclear about how the results have advanced the field. In terms of their results, the authors further propose interesting links between LCRs and function. However, their analyses for these most exciting results rely heavily on UMAP visualisation and on tests with apparently small effect sizes. This is a weakness throughout the paper and reduces the support for strong conclusions.

      We appreciate the reviewer’s comments on our manuscript. To address comments about the clarity of the introduction and the position of our findings with respect to the rest of the field, we have made several changes to the text. We have reworked the introduction to provide a clearer view of the current state of the LCR field, and our goals for this manuscript. We have also revised the beginnings and ends of several sections in the Results to explicitly state how each section and its findings help advance the goal we describe in the introduction, and the field more generally. We hope that these changes help make the flow of the paper clearer to the reader, and provide a clear connection between our work and the field.

      We address comments about the use of UMAPs and statistical tests in our responses to the specific comments below.

      Additionally, whilst the experimental work is interesting and concerns LCRs, it does not clearly fit into the rest of the body of work, focused as it is on a single protein and the importance of its LCRs. It arguably serves as a validation of the method, but if that is the authors' intention it needs to be stated more clearly, as it appears orthogonal to the overall drive of the paper.

      In response to this comment, we have made the rationale for choosing this protein more explicit at the beginning of this section, and clarified the role that these experiments play in the overall flow of the paper.

      Our intention with the experiments in Figure 2 was to highlight the utility of our approach in understanding how LCR type and copy number influence protein function. Understanding how LCR type and copy number can influence protein function is clearly outlined as a goal of the paper in the Introduction.

      In the text corresponding to Figure 2, we hypothesize how different LCR relationships may inform the function of the proteins that have them, and how each group in Figure 2A/B can be used to test these hypotheses. The global view provided by our method allows proteins to be selected on the basis of their LCR type and copy number for further study.

      To demonstrate the utility of this view, we select a key nucleolar protein with multiple copies of the same LCR type (RPA43, a subunit of RNA Pol I), and learn important features driving its higher-order assembly in vivo and in vitro. We learned that in vivo, at least two copies of RPA43’s K-rich LCRs are required for nucleolar integration, and that these K-rich LCRs are also necessary for in vitro phase separation.

      Despite this protein being a single example, we were able to gain important insights about how K-rich LCR copy number affects protein function, and that both in vitro higher order assembly and in vivo nucleolar integration can be explained by LCR copy number. We believe this opens the door to ask further questions about LCR type and copy number for other proteins using this line of reasoning.

      Overall I think the ideas presented in the work are interesting, the method is sound, but the data does not clearly support the drawing of strong conclusions. The weakness in the conclusions and the poor description of the wider background lead me to question the impact of this work on the broader field.

      For all the points where Reviewer #1 comments on the data and its conclusions, we provide explanations and additional analyses in our responses below showing that the data do indeed support our conclusions. In regards to our description of the wider background, we have reworked our introduction to more clearly link our work to the broader field, such that a more general audience can appreciate the impact of our work.

      Technical weaknesses

      In the testing of the dotplot-based method, the manuscript presents an FDR based on a comparison between real proteome data and a null proteome. This is a sensible approach, but their choice of a uniform random distribution would be expected to mislead. This is because if the amino acid distribution is non-uniform, stretches of the most frequent amino acid will occur more frequently than under the uniform distribution.
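
      The reviewer's point can be made concrete with a short calculation, assuming residues are drawn independently from a fixed background: the chance that a given window of k residues is a single repeated amino acid is the sum of p_a^k over all residues a. The skewed composition below is hypothetical, chosen only to show the direction of the effect.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def homopolymer_window_prob(composition, k):
    """Chance that a fixed window of k i.i.d. residues is one repeated amino acid.

    composition: mapping residue -> relative frequency (normalised here).
    Returns the sum over residues a of p_a**k.
    """
    total = sum(composition.values())
    return sum((f / total) ** k for f in composition.values())


uniform = {aa: 1.0 for aa in AMINO_ACIDS}

# Hypothetical skewed background: serine at 10%, the other 19 residues equal.
skewed = {aa: 0.9 / 19 for aa in AMINO_ACIDS}
skewed["S"] = 0.1

k = 5
p_uniform = homopolymer_window_prob(uniform, k)  # 20 * (1/20)**5 == 20**-4
p_skewed = homopolymer_window_prob(skewed, k)
print(p_uniform, p_skewed, p_skewed / p_uniform)
```

      Because x^k is convex for k > 1, this sum is smallest when all 20 frequencies are equal, so a composition-matched null would generate more chance homopolymers than a uniform one and would therefore yield a higher FDR estimate.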

      Thank you for pointing this out. The choice of null proteome was a topic of much discussion between the authors as this work was being performed. While we maintain that the uniform background is the most appropriate, the question from this reviewer and the other reviewers made us realize that a thorough explanation was warranted. For a complete explanation for our choice of this uniform null model, please see the newly added appendix section, Appendix 1.
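
      One common way to turn a null proteome into an FDR estimate is to divide the expected number of calls on length-matched null sequences by the number of calls on the real proteome. The sketch below assumes that definition and uses a deliberately simple window-based caller (`count_lc_windows`, hypothetical) in place of the dotplot pipeline; the toy sequences are illustrative only.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def count_lc_windows(seq, w=12, max_distinct=3):
    """Toy caller (not the dotplot method): number of length-w windows
    that contain at most max_distinct residue types."""
    return sum(1 for i in range(len(seq) - w + 1)
               if len(set(seq[i:i + w])) <= max_distinct)


def uniform_null(proteome, rng):
    """Length-matched random sequences with uniform residue frequencies."""
    return ["".join(rng.choice(ALPHABET) for _ in seq) for seq in proteome]


def estimate_fdr(proteome, rng, n_null=20):
    """Expected calls on a null proteome divided by calls on the real one."""
    real_calls = sum(count_lc_windows(s) for s in proteome)
    if real_calls == 0:
        return float("nan")
    null_calls = sum(count_lc_windows(s)
                     for _ in range(n_null)
                     for s in uniform_null(proteome, rng))
    return (null_calls / n_null) / real_calls


# Hypothetical toy "proteome" with one K-rich and one R/S-rich stretch.
toy = ["MSTA" + "K" * 16 + "DEFGHIL", "MPQ" + "RS" * 10 + "VWYAC"]
print(estimate_fdr(toy, random.Random(1)))
```

      Building each null sequence with `"".join(rng.choices(ALPHABET, weights=..., k=len(seq)))` instead would give a composition-matched null, which makes it straightforward to compare how sensitive the FDR estimate is to the choice of background.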

      The authors would also like to point out that the original SEG algorithm (Wootton and Federhen, 1993) also made the intentional choice of using a uniform background model.

      More generally I think the results presented suggest that the results dotplot generates are comparable to existing methods, not better and the text would be more accurate if this conclusion was clearer, in the absence of an additional set of data that could be used as a "ground truth".

      We did not intend to make any strong claims about the relative performance of our approach vs. existing methods with regard to the sequence entropy of the called LCRs beyond them being comparable, as this was not the main focus of our paper. To clarify the text such that it reflects this, we have removed ‘or better’ from the text in this section.

      The authors draw links between protein localisation/function and LCR content. This is done through the use of UMAP visualisation and wilcoxon rank sum tests on the amino acid frequency in different localisations. This is convincing in the case of ECM data, but the arguments are substantially less clear for other localisations/functions. The UMAP graphics show generally that the specific functions are sparsely spread. Moreover when considering the sample size (in the context of the whole proteome) the p-value threshold obscures what appear to be relatively small effect sizes.
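
      Reporting an effect size next to each rank-sum p-value would address the concern that large samples make small shifts look significant. A minimal sketch, assuming two lists of per-LCR amino acid frequencies (the values below are hypothetical), computes the probability of superiority, which equals the Mann-Whitney U statistic divided by n1 × n2, and the rank-biserial correlation derived from it.

```python
def probability_of_superiority(xs, ys):
    """Common-language effect size: P(X > Y) + 0.5 * P(X == Y) over all pairs.

    Equal to U / (len(xs) * len(ys)) for the Mann-Whitney U of xs.
    """
    wins = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(xs) * len(ys))


def rank_biserial(xs, ys):
    """Rank-biserial correlation, 2 * PS - 1, ranging from -1 to 1."""
    return 2.0 * probability_of_superiority(xs, ys) - 1.0


# Hypothetical lysine frequencies in LCRs inside vs. outside one compartment.
inside = [0.30, 0.25, 0.40]
outside = [0.10, 0.25, 0.20]
ps = probability_of_superiority(inside, outside)
rb = rank_biserial(inside, outside)
print(round(ps, 4), round(rb, 4))  # prints: 0.9444 0.8889
```

      The pairwise loop is quadratic and only meant to make the definition explicit; for proteome-scale samples the same quantity can be obtained from the U statistic of a rank-based implementation. A probability of superiority near 0.5 signals a small effect regardless of how small the p-value is.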

      We would first like to note that some of the amino acid frequency biases have been documented and experimentally validated by other groups, as we write and reference in the manuscript. Nonetheless, we have considered the reviewer's concerns, and upon rereading the section corresponding to Figure 3, we realize that our wording may have caused confusion in the interpretation there. In addition to clarifying this in the manuscript, we believe the following clarification may help in the interpretations drawn from that section.

      Each point in this analysis (and on the UMAP) is an LCR from a protein, and as such multiple LCRs from the same protein will appear as multiple points. This is particularly relevant for considering the interpretation of the functional/higher order assembly annotations because it is not expected that for a given protein, all of the LCRs will be directly relevant to the function/annotation. Just because proteins of an assembly are enriched for a given type of LCR does not mean that they only have that kind of LCR. In addition to the enriched LCR, they may or may not have other LCRs that play other roles.

      For example, a protein in the Nuclear Speckle may contain both an R/S-rich LCR and a Q-rich LCR. When looking at the Speckle, all of the LCRs of a protein are assigned this annotation, and so such a protein would contribute a point in the R/S region as well as elsewhere on the map. Because such "non-enriched" LCRs do not occur as frequently, and may not be relevant to Speckle function, they are sparsely spread.
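
      The point-per-LCR structure described above can be sketched as follows: every LCR inherits all annotations of its parent protein, and enrichment compares the share of a composition class among annotated LCRs with its share among all LCRs. Protein IDs, class labels, and annotations are hypothetical toy values.

```python
# Hypothetical toy data: each protein's LCRs (by composition class) and the
# annotations of the protein itself. Not data from the manuscript.
lcrs_by_protein = {"P1": ["RS", "Q"], "P2": ["RS"], "P3": ["K", "E"], "P4": ["Q"]}
annotations = {"P1": {"speckle"}, "P2": {"speckle"}, "P3": {"nucleolus"}, "P4": set()}


def lcr_points(lcrs_by_protein, annotations):
    """One point per LCR; each LCR inherits every annotation of its protein."""
    return [(cls, annotations.get(prot, set()))
            for prot, classes in lcrs_by_protein.items()
            for cls in classes]


def enrichment(points, annotation, lcr_class):
    """Share of lcr_class among annotated LCRs divided by its share among all LCRs."""
    annotated = [cls for cls, ann in points if annotation in ann]
    share_all = sum(1 for cls, _ in points if cls == lcr_class) / len(points)
    if not annotated or share_all == 0:
        return float("nan")
    share_annotated = annotated.count(lcr_class) / len(annotated)
    return share_annotated / share_all


points = lcr_points(lcrs_by_protein, annotations)
print(enrichment(points, "speckle", "RS"), enrichment(points, "speckle", "Q"))  # prints: 2.0 1.0
```

      In this toy case the R/S class is two-fold enriched among speckle LCRs, while the speckle protein's Q-rich LCR still appears as a separate, non-enriched point, which is why such LCRs look sparsely spread on the map.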

      We have now changed the wording in that section of the main text to reflect that the expectation is not all LCRs mapping to a certain region, but enrichment of certain LCR compositions.

      Reviewer #3 (Public Review):

      The authors present a systematic assessment of low complexity sequences (LCRs) and apply the dotplot matrix method for sequence comparison to identify low-complexity regions based on per-residue similarity. By taking the resulting self-comparison matrices and leveraging tools from image processing, the authors define LCRs based on similarity or non-similarity to one another. Taking the composition of these LCRs, the authors then compare how distinct regions of LCR sequence space compare across different proteomes.

      The paper is well-written and easy to follow, and the results are consistent with prior work. The figures and data are presented in an extremely accessible way and the conclusions seem logical and sound.

      My big picture concern stems from one that is perhaps challenging to evaluate, but it is not really clear to me exactly what we learn here. The authors do a fine job of cataloging LCRs and offer a number of anecdotal inferences and observations - perhaps this is sufficient in terms of novelty and interest, but if anyone takes a proteome and identifies sequences based on some set of features that sit in the tails of the feature distribution, they can similarly construct intriguing but somewhat speculative hypotheses regarding the possible origins or meaning of those features.

      The authors use the lysine-repeats as specific examples where they test a hypothesis, which is good, but the importance of lysine repeats in driving nucleolar localization is well established at this point - i.e. to me at least the bioinformatics analysis that precedes those results is unnecessary to have made the resulting prediction. Similarly, the authors find compositional biases in LCR proteins that are found in certain organelles, but those biases are also already established. These are not strictly criticisms, in that it's good that established patterns are found with this method, but I suppose my concern is that this is a lot of work that perhaps does not really push the needle particularly far.

      As an important caveat to this somewhat muted reception, I recognize that having worked on problems in this area for 10+ years I may also be displaying my own biases, and perhaps things that are "already established" warrant repeating with a new approach and a new light. As such, this particular criticism may well be one that can and should be ignored.

      We thank the reviewer for taking the time to read and give feedback for our manuscript. We respectfully disagree that our work does not push the needle particularly far.

      In the section titled ‘LCR copy number impacts protein function’, our goal is not to highlight the importance of lysines in nucleolar localization, but to provide a specific example of how studying LCR copy number, made possible by our approach, can provide specific biological insights. We first show that K-rich LCRs can mediate in vitro assembly. Moreover, we show that the copy number of K-rich LCRs is important for both higher order assembly in vitro and nucleolar localization in cells, which suggests that by mediating interactions, K-rich LCRs may contribute to the assembly of the nucleolus, and that this is related to nucleolar localization. The ability of our approach to relate previously unrelated roles of K-rich LCRs not only demonstrates the value of a unified view of LCRs but also opens the door to study LCR relationships in any context.

      Furthermore, our goal in identifying established biases in LCR composition for certain assemblies was to validate that the sequence space captures higher order assemblies which are known. In addition to known biases, we use our approach to uncover the roles of LCR biases that have not been explored (e.g. E-rich LCRs in nucleoli, see Figure 4 in revised manuscript), and discover new regions of LCR sequence space which have signatures of higher order assemblies (e.g. Teleost-specific T/H-rich LCRs). Collectively, our results show that a unified view of LCRs relates the disparate functions of LCRs.

      In response to these comments, we have added additional explanations at the end of several sections to clarify the impact of our findings in the scope of the broader field. Furthermore, as we note in our main response, we have added experimental data with new findings to address this concern.

      That overall concern notwithstanding, I had several other questions that sprung to mind.

      Dotplot matrix approach

      The authors do a fantastic job of explaining this, but I'm left wondering, if one used an algorithm like (say) SEG, defined LCRs, and then compared between LCRs based on composition, would we expect the results to be so different? i.e. the authors make a big deal about the dotplot matrix approach enabling comparison of LCR type, but it's not clear to me that this is more than a consequence of combining a two-step operation into a one-step operation. It would be useful I think to perform a similar analysis as is done later on using SEG and ask if the same UMAP structure appears (and discuss if yes/no).

      Thank you for your thoughtful question about the differences between SEG and the dotplot matrix approach. We have tried our best to convey the advantages of the dotplot approach over SEG in the paper, but we did not focus on this for the following reasons:

      1) SEG and dotplot matrices are long-established approaches to assessing LCRs. We did not consider a comparison between them to be within the scope of our paper, as our main claim is that the approach as a whole (looking at LCR sequence, relationships, features, and functions) is what gives a broader understanding of LCRs across proteomes. The key benefits of dotplots, such as direct visual interpretation and the ability to distinguish LCR types and copy number within a protein, are conveyed in Figure 1A-C and Figure 1 - figure supplements 1 and 4. In fact, these benefits of dotplots were acknowledged in the early SEG papers, where the authors recommended using dotplots to gain a prior understanding of protein sequences of interest, when it was not yet computationally feasible to analyze dotplots on the same scale as SEG (Wootton and Federhen, Methods in Enzymology, vol. 266, 1996, pages 554-571). Thus, our focus is on the ability to use image processing tools to "convert" the intuition of dotplots into a precise read-out of LCRs and their relationships on a multi-proteome scale. All that being said, we have considered differences between these methods, as you can see from our technical considerations in part 2 below.
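
      The way a self-comparison dotplot separates LCR type from LCR copy number can be sketched with a plain match matrix: an off-diagonal block between two segments is dense when they share a composition and empty when they do not. Segment boundaries are supplied by hand here, whereas the manuscript derives them with image-processing tools; the sequence is a hypothetical example.

```python
def dotplot(seq):
    """Self-comparison matrix: cell (i, j) is 1 when residues i and j match."""
    return [[1 if a == b else 0 for b in seq] for a in seq]


def block_density(matrix, rows, cols):
    """Fraction of matching cells in the block rows x cols.

    rows, cols: half-open (start, end) index ranges of two sequence segments.
    """
    (r0, r1), (c0, c1) = rows, cols
    cells = [matrix[i][j] for i in range(r0, r1) for j in range(c0, c1)]
    return sum(cells) / len(cells)


# Hypothetical sequence: two K-rich copies and one Q-rich stretch, separated by linkers.
seq = "KKKKKK" + "PGA" + "KKKKKK" + "PGA" + "QQQQQQ"
m = dotplot(seq)
k1, k2, q = (0, 6), (9, 15), (18, 24)
print(block_density(m, k1, k2), block_density(m, k1, q))  # prints: 1.0 0.0
```

      A dense K1-K2 block marks a second copy of the same LCR type, while the empty K1-Q block marks an LCR of a different type, which is the distinction a single "low complexity" mask does not record.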

      2) SEG takes an approach to find LCRs irrespective of the type of LCR, primarily because SEG was originally used to mask LCR-containing regions in proteins to facilitate studies of globular domains. Because of this, the recommended usage of SEG commonly fuses nearby LCRs and designates the entire region as "low complexity". For the original purpose of SEG, this is understandable because it takes a very conservative approach to ensure that the non-low complexity regions (i.e. putative folded domains) are well-annotated. However, for the purpose of distinguishing LCR composition, this is not ideal because it is not stringent in separating LCRs that are close together, but different in composition. Fusion can be seen in the comparison of specific LCR calls of the collagen CO1A1 (Figure 1 - figure supplement 3E), where even the intermediate stringency SEG settings fuse LCR calls that the dotplot approach keeps separate. Finally, we did also try downstream UMAP analysis with LCRs called from SEG, and found that although certain aspects of the dotplot-based LCR UMAP are reflected in the SEG-based LCR UMAP, there is overall worse resolution with default settings, which is likely due to fused LCRs of different compositions. Attempting to improve resolution using more stringent settings comes at the cost of the number of LCRs assessed. We have attached this analysis to our rebuttal for the reviewer, but maintain that this comparison is not really the focus of our manuscript. We do not make strong claims about the dotplot matrices being better at calling LCRs than SEG, or any other method.
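
      The fusion behaviour described above can be mimicked with a gap-tolerant interval merge. This is a toy model for illustration only, not SEG's actual algorithm; `max_gap` is a hypothetical parameter standing in for how permissive a masker is about joining neighbouring calls.

```python
def fuse_intervals(intervals, max_gap):
    """Merge half-open (start, end) intervals whose gap is <= max_gap.

    Composition is ignored, so neighbouring LCRs of different types
    end up in one call when the gap tolerance is permissive.
    """
    fused = []
    for start, end in sorted(intervals):
        if fused and start - fused[-1][1] <= max_gap:
            fused[-1] = (fused[-1][0], max(fused[-1][1], end))
        else:
            fused.append((start, end))
    return fused


# Hypothetical calls: a K-rich LCR at 0-6 and a Q-rich LCR at 9-15.
calls = [(0, 6), (9, 15)]
print(fuse_intervals(calls, max_gap=5))  # prints: [(0, 15)]
print(fuse_intervals(calls, max_gap=2))  # prints: [(0, 6), (9, 15)]
```

      Tightening the tolerance keeps the two LCRs apart, mirroring the trade-off in the response: more stringent settings resolve distinct compositions but can also reduce the number of LCRs called.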

      UMAPs generated from LCRs called by SEG

      LCRs from repeat expansions

      I did not see any discussion on the role that repeat expansions can play in defining LCRs. This seems like an important area that should be considered, especially if we expect certain LCRs to appear more frequently due to a combination of slippy codons and minimal impact due to the biochemical properties of the resulting LCR. The authors pursue a (very reasonable) model in which LCRs are functional and important, but the alternative (that LCRs are simply an unavoidable product of large proteomes and emerge through genetic events that are insufficiently deleterious to be selected against) does not seem to be considered. Some discussion on this would be helpful. It also makes me wonder if the authors' null proteome model is the "right" model, although I would also say developing an accurate and reasonable null model that accounts for repeat expansions is beyond what I would consider the scope of this paper.

      While the role of repeat expansions in generating LCRs has been studied and discussed extensively in the LCR field, we decided to focus on the question of which LCRs exist in the proteome, and what their downstream functions may be. The rationale for this is that while one might not expect a functional LCR to arise from repeat expansion, this argument is less of a concern in the presence of evidence that these LCRs are functional. For example, for many of these LCRs (e.g. a K-rich LCR, R/S-rich LCR, etc. as in Figure 3), we know that the LCR is sufficient for integration of its sequence into the higher order assembly. Moreover, in more recent cases, variation of the length of an LCR was shown to have functional consequences (Basu et al., Cell, 2020), suggesting that LCR emergence through repeat expansions does not imply lack of function. Therefore, while we think the origin of an LCR is an interesting question, whether or not that LCR was gained through repeat expansions does not fall into the scope of this paper.

      In regards to repeat expansions as it pertains to our choice of null model, we reasoned that because the origin of an LCR is not necessarily coupled to its function, it would be more useful to retain LCR sequences even if they may be more likely to occur given a background proteome composition. This way, instead of being tossed based on an assumption, LCRs can be evaluated on their function through other approaches which do not assume that likelihood of occurrence inversely relates to function.

      While we maintain that the uniform background is the most appropriate, the question from this reviewer and the other reviewers made us realize that a thorough explanation was warranted for this choice of null proteome. For a complete explanation for our choice of this uniform null model, please see the newly added appendix section, Appendix 1.

      The authors would also like to point out that the original SEG algorithm (Wootton and Federhen, 1993) also made the intentional choice of using a uniform background model.

      Minor points

      Early on the authors discuss the roles of LCRs in higher-order assemblies. They then make reference to the lysine tracts as having a valence of 2 or 3. It is possibly useful to mention that valence reflects the number of simultaneous partners that a protein can interact with - while it is certainly possible that a single lysine tract interacts with only one partner at a time (meaning the tract contributes a valence of 1), I don't think the authors can know that, so it may be wise to avoid specifying the specific valence.

      Thank you for pointing this out. We agree with the reviewer's interpretation and have removed our initial interpretation from the text and simply state that a copy number of at least two is required for RPA43’s integration into the nucleolus.

      The authors make reference to Q/H LCRs. Recent work from Gutiérrez et al. eLife (2022) has argued that histidine-richness in some glutamine-rich LCRs is above the number expected based on codon bias, and may reflect a mode of pH sensing. This may be worth discussing.

      We appreciate the reviewer pointing out this publication. While this manuscript wasn’t published when we wrote our paper, upon reading it we agree it has some very relevant findings. We have added a reference to this manuscript in our discussion when discussing Q/H-rich LCRs.

      Eric Ross has a number of very nice papers on this topic, but sadly I don't think any of them are cited here. On the question of LCR composition and condensate recruitment, I would recommend Boncella et al. PNAS (2020). On the question of proteome-wide LCR analysis, see Cascarina et al PLoS CompBio (2018) and Cascarina et al PLoS CompBio 2020.

      We appreciate the reviewer for noting this related body of work. We have updated the citations to include work from Eric Ross where relevant.

    1. Author Response

      Reviewer #1 (Public Review):

      This study examines the factors underlying the assembly of MreB, an actin family member involved in mediating longitudinal cell wall synthesis in rod-shaped bacteria. MreB is required for maintaining rod shape and is essential for growth in model bacteria, and single-molecule work indicates that it forms treadmilling polymers that guide the synthesis of new peptidoglycan along the longitudinal cell wall. MreB has proven difficult to work with and the field is littered with artifacts. In vitro analysis of MreB assembly dynamics has not fared much better, as helpfully detailed in the introduction to this study. In contrast to its distant relative actin, MreB is difficult to purify and requires very specific conditions to polymerize that differ between groups of bacteria. Currently, in vitro analysis of MreB and related proteins has been mostly limited to MreBs from Gram-negative bacteria, which have different properties and behaviors from related proteins in Gram-positive organisms.

      Here, Mao and colleagues use a range of techniques to purify MreB from the Gram-positive organism Geobacillus stearothermophilus, identify factors required for its assembly, and analyze the structure of MreB polymers. Notably, they identify two short hydrophobic sequences, located near one another in the 3-D structure, that are required to mediate membrane anchoring.

      With regard to assembly dynamics, the authors find that Geobacillus MreB assembly requires both interactions with membrane lipids and nucleotide binding. Nucleotide hydrolysis is required for interaction with the membrane and interaction with lipids triggers polymerization. These experiments appear to be conducted in a rigorous manner, although the salt concentration of the buffer (500mM KCl) is quite high relative to that used for in vitro analysis of MreBs from other organisms. The authors should elaborate on their decision to use such a high salt buffer, and ideally, provide insight into how it might impact their findings relative to previous work.

      Response 1.1. MreB proteins are notoriously difficult to maintain in a soluble form. Some labs deleted the N-terminal amphipathic or hydrophobic sequences to increase solubility, while other labs used the full-length protein but a high KCl concentration (300 mM KCl) (Harne et al, 2020; Pande et al., 2022; Popp et al, 2010; Szatmari et al, 2020). Early in the project, we tested many conditions and noticed that high KCl helped maintain slightly better solubility of full-length MreBGs, without the need to delete part of the protein. In addition, salt concentrations > 100 mM better mimic the conditions the protein encounters in vivo. While 50-100 mM KCl is traditionally used in actin polymerization assays, physiological salt concentrations are around 100-150 mM KCl in invertebrates and vertebrates (Schmidt-Nielsen, 1975), around 50-250 mM in fungal and plant cells (Rodriguez-Navarro, 2000) and 200-300 mM in budding yeast (Arino et al, 2010). However, cytoplasmic K+ concentration varies greatly (up to 800 mM) depending on the osmolality of the medium in both E. coli (Cayley et al, 1991; Epstein & Schultz, 1965; Rhoads et al, 1976) and B. subtilis, in which the basal intracellular KCl concentration was estimated to be ~350 mM (Eisenstadt, 1972; Whatmore et al, 1990). 500 mM KCl can therefore be considered as physiological as 100 mM KCl for bacterial cells. Since we observed plenty of pairs of protofilaments at 500 mM KCl and this condition helped avoid aggregation, we kept this high concentration as the standard for most of our experiments. Nonetheless, we had also performed TEM polymerization assays at 100 mM, in line with most of the MreB and F-actin in vitro literature, and found no difference in the polymerization (or absence of polymerization) conditions. This was indicated in the initial submission (e.g. M&M section L540 and footnote of Table S2), but since two reviewers raised it as a main point, we evidently failed to communicate it clearly, for which we apologize. This has been clarified in the revised version of the manuscript. We have also almost systematically added the 100 mM KCl condition, as requested by reviewer #2 and to reconcile our salt conditions with those used in some in vitro analyses of MreBs from other organisms (see also responses to reviewer #2 comments 1A and 1B = Responses 2.1A, 2.1B below). We then decided to refer to the 100 mM KCl concentration as our “standard condition” in the revised version of the manuscript, but we also compile and compare the results obtained at 500 mM, as both concentrations are within the physiological range in Bacillus.

      Additionally, this study, like many others on MreB, makes much of MreB's relationship to actin. This leads to confusion and the use of unhelpful comparisons. For example, MreB filaments are not actin-like (line 58) any more than any polymer is "actin-like." As evidenced by the very beautiful images in this manuscript, MreB forms straight protofilaments that assemble into parallel arrays, not the paired-twisted polymers that are characteristic of F-actin. Generally, I would argue that work on MreB has been hindered by rather than benefitted from its relationship to actin (E.g early FP fusion data interpreted as evidence for an MreB endoskeleton supporting cell shape or depletion experiments implicating MreB in chromosome segregation) and thus such comparisons should be avoided unless absolutely necessary.

      Response 1.2. We completely agree with reviewer #1 regarding unhelpful comparisons of actin and MreB, and that work on MreB has traditionally been hindered by its relationship to eukaryotic actin. MreB is nonetheless a structural homolog of actin, with a close structural fold and common properties (polymerization into pairs of protofilaments, ATPase activity…). It still makes sense to refer to a widely studied protein with common features and a common ancestry, as long as we do not confine our thinking to its conceptual framework. That said, actin and MreB diverged very early in evolution, which may account for differences in their biochemical properties and cellular functions. Current data on MreB filaments confirm that they display both F-actin-like and F-actin-unlike properties. We thank the reviewer for this insightful comment. We have revised the text to remove any inaccurate or unhelpful comparison to actin (in particular the ‘actin-like filaments’ statement, previously used once).

      Reviewer #2 (Public Review):

      The paper "Polymerization cycle of actin homolog MreB from a Gram-positive bacterium" by Mao et al. provides the second biochemical study of a Gram-positive MreB but, importantly, the first study to examine how Gram-positive MreB filaments bind to membranes. They also show the first crystal structure of an MreB from a Gram-positive bacterium, in two nucleotide-bound forms, finally solving structures that have been missing for too long. They also elucidate which residues in Geobacillus MreB are required for membrane association. Also, the QCM-D approach to monitoring MreB membrane association is a direct and elegant assay.

      While the above findings are novel and important, this paper also makes a series of conclusions that run counter to multiple in vitro studies of MreBs from different organisms and other polymers with the actin fold. Overall, they propose that Geobacillus MreB contains biochemical properties that are quite different than not only the other MreBs examined so far but also eukaryotic actin and every actin homolog that has been characterized in vitro. As the conclusions proposed here would place the biochemical properties of Geobacillus MreB as the sole exception to all other actin fold polymers, further supporting experiments are needed to bolster these contrasting conclusions and their overall model.

      Response 2.0. We are grateful to reviewer #2 for stressing the novelty and importance of our results. Most of our conclusions were in line with previous in vitro studies of MreBs (formation of pairs of straight filaments on a lipid layer, both ATP and GTP binding and hydrolysis, distortion of liposomes…), with the exception of the claimed requirement of NTP hydrolysis for membrane binding prior to polymerization, which was based on the absence of pairs of filaments in free solution or in the presence of AMP-PNP in our experimental conditions (we agree this was not sufficient to make such a bold claim, see below). Thanks to the reviewer’s comments, we have performed many controls and additional experiments that led us to refine our results and largely reconcile them with the literature. Please see the answer to the global review comments; our conclusions have been revised on the basis of our new data.

      1. (Difference 1) - The predominant concern about the in vitro studies that makes it difficult to evaluate many of their results (much less compare them to other MreBs and actin homologs) is the use of a highly unconventional polymerization buffer containing 500(!) mM KCl. As has been demonstrated with actin and other polymers, the high KCl concentration used here (500mM) is certain to affect the polymerization equilibria, as increasing salt increases the hydrophobic effect and inhibits salt bridges, and therefore will affect the affinity between monomers and filaments. For example, past work has shown that high salt greatly changes actin polymerization, causing a decreased critical concentration, increased bundling, and a greatly increased filament stiffness (Kang et al., 2013, 2012). Similarly, with AlfA, increased salt concentrations have been shown to increase the critical concentration, decrease the polymerization kinetics, and inhibit the bundling of AlfA filaments (Polka et al., 2009).

      A more closely related example comes from the previous observation that increasing salt concentrations increasingly slow the polymerization kinetics of B. subtilis MreB (Mayer and Amann, 2009). Lastly, these high salt concentrations might also change the interactions of MreB(Gs) with the membrane by screening charges and/or increasing the hydrophobic effect. Given that 500mM KCl was used throughout this paper, many (if not all) of the key experiments should be repeated at a more standard salt concentration (~100mM), similar to those used in most previous in vitro studies of polymers.

      Response 2.1A. As requested by reviewer #2, we have repeated at 100 mM KCl most of the experiments (TEM, cryo-EM, QCM-D and ATPase assays) initially performed only at 500 mM KCl. The KCl concentration affects both membrane binding and filament stiffness, as anticipated by the reviewer, but the main conclusions are the same. The revised version of the manuscript compiles and compares the results obtained at both high and low [KCl], both concentrations being within the physiological range in Bacillus. Please see point 1 of the response to the global review comments and the first response to reviewer #1 (Response 1.1) for further elaboration.

      Please note that in Mayer & Amann, 2009 (B. subtilis MreB), light scattering in free solution decreased with increasing KCl concentration, with the highest light scattering signal at 0 mM KCl (!), a > 2-fold reduction below 30 mM KCl and no scatter at all at 250 mM, suggesting a “salting in” phenomenon (see also the “Other Points to address” answers 1A and 2, below) (Mayer & Amann, 2009). Since no effective polymer formation (e.g. polymers shown by EM) was demonstrated in these experiments, it cannot be excluded that KCl was simply preventing aggregation of B. subtilis MreB in solution, as we observe. For all their other light scattering experiments, the ‘standard polymerization condition’ used by Mayer & Amann was 0.2 mM ATP, 5 mM MgCl2, 1 mM EGTA and 10 mM imidazole pH 7.0, to which MreB (in 5 mM Tris pH 8.0) was added. No KCl was present in their ‘standard’ polymerization conditions.

      This would test whether the many divergent properties of MreB(Gs) reported here arise from some difference in MreB(Gs) relative to other MreBs (and actin homologs), or from the 400mM difference in salt concentration between the studies. Critically, it would also allow direct comparisons to be made relative to previous studies of MreB (and other actin homologs) that used much lower salt, thereby allowing them to definitively demonstrate whether MreB(Gs) is indeed an outlier relative to other MreB and actin homologs. I would suggest using 100mM KCl, as historically, all polymerization assays of actin and numerous actin homologs have used 50-100mM KCl: 50mM KCl (for actin in F buffer) or 100mM KCl for multiple prokaryotic actin homologs and MreB (Deng et al., 2016; Ent et al., 2014; Esue et al., 2006, 2005; Garner et al., 2004; Polka et al., 2009; Rivera et al., 2011; Salje et al., 2011). Likewise, similar salt concentrations are standard for tubulin (80 mM K-Pipes) and FtsZ (100 mM KCl or 100mM KAc in HMK100 buffer).

      Response 2.1B. We appreciate the reviewer’s feedback on this point. Please note that, although actin polymerization assays are historically performed at 50-100 mM KCl and thus 100 mM KCl was used for other bacterial actin homologs (MamK, ParM and AlfA), MreB polymerization assays have previously been reported at 300 mM KCl too (Harne et al., 2020; Pande et al., 2022; Popp et al., 2010; Szatmari et al., 2020), which is closer to the physiological salt concentration in bacterial cells (see Response 1.1), but also in the absence of KCl (see above). As a matter of fact, we originally wanted to use a “standard polymerization condition” based on the literature on MreB, before realizing there was none: only half used KCl (the other half used NaCl, or no monovalent salt at all) and among these, KCl concentrations varied (out of 8 publications, 2 used 20 mM KCl, 2 used 50 mM KCl and 4 used 300 mM KCl).

      2. (Difference 2) - One of the most important differences claimed in this paper is that MreB(Gs) filaments are straight, a result that runs counter to the curved T. maritima and C. crescentus filaments described by the Löwe group (Ent et al., 2014; Salje et al., 2011). Importantly, this difference could also arise from the difference in salt concentrations used in each study (500mM here vs. 100mM in the Löwe studies), and thus one cannot currently draw any direct comparisons between the two studies.

      One example of how high salt could be causing differences in filament geometry: high salt concentrations are known to greatly increase the bending stiffness of actin filaments, making them more rigid (Kang et al., 2013). Likewise, increasing salt is known to change the rigidity of membranes. As the ability of filaments to A) bend the membrane or B) deform to the membrane depends on the stiffness of the filaments relative to that of the membrane, the observed difference in the "straight vs. curved" conformation of MreB filaments might simply arise from different salt concentrations. Thus, in order to draw direct comparisons between their findings and those for other MreB orthologs (as done here), the studies of MreB(Gs) conformations on lipids should be repeated in the same buffer conditions as used in the Löwe papers.

      Response 2.2. We fully agree with reviewer #2 that the salt could be affecting the assay, and we also performed cryo-EM experiments in the presence of 100 mM KCl, as requested. The results unambiguously showed numerous liposomes curved at the areas of contact with MreB (Fig. 2F-G and Fig. 2-S5), very similar to what was reported for Thermotoga and Caulobacter MreBs by the Löwe group. Our results therefore confirm the previous findings that MreBs can bend lipids, and suggest that high salt may indeed increase filament stiffness, as has been shown for actin filaments. We are very grateful to reviewer #2 for this suggestion and for drawing our attention to the work of Kang et al., 2013. The different bending observed when varying the salt concentration raises relevant questions regarding the in vivo behavior of MreB, since the intracellular KCl concentration was shown to vary greatly depending on the medium composition. The manuscript has been updated accordingly in the Results (from L243) and Discussion sections (L585-595).

      3. (Difference 3) - The next important difference between MreB(Gs) and other MreBs is the claim that MreB polymers do not form in the absence of membranes.

      A) This is surprising relative to other MreBs, as MreBs from 1) T. maritima (multiple studies), E. coli (Nurse and Marians, 2013), and C. crescentus (Ent et al., 2014) have been shown to form polymers in solution (without lipids) with electron microscopy, light scattering, and time-resolved multi-angle light scattering. Notably, the Esue work was able to observe the first phase of polymer formation and a subsequent phase of polymer bundling of MreB in solution (Esue et al., 2006). 2) Similarly, Mayer and Amann (2009) demonstrated that B. subtilis MreB forms polymers in the absence of membranes using light scattering.

      Response 2.3A. The literature does convincingly show that Thermotoga MreB forms polymers in solution, without lipids (note that for Caulobacter MreB, filaments were only reported in the presence of lipids (van den Ent et al., 2014)). The assemblies reported in solution are bundles or sheets, including at the earlier time points in the time-resolved EM experiments reported by Esue et al. 2006 and mentioned by the reviewer (‘2 minutes after adding ATP, EM revealed that MreB formed short filamentous bundles’) (Esue et al., 2006). However, as discussed above (Response 2.1A), the light scattering experiments in Mayer and Amann, 2009 do not conclusively demonstrate the presence of polymers of B. subtilis MreB in solution (Mayer & Amann, 2009). In the past (before finding out that filaments only formed in the presence of lipids), we performed many light scattering experiments of B. subtilis MreB in solution and obtained similar scattering curves (see two examples of DLS experiments in Author response image 1) in conditions in which NO polymers could ever be observed by EM, while plenty of aggregates were present.

      Author response image 1.

      We did not consider these results publishable in the absence of true polymers observed by TEM. As pointed out in the interesting study from Nurse et al. on E. coli MreB (Nurse & Marians, 2013), one cannot rely on light scattering alone, because non-specific aggregates can show patterns similar to those of polymers. Over the last two decades, about 15 publications showed polymers of MreB from several Gram-negative species, while none (despite the efforts of many) showed a single convincing MreB polymer from a Gram-positive bacterium by EM. A simple hypothesis is that a critical parameter was missing, and we present convincing evidence that lipids are critical for Geobacillus MreB to form pairs of filaments in the conditions tested. However, we do occasionally see pairs of filaments in solution too (Fig 2-S2), as well as sheet-like structures among aggregates when the concentration of MreB is increased (Fig. 2-S2 and Fig. 3-S2). Thus, we agree with the reviewer that it cannot be claimed that Geobacillus MreB is unable to polymerize in the absence of lipids, but rather that lipids strongly stimulate its polymerization, depending on the conditions.

      B) The results shown in figure 5A also go against this conclusion, as there is only a 2-fold increase in the phosphate release from MreB(Gs) in the presence of membranes relative to the absence of membranes. Thus, if their model is correct, and MreB(Gs) polymers form only on membranes, this would require the unpolymerized MreB monomers to hydrolyze ATP at 1/2 the rate of MreB in filaments. This high relative rate of hydrolysis of monomers compared to filaments is unprecedented. For all polymers examined so far, the rate of monomer hydrolysis is several orders of magnitude less than that of the filament. For example, actin monomers are known to hydrolyze ATP 430,000X slower than the monomers inside filaments (Blanchoin and Pollard, 2002; Rould et al., 2006).
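      The rate argument in point B can be made explicit with a minimal two-state sketch. It rests on simplifying assumptions not stated in the original: Pi release is taken to be proportional to the amount of MreB(Gs) in each state, no filaments form without lipids, and a fraction f of MreB(Gs) is polymerized when lipids are present. The symbols k_m and k_f (hypothetical per-subunit turnover rates for monomers and filament subunits), V_- and V_+ (Pi release without and with lipids) and r (the observed stimulation by lipids) are introduced for illustration only.

      ```latex
      % Illustrative two-state model; k_m, k_f, f, r and V_{\pm} are hypothetical symbols
      \begin{aligned}
      V_{-} &= k_m\,[\mathrm{MreB}]_{\mathrm{tot}} \\
      V_{+} &= \bigl[(1-f)\,k_m + f\,k_f\bigr]\,[\mathrm{MreB}]_{\mathrm{tot}} \\
      r \equiv \frac{V_{+}}{V_{-}} &= 1 - f + f\,\frac{k_f}{k_m}
        \quad\Longrightarrow\quad
        \frac{k_m}{k_f} = \frac{f}{f + r - 1} \\
      r = 2,\; f = 1 &\quad\Longrightarrow\quad \frac{k_m}{k_f} = \frac{1}{2}
      \end{aligned}
      ```

      With f = 1, the 2 to 3-fold stimulation gives k_m/k_f between 1/2 and 1/3. For r = 2 the ratio f/(f + 1) only falls as f falls, so an actin-like ratio of about 1/430,000 would require almost none of the protein (f on the order of 10^-6) to polymerize on lipids.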

      Response 2.3B. We agree with the reviewer. We have now found conditions where sheets of MreB form in solution (at high MreB concentration) in the presence of ADP and AMP-PNP. However, we have also added several controls that exclude efficient formation of polymers in solution in the presence of ATP at low concentrations of MreBGs (≤ 1.5 µM), the condition used for the malachite green assays. At these MreB concentrations, pairs of filaments are observed in the presence of lipids but very infrequently in solution, and sheets are not observed in solution either (Fig. 2-S2A, B). Puzzlingly, however, in these conditions Pi release is reproducibly observed in solution, reduced only ~ 2 to 3-fold relative to Pi release in the presence of lipids (Fig. 5A and Fig. 5-S1). A reinforcing observation comes from ATPase assays performed at 100 mM KCl (Fig. 5A): in this condition, MreB binding to lipids is increased relative to 500 mM KCl (Fig. 4-S4C), and the stimulation of the ATPase activity by lipids is also stronger than at 500 mM (Fig. 5-S1A). Further work is needed to characterize in detail the ATPase activity of MreB proteins, for which data in the literature are very scarce. We cannot exclude that MreB nucleates in solution or forms very unstable filaments that cannot be seen in our EM assay but consume ATP in the process. At the moment, the significance of the Pi released in solution is unknown and will require further investigation.

      C) Thus, there is a strong possibility that MreB(Gs) polymers are indeed forming in solution in addition to those on the membrane, and these "solution polymers" may not be captured by their electron microscopy assay. For example, high salt could be interfering with the adsorption of filaments to glow-discharged grids lacking lipids.

      Response 2.3C. We appreciate the reviewer’s insight about this critical point. The polymers presented in the original Fig. 2A were obtained at 500 mM KCl, but we had tested the polymerization of MreB at 100 mM KCl as well, without noticing differences. We have nonetheless redone this quantitatively and used these data for the revised Fig. 2A, as we are now using 100 mM KCl as our standard polymerization condition throughout the revised manuscript. We also followed the other suggestion of the reviewer and tested glow-discharged grids (a more classic preparation for soluble proteins) vs non-glow-discharged EM grids, as well as a higher concentration of MreB. Grids are generally glow-discharged to make them hydrophilic in order to adsorb soluble proteins, but the properties of MreB (soluble but obviously presenting hydrophobic domains) made it difficult to predict which support putative soluble polymers would preferentially interact with. Septins, for example, bind much better to hydrophobic grids despite their soluble properties (I. Adriaans, personal communication). Virtually no double filaments were observed in solution at either low or high [MreB]. The fact that in some conditions (high [MreB], other nucleotides) we were able to detect sheet-like structures rules out a technical issue that would prevent the detection of existing but “invisible” polymers. We have added these new data in Fig. 2-S2.

      As indicated above, the reviewer’s comments made us realize that we could not state or imply that MreB cannot polymerize in the absence of lipids. In fact, we always saw some random filaments in the EM fields, both in solution and in the presence of non-hydrolysable analogues, at very low frequency (Fig. 2A), and we now also see sheets at high MreB concentration (Fig. 2-S2B). We may simply be missing the optimal conditions for polymerization in solution, whereas our phrasing gave the impression that no polymers could ever form in the absence of ATP or lipids. Therefore, we have:

      1) analyzed all TEM data semi-quantitatively, using the methodology originally implemented for the analysis of the mutants;

      2) reworked the text to remove any overstated claims and to indicate that MreBGs was only found to bind to a lipid monolayer as a double protofilament in the presence of ATP/GTP, but that this does not exclude that filaments may also form in other conditions.

      In order to definitively prove that MreB(Gs) does not form polymers in solution, the authors should:

      i) conduct orthogonal experiments to test for polymers in solution. The simplest test of polymerization might be conducting pelleting assays of MreB(Gs) with and without lipids, sweeping through the concentration range as done in Figs. 2B and 5A.

      Response 2.3Ci. Following reviewer #2’s suggestion, we conducted a series of sedimentation assays in the presence and in the absence of lipids, at low (100 mM) and high (500 mM) salt, for both the wild-type protein and the three membrane-anchoring mutants (all at 1.3 µM). Sedimentation experiments in salt conditions preventing aggregation in solution (500 mM KCl) were consistent with our TEM results: pelleting of wild-type MreB increased in the presence of both ATP and lipids (Fig. R1). Sedimentation was further increased at 100 mM KCl, which would fit our other results indicating an increased interaction of MreB with the membrane. However, in addition to being poorly reproducible (in our hands), the approach does not discriminate between polymers and aggregates (or monomers bound to liposomes), and since MreB has a strong tendency to aggregate, we believe that the technique is ill-suited to reliably address MreB polymerization and prefer not to include sedimentation data in our manuscript. The recent work from Pande et al. (2022) illustrates this issue well, since no sedimentation of MreB (at 2 µM) was observed in solution in conditions supporting polymerization (at 300 mM KCl): ‘the protein does not pellet on its own in the absence of liposome, irrespective of its polymerization state’, implying that sedimentation does not allow the detection of MreB5 filaments in solution (Pande et al., 2022).

      ii) They also could examine if they see MreB filaments in the absence of lipids at 100mM salt (as was seen in both Löwe studies), as the high salt used here might block the charges on glow discharged grids, making it difficult for the polymer to adhere.

      See above, Response 2.3C

      iii) Likewise, the claim that the amino-terminus and the α2β7 hydrophobic loop of MreB are required for polymerization is questionable: if deleting these residues blocks membrane binding, the lack of polymers on the membrane on the grid is not unexpected, as filaments that cannot bind the membrane would not be observable. Given that these mutants cannot bind the membrane, mutant polymers could still exist in solution, and thus pelleting assays should be used to test whether non-membrane-associated filaments composed of these mutants do or do not exist.

      Response 2.3Ciii. This is a fair point; we thank the reviewer for this remark. We did not mean to state or imply that the hydrophobic loop was required for polymerization per se, but that polymerization into double filaments only occurs efficiently upon membrane binding, which is mediated by the two hydrophobic sequences. We tested all three mutants by sedimentation as suggested by reviewer #2. In the salt condition that limits aggregation (500 mM KCl), the mutants did not pellet while the wild-type protein did (in the presence of lipids) (Fig. R2 below), in agreement with our EM data. We also tested the mutant bearing both deletions in the absence of lipids and observed that its (partial) sedimentation at low KCl concentration was ATP- and lipid-dependent (Fig. R3).

      Given our concerns about MreB sedimentation assays (see above, Response 2.3Ci), we prefer not to include these sedimentation data in our manuscript. Instead, we tested by TEM the possible polymerization of the mutants in solution (we only tested them in the presence of lipids in the initial submission). No filaments were detected in solution for any of the mutants (Fig. 4-S3A).

      A final note: the results shown in "Figure 1 - figure supplement 2, panel C" appear to directly refute the claim that MreB(Gs) requires lipids to polymerize. As currently written, it appears they can observe MreB(Gs) filaments on EM grids without lipids. If these experiments were done in the presence of lipids, the figure legend should be updated to indicate that. If these experiments were done in the absence of lipids, the claim that membrane association is required for MreB polymerization should be revised.

      The TEM experiments shown were indeed performed in the presence of lipids. We apologize that this was not clearly stated in the legend. To avoid any confusion, we have nevertheless removed these images from this figure, since the polymerization conditions and lipid requirement are not yet presented when this figure is referred to in the text. We have instead added a panel with the calibration curve for the size exclusion profiles, as requested by reviewer #3. The main point of this figure is to show the tendency of MreBGs to aggregate: analytical size-exclusion chromatography shows a single peak corresponding to monomeric MreBGs (molecular weight ~ 37 kDa) in our purification conditions, but it can readily shift to a peak corresponding to high-MW aggregates, depending on the protein concentration and/or storage conditions.

      4. (Difference 4) - The next difference between this study and previous studies of MreB and actin homologs is the conclusion that MreB(Gs) must hydrolyze ATP in order to polymerize. This conclusion is surprising, given that both T. maritima MreB (Salje et al., 2011; Bean and Amann, 2008) and B. subtilis MreB (Mayer and Amann, 2009) have been shown to polymerize in the presence of ATP as well as AMP-PNP.

      Likewise, ATP hydrolysis has been shown to lag polymerization not only in T. maritima MreB (Esue et al., 2005) but also in eukaryotic actin and in all other prokaryotic actin homologs whose polymerization and phosphate release have been directly compared: MamK (Deng et al., 2016), AlfA (Polka et al., 2009), and two divergent ParM homologs (Garner et al., 2004; Rivera et al., 2011). Currently, the only evidence supporting the idea that MreB(Gs) must hydrolyze ATP in order to polymerize comes from 2 observations: 1) using electron microscopy, they cannot see filaments of MreB(Gs) on membranes in the presence of AMP-PNP or ApCpp, and 2) no appreciable signal increase appears when testing AMPPNP-MreB(Gs) using QCM-D. This evidence is by no means conclusive enough to support this bold claim: while their competition experiment does indicate that AMPPNP binds to MreB(Gs), it is possible that MreB(Gs) cannot polymerize when bound to AMPPNP.

      For example, it has been shown that different actin homologs respond differently to different non-hydrolysable analogs: some, like actin, can hydrolyze one ATP analog but not another, while others are able to bind many different ATP analogs but only polymerize with one or a few of them.

      Response 2.4. We agree with the reviewer. It is uncertain which analogs bind, because they are quite different from ATP and some proteins simply do not accommodate them; analogs can also change the conditions such that filaments stop forming, which can be (theoretically) misleading. This is why we had tested ApCpp in addition to AMP-PNP as a non-hydrolysable analog (Fig. 3A). As indicated above, our new complementary experiments (Fig. 3-S1B-D) now show that some rare (i.e. infrequent and in limited amounts) dual polymers are detected in the presence of ApCpp (Fig. 3A), and in the presence of AMP-PNP only at high MreB concentration (Fig. 3-S1B-D), suggesting different critical concentrations in the presence of alternative nucleotides. We have toned down our conclusions in light of our new data and modified the Discussion accordingly.

      Thus, to further verify their "hydrolysis is needed for polymerization" conclusion, they should:

      A. Test if a hydrolysis deficient MreB(Gs) mutant (such as D158A) is also unable to polymerize by EM.

      Response 2.4A. We thank the reviewer for this suggestion. As this conclusion has been revised on the basis of our new data (see previous response), testing putative ATPase-deficient mutants is no longer required here. ATPase mutants will be studied in future work (see Response 3.10 to reviewer #3).

      B. They also should conduct an orthogonal assay of MreB polymerization aside from EM (pelleting assays might be the easiest). They should test if polymers of ATP, AMP-PNP, and MreB(Gs)(D158A) form in solution (without membranes) by conducting pelleting assays. These could also be conducted with and without lipids, thereby also addressing the points noted above in point 3.

      Response 2.4B. Please see Response 2.3Ci above.

      C. Polymers may indeed form with ATP-gamma-S, and this non-hydrolysable ATP analog should be tested.

      Response 2.4C. It is quite possible that ATP-γ-S supports polymerization, since it is known to be partially hydrolysable by actin, giving a mild phenotype (Mannherz et al., 1975). This molecule can even be a bona fide substrate for some ATPases (e.g. Peck & Herschlag, 2003). Thus, we decided to exclude this “non-hydrolysable” analog and tested AMP-PNP and ApCpp instead. We know that ATP-γ-S has been and still is frequently used, but we preferred to avoid it for the moment for the above-indicated reasons. We chose AMPPNP and AMPPCP because (1) they were shown to be completely non-hydrolysable by actin, in contrast to ATP-γ-S; (2) they are widely used (the most commonly used for structural studies; Lacabanne et al., 2020); and (3) AMPPNP was previously used in several publications on MreB (Bean & Amann, 2008; Nurse & Marians, 2013; Pande et al., 2022; Popp et al., 2010; Salje et al., 2011; van den Ent et al., 2014) and thus would allow direct comparison. AMPPCP was added to confirm the findings with AMP-PNP. There are many other analogs that we plan to explore in future studies (see next Response, 2.4D).

      D. They could also test how the ADP-Phosphate bound MreB(Gs) polymerizes in bulk and on membranes, using beryllium phosphate to trap MreB in the ADP-Pi state. This might allow them to further refine their model.

      Response 2.4D. We plan to address the question of the transition state in depth in follow-up work, using a series of analogs and mutants presumably affected in ATPase activity, both predicted and identified in a genetic screen. As indicated above, it is uncertain which analogs bind, because they are quite different from ATP, and some may bind but prevent filament formation. Since analogs can change the conditions and be (theoretically) misleading, we anticipate that trying just one may not be sufficient and that a thorough analysis is needed to address this question. Since our model and conclusions have been revised on the basis of our new data, we believe that these experiments are beyond the scope of the current manuscript.

      E. Importantly, the Mayer study of B. subtilis MreB found the same results with regard to nucleotides: "In polymerization buffer, MreB produced phosphate in the presence of ATP and GTP, but not in ADP, AMP, GDP or AMP-PNP, or without the readdition of any nucleotide". Thus, this paper should be referenced and discussed.

      Response 2.4E. We agree that Pi release was detected previously. We have added the reference (L121).

      5. (Difference 5) - The introduction states (lines 128-130) "However, the need for nucleotide binding and hydrolysis in polymerization remains unclear due to conflicting results, in vivo and in vitro, including the ability of MreB to polymerize or not in the presence of ADP or the non-hydrolysable ATP analog AMP-PNP."

      A) While this is a great way to introduce the problem, the statement is a bit vague and should be clarified, detailing the conflicting results with appropriate references. For example, what conflicting in vivo results are they referring to? Regarding "MreB polymerization in AMP-PNP", multiple groups have shown the polymerization of MreB(Tm) in the presence of AMP-PNP, but it is not clear which papers found opposing results.

      Response 2.5A. Thanks for the comment. We originally did not detail these ‘conflicting results’ in the Introduction because we did so later in the text, with the appropriate references, in particular in the Discussion (former L433-442). We have now removed this from the Discussion section and instead added a sentence to the Introduction (L123-130) briefly detailing the discrepancies and giving the references.

      • For more clarity, we have removed “in vivo” (which referred to the distinct results reported for the presumed ATPase mutants by the Garner and Graumann groups) and now focus on the in vitro discrepancies only.

      • These discrepancies are the following: while some studies indeed showed polymerization (as assessed by EM) of MreBTm in the presence of AMP-PNP, the studies by Popp et al and Esue et al on T. maritima MreB, and by Nurse et al on E. coli MreB, reported aggregation in the presence of AMP-PNP (Esue et al., 2006; Popp et al., 2010) or ADP (Nurse & Marians, 2013), or no assembly in the presence of ADP (Esue et al., 2006). As for the studies reporting polymerization in the presence of AMP-PNP by light scattering only (Bean & Amann, 2008; Gaballah et al, 2011; Mayer & Amann, 2009; Nurse & Marians, 2013), they could not differentiate between aggregates and true polymers and thus cannot be considered conclusive.

      B) The statement "However, the need for nucleotide binding and hydrolysis in polymerization remains unclear due to conflicting results, in vivo and in vitro, including the ability of MreB to polymerize or not in the presence of ADP or the non-hydrolyzable ATP analog AMP-PNP" is technically incorrect and should be rephrased or further tested.

      i. For all actin (or tubulin) family proteins, it is not that a given filament "cannot polymerize" in the presence of ADP, but rather that the ADP-bound form has a higher critical concentration for polymer formation than the ATP-bound form. This means that ADP-bound subunits can indeed polymerize, but only when the total protein exceeds the ADP critical concentration. For example, many actin-family proteins do indeed polymerize in ADP: ADP-actin has a 10-fold higher critical concentration than ATP-actin (Pollard, 1984), and the ADP critical concentrations of AlfA and ParM are 5- and 50-fold higher (respectively) than those of their ATP-bound forms (Garner et al., 2004; Polka et al., 2009).

      Response 2.5Bi. Absolutely correct. We apologize for the lack of accuracy of our phrasing and have corrected it (L123).

      ii. Likewise, (Mayer and Amann, 2009) have already demonstrated that B. subtilis MreB can polymerize in the presence of ADP, with a slightly higher critical concentration relative to the ATP-bound form.

      Response 2.5Bii. In Mayer and Amann, 2009, the same light-scattering signal (interpreted as polymerization) occurred regardless of the nucleotide, and also in the absence of nucleotide (their Fig. 10), and ATP-, ADP- and AMP-PNP-MreB ‘displayed nearly indistinguishable critical concentrations’. They concluded that MreB polymerization is nucleotide-independent. Please see below (responses to ‘Other points to address’) our detailed answer to reviewer #2’s recurring point about Mayer & Amann.

      Thus, to prove that MreB(Gs) polymers do not form in the presence of ADP, one would need to test a large concentration range of ADP-bound MreB(Gs). They should test whether ADP-MreB(Gs) polymerizes at the highest MreB(Gs) concentrations that can be assayed. Even if this fails, it may be that MreB(Gs)-ADP polymerizes at higher concentrations than are possible with their protein preps (13 µM). An even simpler fix would be to state that MreB(Gs)-ADP filaments do not form below a given MreB(Gs) concentration.

      We agree with the reviewer. Our wording overstated our conclusions. Based on our new quantifications (Fig. 3-S1B, D), we have rephrased the Results section and now indicate that pairs of filaments are occasionally observed in the presence of ADP in our conditions across the range of MreB concentrations that could be tested, suggesting a higher critical concentration for MreB-ADP (L310-312). Only at the highest MreB concentration were sheet- and ribbon-like structures observed in the presence of ADP (Fig. 3-S2B).
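The reviewer's distinction between "cannot polymerize" and "has a higher critical concentration" follows from the idealized steady-state picture, in which free subunits are held at the critical concentration (Cc) and everything above it is polymer. The sketch below assumes that idealized single-Cc model (no nucleation barrier, no bundling or sheets), and its Cc values are hypothetical placeholders chosen only to mimic the 10-fold ATP/ADP difference quoted for actin; they are not measured MreB(Gs) values.

```python
def steady_state_polymer(total_uM: float, cc_uM: float) -> float:
    """Polymerized subunit concentration at steady state (idealized model).

    Below the critical concentration nothing polymerizes; above it, the
    free-monomer pool is held at cc_uM and the excess ends up in polymer.
    """
    return max(0.0, total_uM - cc_uM)


# HYPOTHETICAL critical concentrations (uM), illustrating a 10-fold
# ATP/ADP difference as reported for actin; NOT MreB(Gs) measurements.
CC_ATP_UM = 0.5
CC_ADP_UM = 5.0

for total in (0.25, 1.3, 6.5, 13.0):
    atp = steady_state_polymer(total, CC_ATP_UM)
    adp = steady_state_polymer(total, CC_ADP_UM)
    print(f"{total:5.2f} uM total -> ATP polymer {atp:5.2f} uM, ADP polymer {adp:5.2f} uM")
```

With these placeholder numbers, an assay capped below 5 µM would record no ADP polymer at all even though ADP-bound protein can polymerize, which is why a negative result is best reported as "no filaments below a given concentration".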

      Other Points to address:

      1) There are several points in this paper where the work by Mayer and Amann is ignored, not cited, or readily dismissed as "hampered by aggregation" without any explanation or supporting evidence of that fact.

      We have cited the Mayer study where appropriate. However, we cannot cite it as proof of polymerization in a given condition, since their approach does not show that polymers were obtained in their conditions. Again, they based all their conclusions solely on light-scattering experiments, which cannot differentiate between polymers and aggregates.

      A) Lines 100-101 - While the irregular 3-D formations formed by MreB in the Dersch 2020 paper could be interpreted as aggregates, stating that the results from specifically the Gaballah and Mayer papers (and not others) were "hampered by aggregation" is currently an arbitrary statement, with no evidence or backing provided. Overall, these lines (and others in the paper) dismiss these two works without giving any evidence to that point. Thus, they should provide evidence for why they believe all these papers report aggregation, or remove these (and other) dismissive statements.

      We apologize if our statements about these reports seemed dismissive or disrespectful; that was definitely not our intention. Light scattering shows an increase in particle size over time, but it cannot tell whether the scattering comes from organized (polymers) or disorganized (aggregates) assemblies. Thus, it cannot be considered conclusive evidence of polymerization without proof, for example by EM, that true filaments are formed by the protein in the conditions tested. MreB is known to aggregate easily (see our size-exclusion chromatography profiles and those from Dersch 2020 (Dersch et al, 2020), and note that no chromatography profiles were shown in the Mayer report) and, as indicated above, for years we obtained similar light-scattering results for MreB while only aggregates could be observed by TEM (see above Response 2.3A). Several observations also suggest that aggregation rather than polymerization might be at play in the Mayer study, for example ‘polymerization’ occurring in salt-free buffer but being ‘inhibited’ by as little as 100 mM KCl, which is more consistent with “salting in” (see below). We did not intend to be dismissive, but it seemed wrong to present their conclusions as conclusive evidence. We thought we had cited these papers where appropriate and explained why they do not provide conclusive proof of polymerization, but we evidently failed to communicate this clearly. We have reworked the text to remove any contentious or arbitrary statements about our concerns regarding these reports (e.g. L93 & L126).

      One important note - There are 2 points indicating that dismissing the Mayer and Amann work as aggregation is incorrect:

      1) the Mayer work on B. subtilis MreB shows both an ATP and a slightly higher ADP critical concentration. As the emergence of a critical concentration is a steady-state phenomenon arising from the association/dissociation of monomers (and a kinetically limiting nucleation barrier), an emergent critical concentration cannot arise from protein aggregation; critical concentrations only arise from a dynamic equilibrium between monomer and polymer.

      • Critical concentrations for ATP, ADP or AMPPNP were described in Mayer & Amann (Mayer & Amann, 2009) as “nearly indistinguishable” (see Response 2.5Bii).
      • Protein aggregation depends on the solution (pH and ions), protein concentration and temperature. Above a certain concentration, proteins can become unstable, so a critical concentration for aggregation can also emerge.
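The reviewer's argument that a critical concentration reflects a dynamic monomer-polymer equilibrium can be made concrete with the textbook elongation picture: subunits add to filament ends at k_on·[free] and leave at k_off, so free monomer relaxes to k_off/k_on whatever the total protein. This is a minimal sketch under stated assumptions: a fixed concentration of filament ends (as if supplied by preformed seeds), no nucleation step, no nucleotide hydrolysis, and hypothetical rate constants. It illustrates the model behind the argument and does not by itself distinguish polymers from aggregates in any particular dataset.

```python
def simulate_free_monomer(total_uM, k_on, k_off, ends_uM, t_end_s=200.0, dt_s=0.01):
    """Forward-Euler integration of subunit exchange at a fixed number of ends.

    Subunits add at k_on * [free] per end and leave at k_off per end (only
    while there is polymer to lose). Returns free monomer (uM) at t_end_s.
    """
    free = total_uM  # start with every subunit unpolymerized
    for _ in range(int(t_end_s / dt_s)):
        polymer = total_uM - free
        on_flux = k_on * ends_uM * free                      # uM/s joining ends
        off_flux = k_off * ends_uM if polymer > 0 else 0.0   # uM/s leaving ends
        free += (off_flux - on_flux) * dt_s
    return free


# HYPOTHETICAL constants: k_on = 10 /uM/s, k_off = 5 /s, so k_off/k_on = 0.5 uM.
K_ON, K_OFF, ENDS = 10.0, 5.0, 0.01
for total in (2.0, 4.0, 8.0):
    free = simulate_free_monomer(total, K_ON, K_OFF, ENDS)
    print(f"total {total} uM -> free {free:.3f} uM, polymer {total - free:.3f} uM")
```

In every run the free pool settles at the same value (k_off/k_on), and only the polymer amount scales with total protein; that plateau is what a measured critical concentration corresponds to in this model.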

      2) Furthermore, Mayer observed that increased salt slowed and reduced B. subtilis MreB light scattering, the opposite of what one would expect if their "polymerization signal" were only protein aggregation, as higher salt should increase the rate of aggregation by increasing the hydrophobic effect.

      It is true that at high salt concentrations proteins can precipitate, a phenomenon described as “salting out”. However, it is also true that salts help to solubilize proteins (“salting in”), and that proteins tend to precipitate in the absence of salt. Considering that the starting point of the Mayer and Amann experiment (Mayer & Amann, 2009) is the absence of salt (where they observed the highest scattering), and that the scattering decreases progressively as KCl is increased (it is almost abolished at only 100 mM!), it is plausible that a salting-in phenomenon is at play, with salt increasing the solubility of MreB. In any case, this cannot be taken as proof that polymerization rather than aggregation occurred.

      B) Lines 113-137 - The authors reference many different studies of MreB, including both MreB on membranes and MreB polymerized in solution (which formed bundles). However, they again neglect to mention or reference the findings of Mayer and Amann (Mayer and Amann, 2009), as they were dismissed as "aggregation". As B. subtilis is also a Gram-positive organism, the Mayer results should be discussed.

      We did cite the Mayer and Amann paper but, as explained above, we cannot cite this study as an example of proven polymerization. We avoided polemics in the text as much as possible and cited this paper wherever possible. Again, we have reworked the text to avoid any contentious or dismissive statement. Also, we had forgotten to mention this study at L121 as an example of reported ATPase activity; this has now been corrected.

      2) Lines 387-391 state the rates of phosphate release relative to past MreB findings: "These rates of Pi release upon ATP hydrolysis (~ 1 Pi/MreB in 6 min at 53 °C) are comparable to those observed for MreBTm and MreB(Ec) in vitro". While Pi release AND ATP hydrolysis have indeed both been measured for actin, this statement does not apply to MreB and should be corrected: all MreB papers thus far have measured Pi release alone, not ATP hydrolysis at the same time. Thus, it is inaccurate to state "rates of Pi release upon ATP hydrolysis" for any MreB study, as to accurately determine the rate of Pi release, one must measure (1) the amount of polymer over time, (2) the rate of ATP hydrolysis, and (3) the rate of phosphate release. For MreB, no one has, so far, even measured the rates of ATP hydrolysis and phosphate release in the same sample.

      We completely agree with the reviewer, we apologize if our formulation was inaccurate. We have corrected the sentence (L479). Thank you for pointing out this mistake.
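For scale, the figure quoted from the manuscript (~1 Pi per MreB in 6 min at 53 °C) can be expressed as an average per-second rate. The sketch assumes Pi accumulates roughly linearly over that interval; in line with the reviewer's point, the result is only an apparent Pi-release rate and says nothing on its own about the rate of ATP hydrolysis.

```python
def apparent_pi_release_rate_per_s(pi_per_subunit: float, minutes: float) -> float:
    """Average Pi released per MreB subunit per second over the interval.

    Apparent (averaged) Pi-release rate only: it does not separate ATP
    hydrolysis from the subsequent release of Pi.
    """
    return pi_per_subunit / (minutes * 60.0)


rate = apparent_pi_release_rate_per_s(1.0, 6.0)  # ~1 Pi/MreB in 6 min
print(f"{rate:.2e} Pi per MreB per second")
```

That is roughly one Pi every 360 s per subunit, or about 0.17 per minute.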

      3) The interpretation of the interactions between monomers in the MreB crystal should be more carefully stated to avoid confusion. While likely not their intention, the discussions of the crystal packing contacts of MreB can appear to assume that the monomer-monomer contacts they see in crystals represent the contacts within actual protofilaments. One cannot automatically assume the observations of monomer-monomer contacts within a crystal reflect those that arise in the actual filament (or protofilament).

      We agree and thank the reviewer for his comments. We have revamped the corresponding paragraph.

      A) They state, "the apo form of MreBGs forms less stable protofilaments than its G- homologs." Given that filaments of the apo form of MreB(Gs) or B. subtilis MreB have never been observed in solution, this statement is not accurate: while the contacts in the crystal may change with and without nucleotide, if the protein does not form polymers in solution in the apo state, then there are no "real" apo protofilaments, and any statements about their stability become moot. Thus, this statement should be rephrased or appropriately qualified.

      See above.

      B) Another example: they observe that in the apo MreB crystal, the loop of domain IB makes a single salt bridge with IIA and none with IIB. This contrasts with every actin, MreB, and actin homolog studied so far, where domain IB interacts with IIB. This might reflect the real contacts of MreB(Gs) in solution, or it may simply be a crystal-packing artifact. Thus, the authors should be careful in their claims, making it clear to the reader that the contacts in the crystal are not necessarily present in polymerized filaments.

      Again, we agree with the reviewer, we cannot draw general conclusions about the interactions between monomers from the apo form. We have rephrased this paragraph.

      4) lines 201-202 - "Polymers were only observed at a concentration of MreB above 0.55 μM (0.02 mg/mL)". Given this concentration dependence of filament formation, which appears the same throughout the paper, the authors could state that 0.55 μM is the critical concentration of MreB on membranes under their buffer conditions. Given the lack of critical concentration measurement in most of the MreB literature, this could be an important point to make in the field.

      Following reviewer #2’s suggestion, we have now estimated the critical concentration (Cc = 0.4485 µM) and reported it in the text (L218).

      5) Both mg/ml and uM are used in the text and figures to refer to protein concentration. They should stick to one convention, preferably uM, as is standard in the polymer field.

      Sorry for the confusion. We have converted all MreB concentrations to µM throughout the text and figures.
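The conversion from mg/mL to µM only needs the protein's molecular mass. The sketch assumes ~38.5 kDa for MreB(Gs), a value back-calculated from this document's own pairing of 0.05 mg/mL with 1.3 µM rather than taken from the sequence; the exact mass of the construct used should be substituted.

```python
# ASSUMPTION: ~38.5 kDa, back-calculated from 0.05 mg/mL ~ 1.3 uM in the text;
# replace with the exact molecular mass of the MreB(Gs) construct.
MREB_MW_G_PER_MOL = 38_500.0


def mg_per_ml_to_uM(mg_per_ml: float, mw_g_per_mol: float = MREB_MW_G_PER_MOL) -> float:
    """Convert a mass concentration (mg/mL, i.e. g/L) to micromolar."""
    molar = mg_per_ml / mw_g_per_mol  # g/L divided by g/mol gives mol/L
    return molar * 1e6               # mol/L to umol/L


for c in (0.02, 0.05, 0.25):
    print(f"{c} mg/mL -> {mg_per_ml_to_uM(c):.1f} uM")
```

This reproduces the 1.3 µM and 6.5 µM figures used elsewhere in the responses; 0.02 mg/mL gives ~0.52 µM, and the quoted 0.55 µM corresponds to ~0.021 mg/mL, consistent once the mass concentration is rounded.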

      6) Lines 77-78 - (Teeffelen et al., 2011) should be referenced as well in regard to cell wall synthesis driving MreB motion.

      This has been corrected, sorry for omitting this reference.

      7) Line 90 - "Do they exhibit turnover (treadmill) like actin filaments?". This phrase should be modified, as turnover and treadmilling are two very different things. Turnover is the lifetime of monomers in filaments, while treadmilling entails monomer addition at one end and loss at the other. While treadmilling filaments cause turnover, there are also numerous examples of non-treadmilling filaments undergoing turnover: microtubules, intermediate filaments, and ParM. Likewise, an antiparallel filament cannot directionally treadmill, as there is no difference between the two filament ends to confer directional polarity.

      This is absolutely true, we apologize for our mistake. The sentence has been corrected (L82).

      8) Throughout the paper, the term aggregation is used occasionally to describe the polymerization shown in many previous MreB studies, almost all of which very clearly showed "bundled" filaments, very distinct entities from aggregates, as a bundle of polymers cannot form without the filaments first polymerizing on their own. As evidence of this point, polymerization has been shown to precede the bundling of MreB(Tm) (Esue et al., 2005).

      We agree with reviewer #2 about polymers preceding bundles and “sheets”. However, we respectfully disagree that we used the word aggregation “throughout the paper” to describe structures that clearly showed polymers or sheets of filaments. A search (Ctrl-F: “aggreg”) reveals only 6 matches, 3 describing our own observations (L152, 163/5, and 1023/28), one referring to (Salje et al., 2011) (L107) but citing her claim that they observed aggregation (due to the N-terminus), and the last two (L100, L440) refer (again) to the Gaballah/Mayer/Dersch publications to say that aggregation could not be excluded in these reports as discussed above (Dersch et al., 2020; Gaballah et al., 2011; Mayer & Amann, 2009).

      9) Lines 106-108 mention that "The N-terminal amphipathic helix of E. coli MreB (MreBEc) was found to be necessary for membrane binding." This is not accurate, as Salje observed that a single helix could not cause MreB to bind to the membrane; rather, multiple amphipathic helices were required for membrane association (Salje et al., 2011).

      Salje et al showed that in vivo the deletion of the helix abolishes the association of MreB with the membrane. This publication also shows that in vitro, addition of the helix to GFP (not to MreB) prompts binding to lipid vesicles, and that binding increases with 2 copies of the helix, but they could not test this directly in vitro with MreB (which is insoluble when expressed with its N-terminus). This prompted them to speculate that multiple MreBs could bind better to the membrane than monomers. However, this remained to be demonstrated. Additional hydrophobic regions in MreB, such as the hydrophobic loop, could participate in membrane anchoring but are absent from their in vitro assays with GFP.

      The Salje results imply that dimers (or further assemblies) of MreB drive membrane association, a point that should be discussed in regard to the question "What prompts the assembly of MreB on the inner leaflet of the cytoplasmic membrane?" posed on lines 86-87.

      We agree that this is an interesting point. As it is consistent with our results, we have incorporated it into our model (Fig. 6) and address it in the Discussion (L573-575).

      10) On lines 414-415, it is stated, "The requirement of the membrane for polymerization is consistent with the observation that MreB polymeric assemblies in vivo are membrane-associated only." While I agree with this hypothesis, it must be noted that the presence or absence of MreB polymers in the cytoplasm has not been directly tested, as short filaments in the cytoplasm would diffuse very quickly, requiring very short exposures (<5ms) to resolve them relative to their rate of diffusion. Thus, cytoplasmic polymers might still exist but have not been tested.

      This is also an interesting point. Indeed, if a nucleated form or very short (unbundled) polymers exist in the cytoplasm, this has not been tested by fluorescence microscopy. However, the polymers that localize at the membrane (~200 nm), if soluble, would have been detected in the cytoplasm in the work of reviewer #2, in ours, or in that of others.

      11) lines 429-431 state, "but polymerization in the presence of ADP was in most cases concluded from light scattering experiments alone, so the possibility that aggregation rather than ordered polymerization occurred in the process cannot be excluded."

      A) If an increased light scattering signal is initiated by the addition of ADP (or any nucleotide), that signal must come from polymerization or multimerization. What the authors imply is that there must be some ADP-dependent "aggregation" of MreB, which has not been seen thus far for any polymer. Furthermore, why would the addition of ADP initiate aggregation?

      We did not mean that ADP itself would prompt aggregation, but that the protein would aggregate in the buffer regardless of the presence of ADP or other nucleotides. The Mayer & Amann study claims that MreB “polymerization” is nucleotide-independent, as they obtained identical curves with ATP, ADP, AMPPNP and even with no nucleotide at all (Fig. 10 in their paper) (Mayer & Amann, 2009).

      Their experiments with KCl are also remarkable: as they lowered the salt concentration, they obtained faster and faster “polymerization”, with the strongest light-scattering signal in the absence of any salt. At 75 mM KCl they obtained almost no “polymers”, and ‘polymerization was almost entirely inhibited at 100 mM’ (their Fig. 7). Yet the intracellular KCl concentration in bacteria is estimated to be ~300 mM (see Response 1.1).

      B) Likewise, the statement "Differences in the purity of the nucleotide stocks used in these studies could also explain some of the discrepancies" is unexplained and confusing. How could an impurity in a nucleotide stock affect the past MreB results, and what is the precedent for this claim?

      We meant that the presence of ATP in the ADP stocks might have affected the outcome of some assays, generating the conflicting results in the literature. We agree this sentence was confusing and have removed it.

      12) lines 467-469 state, "Thus, for both MreB and actin, despite hydrolyzing ATP before and after polymerization, respectively, the ADP-Pi-MreB intermediate would be the long-lived intermediate state within the filaments."

      A) For MreB, this statement is extremely speculative and unsupported, as no one has measured 1) polymerization, 2) ATP hydrolysis, and 3) phosphate release. For example, it could be that ATP hydrolysis is slow while phosphate release is fast, as is seen in the actin from Saccharomyces cerevisiae.

      We agree that this was too speculative. This has been removed from the (extensively) modified Discussion section. Thanks for the comment.
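Why a Pi-release measurement alone cannot tell "slow hydrolysis, fast release" from the reverse can be shown with the simplest sequential scheme, ATP → ADP-Pi → ADP + Pi, with both steps first-order and irreversible. This is a sketch under stated assumptions (single turnover, no reverse reactions, no coupling to polymerization), with hypothetical rate constants. The fraction of Pi released is symmetric in the two rate constants, so swapping them leaves the Pi trace unchanged, while the long-lived ADP-Pi population differs by orders of magnitude.

```python
import math


def pi_released_fraction(t_s, k_hyd, k_rel):
    """Fraction of subunits that have released Pi at time t (ATP -> ADP-Pi -> ADP + Pi)."""
    if math.isclose(k_hyd, k_rel):
        k = k_hyd
        return 1.0 - math.exp(-k * t_s) * (1.0 + k * t_s)
    return 1.0 - (k_rel * math.exp(-k_hyd * t_s) - k_hyd * math.exp(-k_rel * t_s)) / (k_rel - k_hyd)


def adp_pi_fraction(t_s, k_hyd, k_rel):
    """Fraction of subunits sitting in the ADP-Pi intermediate at time t."""
    if math.isclose(k_hyd, k_rel):
        return k_hyd * t_s * math.exp(-k_hyd * t_s)
    return k_hyd / (k_rel - k_hyd) * (math.exp(-k_hyd * t_s) - math.exp(-k_rel * t_s))


# HYPOTHETICAL rate constants (per second): slow hydrolysis/fast release vs. the reverse.
for k_hyd, k_rel in ((0.01, 1.0), (1.0, 0.01)):
    print(f"k_hyd={k_hyd}, k_rel={k_rel}: "
          f"Pi released {pi_released_fraction(60, k_hyd, k_rel):.3f}, "
          f"ADP-Pi {adp_pi_fraction(60, k_hyd, k_rel):.3f}")
```

Both parameter sets give the same Pi-release curve at every time point; only an independent measurement of hydrolysis (or of the ADP-Pi state) breaks the tie.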

      B) For actin, the statement that monomers hydrolyze ATP "before polymerization" is functionally irrelevant, as the rate of ATP hydrolysis by actin monomers is 430,000 times slower than that of actin subunits inside filaments (Blanchoin and Pollard, 2002; Rould et al., 2006).

      We agree that the difference of hydrolysis rate between G-actin and F-actin implies that ATP hydrolysis occurs after polymerization. We are afraid that we do not follow the reviewer’s point here, we did not say or imply that ATP hydrolysis by actin monomers was functionally relevant.

      13) Lines 442-444. "On the basis of our data and the existing literature, we propose that the requirement for ATP (or GTP) hydrolysis for polymerization may be conserved for most MreBs." Again, this statement both here (and in the prior text) is an extremely bold claim, one that runs contrary to a large amount of past work on not just MreB, but also eukaryotic actin and every actin homolog studied so far. They come to this model based on 1) one piece of suggestive data (the behavior of MreB(Gs) bound to 2 non-hydrolysable ATP analogs in 500 mM KCl), and 2) the dismissal (throughout the paper) of many peer-reviewed MreB papers that run counter to their model as "aggregation" or "contaminated ATP stocks". If they want to make this bold claim that their finding invalidates the work of many labs, they must back it up with further validating experiments.

      We respectfully disagree that our model was based on “one piece of suggestive data” and backed up by dismissing most past work in the field. We only wanted to raise awareness about the conflicting data between some reports (listed in Response 2.5A), and that the claims made by some publications are to be taken with caution because they rely only on light scattering or, when TEM was performed, showed only disorganized structures.

      This said, we clearly failed in proposing our model, and we are sorry to see that our suspicion that the work by Mayer & Amann reports aggregation really annoyed the reviewer. As indicated above, we have amended our manuscript on this point. We also agree that our suggestion to generalize our findings to most MreBs was unsupported, and overstated considering how confusing some results from the literature are. We have refined our model and reworked the text to take on board the reviewer’s remarks as well as the new data generated during the revision process.

      We would like to thank reviewer #2 for his in-depth review of our manuscript.  

      Reviewer #3 (Public Review):

      The major claim of the paper is that two factors determine the polymerization of MreB from a Gram-positive, thermophilic bacterium: 1) nucleotide hydrolysis, which drives polymerization; and 2) the lipid bilayer, which acts as a facilitator/scaffold required for hydrolysis-dependent polymerization. These two conclusions contrast with what has been known until now for the MreB proteins that have been characterized in vitro. The experiments performed in the paper do not completely justify these claims, as elaborated below.

      We understand the reviewer’s concerns in view of the existing literature on actin and Gram-negative MreBs. We may simply be missing the optimal conditions for polymerization in solution, while our phrasing gave the impression that polymers could never form in the absence of ATP or lipids. Our new data actually show that MreBGs at higher concentration can assemble into bundle- and sheet-like structures in solution and in the presence of ADP/AMP-PNP. Pairs of filaments are, however, only observed in the presence of lipids for all conditions tested. As indicated in the answers to the global review comments, we have included our new data in the manuscript, revised our conclusions and claims about the lipid requirement, and expanded on these points in the Discussion.

      Major comments:

      1) The absence of filaments without a lipid monolayer could also be explained by a higher critical concentration for MreBGs polymerization in that condition. All negative-staining experiments without a lipid monolayer were performed at a concentration of 0.05 mg/mL. It is important to check for polymerization of MreBGs at higher concentrations as well, in order to conclusively establish the requirement of lipids for polymerization.

      Response 3.1. 0.05 mg/ml (1.3 µM) is our standard condition, and our leeway was limited by the rapid aggregation observed at higher MreB concentrations, as indicated in the text. We have now also tested 0.25 mg/ml (6.5 µM), the maximum concentration possible before major aggregation occurs in our experimental conditions. At this higher concentration, we see some sheet-like structures in solution, confirming that a higher MreB concentration is required for polymerization in these conditions (see the answers to the global review comments for more details).

      We thank the reviewer for pushing us to address this point. We have revised our conclusions accordingly.

      2) The absence of filaments for the non-hydrolysable conditions in the lipid layer could also be because the filaments that might have formed are not binding to the planar lipid layer, and not necessarily because of their inability to polymerize.

      Response 3.2. This is a fair point. To test the possibility that polymers would form but not bind to the lipid layer, we have now added semi-quantitative EM controls (for both the non-hydrolysable ATP analogs and the three ‘membrane binding’ deletion mutants) testing polymerization in solution (without lipids) and also using plasma-treated grids. These showed that in our standard polymerization conditions, virtually no polymers form in solution (Fig. 3-S1B and Fig. 4-S4A). Some dual protofilaments were nevertheless detected, albeit at very low frequency, in the presence of ADP or AMP-PNP at the high MreB concentration (Fig. 3-S1D). At this high MreB concentration, the sheet-like structures occasionally observed in solution in the presence of ATP were frequent in the presence of ADP and very frequent in the presence of AMP-PNP (Fig. 3-S2B). We have revised our conclusions on the basis of these new data: MreBGs can form polymeric assemblies in solution and in the absence of ATP hydrolysis, at a higher critical concentration than in the presence of ATP and lipids.

      See the answers to the global review comments (point 2) and Response 2.3C to reviewer #2 for more details.

      3) Given the ATPase activity measurements, it is not very convincing that ATP rather than ADP would be present in the structure. The ATP should have been hydrolysed to ADP within the structure. As it stands, the structure suggests that MreB is not capable of hydrolysis, which contradicts the ATP hydrolysis data.

      Response 3.3. We thank the reviewer for her insightful remarks about the MreB-ATP crystal structure. The electron density map clearly demonstrates the presence of 3 phosphates. However, as suggested by the reviewer, the density that was attributed to a Mg2+ ion should instead be interpreted as a water molecule. The absence of Mg2+ in the crystal could thus explain why the ATP had not been hydrolyzed.

      References

      Arino J, Ramos J, Sychrova H (2010) Alkali metal cation transport and homeostasis in yeasts. Microbiology and molecular biology reviews 74: 95-120

      Bean GJ, Amann KJ (2008) Polymerization properties of the Thermotoga maritima actin MreB: roles of temperature, nucleotides, and ions. Biochemistry 47: 826-835

      Cayley S, Lewis BA, Guttman HJ, Record MT, Jr. (1991) Characterization of the cytoplasm of Escherichia coli K-12 as a function of external osmolarity. Implications for protein-DNA interactions in vivo. Journal of molecular biology 222: 281-300

      Dersch S, Reimold C, Stoll J, Breddermann H, Heimerl T, Defeu Soufo HJ, Graumann PL (2020) Polymerization of Bacillus subtilis MreB on a lipid membrane reveals lateral co-polymerization of MreB paralogs and strong effects of cations on filament formation. BMC Mol Cell Biol 21: 76

      Eisenstadt E (1972) Potassium content during growth and sporulation in Bacillus subtilis. Journal of bacteriology 112: 264-267

      Epstein W, Schultz SG (1965) Cation Transport in Escherichia coli: V. Regulation of cation content. J Gen Physiol 49: 221-234

      Esue O, Wirtz D, Tseng Y (2006) GTPase activity, structure, and mechanical properties of filaments assembled from bacterial cytoskeleton protein MreB. Journal of bacteriology 188: 968-976

      Gaballah A, Kloeckner A, Otten C, Sahl HG, Henrichfreise B (2011) Functional analysis of the cytoskeleton protein MreB from Chlamydophila pneumoniae. PloS one 6: e25129

      Harne S, Duret S, Pande V, Bapat M, Beven L, Gayathri P (2020) MreB5 Is a Determinant of Rod-to-Helical Transition in the Cell-Wall-less Bacterium Spiroplasma. Curr Biol 30: 4753-4762 e4757

      Kang H, Bradley MJ, McCullough BR, Pierre A, Grintsevich EE, Reisler E, De La Cruz EM (2012) Identification of cation-binding sites on actin that drive polymerization and modulate bending stiffness. Proceedings of the National Academy of Sciences of the United States of America 109: 16923-16927

      Lacabanne D, Wiegand T, Wili N, Kozlova MI, Cadalbert R, Klose D, Mulkidjanian AY, Meier BH, Bockmann A (2020) ATP Analogues for Structural Investigations: Case Studies of a DnaB Helicase and an ABC Transporter. Molecules 25

      Mannherz HG, Brehme H, Lamp U (1975) Depolymerisation of F-actin to G-actin and its repolymerisation in the presence of analogs of adenosine triphosphate. Eur J Biochem 60: 109-116

      Mayer JA, Amann KJ (2009) Assembly properties of the Bacillus subtilis actin, MreB. Cell motility and the cytoskeleton 66: 109-118

      Nurse P, Marians KJ (2013) Purification and characterization of Escherichia coli MreB protein. The Journal of biological chemistry 288: 3469-3475

      Pande V, Mitra N, Bagde SR, Srinivasan R, Gayathri P (2022) Filament organization of the bacterial actin MreB is dependent on the nucleotide state. The Journal of cell biology 221

      Peck ML, Herschlag D (2003) Adenosine 5'-O-(3-thio)triphosphate (ATP-gamma S) is a substrate for the nucleotide hydrolysis and RNA unwinding activities of eukaryotic translation initiation factor eIF4A. RNA 9: 1180-1187

      Popp D, Narita A, Maeda K, Fujisawa T, Ghoshdastider U, Iwasa M, Maeda Y, Robinson RC (2010) Filament structure, organization, and dynamics in MreB sheets. The Journal of biological chemistry 285: 15858-15865

      Rhoads DB, Waters FB, Epstein W (1976) Cation transport in Escherichia coli. VIII. Potassium transport mutants. J Gen Physiol 67: 325-341

      Rodriguez-Navarro A (2000) Potassium transport in fungi and plants. Biochimica et biophysica acta 1469: 1-30

      Salje J, van den Ent F, de Boer P, Lowe J (2011) Direct membrane binding by bacterial actin MreB. Molecular cell 43: 478-487

      Schmidt-Nielsen B (1975) Comparative physiology of cellular ion and volume regulation. J Exp Zool 194: 207-219

      Szatmari D, Sarkany P, Kocsis B, Nagy T, Miseta A, Barko S, Longauer B, Robinson RC, Nyitrai M (2020) Intracellular ion concentrations and cation-dependent remodelling of bacterial MreB assemblies. Sci Rep 10

      van den Ent F, Izore T, Bharat TA, Johnson CM, Lowe J (2014) Bacterial actin MreB forms antiparallel double filaments. eLife 3: e02634

      Whatmore AM, Chudek JA, Reed RH (1990) The Effects of Osmotic Upshock on the Intracellular Solute Pools of Bacillus subtilis. Journal of general microbiology 136: 2527-2535

    1. Author Response:

      Reviewer #2 (Public Review):

      This work uses a high-throughput continuous culture system with simplified soil microbial communities to investigate how diversity-disturbance relationships (DDRs) change with different disturbance "intensities" (here, defined as mortality rate or dilution rate in a continuous system) and "frequencies" (here, defined as the number of dilution events that occur per day to achieve the desired mortality rate). Understanding the mechanisms that support different DDRs is an ongoing and urgent need in ecology and ecosystem sciences because of the pressing need to predict and manage systems given climate and land-use disturbances.

      A major strength of the work is a blending of modeling and empirical approaches. It includes an ambitiously designed study that uses a controlled, high-throughput microbial community experimental system to observe disturbance outcomes and uses those observations to build their proposed quantitative framework. The figures are informative and the framework is explained clearly. The authors propose and name a new mechanism, "niche-flip", that describes resource competition at varying disturbance "intensities". This is an interesting proposal and I suggest that it be explored more fully as a potential mechanism (see weaknesses).

      Weaknesses of the work are the use of definitions that are generally inconsistent with the disturbance ecology literature, and the inability to separate the disturbance event characteristic of "intensity" from the biological outcome of mortality. The authors conclude that DDRs are contextual, which is supported by their modeling and data, but I suggest that they consider that diversity as an outcome in itself may not be the most informative metric of what mechanism(s) drive context-specific outcomes. The authors have a lot of compositional data that could also be examined to understand whether their "niche-flip" mechanism is supported.

      This work is likely to advance our understanding of the myriad outcomes of DDRs and of the mechanisms that may support those DDRs in natural ecosystems.

      Thank you for your kind words and careful review of our manuscript. We are pleased you appreciate both the experiments and the modeling work, and that you are intrigued by the findings and the niche flip mechanism.

      Major comments:

      Comment 1. Ecological definitions and interdependence of disturbance outcomes/attributes

      The authors define disturbance "intensity" as the average mortality rate but claim that this is a disturbance characteristic. However, mortality rate is not a characteristic of a disturbance event, but rather an effect/outcome of a disturbance on the biological community. The key distinction is that disturbance characteristics (also called traits or aspects) are defined relative to the environment, while disturbance outcomes (also called effects, impacts, or responses) are defined relative to the biology of interest, in this case a microbial community. So, changes in diversity of the community, as a result of a disturbance, is a biological outcome of the disturbance. An average mortality rate, what the authors call "intensity" (L40) would be such an outcome.

      Thank you for this excellent point. We have revised the introduction to make this distinction, reproduced here for convenience:

      "Accordingly, there have been many efforts aimed at understanding the role of environmental disturbances, which are perturbations to the state of an environment. These disturbances are of ecological interest for the impact they have on a community, for example, by bringing about mortality of organisms and a reduction of biomass of a community."

      The authors' definition of "intensity" is not in agreement with the disturbance ecology literature, including the references cited in this current work. For example, in reference #18 (Miller et al. 2011 PNAS) disturbance aspects include intensity, timing, duration, extent, and interval. Specifically, Miller et al. 2011 defined intensity as the magnitude of the disturbance (e.g., a flood's maximum stage). Notably, Miller's definition of intensity is more aligned with the authors' definition of "fluctuation," which the authors define as the "magnitude of deviations from the average". In the current work, the disturbance "event" cannot be separated from the biological outcome because of the nature of the continuous culture system. The system is not being disturbed with, for example, a change in pH or salinity or another environmental variable that results in microbial mortality, but rather by the loss of viable members from the community through control of the flow-through. So, the mortality is both the precisely controlled disturbance "event" and the "outcome" in the continuous culture.

      To summarize, the premise of the article is confusing, because one of the two disturbance "characteristics" considered is, rather a disturbance outcome. This may seem like mincing words and to each paper its own definitions, but because this work seeks to reconcile DDRs as reported across many studies, and because many of the previous ecology studies that have investigated or reported DDRs are not using analogous terms, the work could further confusion rather than serve as a reconciliation. When different definitions are applied that mix disturbance aspects with biological outcomes of disturbance, readers will have to work hard to understand this work in context with the existing literature. I suggest revising the introductory section to be consistent in terminology with the ecology literature and to be framed not only as disturbance characteristics, but also outcomes. I also suggest adding discussion of how an inability to distinguish disturbance event from outcome may influence interpretation of this work and its broader application. I suggest adding clarification/discussion of "how intensity and fluctuations interact" (e.g. L200): as the authors define intensity and fluctuation of the disturbance event, intensity is not independent of the biological disturbance outcome of mortality in the given model system. So, how the two "disturbance components interact" is not able to be examined independently from the biological outcome (mortality, resulting diversity).

      These are also critical points. First, we will address the choice of terminology (re: Miller et al) and, second, the equivalence between disturbance and outcome in continuous culture.

      We agree that careful use of terminology is important for understanding our work in context of the literature. Accordingly, we have replaced our characteristics “intensity” and “fluctuation” with “mean intensity” and “frequency” throughout the paper. We have also added more examples through the results section to indicate how mean intensity, frequency, and maximum dilution rates (during disturbance events) are related.

      "To determine whether the effects of disturbance on diversity are truly fluctuation-dependent15, a disturbance should ideally be decomposed into distinct components of mean intensity (e.g. time-averaged disturbance magnitude) and frequency (e.g. temporal profile of fluctuations)."

      The direct connection between disturbance and mortality in a continuous culture system under dilution disturbances is a critical aspect of our experimental design, because we wanted to compare disturbance outcomes that varied in temporal features (in Miller et al terms, intensity/magnitude vs frequency/timing) while holding mortality equal. In continuous culture this may be achieved by controlling dilution rate and frequency, but you are correct that other classes of natural disturbances such as pH or salinity changes may have different effects on community members. As a first step towards investigating these effects, we had included analyses with non-equal mortality rates (Appendix figure 4). We have now edited the introduction and discussion to emphasize that the equivalence between disturbance event and disturbance outcome is a feature specific to continuous culture.
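      As an illustration of how mean intensity and frequency can be decoupled under dilution, the sketch below assumes (this is not necessarily the authors' protocol) that each pulse removes a fraction chosen so that, ignoring growth, one day of pulses leaves the same surviving fraction, exp(-D), as continuous dilution at mean rate D. The function names and the value of D are hypothetical; the frequencies of 1, 4, and 16 pulses per day are those discussed in this response.

```python
import math


def pulse_survival_fraction(mean_rate_per_day: float, pulses_per_day: int) -> float:
    """Fraction of culture volume retained at each dilution pulse (hypothetical scheme).

    Chosen so that `pulses_per_day` pulses remove the same cumulative fraction of a
    non-growing population as continuous dilution at `mean_rate_per_day` for one day.
    """
    return math.exp(-mean_rate_per_day / pulses_per_day)


def daily_survival(mean_rate_per_day: float, pulses_per_day: int) -> float:
    """Surviving fraction of a non-growing population after one day of pulses."""
    return pulse_survival_fraction(mean_rate_per_day, pulses_per_day) ** pulses_per_day


if __name__ == "__main__":
    D = 2.0  # hypothetical mean dilution rate (per day)
    for f in (1, 4, 16):
        removed = 1 - pulse_survival_fraction(D, f)
        print(f"{f:>2} pulses/day: remove {removed:.3f} per pulse, "
              f"daily survival {daily_survival(D, f):.4f} (continuous: {math.exp(-D):.4f})")
```

      Under this assumed scheme, large but rare pulses and small, frequent pulses impose the same average mortality, which is the property a comparison of frequencies at fixed mean intensity relies on.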

      Introduction

      "Dilution is perhaps the most common choice for a laboratory disturbance, as it causes species-independent mortality and replenishes the system with fresh nutrients, reminiscent of flow in soil, aquatic, or gut microbiomes. Unlike disturbances with indirect biological impacts (such as pH, temperature, or osmolarity disturbances), there is a direct link between the dilution disturbance event (removal of culture volume) and the biological outcome (mortality of community members)."

      Discussion

      "We also note, however, that these types of disturbances do not share the direct link between environmental change and biological outcome that is characteristic of dilution disturbance, so the impact may be less clear."

      Comment 2: Compositional evidence for the proposed "niche flip" mechanism and suggestion for deeper consideration of population-level response to disturbance outcomes that collectively contribute to emergent diversity values.

      Regarding the "niche flip" - it is unclear whether there is compositional evidence for any swap in niche preference/space among particular community members. Figure S8 may offer evidence, but I could not deduce it from the busy bar charts. Could population/ASV level analysis be conducted on each member to assess their dynamics and ask whether the dynamics support the proposed niche-flip as a DDR mechanism?

      This is a very interesting suggestion. As suggested, we could extract the relative preferences of different ASVs from composition data to test a prediction about changes in the composition resulting from niche flip. To make such a prediction, we’d need the Monod growth parameters of the species on relevant resources. We began collecting this data (see Figure 3 – figure supplement 4) but found it challenging to measure these parameters on defined media sources. Furthermore, since we elected to run our main experiments in a complex medium that could potentially support diverse communities (as opposed to minimal media, which produce simple communities, see Goldford et al Science 2018), we cannot link Monod growth parameters in this medium to particular resources. Subsequent experiments with defined species with measured Monod parameters in defined media would enable us to make and test predictions. These are sizeable experiments that we do not believe are in the scope of the present work. Without a testable prediction, we do not believe species- or ASV-level analysis to be particularly informative on its own.

      Relatedly, there seems to be possible evidence of a "fluctuation" rate threshold, after which there is a major compositional shift in the microbial community. Consider Figure 3: at all "intensities", there is a shift in microbial community composition between "fluctuation" rates of 4/day and 16/day (3d, Fig S8). This threshold/shift is not also apparent in the Shannon diversity in Fig 3f. This could be an example in which diversity as a metric in itself is not as informative/useful an outcome for disturbance responses, as identical Shannon diversity values can result from different community compositions that are themselves the outcomes of different mechanisms. I see from the PCoAs (Fig S9) that the authors were exploring potential compositional clustering by day, frequency, and dilution. The most "obvious" clustering to the eye is indeed by "frequency" and between 4/day and 16/day (red/blue separation along both axes), which also supports a potential threshold/shift. Generally, it would have been good to report statistical tests (e.g., PERMANOVA or equivalent) for these PCoA categories (where it makes sense, nested and term interactions as well). Is there statistical support for a compositional threshold shift between 4/day and 16/day?
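      The reviewer's point that identical Shannon diversity can hide different compositions can be made concrete with a small sketch. The two communities below are hypothetical: they share the same abundance profile but no members, so their Shannon diversity is equal while their Bray-Curtis dissimilarity is maximal. Bray-Curtis is used here only as one common compositional distance; the response does not state which distance the authors used.

```python
import math


def shannon(abundances: dict) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero abundance."""
    total = sum(abundances.values())
    return -sum((a / total) * math.log(a / total) for a in abundances.values() if a > 0)


def bray_curtis(u: dict, v: dict) -> float:
    """Bray-Curtis dissimilarity: 0 = identical composition, 1 = no shared taxa."""
    taxa = set(u) | set(v)
    shared = sum(min(u.get(t, 0), v.get(t, 0)) for t in taxa)
    return 1 - 2 * shared / (sum(u.values()) + sum(v.values()))


# Hypothetical read counts: same abundance profile, entirely different members.
community_a = {"ASV1": 60, "ASV2": 30, "ASV3": 10}
community_b = {"ASV4": 10, "ASV5": 60, "ASV6": 30}

if __name__ == "__main__":
    print(f"H(a) = {shannon(community_a):.4f}, H(b) = {shannon(community_b):.4f}")
    print(f"Bray-Curtis(a, b) = {bray_curtis(community_a, community_b):.2f}")
```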

      Thank you for these suggestions. Indeed, by eye and by the PCoA plots, there seems to be a significant difference in composition that separates the low-frequency (1/day & 4/day) from the high-frequency (16/day & Constant) conditions. We calculated pairwise distances between Day 6 samples grouped by A) dilution frequency, B) mean dilution rate, or C) combinations of dilution rate and frequency. Using these distances to perform PERMANOVA tests, we find significant differences between cultures with different frequencies, but not for cultures with different dilution rates. For combinations, we found several pairs with differences that were significant only before correction for false-discovery rate. Distances between low-frequency (1/day & 4/day) conditions are much smaller than between low-frequency and high-frequency groups, or between the high-frequency groups. We have now included this as Figure 3 – figure supplement 9 and have summarized the results in the main text, reproduced below for convenience:

      "PERMANOVA statistical analysis of endpoint compositions confirmed that dilution frequency (but not mean dilution rate) had a significant effect on composition (Figure 3 – figure supplement 9). Despite separation between conditions in PCoA of endpoint compositions (Figure 3 – figure supplement 9), PERMANOVA analysis of dilution rate and frequency combinations did not yield significant values after correcting for false discovery rate."
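      For readers less familiar with the test, a from-scratch sketch of a one-way PERMANOVA (Anderson's pseudo-F with a label-permutation p-value) on Bray-Curtis distances is given below, together with a Benjamini-Hochberg adjustment of the kind used for the pairwise comparisons. The choice of Bray-Curtis distance, 999 permutations, and the Benjamini-Hochberg procedure are assumptions for illustration, since the response does not specify them, and the abundance table is invented. In practice an established implementation (for example, adonis2 in the R package vegan) would normally be preferred.

```python
import numpy as np


def bray_curtis_matrix(x: np.ndarray) -> np.ndarray:
    """Pairwise Bray-Curtis dissimilarities between rows (samples) of an abundance matrix."""
    n = x.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            den = (x[i] + x[j]).sum()
            d[i, j] = d[j, i] = np.abs(x[i] - x[j]).sum() / den if den > 0 else 0.0
    return d


def pseudo_f(d: np.ndarray, labels: np.ndarray) -> float:
    """Anderson's pseudo-F computed from squared distances and group labels."""
    n = len(labels)
    groups = np.unique(labels)
    a = len(groups)
    d2 = d ** 2
    ss_total = d2[np.triu_indices(n, k=1)].sum() / n
    ss_within = 0.0
    for g in groups:
        idx = np.flatnonzero(labels == g)
        block = d2[np.ix_(idx, idx)]
        ss_within += block[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (a - 1)) / (ss_within / (n - a))


def permanova(d: np.ndarray, labels: np.ndarray, permutations: int = 999, seed: int = 0):
    """Return (pseudo-F, permutation p-value) by shuffling group labels."""
    rng = np.random.default_rng(seed)
    f_obs = pseudo_f(d, labels)
    hits = sum(pseudo_f(d, rng.permutation(labels)) >= f_obs for _ in range(permutations))
    return f_obs, (hits + 1) / (permutations + 1)


def benjamini_hochberg(pvals) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values (step-up false-discovery-rate control)."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


# Hypothetical relative abundances of 3 taxa in 5 low- and 5 high-frequency cultures.
X = np.array([
    [0.70, 0.20, 0.10], [0.65, 0.25, 0.10], [0.72, 0.18, 0.10],
    [0.68, 0.22, 0.10], [0.66, 0.20, 0.14],
    [0.10, 0.30, 0.60], [0.12, 0.28, 0.60], [0.08, 0.32, 0.60],
    [0.10, 0.25, 0.65], [0.15, 0.25, 0.60],
])
labels = np.array(["low"] * 5 + ["high"] * 5)

if __name__ == "__main__":
    f_stat, p_value = permanova(bray_curtis_matrix(X), labels)
    print(f"pseudo-F = {f_stat:.1f}, p = {p_value:.3f}")
```

      With several pairwise condition comparisons, raw p-values that look significant can lose significance after FDR adjustment, which is consistent with the pattern described above for dilution rate and frequency combinations.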

      Reviewer #3 (Public Review):

      This manuscript focuses on the relationship between diversity and disturbance. The authors study this relationship in experimental microbial communities. These communities are subject to different levels of disturbance, which is identified as the dilution rate. The authors find a non-monotonic relationship between diversity and dilution rate. In the presence of temporal fluctuations, the non-monotonic relationship becomes less evident, disappearing for strong enough fluctuations. The experimental findings are well explained by a consumer-resource model with Monod response.

      The results of the paper are a very interesting combination of experimental and theoretical work. The manuscript is well written and easy to follow.

      Experiments. The data support the main result of the paper. The U-shaped disturbance-diversity relationship (DDR) is robust (e.g., independent of the measure of diversity). The experimental setup is innovative.

      Theory. A main strength of the manuscript is the clarity with which the model reproduces the experimental data. It is also interesting that alternative models (Lotka-Volterra and consumer-resource with linear response) do not reproduce the data, therefore indicating the relevance of the data themselves. The main weakness of the paper is that, in the end, the mechanism behind the non-monotonicity of the DDR is not completely clear. The authors discuss how it emerges with two species and two resources in the presence of a trade-off between maximal growth rate and resource-limited growth rate: at low dilution rate, the species with high maximal growth rate wins, while at high dilution rate the one with high resource-limited growth rate dominates. This mechanism is clear with two species (in which case diversity can transition between 2 and 1). It is unclear what happens for more species and resources. In particular, the role of the trade-off, which is central in the pairwise competition case, is unclear: the U-shaped relationship is also observed in the absence of the trade-off in multispecies communities.

      Thank you for your enthusiasm about our work and your careful review of our manuscript. We are pleased you appreciate the concordance between experiment and model in our study.

    1. Author Response:

      Reviewer #1:

      The submitted manuscript 'Distinct higher-order representations of natural sounds in human and ferret auditory cortex' by Landemard and colleagues seeks to investigate the neural representations of sound in the ferret auditory cortex. Specifically, they examine the stages of processing by manipulating the complexity and sound structure of stimuli. The authors create synthetic auditory stimuli that are statistically equivalent to natural sounds in their cochlear representation, temporal modulation structure, spectral modulation structure, and spectro-temporal modulation structure. The authors use functional ultrasound imaging (fUS), which allows measurement of the hemodynamic signal at much finer spatial scales than fMRI, making it particularly suitable for the ferret. The authors then compare their results to previously published work in humans (e.g. Norman-Haignere and McDermott, 2018) and find that: 1. While human non-primary auditory cortex demonstrates a significant difference between natural speech/music sounds and their synthetic counterparts, the ferret non-primary auditory cortex does not. 2. For each sound manipulation in humans, the dissimilarity increases as the distance from the primary auditory cortex increases, whereas for ferrets it does not. 3. While ferrets behaviorally respond to conspecific vocalizations, the ferret auditory cortex does not demonstrate the same hierarchical processing stream as humans do.

      Overall, I find the approach (especially the sound manipulations) excellent and the overall finding quite intriguing. My only concern is that it is essentially a null result. While this result will be useful to the literature, there is always the concern that a lack of finding could also be due to other factors.

      Thank you for taking the time to carefully read our manuscript. We have done our best to address all of your questions and concerns, which has improved the paper.

      We note that our finding differs from a typical null result in two ways. First, our key finding is that responses to natural and synthetic sounds are closely matched throughout primary and non-primary auditory cortex. Unlike a typical null result, this finding cannot be due to a noisy measure, since if our data were noisy, we would not have observed any correspondence between natural and synthetic sounds. Second, we have a clear prediction from humans as to what we should observe if the organization were similar: matched responses in primary auditory cortex and divergent responses in non-primary auditory cortex. Our data clearly demonstrate that this prediction is wrong, for all of the reasons noted in our general response above. In essence, what we are showing is that there is a region by species interaction in the similarity of responses to natural vs. synthetic sounds (as reflected by a significant difference in slopes between species, see our response above). We have investigated and ruled out all of the alternative explanations we can think of for this interaction (e.g. differences in SNR or spatial resolution) and are left with the conclusion that there is a meaningful difference in functional organization between humans and ferrets. If there are any additional concerns you have, we would be happy to address them.

      Major points:

      1) What if the stages in the ferret are wrong? The authors use 4 different manipulations thought to reflect key elements of sound structure and/or the relevant hierarchy of the processing stages of the auditory cortex, but it's possible that the dimensions in the ferret auditory cortex are along a different axis than spectro/temporal modulations. While I do not expect the authors to attempt every possible axis, it would be beneficial to discuss this possibility.

      Thank you for raising this question. We now directly address this question in the Discussion (page 11):

      "Our findings show that a prominent signature of hierarchical functional organization present in humans – preferential responses for natural vs. spectrotemporal structure – is largely absent in ferret auditory cortex. But this finding does not imply that there is no functional differentiation between primary and non-primary regions in ferrets. For example, ferret non-primary regions show longer latencies, greater spectral integration bandwidths, and stronger task-modulated responses compared with primary regions (Elgueda et al., 2019). The fact that we did not observe differences between primary and non-primary regions is not because the acoustic features manipulated are irrelevant to ferret auditory cortex, since our analysis shows that matching frequency and modulation statistics is sufficient to match the ferret cortical response, at least as measured by ultrasound. Indeed, if anything, it appears that modulation features are more relevant to the ferret auditory cortex since these features appear to drive responses throughout primary and non-primary regions, unlike human auditory cortex where we only observed strong, matched responses in primary regions."

      2) For the ferret vocalizations, it is possible that a greater N would allow for a clearer picture of whether or not the activation is greater than speech/music? While it is clear that any difference would be subtle and probably require a group analysis, this would help settle this result/issue (at least at the group level).

      Below we plot the distribution of NSE values for ferret vocalizations, speech, and music, averaged across all of auditory cortex and plotted separately for each ferret tested (panel A). As is evident, we observe larger NSE values for ferret vocalizations in one animal (p < 0.01, Wilcoxon test), but no difference in the other two (p > 0.55). When we perform a group analysis, averaging across all three animals, we do not observe any significant difference between the categories (panel B) (p = 0.27). Moreover, even for ferret vocalizations, NSE values were similar throughout primary and non-primary regions, and this was true in all three animals tested (panel C). Given these data, we do not believe our study provides evidence for a difference between ferret vocalizations and other categories. Panel A is plotted in the revised Figure 4 - figure supplement 1E. The distance-to-PAC curves (panel C) and the corresponding slopes are plotted in Figure 4D-E.

      Individual and group analyses of the difference between natural and spectrotemporally matched synthetic sounds, broken down by sound category. A, The NSE between natural and synthetic sounds plotted separately for each animal and sound category. NSE values have been averaged across all of auditory cortex. Each circle represents a single pair of natural/synthetic sounds. We find that the NSE values are larger for ferret vocalizations in Ferret A, but this effect is not present in Ferret T or C ( indicates p < 0.005, Wilcoxon test). B, NSE values averaged across animals. C, NSEs for ferret vocalizations, plotted as a function of distance to primary auditory cortex (PAC). Figure shows both individual subject (thin pink lines) and group-averaged data (thick pink line).

      Below, we have reproduced the relevant paragraph of the results where we discuss these and other related findings (page 6):

      "To directly test if ferrets showed preferential responses to natural vs. synthetic ferret vocalizations, we computed maps plotting the average difference between natural vs. synthetic sounds for different categories, using data from both Experiments I and II (Figure 4C). We also separately measured the NSE for sounds from different categories, again plotting NSE values as a function of distance to PAC (Figure 4D-E). The differences that we observed between natural and synthetic sounds were small and scattered throughout primary and non-primary auditory cortex, even for ferret vocalizations. In one animal, we observed significantly larger NSE values for ferret vocalizations compared with speech and music (Ferret A, Md_voc = 0.137 vs Md_SpM = 0.042, Wilcoxon rank-sum test: T = 1138, z = 3.29, p < 0.01). But this difference was not present in the other two ferrets tested (p > 0.55) and was also not present when we averaged NSE values across animals (Md_voc = 0.053 vs Md_SpM = 0.033, Wilcoxon rank-sum test: T = 1016, z = 1.49, p = 0.27). Moreover, the slope of the NSE vs. distance-to-PAC curve was near 0 for all animals and sound categories, even for ferret vocalizations, and was substantially lower than the slopes measured in all 12 human subjects (Figure 4F) (vocalizations in ferrets vs. speech in humans: p < 0.001 via a sign test; speech in ferrets vs. speech in humans: p < 0.001). In contrast, human cortical responses were substantially larger for natural vs. synthetic speech and music, and these response enhancements were concentrated in distinct non-primary regions (lateral for speech and anterior/posterior for music) and clearly different from those for other natural sounds (Figure 4C). Thus, ferrets do not show any of the neural signatures of higher-order sensitivity that we previously identified in humans (large effect size, spatially clustered responses, and a clear non-primary bias), even for conspecific vocalizations."
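      The reported p < 0.001 from a sign test with 12 human subjects can be checked by hand. Assuming the test asks, for each of the 12 human subjects, whether the human slope exceeds the ferret slope, the case in which all 12 do gives an exact two-sided p-value of 2 × 0.5^12 = 1/2048, roughly 0.00049. The helper below is a hypothetical sketch of that arithmetic, not the authors' analysis code.

```python
from math import comb


def sign_test_p(n_positive: int, n_total: int) -> float:
    """Exact two-sided sign test p-value under H0: each sign is positive with probability 0.5."""
    k = max(n_positive, n_total - n_positive)
    upper_tail = sum(comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    return min(1.0, 2 * upper_tail)


if __name__ == "__main__":
    # Hypothetical count: all 12 human slopes exceed the ferret slope.
    print(sign_test_p(12, 12))
```

      Because the sign test uses only the direction of each difference, 12 of 12 is the strongest result attainable with 12 subjects; a one-sided version would give 1/4096.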

      3) Relatedly, did the magnitude of this effect increase outside the auditory cortex?

      We did not record outside of auditory cortex. Unlike fMRI, it is not easy to get whole-brain coverage using current fUS probes. Since our goal was to test if ferret auditory cortex showed similar organization as human auditory cortex, we focused our data collection on auditory regions. We have clarified this point in the Methods (page 13):

      "fUS data are collected as a series of 2D images or ‘slices’. Slices were collected in the coronal plane and were spaced 0.4 mm apart. The slice plane was varied across sessions in order to cover the region of interest, which included both primary and non-primary regions of auditory cortex. We did not collect data from non-auditory regions due to limited time/coverage."

      4) It would be useful to have a measure of the noise floor for each plot and/or species for NSE analyses. This would make it easier to distinguish whether, for instance, in Figure 2A-D, an NSE of 0.1 (human primary) vs. an NSE of 0.042 (ferret primary) should be interpreted as a bit more than double, or as both close to the noise floor (which is what I presume).

      All of our NSE measures are noise-corrected such that the effective floor is zero (noise-correction provides an estimate of what the NSE value would be given perfectly reliable measurements). The only exceptions are cases where we plot the NSE values for example voxels/ROIs (Figure 2A-D, Figure 2 - figure supplement 1), in which case we plot both the raw NSE values and the noise floor, which is given by the test-retest NSE of the measurements. To address your comment, we have included a supplemental plot (Figure 2 - figure supplement 3) that shows the median uncorrected NSE as a function of distance to primary auditory cortex, along with the noise floor given by the reliability of the measurements. The figure is reproduced below.

      Figure 2 - figure supplement 3. Uncorrected NSE values. This figure plots the uncorrected NSE between natural and synthetic sounds as a function of distance to primary auditory cortex (PAC). The test-retest NSE value, which provides a noise floor for the natural vs. synthetic NSE, is plotted below each set of curves using dashed lines. Each thin line corresponds to a single ferret (gray) or a single human subject (gold). Thick lines show the average across all subjects. Format is the same as Figure 2F.

      We have clarified this important detail in the Results (page 4):

      "We used the test-retest reliability of the responses to noise-correct the measured NSE values such that the effective noise floor given the reliability of the measurements is zero."
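      For readers unfamiliar with the metric, the sketch below computes the normalized squared error, assuming the form used in the earlier human work (Norman-Haignere & McDermott, 2018): NSE = mean((x - y)^2) / (mean(x^2) + mean(y^2) - 2 * mean(x) * mean(y)). It is 0 for identical responses, about 1 for uncorrelated responses with matched mean and variance, and 2 for perfectly anticorrelated zero-mean responses. The simulated responses and the function name are hypothetical, and the noise-correction step is not reproduced because its exact formula is not given in this response; the sketch only illustrates why a test-retest NSE acts as a noise floor.

```python
import numpy as np


def nse(x, y) -> float:
    """Normalized squared error between two response vectors (assumed definition)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    numerator = np.mean((x - y) ** 2)
    denominator = np.mean(x ** 2) + np.mean(y ** 2) - 2 * np.mean(x) * np.mean(y)
    return float(numerator / denominator)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_response = rng.normal(size=40)  # hypothetical responses to 40 sounds
    # Two noisy measurements of the SAME underlying response (e.g. two repeats).
    rep1 = true_response + 0.3 * rng.normal(size=40)
    rep2 = true_response + 0.3 * rng.normal(size=40)
    # Measurement noise alone makes the raw NSE > 0; this test-retest value is the
    # floor against which a raw natural-vs-synthetic NSE would be read.
    print(f"test-retest NSE: {nse(rep1, rep2):.3f}")
```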

      Reviewer #2:

      Landemard et al. compare the response properties of primary vs. non-primary auditory cortex in ferrets with respect to natural and model-matched sounds, using functional ultrasound imaging. They find that responses do not differentiate between natural and model-matched sounds across ferret auditory cortex; in contrast, by drawing on previously published data in humans where Norman-Haignere & McDermott (2018) showed that non-primary (but not primary) auditory cortex differentiates between natural and model-matched sounds, the authors suggest that this is a defining distinction between human and non-human auditory cortex. The analyses are conducted well and I appreciate the authors including a wealth of results, also split up for individual subjects and hemispheres in supplementary figures, which helps the reader get a better idea of the underlying data.

      Overall, I think the authors have completed a very nice study and present interesting results that are applicable to the general neuroscience community. I think the manuscript could be improved by using different terminology ('sensitivity' as opposed to 'selectivity'), a larger subject pool (only 2 animals), and some more explanation with respect to data analysis choices.

      Many thanks for your thoughtful critiques and comments. We have attempted to address all of them, which has improved the manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important paper exploits new cryo-EM tomography tools to examine the state of chromatin in situ. The experimental work is meticulously performed and convincing, with a vast amount of data collected. The main findings are interpreted by the authors to suggest that the majority of yeast nucleosomes lack a stable octameric conformation. Despite the possibly controversial nature of this report, it is our hope that such work will spark thought-provoking debate, and further the development of exciting new tools that can interrogate native chromatin shape and associated function in vivo.

      We thank the Editors and Reviewers for their thoughtful and helpful comments. We also appreciate the extraordinary amount of effort needed to assess both the lengthy manuscript and the previous reviews. Below, we provide our point-by-point response in bold blue font. Nearly all comments have been addressed in the revised manuscript. For a subset of comments that would require us to speculate, we have taken a conservative approach because we either lack key information or technical expertise: Instead of adding the speculative replies to the main text, we think it is better to leave them in the rebuttal for posterity. Readers will thereby have access to our speculation and know that we did not feel confident enough to include these thoughts in the Version of Record.

      Reviewer #1 (Public Review):

      This manuscript by Tan et al uses cryo-electron tomography to investigate the structure of yeast nucleosomes both ex vivo (nuclear lysates) and in situ (lamellae and cryosections). The sheer number of experiments and results is astounding and comparable to an entire PhD thesis. However, as is always the case, it is hard to prove that something is not there; in this case, canonical nucleosomes. In their search for the nucleosomes, the authors also stumble upon new insights into nucleosome arrangement that indicate that the positions of the histones are more flexible than previously believed.

      Please note that canonical nucleosomes are there in wild-type cells in situ, albeit rarer than what’s expected based on our HeLa cell analysis and especially the total number of yeast nucleosomes (canonical plus non-canonical). The negative result (absence of any canonical nucleosome classes in situ) was found in the histone-GFP mutants.

      Major strengths and weaknesses:

      Personally, I am not ready to agree with their conclusion that heterogeneous non-canonical nucleosomes predominate in yeast cells, but this reviewer is not an expert in the field of nucleosomes and can't judge how well these results fit with previous results in the field. As a technological expert, though, I think the authors have done everything possible to test that hypothesis with today's available methods. One can debate whether it is necessary to have 35 supplementary figures, but after working through them all, I see that the nature of the argument needs all that support, precisely because it is so hard to show what is not there. The massive amount of work that has gone into this manuscript and the state-of-the-art nature of the technology should be warmly commended. I also think the authors have done a really great job of including all their results to the benefit of the scientific community. Yet, I am left with some questions and comments:

      Could the nucleosomes change into other shapes that were predetermined in situ? Could the authors expand on whether there was a structure or two that was more common than the others among the classes they found? Or would this not have been found because of the template matching and the reference particle used later?

      Our best guess (speculation) is that one of the class averages that is smaller than the canonical nucleosome contains one or more non-canonical nucleosome classes. However, we do not feel confident enough to single out any of these classes precisely because we do not yet know if they arise from one non-canonical nucleosome structure or from multiple (and therefore misclassified) non-canonical nucleosome structures, potentially with other non-nucleosome complexes mixed in. We feel it is better to leave this discussion out of the manuscript rather than risk sending the community on wild goose chases.

      Our template-matching workflow uses a low enough cross-correlation threshold that any nucleosome-sized particle (plus or minus a few nanometers) would be picked, which is why the number of hits is so large. So unless the non-canonical nucleosomes quadrupled in size or lost most of their histones, they should be grouped with one or more of the other 99 class averages (WT cells) or any of the 100 class averages (cells with GFP-tagged histones). As to whether the later reference particle could have prevented us from detecting one of the non-canonical nucleosome structures, we are unable to tell because we would first have to know what an in situ non-canonical nucleosome looks like.

      Could it simply be that the yeast nucleoplasm is differently structured than that of HeLa cells and it was harder to find nucleosomes by template matching in these cells? The authors argue against crowding in the discussion, but maybe it is just a nucleoplasm texture that side-tracks the programs?

      Presumably, the nucleoplasmic “side-tracking” texture would come from some molecules in the yeast nucleus. These molecules would be too small to visualize as discrete particles in the tomographic slices, but they would contribute textures that can be “seen” by the programs – in particular RELION, which does the discrimination between structural states. We are not sure what types of density textures would side-track RELION’s classification routines.

      The title of the paper is not well reflected in the main figures. The title of Figure 2 says "Canonical nucleosomes are rare in wild-type cells", but that is not shown or quantified in that figure. Rare in comparison to what? I suggest adding a comparative view from the HeLa cells, like the text does in lines 195-199. A measure of nucleosomes detected per volume of nucleoplasm would also facilitate a comparison.

      Figure 2’s title is indeed unclear and does not align with the paper’s title and key conclusion. The rarity here is relative to the expected number of nucleosomes (canonical plus non-canonical). We have changed the title to:

      “Canonical nucleosomes are a minority of the expected total in wild-type cells”.

      We would prefer to leave the reference to HeLa cells to the main text instead of as a figure panel because the comparison is not straightforward for a graphical presentation. Instead, we now report the total number of nucleosomes estimated for this particular yeast tomogram (~7,600) versus the number of canonical nucleosomes classified (297; 594 if we assume we missed half of them). This information is in the revised figure legend:

      “In this tomogram, we estimate there are ~7,600 nucleosomes (see Methods on how the calculation is done), of which 297 are canonical structures. Accounting for the missing disc views, we estimate there are ~594 canonical nucleosomes in this cryolamella (< 8% of the expected number of nucleosomes).”
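      The arithmetic behind this estimate can be checked in a few lines. This is an illustrative sketch only: the function name `canonical_fraction` is hypothetical, and the correction simply encodes the assumption stated in the legend that about half of the canonical nucleosomes (the disc views) were missed by classification.

```python
def canonical_fraction(classified: int, expected_total: int,
                       missed_fraction: float = 0.5) -> tuple[float, float]:
    """Correct a classified count for a missed fraction of particles.

    Returns (corrected canonical count, fraction of the expected total).
    missed_fraction = 0.5 encodes the assumption that half the canonical
    nucleosomes (the disc views) were not recovered by classification.
    """
    corrected = classified / (1.0 - missed_fraction)
    return corrected, corrected / expected_total


# Numbers taken from the revised Figure 2 legend.
corrected, fraction = canonical_fraction(297, 7600)
print(f"{corrected:.0f} canonical nucleosomes, {fraction:.1%} of the expected total")
```

      With the legend's numbers, 297 doubles to 594, and 594 / 7,600 is just under 8%.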

      If the cell contains mostly non-canonical nucleosomes, are they really non-canonical? Maybe a change of language is required once this is somewhat sure (say, after line 303).

      This is an interesting semantic and philosophical point. From the yeast cell’s “perspective”, the canonical nucleosome structure would be the form that is in the majority. That being said, we do not know if there is one structure that is the majority. From the chromatin field’s point of view, the canonical nucleosome is the form that is most commonly seen in all the historical – and most contemporary – literature, namely something that resembles the crystal structure of Luger et al, 1997. Given these two lines of thinking, we added the following clarification as lines 312 – 316:

      “At present, we do not know what the non-canonical nucleosome structures are, meaning that we cannot even determine if one non-canonical structure is the majority. Until we know the non-canonical nucleosomes’ structures, we will use the term non-canonical to describe all the nucleosomes that do not have the canonical (crystal) structure.”

      The authors could explain in more detail why they sometimes use the conventional 2-D followed by 3-D classification approach and sometimes "direct 3-D classification". Why, for example, do they do 2-D followed by 3-D classification in Figure S5A? This figure could be considered a regular figure since it shows the main message of the paper.

      Since the classification of subtomograms in situ is still a work in progress, we felt it would be better to show one instance of 2-D classification for lysates and one for lamellae. While it is true that we could have presented direct 3-D classification for the entire paper, we anticipate that readers will be interested to see what the in situ 2-D class averages look like.

      The main message is that there are canonical nucleosomes in situ (at least in wild-type cells), but they are a minority. Therefore, the conventional classification for Figure S5A should not be a main figure because it does not show any canonical nucleosome class averages in situ.

      Figure 1: Why is there a gap in the middle of the nucleosome in panel B? The authors write that this is a higher resolution structure (18Å), but in the even higher resolution crystallography structure (3Å resolution), there is no gap in the middle.

      There is a lower concentration of amino acids at the middle in the disc view; unfortunately, the space-filling model in Figure 1A hides this feature. The gap exists in experimental cryo-EM density maps. See Author response image 1 for an example (pubmed.ncbi.nlm.nih.gov/29626188). The size of the gap depends on the contour level and probably the contrast mechanism, as the gap is less visible in the VPP subtomogram averages. To clarify this confusing phenomenon, we added the following lines to the figure legend:

      “The gap in the disc view of the nuclear-lysate-based average is due to the lower concentration of amino acids there, which is not visible in panel A due to space-filling rendering. This gap’s visibility may also depend on the contrast mechanism because it is not visible in the VPP averages.”

      Author response image 1.

      Reviewer #2 (Public Review):

      Nucleosome structures inside cells remain unclear. Tan et al. tackled this problem using cryo-ET and 3-D classification analysis of yeast cells. The authors found that the fraction of canonical nucleosomes in the cell could be less than 10% of total nucleosomes. The finding is consistent with the unstable property of yeast nucleosomes and the high proportion of the actively transcribed yeast genome. The authors made an important point in understanding chromatin structure in situ. Overall, the paper is well-written and informative to the chromatin/chromosome field.

      We thank Reviewer 2 for their positive assessment.

      Reviewer #3 (Public Review):

      Several labs in the 1970s published fundamental work revealing that almost all eukaryotes organize their DNA into repeating units called nucleosomes, which form the chromatin fiber. Decades of elegant biochemical and structural work indicated a primarily octameric organization of the nucleosome, with 2 copies each of histones H2A, H2B, H3 and H4 wrapping 147 bp of DNA in a left-handed toroid, to which linker histone would bind.

      This was true for most species studied (except, yeast lack linker histone) and was recapitulated in stunning detail by in vitro reconstitutions by salt dialysis or chaperone-mediated assembly of nucleosomes. Thus, these landmark studies set the stage for an exploding number of papers on the topic of chromatin in the past 45 years.

      An emerging counterpoint to the prevailing idea of static particles is that nucleosomes are much more dynamic and can undergo spontaneous transformation. Such dynamics could arise from intrinsic instability due to DNA structural deformation, specific histone variants or their mutations, post-translational histone modifications which weaken the main contacts, protein partners, and predominantly, from active processes like ATP-dependent chromatin remodeling, transcription, repair and replication.

      This paper is important because it tests this idea wholesale, applying novel cryo-EM tomography tools to examine the state of chromatin in yeast lysates or cryosections. The experimental work is meticulously performed, with a vast amount of data collected. The main findings are interpreted by the authors to suggest that the majority of yeast nucleosomes lack a stable octameric conformation. What is surprising is not that alternative conformations of nucleosomes might exist in vivo, but rather the sheer scale of such particles reported, relative to the traditional form expected from decades of biochemical, biophysical and structural data. Thus, it is likely that this work will be perceived as controversial. Nonetheless, we believe these kinds of tools represent an important advance for in situ analysis of chromatin. We also think the field should have the opportunity to carefully evaluate the data and assess whether the claims are supported, or consider what additional experiments could be done to further test the conceptual claims made. It is our hope that such work will spark thought-provoking debate in a collegial fashion, and lead to the development of exciting new tools that can interrogate native chromatin shape in vivo. Most importantly, it will be critical to assess the biological implications associated with more dynamic or more static forms of nucleosomes, the associated chromatin fiber, and its three-dimensional organization, for nuclear or mitotic function.

      Thank you for putting our work in the context of the field’s trajectory. We hope our EMPIAR entry, which includes all the raw data used in this paper, will be useful for the community. As more labs (hopefully) upload their raw data and as image-processing continues to advance, the field will be able to revisit the question of non-canonical nucleosomes in budding yeast and other organisms. 

      Reviewer #1 (Recommendations For The Authors):

      The manuscript sometimes reads like a part of a series rather than a stand-alone paper. Be sure to spell out what needs to be known from previous work to read this article. The introduction is very EM-technique focused but could do with more nucleosome information.

      We have added a new paragraph that discusses the sources of structural variability to better prepare readers, as lines 50 – 59:

      “In the context of chromatin, nucleosomes are not discrete particles because sequential nucleosomes are connected by short stretches of linker DNA. Variation in linker DNA structure is a source of chromatin conformational heterogeneity (Collepardo-Guevara and Schlick, 2014). Recent cryo-EM studies show that nucleosomes can deviate from the canonical form in vitro, primarily in the structure of DNA near the entry/exit site (Bilokapic et al., 2018; Fukushima et al., 2022; Sato et al., 2021; Zhou et al., 2021). In addition to DNA structural variability, nucleosomes in vitro have small changes in histone conformations (Bilokapic et al., 2018). Larger-scale variations of DNA and histone structure are not compatible with high-resolution analysis and may have been missed in single-particle cryo-EM studies.”

      Line 165-6 "did not reveal a nucleosome class average in..". Add "canonical", since it otherwise suggests there were no nucleosomes.

      Thank you for catching this error. Corrected.

      Lines 177-182: Why are the disc views missed by the classification analysis? They should be there in the sample, as you say.

      We suspect that RELION 3 is misclassifying the disc-view canonical nucleosomes into the other classes. The RELION developers suspect that view-dependent misclassification arises from RELION 3’s 3-D CTF model. RELION 4 is reported to be less biased by the particles’ views. We have started testing RELION 4 but do not have anything concrete to report yet.

      Line 222: a GFP tag.

      Fixed.

      Line 382: "Note that the percentage .." I can't follow this sentence. Why would you need to know how many chromosomes' worth of nucleosomes you are looking at to state the percentage of non-canonical nucleosomes?

      Thank you for noticing this confusing wording. The sentence has been both simplified and clarified as follows in lines 396 – 398:

      “Note that the percentage of canonical nucleosomes in lysates cannot be accurately estimated because we cannot determine how many nucleosomes in total are in each field of view.”

      Line 397: "We're not implying that..." Please add a sentence clearly stating what you DO mean with mobility for H2A/H2B.

      We have added the following clarifying sentence in lines 412 – 413:

      “We mean that H2A-H2B is attached to the rest of the nucleosome and can have small differences in orientation.”

      Line 428: repeated message from line 424. "in this figure, the blurring implies.."

      Redundant phrase removed.

      Line 439: "on a HeLa cell" - a single cell in the whole study?

      Yes, that study was done on a single cell.

      A general comment is that the authors could help the reader more by developing the figures and making them more pedagogical, a list of suggestions can be found below.

      Thank you for the suggestions. We have applied all of them to the specific figure callouts and to the other figures that could use similar clarification.

      Figure 2: Help the reader by avoiding abbreviations in the figure legend. VPP tomographic slice - spell out "Volta Phase Plate". Same with the term "remapped" (panel B) what does that mean?

      We spelled out Volta phase plate in full and explained “remapped” with the following additional figure legend text:

      “the class averages were oriented and positioned in the locations of their contributing subtomograms”.

      Supplementary figures:

      Figure S3: It is unclear what you mean with "two types of BY4741 nucleosomes". You then say that the canonical nucleosomes are shaded blue. So what color is then the non-canonical? All the greys? Some of them look just like random stuff, not nucleosomes.

      “Two types” is a typo and has been removed and “nucleosomes” has been replaced with “candidate nucleosome template-matching hits” to accurately reflect the particles used in classification.

      Figure S6: Top left says "3 tomograms (defocus)". I wonder if you meant to add the defocus range here. I have understood it like this is the same data as shown in Figure S5, which makes me wonder if this top cartoon should not be on top of that figure too (or exclusively there).

      To make Figures S6 (and S5) clearer, we have copied the top cartoon from Figure S6 to S5.

      Note that we corrected a typo for these figures (and the Table S7): the number of template-matched candidate nucleosomes should be 93,204, not 62,428.

      The description in the parentheses (defocus) is shorthand for defocus phase contrast and was not intended to also display a defocus range. All of the revised figure legends now report the meaning of both this shorthand and of the Volta phase plate (VPP).

      To help readers see the relationship between these two figures, we added the following clarifying text to the Figure S5 and S6 legends, respectively:

      “This workflow uses the same template-matched candidate nucleosomes as in Figure S6; see below.”

      “This workflow uses the same template-matched candidate nucleosomes as in Figure S5.”

      Figure S7: In the first panel, it is unclear why the featureless cylinder is shown as it is not used as a reference here. Rather, it could be put throughout where it was used and then put the simulated EM-map alone here. If left in, it should be stated in the legend that it was not used here.

      It would indeed be much clearer to show the featureless cylinder in all the other figures and leave the simulated nucleosome in this control figure. All figures are now updated. The figure legend was also updated as follows:

      “(A) A simulated EM map from a crystal structure of the nucleosome was used as the template-matching and 3-D classification reference.”

      Figure S18: Why are there classes where the GFP density is missing? Mention something about this in the figure legend.

      We have appended the following speculations to explain the “missing” GFP densities:

      “Some of the class averages are “missing” one or both expected GFP densities. The possible explanations include mobility of a subpopulation of GFPs or H2A-GFPs, incorrectly folded GFPs, or replacement of H2A by the variant histone H2A.Z.”

      Reviewer #2 (Recommendations For The Authors):

      My specific (rather minor) comments are the following:

      1) Abstract:

      yeast -> budding yeast.

      All three instances in the abstract have been replaced with “budding yeast”.

      It would be better to clarify what ex vivo means here.

      We have appended “(in nuclear lysates)” to explain the meaning of ex vivo.

      2) Some subtitles are unclear.

      e.g., "in wild-type lysates" -> "wild-type yeast lysates"

      Thank you for this suggestion. All unclear instances of subtitles and sample descriptions throughout the text have been corrected.

      3) Page 6, Line 113. "...which detects more canonical nucleosomes." A similar thing was already mentioned in the same paragraph and seems redundant.

      Thank you for noticing this redundant statement, which is now deleted.

      4) Page 25, Line 525. "However, crowding is an unlikely explanation..." Please note that many macromolecules (proteins, RNAs, polysaccharides, etc.) were lost during the nuclei isolation process.

      This is a good point. We have rewritten this paragraph to separate the discussion on technical versus biological effects of crowding, in lines 538 – 546:

      “Another hypothesis for the low numbers of detected canonical nucleosomes is that the nucleoplasm is too crowded, making the image processing infeasible. However, crowding is an unlikely technical limitation because we were able to detect canonical nucleosome class averages in our most-crowded nuclear lysates, which are so crowded that most nucleosomes are butted against others (Figures S15 and S16). Crowding may instead have biological contributions to the different subtomogram-analysis outcomes in cell nuclei and nuclear lysates. For example, the crowding from other nuclear constituents (proteins, RNAs, polysaccharides, etc.) may contribute to in situ nucleosome structure, but is lost during nucleus isolation.”

      5) Page 7, Line 126. "The subtomogram average..." Is there any explanation for this?

      Presumably, the longer linker DNA length corresponds to the ordered portion of the ~22 bp linker between consecutive nucleosomes, given the ~168 bp nucleosome repeat length. We have appended the following explanation as the concluding sentence, lines 137 – 140:

      “Because the nucleosome-repeat length of budding yeast chromatin is ~168 bp (Brogaard et al., 2012), this extra length of DNA may come from an ordered portion of the ~22 bp linker between adjacent nucleosomes.”

      6) "Histone GFP-tagging strategy" subsection:

      Since this subsection is a bit off the mainstream of the paper, it can be shortened and merged into the next one.

      We have merged the “Histone GFP-tagging strategy” and “GFP is detectable on nucleosome subtomogram averages ex vivo” subsections and shortened the text as much as possible. The new subsection is entitled “Histone GFP-tagging and visualization ex vivo”.

      7) Page 16, Line 329. "Because all attempts to make H3- or H4-GFP "sole source" strains failed..." Is there a possible explanation here? Cytotoxic effect because of steric hindrance of nucleosomes?

      Yes, it is possible that the GFP tag interferes with the nucleosome's interactions with its numerous partners. It is also possible that the histone-GFP fusions do not import and/or assemble efficiently enough to support a bare-minimum number of functional nucleosomes. Given that the phenotypic consequences of fusion tags are an underexplored topic and that we don’t have any data on the (dead) transformants, we would prefer to leave out speculation about the cause of death in the attempted creation of “sole source” strains.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript investigates how humans store temporal sequences of tones in working memory. The authors mainly focus on a theory named "Language of thought" (LoT). Here the structure of a stimulus sequence can be stored in a tree structure that integrates the dependencies of a stimulus stored in working memory. To investigate the LoT hypothesis, participants listened to multiple stimulus sequences that varied in complexity (e.g., alternating tones vs. nearly random sequence). Simultaneously, the authors collected fMRI or MEG data to investigate the neuronal correlates of LoT complexity in working memory. Critical analysis was based on a deviant tone that violated the stored sequence structure. Deviant detection behavior and a bracketing task allowed a behavioral analysis.

      Results showed accurate bracketing and fast/correct responses when LoT complexity is low. fMRI data showed that LoT complexity correlated with the activation of 14 clusters. MEG data showed that LoT complexity correlated mainly with activation from 100-200 ms after stimulus onset. These and other analyses presented in the manuscript lead the authors to conclude that such tone sequences are represented in human memory using LoT in contrast to alternative representations that rely on distinct memory slot representations.

      Strengths

      The study provides a concise and easily accessible introduction. The task and stimuli are well described and allow a good understanding of what participants experience while their brain activation is recorded. Results are extensive as they include multiple behavioral investigations and brain activation data from two different measurement modalities. The presentation of the behavioral results is intuitive. The analysis provided a direct comparison of the LoT with an alternative model based on estimating a transition-probability measure of surprise.

      For the fMRI data, the whole brain analysis was accompanied by detailed region of interest analyses, including time course analysis, for the activation clusters correlated with LoT complexity. In addition, the activation clusters have been set in relation (overlap and region of interest analyses) to a math and a language localizer. For the MEG data, the authors investigated the LoT complexity effect based on linear regression, including an analysis that also included transitional probabilities and multivariate decoding analysis. The discussion of the results focused on comparing the activation patterns of the task with the localizer tasks. Overall, the authors have provided considerable new data in multiple modalities on a well-designed experiment investigating how humans represent sequences in auditory working memory.

      Weaknesses

      The primary issues with the manuscript are the missing formal description of the LoT model and its alternatives, inconsistencies in the model comparisons, and the lack of a clear argument that would allow the reader to understand the selection of the alternative model. Similar to a recent paper by similar authors (Planton et al., 2021 PLOS Computational Biology), an explicit model comparison analysis would allow a much stronger conclusion. These analyses would also provide a more extensive evidence base for the favored LoT model. What is needed is a clear argument for why the transitional probabilities were identified as the optimal alternative model for a critical test, together with a clear description of the models (e.g., how many free parameters they have) and of the simulation procedure (e.g., whether they are trained). Here, it would be strongly advised to provide the scripts that allow others to reproduce the simulations.

      We thank the reviewer for the requests and critiques. Although this paper follows upon our extensive prior behavioral work (Planton et al.), we agree that it should stand alone and that therefore the models need to be described more fully. We have now added a formal description of the LoT in the subsection The Language of Thought for binary sequences in the Results section and have added a formal and verbal description of the selected sequences in Figure 1-figure supplement 1. Furthermore, we added a model comparison similar to the one done in (Planton et al., 2021 PLOS Computational Biology). This analysis is now included in Figure 2 and in the Behavioral data subsection of the Results section. It replicates previous behavioral results obtained in Planton et al., 2021 PLOS Computational Biology, namely that complexity, as measured by minimal description length in the binary version of the “language of geometry” was the best predictor of participants’ behaviour.

      Interestingly, we found that the model that considered both complexity and surprise had an even lower AIC, suggesting that statistical learning is occurring simultaneously in the brain (Brain signatures of a multiscale process of sequence learning in humans, M Maheu, S Dehaene, F Meyniel - eLife, 2019). In this respect, we do not consider surprise from transition probabilities to be an alternative model but rather a mechanism that operates in parallel with sequence compression. The main goal of this work was to determine how sequence processing was affected by sequence structure, as captured by the language of thought. Along these lines, we did not select the tested sequences in order to investigate statistical learning but instead chose them to have similar global statistical properties.

      The MEG experiment provided us with the opportunity to separate temporally the contributions of statistical mechanisms from the ones of sequence compression according to the language of thought. Indeed, contrary to the fMRI experiment, we could model at the item level the statistical properties of individual sounds. We report the results when accounting jointly for statistical processing and LoT-complexity in Supplementary materials.

      The different models considered in previous work didn’t need to be trained. The sequence complexity they provided could be analytically computed based on sequence minimal description length.
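      To make the kind of model comparison discussed above concrete, the sketch below computes the Akaike information criterion (AIC) for two ordinary-least-squares models: one with a complexity predictor only, and one that adds a surprise predictor. It is a minimal illustration assuming Gaussian errors. It uses synthetic data and hypothetical names (`gaussian_aic`, `complexity`, `surprise`), and it is not the analysis reported in Planton et al. (2021) or in this paper.

```python
import numpy as np


def gaussian_aic(y: np.ndarray, X: np.ndarray) -> float:
    """AIC of an ordinary-least-squares fit, assuming Gaussian errors.

    With the maximum-likelihood noise variance RSS/n,
    -2 log L = n * (ln(2*pi*RSS/n) + 1); k counts the regression
    coefficients plus the noise variance.
    """
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    k = X.shape[1] + 1
    return float(n * (np.log(2 * np.pi * rss / n) + 1) + 2 * k)


# Synthetic data only: a behavioural measure that depends on both a
# complexity-like and a surprise-like predictor (hypothetical names).
rng = np.random.default_rng(0)
n = 200
complexity = rng.uniform(0, 10, size=n)
surprise = rng.normal(size=n)
y = 1.0 + 2.0 * complexity + 1.5 * surprise + rng.normal(scale=0.5, size=n)

ones = np.ones(n)
aic_complexity = gaussian_aic(y, np.column_stack([ones, complexity]))
aic_both = gaussian_aic(y, np.column_stack([ones, complexity, surprise]))
print(f"complexity only: {aic_complexity:.1f}; complexity + surprise: {aic_both:.1f}")
```

      Lower AIC is better. The penalty of 2 per parameter means an added predictor such as surprise lowers the AIC only if it improves the likelihood enough to pay for itself.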

      Furthermore, the manuscript needs a clear motivation for the type of sequences and some methodological decisions. Central here is the quadratic trend selectively used for the fMRI analysis but not for the other datasets.

      To design the MEG experiment, we had to decrease the number of sequences from 10 to 7. We selected them based on their LoT complexity and the type of sequence information they spanned. As a consequence, the predictors for linear and quadratic complexity are highly correlated (82%). Unfortunately, given the low SNR, this does not allow us to robustly account for the contribution of quadratic complexity to the MEG-recorded brain signals. Still, in response to the referee, we fitted a linear regression with quadratic complexity as the predictor to the residuals of the regression on transition-probability statistics and complexity, and we report the results here. No significant clusters were found for habituation and standard trials, but two were found (corresponding to the same topography) for deviant trials at late time points.

      Author response image 1 shows the regression coefficients for the quadratic-complexity regressor, fitted to the residuals of the regression on surprise from transition probabilities and complexity. In Author response image 2, two significant clusters were found for the deviant sounds.

      We also averaged the decoding scores from Figure 7A over the time window obtained from the temporal cluster-based permutation test (see Author response image 2). The choice of complexity values did not allow any clear assessment of the contribution of the quadratic complexity term.

      In summary, in the current design, we do not think that the number of tested sequences allows us to clearly conclude that no quadratic effect can be found for Habituation and Standard trials. We would need to re-design an experiment to test specifically the quadratic complexity contribution to brain signals in MEG.

      Author response image 1.

      Author response image 2.
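      The residual analysis described above, in which a first regression is fitted and its residuals are then regressed on an additional predictor, can be sketched as follows. This is an illustration on synthetic data only, not the authors' MEG pipeline. The function name `residual_regression` and all variable names are hypothetical. The sketch also shows why strongly correlated linear and quadratic predictors are hard to separate: for positive values, x and x² are highly correlated, so the first stage absorbs most of their shared variance.

```python
import numpy as np


def residual_regression(y, first_stage, second_stage):
    """Fit y on the first-stage predictors (plus intercept) by least squares,
    then fit the residuals on the second-stage predictors (plus intercept).

    Returns the second-stage coefficients, intercept first.
    """
    n = len(y)
    X1 = np.column_stack([np.ones(n), *first_stage])
    beta1, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ beta1
    X2 = np.column_stack([np.ones(n), *second_stage])
    beta2, *_ = np.linalg.lstsq(X2, residuals, rcond=None)
    return beta2


# Linear and quadratic versions of a positive predictor are strongly correlated.
levels = np.arange(1.0, 11.0)
r_lin_quad = np.corrcoef(levels, levels**2)[0, 1]

# Synthetic data only (hypothetical predictors, not the MEG recordings).
rng = np.random.default_rng(1)
complexity = np.repeat(levels, 20)
surprise = rng.normal(size=complexity.size)
y_no_quad = 0.5 + 2.0 * complexity + 1.5 * surprise
y_quad = y_no_quad + 0.3 * complexity**2

coef_no_quad = residual_regression(y_no_quad, [surprise, complexity], [complexity**2])
coef_quad = residual_regression(y_quad, [surprise, complexity], [complexity**2])
print(f"corr(linear, quadratic) = {r_lin_quad:.2f}")
print("quadratic slope without / with a true quadratic term:", coef_no_quad[1], coef_quad[1])
```

      Because the first stage assigns all variance shared by the linear and quadratic terms to the linear term, only the non-shared part of a quadratic effect can appear in the second stage. This is one reason a null second-stage result with few, highly correlated complexity levels is not conclusive.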

      Also, the description of the linear mixed models is missing (e.g., the random effect structure, e.g., see Bates, D., Kliegl, R., Vasishth, S., & Baayen, H. (2015). Parsimonious mixed models. arXiv preprint arXiv:1506.04967.). Moreover, sample sizes have not been justified by a power analysis.

      The linear mixed model considered in this work is very simple: it uses only Subject as a random variable. This is now stated clearly in the corresponding part of the Experimental procedures section:

      To test whether subject performance correlated with LoT complexity, we performed linear regressions on group-averaged data, as well as linear mixed models including participant as the (only) random factor. The random effect structure of the mixed models was kept minimal, and did not include any random slopes, to avoid the convergence issues often encountered when attempting to fit more complex models.
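      Written out, a mixed model with participant as the only random factor and no random slopes is a random-intercept model. The following is a sketch that assumes a single fixed effect of LoT complexity; the symbols are ours, not taken from the manuscript. Here $y_{ij}$ is the performance measure of participant $i$ on sequence $j$, $c_j$ is the LoT complexity of sequence $j$, and $u_i$ is the participant-specific shift of the intercept.

```latex
y_{ij} = \beta_0 + \beta_1\, c_j + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

      A random-slope model, which was deliberately not fitted here, would add a participant-specific term $u_{1i}\, c_j$ with its own variance (and usually a covariance with $u_i$). These extra parameters are what often cause the convergence problems mentioned above.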

    1. Author Response

      Reviewer #2 (Public Review):

      In the results of Fig. 2, the proteins are emitted at distance epsilon from the cortical boundary. From there, they locally perform 1D diffusion to the boundary, so most of them would readsorb once they diffuse a distance epsilon. Only a small fraction would extend past epsilon, which I assume is why the concentration drops by orders of magnitude beyond epsilon. Is such a concentration drop realistic given typical numbers of proteins in cells?

      This is a good point. In [29], McInally et al. investigate kinesin-13 concentrations in Giardia and find that the concentration drops sharply near the pole (by about three to four orders of magnitude), as surmised by the referee. The drop-off in our model is of a similar magnitude to what McInally et al. observe close to the pole.

      It should be clarified if the proposed size scaling is independent of the specific choice of the distance epsilon of the point of protein release from the anterior pole. I don't see any reason why this distance should increase with cell size as epsilon = 0.05 R (on page with equation 5). It's unclear if the size scaling of the concentration gradient might be dependent on the assumption epsilon ~ R.

      Figure R1 shows how the gradient depends on epsilon: the concentration gradient from the pole is unaffected everywhere beyond the source.

      Figure R1. Concentration gradient for cells with the source at different distances from the pole (ϵ). Concentration profiles are shown for different source points, starting very close to the pole and moving further away. The radius of the sphere is 10 μm, the diffusion constant is D = 1 μm²/s, and the transport speed along the cortex is v = 1 μm/s.
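      The reviewer's intuition that most released proteins readsorb can be made quantitative with a classic result for unbiased one-dimensional diffusion: a particle released at distance ε from an absorbing wall reaches distance L before being absorbed with probability ε/L (the gambler's-ruin result). The Python sketch below checks this with a lattice random walk. It is a toy illustration under loudly stated assumptions (pure 1D diffusion, a perfectly absorbing cortex, no transport along the cortex, no spherical geometry) and is not the model used in the manuscript; the function name is hypothetical.

```python
import random


def escape_fraction(start, boundary, n_walkers, seed=0):
    """Fraction of unbiased 1D lattice walkers released at `start` that reach
    `boundary` before being re-absorbed at 0 (a toy stand-in for the cortex)."""
    if not 0 < start < boundary:
        raise ValueError("start must lie strictly between 0 and boundary")
    rng = random.Random(seed)
    escaped = 0
    for _ in range(n_walkers):
        x = start
        while 0 < x < boundary:          # walk until absorbed at either end
            x += rng.choice((-1, 1))
        if x == boundary:
            escaped += 1
    return escaped / n_walkers


# Release one lattice unit from the cortex and ask how many walkers get 10 units
# away; the gambler's-ruin result predicts start / boundary = 0.1.
frac = escape_fraction(start=1, boundary=10, n_walkers=20000)
print(f"simulated {frac:.3f} vs. predicted {1 / 10:.3f}")
```

      The simulated fraction should come out close to ε/L; halving ε halves the fraction that travels a given distance, which is one way to see why the profile is so steep just beyond the source in this simplified setting.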

    1. Author Response:

      Reviewer #1 (Public Review):

      This manuscript describes the role of PMd cck neurons in the invigoration of escape behavior (ie retreat from aversive stimuli located in a circumscribed area of the environment in which testing was conducted). Further, PMd cck neurons are shown to exert their effect on escape via the dorsal PAG. Finally, in an intriguing twist, aversive images are shown to increase the functional coupling between hypothalamus and PAG in the human brain.

      The manuscript is broadly interdisciplinary, spanning multiple subfields of neuroscience research from slice physiology to human brain imaging.

      We thank the Reviewer for recognizing the interdisciplinarity of our work.

      To understand the novelty of the results obtained in the rodent studies, it is important to note that these data are a replication and elaboration of work published recently in Neuron by the primary authors of this manuscript. The current manuscript does not cite the Neuron paper.

      We apologize for this omission. At the time of the current submission the Neuron paper had not been accepted and thus we could not cite it. We now discuss this paper in the Introduction and highlight how the current manuscript expands upon the data published in the Neuron paper.

      The most novel aspect of the rodent experiments presented in this manuscript is the demonstration of a role for cck PMd neurons in invigorating behavioral withdrawal from cues associated with the kind of artificial stimuli commonly used in laboratory settings (ie a grid floor associated with shock). Unfortunately, these results are made somewhat difficult to interpret by a lack of counterbalancing: all subjects receive an assay of escape from a predator prior to the shock floor assay. Certainly, research on stress and sensitization tells us that prior experience with aversive stimuli can influence the response to aversive stimuli encountered in the future. Because the role of this PMd circuitry in predatory escape has already been demonstrated, this counterbalancing issue does somewhat diminish the impact of the most novel rodent data presented here.

      Indeed, as the Reviewer states, prior exposure to aversive stimuli may influence responses to future exposures to threats. We opted to have the rat test before the shock grid test because the rat exposure is a milder experience than the shock grid test, as no actual pain occurs in the rat assay. We thus reasoned that the more intensely aversive assay (the shock assay) was more likely to influence behavior in the rat assay than vice-versa. Nevertheless, we agree with the Reviewer’s point that the lack of counterbalancing between the assays may mask potential influences of the rat assay on the shock grid assay behavior.

      To address this issue, we ran a new cohort of mice and found that behavior in the shock grid assay is not significantly affected by prior experience in the rat assay. We now show in Figure R1 and Figure 1, figure supplement 2 that freezing, threat avoidance and escape metrics in the shock grid assay are not significantly changed by prior exposure to the rat assay.

      Figure R1. (Same as Figure 1, figure supplement 2). The order of threat exposure does not affect defensive behavior metrics. (A) Two cohorts of mice were exposed to the rat and shock grid threats in counterbalanced order, as specified in the yellow and green boxes. (B) The defensive behavioral metrics of these two cohorts were compared for the fear retrieval assay. None of the tested metrics were different between groups (Wilcoxon rank-sum test; each group, n=9 mice).

      The manuscript concludes with an fMRI experiment in which the BOLD response to aversive images is reported to covary across the hypothalamus and PAG. It is intriguing that unpleasant pictures influence BOLD in regions that might be expected to contain circuits homologous to those demonstrated in rodents. It is important to note that viewing images is passive for the subjects of this experiment, and the data include no behavioral analogue of the escape responses that are the focus of the rest of the manuscript.

      We agree with the Reviewer that there are many differences between the mouse and human behavioral tasks, and we have expanded the text highlighting these differences more clearly. One of our results, as highlighted by the Reviewer, is that inhibition of the PMd-dlPAG projection impairs escape from threats. Indeed, there is no escape in the human data, as stated by the Reviewer.

      We have now conducted new dual-site photometry recordings in which we simultaneously monitor calcium transients in the PMd and the dlPAG on contralateral sides. Using these dual recordings, we show that mutual information between the PMd and the dlPAG in mice is higher during exposure to threats (rat and shock grid fear retrieval) than during control assays (toy rat and pre-shock habituation) (Figure R2, Figure 9 and Figure 9, figure supplement 1). Importantly, this analysis was also performed after excluding all time points that include escapes. Thus, the increase in PMd-dlPAG mutual information is independent of escapes and is related to exposure to threats.

      Similarly, the increase in activity in the human fMRI data in the hypothalamus-dlPAG pathway is also related to the exposure to aversive images, rather than specific defensive behaviors performed by the human subjects. This new finding of increased mutual information in the PMd-dlPAG circuit independently of escapes provides a better parallel to the human data.

      In Figure R2 below we used mutual information instead of correlation because mutual information can capture both linear and non-linear dependencies between two time series. Figure R2E-G shows that the projection from PMd-cck cells to dlPAG is unilateral. Thus, in dual photometry recordings that were done contralaterally in the PMd and the dlPAG, the signals from the dlPAG are from local cell bodies, and are not contaminated by GCaMP signals from PMd-cck axon terminals.

      Figure R2. (Panels from Figure 9(A-D) and from Figure 9, figure supplement 1 (panels E-G)) Dual fiber photometry signals from the PMd and dlPAG exhibit increased correlation and mutual information during threat exposure. (A) Scheme showing setup used to obtain dual fiber photometry recordings. (B) PMd-cck mice were injected with AAV9-Ef1a-DIO-GCaMP6s in the PMd and AAV9-syn-GCaMP6s in the dlPAG. (C) Expression of GCaMP6s in the PMd and dlPAG. (Scale bars: (left) 200 µm, (right) 150 µm) (D) Bars show the mutual information between the dual-recorded PMd and dlPAG signals, both including (left) and excluding (right) escape epochs, during exposure to threat and control. Mutual information is an information theory-derived metric reflecting the amount of information obtained for one variable by observing another variable. See Methods section for more details. (E) Cck-cre mice were injected with AAV9-Ef1a-DIO-YFP in the PMd in the left side. (F) Image shows the expression of YFP in PMd-cck cells in the left side. (scale bar: 200 µm) (G) PMd-cck axon terminals unilaterally express YFP in the dlPAG. (scale bar: 150µm). * p<0.05, ** p<0.01.
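      The mutual information estimator used in the manuscript is specified in its Methods and is not reproduced here. As a generic illustration of why mutual information detects dependence that a correlation coefficient can miss, the Python sketch below uses a simple plug-in (histogram) estimator. The equal-width binning, the default of 8 bins and the function names are assumptions made for this example.

```python
import math
import random
from collections import Counter


def _bin(values, n_bins):
    """Assign each value to one of n_bins equal-width bins spanning its range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0          # guard against a constant signal
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=8):
    """Plug-in mutual information (bits) between two equal-length signals."""
    if len(x) != len(y):
        raise ValueError("signals must have the same length")
    bx, by = _bin(x, n_bins), _bin(y, n_bins)
    n = len(x)
    p_xy = Counter(zip(bx, by))
    p_x, p_y = Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), count in p_xy.items():
        p_ij = count / n
        mi += p_ij * math.log2(p_ij / ((p_x[i] / n) * (p_y[j] / n)))
    return mi


# A purely non-linear dependence (y = x**2, Pearson r near 0) versus independence.
rng = random.Random(1)
x = [rng.uniform(-1, 1) for _ in range(5000)]
y_dep = [v * v for v in x]
y_ind = [rng.uniform(-1, 1) for _ in range(5000)]
print(mutual_information(x, y_dep), mutual_information(x, y_ind))
```

      To mirror the escape-excluded analysis in panel D, both signals would simply be filtered with the same time-point mask before calling the function. Plug-in estimates are biased upward for small samples, which is one reason to compare conditions (threat versus control) rather than interpret absolute values.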

      Reviewer #2 (Public Review):

      The manuscript by Wang et al. addresses neuronal mechanisms underlying conserved escape behaviors. The study targets the midbrain periaqueductal grey, specifically the dorsolateral aspect (dlPAG), since previous research demonstrated that activation of dlPAG leads to escape behaviors in rodents and panic-related symptoms in humans. The hypothalamic dorsal premammillary nucleus (PMd) monosynaptically projects to the dlPAG and thus could play a role in escape behavior. The authors test whether cholecystokinin (CCK)-expressing PMd cells could be involved in escape behaviors from innate and conditioned threats using mainly two behavioral paradigms in mice: exposure to a live rat and electrical foot shocks.

      Different approaches are used to test the main hypothesis. Using fiber photometry and microendoscopy calcium imaging in freely moving mice, the study finds that PMd CCK+ neurons were more active when mice were close to threats and during escape behaviors. Furthermore, PMd CCK+ activation patterns predicted escape behavior in a generalized linear model. Chemogenetic inhibition of CCK+ PMd cells decreased escape speed from threats in both behavioral paradigms, while optogenetic activation of those cells led to an increase in speed. Observation of c-fos expression after optogenetic activation revealed activation within two target areas of the PMd, the dlPAG and anteromedial ventral thalamus (AMv), in which cellular activity measured by fiber photometry also increased during escape behaviors. Interestingly, inhibition of the PMd-to-dlPAG pathway, but not the PMd-to-AMv pathway, caused a decrease in escape velocity. Lastly, the authors investigated the response of several human participants to threatening images in an fMRI scan. These results suggest that, similar to mice, activation proportional to threat intensity within a functional hypothalamus-PAG pathway may occur in humans.

      The authors conclude that a pathway from the PMd to the dlPAG, characterized by expression of CCK, controls escape vigor and responsiveness to threat in mice, and that a similar pathway could be present in humans.

      Overall, the comprehensive data from multiple approaches support a role of the identified pathway in escape behavior. However, an insufficient description of the methods and experimental details makes it difficult to assess the validity and conclusiveness of some findings. In addition, the strong interpretive emphasis on the functional specificity of the CCK+ PMd-dlPAG pathway appears not fully supported by the data.

      1) The rationale for selection of CCK+ cells of the PMd is missing in the current manuscript. Despite methodological considerations, a clear description of these cells' role and characteristics from the existing literature is needed.

      To address this point, we justify our choice of cck+ cells by discussing prior data showing that PMd cck cells are the major neuronal population of the PMd. Furthermore, cck is not strongly expressed in other adjacent hypothalamic nuclei, showing the high anatomical specificity of our manipulations targeting PMd-cck cells. We also discuss prior data (Wang et al., 2021) in the Introduction and Discussion about these cells.

      2) The narrowness of the conclusions of the article is unnecessary. Although CCK+ PMd cells could play a role in regulating escape vigor, some of the presented results rather support the notion of a more general role of these cells in mediating defensive states. For example, the photometry data shows correlation of activity with other active defensive behavior. To address this point, a better analysis of the relation between neuronal activity and the general locomotor behavior of the animals is lacking. In addition, the already presented relation with the measured behaviors is not taken into account when interpreting the results (e.g. Fig 7 E). This description would be relevant to more comprehensively attributing functional roles for CCK + PMd cells.

      At the Reviewer's request, we have included an analysis of the relationship between general locomotor behavior and PMd-cck df/F (Figure R3 and Figure 2, figure supplement 2). Interestingly, we found that the df/F increases monotonically with increasing ranges of speed and acceleration in the threat assays, while remaining fairly constant for matched ranges in the control assays.

      We agree with the Reviewer that Figure 7E shows PMd-cck cells are activated not only during escape, but also during other behaviors. However, the chemogenetic inhibition data show that inhibiting PMd-cck cells impaired only escape speed, without altering freezing, approach or stretch-attend postures. Thus, the chemogenetic inhibition data indicate that, among the behaviors scored, the activity of these cells is critical only for escape. Nevertheless, we discussed a “notion of a more general role of these cells in mediating defensive states” as suggested by Reviewer 2. Reviewer 1, however, provided the opposite feedback, stating that “It needs to be made clear that a specific role of PMd in quantitative measures of escape is the new result, instead of a broader role for this region”. Considering these opposing suggestions, we broadened the discussion on the role of the PMd, but did so conservatively.

      Figure R3. (Same as Figure 2, figure supplement 2). Bars show the mean PMd-cck df/F (z-scored) for increasing ranges of (A) speed and (B) acceleration. (Wilcoxon signed-rank test; n=15) * p<0.05, ** p<0.01, *** p<0.001.
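      The speed-binned analysis amounts to z-scoring the df/F trace and averaging it within successive speed (or acceleration) ranges. The Python sketch below shows that computation; the bin edges, the synthetic trace and the function names are hypothetical, and the ranges actually used are those shown in Figure 2, figure supplement 2.

```python
from statistics import mean, pstdev


def zscore(values):
    """Z-score a trace against its own mean and population standard deviation."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def mean_by_bin(signal, covariate, edges):
    """Mean of `signal` over samples whose covariate lies in [edges[k], edges[k+1])."""
    means = []
    for lo, hi in zip(edges, edges[1:]):
        vals = [s for s, c in zip(signal, covariate) if lo <= c < hi]
        means.append(mean(vals) if vals else float("nan"))  # empty bin -> NaN
    return means


# Hypothetical example: a df/F trace that rises with speed plus a small ripple.
speed = [0.5 * i for i in range(40)]                      # cm/s
dff_z = zscore([0.1 * s + (i % 3) * 0.05 for i, s in enumerate(speed)])
print(mean_by_bin(dff_z, speed, edges=[0, 5, 10, 15, 20]))
```

      A monotonic increase across bins in the threat assays, with flat values in matched control bins, is the pattern described above.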

      3) The imprecision of the methods description, especially the behavioral analysis is contributing to the previous point. In particular, the escape criterion itself seems to include a vague classification based on movement away from the threat- this should be more concretely defined (e.g. using angle of escape direction). In any case, the different behavioral context dimensions between the two paradigms would probably affect the escape criterion itself and thus have to be taken into account when interpreting the results.

      The Reviewer makes an important point that the escape definition included in the Methods section was lacking in detail, specifying only a minimum directional speed. We had neglected to include two crucial criteria that were used as well: a minimum distance-from-threat at which escape must be initiated and a minimum distance traversed during escape. All escapes were therefore required to begin near the threat and lead to a substantial increase in mouse distance from the threat. These details are now included in the Methods section, as follows:

      “'Escapes' were defined as epochs for which (1) the mouse speed away from the threat or control stimuli exceeded 2 cm/s for a minimum of 5 seconds continuously, (2) movement away from the threat was initiated at a maximum distance-from-threat of 30 cm and (3) the distance traversed from escape onset to offset was greater than 10 cm. Thus, escapes were required to begin near the threat and lead to a quick and substantial increase in distance from the threat.

      'Escape duration' was defined as the amount of time that elapsed from escape onset to escape offset.

      'Escape speed' was defined as the average speed from escape onset to offset.

      'Escape angle' was defined as the cosine of the mouse head direction in radians, such that the values ranged from -1 (facing towards the threat) to 1 (facing away from the threat). Mouse head direction was determined by the angle of the line connecting a point midway between the ears and the nose.”
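      The three escape criteria and the angle metric can be sketched in Python as follows. This is a minimal sketch under several assumptions: the input is a one-dimensional distance-from-threat trace in cm sampled at a fixed rate, speed away from the threat is taken from consecutive-sample differences, "distance traversed" is implemented as the net increase in distance from the threat (with per-sample speeds computed this way it largely follows from criterion 1, and the authors' computation may differ), and the threat is assumed to lie toward the -x end of the enclosure. All function names are hypothetical.

```python
import math


def detect_escapes(dist_cm, fs_hz, min_speed=2.0, min_dur_s=5.0,
                   max_onset_dist=30.0, min_travel=10.0):
    """Escape epochs in a distance-from-threat trace (cm) sampled at fs_hz.

    Returns dicts with onset/offset sample indices, duration (s) and mean
    speed away from the threat (cm/s).
    """
    # Speed away from the threat between consecutive samples (cm/s).
    speed_away = [(b - a) * fs_hz for a, b in zip(dist_cm, dist_cm[1:])]
    escapes, start = [], None
    # A -inf sentinel closes any run still open at the end of the trace.
    for i, s in enumerate(speed_away + [-math.inf]):
        if s > min_speed:
            if start is None:
                start = i
            continue
        if start is not None:
            onset, offset = start, i                   # samples bounding the run
            duration = (offset - onset) / fs_hz        # criterion (1)
            travel = dist_cm[offset] - dist_cm[onset]  # net gain (assumption)
            if (duration >= min_dur_s
                    and dist_cm[onset] <= max_onset_dist   # criterion (2)
                    and travel > min_travel):              # criterion (3)
                escapes.append({"onset": onset, "offset": offset,
                                "duration_s": duration,
                                "mean_speed": travel / duration})
            start = None
    return escapes


def escape_angle_cosine(ear_mid_xy, nose_xy):
    """Cosine of head direction; assumes the threat lies toward -x, so
    1 = facing away from the threat and -1 = facing toward it."""
    dx = nose_xy[0] - ear_mid_xy[0]
    dy = nose_xy[1] - ear_mid_xy[1]
    norm = math.hypot(dx, dy)
    return dx / norm if norm else math.nan


# Synthetic trace at 10 Hz: 2 s still at 20 cm, 6 s moving away at 3 cm/s, then still.
fs = 10.0
trace = [20.0] * 20 + [20.0 + 0.3 * k for k in range(1, 61)] + [38.0] * 20
print(detect_escapes(trace, fs))
print(escape_angle_cosine((0.0, 0.0), (1.0, 0.0)))
```

      In this example a single escape is detected, lasting 6 s at about 3 cm/s; a 3 s run, or a run that starts 40 cm from the threat, would be rejected by criteria 1 and 2 respectively.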

      Using the escape definition above, a higher number of escapes and a higher average escape speed were observed in threat assays compared to control assays (Figure 1). This finding indicates that the definitions we used are capturing defensive evasion.

      Both contexts have a length of 70 cm, so differences in the length of the contexts did not influence the definition of escape across contexts.

      In response to the Reviewer's suggestion of an escape angle criterion, we have included Figure R4, which illustrates that, using the aforementioned escape definition, the resulting escape angle is quite stereotyped. The cosine of the escape angle shows very little variation, indicating that only a narrow range of escape angles is used. Given this result, we opted for simplicity and did not include the angle of escape as part of the escape criteria.

      Figure R4. (A) Lines represent mouse position for all escapes that occurred during an example rat (top) and fear retrieval (bottom) session. Note that, while there is a diversity of escape routes, the escape angle is quite similar. (B) (left) Diagram provides a description of the escape angle metric, here calculated as the cosine of the head direction in radians. A value of 1 indicates an escape parallel with the long walls of the enclosure. (right) Bars represent the mean escape angle for all animals in Figure 1 during the rat and fear retrieval assays (n=32). As is apparent in (A), the mean escape angle cosine has little variability.

      4) In line, more detailed descriptions of the animal's behavior are needed to support assessment of the results regarding the event-related fiber photometry results. Measures like frequency of escape, duration of freezing bouts and angle, duration and total speed of the escape bouts, and a better description of measures like Δ escape speed could be relevant for interpreting the results. In addition, there is no explanation of how the possible overlapping of behaviors in the broad time frame used in the experiments was regarded.

      We have now included the requested measures as a supplement to Figure 2 (see also Fig. R5 below). Regarding overlapping behaviors, we have quantified the overlap between categorized behaviors in the fiber photometry assay and found that only a small fraction of behavioral timepoints were categorized as more than one behavior, primarily during behavioral transitions. This is quantified in Figure R6 below. Moreover, as is now described in the Methods, the analyses presented in Figure 2G-I (as well as Figure 7C-E, 7G-I) were performed only on behaviors that were separated from all other behaviors by a minimum of 5 seconds.

      Figure R5. (Same as Figure 2, figure supplement 1) Behavioral metrics for the PMd fiber photometry cohort during threat exposure assays. (A) Diagram provides a description of the escape angle metric, here calculated as the cosine of the head direction in radians. A value of 1 indicates an escape parallel with the long walls of the enclosure. (B) Table shows pertinent defensive metrics during exposure to rat and fear retrieval assays for the PMd fiber photometry cohort. (n=15 mice).

      Figure R6. The behavioral overlap between classified behaviors is minimal. The colormap depicts the fraction of behavioral timepoints for each of the four classified behaviors that was categorized as each of the remaining behaviors across all PMd fiber photometry assays (n=15 mice).
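      The 5-second separation rule described above (keeping only behavioral bouts that are at least 5 s away from any bout of a different behavior) can be sketched in Python as follows. Representing bouts as (label, onset, offset) tuples in seconds is an assumption made for this example, and the function name is hypothetical; the manuscript's implementation is described in its Methods.

```python
def isolated_bouts(bouts, min_gap_s=5.0):
    """Keep bouts separated by at least min_gap_s from every bout of a different behavior.

    `bouts` is a list of (label, onset_s, offset_s) tuples.
    """
    kept = []
    for label, on, off in bouts:
        # A different behavior clashes if it overlaps the window padded by min_gap_s.
        clash = any(
            other != label and o_on < off + min_gap_s and o_off > on - min_gap_s
            for other, o_on, o_off in bouts
        )
        if not clash:
            kept.append((label, on, off))
    return kept


bouts = [("escape", 10.0, 12.0), ("freeze", 14.0, 16.0),
         ("approach", 40.0, 45.0), ("escape", 60.0, 62.0)]
print(isolated_bouts(bouts))
# -> [('approach', 40.0, 45.0), ('escape', 60.0, 62.0)]
```

      The escape at 10 s and the freeze at 14 s fall within 5 s of each other, so both are dropped from event-locked averages.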

      5) Part of the experimental results provide suboptimal evidence for the provided interpretation. That is, the lack of clear quantification and statistical analysis of the microendoscopy calcium imaging data on PMd-CCK+ cells makes it hard to reconcile this data with the photometry data. Furthermore, c-Fos staining after optogenetic stimulation of PMd-cck+ cells is insufficient evidence for the interpretation of broad, but functionally specific, recruitment of defensive networks. While the data on optogenetic inhibition of the PMd-CCK+ projection to the dlPAG seems to confirm the main hypothesis, both an intra-animal control and demonstration of statistical significance in the analysis are desirable to fully support that role.

      We agree with the Reviewer that clear quantification and statistical analyses are essential in interpreting the microendoscopic analysis. However, we are not sure what is being requested, as we have applied both of these approaches to this dataset. For instance, in Figure 3, we quantify the percentage of cells that significantly encode each behavior as well as implement 5-fold logistic regression to determine how well these behaviors can be predicted. This accuracy is statistically compared to chance. Further quantification and statistical comparisons of speed and position decoding accuracy between threat and control assays are included in Figure 4. Concerning the Arch experiments, we have included an intra-animal control by comparing light off and on epochs, and we statistically compare the difference between these epochs with a control group.

      Regarding the c-Fos experiment, we observe increased c-Fos expression in several nuclei known to be critical for defense, such as the bed nucleus of the stria terminalis and the ventromedial hypothalamus. This finding underlies our claim that optogenetic activation of the PMd recruits defensive networks. Nevertheless, it is entirely possible that naturalistic endogenous activation of the PMd does not recruit these nuclei. We have added text addressing this caveat.

      6) The provided fMRI data only provides circumstantial evidence to support a functionally specific hypothalamus to PAG pathway especially due to the technical characteristics and limitations of the experimental setup and behavioral paradigm.

      The Reviewer makes an excellent point. Please see our response to Reviewer 1, point 6, where we provide a better parallel to the fMRI data in a new photometry analysis, as well as the added Figure 9.

      Briefly, we now have conducted contralateral dual photometry recordings of the dlPAG and the PMd, and show an increase in mutual information between the neural activity of these two regions during exposure to threats. This result was found after removing all timepoints with escapes. Thus, the increase in mutual information is related broadly to threat exposure, rather than caused by specific moments during which escape occurs. We argue that this result more closely parallels the human data, as both the fMRI and mutual information from mice data show an increase in functional connectivity in the hypothalamus-dlPAG pathway during threat exposure, independently of escapes.

      Reviewer #3 (Public Review):

      This manuscript by Wang et al extends the Adhikari lab's earlier findings of the hypothalamic dorsal premammillary nucleus' role in defensive behavior. Using cell-type specific calcium imaging, the authors show that the activity of CCK-expressing PMd neurons precedes and predicts escape from both learned and unlearned threats. Optogenetic/chemogenetic inhibition revealed that the PMd-dlPAG pathway contributes to escape vigor. Additionally, optogenetic activation of CCK PMd neurons induces Fos in numerous brain regions implicated in fear and escape behaviors. Last, an analogous hypothalamic-PAG pathway in humans is shown to be activated by aversive images in humans.

      Although these findings are potentially impactful, additional clarification and data are needed to strengthen and streamline the manuscript, as outlined below.

      1) The results of the authors' recent publications (Wang et al Neuron 2021, Reis et al J. Neuro 2021) should be integrated into the manuscript. For example, the rationale for selectively manipulating CCK+ PMd neurons is not stated. Likewise, histological validation that the Cre-dependent GCaMP expression is restricted to CCK+ neurons should be shown or referenced. The authors should also provide discussion as to how the current results integrate with their other recent findings.

      Following the Reviewer’s suggestions, we address these concerns by referencing our previous paper. Cck+ cells were chosen because this marker is expressed in over 90% of PMd cells (Wang et al., 2021), but not in adjacent nuclei (Mickelsen et al., 2020). These cells have also been shown to be important to control escape from innate threats, such as carbon dioxide (Wang et al., 2021). These are the justifications for selecting PMd-cck cells, as discussed in this revised submission. We also reference our prior work to indicate specific expression of GCaMP in PMd cck cells.

      2) The authors used male and female mice in their experiments but there are no analyses of potential sex differences in threat responses or escape vigor. Were there any significant sex differences in the measurements presented in Figure 1? A supplementary figure showing data for male and female mice would be helpful. Also, for Figure 1, please display the individual data points so that the reader can appreciate the variability in the behavioral responses. How many approaches and escapes are observed in each test? What is the average duration of a freezing bout?

      As the results reported in Figure 1 summarize data from a rather large cohort (n=32), we decided it best for clarity's sake to show the variability in behavioral responses as a histogram of the difference scores for each animal (threat - control), now included as Figure 1, Figure Supplement 1, as well as below (Figure R7). Showing 32 individual data points may make the figure difficult to visualize (but of course, we can instead plot these individual points if the Reviewer prefers that instead of the plots shown below). At the Reviewer's request, we have also included the number of approaches and escapes in Figure 1 and the supplement. The average duration of a freezing bout is 2.03s ± 0.15 and is now reported in the Results section. There were no significant sex differences in the Figure 1 measures, and this is stated in the text, as well as plotted below in Figure R8 (male n=17, female n=15; Wilcoxon rank-sum test, p>0.05).

      Figure R7. (Also Figure 1, figure supplement 1) Distribution of the difference scores for threat - control assays. Histograms depict the difference scores for all mice, threat - control, for each behavioral metric in Figure 1. The dotted red line indicates zero, or no difference between threat and control (n=32 mice).

      Figure R8. (Also Figure 1, figure supplement 3) Distribution of the difference scores for threat - control assays for males and females. Histograms depict the difference scores for all mice, threat - control, for each behavioral metric in Figure 1, separately for males (green) and females (purple). The dotted red line indicates zero, or no difference between threat and control (male n=17, female n=15). No significant differences (p>0.05) were found between males and females in any of the metrics plotted.

      3) In Fig. 2, there appears to be sustained activity of CCK+ neurons after the onset of threat approach, and ramping activity preceding stretch-attend. In-depth analysis of these responses may be beyond the scope of this study, but the findings should be discussed since the representation of approach-related behaviors indicates the PMd is involved in more general representation of threat proximity, rather than simply escape vigor.

      We agree with the Reviewer that PMd-activity represents distance to both innate and conditioned threats. We also include new data showing that PMd-dlPAG mutual information increases in the presence of threats (Figure R2 and Figure 9). Taken together, these data show that PMd activity encodes more than just escape vigor. We have altered the text to emphasize these results. These dual-site recordings were done contralaterally, so that dlPAG-syn cell body GCaMP signals are not contaminated by GCaMP-expressing PMd-cck axon terminals in the dlPAG.

      4) The authors state that PMd CCK neuronal activity regulates escape vigor. Although the authors show a correlation of the calcium signal amplitude and escape distance in Fig. 2I, a correlation with escape velocity would be a more convincing measure of vigor.

      PMd-cck neural activity is related to escape speed, as shown by single-cell miniaturized microscopy recordings. Figure 4D shows that PMd ensemble activity can predict escape speed from threats, but not from control stimuli. These results were specific to escape, as PMd activity did not encode approach speed towards threats or control stimuli (Figure 4D). Furthermore, a new analysis showed that a greater fraction of PMd cells have activity significantly correlated with escape speed during exposure to threats than during exposure to control stimuli. Finally, for the cells whose activity is significantly correlated with escape speed, the mutual information between escape speed and df/F is significantly greater for threat than control. This has now been included as Figure 3I-K (same as Figure R9 below).

      Figure R9. A higher fraction of PMd-cck cells are correlated with escape speed during exposure to threats. (Also Figure 3I-J) (A) Traces show the z-scored df/F (blue) and speed (gray) for one cell classified as a speed cell in the rat exposure assay (top) and one non-correlated cell from the toy rat assay (bottom). Individual escape epochs are indicated by red boxes. (B) Bars show the percent of cells that significantly correlate with escape speed. (Fisher's exact test; toy rat: n correlated = 56, n non-correlated = 405; rat: n correlated = 100, n non-correlated = 366; pre-shock: n correlated = 50, n non-correlated = 571; fear retrieval: n correlated = 122, n non-correlated = 391) (C) Bars show the mutual information in bits between escape speed and calcium activity for cells whose signals were significantly correlated with escape speed in (B). (Wilcoxon rank sum test; toy rat n=56, rat n=100; pre-shock n=50, fear retrieval n=122). p<0.001.
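      The comparison in panel B is a 2x2 Fisher's exact test on correlated versus non-correlated cell counts. scipy.stats.fisher_exact provides this test; to keep the sketch dependency-free, the Python version below computes the two-sided p-value directly from hypergeometric probabilities, using the cell counts listed in the caption. The function name is hypothetical and this is an illustration, not the analysis code used for the figure.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k,
        # given fixed row and column totals.
        return comb(row1, k) * comb(n - row1, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(k) for k in range(lo, hi + 1)]
    # Sum every table at least as extreme (no more probable) than the observed one;
    # the small relative tolerance absorbs floating-point ties.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


# Counts from the caption, rows = [correlated, non-correlated] per assay.
tables = {
    "rat vs. toy rat": (100, 366, 56, 405),
    "fear retrieval vs. pre-shock": (122, 391, 50, 571),
}
for name, counts in tables.items():
    print(name, fisher_exact_two_sided(*counts))
```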

      Unfortunately, the lower resolution provided by photometry did not reveal consistent correlations with escape velocity across assays. Despite this lack of single cell resolution, PMd-cck photometry amplitude correlated with escape velocity during exposure to the rat, but not the toy rat, as shown below (Figure R10). However, this result was not replicated in the fear retrieval assay. Taken together, these data show that PMd activity is indeed related to escape vigor.

      Figure R10. Escape speed correlates with PMd-cck photometry amplitude during rat exposure. Bars depict the Spearman r-value of escape speed and PMd-cck photometry df/F (z-scored) amplitude during exposure to rat and toy rat. (n=9 mice) p<0.001.

      5) The changes in prediction error from control to threat contexts in Figs. 4B and 4D are compelling, but the prediction error in the threat context seems high. Can the authors provide a basis for what constitutes a 'good' error score?

      We have now included the chance error, calculated by training and testing the GLM on circularly permuted data across mice and indicated below with a dotted red line in Figure 4 and its supplement. The Methods have also been updated to reflect this new aspect of the analysis. A ‘good’ error would be a value that is significantly lower than the error expected by chance, which is indicated by the red dashed line in Figure R11.

      Fig. R11. (Also Figure 4B, 4D and Figure 4, figure supplement 1) (A) Bars show the mean squared error (MSE) of the GLM-predicted location from the actual location. The MSE is significantly lower for threat than control assays (Wilcoxon signed-rank test; n=9 mice). The dotted red line indicates chance error, calculated by training and testing the GLM on circularly permuted data. Only threat assay error was significantly lower than chance (Wilcoxon signed-rank test; rat p<0.001, fear retrieval p=0.003). (B) Bars depict the MSE of the GLM-predicted velocity away from (left) and towards (right) the threat. The GLM more accurately decodes threat than control velocities for samples in which the mice move away from the threat (top). Only threat assay error was significantly lower than chance (Wilcoxon signed-rank test; rat p=0.004, fear retrieval p=0.012). (C) Bars depict the mean squared error of the GLM-predicted speed. The GLM more accurately decodes threat than control speeds. Only threat assay error was significantly lower than chance (rat p<0.020, fear retrieval p=0.040). (Wilcoxon signed-rank test; n=9 mice) p<0.01.
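      The chance level described above comes from refitting and re-evaluating the decoder after circularly shifting the behavioral variable relative to the neural signal, which destroys their alignment while preserving each signal's autocorrelation. The Python sketch below shows the idea with a one-predictor least-squares fit evaluated in-sample; this is a deliberate simplification (the manuscript uses a GLM trained and tested across mice), and the function names, shift range and synthetic data are hypothetical.

```python
import random
from statistics import fmean


def ols_mse(x, y):
    """In-sample mean squared error of a one-predictor least-squares fit of y on x."""
    mx, my = fmean(x), fmean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return fmean((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))


def circular_shift_chance_mse(x, y, n_shifts=200, min_shift=1, seed=0):
    """Average MSE after circularly shifting y relative to x and refitting."""
    n = len(y)
    if not 0 < min_shift <= n - min_shift:
        raise ValueError("min_shift must leave at least one valid shift")
    rng = random.Random(seed)
    errors = []
    for _ in range(n_shifts):
        k = rng.randint(min_shift, n - min_shift)  # never 0 or n, so y really moves
        shifted = y[k:] + y[:k]                    # keeps y's own autocorrelation
        errors.append(ols_mse(x, shifted))
    return fmean(errors)


# Synthetic autocorrelated "neural" signal (AR(1)) and a variable it predicts well.
rng = random.Random(1)
x = [0.0]
for _ in range(499):
    x.append(0.95 * x[-1] + rng.gauss(0, 1))
y = [2 * v + rng.gauss(0, 0.1) for v in x]
real = ols_mse(x, y)
chance = circular_shift_chance_mse(x, y, min_shift=50)
print(f"MSE {real:.3f} vs. chance {chance:.3f}")
```

      Choosing a minimum shift longer than the signals' autocorrelation time keeps shifted copies from trivially realigning with the original, so a "good" error is one that falls clearly below this shuffled baseline.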

      6) Off-target effects are a potential concern at the dose of CNO used (5 mg/kg). For example, the increased approach speed with CNO in the YFP control group (Fig. 5D) may be a result of the high CNO dose. How was the dose of CNO selected?

      This dose was selected based on our experience using the same dose to study PMd-cck cells in our previous Neuron paper. It is also a commonly used dose. Several recent neuroscience papers published in this journal, eLife, use this exact dose of CNO (Chen et al., 2016; Halbout et al., 2019; Ito et al., 2020; Kwak and Jung, 2019; Li et al., 2020; Mukherjee et al., 2021; O’Hare et al., 2017; Patel et al., 2019).

      Although in this particular case approach velocity trended higher after CNO treatment, this is not a consistent result. We ran another cohort of control mice (n=9 saline, 9 CNO 5 mg/kg) and show that no such trend in approach velocity to the shock grid was observed during fear retrieval (Figure R12).

      Fig. R12. CNO has no effect on approach velocity in a separate control cohort. The experimental protocol was performed as described in Figure 1B for a control cohort. For this group, CNO injection had no significant effect on approach speed (Wilcoxon signed-rank test, n=9).

      7) Given the visible trends in the data, the number of animals used in Fig. 6B is insufficient to make conclusions about the behavioral effect of optogenetic excitation of PMd CCK neurons. Either more animals should be added, or the analysis should be limited to the Fos staining.

      At the Reviewer's request, we have increased the number of animals in this analysis and found the results unchanged. Figure 6B has been replaced in the main manuscript (same as Figure R13 below). The addition of these new animals also erased the previous non-significant trends seen with fewer animals.

      Figure R13. (Also Figure 6B) Delivery of blue light increases speed in PMd-cck ChR2 mice, but not stretch-attend postures or freeze bouts. (PMd-cck YFP n=6, PMd-cck ChR2 n=8; Wilcoxon rank-sum test).

    1. Author Response

      Reviewer #1 (Public Review):

      The actual description of the methods does not allow the reader to evaluate the precision of two important processing steps. First, rCBF measures are supposed to be restricted to the cortex, but given the pCASL image spatial resolution, partial volume effects with white matter probably exist, especially in younger infants. Furthermore, segmenting tissues on the basis of anatomical images (especially T1-weighted) is complicated in the first postnatal year. As rCBF measurements are very different between grey and white matter, the performed procedure might impact the measures at each age, or even lead to a systematic bias on age-dependent changes. Second, the methodology and accuracy of the brain registration across infants are little detailed whereas it is a challenging aspect given the intense brain growth and folding, the changing contrast in T1w images at these ages, and the importance of this step to perform reliable voxelwise comparison across ages.

      We thank the reviewer for this comment. We have added more detail to the Methods to address it.

      Briefly, an individual rCBF map was generated in each infant's native space. It was calibrated by phase-contrast MRI to minimize individual variation in processing parameters such as the T1 of arterial blood (Aslan et al., 2010). Cortical segmentation was also conducted in individual space. The images, including the rCBF map and the gray matter segmentation probability map, were then normalized from individual space into the template space. An averaged gray matter probability map was generated after inter-subject normalization.

      We tested multiple thresholds on the averaged gray matter probability map. We used a 40% probability threshold to generate the binary gray matter mask shown in the left panel of Figure R1 below. This threshold minimized contamination from white matter and CSF while preserving the continuity of the cortical gray matter mask across the cerebral cortex.

      As the reviewer rightly pointed out, T1-weighted images of younger infants have poor contrast, which leads to poor cortical segmentation. We compensated for this by using the averaged cortical mask and by measuring rCBF in the template space. The right three panels of Figure R1 show that the rCBF measure within the cortical mask in the template space is consistent across ages. This allows accurate and reliable voxelwise comparison across age.
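      The mask construction described above can be sketched as follows. The sketch assumes that each infant's gray matter probability map has already been normalized to the template space and stored as a NumPy array. The function names are illustrative only. The 40% threshold is the value reported in the response.

```python
import numpy as np

def group_gray_matter_mask(prob_maps, threshold=0.4):
    """Average template-space gray matter probability maps and binarize the mean."""
    mean_prob = np.stack(prob_maps, axis=0).mean(axis=0)
    return mean_prob >= threshold

def mean_rcbf_in_mask(rcbf_map, mask):
    """Mean rCBF over voxels inside the binary gray matter mask."""
    return float(rcbf_map[mask].mean())
```

Every infant's rCBF map is then sampled with the same group mask. This is what makes the per-subject measurements comparable across ages.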

      Figure R1. The gray matter mask and segmented cortical mask overlaid on the rCBF maps of three representative infants aged 3, 6, and 20 months in the template space. The gray matter mask in the left panel was created to minimize contamination from white matter and CSF while preserving the continuity of the cortical gray matter mask across the cerebral cortex. The contour of the gray matter mask is highlighted with a blue line.

      The authors achieved their aim in showing that the rCBF increase differs across brain regions (the DMN showing intense changes compared to the visual and sensorimotor networks). Nevertheless, an analysis of covariance (instead of an ANOVA) including the infants' age as covariate (in addition to the brain region) would have allowed them to evaluate the interaction between age and region (i.e. different slopes of age-related changes across regions) in a more rigorous manner. Regarding the evaluation of the coupling between physiological (rCBF) and functional connectivity measures, the results only partly support the authors' conclusion. Actually, both measures strongly depend on the infants' age, as the authors highlight in the first parts of the study. Thus, considering this common age dependency would be required to show that the physiological and connectivity measurements are specifically related and that there is indeed a coupling.

      We thank the reviewer for this comment. Following the reviewer’s suggestion, we conducted an analysis of covariance (ANCOVA) with age as a covariate. We found a significant interaction between region and age (F(6, 322) = 2.45, p < 0.05). This ANCOVA result is consistent with Figure 3c, which shows differential rates of rCBF increase across brain regions. The ANCOVA result was added to the last paragraph of the Results section “Faster rCBF increases in the DMN hub regions during infant brain development”.
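      The region × age interaction can be illustrated as a comparison of two nested linear models. The reduced model has a region-specific intercept and a common age slope. The full model adds region-specific age slopes. The 6 numerator degrees of freedom in F(6, 322) suggest seven regions, so the example uses seven. That region count, the coding scheme and the function names are assumptions for illustration. In practice a statistics package would normally be used.

```python
import numpy as np

def design(region, age, n_regions, interaction):
    """Region dummies (no separate intercept) plus a common age slope.
    If interaction=True, add an extra age slope for every non-reference region."""
    cols = [(region == r).astype(float) for r in range(n_regions)]
    cols.append(age.astype(float))
    if interaction:
        # region 0 is the reference; the other regions get slope offsets
        cols += [(region == r) * age for r in range(1, n_regions)]
    return np.column_stack(cols)

def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def interaction_F(region, age, y, n_regions):
    """F statistic (and dfs) for the region x age interaction, i.e. unequal age slopes."""
    X_red = design(region, age, n_regions, interaction=False)
    X_full = design(region, age, n_regions, interaction=True)
    rss_red, rss_full = rss(X_red, y), rss(X_full, y)
    df_num = X_full.shape[1] - X_red.shape[1]
    df_den = len(y) - X_full.shape[1]
    F = ((rss_red - rss_full) / df_num) / (rss_full / df_den)
    return F, df_num, df_den
```

A large F means the age slopes differ across regions. This corresponds to the "differential rCBF increase rates" shown in Figure 3c.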

      Regarding the evaluation of the coupling between physiological (rCBF) and functional connectivity (FC) measures, Figure 5 and Figure 5–figure supplements 1 and 2 were generated precisely to test this point. They test whether the FC-rCBF coupling, which is specifically localized in the DMN, could be explained by mutual age dependency.

      Figure 5B uses the correlation method shown in Figure 5-figure supplement 1, and the significant correlations it reveals cluster only in the DMN regions. We also conducted nonparametric permutation tests with 10,000 permutations. These tests are sensitive and effective, and Figure 5c shows significant coupling only in the DMN regions. If the coupling were driven by mutual age dependency, Figure 5c would also show significant coupling in regions of the Vis and SM networks.
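      A permutation test of an across-subject correlation can be sketched as below. The function names are hypothetical. Pearson correlation is an assumption, since the response does not name the coefficient. The optional `residualize` step removes the linear effect of age from both measures before the test. It is shown only as one way to address the shared age dependency raised by the reviewer and is not a step described in the response.

```python
import numpy as np

def residualize(v, covariate):
    """Remove the linear effect of a covariate (e.g. age) from v."""
    A = np.column_stack([np.ones(len(covariate)), covariate])
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ coef

def permutation_corr_test(x, y, n_perm=10_000, rng=None):
    """Two-sided permutation p-value for the Pearson correlation between x and y."""
    rng = np.random.default_rng(rng)
    r_obs = np.corrcoef(x, y)[0, 1]
    count = 0
    for _ in range(n_perm):
        # shuffling y across subjects destroys any true x-y pairing
        r_perm = np.corrcoef(x, rng.permutation(y))[0, 1]
        if abs(r_perm) >= abs(r_obs):
            count += 1
    # add-one correction keeps the p-value strictly positive
    p = (count + 1) / (n_perm + 1)
    return float(r_obs), float(p)
```

If two measures are related only through a shared dependence on age, the correlation of their age residuals is expected to shrink toward zero.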

    1. Author Response:

      Reviewer #2:

      In Zhang et al.'s paper, the authors used 7T fMRI with different face parts as stimuli to explore the functional organization within face-specific areas. They found consistent patterns across subjects in rFFA and rOFA. In these areas, the posterior region was biased to eyes, and the anterior region was biased to mouths. To exclude potential confounds, they also ran several control experiments. These show that the preference for eyes and mouths is not due to eccentricity or to an upper-lower visual field preference. Based on these findings, they claim that there is a finer-scale functional organization within the face areas.

      In general, I think the whole study is carefully designed, and the results are solid and interesting. However, I am not very comfortable with the claim about the organization of the face areas. Typically, when we talk about organization, there are either more than two subdivisions or a continuous representation of certain features. In this paper, the results mainly concern the comparison between two face parts, and the authors failed to find other distinct subareas showing a preference for other face parts. Therefore, I would suggest that the authors tone down their claim from functional organization to functional preference.

      We have followed the reviewer’s advice to tone down the claim of functional organization in our manuscript. We now use the term “spatial tuning of face parts” in the manuscript. This term emphasizes two things: the functional preferences for different face parts within face-selective regions, and the consistent spatial profile across individuals.

      Reviewer #3:

      Zhang and colleagues investigated the spatial distribution of feature tuning for different face parts within face-selective regions of human visual cortex using ultra-high-resolution 7.0 T fMRI. They compared the response patterns elicited by images of face parts (hair, eyes, nose, mouth and chin) with those elicited by whole faces. They report a spatial pattern of tuning for eyes and mouth along the posterior-anterior axis of both the pFFA and OFA. Within the pFFA, this pattern of spatial tuning appeared to track the orientation of the mid-fusiform sulcus, an anatomical landmark for face processing in ventral temporal cortex. Two additional control experiments were conducted to examine the robustness of the original findings and to rule out potentially confounding variables. These data are consistent with recent evidence for similar face-part tuning in the OFA. They add to the growing body of work showing topographical mapping of feature-based tuning within visual cortex.

      The conclusions of this paper are mostly supported by the data, but some aspects of the data acquisition, analysis and interpretation require further clarification or consideration.

      1) It is currently unclear whether the current data are in full agreement with recent work (de Haas et al., 2021) showing similar face-part tuning within the OFA (or IOG) bilaterally. The current data suggest that feature tuning for eye and mouth parts progresses along the posterior-anterior axis within the right pFFA and right OFA. In this regard, the data are consistent. But de Haas and colleagues also demonstrated tuning for visual space that was spatially correlated (i.e. upper visual field representations overlapped upper face-part preferences and vice-versa). The current manuscript found little evidence for this correspondence within pFFA but does not report the data for OFA. For completeness this should be reported and any discrepancies with either the prior, or between OFA and pFFA discussed.

      In the current study, three participants had data from both the retinotopic mapping and the face part mapping experiments. Consistent and robust part clustering was found in the right pFFA and right OFA. Following the reviewer’s suggestion, we analyzed these data for the right OFA. The spatial patterns of eyes vs. mouths are similar to the patterns of visual field sensitivity along the vertical direction (i.e., upper to lower visual field), consistent with de Haas and colleagues’ findings. Note that we used a more precise functional localization of the OFA. In contrast, de Haas et al.’s analysis was based on the anatomically defined IOG, of which the OFA is a part. We have added this result to the Results section (Page 16) and added Figure 4-figure supplement 1.

      2) It is somewhat challenging to fully interpret the responses to face parts when they were presented at fixation and not in their typical visual field locations during real-world perception. For instance, we typically fixate faces either on or just below the eyes (Peterson et al., 2012). In the current experiment, the eyes are therefore in the typical viewing position, but the remaining face parts are not. For example, when fixating the eyes, the nose, mouth and chin all fall in the lower visual field, but in the current experimental paradigm they appear at fixation. The authors should consider whether the reported face-part tuning would hold (or even be enhanced) if face parts were presented in their typical locations.

      Early visual cortex and some object-selective visual areas are sensitive to visual field location. To dissociate visual field tuning from face part tuning in face-processing regions, the face part stimuli in the main experiment were presented at fixation. This avoided any confounding contribution from visual field location. A spatial correlation between face part tuning and visual field tuning has been observed in the posterior part of the face network. However, it is unlikely that presenting the face parts at fixation was responsible for the observed face part tuning.

      To directly test the role of stimulus location, we reanalyzed the data from control experiment 2, in which face parts were presented at their typical locations. We contrasted eyes above fixation with nose and mouth below fixation. This contrast revealed a similar anterior-posterior bias in the right pFFA, showing that face part tuning in the right pFFA is invariant to the visual field location of the stimuli. See the comparison in the figure below. Note that the maps of eyes on top vs. nose and mouth on bottom are unsmoothed.

      3) Although several experiments (including two controls) have been conducted, each one runs the risk of being underpowered (n ranges 3-10). One way to add reassurance when sample sizes are small is to include analyses of the reliability and replicability of the data within subjects through a split-half, or other cross-validation procedure. The main experiment here consisted of eight functional runs, which is more than sufficient for these types of analyses to be performed.

      Following the reviewer’s suggestion, we split the eight runs of data from each participant in the main experiment into two data sets (odd runs and even runs). We estimated the eyes-mouth biases within each data set. We then calculated the correlation of these biases across voxels between the two data sets to estimate the reliability of the results in the right pFFA. The results demonstrate strong within-participant reliability. We have added these results to the Results section (Page 7 and Figure 2-figure supplement 1).
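      The odd/even split-half procedure described above can be sketched as follows. The sketch assumes per-run response estimates for the eye and mouth conditions, stored as (runs × voxels) arrays. Pearson correlation across voxels is also an assumption. The function name is illustrative.

```python
import numpy as np

def split_half_reliability(eyes, mouth):
    """eyes, mouth: arrays of shape (n_runs, n_voxels) with per-run responses.
    Returns the across-voxel Pearson r between odd-run and even-run eye-mouth biases."""
    # 0-based slicing: [0::2] selects runs 1, 3, 5, 7 and [1::2] selects runs 2, 4, 6, 8
    bias_odd = eyes[0::2].mean(axis=0) - mouth[0::2].mean(axis=0)
    bias_even = eyes[1::2].mean(axis=0) - mouth[1::2].mean(axis=0)
    return float(np.corrcoef(bias_odd, bias_even)[0, 1])
```

A high correlation means that the voxelwise eye-versus-mouth map is reproduced by independent halves of the data within a participant.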

      4) The current findings were only present within the right pFFA and right OFA. Although right lateralisation of face-processing is mentioned in the discussion, this is only cursory. A more expansive discussion of what such a face-part tuning might mean for our understanding of face-processing is warranted, particularly given that the recent work by de Haas and colleagues was bilateral.

      Right lateralization of face processing has been observed in the face-selective network. Both the neural selectivity to faces (Kanwisher et al., 1997) and the decodable neural information about faces (Zhang et al., 2015) are higher in the right than in the left hemisphere. Face part tuning is clustered, with a consistent spatial pattern across individuals, in the right but not the left face-selective regions. This may provide a computational advantage that contributes to the right lateralization of face processing. Clustering of neurons with similar feature tuning has been found extensively in the ventral pathway and may support more efficient neural processing. Therefore, one of the neural mechanisms underlying the functional lateralization of face processing could be the spatial clustering of face part tuning in the right hemisphere. We have added more discussion of the relationship between our results and the lateralization of face processing.

    1. Author Response

      Reviewer #1 (Public Review):

      Briggs et al use a combination of mathematical modelling and experimental validation to tease apart the contributions of metabolic and electronic coupling to the pancreatic beta cell functional network. A number of recent studies have shown the existence of functional beta cell subpopulations, some of which are difficult to fully reconcile with established electrophysiological theory. More generally, the contribution of beta cell heterogeneity (metabolism, differentiation, proliferation, activity) to islet function cannot be explained by existing combined metabolic/electrical oscillator models. The present studies are thus timely in modelling the islet electrical (structural) and functional networks. Importantly, the authors show that metabolic coupling primarily drives the islet functional network, giving rise to beta cell subpopulations. The studies, however, do not diminish the critical role of electrical coupling in dictating glucose responsiveness, network extent as well as longer-range synchronization. As such, the studies show that islet structural and functional networks both act to drive islet activity, and that conclusions on the islet structural network should not be made using measures of the functional network (and vice versa).

      Strengths:

      • State-of-the-art multi-parameter modelling encompassing electrical and metabolic components.

      • Experimental validation using advanced FRAP imaging techniques, as well as Ca2+ data from relevant gap junction KO animals.

      • Well-balanced arguments that frame metabolic and electrical coupling as essential contributors to islet function.

      • Likely to change how the field models functional connectivity and beta cell heterogeneity.

      Weaknesses:

      • Limitations of FRAP and electrophysiological gap junction measures not considered.

      • Limitations of Cx36 (gap junction) KO animals not considered.

      • Accuracy of citations should be improved in a few cases.

      We thank reviewer 1 for their positive comments, including the many strengths in the approaches, arguments and impact. We do note the weaknesses raised by the reviewer and have addressed them following the comments below.

      We would also like to note that when we refer to metabolic activity driving the functional network, we are not referring to metabolic coupling between beta cells. Rather, we mean the following. Two cells that both show high levels of metabolic activity (glycolytic flux), or that show similar levels of metabolic activity, will show increased synchronization, and thus a functional network edge. This effect is stronger than for cells that merely have elevated gap junction conductance. Increased metabolic activity would likely generate larger depolarizing currents, providing an increased coupling current to drive synchronization. Similar metabolic activity would mean that a given coupling current could more readily drive synchronized activity. We have substantially rewritten the manuscript to clarify this point.

      Reviewer #2 (Public Review):

      In their present work, Briggs et al. combine biophysical simulations and experimental recordings of beta cell activity with analyses of functional network parameters to determine the role played by gap-junctional coupling, metabolism, and KATP conductance in defining the functional roles that the cells play in the functional networks, assess the structure-function relationship, and to resolve an important current open question in the field on the role of so-called hub cells in islets of Langerhans.

      Combining differential equation-based simulations on 1000 coupled cells with demanding calcium, NAD(P)H, and FRAP imaging, as well as with advanced network analyses, is an achievement in its own right and a major methodological strength. The same holds for comparing the network metrics with simulated and experimentally determined properties. The findings have the potential to help resolve the issue of the importance of hub cells in beta cell networks. The methodological pipeline and data may also prove invaluable for other researchers in the community.

      However, methodologically, functional networks may be based on different types of calcium oscillations present in beta cells. These are fast oscillations produced by bursts of electrical activity, slow oscillations produced by metabolic/glycolytic oscillations, or a mixture of both. At present, the authors base the network analyses on fast oscillations only in the case of simulated traces, and on a mixture of fast and slow oscillations in the case of experimental traces. Different networks may depend on the studied beta cell properties to different extents. For example, fast oscillation-based networks may depend more strongly on electrical properties, and slow oscillation-based networks more strongly on metabolic properties. In drawing conclusions, it is therefore important that the authors separately address the influence of a cell's electrical and metabolic properties on its functional role in networks based on fast oscillations, slow oscillations, or a mixture of both.

      We thank reviewer 2 for their positive comments, including addressing the importance of this study as it pertains to islet biology and acknowledging methodological complexities of this study. We also thank the reviewer for their careful reading and providing useful comments. We have integrated each comment into the manuscript. Most importantly, we have now extended our analysis to both fast and slow oscillations by incorporating an additional mathematical model of coupled slow oscillations and performing additional experimental analysis of fast, slow, and mixed oscillations.
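      Separating a calcium trace into fast and slow components before building networks can be sketched with a simple FFT band split. FFT masking is a swapped-in technique chosen for brevity and is not necessarily the authors' filtering method. The 120 s boundary period sits between the "<1 min" fast and ~5 min slow periods mentioned in this exchange, but it is an assumption. The sampling interval and function name are also hypothetical.

```python
import numpy as np

def split_fast_slow(trace, dt, cutoff_period=120.0):
    """Split a calcium trace into slow and fast components with an FFT mask.
    dt: sampling interval in seconds; cutoff_period: assumed boundary period in seconds."""
    spectrum = np.fft.rfft(trace - trace.mean())
    freqs = np.fft.rfftfreq(trace.size, d=dt)
    slow_mask = freqs < 1.0 / cutoff_period  # frequencies below the cutoff count as slow
    slow = np.fft.irfft(spectrum * slow_mask, n=trace.size)
    fast = np.fft.irfft(spectrum * ~slow_mask, n=trace.size)
    return slow, fast
```

Functional networks can then be built separately from the `slow` and `fast` components, or from the unfiltered (mixed) trace. A sharp spectral mask can ring on real, non-periodic recordings, so a smoother band-pass filter may be preferable in practice.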

      Reviewer #3 (Public Review):

      Over the past decade, novel approaches to understanding beta cell connectivity and how that contributes to the overall function of the pancreatic islet have emerged. The application of network theory to beta cell connectivity has been an extremely useful tool to understand functional hierarchies amongst beta cells within an islet. This helps to provide functional relevance to observations from structural and gene expression data that beta cells are not all identical.

      There are a number of "controversies" in this field that have arisen from the mathematical and subsequent experimental identification of beta "hub" cells. These are small populations of beta cells that are very highly connected to other beta cells, as assessed by applying correlation statistics to individual beta cell calcium traces across the islet.
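      The correlation-based construction of a functional network, and the identification of highly connected cells, can be sketched as follows. Several choices here are assumptions and are not the thresholds used in the manuscript. An edge is drawn when the pairwise Pearson correlation of two calcium traces exceeds a fixed threshold (0.9 here). Degree is normalized by the largest degree in the islet. Hubs are cells at or above 60% of that maximum. All function names are hypothetical.

```python
import numpy as np

def functional_network(traces, r_threshold=0.9):
    """traces: (n_cells, n_timepoints) calcium signals.
    Returns a boolean adjacency matrix with an edge where pairwise Pearson r >= r_threshold."""
    r = np.corrcoef(traces)
    adj = r >= r_threshold
    np.fill_diagonal(adj, False)  # no self-edges
    return adj

def normalized_degree(adj):
    """Degree of each cell divided by the maximum degree in the network."""
    deg = adj.sum(axis=1)
    return deg / deg.max() if deg.max() > 0 else deg.astype(float)

def hub_cells(adj, cutoff=0.6):
    """Indices of cells whose normalized degree is at or above a (hypothetical) cutoff."""
    return np.flatnonzero(normalized_degree(adj) >= cutoff)
```

Because edges come only from correlated activity, this network is purely functional. It makes no use of which cells are physically adjacent or gap-junction coupled.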

      In this paper Briggs et al set out to answer the following areas of debate:

      They use computational datasets based on established models of beta cells acting in concert (electrically coupled) within an islet-like structure. With these, they show that it is similarity in metabolic parameters, rather than "structural" connections (i.e., proximity, which subserves gap junction coupling), that drives functional network behaviour. Whilst the computational models are quite relevant, their parameters (e.g., connectivity coefficients) are quite different from what is measured experimentally, which confirms the limitations of this model. It was therefore important for the authors to back up this finding by performing both calcium and metabolic imaging of islet beta cells. These experimental data are reported to confirm that metabolic coupling was more strongly related to functional connectivity than gap junction coupling. However, a limitation here is that the metabolic imaging data confirmed a strong link between disconnected beta cells and low metabolic coupling but did not robustly show the opposite. Similarly, I was not convinced that the FRAP studies, which indirectly measured GJ ("structural") connections, were powered well enough to be related to measures of beta cell connectivity.

      The group goes on to provide further analytical and experimental data with a model of increasing loss of GJ connectivity. They calcium-imaged islets from WT, heterozygous (50% GJ loss), and homozygous (100% GJ loss) animals. Given the earlier conclusion that metabolic, not GJ, connectivity drives small-world network behaviour, it was surprising to see such a great effect on the loss of hubs in the homozygous islets. That said, the analytical approaches in this model did help the authors confirm that the loss of gap junctions does not alter the preferential existence of beta cell connectivity. They also confirm the important contribution of metabolic "coupling". One can perhaps therefore conclude that there are two types of network behaviour in an islet (maybe more). The field should move towards an understanding of overlapping network communities, as has been done in brain networks.

      Overall, this is an extremely well-written paper which was a pleasure to read. This group has neatly and expertly provided both computational and experimental data to support the notion that metabolic coupling, but not "structural" (i.e., GJ) coupling, drives our observations of hubs and functional connectivity. However, there is still much work to do. We need to understand whether this metabolic coupling is just a random epiphenomenon or somehow fated. We also need to understand the extent to which other elements of "structural" coupling are also important, such as the presence of other endocrine cell types, the spatial distribution of paracrine hormone receptors, blood vessels and nerve terminals.

      We thank reviewer 3 for their positive comments, including on the methodology, the writing style, and the importance of this paper to the broader islet community. We also thank the reviewer for their very in-depth and helpful comments. We have addressed each comment below and made significant changes to the manuscript accordingly. We conducted more FRAP experiments and separated the results into slow, fast, and mixed oscillations. We included analysis of an additional computational model that simulates slow calcium oscillations. Additionally, we substantially rewrote the paper to clarify that we are not referring to metabolic coupling, and to discuss the broader implications of network theory and our findings.

      Reviewer #4 (Public Review):

      This manuscript describes a complex, highly ambitious set of modeling and experimental studies that appear designed to compare the structural and functional properties of beta cell subpopulations within the islet network in terms of their influence on network synchronization. The authors conclude that the most functionally coupled cell subpopulations in the islet network are not those that are most structurally coupled via gap junctions but those that are most metabolically active.

      Strengths of the paper include (1) its use of an interdisciplinary collection of methods including computer simulations, FRAP to monitor functional coupling by gap junctions, the monitoring of Ca2+ oscillations in single beta cells embedded in the network, and the use of sophisticated approaches from probability theory. Most of these methods have been used and validated previously. Unfortunately, however, it was not clear what the underlying premise of the paper actually is, despite many stated intentions, nor what about it is new compared to previous studies, an additional weakness.

      Although the authors state that they are trying to answer 3 critical questions, it was not clear how important these questions are in terms of significance for the field. For example, they state that a major controversy in the field is whether network structure or network function mediates functional synchronization of beta cells within the islet. However, this question is not much debated. As an example, while it is known that there can be long-range functional coupling in islets, no workers in the field believe there is a physical structure within islets that mediates this, unlike the case for CNS neurons that are known to have long projections onto other neurons. Beta cells within the islets are locally coupled via gap junctions, as stated repeatedly by the authors but these mediate short-range coupling. Thus, there are clearly functional correlations over long ranges but no structures, only correlated activity. This weakness raises questions about the overall significance of the work, especially as it seems to reiterate ideas presented previously.

      We thank reviewer 4 for their positive comments, including on our multidisciplinary use of mathematical models and experimental imaging techniques. We have now included an additional model of slow oscillations (the Integrated Oscillator Model) to strengthen our conclusions. We also thank reviewer 4 for the insightful comments. We have carefully reviewed each comment and made significant changes to the manuscript accordingly. In particular, we have significantly rewritten the introduction and discussion to clarify what is new in our manuscript and what has been shown previously.

      We agree with the reviewer’s sentiment that there is little debate over whether, for example, physical structures within the islet mediate long-range functional connections. However, there is current debate over whether functional beta-cell subpopulations can dictate islet dynamics (see [11]–[13]). This debate can be framed by asking whether these functional subpopulations emerge from the islet due to physical connections (the structural network) or something more nuanced (such as intrinsic dynamics). We have reframed the introduction and discussion to clarify this debate and to state the premise of the paper more clearly.

      Specific Comments

      1). The authors state it is well accepted that the disruption of gap junctional coupling is a pathophysiological characteristic of diabetes, but this is not an opinion widely accepted by the field, although it has been proposed. The authors should scale back on such generalizations, or provide more compelling evidence to support such a claim.

      Thank you for pointing this out. We have provided more specific citations and changed the wording from “well accepted” to “has been documented”. See Discussion page 13 lines 415-416.

      2) The paper relies heavily on simulations performed using a version of the model of Cha et al. (2011). This is a reasonable model of fast bursting (e.g., oscillations with periods <1 min). However, the Ca2+ oscillations that the authors recorded and showed in Fig. 2b of the manuscript are slow oscillations with periods of 5 min, not <1 min, which is a weakness of the model in the current context. Furthermore, the model outputs shown lack well-known characteristics of real islets, such as fast spiking on prolonged plateaus. This can be seen by comparing the simulated oscillations in Fig. 1d with those in Fig. 2b. It is recommended that the simulations be repeated using a more appropriate model of slow oscillations, or at least using the model of Cha et al. configured to simulate slower bursting.

      The reviewer raises an important point and caveat associated with our simulated model and experimental data. This point was also made by other reviewers, and a similar response to this comment can be found elsewhere in response to reviewer 2 point 6. To address this comment, we have performed several additional experiments and analyses:

      1) We collected additional data in islets showing pure slow, pure fast, or mixed oscillations. These were Ca2+ data, to identify the functional network and hubs, and FRAP data, to assess gap junction permeability. We generated networks based on each time scale to compare with the FRAP gap junction permeability data. The conclusions of our first draft were consistent across all oscillation types. There was no relationship between gap junction conductance, as approximated using FRAP, and normalized degree for slow (Figure 3j), fast (Figure 3 Supp 1d,e), or mixed (Figure 3 Supp 1g,h) oscillations. We also discuss these conclusions in the manuscript. See Results page 7 lines 184-186 and lines 188-191, and Discussion page 12 lines 357-360.

      2) We also performed additional simulations with a coupled ‘Integrated Oscillator Model’ (IOM), which shows slow oscillations driven by metabolic oscillations (Figure 2). We compared connectivity with gap junction coupling and underlying cell parameters. In this case, there is an association between functional and structural networks, with highly-connected hub cells showing higher gap junction conductance (Figure 2f) but also lower KATP channel conductance (gKATP) (Figure 2e). However, these findings come with some caveats: given the nature of the IOM, we were limited to simulating smaller islets (260 cells), and less heterogeneity in the calcium traces was observed. Additional analysis suggests that the greater association between functional and structural networks in this model resulted from the smaller islet size, and that the association depended on the threshold, unlike in the Cha-Noma fast oscillator model, where it was robust. These limitations and results are discussed further (Discussion page 11 lines 344-354).

      Additionally, in the IOM, the underlying cell dynamics of highly-connected hub cells are differentiated by KATP channel conductance (gKATP), whereas in the fast oscillator model they are differentiated by metabolism (kglyc). However, this difference between models can be linked to differences in the way duty cycle is influenced by gKATP and kglyc (Figure 1h, Figure 2g). In each model there was a similar association between duty cycle and highly-connected hub cells. We also discuss these findings (Discussion page 11 lines 334-343).

      Overall, these results and the related discussion of the coupled IOM oscillator model can be found in Figure 2, Results page 6 lines 128-156 and Discussion page 11 lines 332-354.

      3) Much of the data analyzed, whether obtained via simulation or through experiment, seems to produce very small differences in the actual numbers obtained, as can be seen in the bar graphs shown in Figs. 1e,g for example (obtained from simulations), or Fig. 2j (obtained from experimental measurements). The authors should comment as to why such small differences are often seen as a result of their analyses throughout the manuscript and why also in many cases the observed variance is high. Related to the data shown, very few dots are shown in Figs. 1e,g or Figs. 4e,h even though these points were derived from simulations where 100s of runs could be carried out and many more points obtained for plotting. These are weaknesses unless specific and convincing explanations are provided.

      We thank the reviewer for these comments, which are similar to those of reviewer 2 (point 4) and reviewer 3 (point 6). Indeed, there is some variability between cells, in both simulations and experiments, in the metabolic activity of hubs and non-hubs. This variability suggests that factors beyond kglyc alone may help determine hubs, including a minor role for the gap junction (structural) network and potentially cell position and other intrinsic factors. We now discuss this point (see Discussion page 12 lines 364-366).

      The differences between hubs and non-hubs appear small because the absolute value of kglyc is very small. For Figure 1e, the average kglyc for non-hubs was 1.26 × 10⁻⁴ s⁻¹ (approximately the average of the whole distribution, because most cells are non-hubs), while the average kglyc for hubs was 1.4 × 10⁻⁴ s⁻¹, about half a standard deviation higher. The paired t-test is insensitive to the small absolute magnitude of kglyc, because it compares hubs and non-hubs within each islet.

      For simulation data, each of the 5 dots corresponds to one simulated islet, averaged over 1000 cells (or 260 cells for the coupled IOM). Generating such data is computationally expensive, so it is not feasible to conduct 100s of runs. Again, we note that the comparisons between hubs and non-hubs are paired, and we find statistically significant differences for kglyc in Figure 1 using only 5 paired data points. That we find these differences indicates a substantial difference between hubs and non-hubs. This is further supported by all effect sizes being greater than 0.8 for all significantly different findings (Cha-Noma: kglyc 2.85, gcoup 0.82; IOM: gKATP 1.27, gcoup 2.94). We have included these effect sizes in the Figure 1 and 2 captions (pages 34, 36).
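      For readers who want to reproduce this kind of islet-level comparison, the following is a minimal Python sketch, assuming NumPy and SciPy. It runs a paired t-test on one value per simulated islet (hubs vs non-hubs, paired by islet) and computes a paired-samples effect size. The response does not state which effect-size formula was used; Cohen's d_z (mean of the paired differences divided by their standard deviation) is assumed here, and the function name `paired_hub_comparison` is hypothetical.

```python
import numpy as np
from scipy import stats


def paired_hub_comparison(hub_means, nonhub_means):
    """Paired comparison of hubs vs non-hubs across simulated islets.

    hub_means, nonhub_means: one value per islet (e.g. islet-averaged kglyc),
    listed in the same islet order so that entries are paired.
    Returns (t statistic, two-sided p-value, Cohen's d_z).
    """
    hub = np.asarray(hub_means, dtype=float)
    nonhub = np.asarray(nonhub_means, dtype=float)
    diff = hub - nonhub
    # Paired t-test: tests whether the mean within-islet difference is zero
    t_stat, p_value = stats.ttest_rel(hub, nonhub)
    # Assumed effect-size definition (Cohen's d_z): mean difference / SD of differences
    d_z = diff.mean() / diff.std(ddof=1)
    return t_stat, p_value, d_z
```

      Other definitions (for example Cohen's d with a pooled standard deviation) give different numerical values, so the definition should be stated alongside the reported effect sizes.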

      To consider all of the available data rather than islet-wide averages, we created kernel density estimates of kglyc for hubs and non-hubs by concatenating every single cell from each of the five islets. A Kolmogorov-Smirnov test (kstest) shows a highly significant difference (P < 0.0001) between these two distributions.

      Author response image 1.
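      The pooled-cell comparison described above can be sketched in Python as follows (NumPy and SciPy assumed). The sketch estimates a Gaussian kernel density for each group on a shared grid and runs a two-sample Kolmogorov-Smirnov test on the raw per-cell values. It assumes the two-sample form of the KS test; the function name and grid size are illustrative.

```python
import numpy as np
from scipy import stats


def compare_kglyc_distributions(hub_kglyc, nonhub_kglyc, n_grid=200):
    """KDEs and a two-sample KS test for per-cell kglyc pooled across islets."""
    hub = np.asarray(hub_kglyc, dtype=float)
    nonhub = np.asarray(nonhub_kglyc, dtype=float)
    # Gaussian KDE per group (SciPy's default Scott's-rule bandwidth)
    kde_hub = stats.gaussian_kde(hub)
    kde_nonhub = stats.gaussian_kde(nonhub)
    # Evaluate both densities on a shared grid spanning all observed values
    grid = np.linspace(min(hub.min(), nonhub.min()),
                       max(hub.max(), nonhub.max()), n_grid)
    density_hub = kde_hub(grid)
    density_nonhub = kde_nonhub(grid)
    # The KS test uses the raw pooled cells, not the smoothed densities
    ks = stats.ks_2samp(hub, nonhub)
    return grid, density_hub, density_nonhub, ks.statistic, ks.pvalue
```

      Because cells from the same islet are not independent, a pooled-cell test like this complements, rather than replaces, the paired islet-level comparison.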

      4) The data shown in Fig. 4i,j are intended to compare long-range synchronization at different distances along a string of coupled cells but the difference between the synchronized and unsynchronized cells for gcoup and Kglyc was subtle, very much so.

      Thank you for pointing out these subtle differences. The y-axis scale for panels i and j is broad so that all distances can be represented on a single plot. After correction for multiple comparisons, the differences were still statistically significant. As the reviewer mentioned in point 3, each plot contains only five data points, each of which represents the average of a single simulated islet; therefore, we are not concerned about statistical significance arising from an overly large sample size. We also checked the differences between synchronized and non-synchronized cell pairs in Figure 4 panels e and h (now Figure 5e,h). These are the same data as in panels i and j but normalized so that all distances could be averaged together. We again found statistically significant differences between synchronized and non-synchronized cell pairs. As can be seen in Author response image 2, the difference between synchronized and non-synchronized cell pairs is greater than the variability between simulated islets. Thus, in this case the variability is not substantial.

      Author response image 2.
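      The response does not name the multiple-comparison procedure used across distances. As an illustration only, the sketch below (NumPy and SciPy assumed) runs a paired t-test at each distance, comparing synchronized and non-synchronized pairs paired by simulated islet, and then applies a Bonferroni correction. The array layout and the function name are hypothetical.

```python
import numpy as np
from scipy import stats


def paired_tests_by_distance(sync, nonsync, alpha=0.05):
    """Per-distance paired t-tests with Bonferroni correction.

    sync, nonsync: arrays of shape (n_islets, n_distances), e.g. average gcoup
    or kglyc of synchronized vs non-synchronized cell pairs, paired by islet.
    """
    sync = np.asarray(sync, dtype=float)
    nonsync = np.asarray(nonsync, dtype=float)
    # One paired test per distance (column), pairing rows by islet
    t_stat, p_raw = stats.ttest_rel(sync, nonsync, axis=0)
    # Bonferroni: multiply by the number of tests, capped at 1
    p_corrected = np.minimum(p_raw * p_raw.size, 1.0)
    return t_stat, p_corrected, p_corrected < alpha
```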

      5) The data shown in Fig. 5 for Cx36 knockout islets are used to assess the influence of gap junctional coupling, which is reasonable, but it would be reassuring to know that loss of this gene has no effects on the expression of other genes in the beta cell, especially genes involved with glucose metabolism.

      This is an important point. Previous studies have shown that NAD(P)H is not significantly changed in Cx36-deficient islets (see Benninger et al., J. Physiol. 2011 [14]). Islet architecture is also retained. Furthermore, the insulin secretory response of dissociated Cx36 knockout beta cells is the same as that of dissociated wild-type beta cells, indicating no significant defect in the intrinsic ability of the beta cell to release insulin (see Benninger et al., J. Physiol. 2011 [14]). We now mention these findings in the Discussion (page 14 lines 459-464).

      6) In many places throughout the paper, it is difficult to ascertain whether what is being shown is new vs. what has been shown previously in other studies. The paper would thus benefit strongly from added text highlighting the novelty here and not just restating what is known, for instance, that islets can exhibit small-world network properties. This detracts from the strengths of the paper and further makes it difficult to wade through. Even the finding here that metabolic characteristics of the beta cells can infer profound and influential functional coupling is not new, as the authors proposed as much many years ago. Again, this makes it difficult to distill what is new compared to what is mainly just being confirmed here, albeit using different methods.

      Thank you for the suggestion, we have made significant modifications throughout the Introduction, Discussion and Results to be clearer about what is known from previous work and what is newly found in this manuscript.

      Reviewer #5 (Public Review):

      The authors use state-of-the-art computation, experiment, and current network analysis to try and disaggregate the impact of cellular metabolism driving cellular excitability and structural electrical connections through gap junctions on islet synchronization. They perform interesting simulations with a sophisticated mathematical model and compare them with closely associated experiments. This close association is impressive and is an excellent example of using mathematics to inform experiments and experimental results. The current conclusions, however, appear beyond the results presented. The use of functional connectivity is based on correlated calcium traces but is largely without an understood biophysical mechanism. This work aims to clarify such a mechanism between metabolism and structural connection and comes out on the side of metabolism driving the functional connectivity, but both are required and more nuanced conclusions should be drawn.

      We thank reviewer 5 for their positive comments, including those on our multifaceted experimental and computational techniques. We also found the reviewer's careful reading and thoughtful comments very helpful, and we have worked to integrate each comment into our manuscript. It is evident from the reviewer's comments that we did not clearly explain what was meant by our conclusions concerning the functional network reflecting metabolism rather than gap junctions. We have conducted significant rewriting to show that we are not concluding that communication (metabolic or electric) occurs via conduits other than gap junctions. Rather, our data suggest that the functional network (which reflects calcium synchronization) reflects the intrinsic dynamics of the cells, including metabolic rates, more than individual gap junction connections.

      References cited in this response to reviewers:

      [1] A. Stožer et al., “Functional connectivity in islets of Langerhans from mouse pancreas tissue slices,” PLoS Comput Biol, vol. 9, no. 2, p. e1002923, 2013.

      [2] N. L. Farnsworth, A. Hemmati, M. Pozzoli, and R. K. Benninger, “Fluorescence recovery after photobleaching reveals regulation and distribution of connexin36 gap junction coupling within mouse islets of Langerhans,” The Journal of physiology, vol. 592, no. 20, pp. 4431–4446, 2014.

      [3] C.-L. Lei, J. A. Kellard, M. Hara, J. D. Johnson, B. Rodriguez, and L. J. Briant, “Beta-cell hubs maintain Ca2+ oscillations in human and mouse islet simulations,” Islets, vol. 10, no. 4, pp. 151–167, 2018.

      [4] N. R. Johnston et al., “Beta cell hubs dictate pancreatic islet responses to glucose,” Cell metabolism, vol. 24, no. 3, pp. 389–401, 2016.

      [5] V. Kravets et al., “Functional architecture of pancreatic islets identifies a population of first responder cells that drive the first-phase calcium response,” PLoS Biology, vol. 20, no. 9, p. e3001761, 2022.

      [6] H. Ren et al., “Pancreatic α and β cells are globally phase-locked,” Nature Communications, vol. 13, no. 1, p. 3721, 2022.

      [7] A. Stožer et al., “From Isles of Königsberg to Islets of Langerhans: Examining the function of the endocrine pancreas through network science,” Frontiers in Endocrinology, vol. 13, p. 922640, 2022.

      [8] J. Zmazek et al., “Assessing different temporal scales of calcium dynamics in networks of beta cell populations,” Frontiers in physiology, vol. 12, p. 337, 2021.

      [9] M. E. Corezola do Amaral et al., “Caloric restriction recovers impaired β-cell-β-cell gap junction coupling, calcium oscillation coordination, and insulin secretion in prediabetic mice,” American Journal of Physiology-Endocrinology and Metabolism, vol. 319, no. 4, pp. E709–E720, 2020.

      [10] J. M. Dwulet, J. K. Briggs, and R. K. P. Benninger, “Small subpopulations of beta-cells do not drive islet oscillatory [Ca2+] dynamics via gap junction communication,” PLOS Computational Biology, vol. 17, no. 5, p. e1008948, May 2021, doi: 10.1371/journal.pcbi.1008948.

      [11] B. E. Peercy and A. S. Sherman, “Do oscillations in pancreatic islets require pacemaker cells?,” Journal of Biosciences, vol. 47, no. 1, pp. 1–11, 2022.

      [12] G. A. Rutter, N. Ninov, V. Salem, and D. J. Hodson, “Comment on Satin et al. ‘Take me to your leader’: an electrophysiological appraisal of the role of hub cells in pancreatic islets. Diabetes 2020; 69: 830–836,” Diabetes, vol. 69, no. 9, pp. e10–e11, 2020.

      [13] L. S. Satin and P. Rorsman, “Response to comment on Satin et al. ‘Take me to your leader’: An electrophysiological appraisal of the role of hub cells in pancreatic islets. Diabetes 2020; 69: 830–836,” Diabetes, vol. 69, no. 9, pp. e12–e13, 2020.

      [14] R. K. Benninger, W. S. Head, M. Zhang, L. S. Satin, and D. W. Piston, “Gap junctions and other mechanisms of cell–cell communication regulate basal insulin secretion in the pancreatic islet,” The Journal of physiology, vol. 589, no. 22, pp. 5453–5466, 2011.

      [15] R. Fried, Erectile dysfunction as a cardiovascular impairment. Academic Press, 2014.

      [16] T. Pipatpolkai, S. Usher, P. J. Stansfeld, and F. M. Ashcroft, “New insights into KATP channel gene mutations and neonatal diabetes mellitus,” Nature Reviews Endocrinology, vol. 16, no. 7, pp. 378–393, 2020.

      [17] A. M. Notary, M. J. Westacott, T. H. Hraha, M. Pozzoli, and R. K. P. Benninger, “Decreases in Gap Junction Coupling Recovers Ca2+ and Insulin Secretion in Neonatal Diabetes Mellitus, Dependent on Beta Cell Heterogeneity and Noise,” PLOS Computational Biology, vol. 12, no. 9, p. e1005116, Sep. 2016, doi: 10.1371/journal.pcbi.1005116.

      [18] J. V. Rocheleau, G. M. Walker, W. S. Head, O. P. McGuinness, and D. W. Piston, “Microfluidic glucose stimulation reveals limited coordination of intracellular Ca2+ activity oscillations in pancreatic islets,” Proceedings of the National Academy of Sciences, vol. 101, no. 35, pp. 12899–12903, 2004.

      [19] R. K. Benninger, M. Zhang, W. S. Head, L. S. Satin, and D. W. Piston, “Gap junction coupling and calcium waves in the pancreatic islet,” Biophysical journal, vol. 95, no. 11, pp. 5048–5061, 2008.

    1. Author Response

      Reviewer #1 (Public Review)

      The documented findings may be explained by an artifact of the task design and the way the signals were calculated: the vmPFC was the only ROI for which a positive correlation was found between BGA and mood rating and TML. Instead, most other regions showed a negative correlation (incl. the da-Insula, dorsolateral prefrontal cortex, visual cortex, motor cortex, dorsomedial premotor cortex, ventral somatosensory cortex, and ventral inferior parietal lobule). This can be purely an artifact of the task itself: in 25% of mood rating trials, subjects were presented with a question. They had to move the cursor from left (very bad) to right (very good) along a continuous visual analog scale (100 steps) with left and right-hand response buttons. They even got a warning if they were slow. In 75% of trials, subjects saw none of this; the screen was just blank and the subjects rested.

      1) First of all, it is unclear if the 25% and 75% trials were mixed. I am assuming that they were not mixed as that could represent a fundamental mistake. The manuscript gives me the impression that this was not done (please clarify).

      If by 25% and 75% trials the Reviewer means rating and no-rating trials, then yes, they were intermixed (following Vinckier et al., 2018). As explained in the initial manuscript, mood was rated every 3-7 trials (25% of trials in total), and we used a computational model to interpolate mood (i.e., theoretical mood level) for the trials in between. This was implemented to avoid sampling mood systematically after every feedback and to test whether the vmPFC and daIns represent mood continuously or only when it must be rated. We do not see how this could represent a fundamental mistake. Note that the associations between BGA and mood hold whether we use only rating trials, only no-rating trials, or both types of trials.

      To better explain how ratings and feedback were distributed across trials, we have added a supplementary figure showing a representative example (Figure S1). This plot shows that ratings were collected independently of whether subjects were in high- or low-mood episodes. In other words, the alternation between rating and no-rating trials was orthogonal to the alternation between low- and high-mood episodes.

      2) Assuming that they were not mixed and we are seeing the data from 75% of trials only. These trials would trigger increased BGA activity in the default mode areas such as the vmPFC, and opposite patterns in the salience, visual and motor areas. Hence the opposite correlations. The authors should just plot BGA activity across regions during rest trials and see if this was the case. That would provide a whole different interpretation.

      Even if the alternation between rating and no-rating trials induced opposite correlations, these would be orthogonal to the mood fluctuations induced by positive and negative feedback. Such putative opposite correlations could not confound the correlation between BGA and mood, for instance when the analysis is restricted to rating trials only. In any case, the data do not show an opposite correlation between vmPFC and daIns (see figure R1 below); rather, when these two regions are included as competing regressors in the same model, both are significant predictors of mood level. This could not be the case if vmPFC and daIns activities were just mirror reflections of the same factor (the alternation of rating and no-rating trials).

      We agree that performing a task may activate (increase BGA in) the daIns and deactivate (decrease BGA in) the vmPFC, but this average level of activity is not relevant to our study, which explores trial-to-trial fluctuations. It would only be problematic if the alternation between rating and no-rating trials 1) were correlated with mood levels and 2) induced (anti)correlations between vmPFC and daIns BGA. The first assumption is false by construction of the design, as explained above, and the second is empirically false, as shown below by the absence of correlation between daIns and vmPFC BGA. For each trial, we averaged BGA during the pre-stimulus time window (-4 to 0 s) and tested the correlation between all possible pairs of vmPFC and daIns recording sites implanted in the same subject (n = 247 pairs of recording sites from 18 subjects). We observed no reliable correlation between the two brain regions, whether including only rest (no-rating) trials, only rating trials, or all trials together (see figure R1 below). In contrast, the positive correlation between mood and vmPFC BGA, as well as the negative correlation between mood and daIns BGA, was observed in all cases (rest, rating, or all trials together).

      Figure R1: Correlation between vmPFC and daIns activities. Bars show the correlation coefficients, averaged across pairs of recording sites, obtained when including all trials, only rest trials (no rating), or only mood-rating trials. The p-values were obtained using a two-sided, one-sample Student’s t-test on Fisher-transformed correlation coefficients. Note that performing the same analysis across subjects (instead of recording sites) yields the same result.
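      The pairwise analysis summarized in Figure R1 can be sketched in Python as follows (NumPy and SciPy assumed). For each vmPFC-daIns site pair, the Pearson correlation of pre-stimulus BGA is computed across trials, Fisher-transformed with arctanh, and the transformed coefficients are tested against zero with a two-sided one-sample t-test. The array layout (one row per site pair, one column per trial) and the function name are assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def pairwise_site_correlation_test(vmpfc_bga, dains_bga):
    """Fisher-z one-sample t-test on per-pair correlations.

    vmpfc_bga, dains_bga: arrays (n_pairs, n_trials) holding pre-stimulus BGA
    (averaged over -4 to 0 s) for the two sites of each within-subject pair.
    Returns (mean r across pairs, t statistic, two-sided p-value).
    """
    x = np.asarray(vmpfc_bga, dtype=float)
    y = np.asarray(dains_bga, dtype=float)
    # Pearson r for each pair, computed across trials (row-wise)
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    r = (xc * yc).sum(axis=1) / np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1))
    # Fisher transform makes the sampling distribution closer to normal
    z = np.arctanh(r)
    t_stat, p_value = stats.ttest_1samp(z, popmean=0.0)
    return r.mean(), t_stat, p_value
```

      The same function can be applied to rest trials, rating trials, or all trials by selecting the corresponding columns before the call.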

      3) In addition, it is entirely unclear how the BGA in a given electrode was plotted. How is BGA normalized for each electrode? What is baseline here? Without understanding what baseline was used for this normalization, it is hard to follow the next section about the impact of the intracerebral activity on decision-making.

      The normalization we used is neutral to the effect of interest. Details of BGA computation are given in the Methods section (lines 746-751):

      “For each frequency band, this envelope signal (i.e., time varying amplitude) was divided by its mean across the entire recording session and multiplied by 100. This yields instantaneous envelope values expressed in percentage (%) of the mean. Finally, the envelope signals computed for each consecutive frequency band were averaged together to provide a single time series (the broadband gamma envelope) across the entire session. By construction, the mean value of that time series across the recording session is equal to 100.”

      Then, BGA was simply z-scored over trials for every recording site. Thus, there was no baseline correction in the sense that there was no subtraction of pre-stimulus activity. We agree this would have been problematic, since we were precisely interested in the information carried by pre-stimulus activity. By z-scoring, we took as reference the mean activity over all trials.

      We added the following sentence in the Methods section (lines 755-756):

      “BGA was normalized for each recording site by z-scoring across trials.”
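      The normalization quoted above can be sketched in Python (NumPy and SciPy assumed). The frequency sub-bands, the 4th-order Butterworth band-pass filters, and the use of the Hilbert transform to obtain the envelope are assumptions, since the quoted passage only states that each band's envelope was divided by its session mean, multiplied by 100, and averaged across bands. The final z-scoring is applied at each time sample across trials, which is one reading of "z-scored over trials"; function names are hypothetical.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt


def broadband_gamma_envelope(signal, fs, bands):
    """Session-wide broadband gamma envelope, in % of each band's session mean.

    signal: 1-D bipolar iEEG time series covering the whole session.
    fs: sampling rate in Hz.
    bands: list of (low, high) sub-band edges in Hz (assumed, e.g. 10 Hz steps).
    """
    normalized = []
    for low, high in bands:
        sos = butter(4, [low, high], btype="bandpass", fs=fs, output="sos")
        filtered = sosfiltfilt(sos, signal)  # zero-phase band-pass
        envelope = np.abs(hilbert(filtered))  # time-varying amplitude
        # Divide by the mean over the entire session and multiply by 100
        normalized.append(100.0 * envelope / envelope.mean())
    # Average the sub-band envelopes; the session mean is 100 by construction
    return np.mean(normalized, axis=0)


def zscore_over_trials(trial_bga):
    """Z-score epoched BGA (n_trials, n_times) across trials at each time sample."""
    x = np.asarray(trial_bga, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0)
```

      No pre-stimulus baseline is subtracted at any step, consistent with the point that pre-stimulus activity is itself the signal of interest.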

      4) line 237: how was the correction for multiple comparisons done? Subject by subject, ROI by ROI, electrode by electrode? Please clarify.

      The correction for multiple comparisons was done using a standard cluster-based permutation test (Maris & Oostenveld, 2007, J. Neurosci. Methods) performed at the level of each ROI.

      We have updated the section detailing this method in the manuscript (lines 807-818), as follows:

      “For each ROI, a t-value was computed across all recording sites of the given ROI for each time point of the baseline window (-4 to 0 s before choice onset), independently of subject identity, using two-sided, one-sample, Student’s t-tests. For all GLMs, the statistical significance of each ROI was assessed through permutation tests. First, the pairing between responses and predictors across trials was shuffled randomly 300 times for each recording site. Second, we performed 60,000 random combinations of all contacts in a ROI, drawn from the 300 shuffles calculated previously for each site. The maximal cluster-level statistics (the maximal sum of t-values over contiguous time points exceeding a significance threshold of 0.05) were extracted for each combination to compute a “null” distribution of effect size across a time window from -4 to 0 s before choice onset (the baseline corresponding to the rest or mood assessment period). The p-value of each cluster in the original (non-shuffled) data was finally obtained by computing the proportion of clusters with higher statistics in the null distribution, and reported as the “cluster-level corrected” p-value (pcorr).”
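      A compact Python sketch of this procedure follows (NumPy and SciPy assumed). For one ROI, it takes the observed regression estimates per site and time point plus a stack of estimates from trial-shuffled data, builds surrogate ROIs by drawing one shuffle per site, and compares each observed cluster sum with the null distribution of maximal cluster sums. The array layout, the use of absolute cluster sums for the two-sided test, the t threshold derived from p < 0.05, and the "at least as large" convention for the p-value are assumptions; the default of 60,000 combinations follows the quoted text, and the function names are hypothetical.

```python
import numpy as np
from scipy import stats


def cluster_sums(tvals, tcrit):
    """Sum of t-values over contiguous supra-threshold runs of the same sign."""
    sums, current, sign = [], 0.0, 0
    for t in tvals:
        s = 1 if t > tcrit else (-1 if t < -tcrit else 0)
        if s != 0 and s == sign:
            current += t  # extend the running cluster
        else:
            if sign != 0:
                sums.append(current)  # close the previous cluster
            current, sign = (t, s) if s != 0 else (0.0, 0)
    if sign != 0:
        sums.append(current)
    return sums


def roi_cluster_test(obs_betas, null_betas, n_combinations=60_000, alpha=0.05, seed=0):
    """obs_betas: (n_sites, n_times); null_betas: (n_sites, n_shuffles, n_times).

    Returns a list of (cluster sum, cluster-level corrected p) for observed clusters.
    """
    rng = np.random.default_rng(seed)
    n_sites, n_shuffles, _ = null_betas.shape
    tcrit = stats.t.ppf(1 - alpha / 2, df=n_sites - 1)
    # t-values across sites at each time point (sites pooled regardless of subject)
    t_obs = stats.ttest_1samp(obs_betas, 0.0, axis=0).statistic
    obs_clusters = cluster_sums(t_obs, tcrit)
    null_max = np.empty(n_combinations)
    for i in range(n_combinations):
        # Surrogate ROI: one randomly chosen shuffle per recording site
        pick = rng.integers(n_shuffles, size=n_sites)
        surrogate = null_betas[np.arange(n_sites), pick]
        t_null = stats.ttest_1samp(surrogate, 0.0, axis=0).statistic
        null_max[i] = max((abs(s) for s in cluster_sums(t_null, tcrit)), default=0.0)
    # Corrected p: fraction of null maxima at least as large as the observed cluster
    return [(s, float(np.mean(null_max >= abs(s)))) for s in obs_clusters]
```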

      Reviewer #2 (Public Review)

      “This study used intracranial EEG to explore links between broad-band gamma oscillations and mood, and their impact on decisions. The topic is interesting and important. A major strength is the use of intracranial EEG (iEEG) techniques, which allowed the authors to obtain electrical signals directly from deep brain areas involved in decision making. With its precise temporal resolution, iEEG allowed the authors to study activity in specific frequency bands. While the results are potentially interesting, one major concern with the analysis procedure (specifically, grouping all data across subjects and performing statistics across electrodes instead of across subjects) reduces enthusiasm for these findings. There is also a question about how mood impacts attentional state, which has already been shown to impact baseline (pre-stimulus) broad-band gamma.”

      Major comments

      1) The number of subjects with contacts in vmPFC, daIns, and both vmPFC and daIns should be stated in the manuscript so the reader doesn't have to refer to the supplementary table to find this information.

      These details have been added to the Results section (lines 236-242 and 258-262), as follows:

      “The vmPFC (n = 91 sites from 20 subjects) was the only ROI for which we found a positive correlation (Figure 2b; Source data 1; Table S2) between BGA and both mood rating (best cluster: -1.37 to -1.04 s, sum(t(90)) = 122.3, pcorr = 0.010) and TML (best cluster: -0.57 to -0.13 s, sum(t(90)) = 132.4, pcorr = 8 × 10⁻³). Conversely, we found a negative correlation in a larger brain network encompassing the daIns (n = 86 sites from 28 subjects, Figure 2b; Source data 1; Table S2), in which BGA was negatively associated with both mood rating (best cluster: -3.36 to -2.51 s, sum(t(85)) = -325.8, pcorr < 1.7 × 10⁻⁵) and TML (best cluster: -3.13 to -2.72 s, sum(t(85)) = -136.4, pcorr = 9 × 10⁻³). (…) In order to obtain the time course of mood expression in the two ROIs (Figure 2c), we performed regressions between TML and BGA from all possible pairs of vmPFC and daIns recording sites recorded in the same subject (n = 247 pairs of recording sites from 18 subjects, see Methods) and tested the regression estimates across pairs within each ROI at each time point.”

      2) Effects shown in figs 2 and 3 are combined across subjects. We don't know the effective sample size for the comparisons being made, and the effects shown could be driven by just a few subjects. If the authors compute trial-wise regressions between mood and BGA for each subject, and then perform the statistics across subjects instead of across electrodes, do these results still pan out?

      Yes, we have redone the analyses at the group level to get statistics across subjects (see response to essential revisions). All main results remained significant or borderline. In these group-level random-effect analyses, data points are subject-wise BGA averaged across recording sites (within the temporal cluster identified with the fixed-effect approach). We have incorporated these analyses into the manuscript as a supplementary table (Table S4). However, these statistics across subjects are less standard in the field of electrophysiology, as they are both underpowered and unadjusted for sampling bias (because the same weight is given to subjects with 1 or 10 recording sites in the ROI), so we prefer to keep the usual statistics across recording sites in the main text.

      These analyses have been incorporated into the Results section (lines 355-357), as follows:

      “We also verified that the main findings of this study remained significant (or borderline) when using group-level random-effects analyses (Table S4, see methods), even if this approach is underpowered and unadjusted for sampling bias (some subjects having very few recording sites in the relevant ROI).”

      The methods section has also been edited, as follows (lines 831-835):

      “To test the association between BGA and mood, TML or choice at the group level (using random-effects analyses), we performed the same linear regression as described in the electrophysiological analyses section on BGA averaged over the best time cluster (identified by the fixed-effects approach) and across all recording sites of a given subject located in the relevant ROI. We then conducted a two-sided, one-sample Student's t-test on the resulting regression estimates (Table S4).”
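      The group-level random-effects procedure can be sketched in Python as follows (NumPy and SciPy assumed). Each subject's BGA is first averaged across that subject's recording sites in the ROI, a per-subject regression slope is estimated, and the slopes are tested against zero across subjects. The sketch assumes a simple linear regression of BGA on a single predictor (mood rating or TML) and that BGA has already been averaged over the best time cluster; the manuscript's actual regression may include other terms, and the function name and input layout are hypothetical.

```python
import numpy as np
from scipy import stats


def group_level_regression(subject_data):
    """Random-effects test across subjects.

    subject_data: dict mapping subject id -> (bga, predictor), where
    bga has shape (n_sites, n_trials) for that subject's sites in the ROI and
    predictor has shape (n_trials,) (e.g. mood rating or TML per trial).
    Returns (per-subject slopes, t statistic, two-sided p-value).
    """
    slopes = []
    for bga, predictor in subject_data.values():
        # Average across the subject's recording sites in the ROI
        y = np.asarray(bga, dtype=float).mean(axis=0)
        x = np.asarray(predictor, dtype=float)
        slopes.append(stats.linregress(x, y).slope)
    slopes = np.array(slopes)
    # One data point per subject: each subject weighs equally, whatever its site count
    t_stat, p_value = stats.ttest_1samp(slopes, 0.0)
    return slopes, t_stat, p_value
```

      Giving each subject one data point is what makes this approach unadjusted for sampling bias: a subject with one site counts as much as a subject with ten.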

      3) Furthermore, how many of the subjects show statistically significant regressions between BGA and mood at any electrode? For example, the error bars in fig 2b are across electrodes. How would this figure look if error bars indicated variance across subjects instead?

      Depending on the metric (mood rating or theoretical mood level), statistically significant regressions between BGA and mood were observed in 4 to 6 subjects for the vmPFC and 5 to 9 subjects for the daIns. We provide these numbers to satisfy the Reviewer’s request, but we do not see what statistical inference they could inform (inferences based on the number of data points above and below a significance threshold are clearly invalid). To address the other request, we have reproduced Fig. 2B below with error bars indicating variance across subjects rather than recording sites (Figure R2). Again, to make an inference about a neural representation at the population level, the relevant samples are recording sites, not subjects. Monkey electrophysiology studies typically base their inferences on the variance across neurons (often recorded in 2 or 3 monkeys pooled together).

      Figure R2: Reproduction of Figure 2B with lower panels indicating mean and variance across subjects instead of recording sites (upper panels). Blue: vmPFC, red: daIns. Bold lines indicate significant clusters (p < 0.05).

      4) In panel f, we can see that a large number of sites in both ROIs show correlations in the opposite direction to the reported effects. How can this be explained? How do these distributions of effects in electrodes correspond to distributions of effects in individual subjects?

      In our experience, this kind of pattern is observed in any biological dataset, so we are unsure what the Reviewer would like us to explain. For any significant effect across samples, the distribution will typically include some samples with effects in the opposite direction; if there were none, nobody would need statistics to know whether the observed distribution differs from the null distribution. In our case, the variability might arise from different sources of noise (in the mood estimate, in BGA recording, in stochastic fluctuations of pre-stimulus activity, in a link between mood and BGA that may depend on unknown factors, etc.). This variability has typically been masked because, until recently, effects of interest were plotted as means with error bars. The variability is more apparent when plotting individual samples, as we did. It is visually amplified by the fact that outliers are as salient as data points close to the mean, which are far more numerous but superimposed. We have replotted panel f below with data points representing subjects instead of recording sites (Figure R3).

      Figure R3: Reproduction of Figure 2F with lower panels showing the distribution of regression estimates over subjects instead of recording sites (upper panels). Blue: vmPFC, red: daIns. Note that this is the only analysis that failed to reach significance using a group-level random-effects approach. This is not surprising, as this approach is underpowered (perhaps particularly for this analysis over a [-4 to 0 s] pre-choice time window) and unadjusted for sampling bias (some subjects having very few recording sites in the relevant ROI).

      5) Baseline (pre-stimulus) gamma amplitudes have been shown to be related to attentional states. Could these effects be driven by attention rather than mood? The relationship between mood and decisions may be more complex than the authors describe, and could impact other cognitive factors such as attention, which have already been shown to impact baseline broad-band gamma.

      We agree with the Reviewer that the relationships between mood and decisions are certainly more complex in reality than in our model, which is obviously a simplification, as any model is. We also acknowledge that pre-stimulus gamma activity is modulated by fluctuations in attention. However, what was measured and related to BGA in our study is mood level, so it remains unclear what reason could support the claim that the effects may have been driven by attention. A global shift in attentional state (like being more vigilant when in a good or bad mood) would not explain the specific effects we observed (making more or less risky choices). If the Reviewer means that subjects might have paid more attention to gain prospects when in a good mood, and to loss prospects when in a bad mood, then we agree this is a possibility. Note however that the difference between this scenario and our description of the results (subjects put more weight on gain/loss prospect when in a good/bad mood) would be quite subtle. We have nevertheless incorporated this nuance in the discussion (lines 494-496):

      “This result makes the link with the idea that we may see a glass half-full or half-empty when we are in a good or bad mood, possibly because we pay more attention to positive or negative aspects.”

      6) The authors used a bipolar montage reference. Would it be possible that effects in low frequencies are dampened because of the bipolar reference instead of common average reference?

      This is unlikely, because the use of a common average reference montage has been shown to significantly increase the number of channels exhibiting task-related high-frequency activity (BGA), but not the number of channels exhibiting task-related low-frequency activity (see Li et al., 2018, Figure 5A-B). In addition, using a monopolar configuration would have the disadvantage of significantly increasing the correlations between channels (compared to a bipolar montage). This would therefore have artificially induced task-related effects in other channels through volume conduction (Li et al., 2018; Mercier et al., 2017).
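      For readers less familiar with the two montages, a minimal sketch of what each derivation computes may help. It assumes (this is not taken from the study) that each contact's signal is a plain list of samples and that contacts are ordered along one electrode shaft; the bipolar derivation subtracts adjacent contacts, whereas the common average reference subtracts the across-contact mean at each sample. Any component shared identically by neighbouring contacts, such as a volume-conducted source, cancels in the bipolar derivation.

```python
from typing import List

Signal = List[float]  # one contact's samples (assumed representation)


def bipolar(contacts: List[Signal]) -> List[Signal]:
    """Bipolar montage: contact k+1 minus contact k for each adjacent pair on the shaft."""
    return [
        [b - a for a, b in zip(contacts[k], contacts[k + 1])]
        for k in range(len(contacts) - 1)
    ]


def common_average(contacts: List[Signal]) -> List[Signal]:
    """Common average reference: subtract the across-contact mean at each time sample."""
    n = len(contacts)
    means = [sum(samples) / n for samples in zip(*contacts)]
    return [[x - m for x, m in zip(channel, means)] for channel in contacts]
```

      Note that the bipolar derivation yields one fewer channel than there are contacts, and each derived channel reflects activity local to a contact pair.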

      Reviewer #3 (Public Review):

      In this interesting paper, Cecchi et al. collected intracerebral EEG data from patients performing decision-making tasks in order to study how patients' trial-by-trial mood fluctuations affect the neural computations underlying risky choices. They found that broadband gamma activity in the vmPFC and dorsal anterior insula (daIns) is distinctively correlated with the patient's mood and choice. I found the results very interesting. This study will certainly be an important contribution to cognitive and computational neuroscience, especially regarding how the brain may encode mood and associate it with decisions.

      Major comments

      1) The authors showed that mood is positively correlated with vmPFC activity on high-mood trials alone and negatively correlated with daIns activity on low-mood trials alone. This is interesting. But are those the trials in which these regions' activity predicts choice (using the residuals of the choice model fit)?

      This is an excellent point. The intuition of Reviewer 3 was correct. To test it, we performed a complementary analysis in which we regressed choice (model fit residuals) against BGA, separately for low vs. high mood trials (median-split). This analysis revealed that in the vmPFC, BGA during high mood trials positively predicted choices whereas in the daIns, BGA during low mood trials negatively predicted choices.

      We have added the following paragraph in the Results section (lines 328-337):

      “Taken together, these results mean that vmPFC and daIns baseline BGA not only express mood in opposite fashion, but also have opposite influences on upcoming choice. To clarify which trials contributed to the significant association between choice and BGA, we separately regressed the residuals of the choice model fit against BGA across either high- or low-mood trials (median split on TML; Figure 3b). In the vmPFC, regression estimates were significantly positive for high-mood trials only (high TML = 0.06 ± 0.01, t(90) = 5.64, p = 2 × 10⁻⁷; two-sided, one-sample Student’s t-test), not for low-mood trials. Conversely, in the daIns, regression estimates reached significance only for low-mood trials (low TML = -0.05 ± 0.01, t(85) = -4.63, p = 1 × 10⁻⁵), not for high-mood trials. This double dissociation suggests that the vmPFC positively predicts choice when mood gets better than average, and the daIns negatively predicts choice when mood gets worse than average.”

      Also, Figure 3 has been modified accordingly.
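      The median-split analysis quoted above can be sketched for a single recording site as follows. This is a minimal illustration under stated assumptions, not the authors' code: trial data are plain lists, the regression is ordinary least squares with an intercept, and trials exactly at the median TML are assigned (arbitrarily) to the low-mood half. The per-site slopes would then be tested against zero across sites with a one-sample t-test, as reported.

```python
from statistics import median
from typing import Dict, List


def ols_slope(x: List[float], y: List[float]) -> float:
    """Slope of y on x from ordinary least squares with an intercept."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def split_slopes(tml: List[float], bga: List[float], resid: List[float]) -> Dict[str, float]:
    """Regress choice-model residuals on baseline BGA separately for high- and low-mood trials."""
    cut = median(tml)
    high = [i for i, m in enumerate(tml) if m > cut]
    low = [i for i, m in enumerate(tml) if m <= cut]  # ties go to the low half (assumption)
    return {
        "high": ols_slope([bga[i] for i in high], [resid[i] for i in high]),
        "low": ols_slope([bga[i] for i in low], [resid[i] for i in low]),
    }
```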

      2) It would be helpful to see how high-mood trials and low-mood trials are distributed. Are they clustered or more intermixed?

      We thank the Reviewer for the suggestion. To provide a more detailed view on how feedback history shaped mood ratings and TML, we added a supplementary figure that shows a representative example (Figure S1).

      3) I am not sure how I should reconcile the above finding of the correlation between mood and BGA on high-mood vs. low-mood trials, and the results about how high vs. low baseline BGA predict choice. I may have missed something related to this in the discussion section, but could you clarify?

      Following the Reviewer’s suggestion, we now demonstrate that the vmPFC positively predicts choice when mood gets better than average, and the daIns negatively predicts choice when mood gets worse than average (see response to first point).

      To clarify this, we have added the following paragraph in the discussion (lines 461-469), and a schematic figure summarizing the main findings (Figure 4).

      “Choice to accept or reject the challenge in our task was significantly modulated by the three attributes displayed on screen: gain prospect (in case of success), loss prospect (in case of failure) and difficulty of the challenge. We combined the three attributes using a standard expected utility model and examined the residuals after removing the variance explained by the model. Those residuals were significantly impacted by mood level, meaning that on top of the other factors, good / bad mood inclined subjects to accept / reject the challenge. The same was true for neural correlates of mood: higher baseline BGA in the vmPFC / daIns was both predicted by good / bad mood and associated with higher accept / reject rates, relative to predictions of the choice model. Thus, different mood levels might translate into different brain states that predispose subjects to make risky or safe decisions (Figure 4).”

    1. Author Response

      Reviewer #1 (Public Review):

      This paper presents an interesting data set from historic Western Eurasia and North Africa. Overall, I commend the authors for presenting a comprehensive paper that focuses the data analysis of a large project on the major points, and that is easy to follow and well-written. Thus, I have no major comments on how the data was generated, or is presented. Paradoxically, historical periods are undersampled for ancient DNA, and so I think this data will be useful. The presentation is clever in that it focuses on a few interesting cases that highlight the breadth of the data.

      The analysis is likewise innovative, with a focus on detecting "outliers" that are atypical for the genetic context where they were found. This is mainly achieved by using PCA and qpAdm, established tools, in a novel way. Here I do have some concerns about technical aspects, where I think some additional work could greatly strengthen the major claims made, and lay out if and how the analysis framework presented here could be applied in other work.

      clustering analysis

      I have trouble following what exactly is going on here (particularly since the cited Fernandes et al. paper is also very ambiguous about what exactly is done, and doesn't provide a validation of this method). My understanding is the following: the goal is to test whether a pair of individuals (let's call them I1 and I2) are indistinguishable from each other when we compare them to a set of reference populations. Formally, this is done by testing whether all statistics of the form F4(Ref_i, Ref_j; I1, I2) = 0, i.e. whether the difference between I1 and I2 is orthogonal to the space of reference populations, or equivalently whether I1 and I2 project to the same point in the space of reference populations (which should be a subset of the PCA space). Is this true? If so, I think it could be very helpful if you added a technical description of what precisely is done, and some validation of how well this framework works.

      We agree that the previous description of our workflow was lacking, and have substantially improved the description of the entire pipeline (Methods, section “Modeling ancestry and identifying outliers using qpAdm”), making it clearer and more descriptive. To further improve clarity, we have also unified our use of methodology and replaced all mentions of “qpWave” with “qpAdm”. In the reworked Methods section mentioned above, we added a discussion on how these tests are equivalent in certain settings, and describe which test we are exactly doing for our pairwise individual comparisons, as well as for all other qpAdm tests downstream of cluster discovery. In addition, we now include an additional appendix document (Appendix 4) which, for each region, shows the results from our individual-based qpAdm analysis and clustering in the form of heatmaps, in addition to showing the clusters projected into PC space.
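      The quantity the Reviewer describes can be illustrated with a minimal point-estimate sketch. It assumes allele frequencies are plain lists over the same SNPs and ignores missing data and block-jackknife standard errors; qpAdm turns these statistics into a formal rank test with a p-value, which this sketch does not attempt. If I1 and I2 are identical, every F4(Ref_i, Ref_j; I1, I2) is exactly zero.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Freqs = List[float]  # allele frequencies at the same SNPs (assumed representation)


def f4(a: Freqs, b: Freqs, c: Freqs, d: Freqs) -> float:
    """Point estimate of F4(A, B; C, D): mean over SNPs of (a - b) * (c - d)."""
    return sum((w - x) * (y - z) for w, x, y, z in zip(a, b, c, d)) / len(a)


def f4_vs_references(refs: Dict[str, Freqs], i1: Freqs, i2: Freqs) -> Dict[Tuple[str, str], float]:
    """F4(Ref_i, Ref_j; I1, I2) for every unordered pair of reference populations."""
    return {
        (ri, rj): f4(refs[ri], refs[rj], i1, i2)
        for ri, rj in combinations(sorted(refs), 2)
    }
```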

      An independent concern is the transformation from p-values to distances. I am in particular worried about i) biases due to potentially different numbers of SNPs in different samples and ii) whether the resulting matrix is actually a sensible distance matrix (e.g. additive and satisfies the triangle inequality). To me, a summary that doesn't depend on data quality, like the F2-distance in the reference space (i.e. the sum of all F4-statistics, or an orthogonalized version thereof) would be easier to interpret. At the very least, it would be nice to show some intermediate results of this clustering step on at least a subset of the data, so that the reader can verify that the qpWave-statistics and their resulting p-values make sense.

      We agree that calling the matrix generated from p-values a “distance matrix” is a misnomer, as it does not satisfy the triangle inequality, for example. We still believe that our clustering generates sensible results, as UPGMA simply allows us to project a positive, symmetric matrix to a tree, which we can then use, given some cut-off, to define clusters. To make this distinction clear, we now refer to the resulting matrix as a “dissimilarity matrix” instead. As mentioned above, we now also include a supplementary figure for each region visualizing the clustering results.

      Regarding the concerns about p-values conflating both signal and power, we employ a stringent minimum SNP coverage filter for these analyses to avoid extremely low-coverage samples being separated out (min. SNPs covered: 100,000). In addition, we now show that cluster size and downstream outlier status do not depend on SNP coverage (Figure 2 - Suppl. 3).
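      To make the clustering step described above concrete, here is a minimal sketch of UPGMA on a dissimilarity matrix derived from pairwise p-values, with the tree cut at a fixed height. The transformation 1 - p and the cut-off value are hypothetical choices for illustration only; the responses above do not state the exact transformation or cut-off. Because average-linkage merge heights never decrease, stopping at the first merge above the cut-off gives the same clusters as cutting the completed tree.

```python
from typing import Dict, List, Tuple


def upgma_clusters(pvals: List[List[float]], cutoff: float) -> List[List[int]]:
    """Cluster samples by UPGMA on the (hypothetical) dissimilarity 1 - p, cut at `cutoff`."""
    n = len(pvals)
    # Low p-value (pair distinguishable) -> high dissimilarity; only i < j entries are used.
    dist: Dict[Tuple[int, int], float] = {
        (i, j): 1.0 - pvals[i][j] for i in range(n) for j in range(i + 1, n)
    }
    clusters: Dict[int, List[int]] = {i: [i] for i in range(n)}

    def linkage(a: List[int], b: List[int]) -> float:
        # UPGMA (average linkage): mean dissimilarity over all cross-cluster pairs.
        total = sum(dist[(min(x, y), max(x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        keys = sorted(clusters)
        height, a, b = min(
            (linkage(clusters[ka], clusters[kb]), ka, kb)
            for idx, ka in enumerate(keys)
            for kb in keys[idx + 1:]
        )
        if height > cutoff:
            break  # all remaining merges would sit above the cut-off
        clusters[a] = clusters[a] + clusters.pop(b)
    return sorted(sorted(members) for members in clusters.values())
```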

      The methodological concerns lead me to some questions about the data analysis. For example, in Fig2, Supp 2, very commonly outliers lie right on top of a projected cluster. To my understanding, apart from using a different reference set, the approach using qpWave is equivalent to using a PCA-based clustering and so I would expect very high concordance between the approaches. One possibility could be that the differences are only visible on higher PCs, but since that data is not displayed, the reader is left wondering. I think it would be very helpful to present a more detailed analysis for some of these "surprising" clustering where the PCA disagrees with the clustering so that suspicions that e.g. low-coverage samples might be separated out more often could be laid to rest.

      To reduce the risk of artifactual clusters resulting from our pipeline, we devised a set of QC metrics (described in detail below) on the individuals and clusters we identified as outliers. Driven by these metrics, we implemented some changes to our outlier detection pipeline that we now describe in substantially more detail in the Methods (see comment above). Since the pipeline involves running many thousands of qpAdm analyses, it is difficult to manually check every step for all samples – instead, we focused our QC efforts on the outliers identified at the end of the pipeline. To assess outlier quality we used the following metrics, in addition to manual inspection:

      First, for an individual identified as an outlier at the end of the pipeline, we check its fraction of non-rejected hypotheses across all comparisons within a region. The rationale here is that by definition, an outlier shouldn’t cluster with many other samples within its region, so a majority of hypotheses should be rejected (corresponding to gray and yellow regions in the heatmaps, Appendix 4). Through our improvements to the pipeline, the fraction of non-rejected hypotheses was reduced from an average of 5.3% (median 1.1%) to an average of 3.8% (median 0.6%), while going from 107 to 111 outliers across all regions.

      Second, we wanted to make sure that outlier status was not affected by the inclusion of pre-historic individuals in our clustering step within regions. To represent majority ancestries that might have been present in a region in the past, we included Bronze and Copper Age individuals in the clustering analysis. We found that including these individuals in the pairwise analysis and clustering improved the clusters overall. However, to ensure that their inclusion did not bias the downstream identification of outliers, we also recalculated the clustering without these individuals. We inspected whether an individual identified as an outlier would be part of a majority cluster in the absence of Bronze and Copper Age individuals, which was not the case (see also the updated Methods section for more details on how we handle time periods within regions).
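      The first QC metric above, the fraction of non-rejected pairwise hypotheses for an individual within its region, can be sketched as follows. The input layout and the rejection threshold `alpha` are hypothetical illustrations, not the values used in the pipeline.

```python
from typing import Dict


def fraction_not_rejected(pvals: Dict[str, float], alpha: float = 0.01) -> float:
    """Fraction of an individual's pairwise tests (keyed by the other individual) with p >= alpha.

    `alpha` is an illustrative threshold (assumption), not the pipeline's setting.
    """
    if not pvals:
        raise ValueError("no pairwise comparisons for this individual")
    kept = sum(1 for p in pvals.values() if p >= alpha)
    return kept / len(pvals)
```

      Summarising this value over all outliers (for example with `statistics.mean` and `statistics.median`) gives averages and medians of the kind reported above.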

      In response to the “surprising” outliers based on the PCA visualizations in Figure 2, Supplement 2: with our updated outlier pipeline, some of these have disappeared, for example in Western and Northern Europe. However, in some regions the phenomenon remains. We are confident this isn’t a coverage effect, as we’ve compared the coverage between outliers and non-outliers across all clusters (see previous comment, Figure 2 - Suppl. 3), as well as specifically for “surprising” outliers compared to contemporary non-outliers – none of which showed any differences in the coverage distributions of “surprising” outliers (Author response images 1 and 2). In addition, we believe that the quality metrics we outline above were helpful in minimizing artifactual associations of samples with clusters, which could influence their downstream outlier status. As such, we think it is likely that the qpAdm analysis does detect a real difference between these sets of samples, even though they project close to each other in PCA space. This could be the result of an actual biological difference hidden from PCA by the differences in reference space (see also the reply to the following comment). Still, we cannot fully rule out the possibility of latent technical biases that we were not able to account for, so we do not claim the outlier pipeline is fully devoid of false positives. Nevertheless, we believe our pipeline is helpful in uncovering true, recent, long-range dispersers in a high-throughput and automated manner, which is necessary to glean this type of insight from hundreds of samples across a dozen different regions.

      Author response image 1.

      SNP coverage comparison between outliers and non-outliers in region-period pairings with “surprising” outliers (t-test p-value: 0.242).

      Author response image 2.

      PCA projection (left) and SNP coverage comparison (right) for “surprising” outliers and surrounding non-outliers in Italy_IRLA.

      One way the presentation could be improved would be to be more consistent in what a suitable reference data set is. The PCAs (Fig2, S1 and S2, and Fig6) argue that it makes most sense to present ancient data relative to present-day genetic variation, but the qpWave and qpAdm analysis compare the historic data to that of older populations. Granted, this is a common issue with ancient DNA papers, but the advantage of using a consistent reference data set is that the analyses become directly comparable, and the reader wouldn't have to wonder whether any discrepancies in the two ways of presenting the data are just due to the reference set.

      While it is true that some of the discrepancies are difficult to interpret, we believe that both views of the data are valuable and provide complementary insights. We considered three aspects in our decision to use both reference spaces: (1) conventions in the field (including making the results accessible to others), (2) interpretability, and (3) technical rigor.

      Projecting historical genomes into the present-day PCA space allows for a convenient visualization that is common in the field of ancient DNA and exhibits an established connection to geographic space that is easy to interpret. This is true especially for more recent ancient and historical genomes, as spatial population structure approaches that of present day. However, there are two challenges: (1) a two-dimensional representation of a fairly high-dimensional ancestry space necessarily incurs some amount of information loss and (2) we know that some axes of genetic variation are not well-represented by the present-day PCA space. This is evident, for example, by projecting our qpAdm reference populations into the present-day PCA, where some ancestries which we know to be quite differentiated project closely together (Author response image 3). Despite this limitation, we continue to use the PCA representation as it is well resolved for visualization and maximizes geographical correspondence across Eurasia.

      On the other hand, the qpAdm reference space (used in clustering and outlier detection) has higher resolution to distinguish ancestries by more comprehensively capturing the fairly high-dimensional space of different ancestries. This includes many ancestries that are not well resolved in the present-day PCA space, yet are relevant to our sample set, for example distinguishing Iranian Neolithic ancestry against ancestries from further into central and east Asia, as well as distinguishing between North African and Middle Eastern ancestries (Author response image 3).

      To investigate the differences between these two reference spaces, we chose pairwise outgroup-f3 statistics (to Mbuti) as a pairwise similarity metric representing the reference space of f-statistics and qpAdm in a way that’s minimally affected by population-specific drift. We related this similarity measure to the Euclidean distance on the first two PCs between the same set of populations (Author response image 4). This analysis shows that while there is an almost linear correspondence between these pairwise measures for some populations, other comparisons fall off the diagonal in a manner consistent with the PCA projection (Author response image 3), where samples are close together in PCA but not very similar according to outgroup-f3. Taken together, these analyses highlight the non-equivalence of the two reference spaces.

      In addition, we chose to base our analysis pipeline on the f-statistics framework to (1) afford us a more principled framework to disentangle ancestries among samples and clusters within and across regions (using 1-component vs. 2-component models of admixture), while (2) keeping a consistent, representative reference set for all analyses that were part of the primary pipeline. Meanwhile, we still use the present-day PCA space for interpretable visualization.

      Author response image 3.

      Projection of qpAdm reference population individuals into present-day PCA.

      Author response image 4.

      Comparison of pairwise PCA projection distance to outgroup-f3 similarity across all qpAdm reference population individuals. PCA projection distance was calculated as the Euclidean distance on the first two principal components. Outgroup-f3 statistics were calculated relative to Mbuti, which is itself also a qpAdm reference population. Both panels show the same data, but each point is colored by either of the two reference populations involved in the pairwise comparison.
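      The two pairwise measures compared in Author response image 4 can be sketched as follows. This is a minimal illustration assuming allele frequencies are plain lists over shared SNPs and computing point estimates only (no low-coverage bias correction or jackknife standard errors). A larger outgroup-f3 value indicates more shared drift between A and B relative to the outgroup.

```python
import math
from typing import List, Tuple


def outgroup_f3(o: List[float], a: List[float], b: List[float]) -> float:
    """Point estimate of f3(O; A, B): mean over SNPs of (a - o) * (b - o)."""
    return sum((x - w) * (y - w) for w, x, y in zip(o, a, b)) / len(o)


def pc_distance(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Euclidean distance between two projections on PC1 and PC2."""
    return math.dist(p, q)
```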

      PCA over time

      It is a very interesting observation that the Fst-vs-distance curve does not appear to change after the Bronze Age. However, I wonder if the comparison of the PCA to the projection could be solidified. In particular, it is not obvious to me how to compare Fig 6 B and C, since the data in C is projected onto that in Fig B, and so we are viewing the historic samples in the context of the present-day ones. Thus, to me, this suggests that ancient samples are most closely related to the folks that contribute to present-day people that roughly live in the same geographic location, at least for the Middle East, North Africa and the Baltics, the three regions where the projections are well resolved. Ideally, it would be nice to have independent PCAs (something F-stats based, or using probabilistic PCA or some other framework that allows for missingness). Alternatively, it could be helpful to quantify the similarity and projection error.

      The fact that historical period individuals are “most closely related to the folks that contribute to present-day people that roughly live in the same geographic location” is exactly the point we were hoping to make with Figures 6 B and C. We do realize, however, that the fact that one set of samples is projected into the PC space established by the other may suggest that this is an obvious result. To make it more clear that it is not, we added an additional panel to Figure 6, which shows pre-historical samples projected into the present-day PC space. This figure shows that pre-historical individuals project all across the PCA space and often outside of present-day diversity, with degraded correlation of geographic location and projection location (see also Author response image 5). This illustrates the contrast we were hoping to communicate, where projection locations of historical individuals start to “settle” close to present-day individuals from similar geographic locations, especially in contrast with pre-historic individuals.

      Author response image 5.

      Comparing geographic distance to PCA distance between pairs of historical and pre-historical individuals matched by geographic space. For each historical period individual we selected the closest pre-historical individual by geographic distance, in an effort to match the distributions of pairwise geographic distance across the two time periods (left). For these distributions of individuals matched by geographic distance, we then queried the Euclidean distance between their projection locations on the first two principal components (right).
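      The matching procedure in this caption can be sketched as below. The sketch rests on assumptions not stated in the caption: coordinates are (latitude, longitude) in degrees, geographic distance is the great-circle distance from the haversine formula on a spherical Earth of radius 6371 km, and each individual is a hypothetical record (lat, lon, PC1, PC2).

```python
import math
from typing import Dict, List, Tuple

Record = Tuple[float, float, float, float]  # (lat, lon, pc1, pc2), hypothetical layout
EARTH_RADIUS_KM = 6371.0  # spherical-Earth assumption


def haversine_km(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def match_pairs(hist: Dict[str, Record], prehist: Dict[str, Record]) -> List[Tuple[str, str, float, float]]:
    """For each historical individual, find the closest pre-historical one by geography.

    Returns (historical id, pre-historical id, geographic km, PC1-PC2 Euclidean distance).
    """
    out = []
    for hid, (hlat, hlon, h1, h2) in hist.items():
        pid = min(prehist, key=lambda k: haversine_km((hlat, hlon), prehist[k][:2]))
        plat, plon, p1, p2 = prehist[pid]
        geo = haversine_km((hlat, hlon), (plat, plon))
        out.append((hid, pid, geo, math.dist((h1, h2), (p1, p2))))
    return out
```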

    1. Author Response

      Reviewer #3 (Public Review):

      The authors explore the use of SRT as a host-directed therapy for use in combination with other first-line TB antibiotics. This manuscript is of substantial importance since TB is a major world health concern, and there is growing interest in the development of host-directed therapies to augment existing therapies for TB. Demonstrating the effectiveness of adding an FDA-approved drug to existing cocktails of anti-TB drugs has potentially exciting implications.

      The manuscript is bolstered by their use of multiple in vitro and in vivo models of infection, as well as a clinically relevant strain of TB. While their findings generally support the use of SRT as an effective HDT/treatment, the mechanistic details underlying the effectiveness of SRT remain somewhat obscure, and as presented, the in vitro experiments support more limited conclusions.

      Major concerns:

      In vitro studies (i.e. bacterial culture) were only performed with SRT up to 6 µM, while the cultured cell experiments used a range up to 20 µM. 5 µM had almost no effect on the viability/growth of Mtb in macrophages. The authors should use the same concentrations in vitro as in their macrophage studies, to rule in or out a direct effect of SRT on Mtb viability in culture.

      We did not see any appreciable decrease in the growth of Mtb at up to 20 µM in in vitro experiments, only a restriction of roughly 30-40% after 8 days of culture. In combination with HR, we used a lower dose of 6 µM to minimize any direct inhibitory effect of SRT on the bacteria, so that the host-directed effect of SRT in the combination could be interpreted.

      The mechanism of action of SRT during TB infection and the conclusions drawn by the authors are not supported by the limited experimentation. SRT is presented as an antagonist of polyI:C-induced type I IFNs, but during TB infection, cytosolic DNA sensing via the cGAS/STING axis constitutes the major pathway through which type I IFNs are induced in macrophages.

      To offer more support that SRT inhibits type I IFN, the authors should consider measuring the actual amount of type I IFN using an IFNb ELISA. Additionally, the authors should use human/mouse primary macrophages (not just THP1 reporter cells) and measure transcript levels (at key time points post infection) and protein levels of type I IFN and other proinflammatory mediators (e.g. TNFa, IL-1, IL-6) +/- SRT to determine if SRT is specific to the type I IFN response. If this is indeed the case, other NFkB genes/cytokines should not be impacted.

      Moreover, to draw the conclusion that "augmentation property of SRT is due to its ability to inhibit IFN signalling" a set of experiments using an IFN blocking antibody would enhance Figure 2, as both cGAS and STING KO macs have significant differences in basal gene expression and their ability to respond to innate immune stimuli.

      Because the first half of the paper focuses on type I IFNs during macrophage infection to explain the mechanism of action for SRT, additional analysis of the mouse infections to examine levels of type I IFNs, as well as IL-1B and IFN-g (in serum/tissues?), is important for connecting the two halves of the manuscript. The in vivo data would also be strengthened by quantitative analysis of histological changes by, for example, blinded pathology scoring. This type of quantitation would also permit statistical analyses of this important pathology readout.

      We have analysed tissue cytokine levels and did not see stark differences between HRZE and HRZES at two time points, 4 and 8 weeks post treatment (Figure below). We feel that such studies would need a more comprehensive analysis of the immunological response induced in the host by the treatment at multiple time points. Such studies will be part of a more focused plan in future proposals and manuscripts. We have also manually scored the lesions in each group and have added these data to the manuscript (Figure 4-figure supplement 1).

      The authors conclude that SRT functions through an inflammasome-related function, but this conclusion requires further support of actual inflammasome activation, such as IL-1B secretion by ELISA or IL-1B processing by western blot analysis, rather than Il1b gene expression alone. Additional functional readouts of inflammasome activation like cell death assays would also strengthen this conclusion.

      We thank the reviewer for these suggestions. These studies are currently underway and will be part of a future manuscript detailing the mechanisms of the SRT-mediated increase in antibiotic efficacy.

      What strain of TB was used in these studies? The results and methods do not indicate the strain used, which is critical to know since different strains have varying pathogenesis phenotypes.

      We used Mtb Erdman for the routine drug-sensitive studies and N73 for the drug-tolerant studies. This information has been added to the text.

      Minor concerns:

      It might be worth consistently using the more common INH and RIF abbreviations to increase the clarity/readability of the MS and figures.

      We have used the conventional clinical abbreviations for INH and rifampicin.

      What is the physiological concentration of SRT when taken for depression, and how does that compare to the concentrations used in vitro? Are the in vitro concentrations feasible to achieve in patients?

      In Figure 3B, why is there a spike in TNF-a in the HRS treated cells only at 42h?

      The authors wish to thank the reviewer for this query. We have reanalysed the data and show the modified figures in the current version of the text. The spike in TNF at 42 h was an oversight, resulting from an erroneous representation of the values in the figure.

      Was statistical analysis performed on the data in Figure 3B and D?

      Yes, we have incorporated this information in the modified figure.

      A description/discussion of the different mouse strains use in infection - what benefits each has as a model and why several were used - would help convey the impact of the in vivo studies.

      A discussion of the mouse strains and their immunopathology during infection has been included in the text.

      Since antibiotics and SRT were administered ad libitum, how did the authors ensure that mice took enough of the antibiotics and especially SRT? Is it known whether these drugs affect the water taste enough to affect a mouse's willingness to drink them?

      We preferred ad libitum delivery of TB drugs in drinking water, as used in previous studies (Vilchèze et al., 2018, Antimicrob Agents Chemother 23;62(3):e02165-17). To ensure that the mice drank, we added 5% glucose to the water of all animals, including the non-antibiotic-treated groups. We also monitored water uptake during the treatment and found comparable consumption between the groups.

      Was statistical analysis performed on time-to-death experiments?

      Because of the inherent differences in susceptibility and response between male and female C3HeB/FeJ mice, we did not perform statistical analyses between the groups.

      Were CFUs measured in mice from Figure 4 to determine empirically how effective the antibiotic treatments were? And if SRT impacted their effectiveness?

      We did not test the effect of SRT on bacterial burdens in mice treated with HR alone, as these studies were aimed at deciphering chronic pathology. We tested the effect on bacterial loads in the C3HeB/FeJ model with the four-drug therapy, and in the C57BL/6 and BALB/c models of infection.

      The H&E images could use some additional labels to more easily discern what groups they belong to.

      These have been incorporated in the figure.

    1. Author Response

      Reviewer #1 (Public Review):

      This is a carefully-conducted fMRI study looking at how neural representations in the hippocampus, entorhinal cortex, and ventromedial prefrontal cortex change as a function of local and global spatial learning. Collectively, the results from the study provide valuable additional constraints on our understanding of representational change in the medial temporal lobes and spatial learning. The most notable finding is that representational similarity in the hippocampus post-local-learning (but prior to any global navigation trials) predicts the efficiency of subsequent global navigation.

      Strengths:

      The paper has several strengths. It uses a clever two-phase paradigm that makes it possible to track how participants learn local structure as well as how they piece together global structure based on exposure to local environments. Using this paradigm, the authors show that - after local learning - hippocampal representations of landmarks that appeared within the same local environment show differentiation (i.e., neural similarity is higher for more distant landmarks) but landmarks that appeared in different local environments show the opposite pattern of results (i.e., neural similarity is lower for more distant landmarks); after participants have the opportunity to navigate globally, the latter finding goes away (i.e., neural similarity for landmarks that occurred in different local environments is no longer influenced by the distance between landmarks). Lastly, the authors show that the degree of hippocampal sensitivity to global distance after local-only learning (but before participants have the opportunity to navigate globally) negatively predicts subsequent global navigation efficiency. Taken together, these results meaningfully extend the space of data that can be used to constrain theories of MTL contributions to spatial learning.

      We appreciate Dr. Norman’s generous feedback here along with his other insightful comments. Please see below for a point-by-point response. We note that responses to a number of Dr. Norman’s points were surfaced by the Editor as Essential revisions; as such, in a number of instances in the point-by-point below we direct Dr. Norman to our responses above under the Essential revisions section.

      Weaknesses:

      General comment 1: The study has an exploratory feel, in the sense that - for the most part - the authors do not set forth specific predictions or hypotheses regarding the results they expected to obtain. When hypotheses are listed, they are phrased in a general way (e.g., "We hypothesized that we would find evidence for both integration and differentiation emerging at the same time points across learning, as participants build local and global representations of the virtual environment", and "We hypothesized that there would be a change in EC and hippocampal pattern similarity for items located on the same track vs. items located on different tracks" - this does not specify what the change will be and whether the change is expected to be different for EC vs. hippocampus). I should emphasize that this is not, in itself, a weakness of the study, and it appears that the authors have corrected for multiple comparisons (encompassing the range of outcomes explored) throughout the paper. However, at times it was unclear what "denominator" was being used for the multiple comparisons corrections (i.e., what was the full space of analysis options that was being corrected for) - it would be helpful if the authors could specify this more concretely, throughout the paper.

      We appreciate this guidance and the importance of these points. We have taken a number of steps to clarify our hypotheses, we now distinguish a priori predictions from exploratory analyses, and we now explicitly indicate throughout the manuscript how we corrected for multiple comparisons. For full details, please see above for our response to Essential Revisions General comment #1.
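      The reviewer's "denominator" point can be illustrated with a minimal sketch. It assumes Holm-Bonferroni correction purely for illustration (this response does not say which procedure the manuscript uses), and the p-values are hypothetical. The same raw p-value of 0.02 survives correction in a family of two tests but not in a family of six, which is why the set of comparisons being corrected for needs to be stated explicitly.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per test: True where the null is rejected under Holm's step-down procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank, starting at 0) is compared against alpha / (m - k).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # Step-down: once one test fails, every larger p-value fails too.
    return reject


# Hypothetical p-values: the first test has the same raw p-value in both families.
small_family = [0.02, 0.30]
large_family = [0.02, 0.30, 0.45, 0.60, 0.75, 0.90]
print(holm_bonferroni(small_family))
print(holm_bonferroni(large_family))
```

      Only the size of the family differs between the two calls, yet the first test is significant in one and not in the other.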

      General comment 2: Some of the analyses featured prominently in the paper (e.g., interactions between context and scan in EC) did not pass multiple comparisons correction. I think it's fine to include these results in the paper, but it should be made clear whenever they are mentioned that the results were not significant after multiple comparisons correction (e.g., in the discussion, the authors say "learning restructures representations in the hippocampus and in the EC", but in that sentence, they don't mention that the EC results fail to pass multiple comparisons correction).

      Thank you for encouraging greater clarity here. As noted directly above, we now explicitly indicate our a priori predictions, we state explicitly which results survive multiple comparisons correction, and we added necessary caveats for effects that should be interpreted with caution.

      General comment 3: The authors describe the "flat" pattern across the distance 2, 3, and 4 conditions in Figure 4c (post-global navigation) and in Figure 5b (in the "more efficient" group) as indicating integration. However, this flat pattern across 2, 3, and 4 (in itself) could simply indicate that the region is insensitive to location - is there some other evidence that the authors could bring to bear on the claim that this truly reflects integration? Relatedly, in the discussion, the authors say "the data suggest that, prior to Global Navigation, LEs had integrated only the nearest landmarks located on different tracks (link distance 2)" - what is the basis for this claim? Considered on its own, the fact that similarity was high for link distance 2 does not indicate that integration took place. If the authors cannot get more direct evidence for integration, it might be useful for them to hedge a bit more in how they interpret the results (the finding is still very interesting, regardless of its cause).

      Based on the outcomes of additional behavioral and neural analyses that were helpfully suggested by reviewers, we revised discussion of this aspect of the data. Please see our response above under Essential Revisions General comment #4 for full details of the changes made to the manuscript.

      Reviewer #2 (Public Review):

      This paper presents evidence of neural pattern differentiation (using representational similarity analysis) following extensive experience navigating in virtual reality, building up from individual tracks to an overall environment. The question of how neural patterns are reorganized following novel experiences and learning to integrate across them is a timely and interesting one. The task is carefully designed and the analytic setup is well-motivated. The experimental approach provides a characterization of the development of neural representations with learning across time. The behavioral analyses provide helpful insight into the participants' learning. However, there were some aspects of the conceptual setup and the analyses that I found somewhat difficult to follow. It would also be helpful to provide clearer links between specific predictions and theories of hippocampal function.

      We appreciate the Reviewer’s careful read of our manuscript and their thoughtful guidance for improvement, which we believe strengthened the revised product. We note that responses to a number of the Reviewer’s points were surfaced by the Editor as Essential revisions; as such, in a number of instances in the point-by-point below we direct the Reviewer to our responses above under the Essential revisions section.

      General comment 1: The motivation in the Introduction builds on the assumption that global representations are dependent on local ones. However, I was not completely sure about the specific predictions or assumptions regarding integration vs. differentiation and their time course in the present experimental design. What would pattern similarity consistent with 'early evidence of global map learning' (p. 7) look like? Fig. 1D was somewhat difficult to understand. The 'state space' representation is only shown in Figure 1 while all subsequent analyses are averaged pairwise correlations. It would be helpful to spell out predictions as they relate to the similarity between same-route vs. different-route neural patterns.

      We appreciate this feedback. An increase in pattern similarity across features that span tracks would indicate the linking of those features together. ‘Early evidence’ here describes the point in experience where participants had traversed local (within-track) paths but had yet to traverse across-tracks.

      Figure 1D seeks to communicate the high-level conceptual point about how similarity (abstractly represented as state-space distance) may change in one of two directions as a function of experience.

      General comment 2: The shared landmarks could be used by the participants to infer how the three tracks connected even before they were able to cross between them. It is possible that the more efficient navigators used an explicit encoding strategy to help them build a global map of the world. While I understand the authors' reasoning for excluding the shared landmarks (p. 13), it seems like it could be useful to run an analysis including them as well - one possibility is that they act as 'anchors' and drive the similarity between different tracks early on; another is that they act as 'boundaries' and repel the representations across routes. Assuming that participants crossed over at these landmarks, these seem like particularly salient aspects of the environment.

      We agree that these shared landmarks play an important role in learning the global environment and guiding participants’ navigation. However, they also add confounding elements to the analyses; mainly, shared landmarks are located near multiple goal locations and associated with multiple tracks, and transition probabilities differ at shared landmarks because they have an increased number of neighboring landmarks and fractals. In the initial submission, shared landmarks were included in all analyses except (a) global distance models and (b) context models (which compare items located on the same vs different tracks).

      With respect to (a) the global distance models, we ran these models while including shared landmarks and the results did not differ (see figure below and compare to Fig. 5 in the revised manuscript):

      Distance representations in the Global Environment, with shared landmarks included. These data can be compared to Figure 5 of the revised manuscript, which does not include shared landmarks (see page 5 of this response letter).

      We continue to report the results from models excluding shared landmarks due to the confounding factors described above, with the following addition to the Results section:

      “We excluded shared landmarks from this model as they are common to multiple tracks; however, the results do not differ if these landmarks are included in the analysis.”

      With respect to (b) the context analyses (which compare items located on the same vs different tracks), we cannot include shared landmarks in these analyses because they are common amongst multiple tracks and thus confound the analyses. Finally, we are unable to conduct additional analyses investigating shared landmarks specifically (for example, examining how similarity between shared landmarks evolves across learning) due to very low trial counts. We share the Reviewer’s perspective that the role of shared landmarks during the building of map representations promises to provide additional insights and believe this is a promising question for future investigation.
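      A minimal sketch of a same- vs different-track ("context") contrast shows why an item on more than one track cannot enter it: its pairs have no single context label. Item names, voxel patterns, and track assignments are hypothetical, and Pearson correlation on raw patterns is an assumption; the manuscript's own pipeline (pattern estimation, any Fisher z-transform, baseline subtraction) is not described in this response.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length pattern vectors."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def context_similarity(patterns, tracks):
    """Mean pairwise correlation for same-track vs different-track item pairs.

    Items that appear on more than one track (shared landmarks) are skipped,
    because a pair involving them is both "same track" and "different track".
    """
    single = [item for item in patterns if len(tracks[item]) == 1]
    same, diff = [], []
    for a, b in combinations(single, 2):
        r = pearson(patterns[a], patterns[b])
        (same if tracks[a] == tracks[b] else diff).append(r)
    return mean(same), mean(diff)


# Hypothetical voxel patterns; "S" is a shared landmark on tracks A and B.
patterns = {
    "A1": [1, 2, 3, 4],
    "A2": [1, 2, 3, 5],
    "B1": [4, 3, 2, 1],
    "S": [1, 1, 1, 2],
}
tracks = {"A1": {"A"}, "A2": {"A"}, "B1": {"B"}, "S": {"A", "B"}}
same_r, diff_r = context_similarity(patterns, tracks)
print(same_r, diff_r)
```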

      General comment 3: What were the predictions regarding the fractals vs. landmarks (p. 13)? It makes sense to compare like-to-like, but since both were included in the models it would be helpful to provide predictions regarding their similarity patterns.

      We are grateful for the feedback on how to improve the consistency of results reporting. In the revision, we updated the relevant sections of the manuscript to include results from fractals. Please see our above response to Essential Revisions General comment #5 for additions made to the text.

      General comment 4: The median split into less-efficient and more-efficient groups does not seem to be anticipated in the Introduction and results in a small-N group comparison. Instead, as the authors have a wealth of within-individual data, it might be helpful to model single-trial navigation data in relation to pairwise similarity values for each given pair of landmarks in a mixed-effects model. While there won't be a simple one-to-one mapping and fMRI data are noisy, this approach would afford higher statistical power due to more within-individual observations and would avoid splitting the sample into small subgroups.

      We appreciate this very helpful suggestion. Following this guidance, we removed the median-split analysis and ran a mixed-effects model relating trial-wise navigation data (at the beginning of the Global Navigation Task) to pairwise similarity values for each given pair of landmarks and fractals (Post Local Navigation). We also altered our approach to the across-participant analysis examining brain-behavior relationships. Please see our above response to Essential Revisions General comment #3 for additions to the revised manuscript.

      General comment 5: If I understood correctly, comparing Fig. 4B and Fig. 5B suggests that the relationship between higher link distance and lower representational similarity was driven by less efficient navigators. The performance on average improved over time to more or less the same level as within-track (Fig. 2). Were less efficient navigators particularly inefficient on trials with longer distances? In the context of models of hippocampal function, this suggests that good navigators represented all locations as equidistant while poorer navigators showed representations more consistent with a map - locations that were further apart were more distant in their representational patterns. Perhaps more fine-grained analyses linking neural patterns to behavior would be helpful here.

      Following the above guidance, we removed the median-split analyses when exploring across-participant brain-behavior relationships (see Essential Revisions General comment #3), replacing it with a mixed-effects model analysis, and we revised our discussion of the across-track link distance effects (see Essential Revisions General comment #4). For this reason, we were hesitant and ultimately decided against conducting the proposed fine-grained analyses on the median-split data.

      General comment 6: I'm not completely sure how to interpret the functional connectivity analysis between the vmPFC and the hippocampus vs. visual cortex (Fig. 6). The analysis shows that the hippocampus and visual cortex are generally more connected than the vmPFC and visual cortex - but this relationship does not show an experience-dependent relationship and is consistent with resting-state data where the hippocampus tends to cluster into the posterior DMN network.

      We expected to see an experience-dependent relationship between vmPFC and hippocampal pattern similarity, and agree that these findings are difficult to interpret. Based on comments from several reviewers, we removed the second-order similarity analysis from the manuscript in favor of an analysis which models the relationship between vmPFC pattern similarity and hippocampal pattern similarity. Moreover, given the exploratory nature of the vmPFC analyses, and following guidance from Reviewer 1 about the visual cortex control analyses, both were moved to the Appendix. Please see our above response to Essential Revisions General comment #7 for further details of the changes made to the manuscript.

      Reviewer #3 (Public Review):

      Fernandez et al. report results from a multi-day fMRI experiment in which participants learned to locate fractal stimuli along three oval-shaped tracks. The results suggest the concurrent emergence of a local, differentiated within-track representation and a global, integrated cross-track representation. More specifically, the authors report decreases in pattern similarity for stimuli encountered on the same track in the entorhinal cortex and hippocampus relative to a pre-task baseline scan. Intriguingly, following navigation on the individual tracks, but prior to global navigation requiring track-switching, pattern similarity in the hippocampus correlated with link distances between landmark stimuli. This effect was only observed in participants who navigated less efficiently in the global navigation task and was absent after global navigation.

      Overall, the study is of high quality in my view and addresses relevant questions regarding the differentiation and integration of memories and the formation of so-called cognitive maps. The results reported by the authors are interesting and are based upon a well-designed experiment and thorough data analysis using appropriate techniques. A more detailed assessment of strengths and weaknesses can be found below.

      Strengths

      1) The authors address an interesting question at the intersection of memory differentiation and integration. The study is further relevant for researchers interested in the question of how we form cognitive maps of space.

      2) The study is well-designed. In particular, the pre-learning baseline scan and the random-order presentation of stimuli during MR scanning allow the authors to track the emergence of representations in a well-controlled fashion. Further, the authors include an adequate control region and report direct comparisons of their effects against the patterns observed in this control region.

      3) The manuscript is well-written. The introduction provides a good overview of the research field and the discussion does a good job of summarizing the findings of the present study and positioning them in the literature.

      We thank Dr. Bellmund for his positive evaluation of the manuscript. We greatly appreciate the insightful feedback, which we believe strengthened the manuscript’s clarity and potential impact. We note that responses to a number of Dr. Bellmund’s points were surfaced by the Editor as Essential revisions; as such, in a number of instances in the point-by-point below we direct the Reviewer to our responses above under the Essential revisions section.

      Weaknesses

      General comment 1: Despite these distinct strengths, the present study also has some weaknesses. On the behavioral level, I am wondering about the use of path inefficiency as a metric for global navigation performance. Because it is quantified based on the local response, it conflates the contributions of local and global errors.

      We appreciate this point with respect to path inefficiency during global navigation. As noted below, following Dr. Bellmund’s further insightful guidance, we now complement the path inefficiency analyses with additional metrics of across-track (global) navigation performance, which effectively separate local from global errors (please see below response to Author recommendation #1).
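      To illustrate how a single inefficiency score can conflate error types, the sketch below assumes a hypothetical definition (links traversed divided by the optimal number of links) and represents each trial as the sequence of tracks its links lie on. A trial with an extra within-track link and a trial with a detour through the wrong track receive the same score, while counting excess track switches separately tells them apart. This is not the set of global metrics the authors added, which are not specified here.

```python
def path_inefficiency(link_tracks, optimal_links):
    """Hypothetical metric: links traversed divided by the optimal number of links."""
    return len(link_tracks) / optimal_links


def count_switches(link_tracks):
    """Number of times consecutive links lie on different tracks."""
    return sum(1 for a, b in zip(link_tracks, link_tracks[1:]) if a != b)


def decompose(link_tracks, optimal_links, optimal_switches):
    """Report the overall score alongside separate excess-link and excess-switch counts."""
    return {
        "inefficiency": path_inefficiency(link_tracks, optimal_links),
        "excess_links": len(link_tracks) - optimal_links,
        "excess_switches": count_switches(link_tracks) - optimal_switches,
    }


optimal = ["A", "B", "B"]             # 3 links, 1 track switch
local_error = ["A", "A", "B", "B"]    # one extra link on the start track
global_error = ["A", "C", "B", "B"]   # a detour through the wrong track
for trial in (local_error, global_error):
    print(decompose(trial, len(optimal), count_switches(optimal)))
```

      Both trials score 4/3 on the ratio, but only the second shows an excess track switch.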

      General comment 2: For the distance-based analysis in the hippocampus, the authors choose to only analyze landmark images and do not include fractal stimuli. There seems to be little reason to expect that distances between the fractal stimuli, on which the memory task was based, would be represented differently relative to distances between the landmarks.

      We are grateful for the feedback on how to improve the consistency of results reporting. In the revision, we updated the relevant sections of the manuscript to include results from fractals. Please see our above response to Essential Revisions General comment #5 for full details.

      General comment 3: Related to the aforementioned analysis, I am wondering why the authors chose the link distance between landmarks as their distance metric for the analysis and why they limit their analysis to pairs of stimuli with distance 1 or 2 and do not include pairs separated by the highest possible distance (3).

      We appreciate the request for clarification here. Beginning with the latter question, we note that the highest possible distance varies between within-track vs. across-track paths. If participants navigate in the Local Navigation Task using the shortest or most efficient path, the highest possible within-track link distance between two stimuli is 2. For this reason, the Local Navigation/within-track analysis includes link distances of 1 and 2. For the Global Navigation analysis, we also include pairs of stimuli with link distances of 3 and 4 when examining across-track landmarks.
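      Link distance can be read as the number of links on the shortest path between two landmarks, which a breadth-first search computes. The layout below is hypothetical (three oval tracks of four landmarks each, joined by two shared landmarks) and is chosen only so that, as stated above, the farthest pair on the same track is two links apart while across-track pairs can be farther apart.

```python
from collections import deque


def link_distances(edges, source):
    """Breadth-first search: number of links on the shortest path from source to every landmark."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def ring(nodes):
    """Edges of a closed oval track visiting the nodes in order."""
    return [(nodes[i], nodes[(i + 1) % len(nodes)]) for i in range(len(nodes))]


# Hypothetical layout: S_AB is shared by tracks A and B, S_BC by tracks B and C.
edges = (
    ring(["S_AB", "a1", "a2", "a3"])
    + ring(["S_AB", "b1", "S_BC", "b3"])
    + ring(["S_BC", "c1", "c2", "c3"])
)
print(link_distances(edges, "a2"))
```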

      Regarding the use of link distance as the distance metric, we note that the path distance (a.u.) varies only slightly between pairs of stimuli with the same link distance. As such, categorical treatment of link distance accounts for the vast majority of the variance in path distance and thus is a suitable approach. Please note that in the new trial-level brain-behavior analysis included in the revised manuscript (which replaces the median-split analysis), we used the length of the optimal path.
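      The claim that categorical link distance captures most of the variance in path distance can be checked with eta squared (between-group sum of squares divided by total sum of squares). The path distances below are invented for illustration; the manuscript's actual values are not reported in this response.

```python
from statistics import mean


def eta_squared(groups):
    """Share of variance in a continuous measure explained by a categorical grouping."""
    values = [v for g in groups.values() for v in g]
    grand = mean(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    return ss_between / ss_total


# Hypothetical path distances (a.u.) grouped by link distance.
path_distance = {
    1: [10.0, 10.5, 9.5],
    2: [20.0, 21.0, 19.0],
    3: [30.0, 29.0, 31.0],
}
print(round(eta_squared(path_distance), 3))
```

      With these numbers the function returns about 0.993, i.e., link-distance category alone would account for over 99% of the variance in path distance.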

      General comment 4: Surprisingly, the authors report that across-track distances can be observed in the hippocampus after local navigation, but that this effect cannot be detected after global, cross-track navigation. Relatedly, the cross-track distance effect was detected only in the half of participants that performed relatively badly in the cross-track navigation task. In the results and discussion, the authors suggest that the effect of cross-track distances cannot be detected because participants formed a "more fully integrated global map". I do not find this a convincing explanation for why the effect the authors are testing would be absent after global navigation and for why the effect was only present in those participants who navigated less efficiently.

      We appreciate Dr. Bellmund’s input here, which was shared by other reviewers. We revised and clarified the Discussion based on reviewer comments. Please see our above response to Essential Revisions General comment #4 for full details.

      General comment 5: The authors report differences in the hippocampal representational similarity between participants who navigated along inefficient vs. efficient paths. These are based on a median split of the sample, resulting in a comparison of groups including 11 and 10 individuals, respectively. The median split (see e.g. MacCallum et al., Psychological Methods, 2002) and the low sample size mandate cautionary interpretation of the resulting findings about interindividual differences.

      We appreciate the feedback we received from multiple reviewers with respect to the median-split brain-behavior analysis. We replaced the median-split analysis with the following: 1) a mixed-effects model predicting neural pattern similarity Post Local Navigation, with a continuous metric of task performance (each participant’s median path inefficiency for across-track trials in the first four test runs of Global Navigation) and link distance as predictors; and 2) a mixed-effects model relating trial-wise navigation data to pairwise similarity values for each given pair of landmarks and fractals (as suggested by Reviewer 2). Please see our above response to Essential Revisions General comment #3 for additions to the revised manuscript.
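      The concern about median splits (MacCallum et al., 2002) can be demonstrated directly: dichotomizing a continuous predictor discards information and attenuates its correlation with an outcome. In this hypothetical example with ten participants, a perfectly linear relation (r = 1) drops to roughly r = 0.87 after the predictor is split at its median.

```python
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def median_split(x):
    """Code each value as 1 if above the sample median, else 0."""
    m = median(x)
    return [1 if v > m else 0 for v in x]


# Hypothetical data: a continuous performance score and a perfectly related outcome.
performance = list(range(1, 11))
outcome = [2.0 * p for p in performance]
r_continuous = pearson(performance, outcome)
r_split = pearson(median_split(performance), outcome)
print(r_continuous, r_split)
```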

    1. Author Response:

      Evaluation Summary:

      This manuscript is of primary interest to readers in the field of infectious diseases, especially those involved in COVID-19 research. The identification of immunological signatures caused by SARS-CoV-2 in HIV-infected individuals is important not only to better predict disease outcomes but also to predict vaccine efficacy and to potentially identify sources of viral variants. Here, the authors leverage a combination of clinical parameters, limited virologic information and extensive flow cytometry data to reach descriptive conclusions.

      We have extensively reworked the paper.

      Reviewer #1 (Public Review):

      The methods appear sound. The introduction of vaccines for COVID-19 and the emergence of variants in South Africa and how they may impact PLWH is well discussed making the findings presented a good reference backdrop for future assessment. Good literature review is also presented. Specific suggestions for improving the manuscript have been identified and conveyed to the authors.

      We thank the Reviewer for the support.

      Reviewer #2 (Public Review):

      Karima, Gazy, Cele, Zungu, Krause et al. described the impact of HIV status on immune cell dynamics in response to SARS-CoV-2 infection. To do so, during the peak of the KwaZulu-Natal pandemic in July 2020, they enrolled a robust observational longitudinal cohort of 124 participants, all positive for SARS-CoV-2. Of the participants, 55 (44%) were HIV-infected individuals. No difference in high-risk COVID-19 comorbidities or clinical manifestations was observed between people living with HIV (PLWH) and HIV-uninfected individuals, except for joint ache, which was more common in HIV-uninfected individuals. In this study, the authors leverage and combine extensive clinical information, virologic data and immune cell quantification by flow cytometry to show changes in T cells, such as post-SARS-CoV-2 infection expansion of CD8 T cells and reduced expression of CXCR3 on T cells at specific post-SARS-CoV-2 infection time points. The authors also conclude that HIV status attenuates the expansion of antibody secreting cells. The correlative analyses in this study show that low CXCR3 expression on CD8 and CD4 T cells correlates with COVID-19 disease severity, especially in PLWH. The authors did not observe differences in the SARS-CoV-2 shedding time frame between the two groups, ruling out a role for HIV serostatus in the emergence of SARS-CoV-2 variants. However, the authors clarify that their PLWH group consisted mostly of ART-suppressed participants whose CD4 counts were reasonably high. The study presents the following strengths and limitations:

      We thank the Reviewer for the comments. The cohort now includes participants with low CD4.

      Strengths:

      A. A robust longitudinal observational cohort of 124 study participants, 55 of whom were people living with HIV. This cohort was enrolled in KwaZulu-Natal, South Africa, during the peak of the pandemic. Participants attended up to 5 follow-up visits, and around 50% of the participants have completed the study.

      We thank the Reviewer for the support. The cohort has now been expanded to 236 participants.

      B. A broad characterization of blood circulating cell subsets by flow cytometry able to identify and characterize T cells, B cells and innate cells.

      We thank the Reviewer for the support.

      Weaknesses:

      The study design does not include

      A. a robust group of HIV-infected individuals with low CD4 counts, as also stated by the authors

      This has changed in the resubmission because we included participants from the second, beta variant dominated infection wave. For this infection wave we obtained what we think is an important result, presented in a new Figure 2:

      This figure shows that in infection wave 2 (beta variant), CD4 counts for PLWH dropped to below the CD4=200 level, yet recovered after SARS-CoV-2 clearance. Therefore, the participants we added had low CD4 counts, but this was SARS-CoV-2 dependent.

      B. a group of HIV-uninfected individuals and PLWH with severe COVID-19. As stated in the manuscript, the majority of our participants did not progress beyond outcome 4 of the WHO ordinal scale. This is also reflected in the average age of the participants. Limiting the number of participants with severe COVID-19 limits the study to an observational, correlative study.

      Death has now been added to Table 1 under the “Disease severity” subheading. The number of participants who have died, at 13, is relatively small. We did not limit the study to non-critical cases. Our main measure of severity is supplemental oxygen.

      This is stated in the Results, line 106-108:

      “Our cohort design did not specifically enroll critical SARS-CoV-2 cases. The requirement for supplemental oxygen, as opposed to death, was therefore our primary measure for disease severity.”

      This is justified in the Discussion, lines 219-225:

      “Our cohort may not be a typical 'hospitalized cohort' as the majority of participants did not require supplemental oxygen. We therefore cannot discern effects of HIV on critical SARS-CoV-2 cases since these numbers are too small in the cohort. However, focusing on lower disease severity enabled us to capture a broader range of outcomes which predominantly ranged from asymptomatic to supplemental oxygen, the latter being our main measure of more severe disease. Understanding this part of the disease spectrum is likely important, since it may indicate underlying changes in the immune response which could potentially affect long-term quality of life and response to vaccines.”

      C. a control group of HIV-uninfected and HIV-infected individuals enrolled at the same time as the study participants.

      This was not possible given constraints imposed on bringing non-SARS-CoV-2 infected participants into a hospital during a pandemic for research purposes. However, given that the study was longitudinal, we did track participants after convalescence. This gave us an approximation of participant baseline in the absence of SARS-CoV-2, for the same participants. Results are presented in Figure 2 above.

      D. results that elucidate the mechanisms and functions of immune cell subsets in the context of COVID-19.

      We do not have functional assays.

      Reviewer #3 (Public Review):

      Karim et al. have assembled a large cohort of PLWH with acute COVID-19 and well-matched controls. The main finding is that, despite similar clinical and viral (e.g., shedding) outcomes, the immune response to COVID-19 in PLWH differs from the immune response to COVID-19 in HIV-uninfected individuals. More specifically, they find that viral loads are comparable between the groups at the time of diagnosis, and that the time to viral clearance (by PCR) is also similar between the two groups. They find that PLWH have higher proportions and also higher absolute numbers of CD8 cells in the 2-3 weeks after initial infection.

      The authors do a wonderful job of clinically characterizing the research participants. I was most impressed by the attention to detail with respect to timing of viral diagnosis as it related to symptom onset and specimen collection. I was also impressed by the number of longitudinal samples included in this study.

      We thank the Reviewer for the support.

    1. Author Response

      Reviewer #2 (Public Review):

      Silberberg et al. present a series of cryo-EM structures of the ATP dependent bacterial potassium importer KdpFABC, a protein that is inhibited by phosphorylation under high environmental K+ conditions. The aim of the study was to sample the protein's conformational landscape under active, non-phosphorylated and inhibited, phosphorylated (Ser162) conditions.

      Overall, the study presents 5 structures of phosphorylated wildtype protein (S162-P), 3 structures of phosphorylated 'dead' mutant (D307N, S162-P), and 2 structures of constitutively active, non-phosphorylatable protein (S162A).

      The true novelty and strength of this work is that 8 of the presented structures were obtained either under "turnover" or at least 'native' conditions without ATP, ie in the absence of any non-physiological substrate analogues or stabilising inhibitors. The remaining 2 were obtained in the presence of orthovanadate.

      Comparing the presented structures with previously published KdpFABC structures, there are 5 structural states that have not been reported before, namely an E1-P·ADP state, an E1-P tight state captured in the autoinhibited WT protein (with and without vanadate), two different nucleotide-free 'apo' states, and an E1·ATP early state.

      Of these new states, the 'tight' states are of particular interest, because they appear to be 'off-cycle', dead end states. A novelty lies in the finding that this tight conformation can exist both in nucleotide-free E1 (as seen in the published first KdpFABC crystal structure), and also in the phosphorylated E1-P intermediate.

      By EPR spectroscopy, the authors show that the nucleotide free 'tight' state readily converts into an active E1·ATP conformation when provided with nucleotide, leading to the conclusion that the E1-P·ADP state must be the true inhibitory species. This claim is supported by structural analysis supporting the hypothesis that the phosphorylation at Ser162 could stall the KdpB subunit in an E1P state unable to convert into E2P. This is further supported by the fact that the phosphorylated sample does not readily convert into an E2P state when exposed to vanadate, as would otherwise be expected.

      The structures are of medium resolution (3.1 - 7.4 Å), but the key sites of nucleotide binding and/or phosphorylation are reasonably well supported by the EM maps, with one exception: in the 'E1·ATP early' state determined under turnover conditions, I find the map for the gamma phosphate of ATP not overly convincing, leaving the question whether this could instead be a product-inhibited, Mg-ADP bound E1 state resulting from an accumulation of MgADP under the turnover conditions used. Overall, the manuscript is well written and carefully phrased, and it presents interesting novel findings, which expand our knowledge about the conformational landscape and regulatory mechanisms of the P-type ATPase family.

      We thank the reviewer for their comments and helpful insights. We have addressed the points as follows:

      However, in my opinion, the current version of the manuscript has the following weaknesses:

      1) A lack of quantification. The heart of this study is the comparison of the newly determined KdpFABC structures with previously published ones (of which there are already 10). Yet, there are no RMSD calculations to illustrate the magnitude of any structural deviations. Instead, the authors use phrases like 'similar but not identical to', 'has some similarities', 'virtually identical', 'significant differences'. This makes it very hard to appreciate the true level of novelty/deviation from known structures.

      This is a very valid point, and we thank the reviewer for bringing it up. To provide a better overview and appreciation of conformational similarities and significant differences, we have calculated RMSDs between all available structures of KdpFABC. They are summarised in the new Table 1 – Table Supplement 2. We have included individual RMSD values, wherever applicable and relevant, in the respective sections of the text and figures. We note that the RMSDs were calculated only between the cytosolic domains (KdpB N, A and P domains) after superimposition of the full-length protein on KdpA, which is rigid across all conformations of KdpFABC (see description in Materials and methods, lines 1184-1191, or the caption to Table 1 – Table Supplement 2). We opted not to report the RMSD calculated between the full-length proteins, because most of the complex does not undergo large structural changes (see Figure 1 – Figure Supplement 1: the transmembrane region of KdpB as well as KdpA, KdpC and KdpF show small or no rearrangements compared to the cytosolic domains), so including it would obscure the relevant RMSD differences discussed here.
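      To make the effect of the superposition choice concrete, here is a minimal Python sketch (not the script used for Table 1 – Table Supplement 2): coordinates are superimposed with the Kabsch algorithm on one atom subset, standing in for KdpA, and the RMSD is then computed over a different subset, standing in for the cytosolic N, A and P domains of KdpB. The toy coordinates, atom counts and the helper names kabsch_fit and rmsd_after_fit are hypothetical.

```python
import numpy as np


def kabsch_fit(mobile, target):
    """Least-squares rotation r and translation t mapping mobile onto target.

    Both inputs are (N, 3) arrays of matched atom coordinates.
    """
    mu_m = mobile.mean(axis=0)
    mu_t = target.mean(axis=0)
    h = (mobile - mu_m).T @ (target - mu_t)   # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))    # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = mu_t - r @ mu_m
    return r, t


def rmsd_after_fit(mobile, target, fit_idx, rmsd_idx):
    """Superimpose on the atoms in fit_idx, then compute RMSD over rmsd_idx."""
    r, t = kabsch_fit(mobile[fit_idx], target[fit_idx])
    moved = mobile @ r.T + t                  # apply the fit to every atom
    diff = moved[rmsd_idx] - target[rmsd_idx]
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Hypothetical toy model: 50 rigid "anchor" atoms (like KdpA) and
# 30 "domain" atoms that shift by 2 Å between two states.
rng = np.random.default_rng(0)
anchor = rng.normal(scale=5.0, size=(50, 3))
domain = rng.normal(scale=5.0, size=(30, 3)) + np.array([20.0, 0.0, 0.0])
state_a = np.vstack([anchor, domain])
state_b_raw = np.vstack([anchor, domain + np.array([0.0, 2.0, 0.0])])

# Place state B in an arbitrary frame (rotation about z plus translation).
theta = 0.7
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
state_b = state_b_raw @ rot_z.T + np.array([5.0, -3.0, 1.0])

anchor_idx = np.arange(50)
domain_idx = np.arange(50, 80)
all_idx = np.arange(80)

# Domain displacement measured in the frame of the rigid anchor
print(rmsd_after_fit(state_b, state_a, anchor_idx, domain_idx))
# Whole-model fit: the optimizer spreads the displacement over all atoms
print(rmsd_after_fit(state_b, state_a, all_idx, all_idx))
```

      Fitting on the rigid anchor reports the full 2 Å domain displacement, whereas fitting on all atoms lets the optimizer distribute the displacement over the whole model and lowers the number, which is why anchoring the superposition on a part of the complex that does not move keeps the cytosolic differences visible.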

      Also the decrease in EPR peak height of the E1 apo tight state between phosphorylated and non-phosphorylated sample - a key piece of supporting data - is not quantified.

      EPR distance distributions have been quantified by fitting and integrating a Gaussian distribution curve, and the results have been added to the corresponding Results section (lines 523-542) and the Methods section (lines 1230-1232).
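      As an illustration of this type of quantification, the sketch below fits a single Gaussian to one peak of a distance distribution and integrates it analytically (area = A·σ·√(2π)). It uses Caruana's log-parabola fit as a stand-in; the fitting routine used in the revised manuscript is not reproduced here, and the distribution, fitting window and the function name fit_gaussian_peak are hypothetical. With noisy experimental distributions, a nonlinear least-squares fit (for example scipy.optimize.curve_fit) is usually more robust than fitting the logarithm.

```python
import math

import numpy as np


def fit_gaussian_peak(r, p, window):
    """Fit p(r) = A*exp(-(r - mu)^2 / (2*sigma^2)) to the points inside window.

    Caruana's method: ln p is quadratic in r, so a degree-2 polynomial fit
    gives the Gaussian parameters in closed form.
    """
    lo, hi = window
    mask = (r >= lo) & (r <= hi) & (p > 0)
    c2, c1, c0 = np.polyfit(r[mask], np.log(p[mask]), 2)  # highest degree first
    sigma = math.sqrt(-1.0 / (2.0 * c2))
    mu = -c1 / (2.0 * c2)
    amp = math.exp(c0 - c1 ** 2 / (4.0 * c2))
    area = amp * sigma * math.sqrt(2.0 * math.pi)  # integral of the Gaussian
    return mu, sigma, amp, area


# Hypothetical distance distribution (nm) with two well-separated peaks.
dr = 0.001
r = np.arange(0.0, 10.0, dr)
p = (0.8 * np.exp(-(r - 3.5) ** 2 / (2 * 0.4 ** 2))
     + 0.4 * np.exp(-(r - 6.5) ** 2 / (2 * 0.3 ** 2)))

mu, sigma, amp, area = fit_gaussian_peak(r, p, window=(2.7, 4.3))
total = p.sum() * dr          # area of the whole distribution (Riemann sum)
fraction = area / total       # share of the distribution in the fitted peak
```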

      2) Perhaps as a consequence of the above, there seems to be a slight tendency towards overstatements regarding the novelty of the findings in the context of previous structural studies. The E1-P·ATP tight structure is extremely similar to the previously published crystal structure (5MRW), but it took me three reads through the paper and a structural superposition (overall RMSD less than 2 Å) to realise that. While I do see that the existing differences (the two helix shifts in the P and A domains) are important and probably do permit the use of the term 'novel conformation' (I don't think there is a clear consensus on what level of change defines a novel conformation), it could have been made clearer that the 'tight' arrangement of domains has actually been reported before, only it was not termed 'tight'.

      As indicated above, we have now included an extensive RMSD table covering all available KdpFABC structures. To ensure a meaningful comparison, RMSDs are calculated only between the cytosolic domains after superimposition of the full-length protein on KdpA, as the transmembrane region of KdpFABC is largely rigid (see figure below, panel B). However, we note that in the X-ray structure the transmembrane region of KdpB is displaced relative to the rest of the complex when compared to the arrangement found in any of the other 18 cryo-EM structures, which all align well in the TMD (see figure below, panel C). These deviations make the crystal structure somewhat of an outlier and might be a consequence of crystal packing (see figure below, panel A). For completeness in our comparison with the X-ray structure, we have included an RMSD calculated after superimposition on KdpA and an additional RMSD calculated between structures aligned on the TMD of KdpB (see figure below, panels D,E). The RMSD of less than 2 Å that the reviewer mentions was probably obtained by superimposing the entire complexes on each other (see figure below, panel F). However, we do not believe that this is a reasonable comparison, as the TMD of the complex is significantly displaced, in strong contrast to all other pairs of structures, whose TMDs align well (see figure below, panel B).

      From the resulting comparisons, we conclude that the E1P tight and the X-ray structure are similar but not identical, in particular in the relative orientation of the cytosolic domains to the rest of the complex. We hope that including the RMSDs in the text and separately highlighting the important features of the E1P tight state in the section “E1P tight is the consequence of an impaired E1P/E2P transition” makes the story more conclusive.

      Likewise, the authors claim that they have covered the entire conformational cycle with their 10 structures, but this is actually not correct, as there is no representative of an E2 state or functional E1P state after ADP release.

      This is correct, and we have adjusted the phrasing to “close to the entire conformational cycle” or “the entire KdpFABC conformational cycle except the highly transient E1P state after ADP release and E2 state after dephosphorylation.”

      3) A key hypothesis this paper suggests is that KdpFABC cannot undergo the transition from E1P tight to E2P and hence gets stuck in this dead-end 'off-cycle' state. To test this, the authors analysed an S162-P sample supplied with the E2P-inducing inhibitor orthovanadate and found about 11% of particles in an E2P conformation. This is rationalised as a residual fraction of unphosphorylated, non-inhibited protein in the sample, but the sample was not actually tested for a residual unphosphorylated fraction or residual activity. Instead, there is a reference to Sweet et al., 2020. So the claim that the 11% E2P particles in the vanadate sample are irrelevant, whereas the 14% E1P tight from the turnover dataset are of key importance, would strongly benefit from some additional validation.

      We have added an ATPase assay that shows the residual ATPase activity of WT KdpFABC compared to KdpFAB(S162A)C, both purified from E. coli LB2003 cells, matching the protein production and purification used for the cryo-EM samples (see Figure 2-Suppl. Figure 5). The residual ATPase activity is ca. 14% of that of the uninhibited sample, which is in line with the ~11% E2P fraction in the orthovanadate sample.

      Reviewer #3 (Public Review):

      The authors have determined a range of conformations of the high-affinity prokaryotic K+ uptake system KdpFABC, and demonstrate at least two novel states that shed further light on the structure and function of these elusive protein complexes.

      The manuscript is well-written and easy to follow. The introduction puts the work in a proper context and highlights gaps in the field. I am, however, missing an overview of the currently available structures/states of KdpFABC. This could also be implemented in Fig. 6 (highlighting new vs. available data). This is also connected to one of my main remarks: the lack of comparisons and RMSD estimates relative to available structures. Similarity/resemblance to available structures is indicated several times throughout the manuscript, but this is not quantified or shown in detail, and hence it is difficult for the reader to grasp how unique or alike the structures are. Linked to this, I am somewhat surprised by the lack of considerable changes within the TM domain and the overlapping connectivity of the K+ ions indicated in Table 1 - Figure Supplement 1. According to Fig. 6, the uptake pathway should be open in early E1 states but not in E2 states, in contrast to Table 1 - Figure Supplement 1, which shows connectivity in all structures? Furthermore, the release pathway (to the inside) should be open in the E2-P conformation, but no K+ ions are shown along a release pathway in any of the structures in Table 1 - Figure Supplement 1. Overall, the shifts between the shown structures seem rather small (are the structures changing from closed to inward-open?). Or is it only KdpA that is shown?

      We thank the reviewer for their positive response and constructive criticisms. We have addressed these comments as follows:

      1. The overview of the available structures has been implemented in Fig. 6, with the new structures from this study highlighted in bold.

      2. RMSD values have been added to all comparisons, with a focus on the deviations of the cytosolic domains, which are most relevant to our conformational assignments and discussions.

      3. To highlight the (comparatively small) changes in the TMD, we have expanded Table 1 - Figure Supplement 1 to include panels showing the outward-open half-channel in the E1 states with a constriction at the KdpA/KdpB interface and the inward-open half-channel in the E2 states. However, the largest observable rearrangements take place in the cytosolic domains. This is in full agreement with previous studies, which focused more on the transitions occurring within the transmembrane region during the transport cycle (Stock et al., Nature Communications 2018; Silberberg et al., Nature Communications 2021; Sweet et al., PNAS 2021).

      4. The ions observed in the intersubunit tunnel are all before the point at which the tunnel closes, explaining why there is no difference in this region between E1 and E2 structures. Moreover, as we discussed in our last publication (Silberberg, Corey, Hielkema et al., 2021, Nat. Comms.), the assignment of non-protein densities along the entire length of the tunnel is contentious and can only be certain in the selectivity filter of KdpA and the CBS of KdpB.

      5. The release pathway from the CBS does not feature any defined K+ coordination sites, so ions are not expected to stay bound along this inward-open half-channel.

      My second key remark concerns the "E1-P tight is the consequence of an impaired E1-P/E2-P transition" section, and the associated discussion, which is very interesting. I am not convinced, though, that the nucleotide- and phosphate-mimic-stabilized states (such as E1-P:ADP) represent the high-energy E1P state, as I believe is indicated in the text. Supportive of this, in SERCA, the shifts from the E1:ATP to the E1P:ADP structures are modest, while the following high-energy Ca-bound E1P and E2P states remain elusive (see Fig. 1 in PMID: 32219166, from 3N8G to 3BA6). Or maybe this is not what the authors claim, or the situation is different for KdpFABC? Relatedly, while I agree with the statement in rows 234-237 (that the authors likely have caught an off-cycle state), I wonder whether the tight E1-P configuration could relate to the elusive high-energy states (although this is initially counter-intuitive, as it has been caught in the structure)? The claims on rows 358-360 and 420-422 are not in conflict with such an idea, and the authors touch on this subject on rows 436-450. Can it be excluded that it is the proper elusive E1P state? If the state is related to the E1P conformation, it may well have bearing also on other P-type ATPases, and this could be expanded upon.

      This is a good point, particularly since the E1P·ADP state is the most populated state in our sample, which is also at odds with the notion of a “high-energy, unstable state”. One possible explanation is that this state already carries some of the E1P strain (visible in the clash of D307-P with D518/D522), but that the ADP, and in particular its associated Mg2+, helps to stabilize it. Once ADP dissociates and takes the Mg2+ with it, the full destabilization takes effect in the actual high-energy E1P state. Nonetheless, we consider it fair to compare E1P tight with E1P·ADP to look for electrostatic relaxation. We have clarified the sequence of events and the role we hypothesize for ADP/Mg2+ in stabilizing the observed E1P·ADP state (lines 609-619): “Moreover, a comparison of the E1P tight structure with the E1P·ADP structure, its most immediate precursor in the conformational cycle obtained, reveals a number of significant rearrangements within the P domain (Figure 5B,C). First, Helix 6 (KdpB residues 538-545) is partially unwound and has moved away from helix 5 towards the A domain, alongside the tilting of helix 4 of the A domain (Figure 5B,C – arrow 2). Second, and of particular interest, are the additional local changes that occur in the immediate vicinity of the phosphorylated KdpB D307. In the E1P·ADP structure, the catalytic aspartyl phosphate, located in the D307KTG signature motif, points towards the negatively charged KdpB D518/D522. This strain is likely to become even more unfavorable once ADP dissociates in the E1P state, as the Mg2+ associated with the ADP partially shields these clashes. The ensuing repulsion might serve as a driving force for the system to relax into the E2 state in the catalytic cycle.”

      We believe it is highly unlikely that the reported E1-P tight state represents an on-cycle high-energy E1P intermediate. For one, we observe a relaxation of electrostatic strains in this structure, in particular when compared to the E1P·ADP state we obtained. By contrast, E1P should be the most energetically unfavourable state possible to ensure a rapid transition to the E2P state. As such, this state should be transient, making it unlikely to accumulate to an extent that allows it to be captured structurally. Additionally, the association of the N domain with the A domain in the tight conformation, which would have to be reverted, would be a surprising intermediary step in the transition from E1P to E2P. Altogether, the E1P tight state reported here most likely represents an off-cycle state.

    1. Author Response:

      Reviewer #1 (Public Review):

      This manuscript was well written and interrogates an exciting and important question about whether thalamic sub-regions serve as essential "hubs" for interconnecting diverse cognitive processes. This lesion dataset, combined with normative imaging analyses, serves as a fairly unique and powerful way to address this question.

      Overall, I found the data analysis and processing to be appropriate. I have a few additional questions that remain to be answered to strengthen the conclusions of the authors.

      1. The number of thalamic lesion cases was small (20 participants), and the maximum overlap at any site in this group is 5 cases. Finding focal thalamic lesions with the appropriate characteristics is likely to be relatively hard, so this smaller sample size is not surprising, but it suggests that the overlap analyses conducted to identify "multi-domain" hub sites will be relatively underpowered. Given these considerations, I was a bit surprised that the authors did not start with a more hypothesis-driven approach (i.e., separating the groups into those with damage to hubs vs. non-hubs) rather than using this more exploratory overlap analysis. It is particularly concerning that the primary "multi-domain" overlap site is also the primary site of overlap in general across thalamic lesion cases (Fig. 2A).

      An issue that arises when attempting to separate lesions into “hub” versus “non-hub” lesions at study onset is that there is no accepted definition or threshold for a binary categorization of hubs. The primary metric for estimating hub property, participation coefficient (PC), is a continuous measure ranging from 0 to 1, without an objective threshold to differentiate hub from non-hub regions. Thus, a binary classification would require exploring an arbitrary threshold for splitting our sample. Our concern is that assigning an arbitrary threshold and delineating groups based on that threshold would be equally, if not more, exploratory. However, we appreciate this comment, and future studies may be able to use the current results to formulate an a priori threshold. Similarly, given the relative difficulty of recruiting patients with focal thalamic lesions, we did not have enough power for a linear regression testing the relationship between PC and the global deficit score. Weighing all these factors, we determined that counting the number of tests impaired, and defining a global deficit as impairment in more than one domain, is a more objective and less exploratory approach to our specific hypotheses than arbitrarily splitting PC values.
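      For readers less familiar with the measure, here is a minimal sketch of the participation coefficient, P_i = 1 - sum_s (k_is / k_i)^2, where k_i is the total connectivity strength of node i and k_is its strength to network s. It assumes a non-negative connectivity matrix and a given network assignment; how connectivity was thresholded and which cortical networks were used in the study are not reproduced here, and the function name and toy graph are hypothetical. The output is continuous, which is why any hub/non-hub split needs an externally chosen cut-off.

```python
import numpy as np


def participation_coefficient(w, labels):
    """P_i = 1 - sum_s (k_is / k_i)^2 for every node i.

    w      : (N, N) non-negative connectivity matrix
    labels : length-N array assigning each node to a network/module
    """
    w = np.asarray(w, dtype=float)
    labels = np.asarray(labels)
    k = w.sum(axis=1)                                  # total strength k_i
    pc = np.ones_like(k)
    for module in np.unique(labels):
        k_s = w[:, labels == module].sum(axis=1)       # strength into module s
        frac = np.divide(k_s, k, out=np.zeros_like(k), where=k > 0)
        pc -= frac ** 2
    pc[k == 0] = 0.0                                   # isolated nodes: define P_i = 0
    return pc


# Hypothetical toy graph: node 0 links to one node in each of three modules
# (a connector hub), node 2 splits its links over two modules, and nodes
# 1, 3 and 4 send all their links into a single module.
w = np.array([[0, 1, 1, 0, 1],
              [1, 0, 0, 0, 0],
              [1, 0, 0, 1, 0],
              [0, 0, 1, 0, 0],
              [1, 0, 0, 0, 0]])
labels = np.array([0, 0, 1, 1, 2])
pc = participation_coefficient(w, labels)
```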

      We agree with the reviewer that our unequal lesion coverage in the thalamus is a limitation. There are very likely other integrative sites (for example, the medial pulvinar) that we missed simply because we did not have sufficient lesion coverage. We have updated the Discussion (line 561) to acknowledge this limitation more explicitly.

      2. Many of the comparison lesion sites (Fig. 1A) appear to target white matter rather than grey matter locations. Given that white matter damage may have systematically different consequences than grey matter damage, it may be important to control for these characteristics.

      We have conducted further analyses to better control for the effects of white matter damage.

      3. The use of cortical lesion locations as generic controls was a bit puzzling to me, as there are hub locations in the cortex as well as in the thalamus. It would be useful to determine whether hub locations in the cortex and thalamus show similar properties, and whether an overlap approach such as the one used here is effective at identifying hubs in the cortex, given the larger size of this group.

      We have conducted additional analyses to replicate our findings and validate our approach in an expanded group of 145 comparison patients. We found that comparison patients with global deficits had lesions in brain regions with higher PC values than patients without global deficits. Results from this additional analysis are included in Figure 6.

      4. While I think the current findings are very intriguing, the results would be further strengthened if the authors were able to confirm (1) that the multi-domain thalamic lesions are not more likely to impact multiple nuclei or borders between nuclei (this could also lead to a multi-domain profile of results) and (2) that these lesion locations are consistent in their network functions across individuals (perhaps through comparisons with Greene et al., 2020, or more extended analyses of the datasets included in this work), as this would strengthen the connection between the individual lesion cases and the normative sample analyses.

      We can confirm that multi-domain thalamic lesions did not cover more thalamic subdivisions (anatomical nuclei or functional parcellations). We also examined whether the multi-domain lesion site consistently showed high PC values in individual normative subjects. We calculated thalamic PC values for each of the 235 normative subjects and compared the average PC values in the multi-domain lesion site versus the single-domain lesion site across these normative subjects. The multi-domain site exhibited significantly higher PC values (Figure 5D, t(234) = 6.472, p < 0.001). This suggests that the multi-domain lesion site consistently showed stronger connector hub properties across individual normative subjects.
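      The reported degrees of freedom are consistent with a within-subject comparison: a paired t-test over 235 normative subjects has 235 - 1 = 234 degrees of freedom. A minimal sketch of that statistic follows, assuming a standard paired t-test (the text reports only t(234)); the function name and toy values are hypothetical.

```python
import numpy as np


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched samples x, y."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)  # per-subject differences
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))   # mean difference / standard error
    return t, n - 1                               # df = number of subjects - 1


# Hypothetical toy values for 4 subjects: differences are 1, 2, 3, 4.
t, df = paired_t([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0])  # t ≈ 3.87, df = 3
```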

      We also visually compared our results with Greene et al., 2020 (see below). We found that in the dorsal thalamus (z > 10), there was a good spatial overlap between the integration zone reported in Greene et al., 2020 and the multi-domain lesion site that we identified. In the ventral thalamus (z < 4), we did not identify the posterior thalamus as part of the multi-domain lesion site, likely because we did not have sufficient lesion coverage in the posterior thalamus.

      In terms of describing the putative network functions of the thalamic lesion sites, results presented in Figure 7A indicate that multi-domain lesion sites in the thalamus were broadly coupled with cortical functional networks previously implicated in domain-general control processes, such as the cingulo-opercular network, the fronto-parietal network, and the dorsal attention network.

      Greene, Deanna J., et al. "Integrative and network-specific connectivity of the basal ganglia and thalamus defined in individuals." Neuron 105.4 (2020): 742-758.

    1. Author Response

      Reviewer #1 (Public Review):

      This study investigates low-frequency (LF) local field potentials and high-frequency (HF, >30 Hz) broadband activity in response to the visual presentation of faces. To this end, rhythmic visual stimuli were presented to 121 human participants undergoing depth electrode recordings for epilepsy. Recordings were obtained from the ventral occipito-temporal cortex and brain activity was analyzed using a frequency-tagging approach. The results show that the spatial, functional, and timing properties of LF and HF responses are largely similar, which in part contradicts previous investigations in smaller groups of participants. Together, these findings provide novel and convincing insights into the properties and functional significance of LF and HF brain responses to sensory stimuli.

      Strengths

      • The properties and functional significance of LF and HF brain responses are a timely and relevant basic science topic.

      • The study includes intracranial recordings in a uniquely high number of human participants.

      • Using a frequency tagging paradigm for recording and comparing LF and HF responses is innovative and straightforward.

      • The manuscript is well-written and well-illustrated, and the interpretation of the findings is mostly appropriate.

      Weaknesses

      • The writing style of the manuscript sometimes reflects a "race" between the functional significance of LF and HF brain responses and researchers focusing on one or the other. A more neutral and balanced writing style might be more appropriate.

      We would like first to thank the reviewer for his/her positive evaluation as well as constructive and helpful comments for revising our manuscript.

      Regarding the writing style: we had one major goal in this study, which was to investigate the relationship between low and high frequencies. However, it is fair to say, as we indicate in our Introduction, that low-frequency responses are increasingly cast aside in the intracranial recording literature. That is, an increasing proportion of publications simply disregard the evoked electrophysiological response that occurs at the low end of the frequency spectrum, to focus exclusively on the high-frequency response (e.g., Crone et al., 2001; Flinker et al., 2011; Mesgarani and Chang, 2012; Bastin et al., 2013; Davidesco et al., 2013; Kadipasaoglu et al., 2016; 2017; Shum et al., 2013; Golan et al., 2016; 2017; Grossman et al., 2019; Wang et al., 2021; see list of references at the end of the reply).

      Thus, on top of the direct, objective comparison between the two types of signals that our study provides, we think it is fair to reestablish, to some extent, the functional significance of low-frequency activity in intracranial recording studies.

      The writing style reflects that perspective rather than a race between the functional significance of LF and HF brain responses.

      • It remains unclear whether and how the current findings generalize to the processing of other sensory stimuli and paradigms. Rhythmic presentation of visual stimuli at 6 Hz with a face every five stimuli (1.2 Hz) represents a very particular type of sensory stimulation. Stimulation with other stimuli or at other frequencies likely induces different responses. This important limitation should be appropriately acknowledged in the manuscript.

      We agree with Reviewer 1 (see also Reviewer 2) that it is indeed important to discuss whether the current findings generalize to other brain functions and to previous findings obtained with different methodologies. We argue that our methodological approach maximizes the generalizability of our findings.

      First, the frequency-tagging approach is a longstanding stimulation method, dating back to the 1930s (i.e., well before standard evoked potential recording methods; Adrian & Matthews, 1934; intracranially: Kamp et al., 1960), and is widely used in vision science (Regan, 1989; Norcia et al., 2015) but also in other domains (e.g., auditory and somatosensory stimulation). More importantly, this approach not only significantly increases the signal-to-noise ratio of neural responses but also increases the objectivity and reliability of the LF-HF signal comparison (objective identification and quantification of the responses, very similar analysis pipelines).

      Second, regarding the frequency of stimulation, our scalp EEG studies with high-level stimuli (generally faces) have shown that the frequency selection has little effect on the amplitude and the shape of the responses, as long as the frequency is chosen within a suitable range for the studied function (Alonso-Prieto et al., 2013). Regarding the paradigm used specifically in the present study (originally reported in Rossion et al., 2015 and discussed in detail for iEEG studies in Rossion et al., 2018), it has been validated with a wide range of approaches (EEG, MEG, iEEG, fMRI) and populations (healthy adults, patients, children and infants), identifying typically lateralized occipito-temporal face-selective neural activity with a peak in the middle section of the lateral fusiform gyrus (Jonas et al., 2016; Hagen et al., 2020 in iEEG; Gao et al., 2018 in fMRI).

      Importantly, specifically for the paradigm used in the present study, our experiments have shown that the neural face-selective responses are strictly identical whether the faces are inserted at periodic or non-periodic intervals within the train of nonface objects (Quek & Rossion, 2017), that the ratio of periodicity for faces vs. objects (e.g., 1/5, 1/7 … 1/11) does not matter as long as the face-selective responses do not overlap in time (Retter & Rossion, 2016; Retter et al., 2020) and that the responses are identical across a suitable range of base frequency rates (Retter et al., 2020).

      Finally, we fully acknowledge that the category-selective responses would be different in amplitude and localization for other types of stimuli, as also shown in our previous EEG (Jacques et al., 2016) and iEEG (Hagen et al., 2020) studies. Yet, as indicated in our introduction and discussion section, there are many advantages of using such a highly familiar and salient stimulus as faces, and in the visual domain at least we are confident that our conclusions regarding the relationship between low and high frequencies would generalize to other categories of stimuli.

      We added a new section on the generalizability of our findings at the end of the Discussion, p.32-33 (line 880) (see also Reviewer 2’s comments). Please see above in the “essential revisions” for the full added section.

      Reviewer #2 (Public Review):

      The study by Jacques and colleagues examines two types of signals obtained from human intracortical electroencephalography (iEEG) measures, the steady-state visual evoked potential and a broadband response extending to higher frequencies (>100 Hz). The study is much larger than typical for iEEG, with 121 subjects and ~8,000 recording sites. The main purpose of the study is to compare the two signals in terms of spatial specificity and stimulus tuning (here, to images of faces vs other kinds of images).

      The experiments consisted of subjects viewing images presented 6 times per second, with every 5th image depicting a face. Thus the stimulus frequency is 6 Hz and the face image frequency is 1.2 Hz. The main measures of interest are the responses at 1.2 Hz and harmonics, which indicate face selectivity (a different response to the face images than the other images). To compare the two types of signals (evoked potential and broadband), the authors measure either the voltage fluctuations at 1.2 Hz and harmonics (steady-state visually evoked potential) or the fluctuations of broadband power at these same frequencies.
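      The frequency-tagging readout can be sketched as follows: with a 6 Hz image stream and a face every fifth image, face-selective responses appear at 1.2 Hz and its harmonics, and harmonics that coincide with multiples of 6 Hz are excluded because they also carry the general visual response. This minimal Python example applies the readout to a voltage trace (the LF measure); per the description above, the HF measure applies the same readout to broadband power fluctuations instead. The synthetic signal, sampling rate, number of harmonics and function names are assumptions, and frequency-tagging pipelines commonly also correct each bin against neighbouring bins, which is omitted here.

```python
import numpy as np

BASE_HZ = 6.0                     # rate of the general image stream
FACE_EVERY = 5                    # every 5th image is a face
FACE_HZ = BASE_HZ / FACE_EVERY    # 1.2 Hz face-selective rate


def face_harmonics(n_harmonics):
    """Harmonics of 1.2 Hz, skipping those that coincide with 6 Hz harmonics."""
    return [k * FACE_HZ for k in range(1, n_harmonics + 1) if k % FACE_EVERY != 0]


def summed_face_amplitude(x, fs, n_harmonics=10):
    """Sum the single-sided FFT amplitude at the face-rate harmonics."""
    n = x.size
    amp = np.abs(np.fft.rfft(x)) * 2.0 / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return sum(amp[np.argmin(np.abs(freqs - f))] for f in face_harmonics(n_harmonics))


# Synthetic 50 s trace (a whole number of 1.2 Hz and 6 Hz cycles), so every
# harmonic falls exactly on an FFT bin (resolution 1/50 s = 0.02 Hz).
fs = 512.0
t = np.arange(int(50 * fs)) / fs
x = (1.0 * np.sin(2 * np.pi * BASE_HZ * t)          # general visual response
     + 0.3 * np.sin(2 * np.pi * FACE_HZ * t)        # face-selective response
     + 0.1 * np.sin(2 * np.pi * 2 * FACE_HZ * t))   # its second harmonic
face_amp = summed_face_amplitude(x, fs)
```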

      Much prior work has led to the interpretation of the broadband signal as the best iEEG correlate of spatially local neuronal activity, with some studies even linking the high-frequency broadband signal to the local firing rate of neurons near the electrode. In contrast, the evoked potential is often thought to arise from synchronous neural activity spread over a relatively large spatial extent. As such, the broadband signal, particularly in higher frequencies (here, 30-160 Hz), is often believed to carry more specific information about brain responses, both in terms of spatial fidelity to the cortical sources (the cortical point spread function) and in terms of functional tuning (e.g., preference for one stimulus class over another). This study challenges these claims, particularly the first one, and concludes that (1) the point spread functions of the two signals are nearly identical, (2) the cortical locations giving rise to the two signals are nearly identical, and (3) the evoked potential has a considerably higher signal-to-noise ratio.

      These conclusions are surprising, particularly the first one (same point spread functions) given the literature which seems to have mostly concluded that the broadband signal is more local. As such, the findings pose a challenge to the field in interpreting the neuronal basis of the various iEEG signals. The study is large and well done, and the analysis and visualizations are generally clear and convincing. The similarity in cortical localization (which brain areas give rise to face-selective signals) and in point-spread functions are especially clear and convincing.

      We thank the reviewer for his/her fair and positive evaluation of our work and helpful comments.

      Although the reviewer does not disagree with or criticize our methodology, we would like to reply to their comment about the surprising nature of our findings (particularly the similar spatial extent of LF and HF). In fact, we think that there is little evidence for a difference in ‘point-spread’ function in the literature, and thus that these results are not really that surprising. As we indicate in the original submission (Discussion), to our knowledge the only direct comparisons of the spatial extent of LF and HF responses in human studies were performed by counting and reporting the number of electrodes showing a significant response in each signal (Miller et al., 2007; Crone et al., 1998; Pfurtscheller et al., 2003; see list of references at the end of the reply). Overall, these studies find fewer significant electrodes with HF than with LF. Intracranial EEG studies pointing to a more focal origin of HF activity generally cite one or several of these publications (e.g., Shum et al., 2013). In the current study, we replicate this finding and provide additional analyses showing that it is confounded with SNR differences across signals and created artificially by the statistical threshold. When no threshold is used and a more appropriate measure of spatial extent is computed (here, spatial extent at half maximum), we find no difference between the 2 signals, except for a small difference in the left anterior temporal lobe. Moreover, in the intracranial EEG literature, the localness of the HF response is often backed by the hypothesis that HF is a proxy for firing rate: since spikes are supposed to be local, it is implied that HF must be local as well. However, while clear correlations have been found between HF measured with micro-electrodes and firing rate (e.g., Nir et al., 2007; Manning et al., 2009), there is no information on how local the activity measured at these electrodes is, and no evidence that the HF signal is more local than the LF signal in these recordings. Last, the link between (local?) firing rate and HF/broadband signal has been shown using micro-electrodes, which vastly differ in size from macro-electrodes. The nature of the relationship and its spatial properties may differ between micro-electrodes and the macro-electrodes used in ECoG/SEEG recordings.
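      To illustrate why a threshold-based count and a half-maximum extent can disagree, here is a minimal sketch. It assumes a single electrode shaft with evenly spaced contacts (the 3.5 mm spacing is an assumption, not a value from the reply) and uses a fixed amplitude cutoff as a stand-in for a statistical threshold. Two signals with the same spatial profile but different amplitude (SNR) yield the same extent at half maximum but different numbers of supra-threshold contacts. This is illustrative only, not the authors' analysis code.

```python
import numpy as np

CONTACT_SPACING_MM = 3.5  # assumed inter-contact spacing of an SEEG shaft

def extent_at_half_max(amplitudes, spacing_mm=CONTACT_SPACING_MM):
    """Length (mm) of the contiguous run of contacts around the peak whose
    amplitude is at least half of the peak amplitude (no statistical threshold)."""
    amps = np.asarray(amplitudes, dtype=float)
    peak = int(np.argmax(amps))
    half = amps[peak] / 2.0
    lo = peak
    while lo > 0 and amps[lo - 1] >= half:
        lo -= 1
    hi = peak
    while hi < len(amps) - 1 and amps[hi + 1] >= half:
        hi += 1
    return (hi - lo + 1) * spacing_mm

def count_above_threshold(amplitudes, threshold):
    """Threshold-based 'extent': number of contacts exceeding a fixed cutoff."""
    return int(np.sum(np.asarray(amplitudes, dtype=float) > threshold))

# Hypothetical example: the same spatial profile along a 10-contact shaft,
# measured with a strong signal (LF) and a 4x weaker signal (HF).
contacts = np.arange(10)
profile = np.exp(-0.5 * ((contacts - 4) / 1.5) ** 2)
lf = 4.0 * profile
hf = 1.0 * profile

print(extent_at_half_max(lf), extent_at_half_max(hf))                 # 10.5 10.5
print(count_above_threshold(lf, 0.5), count_above_threshold(hf, 0.5))  # 7 3
```

Because the half-maximum criterion is relative to each signal's own peak, it is insensitive to a global amplitude difference, whereas a fixed cutoff is not.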

      We feel these points were all already discussed thoroughly in the original submission of the manuscript (see p. 28-30 in the revised manuscript) and did not modify the revised manuscript.

      The lack of difference between the two signals (other than SNR) might ordinarily raise suspicion that there is some kind of confound, meaning that the two measures are not independent. Yet there are no obvious confounds: in principle, the broadband measure could reflect the high-frequency portion of the evoked response, rather than a separate, non-phase-locked response to the signal. However, this is unlikely, given the rapid fall-off in the SSVEP at frequencies much lower than the 30 Hz low-frequency end of the broadband measure. And the lack of difference between the two signals should not be confused with a null result: both signals are robust and reliable, and both are largely found in the expected parts of the brain for face selectivity (meaning the authors did not fail to measure the signals; it just turns out that the two measures have highly similar characteristics).

      The current reviewer and reviewer #3 both raised concerns that the HF signal as measured in our study might be contaminated by the LF evoked response, which would explain our finding of a strong similarity between the 2 signals.

      This was actually a potential (minor) concern given the time-frequency (wavelet) parameters used in the original manuscript. Indeed, the frequency bandwidth (measured as half width at half maximum, HWHM) of the wavelet used at the lower bound (30 Hz) of the HF signal extended down to 11 Hz (i.e., HWHM = 19 Hz). At 40 Hz, the bandwidth extended down to 24 Hz (i.e., HWHM = 16 Hz). While low-frequency face-selective responses in that range (above 16 Hz) are negligible (see e.g., Retter & Rossion, 2016; and data below for the present study), they could indeed have slightly contaminated the high-frequency activity.

      To fully ensure that our findings could not be explained by such a contamination, we recomputed the HF signal using wavelets with a smaller frequency bandwidth and changed the high-frequency range to 40-160 Hz. This ensures that the lowest frequency included in the HF signal (defined as the bottom of the frequency range minus half of the frequency bandwidth, i.e., half width at half maximum) is 30 Hz, which is well above the highest significant harmonic of the face-selective response in our frequency-tagging experiment (i.e., 22.8 Hz; defined as the harmonic of the face frequency where, at group level, the number of recording contacts with a significant response was not higher than the number of significant contacts detected for noise in bins surrounding harmonics of the face frequency, see figure below). Thus, the signal measured in the 40-160 Hz range is not contaminated by lower-frequency evoked responses.
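      The "bottom of the range minus HWHM" criterion can be checked with a short calculation. The sketch below assumes a standard complex Morlet wavelet whose Gaussian envelope has temporal SD n_cycles / (2πf), so its amplitude spectrum has SD f / n_cycles and HWHM = sqrt(2 ln 2) · f / n_cycles. The reply does not state the number of cycles used, so the cycle counts here are hypothetical; the sketch only shows how the lowest included frequency depends on that choice.

```python
import math

HIGHEST_SIGNIFICANT_HARMONIC_HZ = 22.8  # 19th harmonic of the 1.2 Hz face frequency

def morlet_hwhm_hz(center_hz, n_cycles):
    """Half width at half maximum (Hz) of the amplitude spectrum of a complex
    Morlet wavelet whose Gaussian envelope has temporal SD n_cycles / (2*pi*f)."""
    sigma_f = center_hz / n_cycles  # spectral SD = 1 / (2*pi*sigma_t)
    return math.sqrt(2.0 * math.log(2.0)) * sigma_f

def lowest_included_hz(band_low_hz, n_cycles):
    """Bottom of the HF range minus the wavelet HWHM at that frequency."""
    return band_low_hz - morlet_hwhm_hz(band_low_hz, n_cycles)

def min_cycles_for_floor(band_low_hz, floor_hz):
    """Smallest (hypothetical) cycle count keeping lowest_included_hz >= floor_hz."""
    return math.sqrt(2.0 * math.log(2.0)) * band_low_hz / (band_low_hz - floor_hz)

n_min = min_cycles_for_floor(40.0, 30.0)
print(round(n_min, 2))                            # 4.71
print(round(lowest_included_hz(40.0, n_min), 6))  # 30.0
print(lowest_included_hz(40.0, n_min) > HIGHEST_SIGNIFICANT_HARMONIC_HZ)  # True
```

Under these assumptions, any wavelet with at least about 4.7 cycles at 40 Hz keeps the HF estimate above 30 Hz. Other wavelet families or bandwidth definitions (e.g., HWHM of power rather than amplitude) would shift these numbers.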

      We recomputed all analyses and statistics as reported in the original manuscript with the new HF definition. Overall, this change had very little impact on the findings, except for slightly lower correlation between HF and LF (in Occipital and Anterior temporal lobe) when using single recording contacts as unit data points (Note that we slightly modified the way we compute the maximal expected correlation. Originally we used the test-retest reliability averaged over LF and HF; in the revised version we use the lower reliability value of the 2 signals, which is more correct since the lower reliability is the true upper limit of the correlation). This indicates that the HF activity was mostly independent from phase-locked LF signal already in the original submission. However, since the analyses with the revised time-frequency analyses parameters enforce this independence, the revised analyses are reported as the main analyses in the manuscript.
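      The change in the maximal expected correlation can be made concrete with a minimal sketch. It assumes that each signal's test-retest reliability is the correlation, across recording contacts, between face-selective amplitudes from two independent halves of the data; the actual split used in the manuscript is not described here, and the data below are simulated. The sketch compares the original (average) and revised (minimum) ceilings and normalizes the observed LF-HF correlation by the revised one.

```python
import numpy as np

def pearson(a, b):
    return float(np.corrcoef(a, b)[0, 1])

def correlation_with_ceiling(lf_a, lf_b, hf_a, hf_b):
    """lf_a/lf_b and hf_a/hf_b: per-contact amplitudes from two independent
    halves of the data (a hypothetical split)."""
    rel_lf = pearson(lf_a, lf_b)
    rel_hf = pearson(hf_a, hf_b)
    ceiling_avg = (rel_lf + rel_hf) / 2.0  # original definition
    ceiling_min = min(rel_lf, rel_hf)      # revised definition: the less reliable signal
    lf = (np.asarray(lf_a) + np.asarray(lf_b)) / 2.0
    hf = (np.asarray(hf_a) + np.asarray(hf_b)) / 2.0
    observed = pearson(lf, hf)
    return {"observed": observed, "ceiling_avg": ceiling_avg,
            "ceiling_min": ceiling_min, "normalized": observed / ceiling_min}

# Simulated contacts: a shared underlying amplitude, with HF noisier than LF.
rng = np.random.default_rng(0)
true_amp = rng.gamma(2.0, 1.0, size=200)
lf_a = true_amp + rng.normal(0.0, 0.3, size=200)
lf_b = true_amp + rng.normal(0.0, 0.3, size=200)
hf_a = true_amp + rng.normal(0.0, 1.0, size=200)
hf_b = true_amp + rng.normal(0.0, 1.0, size=200)

res = correlation_with_ceiling(lf_a, lf_b, hf_a, hf_b)
print(res)
```

By construction the minimum can never exceed the average, so the revised ceiling is the more conservative of the two. A full treatment might also apply a Spearman-Brown correction for the half-length data, which this sketch omits.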

      The manuscript was completely revised accordingly and all figures (main and supplementary) were modified to reflect these new analyses. We also extended the methods section on HF analyses (p. 37) to indicate that HF parameters were selected to ensure independence of the HF signal from the LF evoked response, and provide additional information on wavelet frequency bandwidth.

      There are some limitations to the possible generalizability of the conclusions drawn here. First, all of the experiments are of the same type (steady-state paradigm). It could be that with a different experimental design (e.g., slower and/or jittered presentation) the results would differ. In particular, the regularity of the stimulation (6 Hz images, 1.2 Hz faces) might cause the cortex to enter a rhythmic and non-typical state, with more correlated responses across signal types. Nonetheless, the steady-state paradigm is widely used in research, and even if the conclusions turn out to hold only for this paradigm, they would be important. (And of course, they might generalize beyond it.)

      We understand the concern of the reviewer and appreciate the last statement about the wide use of the steady-state paradigm and the importance of our conclusions. Beyond that, we are very confident that our results can be generalized to slower and jittered presentations. Indeed, with this paradigm in particular, we have compared different frequency rates and periodic and nonperiodic stimulations in previous studies (Retter & Rossion, 2016; Quek et al., 2017; Retter et al., 2020). Importantly, specifically for the paradigm used in the present study, the neural face-selective responses are strictly identical whether the faces are inserted at periodic or non-periodic intervals within the train of nonface objects (Quek & Rossion, 2017), showing that the regularity of stimulation does not cause a non-typical state.

      Please see our reply above to essential revisions and reviewer 1, in which we fully address this issue, as well as the revised discussion section (p. 32-33).

      A second limitation is the type of stimulus and neural responses: images of faces, face-selectivity of neural responses. If the differences from previous work on these types of signals are due to the type of experiment (e.g., finger movements and motor cortex, spatial summation and visual cortex) rather than to the difference in sample size or type of analysis, then the conclusions about the similarity of the two types of signals would be more constrained. Again, this is not a flaw in the study, but rather a possible limitation in the generality of the conclusions.

      This is a good point, which has been discussed above also. Please note that this was already partly discussed in the original manuscript when discussing the potential factors explaining the spatial differences between our study and motor cortex studies:

      “Second, the hypothesis for a more focal HF compared to LF signals is mostly supported by recordings performed in a single region, the sensorimotor cortex (Miller et al., 2007; Crone et al., 1998; Pfurtscheller et al., 2003; Hermes et al., 2012), which largely consist of primary cortices. In contrast, here we recorded across a very large cortical region, the VOTC, composed of many different areas with various cortical geometries and cytoarchitectonic properties. Moreover, by recording higher-order category-selective activity, we measured activity confined to associative areas. Both neuronal density (Collins et al., 2010; Turner et al., 2016) and myelination (Bryant and Preuss, 2018) are substantially lower in associative cortices than in primary cortices in primates, and these factors may thus contribute to the lack of spatial extent difference between HF and LF observed here as compared to previous reports.” (p. 29-30).

      Also in the same section (p. 30) we refer to the type of signals compared in previous motor cortex studies:

      “Third, previous studies compared the spatial properties of an increase (relative to baseline) in HF amplitude to the spatial properties of a decrease (i.e. event-related desynchronization) of LF amplitude in the alpha and beta frequency ranges (Crone et al., 1998; 2001; Pfurtscheller et al., 2003; Miller et al., 2007; Hermes et al., 2012). This comparison may be unwarranted due to likely different mechanisms, brain networks and cortical layers involved in generating neuronal increases and decreases (e.g., input vs. modulatory signal, Pfurtscheller and Lopes da Silva, 1999; Schroeder and Lakatos, 2009). In the current study, our frequency-domain analysis makes no assumption about the increase and decrease of signals by face relative to non-face stimuli.”

      In the original submission, we also acknowledged that the functional correspondence between LF and HF signals is not at ceiling (p. 31):

      “We acknowledge that the correlations found here are not at ceiling and that there were also slight offsets in the location of maximum amplitude across signals along electrode arrays (Figures 5 and 6). This lack of a complete functional overlap between LF and HF is also in line with previous reports of slightly different selectivity and functional properties across these signals, such as a different sensitivity to spatial summation (Winawer et al., 2013), to selective attention (Davidesco et al., 2013) or to stimulus repetition (Privman et al., 2011). While part of these differences may be due to methodological differences in signal quantification, they also underline that these signals are not always strongly related, due to several factors. For instance, although both signals involve post-synaptic (i.e., dendritic) neural events, they nevertheless have distinct neurophysiological origins (that are not yet fully understood; see Buzsáki, 2012; Leszczyński et al., 2020; Miller et al., 2009). In addition, these differing neurophysiological origins may interact with the precise setting of the recording sites capturing these signals (e.g., geometry/orientation of the neural sources relative to the recording site, cortical depth in which the signals are measured).”

      Additional arguments regarding the generalizability can be found in the added section of the discussion as mentioned above.

      Finally, the study relies on depth electrodes, which differs from some prior work on broadband signals using surface electrodes. Depth electrodes (stereotactic EEG) are in quite wide use, so this too is not a criticism of the methods. Nonetheless, an important question is the degree to which the conclusions generalize, and surface electrodes, which tend to have higher SNR for broadband measures, might, in principle, show a different pattern than that observed here.

      This is an interesting point, which obviously cannot be addressed in our study, and we agree with the reviewer. However, in contrast to ECoG, which is restricted to superficial cortical layers and gyri, SEEG has the advantage of sampling all cortical layers and a wide range of anatomical structures (gyri, sulci, and deep structures such as the medial temporal structures). Therefore, we believe that using SEEG ensures maximal generalizability of our findings. Overall, the relatively low spatial resolution of these 2 recording methods (i.e., several millimeters) compared to the average cortical thickness (~2-3 mm) makes it very unlikely that SEEG and ECoG would reveal different patterns of LF-HF functional correspondence.

      We added this point in a new section on the generalizability of our findings at the end of the Discussion (p.33, line 896).

      Overall, the large study and elegant approach have led to some provocative conclusions that will likely challenge near-consensus views in the field. It is an important step forward in the quantitative analysis of human neuroscience measurements.

      We sincerely thank the reviewer for his/her appreciation of our work.

      Reviewer #3 (Public Review):

      Jacques et al. aim to assess properties of low- and high-frequency signal content in intracranial stereo-electroencephalography data in the human associative cortex, using a frequency-tagging paradigm with face stimuli. In the results, a high correspondence between high- and low-frequency content in terms of concordant dynamics is highlighted. The major critique is that the assessment, in the way it was performed, is not valid to disambiguate neural dynamics of responses in low- and high-frequency bands and to make general claims about their selectivity and interplay.

      The periodic visual stimulation induces a sharp non-sinusoidal transient impulse response with power across all frequencies (see Fig. 1D time-frequency representation). The calculated mean high-frequency amplitude envelope will therefore be dependent on properties of the used time-frequency calculation as well as noise level (e.g. 1/f contributions) in the chosen frequency band, but it will not reflect intrinsic high-frequency physiology or dynamics as it reflects spectral leakage of the transient response amplitude envelope. For instance, one can generate a synthetic non-sinusoidal signal (e.g., as a sum of sine + a number of harmonics) and apply the processing pipeline to generate the LF and HF components as illustrated in Fig. 1. This will yield two signals which will be highly similar regardless of how the LF component manifests. The fact that the two low and high-frequency measures closely track each other in spatial specificity and amplitudes/onset times and selectivity is due to the fact that they reflect exactly the same signal content. It is not possible with the measures as they have been calculated here to disambiguate physiological low- and high-frequency responses in a general way, e.g., in the absence of such a strong input drive.

      The reviewer expresses strong concerns that our measure of HF activity is merely a reflection of spectral leakage from (lower-frequencies) evoked responses. In other words, physiological HF activity would not exist in our dataset and would be artificially created by our analyses. We should start by mentioning that this comment is in no way specific to our study, but could in fact be directed at all electrophysiological studies measuring stimulus-driven responses in higher frequency bands.

      Reviewer 2 also commented on the possible contamination of evoked response in HF signal.

      This was actually a potential (minor) concern given the time-frequency (wavelet) parameters used in the original manuscript. Indeed, the frequency bandwidth (measured as half width at half maximum, HWHM) of the wavelet used at the lower bound (30 Hz) of the HF signal extended down to 11 Hz (i.e., HWHM = 19 Hz). At 40 Hz, the bandwidth extended down to 24 Hz (i.e., HWHM = 16 Hz). While low-frequency face-selective responses in that range (above 16 Hz) are negligible (see e.g., Retter & Rossion, 2016; and data below for the present study), they could indeed have slightly contaminated the high-frequency activity.

      To ensure that our findings cannot be explained by such a contamination, we recomputed the HF signal using wavelets with a smaller frequency bandwidth and changed the frequency range to 40-160 Hz. As a result, the lowest frequency included in the HF signal (defined as the bottom of the frequency range minus half of the frequency bandwidth, i.e., half width at half maximum) was 30 Hz. This was well above the highest significant harmonic of the face-selective response in our FPVS experiment, which was 22.8 Hz (defined as the harmonic of the face frequency where, at group level, the number of recording contacts with a significant response was not higher than the number of significant contacts detected for noise in bins surrounding harmonics of the face frequency, see figure below). Therefore, the signal measured in the 40-160 Hz range is not contaminated by lower-frequency evoked responses.

      We recomputed all analyses and statistics from the manuscript with the new HF definition. Overall, this change had very little impact on the findings, except for slightly lower correlations between HF and LF (in the Occipital and Anterior temporal lobe) when using single recording contacts as unit data points. (Note that we slightly modified the way we compute the maximal expected correlation. Originally we used the test-retest reliability averaged over LF and HF; now we use the lower reliability value of the 2 signals, which is more correct since the lower reliability is the true upper limit of the correlation.) This indicates that the HF activity was already mostly independent of the phase-locked LF signal in the original submission. However, since the analyses with the revised time-frequency parameters enforce this independence, we chose to keep the revised analyses as the main analyses in the manuscript.

      The manuscript was completely revised accordingly and all figures (main and supplementary) were modified to reflect the new analyses. We also extended the method section on HF analyses (p. 37) to indicate that HF parameters were selected to ensure independence of the HF signal from the LF evoked response, and provide additional information on wavelet frequency bandwidth.

      We believe our change in the time-frequency parameters and frequency range (40-160 Hz), the supplementary analyses using 80-160 Hz signal (per request of reviewer #2; see Figure 5 – figure supplement 4 and 5) and the fact that harmonics of the face frequency signal are not observed beyond ~23Hz, provide sufficient assurances that our findings are not driven by a contamination of HF signal by evoked/LF responses (i.e., spectral leakage).
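      The reviewer's proposed test (a synthetic non-sinusoidal signal built as a sine plus harmonics) and the authors' reply both turn on how far the harmonic series extends. The minimal sketch below only constructs such a signal and its FFT amplitude spectrum; it does not reproduce the wavelet pipeline. The sampling rate, duration, and 1/k harmonic amplitudes are hypothetical. A series that stops at 22.8 Hz has no energy in the 40-160 Hz band, whereas a sharper transient (harmonics up to 120 Hz) does.

```python
import numpy as np

FS = 512.0         # sampling rate (Hz), hypothetical
DURATION_S = 50.0  # 60 full cycles of 1.2 Hz, so harmonics fall on exact FFT bins
FACE_HZ = 1.2

def synthetic_face_response(n_harmonics):
    """Non-sinusoidal periodic signal: 1.2 Hz sine plus harmonics with 1/k amplitude."""
    t = np.arange(int(FS * DURATION_S)) / FS
    return sum(np.sin(2 * np.pi * k * FACE_HZ * t) / k for k in range(1, n_harmonics + 1))

def amplitude_spectrum(x):
    """Single-sided FFT amplitude spectrum (a sine of amplitude A gives A at its bin)."""
    amps = 2.0 * np.abs(np.fft.rfft(x)) / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / FS)
    return freqs, amps

def max_amplitude_in_band(x, lo_hz=40.0, hi_hz=160.0):
    freqs, amps = amplitude_spectrum(x)
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    return float(amps[band].max())

# Harmonics up to 22.8 Hz (19 x 1.2 Hz): essentially nothing in 40-160 Hz.
print(max_amplitude_in_band(synthetic_face_response(19)))
# Harmonics up to 120 Hz: the 34th harmonic (40.8 Hz, amplitude 1/34) enters the band.
print(max_amplitude_in_band(synthetic_face_response(100)))
```

Whether a real response behaves like the first or the second case is an empirical question; in the reply it is addressed by the observation that face-frequency harmonics are not detected beyond about 23 Hz.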

      Regarding the reviewer's comment on 1/f contributions to the frequency-band computation: as indicated in the original manuscript, the HF amplitude envelope is converted to percent signal change separately for each frequency bin over the HF frequency range, BEFORE averaging across frequency bins. This step works as a normalization that removes the 1/f bias and ensures that each frequency in the HF range contributes equally to the computed HF signal. This was added to the Methods section (HF analysis, p. 38, line 1038): “This normalization step ensures that each frequency in the HF range contributes equally to the computed HF signal, despite the overall 1/f relationship between amplitude and frequency in EEG.”
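      The per-bin normalization can be sketched as follows. The sketch assumes the envelope is stored as a (frequencies × time) array and that each bin's reference amplitude is its mean over a chosen set of time samples; the reply does not specify the reference period, so `baseline_idx` is a hypothetical parameter. With simulated 1/f envelopes that share the same 10% modulation, every bin contributes the same percent change, so the averaged HF signal is not dominated by the lowest frequencies.

```python
import numpy as np

def hf_percent_signal_change(envelope, baseline_idx):
    """envelope: (n_freqs, n_times) amplitude envelopes, one row per frequency bin.
    Each row is converted to percent change from its own reference mean BEFORE
    averaging across frequencies, so every bin is weighted equally."""
    env = np.asarray(envelope, dtype=float)
    base = env[:, baseline_idx].mean(axis=1, keepdims=True)
    psc = 100.0 * (env - base) / base
    return psc.mean(axis=0)

fs = 512
t = np.arange(5 * fs) / fs                 # 5 s = 6 full cycles of 1.2 Hz
freqs = np.arange(40, 161, 10)             # 40, 50, ..., 160 Hz
modulation = np.sin(2 * np.pi * 1.2 * t)
# Hypothetical 1/f envelopes, each modulated by the same 10% at 1.2 Hz.
envelope = (1.0 / freqs)[:, None] * (1.0 + 0.1 * modulation)[None, :]

hf = hf_percent_signal_change(envelope, baseline_idx=slice(None))
raw = envelope.mean(axis=0)  # without normalization, the 40 Hz row dominates
```

Averaging the raw envelopes instead would weight the 40 Hz bin four times more than the 160 Hz bin in this example, which is the bias the normalization is meant to remove.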

      The connection of the calculated measures to ERPs for the low-frequency and population activity for the high-frequency measures for their frequency tagging paradigm is not clear and not validated, but throughout the text they are equated, starting from the introduction.

      The frequency-tagging approach is widely used in the electrophysiology literature (Norcia et al., 2015) and as such requires no further validation. In the case of our particular design, the connection between the frequency-domain and time-domain representations for low frequencies has been shown in numerous publications of ours with scalp EEG (Rossion et al., 2015; Jacques et al., 2016; Retter and Rossion, 2016; Retter et al., 2020). FPVS sequences can be segmented around the presentation of the face image (just like in a traditional ERP experiment) and averaged in the time domain to reveal ERPs (e.g., Jacques et al., 2016; Retter and Rossion, 2016; Retter et al., 2020). Face-selectivity of these ERPs can be isolated by selectively removing the base rate frequencies through notch-filtering (e.g., Retter and Rossion, 2016; Retter et al., 2020). Further, we have shown that the face-selective ERPs generated in such sequences are independent of the periodicity, or temporal predictability, of the face appearance (Quek et al., 2017) and, to a large extent, of the frequency of face presentation (i.e., unless faces are presented too close to each other, i.e., below a 400 ms interval; Retter and Rossion, 2016). The high-frequency signal in our study is measured in the same manner as in other studies, and we simply quantify the periodic amplitude modulation of the HF signal. HF responses in frequency-tagging paradigms have been measured before (e.g., Winawer et al., 2013). In the current manuscript, Figure 1 provides a rationale and explanation of the methodology. We also think that our manuscript in itself provides a form of validation for the quantification of the HF signal in our particular frequency-tagging setup.
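      The time-domain route described above (remove the base-rate harmonics, then segment around face onsets and average) can be sketched on simulated data. This is a minimal illustration, not the filtering used in the cited studies: it zeroes the FFT bins at 6 Hz and its harmonics (a simple FFT multi-notch) and assumes a face onset at sample 0 and an integer number of samples per 1.2 Hz cycle. A side effect is that any face-selective energy at multiples of 6 Hz is removed as well.

```python
import numpy as np

def notch_base_harmonics(x, fs, base_hz):
    """Zero the FFT bins at base_hz and all its harmonics up to Nyquist."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    for h in np.arange(base_hz, freqs[-1] + 1e-9, base_hz):
        spectrum[np.argmin(np.abs(freqs - h))] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def face_locked_average(x, fs, face_hz):
    """Cut the sequence into consecutive face cycles (face onset assumed at
    sample 0, integer samples per cycle) and average them into an 'ERP'."""
    period = int(round(fs / face_hz))
    n_cycles = len(x) // period
    return x[: n_cycles * period].reshape(n_cycles, period).mean(axis=0)

fs = 600.0                                          # 100 samples per 6 Hz image, 500 per face cycle
t = np.arange(int(50 * fs)) / fs
base_response = np.sin(2 * np.pi * 6.0 * t)         # hypothetical general visual response
face_response = 0.5 * np.sin(2 * np.pi * 1.2 * t)   # hypothetical face-selective response
recording = base_response + face_response

erp = face_locked_average(notch_base_harmonics(recording, fs, 6.0), fs, 1.2)
```

With real data, the base-rate response is not a pure sinusoid and face onsets are taken from the stimulation log, but the two steps are the same.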

    1. Author Response:

      Reviewer #1 (Public Review):

      Kursel et al. examined the evolution of synaptonemal complex proteins in C. elegans. While the sequences of the SC proteins evolved rapidly, analysis of the structure of SC central region proteins from Caenorhabditis, Drosophila and mammalian species revealed that the length and placement of the coiled-coil domains, as well as overall protein length, were highly conserved across species. This conservation in the structure of coiled-coil proteins within the SC led to the proposal that the conserved structural parameters of the SC proteins and their coiled-coil domains could be used to identify central region components of the SC in species where components could not be identified on sequence conservation alone. Kursel et al. demonstrated that their parameters could be used to identify a transverse filament protein of the SC in the organism Pristionchus pacificus.

      Due to high sequence divergence, identifying SC proteins in new model systems has been challenging. The identification by Kursel et al. of potential search parameters to identify these diverged proteins will be useful to those who work on the synaptonemal complex. This approach has the potential to be applicable to other types of proteins that show rapid sequence divergence. As the mammalian, fly, and worm SC proteins all displayed different lengths and placements of their coiled-coil domains, this approach is limited by the availability of related identified sequences for the model organism of interest. Additionally, this approach may still yield multiple candidates that fit the structural parameters, which will require additional means to ultimately identify the protein of interest. The data in the manuscript support the authors' claims of structural conservation within SC proteins, but only additional applications of their search methods will reveal how useful it is to search for other types of proteins based on structural features.

      We thank the reviewer for their summary and feedback. We hope that with the ever-lowered costs of genome assembly and the expansion of CRISPR/Cas9 gene-editing capabilities, the pipeline we developed will be applicable to more clades and species. We agree that it will be interesting to expand our method beyond the SC. Going forward, we are excited to test whether it will enable us to identify other types of proteins, especially those that are part of condensates. In this light, our finding that centrosomal proteins are also enriched in the same evolutionary class as SC proteins is especially intriguing.

      Reviewer #2 (Public Review):

      In this article, Kursel and colleagues sought to identify evolutionary features of components of the SC that are evident in the absence of strict amino-acid conservation. After identifying three joint evolutionary properties of SC proteins (conservation of coiled-coil architecture, conservation of length and significant amino acid divergence), they show that these properties can be used to identify unknown SC proteins in divergent species. Overall, their general conclusion is very well supported and they do an excellent job functionally testing their approach by showing that one identified candidate for a novel SC protein in Pristionchus is in fact a component of the SC. In addition to providing new insight into the evolutionary forces that shape the evolution of SC proteins, this article provides new insight into how one might generally identify functionally similar or homologous proteins despite very deep divergence. Thus, this work has broader relevance to molecular evolution and evolution of protein structure.

      There are some places where smaller conclusions need more support. In particular, it is not entirely clear that this triple pattern (conservation of coiled-coil architecture, conservation of length and significant amino acid divergence) is broadly applicable to SC components beyond Dipterans and Nematodes. In particular, the pattern is weaker in Eutherian mammals. Some further investigation is needed to claim that the pattern is similar in mammals. In addition, it is not clear if coiled-coil conservation, rather than simply having a coiled-coil domain, is important as a mark of SC proteins. A comparison of coiled-coil conservation among proteins that have coiled-coil domains would be needed for this conclusion. Finally, there should be some additional clarification that not all nematode SC proteins have a pattern of insertion and deletion that is limited to regions outside of the coiled-coil domains.

      We thank the reviewer for their appreciation of the broader impacts of our work to molecular evolution and for their suggestions for providing more support for our conclusions. We have addressed each of these points below (1. the evolutionary pattern in mammals, 2. the value of the coiled-coil conservation score, and 3. clarification of the indel analysis).

      1) As suggested, we have added dot plots comparing mammalian SC proteins to all other mammalian proteins for the three metrics central to this manuscript - amino acid substitutions per site, coiled-coil conservation scores and coefficient of variation of protein length. The plots (shown here) can be found in Figure 3 – figure supplement 4.

      These plots provide additional evidence that the evolutionary pattern of mammalian SC proteins is similar to (although weaker than) that of Caenorhabditis and Drosophila.

      In panel (A), we show the median amino acid substitutions per site of SC proteins is higher than other proteins in mammals, although the difference is not significant. We discuss two reasons why the divergence trend is weaker for mammalian SC proteins in the results. Briefly summarized they are, 1. The overall divergence of the mammalian proteome is less than that of the Caenorhabditis or Drosophila proteome, and 2. Mammalian SC proteins may face additional evolutionary constraints due to novel functions including mammalian-specific protein interactions.

      In panel (B), we show that mammalian SC proteins have a significantly higher coiled-coil conservation score than other proteins.

      In panel (C), we show that the coefficient of variation of protein length for mammalian SC proteins is not significantly different from that of other proteins. We hypothesize that this could be due to gene annotation errors, which plague even very high-quality genomes. For example, we found annotation errors in 23 (18%) of the 125 Caenorhabditis SC proteins examined in this study. Uncorrected, these errors often read as large insertions or deletions and produce an artificially large coefficient of variation. We use L. africana SYCE3 to demonstrate how potential annotation errors could impact our measure of length variation in mammalian SC proteins. L. africana SYCE3 has conspicuous N- and C-terminal extensions not found in any other SYCE3. Excluding that single protein (L. africana SYCE3) reduces the average length variation in the SYCE3 orthogroup from 29% to 4%, below the median of other proteins. Correspondingly, the median SC coefficient of variation of protein length drops from 20% (unfilled black circle) to 12% (dashed, unfilled circle). While systematic manual annotation of the Eutherian mammal proteomes is beyond the scope of this manuscript, we added to the Discussion an explicit reference to the implications of annotation errors for our ability to systematically address evolutionary pressures affecting indels.
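      The effect of a single mis-annotated protein on the coefficient of variation (CV) of length can be illustrated with a minimal sketch. The lengths below are invented for illustration (they are not the SYCE3 data), and the choice of sample SD (ddof=1) is an assumption. A leave-one-out scan flags the protein whose exclusion lowers the CV the most, which is one simple way to prioritize candidates for manual annotation checks.

```python
import numpy as np

def length_cv(lengths):
    """Coefficient of variation of protein length (sample SD / mean)."""
    lengths = np.asarray(lengths, dtype=float)
    return float(lengths.std(ddof=1) / lengths.mean())

def most_influential_protein(lengths):
    """Index whose exclusion lowers the CV the most (a candidate annotation error)."""
    cvs = [length_cv(np.delete(lengths, i)) for i in range(len(lengths))]
    return int(np.argmin(cvs))

# Hypothetical orthogroup: nine similar lengths plus one protein carrying
# N- and C-terminal extensions, as a mis-annotated gene model might produce.
lengths = [88, 88, 90, 87, 88, 89, 88, 86, 88, 180]
i = most_influential_protein(lengths)
print(i, round(length_cv(lengths), 3), round(length_cv(np.delete(lengths, i)), 3))
```

Because the CV is a ratio of spread to mean, one long outlier can inflate it several-fold in a small orthogroup, which is consistent with the drop reported above when L. africana SYCE3 is excluded.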

      2) We thank the reviewers for this important suggestion. Indeed, the inclusion of the few examples in Figure 2 was meant as a demonstration rather than a statistical analysis. To create a group of proteins that would serve as an appropriate control for conservation of the length and organization of the coiled-coils, we selected orthogroups in which 90% of the proteins in the group had a coiled-coil domain of 21 amino acids or longer. This left 916 Caenorhabditis orthogroups, including all SC proteins. We found that the median coiled-coil conservation score of SC proteins was significantly higher than that of the other coiled-coil proteins, confirming our comparisons to the entire proteome. We have included this analysis as a figure supplement to Figure 2 (dot plot shown here and Figure 2 – figure supplement 1) and added text to the Results and Methods describing the analysis.
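      The orthogroup selection rule can be sketched as a simple filter. The data structure is hypothetical (each protein represented by the list of its predicted coiled-coil domain lengths in amino acids), and "90% of the proteins" is read as "at least 90%". The coiled-coil predictor itself is outside the scope of this sketch.

```python
def has_long_coiled_coil(domain_lengths, min_len=21):
    """True if any predicted coiled-coil domain is at least min_len residues long."""
    return any(length >= min_len for length in domain_lengths)

def select_coiled_coil_orthogroups(orthogroups, min_len=21, min_fraction=0.9):
    """orthogroups: name -> list of proteins, each given as a list of predicted
    coiled-coil domain lengths (aa). Keeps groups in which at least min_fraction
    of the proteins carry a coiled-coil domain of at least min_len aa."""
    selected = []
    for name, proteins in orthogroups.items():
        n_with_coil = sum(has_long_coiled_coil(p, min_len) for p in proteins)
        if proteins and n_with_coil / len(proteins) >= min_fraction:
            selected.append(name)
    return selected

example = {
    "OG_A": [[35]] * 9 + [[]],            # 9/10 proteins qualify -> kept
    "OG_B": [[28, 14]] * 8 + [[10]] * 2,  # 8/10 -> dropped
    "OG_C": [[]] * 10,                    # no coiled-coils -> dropped
}
print(select_coiled_coil_orthogroups(example))  # ['OG_A']
```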

      More broadly, this result suggests that our coiled-coil conservation score is more informative than a binary measure of coiled-coil domain prediction (i.e. presence/absence of coiled-coil). The additional information contained in the coiled-coil conservation score likely comes from the fact that we take into account whether or not the coiled-coil domains are aligned across species, which reflects a higher degree of secondary structure conservation. We believe that future work to develop better measures of conservation of secondary structures will hone our ability to identify conservation of other protein classes.

      3) We have clarified this point in our revised manuscript, highlighting that when analyzed as a group, indels are excluded in coiled-coils of Caenorhabditis SC proteins, and that significance is also observed for specific SC proteins where enough indels are present to perform statistical tests. Two of the SC proteins, SYP-2 and SYP-3, had only two indels each, preventing us from performing tests of significance. We have also added text to the discussion directly addressing the limitations of automatically-assigned gene annotations on the ability to test evolutionary pressures on indels genome-wide.
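      Why two indels are too few for a test can be seen from a simple null model. The sketch below is an assumption, not necessarily the test used in the manuscript: if indels fell uniformly along an alignment, each would land in a coiled-coil with probability equal to the coiled-coil fraction of the alignment, and depletion can be assessed with a one-sided exact binomial test. With only two indels, the smallest attainable p-value is (1 − fraction)², e.g. 0.25 when half of the alignment is coiled-coil.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def indel_depletion_pvalue(indels_in_coil, total_indels, coil_fraction):
    """One-sided p-value for observing this few indels inside coiled-coils under a
    hypothetical uniform-placement null (coil_fraction = coiled-coil share of alignment)."""
    return binom_cdf(indels_in_coil, total_indels, coil_fraction)

print(indel_depletion_pvalue(0, 2, 0.5))   # 0.25: cannot reach significance
print(indel_depletion_pvalue(0, 10, 0.5))  # 0.0009765625
```

A more realistic null might weight positions by alignment gaps or sequence divergence; the point here is only the minimum sample size needed for a test to be informative.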

      Reviewer #3 (Public Review):

      The manuscript "Unconventional conservation reveals structure-function relationships in the synaptonemal complex" by Kursel, Cope, and Rog, describes a novel bioinformatics analysis of proteins in the eukaryotic synaptonemal complex (SC). The SC is a highly conserved structure that links paired homologs in prophase of meiosis, and in most organisms is required for the successful completion of interhomolog recombination. An enigmatic feature of SC proteins is that they are highly diverged between organisms, to the point where they are nearly unrecognizable by sequence alone except among closely related organisms. Kursel et al show that within the Caenorhabditis family of nematodes, SC proteins show a reproducible pattern of coiled-coil segments and highly conserved overall length, while their primary sequences are extremely diverged. They use these findings to develop a method to identify new SC candidate proteins in a diverged nematode, Pristionchus pacificus, and confirm that one of these candidates is the main SC transverse filament protein in this organism. Finally, the authors expand their analysis to SC proteins in flies (Drosophila melanogaster and relatives) and eutherian mammals, and show similar findings in these protein families. In the discussion, the authors describe an interesting and compelling theory that the coiled coils of SC proteins directly support phase separation/condensation of these proteins to aid assembly of the SC superstructure.

      Overall, this work is well done, the findings are well-supported, and are of interest to meiosis researchers; especially those working directly on the SC. The manuscript is also well put-together: I could barely find a typo. From a broader perspective, however, I'm not convinced that the work provides a new paradigm for thinking about "conservation" in protein families and how to best detect it. Methods that use structural information to detect homology between highly diverged proteins beyond the capabilities of BLAST or even PSI-BLAST are well-developed (e.g. PHYRE2, HHPred, and others). The use of coiled-coil length as a metric for conservation, while it works nicely in the case of SC proteins, is likely to not be generalizable to other protein families. Even within SC proteins, the method does not seem to scale past specific families to, say, allow identification of homology between distantly-related eukaryotic groups (e.g. between Caenorhabditis and Drosophila or Caenorhabditis and eutherian mammals). To be fair, this failure to scale is not because of any limitation with the method; rather, simply that SC proteins diverge quickly through evolution. Overall, however, these limitations seem to limit the application of this method to the specialized case of SC proteins, thus limiting the audience and scope of the work.

      We appreciate the reviewer’s consideration of possible limitations of our study. However, we disagree that this method, and the insights gained from it, will be limited to SC proteins. A clear demonstration is that the centrosomal protein SPD-5 (Centrosomin in Drosophila, CdkRap2 in mammals) cannot be identified across clades using sequence homology despite performing a conserved and fundamental cellular function. We hypothesize that similar forces have shaped the evolution of SPD-5 and other centrosomal proteins that are enriched in the same evolutionary class as SC proteins (Figure 3 – figure supplement 1). Functional tests of these predictions will be an exciting area of future research.

      As this review notes, an exciting hypothesis stemming from our work is that proteins with diverged primary sequence and conserved secondary structures (coiled-coils, disordered protein domains or others) will be over-represented in condensates. Anecdotally this is indeed true, as both the SC and the centrosome were shown to be condensates. The burgeoning interest in condensates, and the development of tools to study them in vivo and in vitro, are bound to test the broad applicability of this hypothesis.

    1. Author Response:

      Evaluation Summary:

      The authors assessed multivariate relations between a dimensionality-reduced symptom space and brain imaging features, using a large database of individuals with psychosis-spectrum disorders (PSD). Demonstrating both high stability and reproducibility of their approaches, this work shows promise that the diagnosis or treatment of PSD can benefit from the proposed data-driven brain-symptom mapping framework. It is therefore of broad potential interest across cognitive and translational neuroscience.

      We are very grateful for the positive feedback and the careful read of our paper. We would especially like to thank the Reviewers for taking the time to read this lengthy and complex manuscript and for providing their helpful and highly constructive feedback. Overall, we hope the Editor and the Reviewers will find that our responses address all the comments and that the requested changes and edits improved the paper.

      Reviewer 1 (Public Review):

      The paper assessed the relationship between a dimensionality-reduced symptom space and functional brain imaging features based on the large multicentric data of individuals with psychosis-spectrum disorders (PSD).

      The strengths of this study are that i) in every analysis, the authors provided high-level evidence of reproducibility in their findings, ii) the study included several control analyses to test other comparable alternatives or independent techniques (e.g., ICA, univariate vs. multivariate), and iii) by correlating their results with independently acquired pharmacological neuroimaging and gene expression maps, the study highlighted the neurobiological validity of their results.

      Overall, the study is original and offers several important tips and guidance for behavior-brain mapping, although the paper contains heavy descriptions of data-mining techniques such as several dimensionality reduction algorithms (e.g., PCA, ICA, and CCA) and prediction models.

      We thank the Reviewer for their insightful comments and we appreciate the positive feedback. Regarding the descriptions of methods and analytical techniques, we have moved these descriptions out of the main Results text and figure captions. Detailed descriptions are still provided in the Methods, so that they do not detract from the core message of the paper but can still be referenced if a reader wishes to look up the details of these methods within the context of our analyses.

      Although they are relatively minor, I also have a few points on the weaknesses, including i) an incomplete description of how to tell the PSD effects from the normal spectrum, ii) a lack of overarching interpretation for the principal components other than the 3rd one, and iii) somewhat expected results in the stability of the PCs and relevant indices.

      We are very appreciative of the constructive feedback and feel that these revisions have strengthened our paper. We have addressed these points in the revision as follows:

      i) We are grateful to the Reviewer for bringing up this point, as it has allowed us to further explore the interesting observation we made regarding shared versus distinct neural variance in our data. It is important not to confuse the neural PCA (i.e. the independent neural features that can be detected in the PSD and healthy control samples) with the neuro-behavioral mapping. In other words, both PSD patients and healthy controls are human, and therefore there are a number of neural functions that both cohorts exhibit that may have nothing to do with the symptom mapping in PSD patients, for instance basic regulatory functions such as control of cardiac and respiratory cycles, motor functions, vision, etc. We therefore hypothesized that there are more common than distinct neural features, which are on average shared across humans irrespective of their psychopathology status. Consequently, there may only be a ‘residual’ symptom-relevant neural variance. Therefore, in the manuscript we raise the possibility that a substantial proportion of neural variance may not be clinically relevant. If this is in fact true, then removing the shared neural variance between PSD and CON should not drastically affect the reported symptom-neural univariate mapping solution, because this common variance does not map to clinical features and is therefore statistically orthogonal. We have now verified this hypothesis quantitatively and have added extensive analyses to highlight this important observation made by the Reviewer. We first conducted a PCA using the parcellated GBC data from all 436 PSD and 202 CON (a matrix with dimensions 638 subjects × 718 parcels). We will refer to this as the GBC-PCA to avoid confusion with the symptom/behavioral PCA described elsewhere in the manuscript. This GBC-PCA resulted in 637 independent GBC-PCs.

      Since PCs are orthogonal to each other, we then partialled out the variance attributable to GBC-PC1 from the PSD data by reconstructing the PSD GBC matrix using only scores and coefficients from the remaining 636 GBC-PCs ($\widehat{GBC}_{woPC1}$). We then reran the univariate regression as described in Fig. 3, using the same five symptom PC scores across 436 PSD. The results are shown in Fig. S21 and reproduced below. Removing the first PC of shared neural variance (which accounted for about 15.8% of the total GBC variance across CON and PSD) from PSD data attenuated the statistics slightly (not unexpected, as the variance was by definition reduced) but otherwise did not strongly affect the univariate mapping solution.
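      The partialling-out step can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `remove_shared_pcs` is hypothetical, the PCA is computed via an SVD of the column-centered pooled PSD+CON matrix, and the parcel means are added back after reconstruction (an assumption that does not affect a subsequent regression with an intercept). The data below are simulated.

```python
import numpy as np


def remove_shared_pcs(gbc_all, is_psd, k=1):
    """Drop the top-k PCs of the pooled GBC matrix and return the PSD rows.

    gbc_all : (n_subjects, n_parcels) GBC matrix pooling PSD and CON subjects.
    is_psd  : boolean mask of length n_subjects selecting the PSD rows.
    k       : number of leading GBC-PCs to partial out.
    """
    mean = gbc_all.mean(axis=0)
    centered = gbc_all - mean
    # PCA via SVD: scores = U * S, coefficients (loadings) = rows of Vt.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u * s
    # Reconstruct from components k, k+1, ... only; add the parcel means back
    # (an assumption of this sketch).
    reconstructed = scores[:, k:] @ vt[k:, :] + mean
    explained = s**2 / np.sum(s**2)  # fraction of total variance per PC
    return reconstructed[is_psd], explained[:k]


# Simulated stand-in for the 638 x 718 matrix (436 PSD followed by 202 CON).
rng = np.random.default_rng(0)
gbc = rng.normal(size=(638, 718))
psd_mask = np.arange(638) < 436
gbc_psd_wo_pc1, var_pc1 = remove_shared_pcs(gbc, psd_mask, k=1)
print(gbc_psd_wo_pc1.shape)  # (436, 718)
```

      Setting `k` to 2, 3 or 4 gives the analogous matrices with the first two, three or four shared PCs removed.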

      We next repeated the symptom-neural regression with the first 2 GBC-PCs partialled out of the PSD data (Fig. S22), with the first 3 PCs partialled out (Fig. S23), and with the first 4 neural PCs partialled out (Fig. S24). The symptom-neural maps remain fairly robust, although their similarity with the original $\beta_{PC}GBC$ maps does drop as more common neural variance is partialled out. These figures are also shown below:

      Fig. S21. Comparison between the PSD $\beta_{PC}GBC$ maps computed using GBC and GBC with the first neural PC partialled out. If a substantial proportion of neural variance is not clinically relevant, then removing the shared neural variance between PSD and CON should not drastically affect the reported symptom-neural univariate mapping solution, because this common variance will not map to clinical features. We therefore performed a PCA on CON and PSD GBC to compute the shared neural variance (see Methods), and then partialled out the first GBC-PC from the PSD GBC data ($\widehat{GBC}_{woPC1}$). We then reran the univariate regression as described in Fig. 3, using the same five symptom PC scores across 436 PSD. (A) The $\beta_{PC1}GBC$ map, also shown in Fig. S10. (B) The first GBC-PC accounted for about 15.8% of the total GBC variance across CON and PSD. Removing GBC-PC1 from PSD data attenuated the $\beta_{PC1}GBC$ statistics slightly (not unexpected, as the variance was by definition reduced) but otherwise did not strongly affect the univariate mapping solution. (C) Correlation across 718 parcels between the two $\beta_{PC1}GBC$ maps shown in A and B. (D-O) The same results are shown for the $\beta_{PC2}GBC$ to $\beta_{PC5}GBC$ maps.

      Fig. S22. Comparison between the PSD $\beta_{PC}GBC$ maps computed using GBC and GBC with the first two neural PCs partialled out. We performed a PCA on CON and PSD GBC and then partialled out the first two GBC-PCs from the PSD GBC data ($\widehat{GBC}_{woPC1-2}$, see Methods). We then reran the univariate regression as described in Fig. 3, using the same five symptom PC scores across 436 PSD. (A) The $\beta_{PC1}GBC$ map, also shown in Fig. S10. (B) The second GBC-PC accounted for about 9.5% of the total GBC variance across CON and PSD. (C) Correlation across 718 parcels between the two $\beta_{PC1}GBC$ maps shown in A and B. (D-O) The same results are shown for the $\beta_{PC2}GBC$ to $\beta_{PC5}GBC$ maps.

      Fig. S23. Comparison between the PSD $\beta_{PC}GBC$ maps computed using GBC and GBC with the first three neural PCs partialled out. We performed a PCA on CON and PSD GBC and then partialled out the first three GBC-PCs from the PSD GBC data ($\widehat{GBC}_{woPC1-3}$, see Methods). We then reran the univariate regression as described in Fig. 3, using the same five symptom PC scores across 436 PSD. (A) The $\beta_{PC1}GBC$ map, also shown in Fig. S10. (B) The second GBC-PC accounted for about 9.5% of the total GBC variance across CON and PSD. (C) Correlation across 718 parcels between the two $\beta_{PC1}GBC$ maps shown in A and B. (D-O) The same results are shown for the $\beta_{PC2}GBC$ to $\beta_{PC5}GBC$ maps.

      Fig. S24. Comparison between the PSD $\beta_{PC}GBC$ maps computed using GBC and GBC with the first four neural PCs partialled out. We performed a PCA on CON and PSD GBC and then partialled out the first four GBC-PCs from the PSD GBC data ($\widehat{GBC}_{woPC1-4}$, see Methods). We then reran the univariate regression as described in Fig. 3, using the same five symptom PC scores across 436 PSD. (A) The $\beta_{PC1}GBC$ map, also shown in Fig. S10. (B) The second GBC-PC accounted for about 9.5% of the total GBC variance across CON and PSD. (C) Correlation across 718 parcels between the two $\beta_{PC1}GBC$ maps shown in A and B. (D-O) The same results are shown for the $\beta_{PC2}GBC$ to $\beta_{PC5}GBC$ maps.
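      The re-run regression and the map comparison in panel (C) of these figures can be sketched as follows. Assumptions: each parcel's GBC is regressed on the five symptom PC scores plus an intercept by ordinary least squares, and raw slope coefficients stand in for the maps (the published maps may be Z-statistics); `beta_maps` and `map_similarity` are hypothetical names, and the data are simulated. Because symptom PC scores are mutually uncorrelated in the sample in which the PCA was fit, fitting the five PCs jointly gives the same slopes as regressing on each PC separately.

```python
import numpy as np


def beta_maps(gbc, pc_scores):
    """Mass-univariate OLS: regress every parcel's GBC on the symptom PC scores.

    gbc       : (n_subjects, n_parcels)
    pc_scores : (n_subjects, n_pcs)
    Returns an (n_pcs, n_parcels) array of slope coefficients.
    """
    n = gbc.shape[0]
    design = np.column_stack([np.ones(n), pc_scores])  # intercept + PCs
    coef, *_ = np.linalg.lstsq(design, gbc, rcond=None)
    return coef[1:]  # drop the intercept row


def map_similarity(map_a, map_b):
    """Pearson correlation between two maps across parcels."""
    return np.corrcoef(map_a, map_b)[0, 1]


# Simulated example: 436 subjects, 5 symptom PCs, 718 parcels.
rng = np.random.default_rng(0)
scores = rng.normal(size=(436, 5))
gbc = scores @ rng.normal(size=(5, 718)) + rng.normal(size=(436, 718))
# Arbitrary perturbation standing in for a modified GBC matrix.
modified_gbc = gbc + rng.normal(size=gbc.shape)
original = beta_maps(gbc, scores)
modified = beta_maps(modified_gbc, scores)
print(round(map_similarity(original[0], modified[0]), 2))
```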

      For comparison, we also computed the $\beta_{PC}GBC$ maps for control subjects, shown in Fig. S11. In support of the $\beta_{PC}GBC$ maps in PSD being circuit-relevant, we observed only mild associations between GBC and PC scores in healthy controls:

      Results: All 5 PCs captured unique patterns of GBC variation across the PSD (Fig. S10), which were not observed in CON (Fig. S11). ... Discussion: On the contrary, this bi-directional “Psychosis Configuration” axis also showed strong negative variation along neural regions that map onto the sensory-motor and associative control regions, also strongly implicated in PSD (1, 2). The “bi-directionality” property of the PC symptom-neural maps may thus be desirable for identifying neural features that support individual patient selection. For instance, it is possible that PC3 reflects residual untreated psychosis symptoms in this chronic PSD sample, which may reveal key neural treatment targets. In support of this circuit being symptom-relevant, it is notable that we observed only a mild association between GBC and PC scores in the CON sample (Fig. S11).

      ii) In our original submission we spotlighted PC3 because of its pattern of loadings onto hallmark symptoms of PSD, including strong positive loadings across Positive symptom items in the PANSS and, conversely, strong negative loadings onto most Negative items. It was necessary to fully examine this dimension in particular because these are key characteristics of the target psychiatric population, and we found the focus on PC3 to be innovative because it provided an opportunity to quantify a fully data-driven dimension of symptom variation that is highly characteristic of the PSD patient population. Additionally, this bi-directional axis captured shared variance from measures in other traditional symptom factors, such as the PANSS General factor and cognition. This is a powerful demonstration of how data-driven techniques such as PCA can reveal properties intrinsic to the structure of PSD-relevant symptom data, which may in turn improve the mapping of symptom-neural relationships. We refrained from explaining each of the five PCs in detail in the main text, as we felt that it would further complicate an already dense manuscript. Instead, we opted to provide the interpretation and data from all analyses for all five PCs in the Supplement. However, in response to the Reviewers’ thoughtful feedback that more focus should be placed on other components, we have expanded the presentation and discussion of all five components (both regarding the symptom profiles and neural maps) in the main text:

      Results: Because PC3 loads most strongly onto hallmark symptoms of PSD (including strong positive loadings across PANSS Positive symptom measures and strong negative loadings onto most Negative measures), we focus on this PC as an opportunity to quantify an innovative, fully data-driven dimension of symptom variation that is highly characteristic of the PSD patient population. Additionally, this bi-directional symptom axis captured shared variance from measures in other traditional symptom factors, such as the PANSS General factor and cognition. We found that the PC3 result provided a powerful empirical demonstration of how a data-driven dimensionality-reduced solution (via PCA) can reveal novel patterns intrinsic to the structure of PSD psychopathology.
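      A data-driven choice of how many symptom PCs to retain can be made by permutation testing (the response to Reviewer 2 below states that the number of significant components was computed this way). One common scheme is sketched here; it is an assumption that the authors' procedure matches it. Each item (column) is shuffled independently across subjects, which destroys between-item correlations while keeping each item's distribution, and the observed eigenvalues of the item correlation matrix are compared rank by rank with the null. Function names, defaults and the simulated two-factor data are hypothetical.

```python
import numpy as np


def pca_eigenvalues(data):
    """Eigenvalues (descending) of the item correlation matrix."""
    corr = np.corrcoef(data, rowvar=False)
    return np.linalg.eigvalsh(corr)[::-1]


def n_significant_pcs(data, n_perm=500, alpha=0.05, seed=0):
    """Count leading PCs whose eigenvalue exceeds a column-permutation null.

    data : (n_subjects, n_items) matrix of symptom/cognitive scores.
    """
    rng = np.random.default_rng(seed)
    observed = pca_eigenvalues(data)
    null = np.empty((n_perm, observed.size))
    for i in range(n_perm):
        # Shuffle each item independently across subjects.
        shuffled = np.column_stack([rng.permutation(col) for col in data.T])
        null[i] = pca_eigenvalues(shuffled)
    # Rank-wise p-values with the usual +1 correction.
    p = (1 + np.sum(null >= observed, axis=0)) / (n_perm + 1)
    significant = p < alpha
    # Stop at the first non-significant component.
    return observed.size if significant.all() else int(np.argmin(significant))


# Simulated data: 300 subjects, 10 items driven by two latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))
loadings = np.zeros((2, 10))
loadings[0, :5] = 1.0
loadings[1, 5:] = 1.0
items = latent @ loadings + 0.5 * rng.normal(size=(300, 10))
print(n_significant_pcs(items, n_perm=200))  # 2
```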

      iii) We felt that demonstrating the stability of the PCA solution was extremely important, given that this degree of rigor has not previously been applied to broad behavioral measures across psychosis symptoms and cognition in a cross-diagnostic PSD sample. Additionally, we demonstrated reproducibility of the PCA solution using independent split-half samples. Furthermore, we derived stable neural maps using the PCA solution. In our original submission we showed that the CCA solution was not reproducible in our dataset. Following the Reviewers’ feedback, we computed the estimated sample sizes needed to sufficiently power our multivariate analyses for stable/reproducible solutions, using the methods in (3). These results are discussed in detail in our resubmitted manuscript and in our response to the Critiques section below.
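      Split-half reproducibility of a PCA solution can be checked by fitting the PCA separately in two random halves of the sample and correlating the matched loading vectors. The sketch below assumes components are matched by rank (which requires well-separated eigenvalues) and uses absolute correlations because PC signs are arbitrary; the authors' exact criterion is not stated here, and the function names and simulated data are hypothetical.

```python
import numpy as np


def pca_loadings(data, k):
    """First k PCA loading vectors (unit norm) of the z-scored data."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return vt[:k]


def split_half_similarity(data, k, seed=0):
    """|r| between rank-matched loadings fitted in two random halves."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(data.shape[0])
    half = data.shape[0] // 2
    first = pca_loadings(data[order[:half]], k)
    second = pca_loadings(data[order[half:]], k)
    # PC signs are arbitrary, so compare absolute correlations.
    return np.array([abs(np.corrcoef(first[i], second[i])[0, 1]) for i in range(k)])


# Simulated data: 600 subjects, 10 items, one strong and one weaker factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(600, 2))
loadings = np.zeros((2, 10))
loadings[0, :5] = 1.0   # strong first factor
loadings[1, 5:] = 0.3   # weaker second factor
items = latent @ loadings + 0.5 * rng.normal(size=(600, 10))
print(split_half_similarity(items, k=2).round(2))
```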

      Reviewer 2 (Public Review):

      The work by Ji et al. is an interesting and rather comprehensive analysis in the trend of developing data-driven methods for brain-symptom dimension biomarkers that bring a biological basis to the symptoms (across PANSS and cognitive features) that relate to psychotic disorders. To this end, the authors performed several interesting multivariate analyses to decompose the symptom/behavioural dimensions and functional connectivity data. The authors use data from a transdiagnostic group of individuals recruited by the BSNIP cohort and combine high-level methods in order to integrate both types of modalities. Conceptually there are several strengths to this paper that should be applauded. However, I do think that there are important aspects of this paper that need revision to improve readability, to better compare the methods to what is in the field, and to provide a balanced view relative to previous work built around the same basic concepts. Overall, I feel as though the work could advance our knowledge in the development of biomarkers or subject-level identifiers for psychiatric disorders and potentially be elevated to the level of an individual "subject screener". While this is a noble goal, reaching it will require more data and information in the future. This is certainly an important step forward in this regard.

      We thank the Reviewer for their insightful and constructive comments about our manuscript. We have revised the text to make it easier to read and to clarify our results in the context of prior works in the field. We fully agree that a great deal more work needs to be completed before achieving single-subject level treatment selection, but we hope that our manuscript provides a helpful step towards this goal.

      Strengths:

      • Combined analysis of canonical psychosis symptoms and cognitive deficits across multiple traditional psychosis-related diagnoses offers one of the most comprehensive mappings of impairments experienced within PSD to brain features to date
      • Cross-validation analyses and use of various datasets (diagnostic replication, pharmacological neuroimaging) is extremely impressive, well motivated, and thorough. In addition, the authors use a large dataset and provide "out of sample" validity
      • Medication status and dosage also accounted for
      • Similarly, the extensive examination of both univariate and multivariate neuro-behavioural solutions from a methodological viewpoint, including the testing of multiple configurations of CCA (i.e. with different parcellation granularities), offers very strong support for the selected symptom-to-neural mapping
      • The plots of the obtained PC axes compared to those of standard clinical symptom aggregate scales provide a really elegant illustration of the differences and demonstrate clearly the value of data-driven symptom reduction over conventional categories
      • The comparison of the obtained neuro-behavioural map for the "Psychosis configuration" symptom dimension to both pharmacological neuroimaging and neural gene expression maps highlights direct possible links with both underlying disorder mechanisms and possible avenues for treatment development and application
      • The authors' explicit investigation of whether PSD and healthy controls share a major portion of neural variance (possibly present across all people) has strong implications for future brain-behaviour mapping studies, and provides a starting point for narrowing the neural feature space to just the subset of features showing symptom-relevant variance in PSD

      We are very grateful for the positive feedback. We would like to thank the Reviewers for taking the time to read this admittedly dense manuscript and for providing their helpful critique.

      Critiques:

      • Overall I found the paper very hard to read. There are abbreviations everywhere for every concept that is introduced. The paper is methods heavy (which I am not opposed to and quite like). It is clear that the authors took a lot of care in thinking about the methods that were chosen. That said, I think that the organization would benefit from a more traditional Intro, Methods, Results, and Discussion format so that it would be easier to parse the Results. The figures are extremely dense, and terms are often coined or used without being defined, or are poorly defined.

      We appreciate the constructive feedback on how to reduce the dense content and on the frequency of abbreviations, which impacts readability. We implemented the strategies suggested by the Reviewer and have moved the Methods section after the Introduction to make the subsequent Results section easier to understand and contextualize. For clarity and length, we have moved methodological details previously in the Results and figure captions to the Methods (e.g. descriptions of dimensionality reduction and prediction techniques). This way, the Methods are now expanded for clarity without detracting from the readability of the core results of the paper. We have also simplified the text in places where there was room for more clarity. To help readers keep track of the numerous abbreviations, we have also added a table to the Supplement (Supplementary Table S1).

      • One thing I found conceptually difficult is the explicit comparison to the work in the Xia paper from the Satterthwaite group. Is this a fair comparison? The sample is extremely different as it is non-clinical and comes from the general population. Can it be suggested that the groups that are clinically defined here are comparable? Is this an appropriate comparison and standard to make? To suggest that the work in that paper is not reproducible is flawed in this light.

      This is an extremely important point to clarify and we apologize that we did not make it sufficiently clear in the initial submission. Here we are not attempting to replicate the results of Xia et al., which we understand were derived in a sample fundamentally different from ours, both demographically and clinically, and which addressed very different questions. Rather, this paper is just one example out of a number of recent papers that employed multivariate methods (CCA) to tackle the mapping between neural and behavioral features. The key point here is that this approach does not produce reproducible results due to over-fitting, as demonstrated robustly in the present paper. It is very important to highlight that we did not in fact single out any one paper when making this point. We do not mention the Xia paper explicitly anywhere, and we were very careful to cite multiple papers in support of the multivariate over-fitting argument, which is now a well-known issue (4). Nevertheless, the Reviewers make an excellent point here, and we acknowledge that while CCA was not reproducible in the present dataset, this does not imply that the results in the Xia et al. paper (or any other paper for that matter) are not reproducible (i.e. until someone formally attempts to falsify them). We have made this point explicit in the revised paper, as shown below. Furthermore, in line with the provided feedback, we also applied the multivariate power calculator derived by Helmer et al. (3), which quantitatively illustrates the statistical point around CCA instability.

      Results: Several recent studies have reported “latent” neuro-behavioral relationships using multivariate statistics (5–7), which would be preferable because they simultaneously solve for maximal covariation across neural and behavioral features. Concerns have emerged about whether such multivariate results will replicate, due to the size of the feature space relative to the size of the clinical samples (4). Given the possibility of deriving a stable multivariate effect, here we tested whether results improve with canonical correlation analysis (CCA) (8), which maximizes relationships between linear combinations of symptom (B) and neural (N) features across all PSD (Fig. 5A).

      Discussion: Here we attempted to use multivariate solutions (i.e. CCA) to quantify symptom and neural feature co-variation. In principle, CCA is well-suited to address the brain-behavioral mapping problem. However, symptom-neural mapping using CCA across either parcel-level or network-level solutions in our sample was not reproducible, even when using a low-dimensional symptom solution and parcellated neural data as a starting point. Therefore, while CCA (and related multivariate methods such as partial least squares) is theoretically appropriate and may be helped by regularization methods such as sparse CCA, in practice many available psychiatric neuroimaging datasets may not provide sufficient power to resolve stable multivariate symptom-neural solutions (3). A key pressing need for forthcoming studies will be to use multivariate power calculators to inform the sample sizes needed for resolving stable symptom-neural geometries at the single-subject level. Of note, although we were unable to derive a stable CCA solution in the present sample, this does not imply that a multivariate neuro-behavioral effect cannot be reproduced with larger effect sizes and/or sample sizes. Critically, this does highlight the importance of power calculations prior to computing multivariate brain-behavioral solutions (3).
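      The over-fitting concern can be illustrated with simulated pure-noise data. The sketch below computes the first canonical pair with a plain whitening-plus-SVD CCA (a tiny ridge term keeps the covariance matrices invertible) and then scores it on held-out data. It is neither the authors' analysis nor the multivariate power calculator of Helmer et al. (3); the function names and the sample and feature sizes are hypothetical choices meant to mimic many features relative to subjects.

```python
import numpy as np


def _whitener(cov):
    """Return W with W.T @ cov @ W = I (inverse-transposed Cholesky factor)."""
    return np.linalg.inv(np.linalg.cholesky(cov)).T


def first_canonical_pair(x, y, reg=1e-6):
    """First canonical weight vectors and correlation via whitening + SVD."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    cyy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)
    wx, wy = _whitener(cxx), _whitener(cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    u, s, vt = np.linalg.svd(wx.T @ cxy @ wy)
    return wx @ u[:, 0], wy @ vt[0], s[0]


def canonical_r(x, y, a, b):
    """Correlation between the projections x @ a and y @ b."""
    return np.corrcoef(x @ a, y @ b)[0, 1]


# Neural (50 features) and symptom (5 features) data that are pure noise.
rng = np.random.default_rng(0)
x_train, y_train = rng.normal(size=(100, 50)), rng.normal(size=(100, 5))
x_test, y_test = rng.normal(size=(100, 50)), rng.normal(size=(100, 5))
a, b, r_in = first_canonical_pair(x_train, y_train)
r_out = canonical_r(x_test, y_test, a, b)
print(f"in-sample r = {r_in:.2f}, held-out r = {r_out:.2f}")
```

      In runs like this the in-sample canonical correlation is typically large even though no true relationship exists, while the held-out correlation stays near zero.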

      • Why was PCA selected for the analysis rather than ICA? Authors mention that PCA enables the discovery of orthogonal symptom dimensions, but don't elaborate on why this is expected to better capture behavioural variation within PSD compared to non-orthogonal dimensions. Given that symptom and/or cognitive items in conventional assessments are likely to be correlated in one way or another, allowing correlations to be present in the low-rank behavioural solution may better represent the original clinical profiles and drive more accurate brain-behaviour mapping. Moreover, as alluded to in the Discussion, employing an oblique rotation in the identification of dimensionality-reduced symptom axes may have actually resulted in a brain-behaviour space that is more generalizable to other psychiatric spectra. Why not use something more relevant to symptom/behaviour data like a factor analysis?

      This is a very important point! We agree with the Reviewer that an oblique solution may better fit the data. For this reason, we performed an ICA, as shown in the Supplement. We chose to show PCA for the main analyses because it is a deterministic solution and the number of significant components can be computed via permutation testing. Importantly, certain components from the ICA solution in this sample were highly similar to the PCs shown in the main solution (Supplementary Note 1), as measured by comparing the subject behavioral scores (Fig. S4) and neural maps (Fig. S13). However, certain components in the ICA and PCA solutions did not appear to have a one-to-one mapping (e.g. PCs 1-3 and ICs 1-3). The orthogonality of the PCA solution forces the resulting components to capture maximally separated, unique symptom variance, which in turn map robustly onto unique neural circuits. We observed that the data may be distributed in such a way that highly correlated independent components emerge in the ICA, which do not maximally separate the symptom variance associated with neural variance. We demonstrate this by plotting the relationship between parcel beta coefficients for the $\beta_{PC3}GBC$ map versus the $\beta_{IC2}GBC$ and $\beta_{IC3}GBC$ maps. The sigmoidal shape of the distribution indicates an improvement in the Z-statistics for the $\beta_{PC3}GBC$ map relative to the $\beta_{IC2}GBC$ and $\beta_{IC3}GBC$ maps. We have added this language to the main text Results:

      Notably, independent component analysis (ICA), an alternative dimensionality reduction procedure that does not enforce component orthogonality, produced similar effects for this PSD sample (see Supplementary Note 1 and Fig. S4A). Certain pairs of components between the PCA and ICA solutions appear to be highly similar and exclusively mapped (IC5 and PC4; IC4 and PC5) (Fig. S4B). On the other hand, PCs 1-3 and ICs 1-3 do not exhibit a one-to-one mapping. For example, PC3 appears to correlate positively with IC2 and equally strongly negatively with IC3, suggesting that these two ICs are oblique to the PC and perhaps reflect symptom variation that is explained by a single PC. The orthogonality of the PCA solution forces the resulting components to capture maximally separated, unique symptom variance, which in turn map robustly onto unique neural circuits. We observed that the data may be distributed in such a way that highly correlated independent components emerge in the ICA, which do not maximally separate the symptom variance associated with neural variance. We demonstrate this by plotting the relationship between parcel beta coefficients for the $\beta_{PC3}GBC$ map versus the $\beta_{IC2}GBC$ and $\beta_{IC3}GBC$ maps (Fig. ??G). The sigmoidal shape of the distribution indicates an improvement in the Z-statistics for the $\beta_{PC3}GBC$ map relative to the $\beta_{IC2}GBC$ and $\beta_{IC3}GBC$ maps.
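      The component-matching idea (correlating subject scores from one solution with those from another) can be sketched with a toy example. No ICA is run here: as a hypothetical stand-in, "IC2" and "IC3" are constructed as a 45-degree rotation of PC2 and PC3, which reproduces the pattern described above, where PC3 correlates positively with one IC and equally strongly negatively with another. The function name and all data are assumptions for illustration only.

```python
import numpy as np


def cross_correlation(scores_a, scores_b):
    """Correlate each column of scores_a with each column of scores_b."""
    k = scores_a.shape[1]
    full = np.corrcoef(scores_a, scores_b, rowvar=False)
    return full[:k, k:]


# Orthogonal, zero-mean "PC" scores from the SVD of centered random data.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 12))
u, _, _ = np.linalg.svd(data - data.mean(axis=0), full_matrices=False)
pc = u[:, :5]

# Hypothetical "IC" scores: IC2 and IC3 are a 45-degree rotation of PC2 and PC3.
ic = pc.copy()
ic[:, 1] = (pc[:, 1] + pc[:, 2]) / np.sqrt(2)
ic[:, 2] = (pc[:, 1] - pc[:, 2]) / np.sqrt(2)

cc = cross_correlation(pc, ic)
print(cc[2, 1:3].round(3))  # PC3 vs IC2 and IC3: about +0.707 and -0.707
```

      Components that map one-to-one show a single entry near ±1 in their row of `cc`; an oblique pair like IC2/IC3 splits a PC across two columns.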

      Additionally, the Reviewer raises an important point, and we agree that orthogonal versus oblique solutions warrant further investigation, especially with regard to other psychiatric spectra and/or other stages in disease progression. For example, oblique components may better capture dimensions of behavioral variation in prodromal individuals, as these individuals are in the early stages of exhibiting psychosis-relevant symptoms and may show an early divergence along dimensions of behavioral variation. We elaborate on this further in the Discussion:

      Another important aspect that will require further characterization is the possibility of oblique axes in the symptom-neural geometry. While orthogonal axes derived via PCA were appropriate here and similar to the ICA-derived axes in this solution, it is possible that oblique dimensions more clearly reflect the geometry of other psychiatric spectra and/or other stages in disease progression. For example, oblique components may better capture dimensions of neuro-behavioral variation in a sample of prodromal individuals, as these patients are exhibiting early-stage psychosis-like symptoms and may show signs of diverging along different trajectories.

      Critically, these factors should constitute key extensions of an iteratively more robust model for individualized symptom-neural mapping across the PSD and other psychiatric spectra. Relatedly, it will be important to identify the ‘limits’ of a given BBS solution: namely, a PSD-derived effect may not generalize into the mood spectrum (i.e. both the symptom space and the resulting symptom-neural mapping may be orthogonal). It will be important to evaluate whether this framework can be used to initialize symptom-neural mapping across other mental health symptom spectra, such as mood/anxiety disorders.

      • The gene expression mapping section lacks some justification for why the 7 genes of interest were specifically chosen from among the numerous serotonin and GABA receptors and interneuron markers (relevant for PSD) available in the AHBA. Brief reference to the believed significance of the chosen genes in psychosis pathology would have helped to contextualize the observed relationship with the neuro-behavioural map.

      We thank the Reviewer for providing this suggestion and agree that it will strengthen the section on gene expression analysis. Of note, we did justify the choice of these genes, but we appreciate the opportunity to expand on the neurobiology of the selected genes and their relevance to PSD. We have made these edits to the text:

      We focus here on serotonin receptor subunits (HTR1E, HTR2C, HTR2A), GABA receptor subunits (GABRA1, GABRA5), and the interneuron markers somatostatin (SST) and parvalbumin (PVALB). Serotonin agonists such as LSD have been shown to induce PSD-like symptoms in healthy adults (9), and the serotonin antagonism of “second-generation” antipsychotics is thought to contribute to their efficacy in targeting broad PSD symptoms (10–12). Abnormalities in GABAergic interneurons, which provide inhibitory control in neural circuits, may contribute to cognitive deficits in PSD (13–15) and additionally lead to downstream excitatory dysfunction that underlies other PSD symptoms (16, 17). In particular, a loss of prefrontal parvalbumin-expressing fast-spiking interneurons has been implicated in PSD (18–21).

      • What the identified univariate neuro-behavioural mapping for PC3 ("psychosis configuration") actually means from an empirical or brain network perspective is not really ever discussed in detail. E.g., in Results, "a high positive PC3 score was associated with both reduced GBC across insular and superior dorsal cingulate cortices, thalamus, and anterior cerebellum and elevated GBC across precuneus, medial prefrontal, inferior parietal, superior temporal cortices and posterior lateral cerebellum." While the meaning and calculation of GBC can be gleaned from the Methods, a direct interpretation of the neuro-behavioural results in terms of the types of symptoms contributing to PC3 and relative hyper-/hypo-connectivity of the DMN compared to e.g. healthy controls could facilitate easier comparisons with the findings of past studies (since GBC does not seem to be a very commonly-used measure in the psychosis fMRI literature). Also important since GBC is a summary measure of the average connectivity of a region, and doesn't provide any specificity in terms of which regions in particular are more or less connected within a functional network (an inherent limitation of this measure which warrants further attention).

      We acknowledge that GBC is a linear combination measure that by definition does not provide information on connectivity between any one specific pair of neural regions. However, as shown by the highly robust and reproducible neurobehavioral maps, GBC seems to be suitable as a first-pass metric in the absence of a priori assumptions about how specific regional connectivity may map onto the PC symptom dimensions, and it has been shown to be sensitive to altered patterns of overall neural connectivity in PSD cohorts (22–25) as well as in models of psychosis (9, 26). Moreover, it is an assumption-free method for dimensionality reduction of the neural connectivity matrix (which is a massive feature space). Furthermore, GBC provides neural maps (where each region can be represented by a value, in contrast to full functional connectivity matrices), which were necessary for quantifying the relationship with independent molecular benchmark maps (i.e. pharmacological maps and gene expression maps). We do acknowledge that the method has limitations, which we now discuss in the paper. We also agree with the Reviewer that the specific regions implicated in these symptom-neural relationships warrant a more detailed investigation, and we plan to develop this further in future studies, for example with seed-based functional connectivity using regions implicated in PSD (e.g. thalamus (2, 27)) or restricted GBC (22), which can summarize connectivity information for a specific network or subset of neural regions. We have provided elaboration and clarification regarding this point in the Discussion:

      Another improvement would be to optimize neural data reduction sensitivity for specific symptom variation (28). We chose to use GBC for our initial geometry characterizations as it is a principled and assumption-free data-reduction metric that captures (dys)connectivity across the whole brain and generates neural maps (where each region can be represented by a value, in contrast to full functional connectivity matrices) that are necessary for benchmarking against molecular imaging maps. However, GBC is a summary measure that by definition does not provide information regarding connectivity between specific pairs of neural regions, which may prove to be highly symptom-relevant and informative. Thus symptom-neural relationships should be further explored with higher-resolution metrics, such as restricted GBC (22) which can summarize connectivity information for a specific network or subset of neural regions, or seed-based FC using regions implicated in PSD (e.g. thalamus (2, 27)).
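      The difference between whole-brain GBC and restricted GBC can be illustrated with a short sketch. It assumes the common definition of GBC as each parcel's mean Fisher z-transformed correlation with all other parcels (as in (25)); the `gbc` helper and its `targets` argument are hypothetical, and real pipelines add preprocessing steps not shown here.

```python
import numpy as np


def gbc(timeseries, targets=None):
    """Global brain connectivity (GBC) map from parcellated BOLD data.

    timeseries : (n_timepoints, n_parcels) array
    targets    : optional indices of parcels to average over; passing one
                 network's parcels gives a restricted GBC
    Returns an (n_parcels,) array: each parcel's mean Fisher-z correlation
    with the target parcels, excluding itself.
    """
    r = np.corrcoef(np.asarray(timeseries, dtype=float), rowvar=False)
    np.fill_diagonal(r, np.nan)                        # drop self-connections
    z = np.arctanh(np.clip(r, -0.999999, 0.999999))    # Fisher r-to-z (NaN kept)
    cols = np.arange(r.shape[1]) if targets is None else np.asarray(targets)
    return np.nanmean(z[:, cols], axis=1)


# Synthetic example: parcels 0-2 share a common signal.
rng = np.random.default_rng(1)
ts = rng.standard_normal((200, 10))
ts[:, :3] += 2 * rng.standard_normal((200, 1))
full_map = gbc(ts)                              # whole-brain GBC
network_map = gbc(ts, targets=np.arange(3))     # restricted to parcels 0-2
```

      Averaging over every parcel dilutes the strong within-network coupling of parcels 0-2, whereas the restricted version keeps it visible, which is the loss of pairwise specificity discussed above.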

      • Possibly a nitpick, but while the inclusion of cognitive measures for PSD individuals is a main (self-)selling point of the paper, there's very limited focus on the "Cognitive functioning" component (PC2) of the PCA solution. Examining Fig. S8K, the GBC map for this cognitive component seems almost to be the inverse for that of the "Psychosis configuration" component (PC3) focused on in the rest of the paper. Since PC3 does not seem to have high loadings from any of the cognitive items, but it is known that psychosis spectrum individuals tend to exhibit cognitive deficits which also have strong predictive power for illness trajectory, some discussion of how multiple univariate neuro-behavioural features could feasibly be used in conjunction with one another could have been really interesting.

      This is an important piece of feedback concerning the cognitive measure aspect of the study. As the Reviewer recognizes, cognition is a core element of PSD symptoms and the key reason for including cognitive measures in the model. Notably, the finding that one dimension captures a substantial proportion of cognitive performance-related variance, independent of other residual symptom axes, has not previously been reported, and we fully agree that expanding on this effect is important and warrants further discussion. We would like to take two of the key points from the Reviewer’s feedback and expand further. First, we recognize that upon qualitative inspection the PC2 and PC3 neural maps appear strongly anti-correlated. However, as demonstrated in Fig. S9O, the PC2 and PC3 maps were only moderately anti-correlated (r=-0.47). For comparison, the PC2 map was highly anti-correlated with the BACS composite cognitive map (r=-0.81). This implies that the PC2 map reflects unique neural circuit variance that is relevant for cognition, rather than simply being an inverse of the PC3 map.

      In other words, these data suggest that there are PSD patients with more (or less) severe cognitive deficits independent of any other symptom axis, which would be in line with the observation that these symptoms are not treatable with antipsychotic medication (and therefore should not correlate with symptoms that are treatable by such medications; i.e. PC3). We have now added these points into the revised paper:

      Results: Fig. 1E highlights loading configurations of symptom measures forming each PC. To aid interpretation, we assigned a name to each PC based on its most strongly weighted symptom measures. This naming is qualitative but informed by the pattern of loadings of the original 36 symptom measures (Fig. 1). For example, PC1 was highly consistent with a general impairment dimension (i.e. “Global Functioning”); PC2 reflected more exclusively variation in cognition (i.e. “Cognitive Functioning”); PC3 indexed a complex configuration of psychosis-spectrum relevant items (i.e. “Psychosis Configuration”); PC4 generally captured variation in mood and anxiety related items (i.e. “Affective Valence”); finally, PC5 reflected variation in arousal and level of excitement (i.e. “Agitation/Excitation”). For instance, a generally impaired patient would have a highly negative PC1 score, which would reflect low performance on cognition and elevated scores on most other symptomatic items. Conversely, an individual with a high positive PC3 score would exhibit delusional, grandiose, and/or hallucinatory behavior, whereas a person with a negative PC3 score would exhibit motor retardation, social avoidance, and possibly a withdrawn affective state with blunted affect (29). Comprehensive loadings for all 5 PCs are shown in Fig. 3G. Fig. 1F highlights the mean of each of the 3 diagnostic groups (colored spheres) and healthy controls (black sphere) projected into a 3-dimensional orthogonal coordinate system for PCs 1, 2 & 3 (x, y, z axes respectively; alternative views of the 3-dimensional coordinate system with all patients projected are shown in Fig. 3). Critically, PC axes were not parallel with traditional aggregate symptom scales. For instance, PC3 is angled at 45° to the dominant direction of PANSS Positive and Negative symptom variation (purple and blue arrows respectively in Fig. 1F). ... 
Because PC3 loads most strongly onto hallmark symptoms of PSD (including strong positive loadings across PANSS Positive symptom measures and strong negative loadings onto most Negative measures), we focus on this PC as an opportunity to quantify an innovative, fully data-driven dimension of symptom variation that is highly characteristic of the PSD patient population. Additionally, this bi-directional symptom axis captured shared variance from measures in other traditional symptom factors, such as the PANSS General factor and cognition. We found that the PC3 result provided a powerful empirical demonstration of how a data-driven, dimensionality-reduced solution (via PCA) can reveal novel patterns intrinsic to the structure of PSD psychopathology.
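      The PCA step and the "angle between a PC axis and an aggregate scale" idea can be sketched as follows. This is a minimal illustration under stated assumptions: measures are z-scored before an SVD-based PCA, and an aggregate scale is treated as the sum of its flagged items, whose direction in PC space is the loadings matrix applied to an item mask. The helpers `symptom_pca`, `scale_direction` and `angle_between` are hypothetical, and the data are synthetic.

```python
import numpy as np


def symptom_pca(X, n_components=5):
    """PCA of an (n_subjects, n_measures) symptom matrix via SVD.

    Each measure is z-scored first (an assumption about preprocessing).
    Returns subject scores, measure loadings and explained-variance ratios.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_components].T      # (n_measures, n_components), orthonormal
    scores = Z @ loadings               # each subject's coordinates on the PCs
    explained = S**2 / np.sum(S**2)
    return scores, loadings, explained[:n_components]


def scale_direction(loadings, item_mask):
    """Direction in PC space along which a sum-score of the flagged items grows."""
    return loadings.T @ np.asarray(item_mask, dtype=float)


def angle_between(u, v):
    """Angle in degrees between two axes, ignoring their sign."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    cos = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))))


# Synthetic example: 436 subjects x 36 measures, first 7 items flagged as a scale.
rng = np.random.default_rng(0)
X = rng.standard_normal((436, 36))
scores, loadings, explained = symptom_pca(X)
mask = np.zeros(36)
mask[:7] = 1
pc3_axis = np.eye(5)[2]
angle = angle_between(pc3_axis, scale_direction(loadings, mask))
```

      Note that SVD returns each PC with an arbitrary sign, so orienting PC3 so that positive scores mean psychosis-like items (as in the text) requires an explicit sign convention; the sign-agnostic angle sidesteps this.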

      Another nitpick, but the Y axes of Fig. 8C-E are not consistent, which causes some of the lines of best fit to be a bit misleading (e.g. GABRA1 appears to have a more strongly positive gene-PC relationship than HTR1E, when in reality the opposite is true.)

      We scaled each axis to best show the data in each plot but see how this is confusing and recognise the need to correct this. We have remade the plots with consistent y-axis scales.

      • The authors explain the apparent low reproducibility of their multivariate PSD neuro-behavioural solution using the argument that many psychiatric neuroimaging datasets are too small for multivariate analyses to be sufficiently powered. Applying an existing multivariate power analysis to their own data as empirical support for this idea would have made it even more compelling. The following paper suggests guidelines for sample sizes required for CCA/PLS as well as a multivariate calculator: Helmer, M., Warrington, S. D., Mohammadi-Nejad, A.-R., Ji, J. L., Howell, A., Rosand, B., Anticevic, A., Sotiropoulos, S. N., & Murray, J. D. (2020). On stability of Canonical Correlation Analysis and Partial Least Squares with application to brain-behavior associations (p. 2020.08.25.265546). https://doi.org/10.1101/2020.08.25.265546

      We deeply appreciate the Reviewer’s suggestion and the opportunity to incorporate the methods from the Helmer et al. paper. We now highlight the importance of having sufficiently powered samples for multivariate analyses in our other manuscript first-authored by our colleague Dr. Markus Helmer (3). Using the method described in the above paper (GEMMR version 0.1.2), we computed the estimated sample sizes required to power multivariate CCA analyses with 718 neural features and 5 behavioral (PC) features (i.e. the feature set used throughout the rest of the paper):


      As argued in Helmer et al., rtrue is likely below 0.3 in many cases, thus the estimated sample size of 33k is likely a lower bound for the required sample size for sufficiently-powered CCA analyses using the 718+5 features leveraged throughout the univariate analyses in the present manuscript. This number is two orders of magnitude greater than our available sample (and at least one order of magnitude greater than any single existing clinical dataset). Even if rtrue is 0.5, a sample size of ∼10k would likely be required. We also computed the estimated sample sizes required for 180 neural features (symmetrized neural cortical parcels) and 5 symptom PC features, consistent with the CCA reported in our main text:

      Assuming that rtrue is likely below 0.3, this minimal required sample size remains at least an order of magnitude greater than the size of our present sample, consistent with the finding that the CCA solution computed using these data was unstable. As a lower limit for the required sample size plausible using the feature sets reported in our paper, we additionally computed for comparison the estimated N needed with the smallest number of features explored in our analyses, i.e. 12 neural functional network features and 5 symptom PC features:

      These required sample sizes are closer to the N=436 used in the present sample and to samples reported in the clinical neuroimaging literature. This is consistent with the observation that when using 12 neural and 5 symptom features (Fig. S15C), the detected canonical correlation for CV1 (r = 0.38) is much lower (and likely not inflated due to overfitting) and may be closer to the true effect, because with N=436 this effect is resolvable. This is in contrast to the CCA solution with 180 neural and 5 symptom features, where we observed a null CCA effect of around r > 0.6 across all 5 CVs. This clearly highlights the inflation of the effect as the feature space grows. There is no a priori plausible reason to believe that the effect for the 180 vs. 5 feature mapping is literally double the effect for the 12 vs. 5 feature mapping, especially as the 12 features are networks derived from the 180 parcels (i.e. the effects should be comparable rather than one being 2x smaller). Consequently, if the true CCA effect with 180 vs. 5 features were actually in the more comparable range of r = 0.38, we would need >5,000 subjects to resolve a reproducible neuro-behavioral CCA map (an order of magnitude more than in the BSNIP sample). Moreover, to confidently detect effects if rtrue is actually less than 0.3, we would require a sample size >8,145 subjects. We have added this to the Results section on our CCA results:

      Next, we tested whether the 180-parcel CCA solution is stable and reproducible, as done with the PC-to-GBC univariate results. The CCA solution was robust when tested with k-fold and leave-site-out cross-validation (Fig. S16), likely because these methods use CCA loadings derived from the full sample. However, the CCA loadings did not replicate in non-overlapping split-half samples (Fig. 5L; see Supplementary Note 4). Moreover, a leave-one-subject-out cross-validation revealed that removing a single subject from the sample affected the CCA solution such that it did not generalize to the left-out subject (Fig. 5M). This is in contrast to the PCA-to-GBC univariate mapping, which was substantially more reproducible for all attempted cross-validations relative to the CCA approach. This is likely because substantially more power is needed to resolve a stable multivariate neuro-behavioral effect with this many features. Indeed, a multivariate power analysis using 180 neural features and 5 symptom features, and assuming a true canonical correlation of r = 0.3, suggests that a minimal sample size of N = 8145 is needed to sufficiently detect the effect (3), an order of magnitude greater than the available sample size. Therefore, we leverage the univariate neuro-behavioral result for subsequent subject-specific model optimization and comparisons to molecular neuroimaging maps.
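      The leave-one-subject-out check (Fig. 5M, detailed in Supplementary Note 4) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes classical CCA computed from the SVD of the whitened cross-covariance (with a tiny ridge for numerical stability), and the helpers `fit_cca` and `loo_cca` are hypothetical names applied to synthetic data.

```python
import numpy as np


def _inv_sqrt(C, ridge):
    # Inverse square root of a symmetric covariance matrix (small ridge for stability)
    w, V = np.linalg.eigh(C + ridge * np.eye(len(C)))
    return V @ np.diag(w ** -0.5) @ V.T


def fit_cca(X, Y, ridge=1e-6):
    """Classical CCA via SVD of the whitened cross-covariance matrix.

    Returns the training means, weight matrices Wx and Wy (columns map X and
    Y onto paired canonical variates) and the canonical correlations.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    dof = len(X) - 1
    Kx = _inv_sqrt(Xc.T @ Xc / dof, ridge)
    Ky = _inv_sqrt(Yc.T @ Yc / dof, ridge)
    U, s, Vt = np.linalg.svd(Kx @ (Xc.T @ Yc / dof) @ Ky, full_matrices=False)
    return mx, my, Kx @ U, Ky @ Vt.T, s


def loo_cca(X, Y, n_cv=1):
    """Leave-one-subject-out CCA.

    For each subject, fit CCA on the remaining subjects, then project the
    held-out subject's neural (X) and behavioural (Y) data with those weights.
    Returns, per CV, the correlation across subjects between the predicted
    neural and behavioural scores. An SVD may flip the sign of a CV pair, but
    it flips both members together, so their product is unaffected.
    """
    n = len(X)
    pred_x, pred_y = np.empty((n, n_cv)), np.empty((n, n_cv))
    for i in range(n):
        keep = np.arange(n) != i
        mx, my, Wx, Wy, _ = fit_cca(X[keep], Y[keep])
        pred_x[i] = ((X[i] - mx) @ Wx)[:n_cv]
        pred_y[i] = ((Y[i] - my) @ Wy)[:n_cv]
    return np.array([np.corrcoef(pred_x[:, k], pred_y[:, k])[0, 1]
                     for k in range(n_cv)])


# Synthetic data: 120 subjects, 5 "neural" and 3 "symptom" features, one shared latent.
rng = np.random.default_rng(0)
latent = rng.standard_normal(120)
X = rng.standard_normal((120, 5)); X[:, 0] += 2 * latent
Y = rng.standard_normal((120, 3)); Y[:, 0] += 2 * latent
in_sample = fit_cca(X, Y)[4]
held_out = loo_cca(X, Y, n_cv=1)
```

      With few features and a real shared signal, the held-out correlation stays close to the in-sample one; the text reports that this failed to hold for the 180 vs. 5 feature CCA at N = 436.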

      Additionally, we added the following to Supplementary Note 4: Establishing the Reproducibility of the CCA Solution:

      Here we outline the details of the split-half replication for the CCA solution. Specifically, the full patient sample was randomly split into two halves (referred to as “H1” and “H2”), while preserving the proportion of patients in each diagnostic group. Then, CCA was performed independently for H1 and H2. While the loadings for behavioral PCs and original behavioral items are somewhat similar (mean r ∼ 0.5) between the two CCAs in each run, the neural loadings were not stable across H1 and H2 CCA solutions. Critically, CCA results did not perform well for leave-one-subject-out cross-validation (Fig. 5M). Here, one patient was held out while CCA was performed using all data from the remaining 435 patients. The loadings matrices Ψ and Θ from the CCA were then used to calculate the “predicted” neural and behavioral latent scores for all 5 CVs for the patient who was held out of the CCA solution. This process was repeated for every patient and the final result was evaluated for reproducibility. As described in the main text, this did not yield reproducible CCA effects (Fig. 5M). Of note, CCA may yield higher reproducibility if the neural feature space were to be further reduced. As noted, our approach was to first parcellate the BOLD signal and then use GBC as a data-driven method to yield a neuro-biologically and quantitatively interpretable neural data reduction, and we additionally symmetrized the result across hemispheres. Nevertheless, in sharp contrast to the PCA univariate feature selection approach, the CCA solutions were still not stable in the present sample size of N = 436. Indeed, a multivariate power analysis (3) estimates that the following sample sizes will be required to sufficiently power a CCA between 180 neural features and 5 symptom features, at different levels of true canonical correlation (rtrue):

      To test whether further neural feature space reduction may improve reproducibility, we also evaluated CCA solutions with neural GBC parcellated according to 12 brain-wide functional networks derived from the recent HCP-driven network parcellation (30). Again, we computed the CCA for all 36 item-level symptoms as well as the 5 PCs (Fig. S15). As with the parcel-level effects, the network-level CCA analysis produced significant results (for CV1 when using 36 item-level scores and for all 5 CVs when using the 5 PC-derived scores). Here the result produced much lower canonical correlations (∼0.3-0.5); however, these effects (for CV1) clearly exceeded the 95% confidence interval generated via random permutations, suggesting that they may reflect the true canonical correlation. We observed a similar result when we evaluated CCAs computed with neural GBC from 192 symmetrized subcortical parcels and 36 symptoms or 5 PCs (Fig. S14). In other words, data-reducing the neural signal to 12 functional networks likely averaged out parcel-level information that may carry symptom-relevant variance, but may be closer to capturing the true effect. Indeed, the power analysis suggests that the current sample size is closer to that needed to detect an effect with 12 + 5 features:

      Note that we do not present a CCA conducted with parcels across the whole brain, as the number of variables would exceed the number of observations. However, the multivariate power analysis using 718 neural features and 5 symptom features estimates that the following sample sizes would be required to detect the following effects:

      This analysis suggests that even the lowest estimate of 10k samples exceeds the present available sample size by more than an order of magnitude.
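      As a back-of-envelope cross-check of these power-analysis numbers, one can multiply the total feature count by the roughly 40 samples per feature quoted for rtrue = 0.3 in Fig. S19A. This is only a rule of thumb under that single assumption, not a call into gemmr, and `rough_n_required` is a hypothetical helper.

```python
# Rough rule of thumb: required N ~ samples-per-feature x total features.
SAMPLES_PER_FEATURE_AT_R03 = 40  # approximate ratio quoted for rtrue = 0.3 (Fig. S19A)
N_AVAILABLE = 436


def rough_n_required(n_neural, n_symptom, per_feature=SAMPLES_PER_FEATURE_AT_R03):
    """Back-of-envelope sample size; not a substitute for gemmr's fitted model."""
    return per_feature * (n_neural + n_symptom)


scenarios = {"718 vs 5": (718, 5), "180 vs 5": (180, 5), "12 vs 5": (12, 5)}
estimates = {name: rough_n_required(*dims) for name, dims in scenarios.items()}
# estimates -> {'718 vs 5': 28920, '180 vs 5': 7400, '12 vs 5': 680}
shortfall = {name: n / N_AVAILABLE for name, n in estimates.items()}
```

      These rough figures land in the same range as the gemmr-based estimates at rtrue = 0.3 reported here (about 33k, 8,145, and 700), and put the 718 vs. 5 case at roughly 66 times the available sample.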

      We have also added Fig. S19, illustrating these power analyses results:

      Fig. S19. Multivariate power analysis for CCA. Sample sizes were calculated according to (3); see also https://gemmr.readthedocs.io/en/latest/. We computed the multivariate power analyses for three versions of CCA reported in this manuscript: i) 718 neural vs. 5 symptom features; ii) 180 neural vs. 5 symptom features; iii) 12 neural vs. 5 symptom features. (A) For a given rtrue, the ratio of samples (i.e. subjects) required per feature to derive a stable CCA solution remains approximately the same across different numbers of features. As discussed in (3), at rtrue = 0.3 the number of samples required per feature is about 40, which is much greater than the ratio of samples to features available in our dataset. (B) The total number of samples required (nreq) for a stable CCA solution given the total number of neural and symptom features used in our analyses, at different values of rtrue. In general, these required sample sizes are much greater than the N=436 PSD individuals in our present dataset (horizontal grey line), consistent with the finding that the CCA solutions computed using our data were unstable. Notably, the ‘12 vs. 5’ CCA assuming rtrue = 0.3 requires only 700 subjects, which is closest to the N=436 used in the present sample. This may be in line with the observation from the CCA with 12 neural vs. 5 symptom features (Fig. S15C) that the canonical correlation (r = 0.38 for CV1) clearly exceeds the 95% confidence interval and may be closer to the true effect. However, to confidently detect effects in such an analysis (particularly if rtrue is actually less than 0.3), a larger sample would likely still be needed.
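      The permutation-based 95% interval that the network-level canonical correlation is compared against can be sketched as below. This is a minimal illustration under assumptions: plain CCA (with only a tiny numerical ridge) on synthetic 12 vs. 5 feature data, and a null built by shuffling subjects in the symptom matrix. `first_canonical_corr` and `permutation_null` are hypothetical helpers, not the authors' code.

```python
import numpy as np


def first_canonical_corr(X, Y, ridge=1e-8):
    """Largest canonical correlation between the columns of X and of Y."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)

    def inv_sqrt(C):
        # Inverse square root of a symmetric positive (semi)definite matrix
        w, V = np.linalg.eigh(C + ridge * np.eye(len(C)))
        return V @ np.diag(w ** -0.5) @ V.T

    M = inv_sqrt(Xc.T @ Xc) @ (Xc.T @ Yc) @ inv_sqrt(Yc.T @ Yc)
    return float(np.linalg.svd(M, compute_uv=False)[0])


def permutation_null(X, Y, n_perm=500, seed=0):
    """Observed first canonical correlation vs. a null built by shuffling
    subjects in Y (breaking the X-Y pairing). Returns (observed, null 95th
    percentile, null distribution)."""
    rng = np.random.default_rng(seed)
    observed = first_canonical_corr(X, Y)
    null = np.array([
        first_canonical_corr(X, Y[rng.permutation(len(Y))])
        for _ in range(n_perm)
    ])
    return observed, float(np.percentile(null, 95)), null


# Synthetic 12-vs-5 example with one genuinely shared latent variable.
rng = np.random.default_rng(0)
latent = rng.standard_normal(436)
X = rng.standard_normal((436, 12)); X[:, 0] += 2 * latent
Y = rng.standard_normal((436, 5)); Y[:, 0] += 2 * latent
observed, null_95, null = permutation_null(X, Y, n_perm=200)
```

      Even the shuffled data produce clearly non-zero canonical correlations, which is why a raw r must be judged against such a null rather than against zero.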

      We also added the corresponding methods in the Methods section:

      Multivariate CCA Power Analysis. Multivariate power analyses to estimate the minimum sample size needed to sufficiently power a CCA were computed using methods described in (3), using the Generative Modeling of Multivariate Relationships tool (gemmr, https://github.com/murraylab/gemmr (v0.1.2)). Briefly, a model was built by: 1) generating synthetic datasets for the two input data matrices by sampling from a multivariate normal distribution with a joint covariance matrix that was structured to encode CCA solutions with specified properties; 2) performing CCAs on these synthetic datasets. Because the joint covariance matrix is known, the true values of the association strength, weights, scores, and loadings of the CCA, as well as the errors of the estimates for these four metrics, can also be computed. In addition, the statistical power to detect an estimated association strength different from 0 is determined through permutation testing; 3) varying the parameters of the generative model (number of features, assumed true between-set correlation, within-set variance structure for both datasets) and determining, in each case, the required sample size Nreq such that statistical power reaches 90% and all of the error metrics described above fall to a target level of 10%; and 4) fitting and validating a linear model to predict the required sample size Nreq from the parameters of the generative model. This linear model was then used to calculate Nreq for CCA in three data scenarios: i) 718 neural vs. 5 symptom features; ii) 180 neural vs. 5 symptom features; iii) 12 neural vs. 5 symptom features.
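      Step 1 of this procedure, drawing data from a joint normal with a known true canonical correlation, can be illustrated with a deliberately simplified generative model: identity within-set covariance and a single cross-set correlation, which makes r_true the true first canonical correlation. This is a sketch under those assumptions, not gemmr's implementation; `simulate_pair` and `first_canonical_corr` are hypothetical helpers.

```python
import numpy as np


def first_canonical_corr(X, Y, ridge=1e-8):
    """Largest canonical correlation between the columns of X and of Y."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)

    def inv_sqrt(C):
        # Inverse square root of a symmetric positive (semi)definite matrix
        w, V = np.linalg.eigh(C + ridge * np.eye(len(C)))
        return V @ np.diag(w ** -0.5) @ V.T

    M = inv_sqrt(Xc.T @ Xc) @ (Xc.T @ Yc) @ inv_sqrt(Yc.T @ Yc)
    return float(np.linalg.svd(M, compute_uv=False)[0])


def simulate_pair(n, p, q, r_true, rng):
    """Sample X (n x p) and Y (n x q) from a joint normal in which the only
    between-set dependence is corr(X[:, 0], Y[:, 0]) = r_true.

    With identity within-set covariance, r_true is then the true first
    canonical correlation (a deliberately simplified generative model).
    """
    cov = np.eye(p + q)
    cov[0, p] = cov[p, 0] = r_true
    Z = rng.multivariate_normal(np.zeros(p + q), cov, size=n)
    return Z[:, :p], Z[:, p:]


# Same sample size and true effect, different numbers of neural features.
rng = np.random.default_rng(0)
in_sample_r = {}
for n_neural in (12, 180):
    X, Y = simulate_pair(n=436, p=n_neural, q=5, r_true=0.3, rng=rng)
    in_sample_r[n_neural] = first_canonical_corr(X, Y)
```

      In runs like this, the 180-feature in-sample estimate typically lands far above r_true = 0.3 while the 12-feature estimate stays much closer, mirroring the inflation argument made in the Results.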

      • Given the relatively even distribution of males and females in the dataset, some examination of sex effects on symptom dimension loadings or neuro-behavioural maps would have been interesting (other demographic characteristics like age and SES are summarized for subjects but also not investigated). I think this is a missed opportunity.

      We have now provided additional analyses for the core PCA and univariate GBC mapping results, testing for effects of age, sex, and SES in Fig. S8. Briefly, we observed a significant positive relationship between age and PC3 scores, which may be because older patients (who presumably have been ill for a longer time) exhibit more severe symptoms along the positive PC3 – Psychosis Configuration dimension. We also observed a significant negative relationship between the Hollingshead index of SES and PC1 and PC2 scores. Lower PC1 and PC2 scores indicate poorer general functioning and cognitive performance respectively, which is consistent with higher Hollingshead indices (i.e. lower-skilled jobs or unemployment and fewer years of education). We also found significant sex differences in PC2 – Cognitive Functioning, PC4 – Affective Valence, and PC5 – Agitation/Excitement scores.

      Fig. S8. Effects of age, socio-economic status, and sex on symptom PCA solution. (A) Correlations between symptom PC scores and age (years) across N=436 PSD. Pearson’s correlation values and uncorrected p-values are reported above scatterplots. After Bonferroni correction, we observed a significant positive relationship between age and PC3 score. This may be because older patients have been ill for a longer period of time and exhibit more severe symptoms along the positive PC3 dimension. (B) Correlations between symptom PC scores and socio-economic status (SES) as measured by the Hollingshead Index of Social Position (31), across N=387 PSD with available data. The index is computed as (Hollingshead occupation score * 7) + (Hollingshead education score * 4); a higher score indicates lower SES (32). We observed a significant negative relationship between Hollingshead index and PC1 and PC2 scores. Lower PC1 and PC2 scores indicate poorer general functioning and cognitive performance respectively, which is consistent with higher Hollingshead indices (i.e. lower-skilled jobs or unemployment and fewer years of education). (C) The Hollingshead index can be split into five classes, with 1 being the highest and 5 being the lowest SES class (31). Consistent with (B), we found a significant difference between the classes after Bonferroni correction for PC1 and PC2 scores. (D) Distributions of PC scores across Hollingshead SES classes show the overlap in scores. White lines indicate the mean score in each class. (E) Differences in PC scores between (M)ale and (F)emale PSD subjects. We found a significant difference between sexes in PC2 – Cognitive Functioning, PC4 – Affective Valence, and PC5 – Agitation/Excitement scores. (F) Distributions of PC scores across M and F subjects show the overlap in scores. White lines indicate the mean score for each sex.
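      The SES scoring in this caption and the Bonferroni step can be sketched as follows. The index formula is taken from the caption; the five class boundaries are an ASSUMPTION (commonly cited cut-offs for the two-factor index, scores 11-77) that should be checked against (31), and the Bonferroni helper assumes the correction runs over the 5 PCs. All helper names are hypothetical.

```python
import numpy as np

# ASSUMPTION: commonly cited class boundaries for the two-factor index
# (scores 11-77); verify against (31) before use.
CLASS_UPPER_BOUNDS = (17, 27, 43, 60, 77)


def hollingshead_index(occupation, education):
    """Score from the Fig. S8 caption: occupation * 7 + education * 4.
    Higher scores mean lower SES."""
    return 7 * np.asarray(occupation) + 4 * np.asarray(education)


def hollingshead_class(score, bounds=CLASS_UPPER_BOUNDS):
    """Map scores to classes 1 (highest SES) .. 5 (lowest SES)."""
    return np.searchsorted(bounds, score, side="left") + 1


def bonferroni_significant(p_values, alpha=0.05):
    """Flag p-values that survive Bonferroni correction over all tests given."""
    p = np.asarray(p_values, dtype=float)
    return p < alpha / p.size


# Hypothetical subject: occupation score 5, education score 4 -> 35 + 16 = 51.
score = hollingshead_index(5, 4)
ses_class = hollingshead_class(score)   # class 4 under the assumed bounds
```

      With one test per PC, the corrected threshold is 0.05 / 5 = 0.01, so only uncorrected p-values below 0.01 would count as significant.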

      Bibliography

      1. Jie Lisa Ji, Caroline Diehl, Charles Schleifer, Carol A Tamminga, Matcheri S Keshavan, John A Sweeney, Brett A Clementz, S Kristian Hill, Godfrey Pearlson, Genevieve Yang, et al. Schizophrenia exhibits bi-directional brain-wide alterations in cortico-striato-cerebellar circuits. Cerebral Cortex, 29(11):4463–4487, 2019.
      2. Alan Anticevic, Michael W Cole, Grega Repovs, John D Murray, Margaret S Brumbaugh, Anderson M Winkler, Aleksandar Savic, John H Krystal, Godfrey D Pearlson, and David C Glahn. Characterizing thalamo-cortical disturbances in schizophrenia and bipolar illness. Cerebral cortex, 24(12):3116–3130, 2013.
      3. Markus Helmer, Shaun D Warrington, Ali-Reza Mohammadi-Nejad, Jie Lisa Ji, Amber Howell, Benjamin Rosand, Alan Anticevic, Stamatios N Sotiropoulos, and John D Murray. On stability of canonical correlation analysis and partial least squares with application to brain-behavior associations. bioRxiv, 2020.
      4. Richard Dinga, Lianne Schmaal, Brenda WJH Penninx, Marie Jose van Tol, Dick J Veltman, Laura van Velzen, Maarten Mennes, Nic JA van der Wee, and Andre F Marquand. Evaluating the evidence for biotypes of depression: Methodological replication and extension of. NeuroImage: Clinical, 22:101796, 2019.
      5. Cedric Huchuan Xia, Zongming Ma, Rastko Ciric, Shi Gu, Richard F Betzel, Antonia N Kaczkurkin, Monica E Calkins, Philip A Cook, Angel Garcia de la Garza, Simon N Vandekar, et al. Linked dimensions of psychopathology and connectivity in functional brain networks. Nature communications, 9(1):3003, 2018.
      6. Andrew T Drysdale, Logan Grosenick, Jonathan Downar, Katharine Dunlop, Farrokh Mansouri, Yue Meng, Robert N Fetcho, Benjamin Zebley, Desmond J Oathes, Amit Etkin, et al. Resting-state connectivity biomarkers define neurophysiological subtypes of depression. Nature medicine, 23(1):28, 2017.
      7. Meichen Yu, Kristin A Linn, Russell T Shinohara, Desmond J Oathes, Philip A Cook, Romain Duprat, Tyler M Moore, Maria A Oquendo, Mary L Phillips, Melvin McInnis, et al. Childhood trauma history is linked to abnormal brain connectivity in major depression. Proceedings of the National Academy of Sciences, 116(17):8582–8590, 2019.
      8. David R Hardoon, Sandor Szedmak, and John Shawe-Taylor. Canonical correlation analysis: An overview with application to learning methods. Neural computation, 16(12):2639–2664, 2004.
      9. Katrin H Preller, Joshua B Burt, Jie Lisa Ji, Charles H Schleifer, Brendan D Adkinson, Philipp Stämpfli, Erich Seifritz, Grega Repovs, John H Krystal, John D Murray, et al. Changes in global and thalamic brain connectivity in LSD-induced altered states of consciousness are attributable to the 5-HT2A receptor. eLife, 7:e35082, 2018.
      10. Mark A Geyer and Franz X Vollenweider. Serotonin research: contributions to understanding psychoses. Trends in pharmacological sciences, 29(9):445–453, 2008.
      11. H Y Meltzer, B W Massey, and M Horiguchi. Serotonin receptors as targets for drugs useful to treat psychosis and cognitive impairment in schizophrenia. Current pharmaceutical biotechnology, 13(8):1572–1586, 2012.
      12. Anissa Abi-Dargham, Marc Laruelle, George K Aghajanian, Dennis Charney, and John Krystal. The role of serotonin in the pathophysiology and treatment of schizophrenia. The Journal of neuropsychiatry and clinical neurosciences, 9(1):1–17, 1997.
      13. Francine M Benes and Sabina Berretta. GABAergic interneurons: implications for understanding schizophrenia and bipolar disorder. Neuropsychopharmacology, 25(1):1–27, 2001.
      14. Melis Inan, Timothy J. Petros, and Stewart A. Anderson. Losing your inhibition: Linking cortical GABAergic interneurons to schizophrenia. Neurobiology of Disease, 53:36–48, 2013.
      15. Samuel J Dienel and David A Lewis. Alterations in cortical interneurons and cognitive function in schizophrenia. Neurobiology of disease, 131:104208, 2019.
      16. John E Lisman, Joseph T Coyle, Robert W Green, Daniel C Javitt, Francine M Benes, Stephan Heckers, and Anthony A Grace. Circuit-based framework for understanding neurotransmitter and risk gene interactions in schizophrenia. Trends in neurosciences, 31(5):234–242, 2008.
      17. Anthony A Grace. Dysregulation of the dopamine system in the pathophysiology of schizophrenia and depression. Nature Reviews Neuroscience, 17(8):524, 2016.
      18. John F Enwright III, Zhiguang Huo, Dominique Arion, John P Corradi, George Tseng, and David A Lewis. Transcriptome alterations of prefrontal cortical parvalbumin neurons in schizophrenia. Molecular psychiatry, 23(7): 1606–1613, 2018.
      19. Daniel J Lodge, Margarita M Behrens, and Anthony A Grace. A loss of parvalbumin-containing interneurons is associated with diminished oscillatory activity in an animal model of schizophrenia. Journal of Neuroscience, 29(8): 2344–2354, 2009.
      20. Clare L Beasley and Gavin P Reynolds. Parvalbumin-immunoreactive neurons are reduced in the prefrontal cortex of schizophrenics. Schizophrenia research, 24(3):349–355, 1997.
      21. David A Lewis, Allison A Curley, Jill R Glausier, and David W Volk. Cortical parvalbumin interneurons and cognitive dysfunction in schizophrenia. Trends in neurosciences, 35(1):57–67, 2012.
      22. Alan Anticevic, Margaret S Brumbaugh, Anderson M Winkler, Lauren E Lombardo, Jennifer Barrett, Phillip R Corlett, Hedy Kober, June Gruber, Grega Repovs, Michael W Cole, et al. Global prefrontal and fronto-amygdala dysconnectivity in bipolar I disorder with psychosis history. Biological psychiatry, 73(6):565–573, 2013.
      23. Alex Fornito, Jong Yoon, Andrew Zalesky, Edward T Bullmore, and Cameron S Carter. General and specific functional connectivity disturbances in first-episode schizophrenia during cognitive control performance. Biological psychiatry, 70(1):64–72, 2011.
      24. Avital Hahamy, Vince Calhoun, Godfrey Pearlson, Michal Harel, Nachum Stern, Fanny Attar, Rafael Malach, and Roy Salomon. Save the global: global signal connectivity as a tool for studying clinical populations with functional magnetic resonance imaging. Brain connectivity, 4(6):395–403, 2014.
      25. Michael W Cole, Alan Anticevic, Grega Repovs, and Deanna Barch. Variable global dysconnectivity and individual differences in schizophrenia. Biological psychiatry, 70(1):43–50, 2011.
      26. Naomi R Driesen, Gregory McCarthy, Zubin Bhagwagar, Michael Bloch, Vincent Calhoun, Deepak C D’Souza, Ralitza Gueorguieva, George He, Ramani Ramachandran, Raymond F Suckow, et al. Relationship of resting brain hyperconnectivity and schizophrenia-like symptoms produced by the NMDA receptor antagonist ketamine in humans. Molecular psychiatry, 18(11):1199–1204, 2013.
      27. Neil D Woodward, Baxter Rogers, and Stephan Heckers. Functional resting-state networks are differentially affected in schizophrenia. Schizophrenia research, 130(1-3):86–93, 2011.
      28. Zarrar Shehzad, Clare Kelly, Philip T Reiss, R Cameron Craddock, John W Emerson, Katie McMahon, David A Copland, F Xavier Castellanos, and Michael P Milham. A multivariate distance-based analytic framework for connectome-wide association studies. Neuroimage, 93 Pt 1:74–94, Jun 2014.
      29. Alan J Gelenberg. The catatonic syndrome. The Lancet, 307(7973):1339–1341, 1976.
      30. Jie Lisa Ji, Marjolein Spronk, Kaustubh Kulkarni, Grega Repovš, Alan Anticevic, and Michael W Cole. Mapping the human brain’s cortical-subcortical functional network organization. NeuroImage, 185:35–57, 2019.
      31. August B Hollingshead et al. Four factor index of social status. 1975.
      32. Jaya L Padmanabhan, Neeraj Tandon, Chiara S Haller, Ian T Mathew, Shaun M Eack, Brett A Clementz, Godfrey D Pearlson, John A Sweeney, Carol A Tamminga, and Matcheri S Keshavan. Correlations between brain structure and symptom dimensions of psychosis in schizophrenia, schizoaffective, and psychotic bipolar I disorders. Schizophrenia bulletin, 41(1):154–162, 2015.
    1. Author Response

      Reviewer #1 (Public Review):

      Buglak et al. describe a role for the nuclear envelope protein Sun1 in endothelial mechanotransduction and vascular development. The study provides a full mechanistic investigation of how Sun1 is achieving its function, which supports the concept that nuclear anchoring is important for proper mechanosensing and junctional organization. The experiments have been well designed and were quantified based on independent experiments. The experiments are convincing and of high quality and include Sun1 depletion in endothelial cell cultures, zebrafish, and in endothelial-specific inducible knockouts in mice.

      We thank the reviewer for their enthusiastic comments and for noting our use of multiple model systems.

      Reviewer #2 (Public Review):

      Endothelial cells mediate the growth of the vascular system but they also need to prevent vascular leakage, which involves interactions with neighboring endothelial cells (ECs) through junctional protein complexes. Buglak et al. report that the EC nucleus controls the function of cell-cell junctions through the nuclear envelope-associated proteins SUN1 and Nesprin-1. They argue that SUN1 controls microtubule dynamics and junctional stability through the RhoA activator GEF-H1.

      In my view, this study is interesting and addresses an important but very little-studied question, namely the link between the EC nucleus and cell junctions in the periphery. The study has also made use of different model systems, i.e. genetically modified mice, zebrafish, and cultured endothelial cells, which confirms certain findings and utilizes the specific advantages of each model system. A weakness is that some important controls are missing. In addition, the evidence for the proposed molecular mechanism should be strengthened.

      We thank the reviewer for their interest in our work and for highlighting the relative lack of information regarding connections between the EC nucleus and cell periphery, and for noting our use of multiple model systems. We thank the reviewer for suggesting additional controls and mechanistic support, and we have made the revisions described below.

      Specific comments:

      1) Data showing the efficiency of Sun1 inactivation in the murine endothelial cells is lacking. It would be best to see what is happening on the protein level, but it would already help a great deal if the authors could show a reduction of the transcript in sorted ECs. The excision of a DNA fragment shown in the lung (Fig. 1-suppl. 1C) is not quantitative at all. In addition, the gel has been run way too short so it is impossible to even estimate the size of the DNA fragment.

      We agree that the DNA excision data are not sufficient to demonstrate excision efficiency. We attempted to examine SUN1 protein levels in mutant retinas by immunofluorescence, but to date we have not found a SUN1 antibody that works in mouse retinal explants. Mouse EC isolation protocols enrich for ECs but do not give 100% purity, so RNA analysis of lung tissue also has caveats. Finally, we contend that the consistent vascular phenotype in Sun1iECKO mutant retinas indicates that excision has occurred. To test the efficiency of our excision protocol, we bred Cdh5CreERT2 mice with the ROSAmT/mG excision reporter, in which cells express tdTomato in the absence of Cre activity and GFP upon Cre-mediated excision (Muzumdar et al., 2007). Using the same excision protocol as for the Sun1iECKO mice, we observed a significantly higher level of excision in retinal vessels only in the presence of Cdh5CreERT2 (Reviewer Figure 1).

      Reviewer Figure 1: Cdh5CreERT2 efficiently excises in endothelial cells of the mouse postnatal retina. (A) Representative images of P7 mouse retinas with the indicated genotypes, stained for ERG (white, nucleus). tdTomato (magenta) is expressed in cells that have not undergone Cre-mediated excision, while GFP (green) is expressed in excised cells. Scale bar, 100 μm. (B) Quantification of tdTomato fluorescence relative to GFP fluorescence as shown in A. tdTomato and GFP fluorescence of endothelial cells was measured by creating a mask of the ERG channel. n=3 mice per genotype. ***, p<0.001 by Student’s two-tailed unpaired t-test.
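      The ERG-masked measurement described in panel B can be sketched as follows. This is a minimal illustration, not the authors' analysis pipeline: it assumes each channel is available as a 2D NumPy intensity array, that the ERG mask comes from a simple intensity threshold, and that the readout is the ratio of summed tdTomato to summed GFP signal inside the mask. The function name and threshold are hypothetical.

```python
import numpy as np


def masked_tdtomato_gfp_ratio(erg, tdtomato, gfp, erg_threshold):
    """Hypothetical helper: tdTomato/GFP ratio within an ERG-derived mask.

    Assumes erg, tdtomato and gfp are 2D arrays of the same shape and that
    ERG-positive (endothelial nuclear) pixels are those above erg_threshold.
    """
    erg, tdtomato, gfp = (np.asarray(a, dtype=float) for a in (erg, tdtomato, gfp))
    if not (erg.shape == tdtomato.shape == gfp.shape):
        raise ValueError("all channels must have the same shape")
    mask = erg > erg_threshold  # boolean mask of ERG-positive pixels
    gfp_sum = gfp[mask].sum()
    if gfp_sum == 0:
        raise ValueError("no GFP signal inside the ERG mask")
    # Low values mean most masked signal is GFP, i.e. from Cre-excised cells.
    return float(tdtomato[mask].sum() / gfp_sum)


# Synthetic 2x2 example (made-up values, not data from the study):
erg = [[10, 0], [10, 0]]
tdtomato = [[1, 5], [3, 5]]
gfp = [[4, 9], [4, 9]]
print(masked_tdtomato_gfp_ratio(erg, tdtomato, gfp, erg_threshold=5))  # 0.5
```

      Restricting both channels to the same mask means pixels outside ERG-positive nuclei (here the right-hand column) do not contribute to either sum.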

      2) The authors show an increase in vessel density in the periphery of the growing Sun1 mutant retinal vasculature. It would be important to add staining with a marker labelling EC nuclei (e.g. Erg) because higher vessel density might reflect changes in cell size/shape or number, which has also implications for the appearance of cell-cell junctions. More ECs crowded within a small area are likely to have more complicated junctions. Furthermore, it would be useful and straightforward to assess EC proliferation, which is mentioned later in the experiments with cultured ECs but has not been addressed in the in vivo part.

      We concur that ERG staining is important to show any changes in nuclear shape or cell density in the postnatal retina. We now include these data in Figure 1-figure supplement 1F-G. We do not see obvious changes in nuclear shape or number, though we do observe some crowding in Sun1iECKO retinas, consistent with increased density. However, when normalized to total vessel area, we do not observe a significant difference in nuclear signal density in Sun1iECKO mutant retinas relative to controls.

      3) It appears that the loss of Sun1/sun1b in mice and zebrafish is compatible with major aspects of vascular growth and leads to changes in filopodia dynamics and vascular permeability (during development) without severe and lasting disruption of the EC network. It would be helpful to know whether the loss-of-function mutants can ultimately form a normal vascular network in the retina and trunk, respectively. It might be sufficient to mention this in the text.

      We thank the reviewer for pointing this out. It is true that developmental defects in the vasculature resulting from various genetic mutations are often resolved over time. We have revised the text to discuss the viability of Sun1 global KO mice and the lack of lasting effects in sun1 morphant fish, perhaps resulting from compensation by SUN2, which is partially functionally redundant with SUN1 in vivo (Lei et al., 2009; Zhang et al., 2009) (p. 20).

      4) The only readout after the rescue of the SUN1 knockdown by GEF-H1 depletion is the appearance of VE-cadherin+ junctions (Fig. 6G and H). This is insufficient evidence for a relatively strong conclusion. The authors should at least look at microtubules. They might also want to consider the activation status of RhoA as a good biochemical readout. It is argued that RhoA activity goes up (see Fig. 7C) but there is no data supporting this conclusion. It is also not clear whether "diffuse" GEF-H1 localization translates into increased Rho A activity, as is suggested by the Rho kinase inhibition experiment. GEF-H1 levels in the Western blot in (Fig. 6- supplement 2C) have not been quantitated.

      We agree that analysis of RhoA activity and additional analysis of rescued junctions strengthen our conclusions, so we performed these experiments. New data (Figure 6I-J) show that co-depletion of SUN1 and GEF-H1 rescues junction integrity as measured by biotin-matrix labeling. Interestingly, co-depletion of SUN1 and GEF-H1 does not rescue reduced microtubule density at the periphery (Figure 6-figure supplement 3B-C), placing GEF-H1 downstream of aberrant microtubule dynamics in SUN1-depleted cells. This is consistent with our model (Figure 8), in which loss of SUN1 leads to increased microtubule depolymerization, resulting in release and activation of GEF-H1 that in turn affects actomyosin contractility and junction integrity. In addition, we include images of the junctions in GEF-H1 single KD (Figure 6-figure supplement 3B-C) and quantify the western blot in Figure 6-figure supplement 3A.

      We performed RhoA activity assays and new data shows that SUN1 depletion results in increased RhoA activation, while co-depletion of SUN1 and GEF-H1 ameliorates this increase (Figure 6-figure supplement 2D). This is consistent with our model in which loss of SUN1 leads to increased RhoA activity via release of GEF-H1 from microtubules. In addition, we now cite a recent study describing that GEF-H1 is activated when unbound to microtubules, with this activation resulting in increased RhoA activity (Azoitei et al., 2019).

      5) The criticism raised for the GEF-H1 rescue also applies to the co-depletion of SUN1 and Nesprin-1. This mechanistic aspect is currently somewhat weak and should be strengthened. Again, Rho A activity might be a useful and quantitative biochemical readout.

      We respectfully point out that we showed that co-depletion of nesprin-1 and SUN1 rescues SUN1 knockdown effects via several readouts, including rescue of junction morphology, biotin labeling, microtubule localization at the periphery, and GEF-H1/microtubule localization. We have moved these data to the main figure (Figure 7B-C, E-F) to better highlight these mechanistic findings. These results are consistent with our model that nesprin-1 effects are upstream of GEF-H1 localization. We also added results showing that nesprin-1 knockdown alone does not affect junction integrity, microtubule density, or GEF-H1/microtubule localization (Figure 7-figure supplement 1B-G).

      Reviewer #3 (Public Review):

      Here, Buglak and coauthors describe the effect of Sun1 deficiency on endothelial junctions. Sun1 is a component of the LINC complex, connecting the inner nuclear membrane with the cytoskeleton. The authors show that in the absence of Sun1, the morphology of the endothelial adherens junction protein VE-cadherin is altered, indicative of increased internalization of VE-cadherin. The change in VE-cadherin dynamics correlates with decreased angiogenic sprouting as shown using in vivo and in vitro models. The study would benefit from a stricter presentation of the data and needs additional controls in certain analyses.

      We thank the reviewer for their insightful comments, and in response we have performed the revisions described below.

      1) The authors implicate the changes in VE-cadherin morphology to be of consequence for "barrier function" and mention barrier function frequently throughout the text, for example in the heading on page 12: "SUN1 stabilizes endothelial cell-cell junctions and regulates barrier function". The concept of "barrier" implies the ability of endothelial cells to restrict the passage of molecules and cells across the vessel wall. This is tested only marginally (Suppl Fig 1F) and these data are not quantified. Increased leakage of 10kDa dextran in a P6-7 Sun1-deficient retina as shown here probably reflects the increased immaturity of the Sun1-deficient retinal vasculature. From these data, the authors cannot state that Sun1 regulates the barrier or barrier function (unclear what exactly the authors refer to when they make a distinction between the barrier as such on the one hand and barrier function on the other). The authors can, if they do more experiments, state that loss of Sun1 leads to increased leakage in the early postnatal stages in the retina. However, if they wish to characterize the vascular barrier, there is a wide range of other tissue that should be tested, in the presence and absence of disease. Moreover, a regulatory role for Sun1 would imply that Sun1 normally, possibly through changes in its expression levels, would modulate the barrier properties to allow more or less leakage in different circumstances. However, no such data are shown. The authors would need to go through their paper and remove statements regarding the regulation of the barrier and barrier function since these are conclusions that lack foundation.

      We thank the reviewer for pointing out that the language used regarding the function and integrity of the junctions is confusing, although we suggest that the endothelial cell properties measured by our assays are typically equated with “barrier function” in the literature. However, we have edited our language to precisely describe our results as suggested by the reviewer.

      2) In Fig 6g, the authors show that "depletion of GEF-H1 in endothelial cells that were also depleted for SUN1 rescued the destabilized cell-cell junctions observed with SUN1 KD alone". However, it is quite clear that Sun1 depletion also affects cell shape and cell alignment and this is not rescued by GEF-H1 depletion (Fig 6g). This should be described and commented on. Moreover please show the effects of GEF-H1 alone.

      We thank the reviewer for pointing out the effects on cell shape. SUN1 depletion typically leads to shape changes consistent with elevated contractility, but this is considered to be downstream of the effects quantified here. We updated the panel in Figure 6G to a more representative image showing cell shape rescue by co-depletion of SUN1 and GEF-H1. We present new data panels showing that GEF-H1 depletion alone does not affect junction integrity (Figure 6I-J). We also present new data showing that co-depletion of GEF-H1 and SUN1 does not rescue microtubule density at the periphery (Figure 6-figure supplement 3B-C), consistent with our model that GEF-H1 activation is downstream of microtubule perturbations induced by SUN1 loss.

      3) In Fig. 6a, the authors show rescue of junction morphology in Sun1-depleted cells by deletion of Nesprin1. The effect of Nesprin1 KD alone is missing.

      We thank the reviewer for this comment, and we now include new panels (Figure 7-figure supplement 1B-G) demonstrating that Nesprin-1 depletion does not affect biotin-matrix labeling, peripheral microtubule density, or GEF-H1/microtubule localization absent co-depletion with SUN1. These findings are consistent with our model that Nesprin-1 loss does not affect cell junctions on its own because it is held in a non-functional complex with SUN1 that is not available in the absence of SUN1.

      References

      Azoitei, M. L., Noh, J., Marston, D. J., Roudot, P., Marshall, C. B., Daugird, T. A., Lisanza, S. L., Sandí, M., Ikura, M., Sondek, J., Rottapel, R., Hahn, K. M., & Danuser, G. (2019). Spatiotemporal dynamics of GEF-H1 activation controlled by microtubule- and Src-mediated pathways. Journal of Cell Biology, 218(9), 3077-3097. https://doi.org/10.1083/jcb.201812073

      Denis, K. B., Cabe, J. I., Danielsson, B. E., Tieu, K. V, Mayer, C. R., & Conway, D. E. (2021). The LINC complex is required for endothelial cell adhesion and adaptation to shear stress and cyclic stretch. Molecular Biology of the Cell, mbcE20110698. https://doi.org/10.1091/mbc.E20-11-0698

      King, S. J., Nowak, K., Suryavanshi, N., Holt, I., Shanahan, C. M., & Ridley, A. J. (2014). Nesprin-1 and nesprin-2 regulate endothelial cell shape and migration. Cytoskeleton (Hoboken, N.J.), 71(7), 423–434. https://doi.org/10.1002/cm.21182

      Lei, K., Zhang, X., Ding, X., Guo, X., Chen, M., Zhu, B., Xu, T., Zhuang, Y., Xu, R., & Han, M. (2009). SUN1 and SUN2 play critical but partially redundant roles in anchoring nuclei in skeletal muscle cells in mice. PNAS, 106(25), 10207–10212.

      Muzumdar, M. D., Tasic, B., Miyamichi, K., Li, L., & Luo, L. (2007). A global double-fluorescent Cre reporter mouse. Genesis, 45(9), 593-605. https://doi.org/10.1002/dvg.20335

      Ueda, N., Maekawa, M., Matsui, T. S., Deguchi, S., Takata, T., Katahira, J., Higashiyama, S., & Hieda, M. (2022). Inner Nuclear Membrane Protein, SUN1, is Required for Cytoskeletal Force Generation and Focal Adhesion Maturation. Frontiers in Cell and Developmental Biology, 10, 885859. https://doi.org/10.3389/fcell.2022.885859

      Zhang, X., Lei, K., Yuan, X., Wu, X., Zhuang, Y., Xu, T., Xu, R., & Han, M. (2009). SUN1/2 and Syne/Nesprin-1/2 complexes connect centrosome to the nucleus during neurogenesis and neuronal migration in mice. Neuron, 64(2), 173–187. https://doi.org/10.1016/j.neuron.2009.08.018.

    1. Author Response

      Reviewer #1 (Public Review):

      In Figure 1A, the authors should show TEM images of mock-treated control samples to show the difference between infected and healthy tissue. The data shown in Figure 1B-E indicate that overexpression of GFP-P in N. benthamiana leads to the formation of liquid-like granules. Does this occur during virus infection? Since the authors have infectious clones, can these be used to show that the virally encoded P protein in infected cells does indeed exist as liquid-like granules? If the fusion of GFP to P protein affects its function, the authors could fuse just spGFP11 and co-infiltrate with p35S-spGFP1-10. These experiments would show whether the P protein, when delivered from the virus, does indeed form liquid-like granules in plant cells. The authors should include controls in Figure 1H to show that the interaction between P protein and ER is specific.

      We agree with the reviewer and appreciate the helpful suggestion. As suggested, we added TEM images of mock-treated control barley leaves. We also carried out immunoelectron microscopy to show the presence of BYSMV P protein in the viroplasms. Please see Figure 1–Figure supplement 1.

      BYSMV is a negative-stranded RNA virus that depends strictly on insect vector transmission to infect barley plants. We tried to fuse GFP to BYSMV P in the full-length infectious clone. Unfortunately, we could not rescue BYSMV-GFP-P in barley plants through insect transmission.

      In Figure 1H, we used the plasma membrane (PM)-localized protein LRR84A as a negative control, showing that LRR84A-GS and BYSMV P do not form granules, although they may associate at molecular distances. Therefore, the P granules are formed on, and tethered to, the ER tubules. Please see Figure 1–Figure supplement 4.

      The data shown in Figure 2 do demonstrate that purified P protein can undergo phase separation. Furthermore, it can recruit viral N protein and part of the viral genomic RNA into P protein-induced granules in vitro.

      Because the full-length BYSMV RNA is 12,706 nt long and difficult to transcribe in vitro, we cannot show whether the BYSMV genome is recruited into the droplets. We have softened the claim and now state that the P-N droplets can recruit the 5′ trailer of the BYSMV genome, as shown in Figure 3B. Please see lines 22, 177, and 190.

      Based on the data shown in Figure 4 using phospho-null and phospho-mimetic mutants of P protein, the authors conclude that phosphorylation inhibits P protein phase separation. From these experiments, it is unclear why endogenous NbCK1 fails to phosphorylate GFP-P-WT and inhibit the formation of liquid-like granules, as seen with the GFP-P-S5D mutant. Is this due to overexpression of GFP-P-WT? To overcome this, the authors should perform these experiments as suggested above, using infectious clones and these P protein mutants.

      As is well known, phosphorylation and dephosphorylation are reversible processes in eukaryotic cells. Accordingly, as shown in Figures 5B and 6B, the GFP-PWT protein appears as two bands, P74 and P72, which represent the hyperphosphorylated and hypophosphorylated forms, respectively. Only overexpression of NbCK1 induced a high ratio of P74 to P72 in vivo, which in turn abolished phase separation of BYSMV P.

      In Figure 5, the authors overexpress NbCK1 in N. benthamiana or use an in vitro co-purification scheme to show that NbCK1 inhibits phase separation properties of P protein. These results show that overexpression of both GFP-P and NbCK1 proteins is required to induce liquid-like granules. Does this occur during normal virus infection? During normal virus infection, P protein is produced in the plant cells and the endogenous NbCK1 will regulate the phosphorylation state of P protein. These are reasons for authors to perform some of the experiments using infectious clones. Furthermore, the authors have antibodies to P protein and this could be used to show the level of P protein that is produced during the normal infection process.

      We detected that the P protein exists in two phosphorylated forms in BYSMV-infected barley leaves, and λPPase treatment reduced the P44 phosphorylated form. These results indicate that endogenous CK1 cannot phosphorylate BYSMV P completely.

      Based on the data shown in Figure 6, the authors conclude that phase separated P protein state promotes replication but inhibits transcription by overexpressing P-S5A and P-S5D mutants. To directly show that the NbCK1 controlled phosphorylation state of P regulates this process, authors should knockdown/knockout NbCK1 and see if it increases P protein condensates and promote recruitment of viral proteins and genomic RNA to increase viral replication.

      In our previous studies, BLAST searches showed that the N. benthamiana and barley genomes encode 14 CK1 orthologs, most of which can phosphorylate the SR region of BYSMV P. It is therefore difficult to generate knockdown/knockout lines for all of the CK1 orthologs. Instead, we generated a double point mutant of HvCK1.2 (K38R and D128N) in which kinase activity is abolished. Overexpression of HvCK1.2DN inhibited endogenous CK1-mediated phosphorylation of BYSMV P, indicating that HvCK1.2DN is a dominant-negative mutant.

      It is important to note that both replication and transcription are required for efficient infection by negative-stranded RNA viruses. Consistent with this, our previous studies revealed that both PS5A and PS5D are required for BYSMV infection. Accordingly, expression of HvCK1.2DN from the BYSMV vector inhibits virus infection by impairing the balance of endogenous CK1-mediated phosphorylation of BYSMV P.

      Reviewer #2 (Public Review):

      The manuscript by Fang et al. details the ability of the P protein from Barley yellow striate mosaic virus (BYSMV) to form phase-separated droplets both in vitro and in vivo. The authors demonstrate P droplet formation using recombinant proteins and confocal microscopy, FRAP to demonstrate fluidity, and observed droplet fusion. The authors also used an elaborate split-GFP system to demonstrate that P droplets associate with the tubular ER network. Next, the authors demonstrate that the N protein and a short fragment of viral RNA can also partition into P droplets. Since rhabdovirus P proteins have been shown to phase separate and form "virus factories" (see https://doi.org/10.1038/s41467-017-00102-9), the novelty of this work is the rigorous and conclusive demonstration that the P droplets only exist in the unphosphorylated form. The authors identify 5 critical serine residues in IDR2 of the P protein that, when hyperphosphorylated, prevent droplet formation. Next, the authors conclusively demonstrate that the host kinase CK1 is responsible for P phosphorylation using both transient assays in N. benthamiana and a co-expression assay in E. coli. These findings will likely lead to future studies identifying cellular kinases that affect phase separation of viral and cellular proteins and increase our understanding of the regulation of condensate formation. Next, the authors investigated whether P droplets regulated virus replication and transcription using a minireplicon system. The minireplicon system needs to be better described, as the results were seemingly conflicting. The authors also used a full-length GFP-reporter virus to test whether phase separation was critical for virus fitness in both barley and the insect vector. The authors used 1,6-hexanediol, which broadly suppresses liquid-liquid phase separation, and concluded that phase separation is required for virus fitness (based on reduced virus accumulation with 1,6-HD). However, this conclusion is flawed since 1,6-hexanediol is known to cause cell toxicity and likely created a less favorable environment for virus replication, independent of P protein phase separation. These and other issues are detailed below:

      1. In Figure 3B, the authors display three types of P-N droplets including uniform, N hollow, and P-N hollow droplets. The authors do not state the proportion of droplets observed or any potential significance of the three types. Finally, as "hollow" droplets are not typically observed, is there a possibility that a contaminating protein (not fluorescent) from E. coli is a resident client protein in these droplets? The protein purity was not >95% based on the SDS-PAGE gels presented in the supplementary figures. Do these abnormalities arise from the droplets being imaged in different focal planes? Unless some explanation is given for these observations, this reviewer does not see any significance in the findings pertaining to "hollow" droplets.

      Thanks for your constructive suggestions. As suggested, we removed the "hollow" droplet data. We think that the hollow droplets might be an intermediate form of LLPS. Please see pages 7 and 8 of the revised manuscript.

      1. Pertaining to the sorting of "genomic" RNA into the P-N droplets, it is unlikely that RNA sorting is specific for BYSMV RNA. In other words, if you incubate a non-viral RNA with P-N droplets, is it sorted? The authors' conclusion that genomic RNA is incorporated into droplets is misleading in the sense that a very small fragment of RNA was used. Cy5 can be incorporated into full-length genomic RNAs during in vitro transcription and would be a more suitable approach for the conclusions reached.

      Thanks for your constructive suggestions. Unfortunately, we could not obtain in vitro transcripts of the full-length genomic RNA (12,706 nucleotides). We have softened the claim and now state that the P-N droplets can recruit the 5′ trailer of the BYSMV genome, as shown in Figure 3B. Please see lines 22, 177, and 190.

      According to previous studies (Ivanov et al., 2011), the rhabdovirus P protein binds to nascent N molecules, forming a soluble N/P complex that prevents N from encapsidating cellular RNAs. We therefore suppose that the P-N droplets can specifically incorporate viral genomic RNA.

      Reference: Ivanov I, Yabukarski F, Ruigrok RW, Jamin M. 2011. Structural insights into the rhabdovirus transcription/ replication complex. Virus Research 162:126–137. DOI: https://doi.org/10.1016/j.virusres.2011.09.025

      1. In Figure 4C, it is unclear how the "views" were selected for granule counting. The methods should be better described as this reviewer would find it difficult to select fields of view in an unbiased manner. This is especially true as expression via agroinfiltration can vary between cells in agroinfiltrated regions. The methods described for granule counting and granule sizes are not suitable for publication. These should be expanded (i.e. what ImageJ tools were used?).

      We agree with the reviewer that it is important to select fields of view in an unbiased manner. We selected representative views and now provide larger fields of view in new supplementary figures. In addition, we added a detailed description of the methods in the revision. Please see Figure 4–Figure supplement 1, Figure 5–Figure supplement 1, and Methods (lines 489-498).

      1. In Figure 4F, the authors state that they expected P-S5A to only be present in the pellet fraction since it existed in the condensed state. However, WT P also forms condensates and was not found in the pellet, but rather exclusively in the supernatant. Therefore, the assumption of condensed droplets only being found in the pellet appears to be incorrect.

      Many thanks for pointing this out. This method is based on a previous study (Hubstenberger et al., 2017). The centrifugation method may precipitate large granules more efficiently than small granules. As shown in Figure 4B, GFP-PS5A formed large granules, and therefore GFP-PS5A was found mainly in the pellet. In contrast, GFP-PWT existed only as small granules or in a fusion state, so most GFP-PWT protein remained in the supernatant, with only a small amount in the pellet. These results also indicate increased phase separation activity of GFP-PS5A compared with GFP-PWT. Please see the new Figure 4F.

      Reference: Hubstenberger A, Courel M, Benard M, Souquere S, Ernoult-Lange M, Chouaib R, Yi Z, Morlot JB, Munier A, Fradet M, et al. 2017. P-Body Purification Reveals the Condensation of Repressed mRNA Regulons. Molecular Cell 68(1): 144-157 e145.

      1. The authors conclude that P-S5A has enhanced phase separation based on confocal microscopy data (Fig S6A). The data presented is not convincing. Microscopy alone is difficult for comparing phase separation between two proteins. Quantitative data should be collected in the form of turbidity assays (a common assay for phase separation). If P-S5A has enhanced phase separation compared to WT, then S5A should have increased turbidity (OD600) under identical phase separation conditions. The microscopy data presented was not quantified in any way and the authors could have picked fields of view in a biased manner.

      Thanks for your constructive suggestions. As suggested, we performed turbidity assays, which showed that both GFP-PWT and GFP-PS5A had increased turbidity (OD600) compared with GFP. Please see Figure 4–Figure supplement 3.

      1. The authors constructed minireplicons to determine whether mutant P proteins influence RNA replication using trans N and L proteins. However, this reviewer finds the minireplicon design confusing. How is DsRFP translated from the replicon? If a frameshift mutation was introduced into RsGFP, wouldn't this block DsRFP translation as well? Or is start/stop transcription used? Second, the use of the 2x35S promoter makes it difficult to differentiate between 35S-driven transcription and replication by L. How do you know the increased DsRFP observed with P5A is not due to increased transcription from the 35S promoter? The RT-qPCR data is also very confusing. It is not clear that panel D is only examining the transcription of RFP (I assume via start/stop transcription) whereas panel C is targeting the minireplicon.

      Thank you for your questions, and we apologize for the lack of clarity regarding the minireplicon vectors. We have updated Figure supplement 14 to show replication and transcription of the BYSMV minireplicon, a derivative of a negative-stranded RNA virus. In addition, we inserted an A after the start codon to abolish translation of the GFP mRNA, which allowed us to observe phase separation of GFP-PWT, GFP-PS5A, and GFP-PS5D during virus replication. Using this system, we aimed to show the localization and phase separation of GFP-PWT, GFP-PS5A, and GFP-PS5D during replication and transcription of BYS-agMR. Please see Figure 6–Figure supplement 1.

      1. Pertaining to the replication assay in Fig. 6, transcription of RFP mRNA was reduced by S5A and increased by S5D. However, the RFP translation (via Panel A microscopy) is reversed. How do you explain increased RFP mRNA transcription by S5D but very low RFP fluorescence? The data between Panels A, C, and D do not support one another.

      Many thanks for pointing this out! We also noticed these interesting results, which were reproduced in independent repeats. As shown in the illustration of the BYSMV-agMR system in Figure 6–Figure supplement 1, the relative transcriptional activities of the different GFP-P mutants were calculated as the RFP transcript levels normalized to the gMR replication template (RFP mRNA/gMR), because replicating minigenomes serve as templates for viral transcription.

      Because GFP-PS5D supported less replication, the RFP mRNA/gMR ratio increased even though the RFP mRNA level in GFP-PS5D samples did not increase. In addition, GFP-PS5D formed far fewer foci than GFP-PWT and GFP-PS5A, suggesting that mRNAs in GFP-PS5D samples may include aberrant transcripts that cannot be translated into RFP protein. In contrast, mRNAs in GFP-PS5A samples are translated efficiently. These results are consistent with our previous studies using free PWT, PS5A, and PS5D.

      Reference: Gao Q, et al. 2020. Casein kinase 1 regulates cytorhabdovirus replication and transcription by phosphorylating a phosphoprotein serine-rich motif. The Plant Cell 32(9): 2878-2897.
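      The normalization described above can be illustrated arithmetically. In this minimal sketch the levels are invented for illustration (they are not data from the study) and the helper name `relative_transcription` is hypothetical; it shows that the RFP mRNA/gMR ratio doubles when the replicating template halves, even though the absolute RFP mRNA level is unchanged.

```python
def relative_transcription(rfp_mrna: float, gmr: float) -> float:
    """Relative transcriptional activity = RFP mRNA level / gMR (replication template) level."""
    if gmr <= 0:
        raise ValueError("gMR level must be positive")
    return rfp_mrna / gmr


# Hypothetical RT-qPCR levels relative to a wild-type sample; NOT data from the study.
wild_type = relative_transcription(rfp_mrna=1.0, gmr=1.0)
# Same RFP mRNA, but only half as much replicating template:
reduced_replication = relative_transcription(rfp_mrna=1.0, gmr=0.5)
print(wild_type, reduced_replication)  # 1.0 2.0
```

      A high ratio therefore reflects transcription per template, not the total amount of translatable RFP mRNA, which is why it can diverge from RFP fluorescence.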

      1. The authors relied on 1,6-hexanediol to suppress phase separation in both insect vectors and barley. However, the authors disregarded several publications demonstrating cellular toxicity of 1,6-hexanediol and a report that 1,6-HD impairs kinase and phosphatase activities (doi: 10.1016/j.jbc.2021.100260).

      We agree with the reviewer that 1,6-hexanediol induces cellular toxicity. We have therefore removed these results; their removal does not affect the main conclusions of our study.

      1. The authors state that reduced accumulation of BYSMV-GFP in insects and barley under HEX treatment "indicate that phase separation is important for cross-kingdom infection of BYSMV in insect vectors and host plants." The above statement is confounded by many factors, the most obvious being that HEX treatment is most likely toxic to cells and as a result cannot support efficient virus accumulation. Also, since HEX treatment interferes with phosphorylation (see REF above) its use here should be avoided since P phase separation is regulated by phosphorylation.

      We agree with the reviewer that 1,6-hexanediol induces cellular toxicity and thereby affects infection by BYSMV and other viruses. In addition, 1,6-hexanediol inhibits LLPS of cellular membraneless organelles, such as P-bodies, stress granules, Cajal bodies, and the nucleolus, which can also affect virus infection directly or indirectly. We have therefore removed these results; their removal does not affect the main conclusions of our study.

      Reviewer #3 (Public Review):

      Membrane-less organelles formed through liquid-liquid phase separation (LLPS) provide spatiotemporal control of host immune responses and other cellular processes. Because viruses are obligate pathogens that proliferate in host cells, their RNAs and proteins are likely to be targeted by immune-related membrane-less organelles. To infect and proliferate successfully, viruses need to efficiently suppress the immune function of these organelles. Moreover, viruses also generate exogenous membrane-less organelles/RNA granules to facilitate their proliferation. Accordingly, host cells also need to target and suppress the functions of these virus-generated membrane-less organelles/RNA granules, but the underlying mechanisms remain unclear.

      In this study, Fang et al. investigated how a plant kinase confers resistance against viruses by modulating the phosphorylation and phase separation of the BYSMV P protein. They first characterized the phase separation of the BYSMV P protein and found that droplets formed by P protein recruit viral RNA and other viral proteins in vivo. Phosphorylation of its intrinsically disordered region inhibits the phase separation of P protein. Together with their previous study, this work demonstrates that host casein kinase 1 (CK1) decreases the phase separation of P protein by increasing its phosphorylation. Finally, the authors claim that phase separation of P protein facilitates BYSMV replication but decreases its transcription. Taken together, this study uncovers a molecular mechanism by which plants regulate viral proliferation by limiting the formation of exogenous RNA granules/membraneless organelles. Overall, this paper tells an interesting story about host immunity targeting viruses by modulating the dynamics of exogenous membraneless organelles, and it uncovers the modulation of viral protein phase separation by a host protein, a hotspot in plant immunity research. The writing is logical.

      Thanks for your positive comment on our studies.

    1. Author Response:

      Reviewer #1 (Public Review):

      Here the authors use a variety of sophisticated approaches to assess the contribution of synaptic parameters to dendritic integration across neuronal maturation. They provide high-quality data identifying cellular parameters that underlie differences in AMPAR-mediated synaptic currents measured between adolescent and adult cerebellar stellate cells, and conclude that differences are attributed to an increase in the complexity of the dendritic arbor. This conclusion relies primarily on the ability of a previously described model for adult stellate cells to recapitulate the age-dependent changes in EPSCs by a change in dendritic branching with no change in synapse density. These rigorous results have implications for understanding how changing structure during neuronal development affects integration of AMPAR-mediated synaptic responses.

      The data showing that younger SCs have smaller dendritic arbors but similar synapse density is well-documented and provides compelling evidence that these structural changes affect dendritic integration. But the main conclusion also relies on the assumption that the biophysical model built for adult SCs applies to adolescent SCs, and there are additional relevant variables related to synaptic function that have not been fully assessed. Thus, the main conclusions would be strengthened and broadened by additional experimental validation.

      We thank the reviewer for the positive assessment of the quality and importance of our manuscript. Below we address the reviewer’s comments directly, but we would like to stress that the goal of the manuscript was to understand the cellular mechanisms underlying developmental slowing of mEPSCs in SCs and the consequent implications for developmental changes in dendritic integration, which have rarely been examined to date, and not to establish a detailed biophysical model of cerebellar SCs. The latter would require dual-electrode recordings (one on 0.5 μm dendrites), a detailed description of the expression and dendritic localization of the gap junction protein connexin 36 (as done in Szoboszlay et al., Neuron 2016), and a detailed description of parameter variability across the SC population (e.g., variations in AMPAR content at synapses, Rm, and dendritic morphology). Such experiments are well beyond the scope of the manuscript. Here we use biophysical simulations to support conclusions derived from specific experiments, more as a proof of principle than as a strict quantitative prediction.

      Nevertheless, we would like to clarify our selection of parameters for the biophysical models for immature and adult SCs. We did not simply “assume” that the biophysical models were the same at the two developmental stages. We used either evidence from the literature or our own measured parameters to establish an immature SC model. Compared with adult SCs, we found that immature SCs had 1) an identical membrane time constant, 2) only a slightly larger dendrite diameter, 3) decreased dendritic branching and maximum lengths, 4) a comparable synapse density, and 5) a homogeneous synapse distribution. Taken together, we concluded that increased dendritic branching during SC maturation resulted in a larger fraction of synapses at longer electrotonic distances in adult SCs. These experimental findings were incorporated into two distinct biophysical models representing immature and adult SCs. Evidence from the literature suggests that voltage-gated channel expression is not altered between the two developmental stages studied here. Therefore, as in the adult SC model, we considered only the passive membrane properties and the dendritic morphology. The simulation results supported our conclusion that the increased apparent dendritic filtering of mEPSCs resulted from a change in the distribution of synapse distances to the soma rather than from cable properties. Some of the measured parameters (e.g., membrane time constant) were not clearly stated in the manuscript, which we have corrected in the revised manuscript.
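      Why an identical membrane time constant is compatible with a smaller immature cell can be seen from a minimal isopotential sketch, assuming a uniform specific membrane resistance Rm and capacitance Cm: total resistance scales as Rm/A and total capacitance as Cm·A, so their product τm = Rm·Cm does not depend on membrane area A. The values below (Rm = 20 kΩ·cm², Cm = 1 µF/cm², the two areas) and the function name are hypothetical and are not the parameters used in the study.

```python
def passive_membrane(area_cm2: float, rm_ohm_cm2: float, cm_uf_per_cm2: float = 1.0):
    """Return (total resistance in ohm, total capacitance in F, tau in s)
    for an isopotential passive membrane with uniform specific properties."""
    r_total = rm_ohm_cm2 / area_cm2              # larger area -> more parallel leak -> lower R
    c_total = cm_uf_per_cm2 * 1e-6 * area_cm2    # capacitance grows with area (uF -> F)
    tau = r_total * c_total                      # equals Rm * Cm, independent of area
    return r_total, c_total, tau


# Hypothetical parameters: same Rm and Cm, membrane area differing twofold.
small_cell = passive_membrane(area_cm2=1e-5, rm_ohm_cm2=20_000.0)
large_cell = passive_membrane(area_cm2=2e-5, rm_ohm_cm2=20_000.0)
print(small_cell[2], large_cell[2])  # both approximately 0.02 s (20 ms)
```

      The smaller membrane has a larger total resistance and a smaller total capacitance but the same τm. A real dendritic arbor is not isopotential, but for a uniform passive membrane with sealed ends the slowest voltage-decay time constant is still Rm·Cm.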

      We are not sure what the reviewer meant by suggesting that we did not examine “other relevant variables related to synaptic function.” Later, the reviewer refers to alterations in AMPAR subunit composition or changes in cleft glutamate concentration (low-affinity AMPAR antagonist experiments). We performed experiments to directly examine both possible contributions, by comparing qEPSC kinetics and by performing low-affinity antagonist experiments, respectively, but found that neither mechanism could account for the developmental slowing of mEPSCs. We therefore did not further explore possible developmental changes in AMPAR subunits. See below for a more specific response and above for newly added text.

      While many exciting questions could be examined in the future, we do not think the present study requires additional experiments. Nevertheless, we recognize that perhaps we can improve the description of the results to justify our conclusions better (see specifics below).

      Reviewer #2 (Public Review):

      This manuscript investigates the cellular mechanisms underlying the maturation of synaptic integration in molecular layer interneurons in the cerebellar cortex. The authors use an impressive combination of techniques to address this question: patch-clamp recordings, 2-photon and electron microscopy, and compartmental modelling. The study builds conceptually and technically on previous work by these authors (Abrahamsson et al. 2012) and extends the principles described in that paper to investigate how developmental changes in dendritic morphology, synapse distribution and strength combine to determine the impact of synaptic inputs at the soma.

      1) Models are constructed to confirm the interpretation of experimental results, mostly repeating the simulations from Abrahamsson et al. (2012) using 3D reconstructed morphologies. The results are as expected from cable theory, given the (passive) model assumptions. While this confirmation is welcome and important, it is disappointing to see the opportunity missed to explore the implications of the experimental findings in greater detail. For instance, with the observed distributions of synapses, are there more segregated subunits available for computation in adult vs immature neurons?

      As described in our response to reviewer 1, this manuscript aims to identify the cellular mechanisms accounting for the developmental slowing of mEPSCs and its implications for dendritic integration. The modeling was designed to support the most plausible explanation, namely that increased branching results in more synapses at longer electrotonic distances. This finding is novel and merits more in-depth examination at a computational level in future studies.

      Quantifying dendritic segregation is non-trivial due to dendritic nonlinearities and the difficulties in setting criteria for electrical “isolation” of inputs. However, because the space constant does not change with development, while both dendrite length and branching increase, it is rather logical to conclude qualitatively that the number of computational segments increases with development.

      We have added the following sentence to the Discussion (line 579):

      “Moreover, since the space constant does not change significantly with development and the dendritic tree complexity increases, the number of computational segments is expected to increase with development.”
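      The qualitative argument above rests on standard cable theory: the space constant λ = sqrt(Rm·d / (4·Ri)) grows only with the square root of the diameter d, whereas the electrotonic length L = ℓ/λ of a branch grows linearly with its physical length ℓ. The sketch below uses hypothetical Rm and Ri values and the confocal diameter estimates given in this response (0.47 μm immature, 0.41 μm adult); it shows that this diameter difference changes λ by only about 7%. Treating L as a rough proxy for the number of independent "computational segments" is our simplifying assumption, not a quantity computed in the study, and the function names are hypothetical.

```python
import math


def space_constant_um(rm_ohm_cm2: float, ri_ohm_cm: float, diameter_um: float) -> float:
    """Cable space constant lambda = sqrt(Rm * d / (4 * Ri)), returned in micrometres."""
    d_cm = diameter_um * 1e-4
    lam_cm = math.sqrt(rm_ohm_cm2 * d_cm / (4.0 * ri_ohm_cm))
    return lam_cm * 1e4


def electrotonic_length(physical_length_um: float, lam_um: float) -> float:
    """Dimensionless electrotonic length L = l / lambda."""
    return physical_length_um / lam_um


# Hypothetical passive parameters (not the values used in the study).
RM, RI = 20_000.0, 150.0  # ohm*cm^2, ohm*cm
lam_immature = space_constant_um(RM, RI, 0.47)
lam_adult = space_constant_um(RM, RI, 0.41)
ratio = lam_adult / lam_immature  # equals sqrt(0.41 / 0.47), roughly 0.93

# Longer dendrites at a near-constant lambda span more electrotonic length.
branch_lengths_um = [50.0, 100.0, 200.0]  # hypothetical path lengths
L_adult = [electrotonic_length(l, lam_adult) for l in branch_lengths_um]
print(round(lam_immature), round(lam_adult), [round(x, 2) for x in L_adult])
```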

      How do SCs respond at different developmental stages with in vivo-like patterns of input, rather than isolated activation of synapses? Answering these sorts of questions would provide quantitative support for the conclusion that computational properties evolve with development.

      While this is indeed a vital question, the in vivo patterns of synaptic activity are not known, so it is difficult to devise experiments to arrive at definitive conclusions.

      2) From a technical perspective, the modeling appears to be well-executed, though more methodological detail is required for it to be reproducible. The AMPA receptor model and reversal potential are unspecified, as is the procedure for fitting the kinetics to data.

      We did not use an explicit channel model to generate synaptic conductances. We simply used the default multiexponential function of NEURON (single-exponential rise and single-exponential decay) and adjusted the parameters tauRise and tauDecay such that simulated EPSCs matched the somatic quantal EPSC amplitude, rise time and τdecay (Figure 4).

      We added the following text to the methods (line 708):

      “The peak and kinetics of the AMPAR-mediated synaptic conductance waveforms (gsyn) were set to simulate qEPSCs that matched the amplitude and kinetics of experimental somatic quantal EPSCs and evoked EPSCs. Immature quantal gsyn had a peak amplitude of 0.00175 μS, a 10-90 % RT of 0.0748 ms and a half-width of 0.36 ms (NEURON synaptic conductance parameters Tau0 = 0.073 ms, Tau1 = 0.26 ms and Gmax = 0.004 μS), while mature quantal gsyn had a peak amplitude of 0.00133 μS, a 10-90 % RT of 0.072 ms and a half-width of 0.341 ms (NEURON synaptic conductance parameters Tau0 = 0.072 ms, Tau1 = 0.24 ms and Gmax = 0.0032 μS). For all simulations, the reversal potential was set to 0 mV and the holding membrane potential to −70 mV. Experimental somatic PPR for EPSCs was reproduced with a gsyn 2/gsyn 1 of 2.25.”
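      The quoted parameters can be cross-checked with a short sketch. It assumes, as an interpretation not stated in the text, that the conductance is the unnormalized difference of exponentials g(t) = Gmax·(exp(−t/Tau1) − exp(−t/Tau0)), with Tau0 setting the rise and Tau1 the decay. Under that assumption the stated Gmax, Tau0 and Tau1 reproduce the reported peak amplitudes (0.00175 and 0.00133 μS, to within about 1%) and half-widths (0.36 and 0.341 ms). NEURON's built-in Exp2Syn rescales its waveform so that the peak equals the connection weight, so this unnormalized form should not be read as Exp2Syn. Function names are hypothetical.

```python
import math


def dual_exp_conductance(t_ms: float, gmax_us: float, tau0_ms: float, tau1_ms: float) -> float:
    """ASSUMED unnormalized waveform gmax * (exp(-t/tau1) - exp(-t/tau0)), in uS.
    tau0 sets the rise and tau1 (> tau0) the decay."""
    if t_ms < 0:
        return 0.0
    return gmax_us * (math.exp(-t_ms / tau1_ms) - math.exp(-t_ms / tau0_ms))


def peak_and_half_width(gmax_us, tau0_ms, tau1_ms, dt_ms=1e-4, t_end_ms=3.0):
    """Sample the waveform on a fine grid; return (peak in uS, half-width in ms)."""
    n = int(t_end_ms / dt_ms)
    g = [dual_exp_conductance(i * dt_ms, gmax_us, tau0_ms, tau1_ms) for i in range(n + 1)]
    peak = max(g)
    above_half = [i for i, v in enumerate(g) if v >= peak / 2.0]  # contiguous: waveform is unimodal
    half_width = (above_half[-1] - above_half[0]) * dt_ms
    return peak, half_width


# Parameters quoted in the Methods text above.
immature = peak_and_half_width(gmax_us=0.004, tau0_ms=0.073, tau1_ms=0.26)
mature = peak_and_half_width(gmax_us=0.0032, tau0_ms=0.072, tau1_ms=0.24)
print(immature)  # peak ~0.00175 uS, half-width ~0.36 ms
print(mature)    # peak ~0.00134 uS, half-width ~0.34 ms
```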

      Were simulations performed at resting potential, and if yes, what was the value?

      The membrane potential was set at −70 mV to match that of experimental recordings; this value has been added to the Methods section.

      How was the quality of the morphological reconstructions assessed? Accurate measurement of dendritic diameters is crucial to the simulations in this study, so providing additional morphometrics would be helpful for assessing the results. Will the models and morphologies be deposited in ModelDB or similar?

      For the two reconstructions imported into NEURON for simulations, we manually curated the dendritic diameters to verify that the estimated diameters matched the fluorescence image, using NeuronStudio, which uses a robust subpixel estimation algorithm (Rayburst diameter, Rodriguez et al. 2008). The reconstructions include all variations in diameter throughout the dendritic tree (see, as an example, the reconstruction shown in the image below for the immature SC presented in Figure 2D). The mean diameter across the entire dendritic tree of the reconstructed immature and adult SCs was 0.42 and 0.36 μm, respectively, similar to the ratio of diameters estimated using confocal microscopy.

      We have updated the methods section to include how reconstructions were curated and analyzed (line 693).

      “An immature (P16) and an adult SC (P42) were patch-loaded with 30 μM Alexa 594 in the pipette and imaged using 2PLSM. Both cells were reconstructed in 3D using NeuronStudio in a semiautomatic mode, which uses a robust subpixel estimation algorithm (calculation of Rayburst diameter (Rodriguez et al., 2008)). We manually curated the diameters to verify that they matched the fluorescence image, to faithfully account for all variations in diameter throughout the dendritic tree. The mean diameter across the entire dendritic tree of the reconstructed immature and adult SCs was 0.42 and 0.36 μm, respectively. The 16% smaller diameter in the adult SC was similar to the 13% obtained from confocal image analysis of many SCs (see Figure 2B).”

      We agree with the reviewer that accurate measurements of dendritic diameters are crucial for the simulations. We did not rely solely on the reconstructed SCs; we also performed high-resolution confocal microscopy analysis of 16 different dye-filled SCs. We examined differences between immature and adult SCs in the FWHM of intensity line profiles drawn perpendicular to the dendrite. The FWHM is a good approximation of dendritic diameter and was measured as previously for adult SCs (Abrahamsson et al., 2012) to allow direct assessment of possible developmental differences. We confirmed that 98% of the estimated diameters are larger than the imaging resolution (0.27 μm). Using this approach, we observed only a small developmental difference in the mean FWHM (0.41 μm in adult vs. 0.47 μm in immature SCs, a 13% reduction). Because dendritic filtering is similar for diameters ranging from 0.3 to 0.6 μm (Figure 4G and 4H, Abrahamsson et al. 2012), we concluded that developmental changes in dendritic diameter cannot account for developmental differences in mEPSC time course.

      We added the following text to the methods (line 777):

      “The imaging resolution within the molecular layer was estimated from the width of intensity line profiles of SC axons. The FWHM was 0.30 ± 0.01 μm (n = 57 measurements over 16 axons), with a mean of 0.27 ± 0.01 μm (n = 16) when only the thinnest section of each axon was taken into account. Only 2% of all dendritic measurements are less than 270 nm, suggesting that dendritic diameter estimation is hardly affected by the resolution of our microscope.”
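      The FWHM measurement described above can be sketched as follows: take an intensity profile drawn perpendicular to the dendrite (or axon), find the half-maximum level above baseline, and linearly interpolate the two crossings. The baseline definition (profile minimum), the interpolation scheme, the function name and the synthetic Gaussian profile are assumptions for illustration; the analysis software used in the study is not specified here. For a Gaussian profile the FWHM equals 2·sqrt(2 ln 2)·σ ≈ 2.355σ, which provides a built-in check.

```python
import math


def fwhm(positions, intensities):
    """Full width at half maximum of a single-peaked intensity line profile.

    The baseline is taken as the profile minimum (an assumption); the two
    half-maximum crossings are found by linear interpolation between samples.
    """
    if len(positions) != len(intensities) or len(intensities) < 3:
        raise ValueError("need matching position/intensity lists with >= 3 samples")
    base = min(intensities)
    peak = max(range(len(intensities)), key=lambda i: intensities[i])
    half = base + (intensities[peak] - base) / 2.0

    # Walk outward from the peak while samples stay at or above half maximum.
    left = peak
    while left > 0 and intensities[left - 1] >= half:
        left -= 1
    right = peak
    while right < len(intensities) - 1 and intensities[right + 1] >= half:
        right += 1
    if left == 0 or right == len(intensities) - 1:
        raise ValueError("profile does not drop below half maximum on both sides")

    def crossing(i_below, i_above):
        # Position where the straight line between the two samples equals `half`.
        x0, y0 = positions[i_below], intensities[i_below]
        x1, y1 = positions[i_above], intensities[i_above]
        return x0 + (half - y0) * (x1 - x0) / (y1 - y0)

    return crossing(right + 1, right) - crossing(left - 1, left)


# Synthetic Gaussian profile (hypothetical blur; sigma in micrometres).
sigma = 0.1
xs = [i * 0.005 - 1.0 for i in range(401)]
ys = [math.exp(-x * x / (2.0 * sigma ** 2)) for x in xs]
width = fwhm(xs, ys)
expected = 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma  # about 0.2355 um
print(width, expected)
```

      A measured FWHM approximates the true diameter only when the structure is wider than the optical resolution, which is the point of the axon-based resolution estimate quoted above.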

      Regarding additional morphometrics:

      1) We added two panels (H and I) to Figure 6 showing the number of primary dendrites and branch points for immature and adult SCs, using the same estimation criteria as Myoga et al., 2009. We have updated the Results section (line 389): “Thus, the larger number of puncta located further from the soma in adult SCs is not due to increased puncta density with distance, but to larger dendritic lengths (Figure 6E and 6F) and many more distal dendritic branches (Figure 6G, Sholl analysis) due to a larger number of branch points (Figure 6H), but not a larger number of primary dendrites (Figure 6I). The similarity between the shapes of the synapse (Figure 6B) and dendritic segment (Figure 6C) distributions was captured by a similarity in their skewness (0.38 and 0.32 for the two immature distributions and −0.10 and −0.08 for the two adult distributions). These data demonstrate that increased dendritic complexity during SC maturation is responsible for a prominent shift toward distal synapses in adult SCs.”

      2) As suggested by the reviewer, we estimated dendritic width as a function of branch order and observed a small reduction in the diameter of dendritic segments with distance from the soma, within a range (0.35 to 0.6 μm) that does not significantly alter dendritic filtering: there is a tendency for more distal segments to have smaller diameters.

      3) We also show the variability in dendritic diameter within single SCs and between different SCs, which can be very large. These results have been added to Figure 2B. See also point one below in response to “comment to authors.”
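      The skewness values quoted in point 1 summarize where along the dendrites most synapses or segments lie: a positive value means most of the mass sits proximally with a tail toward distal distances, whereas values near zero or slightly negative indicate relatively more distal weight. The text does not state which estimator was used; this sketch assumes the common Fisher-Pearson moment coefficient g1 = m3/m2^(3/2) without small-sample correction, and the distance lists are hypothetical, not data from the study.

```python
def skewness(values):
    """Fisher-Pearson moment coefficient g1 = m3 / m2**1.5
    (population moments, no small-sample bias correction)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


# Hypothetical distances from the soma (um); NOT data from the study.
symmetric = [10, 20, 30, 40, 50]
proximal_heavy = [10, 12, 14, 15, 18, 60]  # most mass near the soma, long distal tail
print(skewness(symmetric), skewness(proximal_heavy))
```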

      We will upload the two SC reconstructions to ModelDB.

      3) The Discussion should justify the assumption of AMPA-only synapses in the model (by citing available experimental data) as well as the limitations of this assumption in the case of different spatiotemporal patterns of parallel fiber activation.

      NMDARs are extrasynaptic in immature and adult SCs and therefore do not contribute to postsynaptic strength in response to low-frequency synaptic activation. We therefore did not consider their contribution to synaptic integration in this study. Please see also our detailed response to the reviewer’s point 4. We have updated the Results accordingly.

      4) What is the likely influence of gap junction coupling between SCs on the results presented here, and on synaptic integration in SCs more generally - and how does it change during development? This should also be discussed.

      Please see a detailed response to Editor’s point 2. In brief, all recordings were performed without perturbing gap junction coupling between cells, which has been shown to affect axial resistance and membrane capacitance in other cell types (Szoboszlay et al., 2016). While our simulations do not explicitly include gap junctions, their effect on passive membrane properties is implicitly included because we matched the simulated membrane time constant to experimental values. Moreover, gap junctions are more prominent in cerebellar basket cells than in SCs in both P18–P21 animals (Rieubland 2015) and adult mice (Hoehne et al., 2020). Ultimately, the impact of gap junctions also depends on their distance from the activated synapses (Szoboszlay et al., 2016). Unfortunately, the distribution and conductance of gap junctions in SCs are not known at this time. We therefore did not explicitly consider gap junctions in this study.

      Nevertheless, we have added a section in the Discussion (line 552):

      “We cannot rule out that developmental changes in gap junction expression could contribute to the maturation of SC dendritic integration, since they are thought to contribute to the axial resistivity and capacitance of neurons (Szoboszlay et al., 2016). All the recordings were made with gap junctions intact, including for membrane time constant measurements. However, their expression in SCs is likely to be lower than their basket cell counterparts (Hoehne et al., 2020; Rieubland et al., 2014).”

      5) All experiments and all simulations in the manuscript were done in voltage clamp (the Methods section should give further details, including the series resistance). What is the significance of the key results of the manuscript on synapse distribution and branching pattern of postsynaptic dendrites in immature and adult SCs for the typical mode of synaptic integration in vivo, i.e. in current clamp? What is their significance for neuronal output, considering that SCs are spontaneously active?

      It should be noted that not all simulations were done in voltage-clamp, see figure 8.

      Nevertheless, we have given additional details about the following experimental and simulation parameters:

      1) Description of the whole-cell voltage-clamp procedure.

      2) Series resistance values of experiments and used for simulations.

      Initial simulations with the idealized SC model were performed with an Rs of 20 MOhm. In the reconstructed model, Rs was set at 16 MOhm to match more precisely the experimental values obtained for the mEPSC experiments. We verified that there was no statistical difference in Rs between immature and adult recordings.

      Reviewer #3 (Public Review):

      1) Although the authors were thorough in their efforts to find the mechanism underlying the differences in the young and adult SC synaptic event time course, the authors should consider the possibility of inherently different glutamate receptors, either by alterations in the subunit composition or by an additional modulatory subunit. The literature actually suggests that this might be the case, as several publications described altered AMPA receptor properties (not just density) during development in stellate cells (Bureau, Mulle 2004; Sun, Liu 2007; Liu, Cull-Candy 2002). The authors need to address these possibilities, as modulatory subunits are known to alter receptor kinetics and conductance as well.

      Properties of synaptic AMPARs in SCs are known to change during development and in an activity-dependent manner. EPSCs in immature SCs have been shown to be mediated by calcium-permeable AMPARs, predominantly containing GluR3 subunits associated with TARP γ2 and γ7 (Soto et al. 2007; Bats et al., 2012). During development, GluR2 subunits are inserted into synaptic AMPARs in an activity-dependent manner (Liu et al., 2000), affecting the receptors’ calcium permeability (Liu et al., 2002). However, these developmental changes do not appear to affect EPSC kinetics (Liu et al., 2002) and have very little impact on AMPAR conductance (Soto et al., 2007). When we compared qEPSC kinetics for somatic synapses between immature and adult SCs, we did not observe changes in EPSC decay. In light of this observation, and consistent with the studies cited above, we concluded that changes in AMPAR composition could not account for the developmental changes in mEPSC kinetics.

      We have modified the manuscript to make this point clearer (see section starting line 332) :

      “This reduction in synaptic conductance could be due to a reduction in the number of synaptic AMPARs activated and/or a developmental change in AMPAR subunits. SC synaptic AMPARs are composed of GluA2 and GluA3 subunits associated with TARP γ2 and γ7 (Bats et al., 2012; Liu and Cull-Candy, 2000; Soto et al., 2007; Yamazaki et al., 2015). During development, GluR2 subunits are inserted into synaptic AMPARs in an activity-dependent manner (Liu and Cull-Candy, 2002), affecting the receptors’ calcium permeability (Liu and Cull-Candy, 2000). However, these developmental changes have little impact on AMPAR conductance (Soto et al., 2007), nor do they appear to affect EPSC kinetics (Liu and Cull-Candy, 2002); the latter is consistent with our findings. Therefore, the developmental reduction in postsynaptic strength most likely results from fewer AMPARs activated by the release of glutamate from the fusion of a single vesicle.”

      The authors correctly identify the relationship between local dendritic resistance and the reduction of driving force, but they assume the same relationship for young SCs in their model. This assumption is not supported by recordings, and several publications describe differences in input impedance between young and adult cells (Schmidt-Hieber, Bischofberger 2007).

      The input resistance of the dendrite will indeed determine local depolarization and loss of driving force. However, its impact on dendritic integration depends on its precise value, and perhaps the reviewer thought that we “assumed” the input resistance to be the same in immature and adult SCs. This was not the case, and we have now clarified this in the manuscript. We performed three important measurements that support a loss of driving force in immature SCs (for reference, the input resistance of an infinite cable is R_N = sqrt(Rm·Ri/2) / (2π·r^(3/2)), where r is the dendrite radius):

      1) The input resistance is inversely proportional to r^(3/2), and we measured the dendritic diameter to be only slightly larger in immature SCs (0.47 versus 0.41 μm). This result is described in Figure 2.

      2) We measured the membrane time constant, which is the product of the total membrane resistance and the total membrane capacitance (equivalently, total capacitance divided by total membrane conductance). The values at the two ages were similar, implying a slightly larger membrane resistance that compensates for the smaller total membrane capacitance of immature SCs. This was explicitly accounted for when performing simulations with reconstructed immature and adult SCs (Figures 2, 7 and 8) by adjusting the specific membrane resistance until the simulated membrane time constant matched experimental values. These values were not clearly stated previously and are now included on line 233 in the Results and line 704 in the Methods.

      3) We directly compared paired-pulse facilitation of synapses onto immature SC dendrites with that of somatic synapses. We previously showed in adult SCs that sublinear summation of synaptic responses, due to loss of synaptic current driving force (Tran-Van-Minh et al. 2016), manifests as decreased facilitation for dendritic synapses (Abrahamsson et al. 2012). Figure 8A shows that dendritic facilitation was indeed smaller than somatic facilitation. We have now modified Figure 8 to include simulations showing that the biophysical model reproduces this difference in short-term plasticity (Figure 8B).

      Together, we believe these measurements support the presence of similar sublinear summation mechanisms in immature SCs.
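      The reasoning in points 1–3 can be made concrete with a minimal steady-state sketch. It evaluates the infinite-cable input resistance quoted above and then the synaptic current I = g·E/(1 + g·R) at a single site with local input resistance R, taking E = 70 mV as the driving force implied by the 0 mV reversal potential and −70 mV holding potential used in the simulations. Because the local depolarization reduces the driving force more for the larger second conductance, the current ratio falls below the conductance ratio g2/g1 = 2.25 as R increases. Rm, Ri, the radii (half the quoted diameters), the function names and the steady-state single-site treatment are simplifying assumptions; real quantal EPSCs are brief, and the compartmental simulations in the study capture the time course this sketch ignores.

```python
import math


def input_resistance_infinite_cable(rm_ohm_cm2, ri_ohm_cm, radius_cm):
    """Input resistance (ohm) of an infinite passive cable:
    R_N = sqrt(Rm * Ri / 2) / (2 * pi * r**1.5)."""
    return math.sqrt(rm_ohm_cm2 * ri_ohm_cm / 2.0) / (2.0 * math.pi * radius_cm ** 1.5)


def synaptic_current(g_siemens, r_local_ohm, driving_force_volts=0.070):
    """Steady-state current through conductance g at a site with local input
    resistance R. The local depolarization I*R reduces the driving force,
    so I = g*(E - I*R), i.e. I = g*E / (1 + g*R)."""
    return g_siemens * driving_force_volts / (1.0 + g_siemens * r_local_ohm)


def paired_pulse_ratio(g1_siemens, conductance_ratio, r_local_ohm):
    """Ratio of second to first synaptic current for a given conductance ratio g2/g1."""
    i1 = synaptic_current(g1_siemens, r_local_ohm)
    i2 = synaptic_current(g1_siemens * conductance_ratio, r_local_ohm)
    return i2 / i1


# Hypothetical passive parameters; radii are half of the diameters quoted above.
RM, RI = 20_000.0, 150.0  # ohm*cm^2, ohm*cm
r_immature = input_resistance_infinite_cable(RM, RI, 0.235e-4)
r_adult = input_resistance_infinite_cable(RM, RI, 0.205e-4)

g1 = 1.33e-9  # S, i.e. about 0.00133 uS quantal peak conductance
ppr_soma = paired_pulse_ratio(g1, 2.25, 0.0)                      # no local depolarization
ppr_dendrite_immature = paired_pulse_ratio(g1, 2.25, r_immature)  # thin immature dendrite
print(f"R_N immature {r_immature / 1e9:.2f} GOhm, adult {r_adult / 1e9:.2f} GOhm")
print(f"PPR soma {ppr_soma:.2f}, immature dendrite {ppr_dendrite_immature:.2f}")
```

      In this simplified picture, sublinear summation, and hence reduced dendritic facilitation, follows from the product g·R rather than from the diameter alone, which is why the measured diameter, membrane properties and paired-pulse data are all needed to constrain it.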

      2) The authors use extracellular stimulation of parallel fibers. The authors note that, due to the orientation of the PFs and the slicing angle, they can restrict the spatial extent of the stimuli. However, this method does not guarantee that the stimulated fibers will all connect to the same dendritic branch. Whether two stimulated synapses connect to the same dendrite or not can heavily influence summation. This is an especially great concern for these cells, as the Sholl analysis showed that young and adult SCs have different amounts of distal dendrites. Therefore, if the stimulated axons connect to several different neighboring dendrites, as opposed to one or two in young SCs, then the model calculations and the conclusions about the summation rules may be erroneous.

      We selected isolated dendrites and delivered voltage stimuli using small-diameter glass electrodes (~1 μm) at 10–15 V above threshold to stimulate single dendrites. In brain slices from adult mice, this procedure excites GC axons within 10 μm of the electrode tip (Figure 2C, Tran-Van-Minh et al. 2016) and produces large dendritic depolarizations that are sufficient to decrease synaptic current driving force (Figure 1, Tran-Van-Minh et al. 2016). When we reproduced the conductance ratio using uncaging on single dendrites, we observed paired-pulse facilitation in the dendrites, suggesting that electrical stimulation activated synapses on common dendritic branches, or at least within a short enough electrotonic distance to cause large dendritic depolarizations (Figure 7, Abrahamsson et al. 2012). Finally, we expect that the decreased branching in immature SCs further ensures that most recorded synapses contact a common dendritic segment. We cannot rule out that some synaptic responses recorded at the soma occasionally arise from synapses on different dendritic branches, but we do not see how this would alter our results or change our principal conclusions, particularly since this possible error only affects the interpretation of how many synapses are activated in paired-pulse experiments. Most of the conclusions arise from the stimulation of single vesicle release events, and given the strikingly perpendicular orientation of GC axons, a 10 μm error in synapse location along a dendrite when stimulating in the outer third would not alter our interpretation of the data.

    1. eLife Assessment

      This important work provides mechanistic insights into the development of cardiac arrhythmia and establishes a new experimental use case for optogenetics in studying cardiac electrophysiology. The agreement between computational models and experimental observations provides a convincing level of evidence that wave train-induced pacemaker activity can originate in continuously depolarized tissue, with the limitation that there may be differences between depolarization arising from constant optogenetic stimulation, as opposed to pathophysiological tissue depolarization. Future experiments in vivo and in other tissue preparations would extend the generality of these findings.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Teplenin and coworkers assesses the combined effects of localized depolarization and excitatory electrical stimulation in myocardial monolayers. They study the electrophysiological behaviour of cultured neonatal rat ventricular cardiomyocytes expressing the light-gated cation channel Cheriff, allowing them to induce local depolarization of varying area and amplitude, the latter titrated by the applied light intensity. In addition, they used computational modeling to screen for critical parameters determining state transitions and to dissect the underlying mechanisms. Two stable states, thus bistability, could be induced upon local depolarization and electrical stimulation, one state characterized by a constant membrane voltage and a second, spontaneously firing, thus oscillatory state. The resulting 'state' of the monolayer was dependent on the duration and frequency of electrical stimuli, as well as the size of the illuminated area and the applied light intensity, determining the degree of depolarization as well as the steepness of the local voltage gradient. In addition to the induction of oscillatory behaviour, they also tested frequency-dependent termination of induced oscillations.

      Strengths:

      The data from optogenetic experiments and computational modelling provide quantitative insights into the parameter space determining the induction of spontaneous excitation in the monolayer. The most important findings can also be reproduced using a strongly reduced computational model, suggesting that the observed phenomena might be more generally applicable.

      Weaknesses:

      While the study is thoroughly performed and provides interesting mechanistic insights into scenarios of ventricular arrhythmogenesis in the presence of localized depolarized tissue areas, the translational perspective of the study remains relatively vague. In addition, the chosen theoretical approach and the way the data is presented might make it difficult for the wider community of cardiac researchers to understand the significance of the study.

      Comments on Revision:

      The provided revisions address some of the raised concerns, but they do not change my general assessment of the paper, including its strengths and weaknesses.

    3. Reviewer #2 (Public review):

      In the presented manuscript, Teplenin and colleagues use both electrical pacing and optogenetic stimulation to create a reproducible, controllable source of ectopy in cardiomyocyte monolayers. To accomplish this, they use a careful calibration of electrical pacing characteristics (i.e., frequency, number of pulses) and illumination characteristics (i.e., light intensity, surface area) to show that there exists a "sweet spot" where oscillatory excitations can emerge proximal to the optogenetically depolarized region following electrical pacing cessation, akin to pacemaker cells. Furthermore, the authors demonstrate that a high-frequency electrical wave-train can be used to terminate these oscillatory excitations. The authors observed this oscillatory phenomenon both in vitro (using neonatal rat ventricular cardiomyocyte monolayers) and in silico (using a computational action potential model of the same cell type). These are surprising findings and provide a novel approach for studying triggered activity in cardiac tissue.

      The study is extremely thorough and one of the more memorable and grounded applications of cardiac optogenetics in the past decade. One of the benefits of the authors' "two-prong" approach of experimental preps and computational models is that they could probe the number of potential variable combinations much deeper than through in vitro experiments alone. The strong similarities between the real-life and computational findings suggest that these oscillatory excitations are consistent, reproducible, and controllable.

      Triggered activity, which can lead to ventricular arrhythmias and sudden cardiac death, has been largely attributed to sub-cellular phenomena, such as early or delayed afterdepolarizations, and thus to date has largely been studied in isolated single cardiomyocytes. However, these findings have been difficult to translate to tissue- and organ-scale experiments, as well-coupled cardiac tissue has notably different electrical properties. This underscores the significance of the study's methodological advances: the use of a constant depolarizing current in a subset of (illuminated) cells to reliably result in triggered activity could facilitate the more consistent evaluation of triggered activity at various scales. An experimental prep that is both repeatable and controllable (i.e., both initiated and terminated through the same means) is a boon for further inquiry.

      The authors also substantially explored phase space and single-cell analyses to document how this "hidden" bi-stable phenomenon can be uncovered during emergent collective tissue behavior. Calibration and testing of different aspects (e.g., light intensity, illuminated surface area, electrical pulse frequency, electrical pulse count) and other deeper analyses, as illustrated in Figures S3-S8 and Video S1, are significant and commendable.

      Given that the study is computational, it is surprising that the authors did not replicate their findings using well-validated adult ventricular cardiomyocyte action potential models, such as ten Tusscher 2006 or O'Hara 2011. This may have felt out of scope, given the nice alignment of rat cardiomyocyte data between in vitro and in silico experiments. However, it would have been helpful peace-of-mind validation, given the significant ionic current differences between neonatal rat and adult ventricular tissue. It is not fully clear whether the pulse trains could have resulted in the same bi-stable oscillatory behavior, given the longer APD of humans relative to rats. The observed phenomenon certainly would be frequency-dependent and would have required tedious calibration for a new cell type, albeit partially mitigated by the relative ease of in silico experiments.

      There are likely also mechanistic differences between this optogenetically-tied oscillatory behavior and triggered activity observed in other studies. This is because the constant light-elicited depolarizing current is disrupting the typical resting cardiomyocyte state, thereby altering the balance between depolarizing ionic currents (such as Na+ and Ca2+) and repolarizing ionic currents (such as K+ and Ca2+). The oscillatory excitations appear to later emerge at the border of the illuminated region and non-stimulated surrounding tissue, which is likely an area of high source-sink mismatch. The authors appear to acknowledge differences between this oscillatory behavior and previous sub-cellular triggered activity research in their discussion of ectopic pacemaker activity, which is canonically observed in genetic, pharmacologic, or pathological ionic conditions. Regardless, it is exciting to see new ground being broken in this difficult-to-characterize experimental space, even if the method illustrated here may not necessarily be broadly applicable.

      Comments on revisions:

      I have read the authors' rebuttal to our earlier comments and do not have any further questions or comments. Thank you for implementing the minor improvements to Figure visualizations and for creating Video S1 to accompany the article.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Teplenin and coworkers assesses the combined effects of localized depolarization and excitatory electrical stimulation in myocardial monolayers. They study the electrophysiological behaviour of cultured neonatal rat ventricular cardiomyocytes expressing the light-gated cation channel Cheriff, allowing them to induce local depolarization of varying area and amplitude, the latter titrated by the applied light intensity. In addition, they used computational modeling to screen for critical parameters determining state transitions and to dissect the underlying mechanisms. Two stable states, thus bistability, could be induced upon local depolarization and electrical stimulation, one state characterized by a constant membrane voltage and a second, spontaneously firing, thus oscillatory state. The resulting 'state' of the monolayer was dependent on the duration and frequency of electrical stimuli, as well as the size of the illuminated area and the applied light intensity, determining the degree of depolarization as well as the steepness of the local voltage gradient. In addition to the induction of oscillatory behaviour, they also tested frequency-dependent termination of induced oscillations.

      Strengths:

      The data from optogenetic experiments and computational modelling provide quantitative insights into the parameter space determining the induction of spontaneous excitation in the monolayer. The most important findings can also be reproduced using a strongly reduced computational model, suggesting that the observed phenomena might be more generally applicable.

      Weaknesses:

      While the study is thoroughly performed and provides interesting mechanistic insights into scenarios of ventricular arrhythmogenesis in the presence of localized depolarized tissue areas, the translational perspective of the study remains relatively vague. In addition, the chosen theoretical approach and the way the data are presented might make it difficult for the wider community of cardiac researchers to understand the significance of the study.

      Reviewer #2 (Public review):

      In the presented manuscript, Teplenin and colleagues use both electrical pacing and optogenetic stimulation to create a reproducible, controllable source of ectopy in cardiomyocyte monolayers. To accomplish this, they use a careful calibration of electrical pacing characteristics (i.e., frequency, number of pulses) and illumination characteristics (i.e., light intensity, surface area) to show that there exists a "sweet spot" where oscillatory excitations can emerge proximal to the optogenetically depolarized region following electrical pacing cessation, akin to pacemaker cells. Furthermore, the authors demonstrate that a high-frequency electrical wave-train can be used to terminate these oscillatory excitations. The authors observed this oscillatory phenomenon both in vitro (using neonatal rat ventricular cardiomyocyte monolayers) and in silico (using a computational action potential model of the same cell type). These are surprising findings and provide a novel approach for studying triggered activity in cardiac tissue.

      The study is extremely thorough and one of the more memorable and grounded applications of cardiac optogenetics in the past decade. One of the benefits of the authors' "two-prong" approach of experimental preps and computational models is that they could probe the number of potential variable combinations much deeper than through in vitro experiments alone. The strong similarities between the real-life and computational findings suggest that these oscillatory excitations are consistent, reproducible, and controllable.

      Triggered activity, which can lead to ventricular arrhythmias and sudden cardiac death, has been largely attributed to sub-cellular phenomena, such as early or delayed afterdepolarizations, and thus to date has largely been studied in isolated single cardiomyocytes. However, these findings have been difficult to translate to tissue- and organ-scale experiments, as well-coupled cardiac tissue has notably different electrical properties. This underscores the significance of the study's methodological advances: the use of a constant depolarizing current in a subset of (illuminated) cells to reliably result in triggered activity could facilitate the more consistent evaluation of triggered activity at various scales. An experimental prep that is both repeatable and controllable (i.e., both initiated and terminated through the same means) is a boon for further inquiry.

      The authors also substantially explored phase space and single-cell analyses to document how this "hidden" bi-stable phenomenon can be uncovered during emergent collective tissue behavior. Calibration and testing of different aspects (e.g., light intensity, illuminated surface area, electrical pulse frequency, electrical pulse count) and other deeper analyses, as illustrated in Appendix 2, Figures 3-8, are significant and commendable.

      Given that the study is computational, it is surprising that the authors did not replicate their findings using well-validated adult ventricular cardiomyocyte action potential models, such as ten Tusscher 2006 or O'Hara 2011. This may have felt out of scope, given the nice alignment of rat cardiomyocyte data between in vitro and in silico experiments. However, it would have been helpful peace-of-mind validation, given the significant ionic current differences between neonatal rat and adult ventricular tissue. It is not fully clear whether the pulse trains could have resulted in the same bi-stable oscillatory behavior, given the longer APD of humans relative to rats. The observed phenomenon certainly would be frequency-dependent and would have required tedious calibration for a new cell type, albeit partially mitigated by the relative ease of in silico experiments.

      For all its strengths, there are likely significant mechanistic differences between this optogenetically tied oscillatory behavior and triggered activity observed in other studies. This is because the constant light-elicited depolarizing current is disrupting the typical resting cardiomyocyte state, thereby altering the balance between depolarizing ionic currents (such as Na+ and Ca2+) and repolarizing ionic currents (such as K+ and Ca2+). The oscillatory excitations appear to later emerge at the border of the illuminated region and non-stimulated surrounding tissue, which is likely an area of high source-sink mismatch. The authors appear to acknowledge differences between this oscillatory behavior and previous sub-cellular triggered activity research in their discussion of ectopic pacemaker activity, which is canonically expected more so from genetic or pathological conditions. Regardless, it is exciting to see new ground being broken in this difficult-to-characterize experimental space, even if the method illustrated here may not necessarily be broadly applicable.

      We thank the reviewers for their thoughtful and constructive feedback, as well as for recognizing the conceptual and technical strengths of our work. We are especially pleased that our integrated use of optogenetics, electrical pacing, and computational modelling was seen as a rigorous and innovative approach to investigating spontaneous excitability in cardiac tissue.

      At the core of our study was the decision to focus exclusively on neonatal rat ventricular cardiomyocytes. This ensured a tightly controlled and consistent environment across experimental and computational settings, allowing for direct comparison and deeper mechanistic insight. While extending our findings to adult or human cardiomyocytes would enhance translational relevance, such efforts are complicated by the distinct ionic properties and action potential dynamics of these cells, as also noted by Reviewer #2. For this foundational study, we chose to prioritize depth and clarity over breadth.

      Our computational domain was designed to faithfully reflect the experimental system. The strong agreement between both domains is encouraging and supports the robustness of our framework. Some degree of theoretical abstraction was necessary, and at times it makes the text harder to read, but it reflects the intrinsic complexity of the collective behaviours we aimed to capture, such as emergent bi-stability. To make these ideas more accessible, we included simplified illustrations, a reduced model, and extensive supplementary material.

      A key insight from our work is the emergence of oscillatory behaviour through interaction of illuminated and non-illuminated regions. Rather than replicating classical sub-cellular triggered activity, this behaviour arises from systems-level dynamics shaped by the imposed depolarizing current and surrounding electrotonic environment. By tuning illumination and local pacing parameters, we could reproducibly induce and suppress these oscillations, thereby providing a controllable platform to study ectopy as a manifestation of spatial heterogeneity and collective dynamics.

      Altogether, our aim was to build a clear and versatile model system for investigating how spatial structure and pacing influence the conditions under which bistability becomes apparent in cardiac tissue. We believe this platform lays strong groundwork for future extensions into more physiologically and clinically relevant contexts.

      In revising the manuscript, we carefully addressed all points raised by the reviewers. We have also responded to each of their specific comments in detail, which are provided below.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the authors):

      Please find my specific comments and suggestions below:

      (1) Line 64: When first introduced, the concept of 'emergent bi-stability' may not be clear to the reader.

      We concur that the full breadth of the concept of emergent bi-stability may not be immediately clear upon first mention. Nonetheless, its components have been introduced separately: “emergent” was linked to multicellular behaviour in line 63, while “bi-stability” was described in detail in lines 39–56. We therefore believe that readers can form an intuitive understanding of the combined term, which is further clarified as the manuscript develops. To further ease the reader's comprehension, we have added the following clarification to line 64:

      “Within this dynamic system of cardiomyocytes, we investigated emergent bi-stability (a concept that will be explained more thoroughly later on) in cell monolayers under the influence of spatial depolarization patterns.”

      (2) Lines 67-80: While the introduction until line 66 is extremely well written, the introduction of both cardiac arrhythmia and cardiac optogenetics could be improved. It is especially surprising that miniSOG is first mentioned as a tool for optogenetic depolarisation of cardiomyocytes, as the authors would probably agree that Channelrhodopsins are by far the most commonly applied tools for optogenetic depolarisation (please also refer to the literature by others in this respect). In addition, miniSOG has side effects other than depolarisation, and thus cannot be the tool of choice when not directly studying the effects of oxidative stress or damage.

      The reviewer is absolutely correct in noting that channelrhodopsins are the most commonly applied tools for optogenetic depolarisation. We introduced miniSOG primarily for historical context: the effects of specific depolarization patterns on collective pacemaker activity were first observed with this tool (Teplenin et al., 2018). In that paper, we also reported ultralong action potentials, occurring as a side effect of cumulative miniSOG-induced ROS damage. In the following paragraph (starting at line 81), we emphasize that membrane potential can be controlled much better using channelrhodopsins, which is why we employed them in the present study.

      (3) Line 78: I appreciate the concept of 'high curvature', but please always state which parameter(s) you are referring to (membrane voltage in space/time, etc?).

      We corrected our statement to specify that it refers to the spatial curvature of the depolarised region:

      “In such a system, it was previously observed that spatiotemporal illumination can give rise to collective behaviour and ectopic waves (Teplenin et al. (2018)) originating from illuminated/depolarised regions (with high spatial curvature).”

      (4) Line 79: 'bi-stable state' - not yet properly introduced in this context.

      The bi-stability mentioned here refers back to single cell bistability introduced in Teplenin et al. (2018), which we cited again for clarity.

      “These waves resulted from the interplay between the diffusion current and the single cell bi-stable state (Teplenin et al. (2018)) that was induced in the illuminated region.”

      (5) Lines 84-85: 'these ion channels allow the cells to respond' - please describe the channel used; and please correct: the channels respond to light, not the cells. Re-ordering this paragraph may help, because first you introduce channels for depolarization, then you go back to both de- and hyperpolarization. On the same note, which channels can be used for hyperpolarization of cardiomyocytes? I am not aware of any; even WiChR shows depolarizing effects in cardiomyocytes during prolonged activation (Vierock et al. 2022). Please delete: 'through a direct pathway' (Channelrhodopsins are directly light-gated channels; there are no pathways involved).

      We realised that the confusion arose from our use of incorrect terminology: we mistakenly wrote hyperpolarisation instead of repolarisation. In addition to channelrhodopsins such as WiChR, other tools can also induce a repolarising effect, including light-activatable chloride pumps (e.g., JAWS). However, since repolarisation is not relevant to our manuscript, we removed its mention to improve clarity (see below). Regarding the reported depolarising effects of WiChR in Vierock et al. (2022), we speculate that these may arise either from the specific phenotype of the cardiomyocytes used in that study, i.e. human induced pluripotent stem cell-derived atrial myocytes (aCMs), or from the particular ionic conditions applied during patch-clamp recordings (e.g., a bath solution containing 1 mM KCl). Notably, even after prolonged WiChR activation, the aCMs maintained a strongly negative maximum diastolic potential of approximately –55 mV.

      “Although effects of illuminating miniSOG with light might lead to formation of depolarised areas, it is difficult to control the process precisely since it depolarises cardiomyocytes indirectly. Therefore, in this manuscript, we used light-sensitive ion channels to obtain more refined control over cardiomyocyte depolarisation. These ion channels allow the cells to respond to specific wavelengths of light, facilitating direct depolarisation (Ördög et al. (2021, 2023)). By inducing cardiomyocyte depolarisation only in the illuminated areas, optogenetics enables precise spatiotemporal control of cardiac excitability, an attribute we exploit in this manuscript (Appendix 2 Figure 1).”

      (6) Figure 1: What would be the y-axis of the 'energy-like curves' in B? What exactly did you plot here?

      The graphs in Figure 1B are schematic representations intended to clarify the phenomenon for the reader. They do not depict actual data from any simulation or experiment. We clarified this misunderstanding by specifying that Figure 1B is a schematic representation of the effects at play in this paper.

      “(B) Schematic representation showing how light intensity influences collective behaviour of excitable systems, transitioning between a stationary state (STA) at low illumination intensities and an oscillatory state (OSC) at high illumination intensities. Bi-stability occurs at intermediate light intensities, where transitions between states are dependent on periodic wave train properties. TR. OSC, transient oscillations.”

      To expand slightly beyond the paper: our schematic representation was inspired by a common visualization in dynamical systems used to illustrate bi-stability (for an example, see Fig. 3 in Schleimer, J. H., Hesse, J., Contreras, S. A., & Schreiber, S. (2021). Firing statistics in the bistable regime of neurons with homoclinic spike generation. Physical Review E, 103(1), 012407.). In this framework, the y-axis can indeed be interpreted as an energy landscape, which is related to a probability measure through the Boltzmann distribution: $p \propto e^{-E/(k_B T)}$, i.e. $E = -k_B T \ln p$ up to an additive constant. Here, $p$ denotes the probability of occupying a particular state (STA or OSC). This probability can be estimated from the area (BCL × number of pulses) falling within each state, as shown in Fig. 4C. Since an attractor corresponds to a high-probability state, it naturally appears as a potential well in the landscape.
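      As a rough illustration of the Boltzmann relation above, the sketch below turns a grid of simulated outcomes into energy levels. It assumes that every grid cell in a (BCL × number of pulses) slice carries equal weight, so that p is the fraction of cells ending in each state, and it reports E in units of k_B T. The grid labels and the helper name energy_landscape are hypothetical and not taken from the manuscript.

```python
import math
from collections import Counter


def energy_landscape(state_grid):
    """Hypothetical helper: estimate E(state) = -ln p(state) from a grid of outcomes.

    state_grid: rows of state labels (e.g. "STA", "OSC") sampled over a
    (BCL x number of pulses) slice. Each cell is assumed to carry equal weight,
    so p(state) is simply the fraction of cells with that label.
    Energies are in units of k_B*T and defined only up to an additive constant.
    """
    counts = Counter(label for row in state_grid for label in row)
    total = sum(counts.values())
    return {label: -math.log(n / total) for label, n in counts.items()}


# Hypothetical 3 x 3 slice: rows = number of pulses, columns = BCL values.
grid = [
    ["STA", "STA", "OSC"],
    ["STA", "OSC", "OSC"],
    ["STA", "STA", "STA"],
]
for state, energy in sorted(energy_landscape(grid).items()):
    print(f"{state}: E = {energy:.3f} k_B T")
```

      The more frequently occupied state receives the lower energy, which is why an attractor appears as a well in such a landscape.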

      (7) Lines 92-93: 'this transition resulted for the interaction of an illuminated region with depolarized CM and an external wave train' - please consider rephrasing (it is not the region interacting with depolarized CM; and the external wave train could be explained more clearly).

      We rephrased our unclear sentence as follows:

      “This transition resulted from the interaction of depolarized cardiomyocytes in an illuminated region with an external wave train not originating from within the illuminated region.”

      (8) Figure 2 and elsewhere: When mentioning 'frequency', please state frequency values and not cycle lengths. Please also reconsider your distinction between high and low frequencies; 200 ms (5 Hz) is actually the normal heart rate for neonatal rats (300 bpm).

      In the revised version, we have clarified frequency values explicitly and included them alongside period values wherever frequency is mentioned, to avoid any ambiguity. We also emphasize that our use of "high" and "low" frequency is strictly a relative distinction within the context of our data, and not meant to imply a biological interpretation.
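      The conversion between stimulation period and frequency requested here is simple arithmetic: f (Hz) = 1000 / period (ms), and beats per minute = 60 × f. The sketch below applies it to the periods quoted in this exchange; the function names are illustrative only.

```python
def period_ms_to_hz(period_ms: float) -> float:
    """Convert a cycle length in milliseconds to a frequency in hertz."""
    if period_ms <= 0:
        raise ValueError("period must be positive")
    return 1000.0 / period_ms


def hz_to_bpm(freq_hz: float) -> float:
    """Convert a frequency in hertz to beats per minute."""
    return 60.0 * freq_hz


# Periods (ms) that appear in the reviews and responses.
for period in (200, 250, 500, 600, 710, 750, 800):
    f = period_ms_to_hz(period)
    print(f"{period} ms -> {f:.2f} Hz ({hz_to_bpm(f):.0f} bpm)")
# The first line printed is "200 ms -> 5.00 Hz (300 bpm)".
```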

      (9) Lines 129-131: Why not record optical maps? Voltage dynamics in the transition zone between depolarised and non-depolarised regions might be especially interesting to look at?

      We would like to clarify that optical maps were recorded for every experiment, and all experimental traces of cardiac monolayer activity were derived from these maps. We agree with the reviewer that the voltage dynamics in the transition zone are particularly interesting. However, we selected the data representations that, in our view, best highlight the main mechanisms. When we analysed full voltage profiles, they did not provide additional insight into this main mechanism. As the other reviewer noted, the manuscript already presents a wide range of regimes, so we decided not to introduce further complexity.

      (10) Lines 156-157: Why was the model not adapted to match the biophysical properties (e.g., kinetics, ion selectivity, light sensitivity) of Cheriff?

      The model was not adapted to the biophysical properties of Cheriff because this would entail a whole new study involving extensive patch-clamp experiments, fitting, and calibration to model the correct properties of the ion channel. Beyond considerations of time efficiency, incorporating more specific modelling parameters would not change the essence of our findings: while numeric parameter ranges might shift, the core results would remain unchanged. This follows from our experimental design, in which we applied constant illumination of long duration (6 s or longer), making differences in the kinetic properties of optogenetic tools irrelevant. In addition, we observed qualitatively similar phenomena using many other depolarising optogenetic tools (e.g. ChR2, ReaChR, CatCh and more) in our in vitro experiments. We ultimately chose Cheriff as our optogenetic tool for the practical reasons of good light sensitivity and a spectrum that does not overlap with our fluorescent dyes.

      Therefore, the use of a more general depolarising ion channel in the computational model hints at the broader applicability of the observed phenomena, supporting our claim of a universal mechanism (demonstrated experimentally with CheRiff and computationally with ChR2).
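      The argument that channel kinetics become irrelevant under constant illumination lasting 6 s or longer can be illustrated with a deliberately simplified model. The sketch assumes a hypothetical first-order light-gated channel whose open fraction relaxes as dO/dt = (O_inf - O) / tau, with activation time constants chosen for illustration. It ignores desensitization and every other real CheRiff or ChR2 detail, so it is not a model of either channel. It only shows that, for millisecond-scale time constants, the open fraction has reached its plateau long before 6 s.

```python
import math


def open_fraction(t_s: float, tau_s: float, o_inf: float = 1.0, o0: float = 0.0) -> float:
    """Open fraction of a hypothetical first-order light-gated channel.

    Analytical solution of dO/dt = (o_inf - O) / tau for constant illumination
    switched on at t = 0 with initial open fraction o0.
    """
    return o_inf + (o0 - o_inf) * math.exp(-t_s / tau_s)


ILLUMINATION_S = 6.0  # minimum constant-illumination duration stated in the response
for tau_ms in (1.0, 10.0, 100.0):  # assumed activation time constants, for illustration only
    o = open_fraction(ILLUMINATION_S, tau_ms / 1000.0)
    print(f"tau = {tau_ms:5.1f} ms -> open fraction after 6 s = {o:.6f}")
```

      Under these assumptions only the plateau value, not the rate of approach, shapes the sustained depolarizing current during a long light pulse.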

      (11) Line 158: 1.7124 mW/mm^2 - While I understand that this is the specific intensity used as input in the model, I am not convinced that the model can predict behaviour with such precision at this specific intensity (four decimal places), especially given that the model has not been adapted to Cheriff (probably more light-sensitive than ChR2). Can this be rephrased?

      We did not aim for quantitative correspondence between the computational model and the biological experiments, but rather for qualitative agreement and mechanistic insight (see line 157). Qualitative agreement was obtained computationally over a whole range of intensities, as demonstrated in the 3D diagram of Fig. 4C. We wanted to demonstrate that at one fixed light intensity (1.7124 mW/mm^2, chosen because it gave the clearest effect), all three states (STA, OSC, TR. OSC.) can coexist depending on the number of pulses and their period. The specific intensity used in the computational model is therefore correct, and for reproducibility we have left it unchanged while clarifying that it refers specifically to the in silico model:

      “Simulating at a fixed constant illumination of 1.7124 mW/mm² and a fixed number of 4 pulses, frequency dependency of collective bi-stability was reproduced in Figure 4A.”

      (12) Lines 160, 165, and elsewhere: 'Once again, Once more' - please delete or rephrase.

      We agree that these transitional phrases could be improved and have reformulated them as follows:

      “Similar to the experimental observations, only intermediate electrical pacing frequencies (500-ms period) caused transitions from collective stationary behaviour to collective oscillatory behaviour, and ectopic pacemaker activity had periods (710 ms) that were different from the stimulation train period (500 ms). Figure 4B shows the accumulation of pulses necessary to invoke a transition from the collective stationary state to the collective oscillatory state at a fixed stimulation period (600 ms). Also in the in silico simulations, ectopic pacemaker activity had periods (750 ms) that were different from the stimulation train period (600 ms). Also for the transient oscillatory state, the simulations show frequency selectivity (Appendix 2 Figure 4B).”

      (13) Line 171: 'illumination strength': please refer to 'light intensity'.

      We have revised our formulation to now refer specifically to “light intensity”:

      “We previously identified three important parameters influencing such transitions: light intensity, number of pulses, and frequency of pulses.”

      (14) Lines 187-188: 'the illuminated region settles into this period of sending out pulses' - please rephrase, the meaning is not clear.

      We reformulated our sentence to make its content more clear to the reader:

      “For the conditions that resulted in stable oscillations, the green vertical lines in the middle and right slices represent the natural pacemaker frequency in the oscillatory state. After the transition from the stationary towards the oscillatory state, oscillatory pulses emerging from the illuminated region gradually dampen and stabilize at this period, corresponding to the natural pacemaker frequency.”
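      One way to make "stabilize at this period" operational is to check whether the last few inter-beat intervals of the ectopic activity agree within a tolerance. The sketch below does this for a list of activation times. The window size, the 10 ms tolerance, the helper name settled_period and the example intervals are all hypothetical choices, not the procedure used in the manuscript.

```python
from itertools import accumulate
from statistics import median


def settled_period(activation_times_ms, window=5, tol_ms=10.0):
    """Return the settled inter-beat interval in ms, or None if not yet stable.

    The last `window` intervals are compared with their median; if all lie
    within `tol_ms` of it, that median is taken as the pacemaker period.
    """
    times = list(activation_times_ms)
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    if len(intervals) < window:
        return None
    tail = intervals[-window:]
    period = median(tail)
    return period if all(abs(x - period) <= tol_ms for x in tail) else None


# Hypothetical ectopic beats whose intervals lengthen and then settle.
example_intervals = [600, 650, 680, 700, 705, 708, 710, 710, 710, 710, 710]
example_times = list(accumulate([0] + example_intervals))
print(settled_period(example_times))  # prints 710
```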

      (15) Figure 7: A) Please state in the legend which parameter is plotted on the y-axis (it is included in the main text, but should be provided here as well); C) The numbers provided in brackets are confusing. Why is (4) a high pulse number and (3) a low pulse number? Why not just state the number of pulses and add alpha, beta, gamma, and delta for the panels in brackets? I suggest providing the parameters (e.g., 800 ms cycle length, 2 pulses, etc.) for all combinations, but not rating them as low, high, etc. (see also comment above).

      We appreciate the reviewer’s comments and have revised the caption for figure 7, which now reads as follows:

      “Figure 7. Phase plane projections of pulse-dependent collective state transitions. (A) Phase space trajectories (displayed in the Voltage – x<sub>r</sub> plane) of the NRVM computational model show a limit cycle (OSC) that does not enclose the stable fixed point (STA). (B) Parameter space slice showing the relationship between stimulation period and number of pulses for a fixed light intensity (1.72 mW/mm<sup>2</sup>) and size of the illuminated area (67 pixels edge length). Letters correspond to the graphs shown in C. (C) Phase space trajectories for different combinations of stimulus train period and number of pulses (α: 800 ms cycle length + 2 pulses, β: 800 ms cycle length + 4 pulses, γ: 250 ms cycle length + 3 pulses, δ: 250 ms cycle length + 8 pulses). α and δ do not result in a transition from the resting state to ectopic pacemaker activity, because under these conditions the system moves towards the stable stationary fixed point from outside and inside the stable limit cycle, respectively. By contrast, for β and γ the stable limit cycle is approached from outside and inside, respectively, and ectopic pacemaker activity is induced.”

      (16) Line 258: 'other dimensions by the electrotonic current' - not clear, please rephrase and explain.

      We realized that our explanation was somewhat convoluted and have therefore changed the text as follows:

      “Rather than producing oscillations, the system returns to the stationary state along dimensions other than those shown in Figure 7C (Voltage and x<sub>r</sub>), as evidenced by the phase space trajectory crossing itself. This return is mediated by the electrotonic current.”

      (17) Line 263: ‘increased too much’ – please rephrase using scientific terminology.

      We rephrased our sentence to:

      “However, this is not a Hopf bifurcation, because in that case the system would not return to the stationary state when the number of pulses exceeds a critical threshold.”

      (18) Line 275: 'stronger diffusion/electrotonic influence from the non-illuminated region' - not sure diffusion is the correct term here. Please explain by taking into account the membrane potential. Please make sure to use proper terminology. The same applies to lines 281-282.

      We appreciate this comment, which prompted us to revisit our text. We realised that some sections could be worded more clearly, and we also identified an error in the legend of Supplementary Figure 7. The corresponding corrections are provided below:

      “However, repolarisation reserve does have an influence, prolonging the transition when it is reduced (Appendix 2 Figure 7). This effect can be observed either by moving further from the boundary of the illuminated region, where the electrotonic influence from the non-illuminated region is weaker, or by introducing ionic changes, such as a reduction in I<sub>Ks</sub> and/or I<sub>to</sub>. For example, because the electrotonic influence is weaker in the centre of the illuminated region, the voltage there is not pulled down toward the resting membrane potential as quickly as in cells at the border of the illuminated zone.”

      “To add a multicellular component to our single cell model we introduced a current that replicates the effect of cell coupling and its associated electrotonic influence.”

      “Figure 7. The effect of ionic changes on the termination of pacemaker activity. The mechanism that moves the oscillating illuminated tissue back to the stationary state after high frequency pacing is dependent on the ionic properties of the tissue, i.e. lower repolarisation reserves (20% I<sub>Ks</sub> + 50% I<sub>to</sub>) are associated with longer transition times.”

      (19) Line 289: -58 mV (to be corrected), -20 mV, and +50 mV - please justify the selection of parameters chosen. This also applies elsewhere- the selection of parameters seems quite arbitrary, please make sure the selection process is more transparent to the reader.

      Our choice of parameters was guided by the dynamical properties of the illuminated cells as well as by illustrative purposes. The value of –58 mV corresponds to the stimulation threshold of the model. The values of 50 mV and –20 mV match those used for single-cell stimulation (Figure 8C2, right panel), producing excitable and bistable dynamics, respectively. We refer to this point in line 288 with the phrase “building on this result.” To maintain conciseness, we did not elaborate on the underlying reasoning within the manuscript and instead reported only the results.

      We also corrected the previously missed minus sign: -58 mV.

      (20) Figure 8 and corresponding text: I don't understand what stimulation with a voltage means. Is this an externally applied electric field? Or did you inject a current necessary to change the membrane voltage by this value? Please explain.

      Stimulation with a specific voltage is a standard computational technique and can be likened to performing a voltage-clamp experiment on each individual cell. In this approach, the voltage of every cell in the tissue is briefly forced to a defined value.
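      To make the idea behind comment (20) concrete, here is a minimal sketch under stated assumptions: it uses a hypothetical chain of FitzHugh-Nagumo cells (a generic excitable model in dimensionless units, not the NRVM model of the manuscript). During the stimulus window the voltage of every cell is overwritten with a fixed value, as in an idealised voltage clamp, while the recovery variable keeps evolving; after release the cells evolve freely. The function names `fhn_step` and `simulate`, the chain size and all parameter values are illustrative assumptions.

```python
# Hypothetical illustration: FitzHugh-Nagumo cells in dimensionless units,
# NOT the NRVM ionic model used in the manuscript.


def fhn_step(v, w, dt, a=0.7, b=0.8, eps=0.08, d=0.5):
    """Advance a 1D chain of FitzHugh-Nagumo cells by one forward-Euler step.

    d * (v[i-1] - 2 v[i] + v[i+1]) is the discrete diffusion (gap-junction)
    coupling term; mirrored neighbours give no-flux boundaries at both ends.
    """
    n = len(v)
    v_new, w_new = [], []
    for i in range(n):
        left = v[i - 1] if i > 0 else v[i]
        right = v[i + 1] if i < n - 1 else v[i]
        coupling = d * (left - 2.0 * v[i] + right)
        dv = v[i] - v[i] ** 3 / 3.0 - w[i] + coupling
        dw = eps * (v[i] + a - b * w[i])
        v_new.append(v[i] + dt * dv)
        w_new.append(w[i] + dt * dw)
    return v_new, w_new


def simulate(n_cells=20, n_steps=2000, dt=0.05,
             clamp_window=(100, 140), v_clamp=1.5):
    """Return the voltage trace of the middle cell.

    For steps inside clamp_window, every cell's voltage is forced to v_clamp,
    mimicking a tissue-wide "voltage stimulus" (idealised voltage clamp).
    """
    # Approximate resting state (stable fixed point) for a=0.7, b=0.8.
    v = [-1.1994] * n_cells
    w = [-0.6243] * n_cells
    trace = []
    for step in range(n_steps):
        v, w = fhn_step(v, w, dt)
        if clamp_window[0] <= step < clamp_window[1]:
            # Overwrite V after the step; the recovery variable w keeps evolving.
            v = [v_clamp] * n_cells
        trace.append(v[n_cells // 2])
    return trace


trace = simulate()
print(f"before: {trace[99]:.2f}, during: {trace[120]:.2f}, end: {trace[-1]:.2f}")
```

      Because the clamp is applied uniformly here, the coupling term stays zero; restricting the clamp to a subset of cells would make the electrotonic (diffusion) current between clamped and unclamped cells visible.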

      (21) Figure 8C- panel 2: Traces at -20 mV and + 50 mV are identical. Is this correct? Please explain.

      Yes, that is correct. The cell responds similarly to a voltage stimulus of -20 mV or one of 50 mV, because both values are well above the excitation threshold of a cardiomyocyte.

      (22) Line 344 and elsewhere: 'diffusion current' - This is probably not the correct terminology for gap-junction mediated currents. Please rephrase.

      Here, a diffusion current is the mathematical formulation of a gap junction mediated current, so, depending on the background of the reader, one or the other term might be used, each emphasising different aspects of the results. In a mathematical modelling context, one often refers to a diffusion current because cardiomyocyte monolayers and tissues can be modelled using a reaction-diffusion equation. In the context of fine-grained biological and biophysical details, one uses the term gap-junction mediated current. Our choice is motivated by the main target audience we have in mind, namely interdisciplinary researchers with a core background in mathematics, physics or computer science.

      However, so as not to exclude our secondary target audience of biological and medical readers, we have now clarified the terminology, drawing the parallel between the different fields of study at line 79:

      “These waves resulted from the interplay between the diffusion current (also known in biology/biophysics as the gap junction mediated current) and the bi-stable state that was induced in the illuminated region.”
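      The equivalence between the two terms can be made explicit with a generic monodomain (reaction-diffusion) formulation. This is a textbook form given for illustration, assuming isotropic, uniform coupling on a square grid with spacing h; it is not necessarily the exact discretisation used in the manuscript, and the symbols D, h, g<sub>gap</sub> and N(i) are introduced here only for this sketch.

```latex
% Generic monodomain (reaction-diffusion) equation; illustrative, not the authors' exact model
\frac{\partial V}{\partial t} = \nabla \cdot \left( D \, \nabla V \right) - \frac{I_{\mathrm{ion}}}{C_m}

% Finite-difference diffusion term at cell i (grid spacing h, uniform D, neighbours N(i))
\nabla \cdot \left( D \, \nabla V \right) \Big|_i \approx \frac{D}{h^2} \sum_{j \in \mathcal{N}(i)} \left( V_j - V_i \right)

% Gap-junction current into cell i, with conductance g_gap per junction
\frac{I_{\mathrm{gap},i}}{C_m} = \frac{g_{\mathrm{gap}}}{C_m} \sum_{j \in \mathcal{N}(i)} \left( V_j - V_i \right)
\quad \Longrightarrow \quad \frac{D}{h^2} = \frac{g_{\mathrm{gap}}}{C_m}
```

      Under these assumptions the "diffusion current" of the continuum model and the gap-junction current of the discrete cell network are the same term, differing only in how the coupling constant is expressed.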

      (23) Lines 357-58: 'Such ectopic sources are typically initiated by high frequency pacing' - While this might be true during clinical testing, how would you explain this when not externally imposed? What could be biological high-frequency triggers?

      Biological high-frequency triggers could include sudden increases in heart rates, such as those induced by physical activity or emotional stress. Another possibility is the occurrence of paroxysmal atrial or ventricular fibrillation, which could then give rise to an ectopic source.

      (24) Lines 419-420: 'large ionic cell currents and small repolarising coupling currents'. Are coupling currents actually small in comparison to cellular currents? Can you provide relative numbers (~ratio)?

      Coupling currents are indeed small compared to cellular currents. This can be inferred from the I-V curve shown in Figure 8C1, which dips below 0 and creates bi-stability only because of the small coupling current. If the coupling current were larger, the system would revert to a monostable regime. To make this more concrete, we have now provided the exact value of the coupling current used in Figure 8C1.

      “Otherwise, if the hills and dips of the N-shaped steady-state IV curve were large (Figure 8C-1), they would have magnitudes similar to the large currents of fast ion channels, preventing the subtle interaction between these strong ionic cell currents and the small repolarising coupling currents (-0.103649 pA ≈ -0.1 pA).”

      (25) Line 426: Please explain how ‘voltage shocks’ were modelled.

      We would like to refer the reviewer to our response to comment (20) regarding how we model voltage shocks. In the context of line 426, a typical voltage shock corresponds to a tissue-wide stimulus of 50 mV. Independent of our computational model, line 426 also cites other publications showing that, in clinical settings, high-voltage shocks are unable to terminate ectopic sustained activity, consistent with our findings.

      (26) Lines 429 ff: 0.2 pA/pF would correspond to 20 pA for a small cardiomyocyte of 100 pF, this current should be measurable using patch-clamp recordings.

      In trying to be succinct, we may have caused some confusion. The difference between the dips (-0.07 pA/pF) and hills (≈0.11 pA/pF) is approximately 0.18 pA/pF. For a small cardiomyocyte of 100 pF, this corresponds to deviations from zero of roughly ±10 pA. Considering that typical RMS noise levels in whole-cell patch-clamp recordings range from 2 to 10 pA, it is understandable that detecting these hills and dips in an I-V curve (the average current after holding a voltage for an extended period) is difficult. Achieving statistical significance would therefore require patching a large number of cells.

      Given the already extensive scope of our manuscript in terms of techniques and concepts, we decided not to pursue these additional patch-clamp experiments.
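      The unit conversion behind the answer to comment (26) can be checked with a few lines of arithmetic. The sketch assumes, as the reviewer did, a membrane capacitance of 100 pF and uses the dip, hill and RMS-noise values quoted in the response; the helper name `to_absolute_current` is hypothetical. It only compares single-trace amplitudes with the noise range and does not model cell-to-cell variability, which also affects how many cells would be needed for statistical significance.

```python
def to_absolute_current(density_pa_per_pf, capacitance_pf):
    """Hypothetical helper: current density (pA/pF) times capacitance (pF) -> pA."""
    return density_pa_per_pf * capacitance_pf


C_M = 100.0  # pF, the "small cardiomyocyte" assumed in comment (26)

reviewer_estimate = to_absolute_current(0.2, C_M)  # about 20 pA
dip = to_absolute_current(-0.07, C_M)              # about -7 pA
hill = to_absolute_current(0.11, C_M)              # about +11 pA
peak_to_peak = hill - dip                          # about 18 pA

# RMS noise range quoted in the response for whole-cell recordings (pA).
noise_low, noise_high = 2.0, 10.0

for name, value in [("dip", dip), ("hill", hill)]:
    print(f"{name}: {value:+.1f} pA, "
          f"{abs(value) / noise_high:.1f}x to {abs(value) / noise_low:.1f}x the RMS noise")
print(f"peak-to-peak: {peak_to_peak:.1f} pA")
```

      At the upper end of the quoted noise range, each individual deviation is only about as large as the noise itself, which is consistent with the difficulty described above.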

      Reviewer #2 (Recommendations for the authors):

      Given the deluge of conditions to consider, there are several areas of improvement possible in communicating the authors' findings. I have the following suggestions to improve the manuscript.

      (1) Please change the straight pink "pulse train" bar OR add stimulation marks (such as "*", or individual pulse icons) to provide better visual clarity that the applied stimuli are "short ON, long OFF" electrical pulses. I had significant initial difficulty understanding what the pulse bars represented in Figures 2, 3, 4A-B, etc. This may be partially because stimuli here could be either light (either continuous or pulsed) or electrical (likely pulsed only). To me, a solid & unbroken line intuitively denotes a continuous stimulation. I understand now that the pink bar represents the entire pulse-train duration, but I think readers would be better served with an improvement to this indicator in some fashion. For instance, the "phases" were much clearer in Figures 7C and 8D because of how colour was used on the Vm(t) traces. (How you implement this is up to you, though!)

      We have addressed the reviewer’s concern and updated the figures by marking each external pulse with a small vertical line (see below).

      (2) Please label the electrical stimulation location (akin to the labelled stimulation marker in circle 2 state in Figure 1A) in at least Figures 2 and 4A, and at most throughout the manuscript. It is unclear which "edge" or "pixel" the pulse-train is originating from, although I've assumed it's the left edge of the 2D tissue (both in vitro and silico). This would help readers compare the relative timing of dark blue vs. orange optical signal tracings and to understand how the activation wavefront transverses the tissue.

      We indicated the pacing electrode in the optical voltage recordings with a grey asterisk. For the in silico simulations, the electrode was assumed to be far away, and the excitation was modelled as a parallel wave originating from the top boundary, indicated with a grey zone.

      (3) Given the prevalence of computational experiments in this study, I suggest considering making a straightforward video demonstrating basic examples of STA, OSC, and TR.OSC states. I believe that a video visualizing these states would be visually clarifying to and greatly appreciated by readers. Appendix 2 Figure 3 would be the no-motion visualization of the examples I'm thinking of (i.e., a corresponding stitched video could be generated for this). However, this video-generation comment is a suggestion and not a request.

      We have included a video showing all relevant states, which is now part of the Supplementary Material.

      (4) Please fix several typos that I found in the manuscript:

      (4A) Line 279: a comma is needed after i.e. when used in: "peculiar, i.e. a standard". However, this is possibly stylistic (discard suggestion if you are consistent in the manuscript).

      (4B) Line 382: extra period before "(Figure 3C)".

      (4C) Line 501: two periods at end of sentence "scientific purposes.." .

      We would like to thank the reviewer for pointing out these typos. We have corrected them and conducted an additional check throughout the manuscript for minor errors.

    1. eLife Assessment

      The authors investigate arrestin2-mediated CCR5 endocytosis in the context of clathrin and AP2 contributions. Using an extensive set of NMR experiments, and supported by microscopy and other biophysical assays, the authors provide compelling data on the roles of AP2 and clathrin in CCR5 endocytosis. This important work will appeal to an audience beyond those studying chemokine receptors, including those studying GPCR regulation and trafficking. The distinct role of AP2 and not clathrin will be of particular interest to those studying GPCR internalization mechanisms.

    2. Reviewer #1 (Public review):

      Petrovic et al. investigate CCR5 endocytosis via arrestin2, with a particular focus on clathrin and AP2 contributions. The study is thorough and methodologically diverse. The NMR titration data clearly demonstrate chemical shift changes at the canonical clathrin-binding site (LIELD), present in both the 2S and 2L arrestin splice variants. To assess the effect of arrestin activation on clathrin binding, the authors compare: truncated arrestin (1-393), full-length arrestin, and 1-393 incubated with CCR5 phosphopeptides. All three bind clathrin comparably, whereas controls show no binding. These findings are consistent with prior crystal structures showing peptide-like binding of the LIELD motif, with disordered flanking regions. The manuscript also evaluates a non-canonical clathrin binding site specific to the 2L splice variant. Though this region has been shown to enhance beta2-adrenergic receptor binding, it appears not to affect CCR5 internalization.

      Similar analyses applied to AP2 show a different result. AP2 binding is activation-dependent and influenced by the presence and level of phosphorylation of CCR5-derived phosphopeptides. These findings are reinforced by cellular internalization assays.

      In sum, the results highlight splice-variant-dependent effects and phosphorylation-sensitive arrestin-partner interactions. The data argue against a (rapidly disappearing) one-size-fits-all model for GPCR-arrestin signaling and instead support a nuanced, receptor-specific view, with one example summarized effectively in the mechanistic figure.

      Weaknesses:

      Figure 1 shows intrinsically disordered regions of the AlphaFold model without making it clear that these regions are not expected to adopt a stable position. The authors' NMR titration data are n=1. Many figure panels require that readers pinch and zoom to see the data.

    3. Reviewer #2 (Public review):

      Summary:

      Based on extensive live cell assays, SEC, and NMR studies of reconstituted complexes, these authors explore the roles of clathrin and the AP2 protein in facilitating clathrin-mediated endocytosis via activated arrestin-2. NMR, SEC, proteolysis, and live cell tracking confirm a strong interaction between AP2 and activated arrestin using a phosphorylated C-terminus of CCR5. At the same time, a weak interaction between clathrin and arrestin-2 is observed, irrespective of activation.

      These results contrast with previous observations of class A GPCRs and the more direct participation by clathrin. The results are discussed in terms of the importance of short and long phosphorylated bar codes in class A and class B endocytosis.

      Strengths:

      The 15N,1H and 13C,methyl TROSY NMR and assignments represent a monumental amount of work on arrestin-2, clathrin, and AP2. Weak NMR interactions between arrestin-2 and clathrin are observed irrespective of activation of arrestin. A second interface, proposed by crystallography, was suggested to be a possible crystal artifact. NMR establishes realistic information on the clathrin and AP2 affinities to activated arrestin with both kD and description of the interfaces.

      Weaknesses:

      This reviewer has identified only minor weaknesses with the study.

      (1) I don't observe two overlapping spectra of Arrestin2 (1-393) +/- CLTC NTD in Supp Figure 1.

      (2) Arrestin-2 1-418 resonances all but disappear with CCR5pp6 addition. Are they recovered with Ap2Beta2 addition, and is this what is shown in Supp Fig 2D?

      (3) I don't understand how methyl TROSY spectra of arrestin2 with phosphopeptide could look so broadened unless there are sample stability problems?

      (4) At one point the authors added excess fully phosphorylated CCR5 phosphopeptide (CCR5pp6). Does the phosphopeptide rescue resolution of arrestin2 (NH or methyl) to the point where interaction dynamics with clathrin (CLTC NTD) are now more evident on the arrestin2 surface?

      (5) Once phosphopeptide activates arrestin-2 and AP2 binds can phosphopeptide be exchanged off? In this case, would it be possible for the activated arrestin-2 AP2 complex to re-engage a new (phosphorylated) receptor?

      (6) I'd be tempted to move the discussion of class A and class B GPCRs and their presumed differences to the intro and then motivate the paper with specific questions.

      (7) Did the authors ever try SEC measurements of arrestin-2 + AP2beta2 + CCR5pp6 with and without PIP2, and with and without clathrin (CLTC NTD)? The question becomes what the active complex is and how PIP2 modulates this cascade of complexation events in class B receptors.

    4. Reviewer #3 (Public review):

      Summary:

      Overall, this is a well-done study, and the conclusions are largely supported by the data, which will be of interest to the field.

      Strengths:

      Strengths of this study include experiments with solution NMR that can resolve high-resolution interactions of the highly flexible C-terminal tail of arr2 with clathrin and AP2. Although mainly confirmatory in defining the arr2 CBL 376LIELD380 as the clathrin binding site, the use of the NMR is of high interest (Fig. 1). The 15N-labeled CLTC-NTD experiment with arr2 titrations reveals a span from 39-108 that mediates an arr2 interaction, which corroborates previous crystal data, but does not reveal a second area in CLTC-NTD that in previous crystal structures was observed to interact with arr2.

      SEC and NMR data suggest that full-length arr2 (1-418) binding to the β2-adaptin subunit of AP2 is enhanced in the presence of CCR5 phospho-peptides (Fig. 3). The pp6 peptide shows the highest degree of arr2 activation and β2-adaptin binding, compared with less phosphorylated or non-phosphorylated peptides. It is interesting that the arr2 interaction with CLTC NTD and pp6 cannot be detected using the SEC approach, further suggesting that clathrin binding is not dependent on arrestin activation. Overall, the data suggest that receptor activation promotes arrestin binding to AP2, not clathrin, suggesting the AP2 interaction is necessary for CCR5 endocytosis.

      To validate the solid biophysical data, the authors pursue validation experiments in a HeLa cell model by confocal microscopy. This requires transient transfection of tagged receptor (CCR5-Flag) and arr2 (arr2-YFP). CCR5 displays a "class B"-like behavior in that arr2 is rapidly recruited to the receptor at the plasma membrane upon agonist activation, which forms a stable complex that internalizes into endosomes (Fig. 4). The data suggest that complex internalization is dependent on AP2 binding, not clathrin (Fig. 5).

      The addition of the antagonist experiment/data adds rigor to the study.

      Overall, this is a solid study that will be of interest to the field.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Petrovic et al. investigate CCR5 endocytosis via arrestin2, with a particular focus on clathrin and AP2 contributions. The study is thorough and methodologically diverse. The NMR titration data are particularly compelling, clearly demonstrating chemical shift changes at the canonical clathrin-binding site (LIELD), present in both the 2S and 2L arrestin splice variants. 

      To assess the effect of arrestin activation on clathrin binding, the authors compare: truncated arrestin (1-393), full-length arrestin, and 1-393 incubated with CCR5 phosphopeptides. All three bind clathrin comparably, whereas controls show no binding. These findings are consistent with prior crystal structures showing peptide-like binding of the LIELD motif, with disordered flanking regions. The manuscript also evaluates a non-canonical clathrin binding site specific to the 2L splice variant. Though this region has been shown to enhance beta2-adrenergic receptor binding, it appears not to affect CCR5 internalization. 

      Similar analyses applied to AP2 show a different result. AP2 binding is activation-dependent and influenced by the presence and level of phosphorylation of CCR5-derived phosphopeptides. These findings are reinforced by cellular internalization assays. 

      In sum, the results highlight splice-variant-dependent effects and phosphorylation-sensitive arrestin-partner interactions. The data argue against a (rapidly disappearing) one-size-fits-all model for GPCR-arrestin signaling and instead support a nuanced, receptor-specific view, with one example summarized effectively in the mechanistic figure. 

      We thank the referee for this positive assessment of our manuscript. Indeed, by stepping away from the common receptor models for understanding internalization (β2AR and V2R), we revealed the phosphorylation level of the receptor as a key factor in driving the sequestration of the receptor from the plasma membrane. We hope that the proposed mechanistic model will aid further studies to obtain an even more detailed understanding of the forces driving receptor internalization.

      Reviewer #2 (Public review): 

      Summary: 

      Based on extensive live cell assays, SEC, and NMR studies of reconstituted complexes, these authors explore the roles of clathrin and the AP2 protein in facilitating clathrin-mediated endocytosis via activated arrestin-2. NMR, SEC, proteolysis, and live cell tracking confirm a strong interaction between AP2 and activated arrestin using a phosphorylated C-terminus of CCR5. At the same time, a weak interaction between clathrin and arrestin-2 is observed, irrespective of activation. 

      These results contrast with previous observations of class A GPCRs and the more direct participation by clathrin. The results are discussed in terms of the importance of short and long phosphorylated bar codes in class A and class B endocytosis. 

      Strengths: 

      The 15N,1H, and 13C, methyl TROSY NMR and assignments represent a monumental amount of work on arrestin-2, clathrin, and AP2. Weak NMR interactions between arrestin-2 and clathrin are observed irrespective of the activation of arrestin. A second interface, proposed by crystallography, was suggested to be a possible crystal artifact. NMR establishes realistic information on the clathrin and AP2 affinities to activated arrestin, with both kD and description of the interfaces. 

      We sincerely thank the referee for this encouraging evaluation of our work and appreciate the recognition of the NMR efforts and insights into the arrestin–clathrin–AP2 interactions.

      Weaknesses: 

      This reviewer has identified only minor weaknesses with the study.

      (1) Arrestin-2 1-418 resonances all but disappear with CCR5pp6 addition. Are they recovered with Ap2Beta2 addition, and is this what is shown in Supplementary Figure 2D? 

      We believe the reviewer is referring to Figure 3-figure supplement 1. In this figure, panels E and F show that resonances of arrestin2<sup>1-418</sup> (apo state shown with black outline) disappear upon addition of CCR5pp6 (arrestin2<sup>1-418</sup>•CCR5pp6 complex spectrum in red). Panels C and D show resonances of arrestin2<sup>1-418</sup> (apo state shown with black outline) that remain unchanged upon addition of AP2β2<sup>701-937</sup> (orange), indicating no complex formation. We also recorded a spectrum of the arrestin2<sup>1-418</sup>•CCR5pp6 complex after addition of AP2β2<sup>701-937</sup> (not shown), but the arrestin2 resonances in the arrestin2<sup>1-418</sup>•CCR5pp6 complex were already too broad for further analysis. This was already explained in the text:

      “In agreement with the AP2β2 NMR observations, no interaction was observed in the arrestin2 methyl and backbone NMR spectra upon addition of AP2β2 in the absence of phosphopeptide (Figure 3-figure supplement 1C, D). However, the significant line broadening of the arrestin2 resonances upon phosphopeptide addition (Figure 3-figure supplement 1E, F) precluded a meaningful assessment of the effect of AP2β2 addition on arrestin2 in the presence of phosphopeptide.”

      (2) I don't understand how methyl TROSY spectra of arrestin2 with phosphopeptide could look so broadened unless there are sample stability problems. 

      We thank the referee for this comment. We would like to clarify that in general a broadened spectrum beyond what is expected from the rotational correlation time does not necessarily correlate with sample stability problems. It is rather evidence of conformational intermediate exchange on the micro- to millisecond time scale.

      The displayed <sup>1</sup>H-<sup>15</sup>N spectra of apo arrestin2 already suffer from line broadening due to such intrinsic mobility of the protein. These spectra were recorded with acquisition times of 50 ms (<sup>15</sup>N) and 55 ms (<sup>1</sup>H) and resolution-enhanced with a 60°-shifted sine-bell filter for <sup>15</sup>N and a 60°-shifted squared sine-bell filter for <sup>1</sup>H, which yields the observed resolution while retaining reasonable sensitivity. The <sup>1</sup>H-<sup>15</sup>N resonances in Fig. 1b (arrestin2<sup>1-393</sup>) look particularly narrow; however, this region contains a large number of flexible residues. The full spectrum (e.g. Figure 1-figure supplement 2) shows the entire situation, with a clear variation of linewidths and intensities. The linewidth variation becomes stronger when the resolution-enhancement filters are omitted.
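      The processing described above can be illustrated with a short sketch. It assumes the common convention in which a shifted sine bell is w(t) = sin<sup>p</sup>(φ + (π − φ)t/T), with φ the shift and p = 1 (sine bell) or p = 2 (squared sine bell); the manuscript does not state the formula, and the point counts below are arbitrary. It also uses 1/t<sub>acq</sub> only as an order-of-magnitude estimate of the resolution limit set by the acquisition times. With φ = 60° the window starts at sin 60° ≈ 0.87, peaks a quarter of the way through the acquisition and tapers to zero, trading some sensitivity for narrower lines while suppressing truncation artefacts.

```python
import math


def shifted_sine_bell(n_points, shift_deg, power=1):
    """Shifted sine-bell apodization window (assumed common convention).

    w[k] = sin(phi + (pi - phi) * k / (n_points - 1)) ** power, with phi the
    shift in radians: phi = 0 gives a pure sine bell, phi = 90 deg a cosine bell.
    """
    phi = math.radians(shift_deg)
    return [math.sin(phi + (math.pi - phi) * k / (n_points - 1)) ** power
            for k in range(n_points)]


def truncation_linewidth_hz(acquisition_time_s):
    """Order-of-magnitude resolution limit imposed by a finite acquisition time."""
    return 1.0 / acquisition_time_s


# Point counts are arbitrary; only the window shape matters here.
w_n15 = shifted_sine_bell(128, 60)           # 15N: 60 deg shifted sine bell
w_h1 = shifted_sine_bell(1024, 60, power=2)  # 1H: 60 deg shifted squared sine bell

print(f"15N window starts at {w_n15[0]:.3f}, 1H window at {w_h1[0]:.3f}")
print(f"~1/t_acq: 15N {truncation_linewidth_hz(0.050):.0f} Hz, "
      f"1H {truncation_linewidth_hz(0.055):.1f} Hz")
```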

      The addition of the CCR5pp6 phosphopeptide does not change protein stability, which we assessed by measuring the melting temperature of arrestin2<sup>1-418</sup> and of the arrestin2<sup>1-418</sup>•CCR5pp6 complex (Tm = 57 °C in both cases). We believe that the increased broadening of the arrestin2 resonances arises because the addition of CCR5pp6, possibly through the release of arrestin2 strand β20, amplifies the intermediate-timescale protein dynamics mentioned above. This results in the disappearance of arrestin2 resonances.

      We have now included the assessment of arrestin2<sup>1-418</sup> and arrestin2<sup>1-418</sup>•CCR5pp6 stability in the manuscript:

      “The observed line broadening of arrestin2 in the presence of phosphopeptide must be a result of increased protein motions and is not caused by a decrease in protein stability, since the melting temperature of arrestin2 in the absence and presence of phosphopeptide are identical (56.9 ± 0.1 °C)”.

      (3) At one point, the authors added an excess fully phosphorylated CCR5 phosphopeptide (CCR5pp6). Does the phosphopeptide rescue resolution of arrestin2 (NH or methyl) to the point where interaction dynamics with clathrin (CLTC NTD) are now more evident on the arrestin2 surface? 

      Unfortunately, when we titrate arrestin2 with CCR5pp6 (please see Isaikina & Petrovic et al., Mol. Cell, 2023 for more details), the arrestin2 resonances undergo fast-to-intermediate exchange upon binding. In the presence of excess phosphopeptide, very few resonances remain, the majority of which are in the disordered region, including resonances from the clathrin-binding loop. Because of peak overlap, we could not unambiguously assign the arrestin2 resonances in the bound state, which precluded an assessment of the arrestin2-clathrin interaction in the presence of phosphopeptide. We have now made this clearer in the paragraph ‘The arrestin2-clathrin interaction is independent of arrestin2 activation’:

      “Due to significant line broadening and peak overlap of the arrestin2 resonances upon phosphopeptide addition, the influence of arrestin activation on the clathrin interaction could not be detected on either backbone or methyl resonances”.

      (4) Once phosphopeptide activates arrestin-2 and AP2 binds, can phosphopeptide be exchanged off? In this case, would it be possible for the activated arrestin-2 AP2 complex to re-engage a new (phosphorylated) receptor?

      This would be an interesting mechanism. In principle, it should be possible, provided that the other (phosphorylated) receptor binds the site with higher affinity and thereby outcompetes the initial phosphopeptide. However, we have no experiments that assess this process directly, and we therefore prefer not to speculate further.

      (5) Did the authors ever try SEC measurements of arrestin-2 + AP2beta2 + CCR5pp6 with and without PIP2, and with and without clathrin (CLTC NTD)? The question becomes what the active complex is and how PIP2 modulates this cascade of complexation events in class B receptors. 

      We thank the referee for this question. Indeed, we tested whether PIP2 can stabilize the arrestin2•CCR5pp6•AP2 complex by SEC experiments. Unfortunately, the addition of PIP2 increased the formation of arrestin2 dimers and higher oligomers, presumably due to the presence of additional charges. The resolution of SEC experiments was not sufficient to distinguish arrestin2 in oligomeric form or in arrestin2•CCR5pp6•AP2 complex. We now mention this in the text: 

      “We also attempted to stabilize the arrestin2-AP2β2-phosphopeptide complex through the addition of PIP2, which can stabilize arrestin complexes with the receptor (Janetzko et al., 2022). The addition of PIP2 increased the formation of arrestin2 dimers and higher oligomers, presumably due to the presence of additional charges. Unfortunately, the resolution of the SEC experiments was not sufficient to separate the arrestin2 oligomers from complexes with AP2β2”.

      Reviewer #3 (Public review): 

      Summary: 

      Overall, this is a well-done study, and the conclusions are largely supported by the data, which will be of interest to the field. 

      Strengths: 

      (1) The strengths of this study include experiments with solution NMR that can resolve high-resolution interactions of the highly flexible C-terminal tail of arr2 with clathrin and AP2. Although mainly confirmatory in defining the arr2 CBL 376LIELD380 as the clathrin binding site, the use of the NMR is of high interest (Figure 1). The 15N-labeled CLTC-NTD experiment with arr2 titrations reveals a span from 39-108 that mediates an arr2 interaction, which corroborates previous crystal data, but does not reveal a second area in CLTC-NTD that in previous crystal structures was observed to interact with arr2.

      (2) SEC and NMR data suggest that full-length arr2 (1-418) binding to the β2-adaptin subunit of AP2 is enhanced in the presence of CCR5 phospho-peptides (Figure 3). The pp6 peptide shows the highest degree of arr2 activation and β2-adaptin binding, compared with less phosphorylated or non-phosphorylated peptides. It is interesting that the arr2 interaction with CLTC NTD and pp6 cannot be detected using the SEC approach, further suggesting that clathrin binding is not dependent on arrestin activation. Overall, the data suggest that receptor activation promotes arrestin binding to AP2, not clathrin, suggesting the AP2 interaction is necessary for CCR5 endocytosis. 

      (3) To validate the solid biophysical data, the authors pursue validation experiments in a HeLa cell model by confocal microscopy. This requires transient transfection of tagged receptor (CCR5-Flag) and arr2 (arr2-YFP). CCR5 displays a "class B"-like behavior in that arr2 is rapidly recruited to the receptor at the plasma membrane upon agonist activation, which forms a stable complex that internalizes into endosomes (Figure 4). The data suggest that complex internalization is dependent on AP2 binding, not clathrin (Figure 5). 

      We thank the referee for the careful and encouraging evaluation of our work. We appreciate the recognition of the solidity of our data and the support for our conclusions regarding the distinct roles of AP2 and clathrin in arrestin-mediated receptor internalization.

      Weaknesses:

      The interaction of truncated arr2 (1-393) was not impacted by CCR5 phospho-peptide pp6, suggesting the interaction with clathrin is not dependent on arrestin activation (Figure 2). This raises some questions.

      We thank the referee for raising this concern, as we were also surprised to find that the interaction does not depend on arrestin activation. However, the NMR data clearly show at atomic resolution that arrestin activation does not influence the interaction with clathrin in vitro. Evolutionarily, the arrestin-clathrin interaction appears not to be conserved, as visual arrestin completely lacks a clathrin-binding motif. For that reason, we believe that the weak arrestin-clathrin interaction plays a supportive role during internalization, in contrast to the regulatory interaction with AP2, which requires and quantitatively depends on arrestin2 activation. We have reflected on this in the Discussion:

      “Although the generalization of this mechanism from CCR5 to other arr-class B receptors has to be explored further, it is indirectly corroborated in the visual rhodopsin-arrestin1 system. The arr-class B receptor rhodopsin (Isaikina et al., 2023) also undergoes CME (Moaven et al., 2013) with arrestin1 harboring the conserved AP2 binding motif, but missing the clathrin-binding motif (Figure 1-figure supplement 1A)”.

      Overall, the data are solid, but for added rigor, can these experiments be repeated without tagged receptor and/or arr2? My concern stems from the fact that the stability of the interaction between arr2 and the receptor may be related to the position of the tags.

      We thank the referee for this suggestion, which refers to the cellular experiments; the biophysical experiments were carried out without tags. To eliminate the possibility of tags contributing to receptor-arrestin2 binding in the cellular experiments, we also performed the experiments in the presence of the CCR5 antagonist [5P12]CCL5 (Figure 4). These data show that in the case of inactive CCR5, arrestin2 is neither recruited to CCR5 nor forms internalization complexes, which would be the case if the tags were increasing the receptor-arrestin interaction. Conversely, if the tags were decreasing the interaction, we would not expect such strong internalization. As indicated below, we have also attempted to perform our cellular experiments using an N-terminally SNAP-tagged CCR5. Unfortunately, this construct did not express in HeLa cells, indicating that SNAP-CCR5 was either toxic or degraded.

      Reviewing Editor Comments: 

      Overall, the reviewers did not suggest much by way of additional experiments. They do suggest several aspects of the manuscript that would benefit from further clarification. 

      Reviewer #1 (Recommendations for the authors): 

      (1) The distinction between arrestin 2S and arrestin 2L as relates to the canonical and non-canonical clathrin binding sites would benefit from clarification, particularly because the second binding site depends on the splice variant. This is something that some readers may not be familiar with (particularly young ones that are hopefully part of the intended readership).

      We thank the referee for this suggestion. We would like to emphasize that in our work, only the long arrestin2 splice variant was used, which contains both binding sites. We have now introduced the splice variants and their relation to the clathrin binding sites in the text. 

      In section ‘Localizing and quantifying the arrestin2-clathrin interaction by NMR spectroscopy’:

      “Clathrin and arrestin interact in their basal state (Goodman et al., 1996), and a structure of a complex between arrestin2 and the clathrin heavy chain N-terminal domain (residues 1-363, named clathrin-N in the following) has been solved by X-ray crystallography (PDB:3GD1) in the absence of an arrestin2-activating phosphopeptide (Kang et al., 2009). This structure (Figure 1-figure supplement 1B) suggests a 2:1 binding model between arrestin2 and clathrin-N. The first interaction (site I) is observed between the <sup>376</sup>LIELD<sup>380</sup> clathrin-binding motif of the arrestin2 CBL and the edge of the first two β-sheet blades of clathrin-N, whereas the second interaction (site II) occurs between arrestin2 residues <sup>334</sup>LLGDLA<sup>339</sup> and the 4th and 5th blade of clathrin-N. The latter arrestin interaction site is not present in the arrestin2 splice variant arrestin2S (for short) where an 8-amino acid insert (residues 334-341) between β-strands 18 and 19 is removed (Kang et al., 2009)”.

      Section ‘The arrestin2-clathrin interaction is independent of arrestin2 activation’

      “Figure 2A (left) shows the intensity changes (full spectra in Figure 2-figure supplement 1A) of the clathrin-N <sup>1</sup>H-<sup>15</sup>N TROSY resonances [assignments transferred from BMRB, ID:25403 (Zhuo et al., 2015)] upon addition of a one-molar equivalent of arrestin2<sup>1-393</sup>. A significant intensity reduction due to line broadening is detected for clathrin-N residues 39-40, 48-50, 62-72, 83-90, 101-106, and 108. These residues form a clearly defined binding region at the edges of blade 1 and blade 2 of clathrin-N (Figure 2A, right), which corresponds to interaction site I in the 3GD1 crystal structure, involving the conserved arrestin2 <sup>376</sup>LIELD<sup>380</sup> motif. However, no significant signal attenuation was observed for clathrin-N residues in blade 4 and blade 5, which would correspond to the crystal interaction site II with arrestin2 residues <sup>334</sup>LLGDLA<sup>339</sup> that are absent in the arrestin2S splice variant. Thus, only one arrestin2 binding site in clathrin-N is detected in solution, and site II of the crystal structure may be a result of crystal packing”.

      (2) Acronym density is high throughout. While many are standard in the clathrin literature, this could hinder accessibility for readers with a GPCR or arrestin focus.

      We agree with the referee. The acronyms were hard to avoid. The most non-obvious acronym seems to be ‘CLTC-NTD’ for the N-terminal domain of the clathrin heavy chain, which uses the non-obvious but common gene name CLTC for the clathrin heavy chain. We have now replaced ‘CLTC-NTD’ by ‘clathrin-N’ and hope that this makes the text easier to follow.

      (3) The NMR section, while impressive in scope, had writing that was more difficult to follow than the rest. I am curious what percentage of resonance could be assigned. 

      We apologize if the NMR sections of this manuscript were unclear. We attempted to provide a very detailed description of the experimental setup and the spectral results. Being experienced NMR spectroscopists, we have tried very hard to obtain good 3D triple resonance spectra for assignments, but their sensitivity is very low. We believe that this is due to the microsecond dynamics present in the system, which makes the heteronuclear transfers inefficient. So far, we have been able to assign ~30% of the visible arrestin2 resonances. We are still validating the assignments and are working on the analysis and an explanation for this arrestin2 behavior. Therefore, at this point, we want to refrain from stronger statements besides that considerable intrinsic microsecond dynamics is impeding the assignment process.

      (4) It may be worth noting in the main text that truncated arrestins have slightly higher basal activation. I was curious why the truncated arrestin was not chosen for the AP2 NMR titrations. Presumably, an effect would be more likely to be seen.

      While some truncated arrestin2 variants (comprising residues 1-382 or 1-360) indeed show higher basal activity than the full-length arrestin2, they typically completely lack the β20 strand (residues 386-390), which is crucial for the formation of a parallel β-sheet with strand β1, and whose release governs arrestin activation. Our truncated arrestin2 construct comprises residues 1-393 and contains strand β20. In our experience, no significant difference in basal activity, as assessed by Fab30 binding, was detected for arrestin2<sup>1-393</sup> and arrestin2<sup>1-418</sup> (Author response image 1).

      Author response image 1.

      SEC profiles showing arrestin2<sup>1–393</sup> (left) and arrestin2<sup>1-418</sup> (right) activation by the CCR5pp6 phosphopeptide as assayed by Fab30 binding. The active ternary arrestin2-phosphopeptide-Fab30 complex elutes at a lower volume than the inactive apo arrestin2 or the binary arrestin2-phosphopeptide complex. Both arrestin2 constructs are activated by the phosphopeptide to a similar level as assessed by the integrated SEC volumes.
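      As a generic illustration of how an activation level can be read from integrated SEC peaks, the sketch below integrates a chromatogram over two elution-volume windows with the trapezoidal rule and reports the share of signal in the earlier (ternary-complex) window. It is a minimal sketch under stated assumptions: the functions window_area and active_fraction, the window bounds and the toy chromatogram are hypothetical and are not the authors' processing pipeline, and baseline correction is ignored.

```python
def window_area(volumes, signal, v_start, v_end):
    """Trapezoidal area of a chromatogram between v_start and v_end (units of volumes).

    Only intervals lying entirely inside the window are summed; a real analysis
    would also handle baseline correction and partially covered intervals.
    """
    area = 0.0
    for i in range(len(volumes) - 1):
        v0, v1 = volumes[i], volumes[i + 1]
        if v0 >= v_start and v1 <= v_end:
            area += 0.5 * (signal[i] + signal[i + 1]) * (v1 - v0)
    return area


def active_fraction(volumes, signal, complex_window, other_window):
    """Share of the integrated signal eluting in the (earlier) ternary-complex window."""
    a_complex = window_area(volumes, signal, *complex_window)
    a_other = window_area(volumes, signal, *other_window)
    return a_complex / (a_complex + a_other)


# Toy chromatogram (hypothetical numbers, not the authors' data).
volumes = [0.0, 1.0, 2.0, 3.0, 4.0]
signal = [0.0, 1.0, 0.0, 2.0, 0.0]
frac = active_fraction(volumes, signal, (0.0, 2.0), (2.0, 4.0))
print(round(frac, 3))  # 0.333
```

      Comparing such fractions between two constructs run under identical conditions is one way to express "a similar level of activation" as a number.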

      We want to emphasize that we used full-length arrestin2<sup>1-418</sup> to assess the AP2 interaction, as the crystal structure of an arrestin2 peptide bound to AP2 (PDB:2IV8) shows that residues beyond residue 393 are involved in binding.

      PDB codes are currently not accompanied by corresponding literature citations throughout. Please add these. 

      Thank you for this suggestion. In the manuscript, we were careful to provide the full literature citation the first time each PDB code is mentioned. To avoid redundancy and maintain clarity, we prefer not to repeat the citations each time the PDB code is subsequently mentioned.

      (5) The AlphaFold model could benefit from a more transparent discussion of prediction confidence and caveats. The younger crowd (part of the presumed intended readership) tends to be more certain that computational output is 'true'. Figure 1A shows long loops that are likely regions of low confidence in the prediction. Displaying expected disordered regions as transparent or color-coded would help highlight these as flexible rather than stable, especially for that same younger readership. 

      We would like to clarify that the AlphaFold model of arrestin2 was only used to visualize the clathrin-binding loop and the 344-loop of the arrestin2 C-domain, which are not detected in the available apo bovine (PDB:1G4M) and apo human (PDB:8AS4) arrestin2 crystal structures. In the regions that are visible in the crystal structures, the AlphaFold model of arrestin2 is essentially identical to them. We have now clarified this in the caption to Figure 1.

      “The model was used to visualize the clathrin-binding loop and the 344-loop of the arrestin2 C-domain, which are not detected in the available crystal structures of apo arrestin2 [bovine: PDB 1G4M (Han et al., 2001), human: PDB 8AS4 (Isaikina et al., 2023)]. In the other structured regions, the model is virtually identical to the crystal structures”.

      (6) Several figure panels were difficult to interpret due to their small size. Especially microscopy insets, where I needed to simply trust that the authors were accurately describing the data. Enlarging panels is essential, and this may require separating them into different figures.

      We appreciate the referee’s concern regarding figure readability. However, we want to indicate that all our figures are provided as either high-resolution pixel or scalable vector graphics, which allow for zooming in to very fine detail, either electronically or in print. This ensures that microscopy insets and other small panels can be examined clearly when viewed appropriately. We believe the current layout of the figures is necessary to be able to efficiently compare the data between different conditions.

      Many figure panels had text size that was too small. Font inconsistencies across figures also stand out. 

      We apologize for this. We have now enlarged the font size in the figures and made the styles more consistent.

      For Fig. 1F, consider adding individual data points and error bars.

      Thank you for this suggestion. However, Figure 1F already contains the individual data points, with colored circles corresponding to the titration condition. As we did not have replicates of the titration, no error bars are shown. However, the close agreement of the theoretical fit with the individual measured data points stemming from different experiments shows that the statistical errors are indeed very small. We have estimated an overall error for the Kd (as indicated in panel F, right) by error propagation based on an estimate of the chemical shift error as obtained in the NMR software POKY (based on spectral noise). 
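      For readers less familiar with NMR titration analysis, the sketch below shows one common way to fit a dissociation constant to chemical shift perturbations: the exact quadratic solution for 1:1 binding in fast exchange, with a grid search over Kd and a closed-form least-squares estimate of the maximal shift. This is an assumption-laden illustration, not the authors' fitting procedure (which, as stated above, used POKY-derived shift errors); the functions fraction_bound and fit_kd, the constant labeled-protein concentration and the synthetic numbers are all hypothetical.

```python
import math


def fraction_bound(p_tot, l_tot, kd):
    """Fraction of labeled protein in complex for 1:1 binding (exact quadratic solution).

    p_tot, l_tot and kd must share one concentration unit (here µM).
    """
    b = p_tot + l_tot + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * p_tot * l_tot)) / 2.0
    return complex_conc / p_tot


def fit_kd(p_tot, l_tots, shifts, kd_grid):
    """Grid search over Kd; for each candidate, the best-fitting maximal shift
    (delta_max) follows from linear least squares in closed form."""
    best = None
    for kd in kd_grid:
        f = [fraction_bound(p_tot, l, kd) for l in l_tots]
        denom = sum(x * x for x in f)
        if denom == 0.0:
            continue
        dmax = sum(x * y for x, y in zip(f, shifts)) / denom
        rss = sum((y - dmax * x) ** 2 for x, y in zip(f, shifts))
        if best is None or rss < best[2]:
            best = (kd, dmax, rss)
    return best


# Noise-free synthetic titration (hypothetical values): Kd = 50 µM, delta_max = 0.2 ppm.
p_tot = 100.0
l_tots = [0.0, 25.0, 50.0, 100.0, 200.0, 400.0]
shifts = [0.2 * fraction_bound(p_tot, l, 50.0) for l in l_tots]
kd, dmax, rss = fit_kd(p_tot, l_tots, shifts, [float(k) for k in range(1, 501)])
print(kd, round(dmax, 3))  # 50.0 0.2
```

      With real data, the residual sum of squares together with an estimate of the per-point shift error is what an error propagation for Kd would start from.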

      Reviewer #2 (Recommendations for the authors):

      (1) I don't observe two overlapping spectra of Arrestin2 (1-393) +/- CLTC NTD in Supplementary Figure 1.

      As explained above, all the spectra are provided as scalable vector graphics. The overlapping spectra are visible when zoomed in.

      (2) I'd be tempted to move the discussion of class A and class B GPCRs and their presumed differences to the intro and then motivate the paper with specific questions.

      We appreciate the referee’s suggestion and had a similar idea previously. However, as we do not have data on other class-A or class-B receptors, we prefer not to motivate the entire manuscript by this question.

      Reviewer #3 (Recommendations for the authors): 

      (1) What happens with full-length arr2 (1-418) when the phospho-peptide pp6 is added to the reaction? It's unclear to me that 1-418 would behave the same as 1-393 because the arr2 tail of 1-393 is likely sufficiently mobile to accommodate binding to CLTC NTD. I suggest attempting this experiment for added rigor.

      We believe that there is a misunderstanding. The 1-393 and 1-418 constructs differ by the disordered C-terminal tail, which is not involved in the clathrin interaction with the arrestin2 376-380 (LIELD) residues. Accordingly, both 1-393 and 1-418 constructs show almost identical interactions with clathrin (Figure 2A and 2C). Moreover, the phospho-activated arrestin2<sup>1-393</sup> (Figure 2B) interacts identically with clathrin as inactive arrestin2<sup>1-393</sup> and inactive arrestin2<sup>1-418</sup>. We believe that this comparison is sufficient for the conclusion that arrestin activation does not play a role in arrestin-clathrin binding.

      (2) If the tags were moved to the N-terminus of the receptor and/or arr2, I wonder if the complex is as stable (Figure 4)? 

      We thank the referee for their suggestion. We have indeed attempted to perform our experiments using an N-terminally SNAP-tagged CCR5. Unfortunately, this construct did not express in HeLa cells, indicating that SNAP-CCR5 was either toxic or degraded. Moreover, as the lab is closing due to the retirement of the PI, we are not able to repeat these experiments with further differently positioned tags. We also refer to our answer above: the experiments with the antagonist [5P12]CCL5 provide a partial control.

      (3) A biochemical assay to measure receptor internalization, in addition to the cell biological approach (Figure 5), would add additional rigor to the study and conclusions.

      We tried to measure internalization using a biochemical approach. We tried to pull-down CCR5 from HeLa cells and assess arrestin binding. Unfortunately, even using different buffer conditions, we found that CCR5 was aggregating once solubilized from membranes, preventing us from doing this analysis. We had a similar problem when we exogenously expressed CCR5 in insect cells for purification purposes. We have long experience with CCR5, and this receptor is very aggregation-prone due to extended charged surfaces, which interact with the chemokines.

      As an alternative, and in support of the cellular immunofluorescence assays, we also attempted to obtain internalization data via FACS using a CCR5 surface antibody (CD195 Monoclonal Antibody eBioT21/8). CD195 recognizes the N-terminus of the receptor. Unfortunately, the presence of the chemokine ligand (~ 8 kDa) interferes with antibody binding, precluding the quantitative biochemical assessment of the arrestin2 mutants on the CCR5 internalization.

      For these reasons, we were particularly careful to quantify CCR5 internalization from the immunofluorescence microscopy data using colocalization coefficients as well as puncta counting (Figure 4+5).
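      Colocalization coefficients come in several flavors, and the text above does not name them, so the sketch below shows two common choices purely as an illustration: Pearson's correlation coefficient and Manders' M1 coefficient, computed on pixel intensities of two channels flattened into lists. The function names, the threshold and the toy intensities (imagined as CCR5 and arrestin2-YFP channels) are hypothetical, and real image analysis would add background subtraction and cell segmentation.

```python
import math


def pearson(ch1, ch2):
    """Pearson correlation coefficient between two equally sized pixel-intensity lists."""
    n = len(ch1)
    m1, m2 = sum(ch1) / n, sum(ch2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(ch1, ch2))
    var1 = sum((a - m1) ** 2 for a in ch1)
    var2 = sum((b - m2) ** 2 for b in ch2)
    return cov / math.sqrt(var1 * var2)


def manders_m1(ch1, ch2, threshold2):
    """Manders' M1: share of channel-1 intensity in pixels where channel 2 exceeds threshold2."""
    total = sum(ch1)
    coloc = sum(a for a, b in zip(ch1, ch2) if b > threshold2)
    return coloc / total


# Toy 2x2 images flattened to lists (hypothetical intensities).
ccr5 = [1.0, 2.0, 3.0, 4.0]
arr2 = [2.0, 4.0, 6.0, 8.0]
print(pearson(ccr5, arr2), manders_m1(ccr5, arr2, 5.0))  # 1.0 0.7
```

      Pearson's coefficient reports how linearly the two intensities co-vary, whereas M1 reports what share of one signal sits where the other is present, so the two answer slightly different questions.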

    1. eLife Assessment

      In this manuscript, Taujale et al. describe an interdisciplinary approach to mine the human channelome and further discover orthologues across diverse organisms. Further, this work provides evidence that supports a role for conserved residues in CALHM channel gating. Overall this important work presents findings that can be helpful to the ion channel community, as well as to those interested in improved methods for mining sequence space for their protein of interest. However, further validation of the improvements their approach shows over previous approaches is needed, making this a solid contribution to the literature in this field.

    2. Reviewer #1 (Public review):

      Summary:

      In the manuscript "Identification and classification of ion-channels across the tree of life: Insights into understudied CALHM channels" Taujale et al. describe an interdisciplinary approach to mine the human channelome and further discover orthologues across diverse organisms, culminating in delineating co-conserved patterns in an example ion channel: CALHM. Overall, this paper comes in two sections, one where 419 human ion channels and 48,000+ channels from diverse organisms are found through a multidisciplinary data mining approach, and a second where this data is used to find co-conserved sequences, whose functional significance is validated via experiments on CALHM1 and CALHM6. Overall, this is an intriguing data-first approach to better understand even understudied ion channels like CALHM6. However, more needs to be done to pull this story together into a single coherent narrative.

      Strengths:

      This manuscript takes advantage of modern-day LLM tools to better mine the literature for ion channel sequences in humans and other species with orthologous ion channel sequences. They explore the 'dark channelome' of understudied ion channels to better reveal the information evolution has to tell us about our own proteins, and illustrate the information this provides access to in experimental studies in the final section of the paper. Finally, they provide a wealth of information in the supplementary tables (in the form of Excel spreadsheets and a dataset on Zenodo) for others to explore. Overall, this is a creative approach to a wide-reaching problem that can be applied to other families of proteins.

      Weaknesses:

      Overall, while a considerable amount of work has been done for this manuscript, the presentation, both in terms of writing and figures, still can use more work even after a first round of revisions. While they have improved their discussion to more clearly describe the need for a better-curated sequence database of ion channels, and how existing resources fall short, some aspects of this process and the motivation remain unclear, especially when it comes to the CALHM sequences.

      Overall, this manuscript is a valuable contribution to the field, but requires a few main things to make it truly useful. Namely, how has this approach really improved their ability to identify conserved residues in CALHM over a less-involved approach? And better organization of the first results section of the paper, which is critical to the downstream understanding of the paper, as well as some cosmetic improvements.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors defined the "channelome," consisting of 419 predicted human ion channels as well as 48,000 ion channel orthologs from other organisms. Using this information, the ion channels were clustered into groups, which can potentially be used to make predictions about understudied ion channels in the groups. The authors then focused on the CALHM ion channel family, mutating conserved residues and assessing channel function.

      Strengths:

      The curation of the channelome provides an excellent resource for researchers studying ion channels. Supplemental Table 1 is well organized with an abundance of useful information.

      Comments on revisions:

      The authors have thoroughly addressed my concerns and the manuscript is substantially improved. I have just a few suggestions regarding wording/clarification.

      In Supplemental Figure 4, the Western blots (n=3) were quantitated, but the surface biotinylation was not. While I suppose that it is fine to just show one representative experiment for the biotinylation assay, the authors should indicate in the legend how many times this was done. It is essential to know whether these data in Supplemental Figure 4E, F are reproducible as they are absolutely critical for interpretation of all of the data in Figure 5.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewing Editor Comments:

      (A) Revisions related to the first part, regarding data mining and curation:

      (1) One question that arises with the part of the manuscript that discusses the identification and classification of ion channels is whether these will be made available to the wider public. For the 419 human sequences, making a small database to share this result so that these sequences can be easily searched and downloaded would be desirable. There are a variety of acceptable formats for this: GitHub/figshare/zenodo/university website that allows a wider community to access their hard work. Providing such a resource would greatly expand the impact of this paper. The same question can be asked of the 48,000+ ion channels from diverse organisms.

      We thank the reviewer for providing this important feedback. While the long-term plan is to provide access to these sequences and annotations through a knowledge base resource like Pharos, we agree that it would be beneficial to have these sequences made available with the manuscript as well. We have compiled 3 fasta files containing the following: 1) Full-length sequences for the curated 419 ion channel sequences. 2) Pore-containing domain sequences for the 343 pore domain-containing human ion channel sequences. 3) All the identified orthologs for the human ion channels.

      For each sequence in these files, we have extended the ID line to include the most pertinent annotation information to make it readily available. For example, the ID line >sp|P48995|TRPC1_HUMAN|TRP:VGIC--TRP-TRPC|pore-forming|dom:387-637 provides the classification, unit and domain bounds for the human TRPC1 in the fasta file itself.
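      For readers who want to use these annotations programmatically, the sketch below splits such an extended ID line on "|". The field order is inferred from the single TRPC1 example above and is an assumption, as is the reading of the dom: bounds as 1-based inclusive residue positions; the names ICHeader, parse_header and domain_sequence are hypothetical helpers, not part of the deposited dataset.

```python
from typing import NamedTuple, Optional, Tuple


class ICHeader(NamedTuple):
    db: str                            # e.g. "sp" (Swiss-Prot)
    accession: str                     # UniProt accession
    entry_name: str                    # UniProt entry name
    classification: str                # group/family string, kept verbatim
    unit: str                          # e.g. "pore-forming"
    domain: Optional[Tuple[int, int]]  # pore-domain bounds, if present


def parse_header(line: str) -> ICHeader:
    """Split an extended FASTA ID line on '|' (field order inferred from one example)."""
    fields = line.lstrip(">").strip().split("|")
    if len(fields) < 5:
        raise ValueError(f"unexpected header: {line!r}")
    db, accession, entry_name, classification, unit = fields[:5]
    domain = None
    for extra in fields[5:]:
        if extra.startswith("dom:"):
            start, end = extra[len("dom:"):].split("-")
            domain = (int(start), int(end))
    return ICHeader(db, accession, entry_name, classification, unit, domain)


def domain_sequence(seq: str, header: ICHeader) -> str:
    """Slice out the domain, assuming 1-based inclusive coordinates."""
    if header.domain is None:
        return seq
    start, end = header.domain
    return seq[start - 1:end]


h = parse_header(">sp|P48995|TRPC1_HUMAN|TRP:VGIC--TRP-TRPC|pore-forming|dom:387-637")
print(h.accession, h.unit, h.domain)  # P48995 pore-forming (387, 637)
```

      Keeping the classification string verbatim avoids guessing at its internal separators; splitting it further is left to whoever knows the intended hierarchy.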

      These files have been uploaded to Zenodo and are available for download with doi 10.5281/zenodo.16232527. We have included this in the Data Availability statement of the manuscript as well.

      (2) Regarding the 48,000+ sequences, what checks have been done to confirm that they all represent bona fide, full-length ion channel sequences? Uniprot contains a good deal of unreviewed sequences, especially from single-celled organisms. The process by which true orthologues were identified and extraneous hits discarded should be discussed in more detail, and all inclusion criteria should be described and justified, clearly illustrating that the risk of gene duplicates and fragments in this final set of ion channel orthologues has been avoided. Related to this, does this analysis include or exclude isoforms?

      We thank the reviewer for raising this important point. Our selection of curated proteomes and the KinOrtho pipeline for orthology detection together return reasonably reliable orthologous sequence sets. In brief, our database sequences are retrieved from full proteomes that only include proteins that are part of an official proteome release. Thus, they are mapped from a reference genome to ensure species-specific relevance and avoid redundancy. The >1500 proteomes in this analysis were selected based on their wide use in other orthology detection pipelines like OMA and InParanoid. Our orthology detection pipeline, KinOrtho, performs full-length and domain-based orthology detection, which ensures that the orthologous relationships are defined based on pore-domain sequence similarity. 

      But we agree with the reviewer that this might leave room for extraneous, fragmentary or misannotated sequences to be included in our results. Taking this into careful consideration, we have expanded our sequence validation pipeline to include additional checks, such as the UniProt entry type and protein existence evidence, as well as sequence-level checks such as compositional bias, non-standard amino-acid codes and sequence length. These validation steps are now described in detail in the Methods section under orthology analysis (lines 768-808). All the originally listed orthologous sequences passed this validation pipeline, which provides additional confidence that they are bona fide full-length ion channel sequences.
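      To make the sequence-level checks concrete, the sketch below flags non-standard residue letters, lengths far from the human reference and a crude compositional bias. It is an illustration only: the thresholds, the bias proxy (share of the single most frequent residue, a much simpler stand-in for dedicated low-complexity tools such as SEG) and the function name sequence_flags are hypothetical and are not the cut-offs described in the Methods.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def sequence_flags(seq, ref_len, min_frac=0.5, max_frac=2.0, max_residue_frac=0.3):
    """Return a list of warning flags for one ortholog sequence.

    ref_len is the length of the human reference channel; all thresholds are
    illustrative placeholders. Note that selenocysteine (U) is flagged as well.
    """
    seq = seq.upper()
    flags = []
    nonstandard = set(seq) - STANDARD_AA
    if nonstandard:
        flags.append("nonstandard:" + "".join(sorted(nonstandard)))
    ratio = len(seq) / ref_len
    if ratio < min_frac:
        flags.append("too_short")
    elif ratio > max_frac:
        flags.append("too_long")
    # Crude compositional-bias proxy: the most frequent residue's share of the sequence.
    if seq and max(seq.count(a) for a in set(seq)) / len(seq) > max_residue_frac:
        flags.append("compositional_bias")
    return flags


print(sequence_flags("ACDEFGHIKLMNPQRSTVWY", 20))  # []
print(sequence_flags("ACDXB", 100))  # ['nonstandard:BX', 'too_short']
```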

      We have also expanded this section (lines 758 – 766) to provide more details of the KinOrtho pipeline for orthology detection, which is a previously published method used for orthology detection in kinases by our lab.

      Finally, our orthology analysis excludes isoforms and only spans the primary canonical sequences that are part of the UniProt Proteomes annotated sequence set. The isoforms that are generally available in UniProt Proteomes in a separate file named *_additional.fasta were not included in this analysis.

      (3) The decision to show the families of ion channels in Figure 1 as pie charts within a UMAP embedding is intriguing but somewhat non-intuitive and difficult to understand. Illustrating these results with a standard tree-like visualization of the relationship of these channels to each other would be preferred.

      We appreciate the feedback provided by the reviewer, and understand that a standard tree-like visualization would be much easier to interpret and familiar than a bubble chart based on UMAP embeddings. However, we opted to use the bubble chart for the following reasons:

      Low sequence similarity: the 419 human ICs share very minimal sequence similarity, falling in the twilight zone or lower (Doolittle, 1992; PMID:1339026). Thus, traditional multiple sequence alignment and phylogenetic reconstruction methods perform very poorly and generate unreliable or even misleading results. To explore the practicality of this option, we performed a multiple sequence alignment of just 3 of the possibly related IC families as suggested by reviewer 2 (CALHM, Pannexins, and Connexins) using the state-of-the-art structure-based sequence alignment method FoldMason (doi: https://doi.org/10.1101/2024.08.01.606130). Even then, the sequence alignment and the resulting tree for just these 3 families were poor and unreliable, as illustrated in the attached Author response image 2.

      Protein embedding-based clustering: Newer LLM-based approaches, such as protein language model embeddings, offer ways to overcome these limitations by capturing sequence, structure, function and evolutionary properties in a high-dimensional space. We therefore employed such embeddings using DEDAL, followed by UMAP for dimensionality reduction, which preserves biologically meaningful local and global relationships.

      Abstraction at family level: In Figure 1, we aggregate individual channels into family bubbles with their positions representing the average UMAP coordinates of their members. This offers a balance between an intuitive view of how IC families are distributed in the embedding space and reflects potential functional and evolutionary proximities, while not being impeded by individual IC relationships across families.
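      The family-level aggregation described above amounts to averaging member coordinates per family. The sketch below shows that step on toy data; the channel and family labels are hypothetical, and returning the member count (for example to scale bubble size) is an assumption about how such a plot could be drawn rather than a description of Figure 1.

```python
from collections import defaultdict


def family_centroids(coords, families):
    """Average 2-D embedding coordinates per family.

    coords:   dict channel -> (x, y) UMAP coordinates
    families: dict channel -> family label
    Returns dict family -> (mean_x, mean_y, n_members).
    """
    acc = defaultdict(lambda: [0.0, 0.0, 0])
    for channel, (x, y) in coords.items():
        entry = acc[families[channel]]
        entry[0] += x
        entry[1] += y
        entry[2] += 1
    return {fam: (sx / n, sy / n, n) for fam, (sx, sy, n) in acc.items()}


coords = {"ch1": (0.0, 0.0), "ch2": (2.0, 4.0), "ch3": (5.0, 5.0)}
families = {"ch1": "famA", "ch2": "famA", "ch3": "famB"}
print(family_centroids(coords, families))
# {'famA': (1.0, 2.0, 2), 'famB': (5.0, 5.0, 1)}
```

      Because a centroid hides the spread of its members, a family whose members are scattered across the embedding can sit between clusters it does not belong to, which is worth keeping in mind when reading bubble positions.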

      We have revised the figure legend (lines 1221 – 1234) with additional description of the visualization and the process used to generate it, and the manuscript text (lines 248-270) provides the rationale behind the selection of this method.

      (4) A strength of this paper is the visualization of 'dark' ion channels. However, throughout the paper, this could be emphasized more as the key advantage of this approach and how this or similar approaches could be used for other families of proteins. Specifically, in the initial statement describing 'light' vs 'dark channels', the importance of this distinction and the historical preference in science to study that which has already been studied can be discussed more, even including references to other studies that take this kind of approach. An example of a relevant reference here is to the Structural Genomics Consortium and its goals to achieve structures of proteins for which functions may not be well-characterized. Clarifying these motivations throughout the entire paper would strengthen it considerably.

      We thank the reviewer for this constructive comment and agree that highlighting the strength of visualizing “dark” channels and prioritizing them for future studies would strengthen the paper. As suggested, we have revised the text throughout the paper (lines 84-89, 176-180) to contextualize and emphasize this distinction. We have also added a reference for the Structural Genomics Consortium, which, along with resources like IDG, has provided significant resources for prioritizing understudied proteins.

      (5) Since the authors have generated the UMAP visualization of the channelome, it would be interesting to understand how the human vs orthologue gene sets compare in this space.

      We appreciate the reviewer’s input. It is an interesting idea to explore the UMAP embedding space for the human ICs along with their orthologs. The large number of orthologous sequences (>37,000) would certainly impose a computational challenge to generate embeddings-based pairwise alignments across all of them. Downstream dimensionality reduction from such a large set and the subsequent visualization would also suffer from accuracy and interpretability concerns. However, to follow up on the reviewer’s comments, we selected orthologous sequences from a subset of 12 model organisms spanning all taxa (such as mouse, zebrafish, fruit fly, C. elegans, A. thaliana, S. cerevisiae, E. coli, etc.). This increased the number of sequences for analysis from 343 to 1094, which is still manageable for UMAP. Using the exact same method, we generated the UMAP embeddings plot for this set as shown below. 

      Author response image 1.

      UMAP embeddings of the human ICs alongside orthologs from 12 model organisms

      As shown above, we observed that each orthologous set forms tight, well-defined clusters, preserving local relationships among closely related sequences. For example, a large number of VGICs cluster more closely together compared to Supplementary Figure 1 (with only the human ICs). However, families that were previously distant from others now appear to be even more scattered or pushed further away, indicating a loss of global structure. This pattern suggests that while local distances are well preserved, the global topology of the embedding space could be compromised. Moreover, we find that the placement of ICs with respect to other families is highly sensitive to the parameter choices (e.g., n_neighbors and min_dist), an issue which we did not encounter when using only the human IC sequences. The inclusion of a large number of orthologous sequences that are highly similar to a single human IC but dissimilar to others skews the embedding space, emphasizing local structure at the expense of global relationships.

      Because UMAP and similar dimensionality reduction methods prioritize local over global structure, the resulting embeddings accurately reflect strong ortholog clustering but obscure broader interfamily relationships. Consequently, interpreting the spatial arrangement of human IC families relative to one another becomes unreliable. We have included this plot in this response so that interested readers can access it.

      (6) Figure 1 should say more clearly that this is an analysis of the human gene set and include more of the information in the text: 419 human ion channel sequences, 75 sequences previously unidentified, 4 major groups and 55 families, 62 outliers, etc. Clearer visualizations of these categories and numbers within the UMAP (and newly included tree) visualization would help guide the reader to better understand these results. Specifically, which are the 75 previously unidentified sequences?

      We thank the reviewer for the comments. To address this, we have revised Figure 1 and added more information, including a clear header stating that these are only human IC sets, the total number of ICs, and the number of ICs in each group. We have also included a new Supplementary Figure 2 and Supplementary Table 2, which show the overlap of IC sequences across the different resources. Supplementary Figure 2 is an UpSet plot that provides a snapshot of the overlap between the human ICs curated in this study and those in KEGG, GtoP, and Pharos. Supplementary Table 2 provides more detail on this overlap by listing, for each human IC, whether it is curated as an IC in each of the three IC annotation resources. We believe these additions provide all of the requested information, including the previously unidentified sequences added to this resource.

      (7) Overall, the manuscript needs to provide a clearer description of the need for a better-curated sequence database of ion channels, as well as how existing resources fall short.

      We thank the reviewer for pointing out this important gap in the description. As suggested, we have thoroughly revised the Introduction to address this comment. Specifically, we have added text describing existing sequence- and structure-level resources that currently provide details and/or classifications of human ion channels. We then highlight that these resources miss some characterized pore-containing ICs, do not include any information on auxiliary channels, and lack a holistic evolutionary perspective, which together motivate the need for a better-curated database of ion channels. Please refer to lines 57–63, 73–79, and 95–119 for these changes and additions.

      (8) Some of the analysis pipeline is unclear. Specifically, the RAG analysis seems critical, but it is unclear how this works - is it on top of the GPT framework and recursively inquires about the answer to prompts? Some example prompts would be useful to understand this.

      We thank the reviewer for highlighting this gap in the explanation. We recognize that the Methods and Supplementary Figure 1 did not sufficiently explain the pipeline and omitted some important details. The RAG pipeline combines vector-based retrieval with OpenAI’s GPT-4o model to systematically search the literature and generate evidence-based answers. The process is as follows:

      Literature sources (PubMed articles) relevant to the annotated ion channels were converted into vector representations stored in a Qdrant database.

      Queries constructed from the annotated IC dataset were submitted to the vector database, retrieving contextually relevant literature segments.

      Retrieved contexts served as inputs to the GPT-4o model, which produced structured JSON-formatted responses containing direct evidence regarding ion selectivity and gating mechanisms, along with associated confidence scores.
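      The three steps above can be illustrated with a minimal, self-contained sketch. It is not the authors' code: the Qdrant database is replaced by a hypothetical in-memory store of hand-made vectors, the GPT-4o call is replaced by a stub reply, and the prompt wording and JSON keys are assumptions rather than the template described in the revised Methods.

```python
import json
import math

# Hypothetical in-memory stand-in for the Qdrant collection:
# passage id -> (passage text, embedding). Vectors are invented 3-d toys.
STORE = {
    "passage_1": ("Text on the ion selectivity of channel X.", [0.9, 0.1, 0.0]),
    "passage_2": ("Text on voltage-dependent gating of channel X.", [0.7, 0.6, 0.1]),
    "passage_3": ("Unrelated text about a metabolic enzyme.", [0.0, 0.2, 0.9]),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def retrieve(query_vec, store, k=2):
    """Step 2: return ids of the k passages most similar to the query."""
    ranked = sorted(store, key=lambda pid: cosine(query_vec, store[pid][1]), reverse=True)
    return ranked[:k]


def build_prompt(gene, passage_ids, store):
    """Step 3 (input side): ground the question in the retrieved passages only."""
    context = "\n".join(f"[{pid}] {store[pid][0]}" for pid in passage_ids)
    return (
        f"Using only the context below, describe the ion selectivity and gating "
        f"of {gene}. Reply in JSON with keys ion_selectivity, gating, evidence, "
        f"confidence (0-1).\n\n{context}"
    )


def parse_answer(raw):
    """Step 3 (output side): parse the JSON reply and clamp confidence to [0, 1]."""
    answer = json.loads(raw)
    answer["confidence"] = min(max(float(answer["confidence"]), 0.0), 1.0)
    return answer


query_vec = [0.8, 0.4, 0.0]  # hypothetical embedding of a query about channel X
hits = retrieve(query_vec, STORE)
prompt = build_prompt("channel X", hits, STORE)
# A stub reply stands in for the GPT-4o call, which is not reproduced here.
reply = '{"ion_selectivity": "cation", "gating": "voltage", "evidence": ["passage_2"], "confidence": 1.3}'
answer = parse_answer(reply)
```

      With these toy vectors, retrieval ranks passage_2 ahead of passage_1 and drops the unrelated passage_3, so only relevant text reaches the prompt. Clamping the confidence score is an illustrative safeguard added here, not a documented step of the pipeline.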

      To clarify this further, we have rewritten the relevant subsection (lines 649–718), which now provides a detailed description of the RAG pipeline. We have also improved Supplementary Figure 1 to describe the pipeline more clearly and have provided an example prompt template to illustrate the query. These additions clarify how the pipeline functions and demonstrate its practical utility for IC annotation.

      (9) The existence of 76 auxiliary non-pore containing 'ion channel' genes in this analysis is a little confusing, as it seems a part of the pipeline is looking for pore-lining residues. Furthermore, how many of these are picked up in the larger orthologues search? Are these harder to perform checks on to ensure that they are indeed ion channel genes? A further discussion of the choice to include these auxiliary sequences would be relevant. This could just be further discussion of the literature that has decided to do this in the past.

      We thank the reviewer for this comment, and agree that further clarification of our selection and definition of auxiliary IC sequences would be helpful. As the reviewer has pointed out, one of the annotation pipeline steps is indeed looking for the pore-lining residues. Any sequences that do not have a pore-containing domain are then considered to be auxiliary, and we search for additional evidence of their binding with one of the annotated pore-containing ICs. If such evidence is not found in the literature, we remove them from our curated IC list. 
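      The inclusion rule described above can be summarized as a small decision function. This is a minimal sketch, not the authors' pipeline: the Candidate record, its fields, and the gene names are hypothetical, and the literature evidence of binding is reduced to a precomputed tuple of reported partners.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Candidate:
    """A candidate IC sequence; names and fields are hypothetical."""
    name: str
    has_pore_domain: bool  # outcome of the pore-lining residue search
    # Pore-containing ICs that the literature reports this protein binds to.
    reported_partners: Tuple[str, ...] = ()


def classify(candidate: Candidate, pore_ics: set) -> Optional[str]:
    """Apply the inclusion rule to one candidate."""
    if candidate.has_pore_domain:
        return "pore-containing"
    if any(partner in pore_ics for partner in candidate.reported_partners):
        return "auxiliary"
    return None  # no pore domain and no reported pore-containing partner: excluded


def curate(candidates: List[Candidate]) -> Dict[str, str]:
    """Keep pore-containing ICs plus non-pore proteins that bind one of them."""
    pore_ics = {c.name for c in candidates if c.has_pore_domain}
    labels = {c.name: classify(c, pore_ics) for c in candidates}
    return {name: label for name, label in labels.items() if label is not None}


candidates = [
    Candidate("PORE_A", True),
    Candidate("AUX_B", False, ("PORE_A",)),
    Candidate("ORPHAN_C", False, ("NOT_AN_IC",)),
]
curated = curate(candidates)
```

      In this toy input, AUX_B is retained as auxiliary because it binds the pore-containing PORE_A, whereas ORPHAN_C is removed because its only reported partner is not an annotated IC.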

      In response to the above comment, we have revised the manuscript text to provide these details. In the Introduction, we have added references to previous literature describing auxiliary ICs and have pointed out that existing ion channel resources do not account for such auxiliary channels (lines 73–79, 107–108, and 148–149). We have also expanded the Methods section to describe the selection and definition of auxiliary channels (lines 640–646).

      With regards to the orthology analysis, since auxiliary channels do not have a pore domain, and our orthology pipeline requires a pore domain similarity search and hit, we did not include them in this part of the analysis. We have clarified the text in the Results section to ensure this is communicated properly throughout the manuscript (lines 212-215, 260-263). 

      (10) Why are only evolutionary relationships between rat, mouse, and human shown in Figure 3A? These species are all close on the evolutionary timeline.

      We thank the reviewer for this comment. Figure 3A provided a high-level view of the evolutionary relationships among the six human CALHM members as context for the pattern-based Bayesian analysis. However, since this analysis is based on a wider set of orthologs spanning many taxa, we agree that a larger tree including more orthologs is warranted.

      We have now revised Figure 3A to show an expanded tree of 83 orthologs of all six human CALHM members from 14 organisms across different taxa, including mammals, fishes, birds, nematodes, and cnidarians. The overall structure of the tree remains consistent with two major clades, as before, with CALHM1 and CALHM3 in the first clade and CALHM2, 4, 5, and 6 in the second, with good branch support.

      (B) Revisions related to the second part, regarding the analysis of CALHM channel mutations:

      (1) It would strengthen the manuscript if it included additional discussion and references to show that previous methods to analyze conserved residues in CALHM were significantly lacking. What results would previous methods give, and why was this not enough? Were there just not enough identified CALHM orthologues to give strong signals in conservation analysis? Also, the amino acid conservation between CLHM-1 and CALHM1 is extremely low. Thus, there are other CALHM orthologs that give strong signals in conservation analysis. There are ~6 papers that perform in-depth analysis of the role of conserved residues in the gating of CALHM channels (human and C. elegans) that were not cited (Ma et al, Am J Physiol Cell Physiol, 2025; Syrjanen et al, Nat Commun, 2023; Danielli et al, EMBO J, 2023; Kwon et al, Mol Cells, 2021; Tanis et al, Am J Physiol Cell Physiol, 2017; Tanis et al, J Neurosci, 2013; Ma et al, PNAS, 2013) - these data need to be discussed in the context of the present work.

      We thank the reviewer for the comment and agree that these are excellent studies that have advanced understanding of conserved residues in CALHM gating. While their analyses compared a limited set of sequences, focusing on residues conserved in specific CALHM homologs or species like C. elegans, our analysis encompasses thousands of sequences across the entire CALHM family, allowing us to identify residues conserved across all family members over evolution. We also coupled this sequence analysis with hypotheses derived from our published structural studies (Choi et al., Nature, 2019), which highlighted the NTH/S1 region as a critical element in channel gating. Based on this, we focused on evolutionarily conserved residues in the S1–S2 linker and at the interface of S1 with the rest of the TMD, reasoning that if S1 movement is essential for gating, these two structural elements (acting as a hinge and stabilizing interface, respectively) would be key determinants of the conformational dynamics of S1. These regions have been largely overlooked in previous studies. As a result, the residues highlighted in our study do not overlap with those previously reported but instead provide complementary insights into gating mechanisms in this unique channel family. Together, our study and the published literature suggest that many regions and residues in CALHM proteins are critical for gating: while some are conserved across the entire family evolutionarily, others appear conserved only within certain species or subfamilies.

      To address the reviewer’s comment and to highlight the points mentioned above, we have added a brief discussion of these studies and the relevant citations to the revised manuscript (lines 378–385 and 563–576).

      (2) Whereas the current-voltage relations for WT channels are clearly displayed, the data that is shown for the mutants does not allow for determining if their gating properties are indeed different than WT.

      First, the current amplitudes for the mutants were quantified at just one voltage, which makes it impossible to determine if their voltage-dependence was different than WT, which would be a strong indicator for an effect in gating. Current-voltage relations as done for the WT channels should be included for at least some key mutations, which should include additional relevant controls like the use of Gd3+ as an inhibitor to rule out the contribution of some endogenous currents.

      We thank the reviewer for this comment. To address this, we performed additional experiments using a multi-step pulse protocol to obtain current-voltage relations for WT CALHM1, CALHM1(I109W), WT CALHM6, and CALHM6(W113A). Our initial two-step protocol (−80 mV and +120 mV) covers both the physiological voltage range and the extended range commonly used in biophysical characterization of ion channels. Most mutants did not activate even within this broad range; we therefore performed full I–V analysis, as suggested, on the three mutants that did show substantial activation. In all groups, currents activated at 37 °C were significantly inhibited by Gd<sup>3+</sup>, consistent with published reports (Ma et al., AJP 2025; Danielli et al., EMBO J 2023; Syrjänen et al., Nat Commun 2023). Notably, although the CALHM6(Y51A) mutation did not significantly alter current amplitudes at positive membrane potentials, it markedly reduced currents at negative potentials, rendering the channel outwardly rectifying and altering its voltage dependence. These new data are incorporated into Figure 5 (panels A–O) and discussed in the manuscript. Figure 5 now also shows current amplitudes at both +120 mV and −80 mV in 0 mM Ca<sup>2+</sup> at 37 °C to facilitate direct comparison between WT and mutants. The previous data at 5 mM Ca<sup>2+</sup> and 0 mM Ca<sup>2+</sup> at 22 °C have been moved to Supplementary Figure 5, as requested.

      Second, it is unclear whether the three experimental conditions (5 mM Ca<sup>2+</sup>, and 0 Ca<sup>2+</sup>, at 22 and 37 °C) were measured in the same cell in each experiment, or if they represent different experiments. This should be clarified. If measurements at each condition were done in the same experiment, direct comparison between the three conditions within each individual experiment could further help identify mutations with altered gating.

      We thank the reviewer for pointing this out and apologize for the confusion. All three conditions (5 mM Ca<sup>2+</sup> at 22 °C, 0 mM Ca<sup>2+</sup> at 22 °C, and 0 mM Ca<sup>2+</sup> at 37 °C) were sequentially measured in the same cell within each experiment. The currents were then averaged across cells and plotted for each group.

      Third, in line 334, the authors state that "expression levels of wild-type proteins and mutants are comparable." However, Western blots showing CALHM protein abundance (Supplementary Fig. 3) are not of acceptable quality; in the top blot, WT CALHM1 appears too dim, representative blots were not shown for all mutants, and individual data points should be included on the group data quantitation of the blots, together with a statistical test comparing mutants with the WT control.

      We thank the reviewer for the comment and agree that representative blots were not shown for all mutants. Supplementary Figure 4 (previously Supplementary Figure 3) has been updated to include representative blots for all mutants, individual data points in the quantification, and statistical tests comparing each mutant to the WT control.

      A more serious concern is that the total protein quantitation is not very informative about the functional impact of mutations in ion channels, because mutations can severely impact channel localization in the plasma membrane without reducing the total protein that is translated. In mammalian cells, CALHM6 is localized to intracellular compartments and only translocates to the plasma membrane in response to an activating stimulus (Danielli et al, EMBO J, 2023). Thus, if CALHM6 is only intracellular, the protein amount would not change, but the measured current would. Abundant intracellular CALHM1 has also been observed in mammalian cells transfected with this protein (Dreses-Werringloer et al., Cell, 2008). Quantitation of surface-biotinylated channels would provide information on whether there are differences between the constructs in relation to surface expression rather than gating. An alternative approach to biotinylation would be to express GFP-tagged constructs in Xenopus oocytes and look for surface expression. This is what has been done in previous CALHM channel studies.

      Without evidence for the absence of defects in localization or clear alterations in gating properties, it is not possible to conclude whether mutant channels have altered activity. Does the analysis of sequences provide any testable hypotheses about substitutions with different side chains at the same position in the sequence?

      We thank the reviewer for this very important comment. We agree that total protein levels alone do not distinguish between intracellular retention and proper trafficking to the plasma membrane. To address this, we performed surface biotinylation assays for all WT and mutant CALHM1 and CALHM6 constructs to assess their plasma membrane localization. The results show that mutants have either comparable or substantially higher surface expression levels than WT, consistent with the Western blot data. Together, these findings support our original interpretation that the observed differences in electrophysiological currents are not due to trafficking defects but reflect functional effects. These new data are presented in Supplementary Figure 5.

      (3) Line 303 - 13 aligned amino acids were conserved across all CALHM homologs - are these also aligned in related connexin and pannexin families? It is likely that cysteines and proline in TM2 are since CALHM channels overall share a lot of similarities with connexins and pannexins (Siebert et al, JBC, 2013). As in line 207, it would be expected that pannexins, connexins, and CALHM channel families would group together. Related to this, see Line 406 - in connexins, there is also a proline kink in TM2 that may play a role in mediating conformational changes between channel states (Ri et al, Biophysical Journal, 1999). This should be discussed.

      We thank the reviewer for the suggestion. We attempted a structure-based sequence alignment of representative structures from all three families (CALHM, connexins, and pannexins), but the resulting alignments were of poor quality, with many gapped regions, making it very difficult to assess the similarities raised in this comment. This is expected: although CALHM channels, connexins, and pannexins are all considered “large-pore” channels, the TMD arrangement and conformation of CALHM channels are distinct from those of connexins and pannexins. Below, we include a snapshot of the alignment at the conserved cysteine regions of the CALHM homologs, along with the resulting tree, which has very low support values and fails to place the connexins properly, making it difficult to interpret.

      Author response image 2.

      Structure-based sequence alignment and phylogenetic analysis of available crystal structures of members of the CALHM, pannexin, and connexin families. Top: The resulting sequence alignment is very sparse and shows no conservation of residues in the TM regions. The CPC motif with conserved cysteines in the CALHM family is shown. Bottom: The phylogenetic tree based on this alignment has low support values, making it difficult to interpret.

      (4) Line 36 - This work does not have experimental evidence to show that the selected evolutionarily conserved residues alter gating functions.

      Our electrophysiology data demonstrate that the selected evolutionarily conserved residues have a major impact on CALHM1 and CALHM6 gating. As shown in Figure 5, mutations at these residues produce two distinct phenotypes: (1) nonconductive channels, and (2) altered voltage dependence, resulting in outward rectification. Importantly, these functional changes occur despite normal total expression and surface trafficking, as confirmed by Western blotting and surface biotinylation (Supplementary Figure 4). These findings indicate that the affected residues are critical for the conformational dynamics underlying channel gating rather than for protein expression or localization.

      (5) Lines 296–297 - This could also be put in the context of what we already know about CALHM gating. While all cryo-EM structures of CALHM channels are in the open state, we do understand some things about the gating mechanism (Tanis et al., Am J Physiol Cell Physiol, 2017; Ma et al., Am J Physiol Cell Physiol, 2025), with the NT modulating voltage dependence and stabilizing closed channel states, and the voltage-dependent gate being formed by proximal regions of TM1.

      Thank you for providing this suggestion. As suggested, we have revised the text to place our findings in the context of current knowledge about CALHM gating and have added the relevant citations (lines 370-373).

      (6) Lines 314-315 - Just because residues are conserved does not mean that they play a role in channel gating. These residues could also be important for structure, ion selectivity, etc.

      We agree that evolutionary conservation alone does not imply a role in gating. However, our hypothesis derives from the positioning of these conserved residues, and previous studies that have indicated the importance of the NTH/S1 region for channel gating function. More importantly, our electrophysiology data indicate that these conserved residues specifically impact channel gating in CALHM1 and CALHM6. We have revised the text in lines 404-406 to clarify this further.

      (7) Line 333 - while CALHM6 is less studied than CALHM1, there is knowledge of its function and gating properties. Should CALHM6 be considered a "dark" channel? The IDG development level in Pharos is Tbio. There have been multiple papers published on this channel (ex: Ebihara et al, J Exp Med, 2010; Kasamatsu et al, J Immunol 2014; Danielli et al, EMBO J, 2023).

      We thank the reviewer for noting this important discrepancy. We have updated the text and labels related to CALHM6 to reflect its status as Tbio in the manuscript.

      (8) Please cite Jeon et al., (Biochem Biophys Res Commun, 2021), who have already shown temperature-dependence of CALHM1.

      Thank you for the comment. We have added the citation.  

      (9) It would be helpful to have a schematic showing amino acid residues, TM domains, highlighted residues mutated, etc.

      Thank you for the suggestion. We have revised the figure and added labels for the TM domains, and highlighted the mutated residues.

      Reviewer #1 (Recommendations for the authors):

      (1) Why in the title is 'ion-channels' hyphenated but in the text it is not?

      This has been changed.

      (2) Line 78: 'Cryo-EM' is not defined before the acronym is used.

      This has been fixed.

      (3) Typo in line 519: KinOrthto.

      This has been fixed.

      (4) Capitalizing 'Tree of Life' is a bit strange in section 2 of the results and the Discussion.

      We have removed the capitalization as suggested.

      (5) In Figure 3 and Supplementary Figure 4A, the gene names in the tree are CAHM and not CALHM - I assume this is an error.

      This has been corrected to CALHM throughout.

      (6) Font sizes throughout all figures, with the exception of Figure 1, need to be more legible. The X-axis labels in Figure 2A are hard to read, for example (though I can see that there is also the CAHM/CALHM typo here...). A good rule of thumb is that they should be the same size as the manuscript text. Furthermore, the grey backgrounds of Figure 4 and Figure 5 are off-putting; just having a white background here should be sufficient.

      This has been addressed. We have increased the font size in all figures. The styling of Figures 4 and 5 has also been made consistent with the other figures.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 36 - This work does not have experimental evidence to show that the selected evolutionarily conserved residues alter gating functions.

      Addressed above in our response to comment (4) of Part B (revisions related to the analysis of CALHM channel mutations).

      (2) Line 168 - should also be Supplemental Table 1.

      This has been addressed.

      (3) Line 170 - 419 human ion channel sequences were identified and this was an increase of 75 sequences over previous number. Which 75 proteins are these?

      This is now shown in Supplementary Figure 2 and Supplementary Table 2. Supplementary Figure 2 is an UpSet plot showing the number of sequences that overlap across databases and the novel sequences added in this study. The figure of 75 refers specifically to sequences not included in Pharos, which we used as the reference because it lists the most ICs of all the resources compared. In addition, Supplementary Table 2 lists each individual IC and whether it is present in each of the three databases compared.
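      As a minimal sketch of how the per-IC membership columns of a table like Supplementary Table 2 relate to the intersection counts of an UpSet plot and to a "not in Pharos" count, the example below uses plain Python sets. The identifiers and resource contents are hypothetical placeholders, not the actual KEGG, GtoP, or Pharos annotations.

```python
from collections import Counter

# Hypothetical identifiers; the real curated set contains 419 human ICs.
curated = {"IC1", "IC2", "IC3", "IC4", "IC5"}
resources = {
    "KEGG": {"IC1", "IC2"},
    "GtoP": {"IC1", "IC2", "IC3"},
    "Pharos": {"IC1", "IC2", "IC3", "IC4"},
}

# One row per curated IC: is it annotated as an IC in each resource?
membership = {
    ic: {name: ic in members for name, members in resources.items()}
    for ic in sorted(curated)
}

# Curated ICs absent from the reference resource (Pharos in this response).
not_in_pharos = sorted(ic for ic in curated if ic not in resources["Pharos"])

# UpSet-style counts: how many ICs share each exact membership pattern.
upset_counts = Counter(
    tuple(name for name, present in row.items() if present)
    for row in membership.values()
)
```

      Here, IC5 is present in none of the resources, so it would fall in the "not in Pharos" group and in the empty-pattern bar of the UpSet plot, analogous to the novel sequences added in this study.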

      (4) Line 289 - Ca2+ (not Ca); other similar mistakes throughout the manuscript

      These have been fixed.

      (5) Line 291-292 - Please include more about functions for CALHM channels; ex. CALHM1 regulates cortical neuron excitability (Ma et al, PNAS 2012), CLHM-1 regulates locomotion and induces neurodegeneration in C. elegans (Tanis et al. Journal of Neuroscience 2013); see above for references on CALHM6 function.

      We have added the functions as suggested.

      (6) Lines 296–297 - This could also be put in the context of what we already know about CALHM gating. While all cryo-EM structures of CALHM channels are in the open state, we do understand some things about the gating mechanism (Tanis et al., Am J Physiol Cell Physiol, 2017; Ma et al., Am J Physiol Cell Physiol, 2025), with the NT modulating voltage dependence and stabilizing closed channel states, and the voltage-dependent gate being formed by proximal regions of TM1.

      Addressed above in our response to comment (5) of Part B (revisions related to the analysis of CALHM channel mutations).

      (7) Lines 314-315 - Just because residues are conserved does not mean that they play a role in channel gating. These residues could also be important for structure, ion selectivity, etc.

      Addressed above in our response to comment (6) of Part B (revisions related to the analysis of CALHM channel mutations).

      (8) Line 333 - While CALHM6 is less studied than CALHM1, there is knowledge of its function and gating properties. Should CALHM6 be considered a "dark" channel? The IDG development level in Pharos is Tbio. There have been multiple papers published on this channel (ex: Ebihara et al, J Exp Med, 2010; Kasamatsu et al, J Immunol 2014; Danielli et al, EMBO J, 2023).

      Addressed above in our response to comment (7) of Part B (revisions related to the analysis of CALHM channel mutations).

      (9) Line 627 - Do you mean that 5 mM CaCl2 was replaced with 5 mM EGTA in 0 Ca2+ solution?

      This is correct.  

      (10) Why are only evolutionary relationships between rat, mouse, and human shown in Figure 3A? These species are all close on the evolutionary timeline.

      Addressed above in our response to comment (10) of Part A (revisions related to data mining and curation).

      (11) Figure 5 - no need to show the currents at room temperature in the main text since there are robust currents at 37 degrees; this could go into the supplement. Also, please cite Jeon et al. (Biochem Biophys Res Commun, 2021), who have already shown temperature-dependence of CALHM1.

      Addressed above in our responses to comments (2) and (8) of Part B (revisions related to the analysis of CALHM channel mutations).

      (12) It would be helpful to have a schematic showing amino acid residues, TM domains, highlighted residues mutated etc.

      Addressed above in our response to comment (9) of Part B (revisions related to the analysis of CALHM channel mutations).

      (13) Use of S1-S4 to refer to the transmembrane "segments" is not standard; rather, TM1-TM4 would generally be used to refer to transmembrane domains.

      We have used the S1–S4 helix notation to maintain consistency with the nomenclature employed in our previous study (Choi et al., Nature, 2019).

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] The major limitation of the manuscript lies in the framing and interpretation of the results, and therefore the evaluation of novelty. The authors claim an important and unique role of beliefs-of-other-pain in altruistic behavior and empathy for pain. The problem is that these experiments mainly show that behaviors sometimes associated with empathy-for-pain can be cognitively modulated by changing prior beliefs. To support the notion that effects are indeed related to pain processing generally or empathy for pain specifically, a similar manipulation, done for instance on beliefs about the happiness of others, before recording behavioural estimation of other people's happiness, should have been performed. If such a belief-about-something-other-than-pain had led to similar results, in terms of behavioural outcome and in terms of TPJ and MFG recapitulating the pattern of behavioral responses, we would know that the results reflect changes of beliefs more generally. Only if the results are specific to a pain-empathy task would there be evidence to associate the results with pain specifically. But even then, it would remain unclear whether the effects truly relate to empathy for pain, or whether they may reflect other routes of processing pain.

      We thank Reviewer #1 for these comments and suggestions regarding the specificity of belief effects on brain activity involved in empathy for pain. Our paper reported six behavioral/EEG/fMRI experiments that tested the effects of beliefs of others’ pain on empathy and monetary donation (an empathy-related altruistic behavior). We showed not only behavioral but also neuroimaging results that consistently support the hypothesis that beliefs of others' pain play a functional role in modulating empathy (based on both subjective and objective measures, as clarified in the revision) and altruistic behavior. We agree with Reviewer #1 that it is important to address whether the belief effect is specific to the neural underpinnings of empathy for pain or generalizes to neural responses to other facial expressions, such as happy expressions. To address this issue, we conducted an additional EEG experiment, as suggested by Reviewer #1 (an experiment that could be completed within the limited time available under the current circumstances). This new EEG experiment tested (1) whether beliefs about the authenticity of others’ happiness influence brain responses to perceived happy expressions, and (2) whether beliefs about happiness modulate neural responses to happy expressions in the same P2 time window in which beliefs of pain modulated ERPs.

      Our behavioral results in this experiment (reported as Supplementary Experiment 1 in the revision) showed that participants reported weaker feelings of happiness when viewing actors who simulated smiling than when viewing awardees who smiled because they had won awards (see the figure below). Our ERP results in Supplementary Experiment 1 further showed that a lack of belief in the authenticity of others’ happiness (e.g., actors simulating happy expressions vs. awardees smiling and showing happy expressions because they had won an award) reduced the amplitude of a long-latency positive component (P570) over the frontal region in response to happy expressions. These findings suggest that (1) belief effects on subjective feelings and brain activity in response to facial expressions may be general; (2) beliefs of others' pain or happiness affect neural responses to facial expressions in different time windows after face onset; and (3) the modulation of P2 amplitude by beliefs of pain may not generalize to belief effects on neural responses to every emotional state of others. We reported the results of this new ERP experiment in the revision as Supplementary Experiment 1 and also discussed the specificity of the modulation of empathic neural responses by beliefs of others' pain in the revised Discussion (pages 49–50).

      Supplementary Experiment Figure 1. EEG results of Supplementary Experiment 1. (a) Mean rating scores of happy intensity related to happy and neutral expressions of faces with awardee or actor/actress identities. (b) ERPs to faces with awardee or actor/actress identities at the frontal electrodes. The voltage topography shows the scalp distribution of the P570 amplitude with the maximum over the central/parietal region. (c) Mean differential P570 amplitudes to happy versus neutral expressions of faces with awardee or actor/actress identities. The voltage topographies illustrate the scalp distribution of the P570 difference waves to happy (vs. neutral) expressions of faces with awardee or actor/actress identities, respectively. Shown are group means (large dots), standard deviation (bars), measures of each individual participant (small dots), and distribution (violin shape) in (a) and (c).

      In the revised Introduction, we cited additional literature to explain the concept of empathy, behavioral and neuroimaging measures of empathy, and how, as in previous research, we studied empathy for others' pain using subjective (self-reports) and objective (brain responses) estimates of empathy (pages 6-7). In particular, we noted that subjective estimation of empathy for pain depends on collecting self-reports of others' pain and of one's own painful feelings when viewing others' suffering. Objective estimation of empathy for pain relies on recording brain activity (using fMRI, EEG, etc.) that differs in response to painful versus non-painful stimuli applied to others. fMRI studies revealed greater activations in the ACC, AI, and sensorimotor cortices in response to painful than to non-painful stimuli applied to others. EEG studies showed that event-related potentials (ERPs) to perceived painful stimulations applied to others' body parts differentiated between painful and neutral stimuli over the frontal region as early as 140 ms after stimulus onset (Fan and Han, 2008; see Coll, 2018 for review). Moreover, the mean ERP amplitudes at 140–180 ms predicted subjective reports of others' pain and one's own unpleasantness. Of particular relevance to the current study, previous research showed that pain compared to neutral expressions increased the amplitude of the frontal P2 component at 128–188 ms after stimulus onset (Sheng and Han, 2012; Sheng et al., 2013; 2016; Han et al., 2016; Li and Han, 2019), and that P2 amplitudes in response to others' pain expressions positively predicted subjective feelings of one's own unpleasantness induced by others' pain and self-reported empathy traits (e.g., Sheng and Han, 2012).
These brain imaging findings indicate that brain responses to others' pain can (1) differentiate others' painful or non-painful emotional states to support understanding of others' pain and (2) predict subjective feelings of others' pain and one's own unpleasantness induced by others' pain to support sharing of others' painful feelings. These findings provide effective subjective and objective measures of empathy that were used in the current study to investigate neural mechanisms underlying modulation of empathy and altruism by beliefs of others’ pain.
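      For readers less familiar with the ERP measure referred to above, a pain-versus-neutral effect is typically quantified as the mean voltage within a fixed latency window (e.g., 128–188 ms for the P2) in the pain condition minus the same quantity in the neutral condition. The sketch below is a minimal illustration under stated assumptions: already-averaged ERPs stored as channels × samples NumPy arrays, a hypothetical 500 Hz sampling rate, and synthetic data. The function names are hypothetical and this is not the authors' analysis pipeline.

```python
import numpy as np


def mean_window_amplitude(erp, times_ms, start_ms, end_ms):
    """Mean voltage of an averaged ERP (channels x samples) within [start_ms, end_ms]."""
    mask = (times_ms >= start_ms) & (times_ms <= end_ms)
    return erp[..., mask].mean(axis=-1)


def differential_amplitude(erp_pain, erp_neutral, times_ms, start_ms=128, end_ms=188):
    """Pain-minus-neutral mean amplitude per channel (default window: the P2 window cited above)."""
    return (mean_window_amplitude(erp_pain, times_ms, start_ms, end_ms)
            - mean_window_amplitude(erp_neutral, times_ms, start_ms, end_ms))


# Hypothetical example: 500 Hz sampling, epoch from -200 to 798 ms, 3 frontal channels.
fs = 500
times_ms = np.arange(-200, 800, 1000 / fs)
rng = np.random.default_rng(0)
neutral = rng.normal(0.0, 0.5, size=(3, times_ms.size))
pain = neutral.copy()
# Synthetic 2 microvolt enhancement confined to the P2 window.
pain[:, (times_ms >= 128) & (times_ms <= 188)] += 2.0
diff = differential_amplitude(pain, neutral, times_ms)
```

      In practice such differential amplitudes would then be averaged over the electrodes of interest and compared across belief conditions; the window limits should follow the ones reported for each component.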

      In addition, we followed Reviewer #1's suggestion to conduct VPS analyses, which examined specifically how neural activity in the empathy-related regions identified in previous research (Krishnan et al., 2016, eLife) was modulated by beliefs of others' pain. The results (page 40) provide further evidence for our hypothesis. We also report new RSA results (page 39) showing that activity in brain regions supporting affective sharing (e.g., insula), sensorimotor resonance (e.g., post-central gyrus), and emotion regulation (e.g., lateral frontal cortex) provides intermediate mechanisms underlying the modulation of subjective feelings of others' pain intensity by a lack of BOP. We believe that, taken together, these results provide consistent evidence that empathy and altruistic behavior are modulated by BOP.

      Reviewer #2 (Public Review):

      [...] 1. In laying out their hypotheses, the authors write, "The current work tested the hypothesis that BOP provides a fundamental cognitive basis of empathy and altruistic behavior by modulating brain activity in response to others' pain. Specifically, we tested predictions that weakening BOP inhibits altruistic behavior by decreasing empathy and its underlying brain activity whereas enhancing BOP may produce opposite effects on empathy and altruistic behavior." While I'm a little dubious regarding the enhancement effects (see below), a supporting assumption here seems to be that at baseline, we expect that painful expressions reflect real pain experience. To that end, it might be helpful to ground some of the introduction in what we know about the perception of painful expressions (e.g., how rapidly/automatically is pain detected, do we preferentially attend to pain vs. other emotions, etc.).

      Thanks for this suggestion! We included additional details about previous findings on the processing of painful expressions in the revised Introduction (pages 7-8). Specifically, we introduced fMRI and ERP studies of pain expressions that revealed the brain structures involved in, and the time course of, neural responses to others' pain (vs. neutral) expressions. Moreover, neural responses to others' pain (vs. neutral) expressions were associated with self-reports of others' feelings, indicating that brain activity induced by pain expressions plays a functional role in empathy for pain.

      1. For me, the key takeaway from this manuscript was that our assessment of and response to painful expressions is contextually-sensitive - specifically, to information reflecting whether or not targets are actually in pain. As the authors state it, "Our behavioral and neuroimaging results revealed critical functional roles of BOP in modulations of the perception-emotion-behavior reactivity by showing how BOP predicted and affected empathy/empathic brain activity and monetary donations. Our findings provide evidence that BOP constitutes a fundamental cognitive basis for empathy and altruistic behavior in humans." In other words, pain might be an incredibly socially salient signal, but it's still easily overridden from the top down provided relevant contextual information - you won't empathize with something that isn't there. While I think this hypothesis is well-supported by the data, it's also backed by a pretty healthy literature on contextual influences on pain judgments (including in clinical contexts) that I think the authors might want to consider referencing (here are just a few that come to mind: Craig et al., 2010; Twigg et al., 2015; Nicolardi et al., 2020; Martel et al., 2008; Riva et al., 2015; Hampton et al., 2018; Prkachin & Rocha, 2010; Cui et al., 2016).

      Thanks for this great suggestion! Accordingly, we added a paragraph to the revised Discussion on how social contexts influence empathy and cited the studies mentioned here (pages 46-47).

      1. I had a few questions regarding the stimuli the authors used across these experiments. First, just to confirm, these targets were posing (e.g., not experiencing) pain, correct? Second, the authors refer to counterbalancing assignment of these stimuli to condition within the various experiments. Was target gender balanced across groups in this counterbalancing scheme? (e.g., in Experiment 1, if 8 targets were revealed to be actors/actresses in Round 2, were 4 female and 4 male?) Third, were these stimuli selected at random from a larger set, or based on specific criteria (e.g., normed ratings of intensity, believability, specificity of expression, etc.?) If so, it would be helpful to provide these details for each experiment.

      We are happy to clarify these questions. First, the photos of faces with pain or neutral expressions were adopted from previous work (Sheng and Han, 2012). The photos were taken of models who were posing, not experiencing, pain. They were taken and selected based on explicit criteria for painful expressions (i.e., brow lowering, orbit tightening, and raising of the upper lip; Prkachin, 1992). In addition, the models' facial expressions were validated in independent samples of participants (see Sheng and Han, 2012). Second, target gender was balanced across groups in this counterbalancing scheme. We also analyzed empathy rating scores and monetary donations related to male and female target faces and did not find any significant gender effect (see our response to Point 5 below). Third, because the face stimuli were adopted from previous work and the models' facial expressions had been validated in independent samples of participants with respect to specificity of expression, pain intensity, etc. (Sheng and Han, 2012), we did not repeat these validations with our participants. Most importantly, we counterbalanced the assignment of stimuli to conditions (e.g., patient vs. actor/actress conditions) so that, across the participants in each experiment, the same stimuli appeared in the different conditions. This design excluded any potential confound arising from the stimuli themselves.
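      The counterbalancing logic described above can be illustrated with a small sketch. It assumes, purely for illustration, 8 female and 8 male faces (hypothetical labels F1–F8 and M1–M8) and two participant groups that receive mirror-image assignments; the actual stimulus counts and group structure are those given in the Methods, not these.

```python
def counterbalance(female_faces, male_faces):
    """Split each gender set in half and mirror the assignment across two groups.

    Group A sees the first halves as 'patient' and the second halves as 'actor';
    group B sees the reverse. Across groups, every face appears in both conditions,
    and each condition holds equal numbers of female and male faces.
    """
    if len(female_faces) % 2 or len(male_faces) % 2:
        raise ValueError("each gender set must contain an even number of faces")
    kf, km = len(female_faces) // 2, len(male_faces) // 2
    f1, f2 = female_faces[:kf], female_faces[kf:]
    m1, m2 = male_faces[:km], male_faces[km:]
    group_a = {"patient": f1 + m1, "actor": f2 + m2}
    group_b = {"patient": f2 + m2, "actor": f1 + m1}
    return group_a, group_b


# Hypothetical stimulus labels.
females = [f"F{i}" for i in range(1, 9)]
males = [f"M{i}" for i in range(1, 9)]
group_a, group_b = counterbalance(females, males)
```

      Because each face is a "patient" for one group and an "actor/actress" for the other, any stimulus-specific property averages out of the condition contrast when the groups are pooled.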

      1. The nature of the charitable donation (particularly in Experiment 1) could be clarified. I couldn't tell if the same charity was being referenced in Rounds 1 and 2, and if there were multiple charities in Round 2 (one for the patients and one for the actors).

      Thanks for this comment! In both Rounds 1 and 2, the participants were informed that the amount of one of their decisions would be selected at random and donated to one of the patients through the same charity organization (we clarified this in the revised Method section, pages 55-56). We also made clear in the revision that, after all experiments of this study were completed, the total amount of the participants' donations was given to a charity organization to help patients suffering from the same disease.

      1. I'm also having a hard time understanding the authors' prediction that targets revealed to truly be patients in the 2nd round will be associated with enhanced BOP/altruism/etc. (as they state it: "By contrast, reconfirming patient identities enhanced the coupling between perceived pain expressions of faces and the painful emotional states of face owners and thus increased BOP.") They aren't in any additional pain than they were before, and at the outset of the task, there was no reason to believe that they weren't suffering from this painful condition - therefore I don't see why a second mention of their pain status should increase empathy/giving/etc. It seems likely that this is a contrast effect driven by the actor/actress targets. See the Recommendations for the Authors for specific suggestions regarding potential control experiments. (I'll note that the enhancement effect in Experiment 2 seems more sensible - here, the participant learns that treatment was ineffective, which may be painful in and of itself.)

      Thanks for comments on this important point! Indeed, our results showed that reconfirming patient identities in Experiment 1, or noting the failure of medical treatment related to target faces in Experiment 2, increased rating scores of others' pain and own unpleasantness and prompted more monetary donations to target faces. The increased empathy ratings and monetary donations might arise because repeatedly confirming patient identity or learning of the failure of medical treatment increased the belief in the authenticity of targets' pain and thus enhanced empathy. However, repeatedly confirming patient identity or learning of the failure of medical treatment might also activate other emotional responses to target faces, such as pity or helplessness, which might influence altruistic decisions as well. We agree with Reviewer #2 that, although our subjective estimates of empathy in Exp. 1 and 2 suggested enhanced empathy in the 2nd_round test, there are alternative interpretations of these results that should be clarified in future work. We clarified these points in the revised Discussion (pages 41-42).

      1. I noted that in the Methods for Experiment 3, the authors stated "We recruited only male participants to exclude potential effects of gender difference in empathic neural responses." This approach continues through the rest of the studies. This raises a few questions. Are there gender differences in the first two studies (which recruited both male and female participants)? Moreover, are the authors not concerned about target gender effects? (Since, as far as I can tell, all studies use both male and female targets, which would mean that in Experiments 3 and on, half the targets are same-gender as the participants and the other half are other-gender.) Other work suggests that there are indeed effects of target gender on the recognition of painful expressions (Riva et al., 2011).

      Thanks for raising this interesting question! To address it, we reanalyzed the data from Exp. 1, including participant gender or face gender as an independent variable. Three-way ANOVAs of pain intensity scores and monetary donations with Face Gender (female vs. male targets) × Test Phase (1st vs. 2nd_round) × Belief Change (patient-identity change vs. patient-identity repetition) did not show any significant three-way interaction (F(1,59) = 0.432 and 0.436, p = 0.514 and 0.512, ηp² = 0.007 and 0.007, 90% CI = (0, 0.079) and (0, 0.079)), indicating that face gender did not influence the results (see the figure below). Similarly, three-way ANOVAs with Participant Gender (female vs. male participants) × Test Phase × Belief Change did not show any significant three-way interaction (F(1,58) = 0.121 and 1.586, p = 0.729 and 0.213, ηp² = 0.002 and 0.027, 90% CI = (0, 0.055) and (0, 0.124)), indicating that participant gender did not reliably modulate the belief effects on empathy and donation. It therefore seems that the belief effects on our measures of empathy and altruistic behavior were not sensitive to the gender of the empathy targets or of the participants.

      Figure legend: (a) Scores of pain intensity and amount of monetary donations are reported separately for male and female target faces. (b) Scores of pain intensity and amount of monetary donations are reported separately for male and female participants.
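      As a quick consistency check on the statistics reported above, partial eta squared can be recovered from an F statistic and its degrees of freedom, ηp² = F·df1 / (F·df1 + df2), and the p value from the upper tail of the F distribution. The sketch below uses only the Python standard library, evaluating the regularized incomplete beta function with a standard Lentz continued fraction. The function names are hypothetical, and the 90% confidence intervals, which require the noncentral F distribution, are not reproduced here.

```python
import math


def partial_eta_squared(f, df1, df2):
    """Partial eta squared from an F statistic: F*df1 / (F*df1 + df2)."""
    return f * df1 / (f * df1 + df2)


def _beta_cf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_sf(f, df1, df2):
    """Upper-tail p value of the F distribution: I_{df2/(df2+df1*F)}(df2/2, df1/2)."""
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


# Values reported above (three-way interactions).
checks = [(0.432, 1, 59), (0.436, 1, 59), (0.121, 1, 58), (1.586, 1, 58)]
results = [(round(partial_eta_squared(*c), 3), round(f_sf(*c), 3)) for c in checks]
```

      The convergence test is applied after both continued-fraction terms of each iteration, so the last multiplicative update `d * c` is what decides when to stop.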

      1. I was a little unclear on the motivation for Experiment 4. The authors state "If BOP rather than other processes was necessary for the modulation of empathic neural responses in Experiment 3, the same manipulation procedure to assign different face identities that do not change BOP should change the P2 amplitudes in response to pain expressions." What "other processes" are they referring to? As far as I could tell, the upshot of this study was just to demonstrate that differences in empathy for pain were not a mere consequence of assignment to social groups (e.g., the groups must have some relevance for pain experience). While the data are clear and as predicted, I'm not sure this was an alternate hypothesis that I would have suggested or that needs disconfirming.

      Thanks for this comment! We apologize for not making the research question of Exp. 4 clear. In the revised Results section (pages 27-28), we clarified that the learning and EEG recording procedures in Experiment 3 involved multiple processes, including learning, memory, identity recognition, and assignment to social groups. The results of Experiment 3 left open the question of whether these processes, even without the BOP changes induced through them, would be sufficient to modulate the P2 amplitude in response to pain (vs. neutral) expressions of faces with different identities. In Experiment 4 we addressed this issue using the same learning and identity recognition procedures as in Experiment 3, except that the participants in Experiment 4 learned and recognized the identities of faces belonging to two baseball teams, with no prior difference in BOP between the faces of the two teams. If the processes involved in the learning and recognition procedures, rather than the difference in BOP, were sufficient to modulate the P2 amplitude in response to pain (vs. neutral) expressions, we would expect similar P2 modulations in Experiments 4 and 3. If instead the difference in BOP produced during the learning procedure was necessary for the modulation of empathic neural responses, we would not expect modulation of the P2 amplitude in response to pain (vs. neutral) expressions in Experiment 4. We believe that the goal and rationale of Exp. 4 are now clear.

    1. Author Response:

      We thank the editors and the reviewers for their careful reading and rigorous evaluation of our manuscript. We thank them for their positive comments and constructive feedback, which led us to add further lines of evidence in support of our central hypothesis that intrinsic neuronal resonance could stabilize heterogeneous grid-cell networks through targeted suppression of low-frequency perturbations. In the revised manuscript, we have added a physiologically rooted mechanistic model for intrinsic neuronal resonance, introduced through a slow negative feedback loop. We show that stabilization of patterned neural activity in a heterogeneous continuous attractor network (CAN) model could be achieved with this resonating neuronal model. These new results establish the generality of the stabilizing role of neuronal resonance in a manner independent of how resonance was introduced. More importantly, by specifically manipulating the feedback time constant in the neural dynamics, we establish the critical role of the slow kinetics of the negative feedback loop in stabilizing network function. These results provide additional direct lines of evidence for our hypothesis on the stabilizing role of resonance in the CAN model employed here. Intuitively, we envisage intrinsic neuronal resonance as a specific cellular-scale instance of a negative feedback loop. The negative feedback loop is a well-established network motif that acts as a stabilizing agent and suppresses the impact of internal and external perturbations in engineering applications and biological networks.

      Reviewer #1 (Public Review):

      The authors succeed in conveying a clear and concise description of how intrinsic heterogeneity affects continuous attractor models. The main claim, namely that resonant neurons could stabilize grid-cell patterns in medial entorhinal cortex, is striking.

      We thank the reviewer for their time and effort in evaluating our manuscript, and for their rigorous evaluation and positive comments on our study.

      I am intrigued by the use of a nonlinear filter composed of the product of s with its temporal derivative raised to an exponent. Why this particular choice? Or, to be more specific, would a linear bandpass filter not have served the same purpose?

      Please note that the exponent was merely a mechanism to effectively tune the resonance frequency of the resonating neuron. In the revised manuscript, we have introduced a new physiologically rooted means to introduce intrinsic neuronal resonance, thereby confirming that network stabilization achieved was independent of the formulation employed to achieve resonance.

      The magnitude spectra are subtracted and then normalized by a sum. I have slight misgivings about the normalization, but I am more worried that, as no specific formula is given, some MATLAB function has been used. What bothers me a bit is that, depending on how the spectrogram/periodogram is computed (in particular, averaged over windows), one would naturally expect lower frequency components to be more variable. But this excess variability at low frequencies is a major point in the paper.

      We have now provided the specific formula employed for normalization as equation (16) of the revised manuscript. We also note that this normalization was performed to account for potential differences in the maximum values of the homogeneous vs. heterogeneous spectra. The details are provided in the Methods subsection “Quantitative analysis of grid cell temporal activity in the spectral domain” of the revised manuscript. Please note that we computed the spectrum of the entire activity pattern, not a periodogram or a scalogram. No tiling of the time-frequency plane was involved, which eliminates any potential influence of windowing parameters on these computations.
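      Because equation (16) itself is not reproduced in this response, the sketch below only illustrates the general procedure described above under an explicit assumption: one FFT over the entire activity trace (no windowing or averaging), with the difference between the heterogeneous and homogeneous magnitude spectra divided by the summed magnitudes of both spectra over all frequencies. This normalization is a hypothetical stand-in, not the manuscript's formula. The synthetic traces are chosen so that the "heterogeneous" trace differs only by an added 0.2 Hz component.

```python
import numpy as np


def magnitude_spectrum(x, fs):
    """One-sided magnitude spectrum of the whole trace (single FFT, no windowing)."""
    x = np.asarray(x, dtype=float)
    mag = np.abs(np.fft.rfft(x - x.mean()))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return freqs, mag


def normalized_difference(x_hom, x_het, fs):
    """ASSUMED form: (|S_het(f)| - |S_hom(f)|) / sum over f of (|S_het| + |S_hom|)."""
    freqs, a_hom = magnitude_spectrum(x_hom, fs)
    _, a_het = magnitude_spectrum(x_het, fs)
    return freqs, (a_het - a_hom) / np.sum(a_het + a_hom)


# Synthetic traces from identically placed neurons: 100 Hz sampling, 50 s.
fs = 100
t = np.arange(5000) / fs
hom = np.sin(2 * np.pi * 5.0 * t)
het = hom + 0.5 * np.sin(2 * np.pi * 0.2 * t)  # extra low-frequency component
freqs, nd = normalized_difference(hom, het, fs)
```

      With integer numbers of cycles there is no spectral leakage, so the only nonzero difference sits in the 0.2 Hz bin; the shared 5 Hz component cancels.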

      In addition to using variances of normalized differences to quantify spectral distributions, we have also independently employed octave-based analyses (which do not involve normalized differences) to strengthen our claims about the impact of heterogeneities and resonance on different frequency bands. These octave-based analyses also confirm our conclusions on the impact of heterogeneities and neuronal resonance on low-frequency components.
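      An octave-based comparison avoids normalized differences altogether: spectral power is simply summed within bands whose edges double in frequency. The following is a minimal sketch, assuming a whole-trace FFT, a hypothetical lowest band edge of 0.0625 Hz, and eight octaves; these band limits are illustrative and need not match those used in the manuscript.

```python
import numpy as np


def octave_band_power(x, fs, f_low=0.0625, n_octaves=8):
    """Summed one-sided power in bands [f_low*2**k, f_low*2**(k+1)), k = 0..n_octaves-1."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    edges = f_low * 2.0 ** np.arange(n_octaves + 1)
    bands = [power[(freqs >= lo) & (freqs < hi)].sum()
             for lo, hi in zip(edges[:-1], edges[1:])]
    return edges, np.array(bands)


# Same style of synthetic traces: the perturbation is confined to a low octave.
fs = 100
t = np.arange(5000) / fs
hom = np.sin(2 * np.pi * 5.0 * t)
het = hom + 0.5 * np.sin(2 * np.pi * 0.2 * t)
edges, p_hom = octave_band_power(hom, fs)
_, p_het = octave_band_power(het, fs)
```

      Here the 0.2 Hz perturbation falls in the second octave (0.125 to 0.25 Hz), while the octave containing 5 Hz (4 to 8 Hz) is identical for both traces.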

      Finally, we would like to emphasize that spectral computations were identical across networks, which were designed so that only one component differed between them. For instance, when introducing heterogeneities, all other parameters of the network (the specific trajectory, the seed values, the neural and network parameters, the connectivity, etc.) remained exactly the same, with the only difference confined to the heterogeneities. Spectral properties were computed with identical procedures from the activity of individual neurons in the two networks, and comparisons were made between identically placed neurons in the two networks. Taken together, given the multiple routes to quantifying spectral signatures, the experimental design, and the absence of any signal-specific tiling of the time-frequency plane, we argue that the impact of heterogeneities or resonators on low-frequency components is not an artifact of the analysis procedures.

      We thank the reviewer for raising this issue, as it helped us to elaborate on the analysis procedures employed in our study.

      Which brings me to the main thesis of the manuscript: given the observation of how heterogeneities increase the variability in the low temporal frequency components, the way resonant neurons stabilize grid patterns is by suppressing these same low frequency components.

      I am not entirely convinced that the observed correlation implies causality. The low temporal frequency spectra are an indirect reflection of the regularity or irregularity of the pattern formation on the network, induced by the fact that there is velocity coupling to the input and hence dynamics on the network. Heterogeneities will distort the pattern on the network, that is true, but it isn't clear how introducing a bandpass property in temporal frequency space affects spatial stability causally.

      Put it this way: imagine all neurons were true oscillators, only capable of oscillating at 8 Hz. If they were to synchronize within a bump, one will have the field blinking on and off. Nothing wrong with that, and it might be that such oscillatory pattern formation on the network might be more stable than non-oscillatory pattern formation (perhaps one could even demonstrate this mathematically, for equivalent parameter settings), but this kind of causality is not what is shown in the manuscript.

      The central hypothesis of our study was that intrinsic neuronal resonance could stabilize heterogeneous grid-cell networks through targeted suppression of low-frequency perturbations.

      In the revised manuscript, we present the following lines of evidence in support of this hypothesis (mentioned now in the first paragraph of the discussion section of the revised manuscript):

      1. Neural-circuit heterogeneities destabilized grid-patterned activity generation in a 2D CAN model (Figures 2–3).

      2. Neural-circuit heterogeneities predominantly introduced perturbations in the low-frequency components of neural activity (Figure 4).

      3. Targeted suppression of low-frequency components through phenomenological (Figure 5C) or through mechanistic (new Figure 9D) resonators resulted in stabilization of the heterogeneous CAN models (Figure 8 and new Figure 11). We note that the stabilization was achieved irrespective of the means employed to suppress low-frequency components: an activity-independent suppression of low frequencies (Figure 5) or an activity-dependent slow negative feedback loop (new Figure 9).

      4. Changing the feedback time constant τm in mechanistic resonators, without changes to neural gain or feedback strength, allowed us to control the specific range of frequencies that would be suppressed. Our analyses showed that a slow negative feedback loop, which results in targeted suppression of low-frequency components, was essential for stabilizing grid-patterned activity (new Figure 12). As the slow negative feedback loop and the resultant suppression of low frequencies mediate intrinsic resonance, these analyses provide important lines of evidence for the role of targeted suppression of low frequencies in stabilizing grid-patterned activity.

      5. We demonstrate that the incorporation of phenomenological (Figure 13A–C) or mechanistic (new Figure panels 13D–F) resonators specifically suppressed lower frequencies of activity in the 2D CAN model.

      6. Finally, the incorporation of resonance through a negative feedback loop allowed us to link our analyses to the well-established role of network motifs involving negative feedback loops in inducing stability and suppressing external/internal noise in engineering and biological systems. We envisage intrinsic neuronal resonance as a cellular-scale activity-dependent negative feedback mechanism, a specific instance of a well-established network motif that effectuates stability and suppresses perturbations across different networks (Savageau, 1974; Becskei and Serrano, 2000; Thattai and van Oudenaarden, 2001; Austin et al., 2006; Dublanche et al., 2006; Raj and van Oudenaarden, 2008; Lestas et al., 2010; Cheong et al., 2011; Voliotis et al., 2014). A detailed discussion on this important link to the stabilizing role of this network motif, with appropriate references to the literature, is included in the new discussion subsection “Slow negative feedback: Stability, noise suppression, and robustness”.
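      The link between slow negative feedback and low-frequency suppression in items 3, 4, and 6 can be made concrete with a generic linear rate unit. The sketch below assumes a hypothetical two-variable model, τ dr/dt = -r - g·m + I(t) and τm dm/dt = -m + r, whose gain is |H(f)| = |1 / (1 + iωτ + g / (1 + iωτm))| with ω = 2πf. The parameter values are illustrative, and this is not the mechanistic resonator of the revised manuscript; it only shows that a slow feedback variable (large τm) attenuates low frequencies and produces a band-pass peak, whereas a fast one does not.

```python
import numpy as np


def feedback_gain(freqs_hz, tau=0.010, tau_m=0.100, g=2.0):
    """|H(f)| for a hypothetical linear rate unit with negative feedback.

    tau   dr/dt = -r - g*m + I(t)
    tau_m dm/dt = -m + r
    => H(w) = 1 / (1 + i*w*tau + g / (1 + i*w*tau_m)), with w = 2*pi*f.
    """
    w = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float)
    h = 1.0 / (1.0 + 1j * w * tau + g / (1.0 + 1j * w * tau_m))
    return np.abs(h)


freqs = np.linspace(0.1, 50.0, 500)
slow = feedback_gain(freqs, tau_m=0.100)  # slow feedback: band-pass (resonance)
fast = feedback_gain(freqs, tau_m=0.001)  # fast feedback: low-pass
f_peak_slow = freqs[np.argmax(slow)]
f_peak_fast = freqs[np.argmax(fast)]
```

      With τm = 100 ms the gain peaks at several hertz and is strongly attenuated near 0 Hz, whereas with τm = 1 ms the same feedback strength lowers the gain across the band and the response stays low-pass. Only the feedback time constant differs between the two cases, mirroring the manipulation described in item 4.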

      We thank the reviewer for their detailed comments. These comments helped us introduce a more physiologically rooted mechanistic form of resonance, with which we were able to assess the impact of slow kinetics of negative feedback on network stability, thereby providing more direct lines of evidence for our hypothesis. This also allowed us to link resonance to a well-established stability motif: the negative feedback loop. We also note that our analyses do not employ resonance as a route to introducing oscillations in the network, but as a means for targeted suppression of low-frequency perturbations through a negative feedback loop. Given the strong quantitative links of negative feedback loops to introducing stability and suppressing the impact of perturbations in engineering applications and biological networks, we envisage intrinsic neuronal resonance as a stability-inducing cellular-scale activity-dependent negative feedback mechanism.

      Reviewer #2 (Public Review):

      [...] The pars construens demonstrates that similar networks, but comprised of units with different dynamical behavior, essentially amputated of their slowest components, do not suffer from the heterogeneities - they still produce grids. This part proceeds through 3 main steps: a) defining "resonator" units as model neurons with amputated low frequencies (Fig. 5); b) showing that inserted into the same homogeneous CAN network, "resonator" units produce the same grids as "integrator" units (Figs. 6,7); c) demonstrating that however the network with "resonator" units is resistant to heterogeneities (Fig. 8). Figs. 9 and 10 help understand what has produced the desired grid stabilization effect. This second part is on the whole also well structured, and its step c) is particularly convincing.

      We thank the reviewer for their time and effort in evaluating our manuscript, and for their rigorous evaluation and positive comments on our study.

      Step b) intends to show that nothing important changes, in grid pattern terms, if one replaces the standard firing rate units with the ad hoc defined units without low frequency behavior. The exact outcome of the manipulation is somewhat complex, as shown in Figs. 6 and 7, but it could be conceivably summed up by stating that grids remain stable, when low frequencies are removed. What is missing, however, is an exploration of whether the newly defined units, the "resonators", could produce grid patterns on their own, without the CAN arising from the interactions between units, just as a single-unit effect. I bet they could, because that is what happens in the adaptation model for the emergence of the grid pattern, which we have studied extensively over the years. Maybe with some changes here and there, but I believe the CAN can be disposed of entirely, except to produce a common alignment between units, as we have shown.

      Step a), finally, is the part of the study that I find certainly not wrong, but somewhat misleading. Not wrong, because what units to use in a model, and what to call them, is a legitimate arbitrary choice of the modelers. Somewhat misleading, because the term "resonator" evokes a more specific dynamical behavior than that obtained by inserting Eqs. (8)-(9) into Eq. (6), which amounts to a brute force amputation of the low frequencies, without any real resonance to speak of. Unsurprisingly, Fig. 5, which is very clear and useful, does not show any resonance, but just a smooth, broad band-pass behavior, which is, I stress legitimately, put there by hand. A very similar broad band-pass would result from incorporating into individual units a model of firing rate adaptation, which is why I believe the "resonator" units in this study would generate grid patterns, in principle, without any CAN.

      We thank the reviewer for these constructive comments and questions, as they were extremely helpful in (i) formulating a new model for rate-based resonating neurons that is more physiologically rooted; (ii) demonstrating the stabilizing role of resonance irrespective of model choices that implemented resonance; and (iii) mechanistically exploring the impact of targeted suppression of low frequency components in neural activity. We answer these comments of the reviewer in two parts, the first addressing other models for grid-patterned activity generation and the second addressing the reviewer’s comment on “brute force amputation of the low frequencies” in the resonator neuron presented in the previous version of our manuscript.

      I. Other models for grid-patterned activity generation.

      In the adaptation model (Kropff and Treves, 2008; Urdapilleta et al., 2017; Stella et al., 2020), adaptation in conjunction with place-cell inputs, Hebbian synaptic plasticity, and intrinsic plasticity (in gain and threshold) to implement competition are together sufficient for the emergence of the grid-patterned neural activity. However, the CAN model that we chose as the substrate for assessing the impact of neural circuit heterogeneities on functional stability is not equipped with the additional components (place-cell inputs, synaptic/intrinsic plasticity). Therefore, we note that decoupling the single unit (resonator or integrator) from the network does not yield grid-patterned activity.

      However, we do agree that a resonator neuron endowed with additional components from the adaptation model would be sufficient to elicit grid-patterned neural activity. This is especially clear with the newly introduced mechanistic model for resonance through a slow feedback loop (Figure 9). Specifically, resonating conductances such as HCN and M-type potassium channels can effectuate spike-frequency adaptation. One of the prominent channels that is implicated in introducing adaptation, the calcium-activated potassium channels implement a slow activity-dependent negative feedback loop through the slow calcium kinetics. Neural activity drives calcium influx, and the slow kinetics of the calcium along with the channel-activation kinetics drive a potassium current that completes a negative feedback loop that inhibits neural activity. Consistently, one of the earliest-reported forms of electrical resonance in cochlear hair cells was shown to be mediated by calcium-activated potassium channels (Crawford and Fettiplace, 1978, 1981; Fettiplace and Fuchs, 1999). Thus, adaptation realized as a slow negative-feedback loop, in conjunction with place-cell inputs and intrinsic/synaptic plasticity would elicit grid-patterned neural activity as demonstrated earlier (Kropff and Treves, 2008; Urdapilleta et al., 2017; Stella et al., 2020).

      There are several models for the emergence of grid-patterned activity, and resonance plays distinct roles (compared to the role proposed through our analyses) in some of these models (Giocomo et al., 2007; Kropff and Treves, 2008; Burak and Fiete, 2009; Burgess and O'Keefe, 2011; Giocomo et al., 2011b; Giocomo et al., 2011a; Navratilova et al., 2012; Pastoll et al., 2012; Couey et al., 2013; Domnisoru et al., 2013; Schmidt-Hieber and Hausser, 2013; Yoon et al., 2013; Schmidt-Hieber et al., 2017; Urdapilleta et al., 2017; Stella et al., 2020; Tukker et al., 2021). However, a common caveat across many of these models is that they assume homogeneous networks and do not account for the ubiquitous heterogeneities in neural circuits. Our goal in this study was to take a step towards addressing this caveat by understanding the impact of neural circuit heterogeneities on network stability. We chose the 2D CAN model for grid-patterned activity generation as the substrate for addressing this important yet under-explored question on the role of biological heterogeneities in network function. As we have mentioned in the discussion section, this choice implies that our conclusions are limited to the 2D CAN model for grid-patterned activity generation; these conclusions cannot be extrapolated to other networks or other models for grid-patterned activity generation without detailed analyses of the impact of neural circuit heterogeneities in those models. As our focus here was on the stabilizing role of resonance in heterogeneous neural networks, with the 2D CAN model as the substrate, we have not implemented the other models for grid-patterned activity generation. The impact of biological heterogeneities and resonance on each of these models should be independently addressed with systematic analyses similar to our analyses for the 2D CAN model.
As different models for grid-patterned activity generation are endowed with disparate dynamics and assign different roles to resonance, it is conceivable that biological heterogeneities and intrinsic neuronal resonance have different impacts on these models. We have mentioned this as a clear limitation of our analyses in the discussion section, also presenting future directions for associated analyses (subsection: "Future directions and considerations in model interpretation").

      II. Brute force amputation of the low frequencies in the resonator model.

      We completely agree with the reviewer's observation that the resonator model employed in the previous version of our manuscript was rather artificial, with its realization involving brute-force amputation of the lower frequencies. To address this concern, in the revised manuscript we constructed a new mechanistic model for single-neuron resonance that matches the dynamical behavior of physiological resonators. Specifically, we noted that physiological resonance is elicited by a slow activity-dependent negative feedback (Hutcheon and Yarom, 2000). To incorporate resonance into our rate-based model neurons, we mimicked this by introducing a slow negative feedback loop into our single-neuron dynamics (the motivations are elaborated in the new results subsection "Mechanistic model of neuronal intrinsic resonance: Incorporating a slow activity-dependent negative feedback loop"). The single-neuron dynamics of mechanistic resonators were defined as follows:

      [Equations defining the single-neuron dynamics of the mechanistic resonator; not reproduced in this text.]

      Here, S governed neuronal activity, m defined the feedback state variable, τ represented the integration time constant, Ie was the external current, and g represented feedback strength. The slow kinetics of the negative feedback were controlled by the feedback time constant (τm). In order to manifest resonance, τm > τ (Hutcheon and Yarom, 2000). The steady-state feedback kernel (m∞) of the negative feedback depends sigmoidally on the output of the neuron (S) and is defined by two parameters: half-maximal activity (S1/2) and slope (k). The single-neuron dynamics are elaborated in detail in the methods section (new subsection: Mechanistic model for introducing intrinsic resonance in rate-based neurons).
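      To illustrate how a slow negative feedback loop can produce a band-pass (resonant) response, the following is a minimal sketch, not the authors' implementation. It assumes a linearized version of the dynamics around an operating point: the sigmoidal m∞ is replaced by a unit-slope linear kernel and any output nonlinearity is dropped. The parameter values τ = 10 ms and g = 1 are hypothetical; only τm = 75 ms is taken from the text. The functions response_gain and resonance are illustrative names. The sketch computes the gain |S/Ie| to a sinusoidal input across frequencies; a gain peak at a nonzero frequency indicates resonance, and setting g = 0 recovers a pure low-pass integrator.

```python
import math


def response_gain(f_hz, tau=0.010, tau_m=0.075, g=1.0):
    """Gain |S/Ie| of a linearized rate neuron with a negative feedback loop.

    Hypothetical linearized dynamics (unit-slope kernel in place of m_inf):
        tau   dS/dt = -S + Ie - g * m
        tau_m dm/dt = -m + S
    In the frequency domain: S/Ie = 1 / (1 + j*w*tau + g / (1 + j*w*tau_m)).
    """
    w = 2.0 * math.pi * f_hz
    denom = 1.0 + 1j * w * tau + g / (1.0 + 1j * w * tau_m)
    return abs(1.0 / denom)


def resonance(tau=0.010, tau_m=0.075, g=1.0, f_max=50.0, df=0.05):
    """Return (peak frequency in Hz, peak gain / gain at 0 Hz) on a frequency grid."""
    n = int(round(f_max / df))
    freqs = [i * df for i in range(n + 1)]
    gains = [response_gain(f, tau, tau_m, g) for f in freqs]
    # Index of the largest gain; ties resolve to the lowest frequency.
    i_peak = max(range(len(gains)), key=gains.__getitem__)
    return freqs[i_peak], gains[i_peak] / gains[0]


if __name__ == "__main__":
    for tm in (0.001, 0.075):  # fast vs slow feedback
        fr, q = resonance(tau_m=tm)
        print(f"tau_m = {tm * 1e3:.0f} ms: peak at {fr:.2f} Hz, gain ratio {q:.2f}")
```

      Under these assumed values, the slow feedback (τm = 75 ms) yields a gain peak at a few hertz (roughly 7 to 8 Hz), while a fast feedback (τm = 1 ms) leaves the peak at the lowest frequency tested, i.e., a low-pass response. In this linear sketch the exact onset of resonance also depends on g, so it should not be read as a precise restatement of the τm > τ criterion.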

      We first demonstrate that the introduction of a slow negative feedback loop introduces resonance into single-neuron dynamics (new Figure 9D–E). We performed systematic sensitivity analyses of the parameters of the feedback loop and characterized the dependencies of intrinsic neuronal resonance on model parameters (new Figure 9F–I). We demonstrate that the incorporation of resonance through a negative feedback loop was able to generate grid-patterned activity in the 2D CAN model employed here, with clear dependencies on model parameters (new Figure 10; new Figure 10–Supplements 1–2). Next, we incorporated heterogeneities into the network and demonstrated that the introduction of resonance through a negative feedback loop stabilized grid-patterned activity generation in the heterogeneous 2D CAN model (new Figure 11).

      The mechanistic route to introducing resonance allowed us to probe the basis for the stabilization of grid-patterned activity more thoroughly. Specifically, with physiological resonators, resonance manifests only when the feedback loop is slow (new Figure 9I; Hutcheon and Yarom, 2000). This gave us an additional mechanistic handle to directly probe the role of resonance in stabilizing grid-patterned activity. We assessed the emergence of grid-patterned activity in heterogeneous CAN models constructed with neurons having different τm values (new Figure 12). Strikingly, we found that when the τm value was small (resulting in fast feedback loops), there was no stabilization of grid-patterned activity in the CAN model, especially with the highest degree of heterogeneities (new Figure 12). With progressive increases in τm, the patterns stabilized, with grid scores increasing at τm = 25 ms (new Figure 12) and beyond (new Figure 11B; τm = 75 ms). Finally, our spectral analyses comparing frequency components of homogeneous vs. heterogeneous resonator networks (new Figure 13D–F) showed the suppression of low-frequency perturbations in heterogeneous CAN networks.

      We gratefully thank the reviewer for raising the issue with the phenomenological resonator model. This allowed us to design the new resonator model and provide several new lines of evidence in support of our central hypothesis. The incorporation of resonance through a negative feedback loop also allowed us to link our analyses to the well-established role of network motifs involving negative feedback loops in inducing stability and suppressing external/internal noise in engineering and biological systems. We envisage intrinsic neuronal resonance as a cellular-scale activity-dependent negative feedback mechanism, a specific instance of a well-established network motif that effectuates stability and suppresses perturbations across different networks (Savageau, 1974; Becskei and Serrano, 2000; Thattai and van Oudenaarden, 2001; Austin et al., 2006; Dublanche et al., 2006; Raj and van Oudenaarden, 2008; Lestas et al., 2010; Cheong et al., 2011; Voliotis et al., 2014). A detailed discussion on this important link to the stabilizing role of this network motif, with appropriate references to the literature is included in the new discussion subsection “Slow negative feedback: Stability, noise suppression, and robustness”.

    1. eLife Assessment

      Antibodies that selectively bind distinct amyloid-beta variants are vital tools for Alzheimer's disease research. This valuable manuscript aims to delineate the epitope specificity in a panel of anti-amyloid-beta antibodies, including some with clinical relevance. The experiments were rigorously conducted, employing an interesting combination of established and state-of-the-art methodologies, yielding convincing findings.

    2. Reviewer #1 (Public review):

      The manuscript by Ivan et al aimed to identify epitopes on the Abeta peptide for a large set of anti-Abeta antibodies, including clinically relevant antibodies. The experimental work was well done and required a major experimental effort including peptide mutational scanning, affinity determinations, molecular dynamics simulations, IP-MS, WB and IHC. The first part of the work is focused on an assay in which peptides (15-18-mers) based on the human Abeta sequence, including some containing known PTMs, are immobilized, which prevents aggregation and therefore provides limited biologically relevant information. Even so, some results are in agreement with previous experimental structural data (e.g. for 3D6), and some responses to disease-associated mutations were different when compared to wild-type sequences (e.g. in the case of Aducanumab), which may have implications for personalized treatment. On the other hand, the contribution of conformation (as in oligomers and large aggregates) to antibody recognition patterns was taken into consideration in the second part of the study, in which both full-length Abeta in monomeric or aggregated forms and human CSF were employed to investigate the differential epitope interaction between Aducanumab, donanemab and lecanemab. Interestingly, these results confirmed the expected preference of these antibodies for aggregated Abeta. Overall, I understand that the work is of interest to the field.

      Comments on revisions:

      I have no additional recommendations.

    3. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review): 

      The manuscript by Ivan et al aimed to identify epitopes on the Abeta peptide for a large set of anti-Abeta antibodies, including clinically relevant antibodies. The experimental work was well done and required a major experimental effort, including peptide mutational scanning, affinity determinations, molecular dynamics simulations, IP-MS, WB, and IHC. Therefore, it is of clear interest to the field. The first part of the work is mainly based on an assay in which peptides (15-18-mers) based on the human Abeta sequence, including some containing known PTMs, are immobilized, thus preventing aggregation. Although some results are in agreement with previous experimental structural data (e.g. for 3D6), and some responses to disease-associated mutations were different when compared to wild-type sequences (e.g. in the case of Aducanumab), which may have implications for personalized treatment, I have concerns about the lack of consideration of the contribution of conformation (as in small oligomers and large aggregates) to antibody recognition patterns. The second part of the study used full-length Abeta in monomeric or aggregated forms to further investigate the differential epitope interaction between Aducanumab, Donanemab, and Lecanemab (Figures 5-7). Interestingly, these results confirmed the expected preference of these antibodies for aggregated Abeta, thus reinforcing my concerns about the conclusions drawn from the results obtained using shorter and immobilized forms of Abeta. Overall, I understand that the work is of interest to the field and should be published without the need for additional experimental data. However, I recommend a thorough revision of the structure of the manuscript in order to make it more focused on the results with the highest impact (second part).

      We thank the reviewer for highlighting this critical aspect. Our rationale for beginning with the high-resolution, aggregation-independent peptide microarray was to systematically dissect sequence requirements, including PTMs, truncations, and elongations, at single–amino acid resolution. This platform defines linear epitope preferences without the confounding influence of aggregation and enabled analyses that would not have been technically feasible with full-length Aβ. This rationale is now clarified in the Introduction (lines 72–77).

      At the same time, the physiological relevance of antibody binding can only be assessed in the context of aggregation. Prompted by the reviewer’s comments, we restructured the manuscript to foreground the full-length, aggregation-dependent data (Figures 5–7). These assays demonstrate that Aducanumab preferentially recognizes aggregated peptide over monomers and that pre-adsorption with fibrils, but not monomers, blocks tissue reactivity (lines 585–599; Fig. 5B). They also show that Lecanemab can capture soluble Aβ in CSF by IP-MS (lines 544–547; Fig. 4B, Fig. 6–Supplement 1), and that Donanemab strongly binds low-molecular-weight pyroGlu-Aβ while also recognizing highly aggregated Aβ1-42 (lines 668–684; Fig. 7).

      The revised Conclusion now explicitly states the complementarity of the two approaches: microarrays for precise sequence and modification mapping, and full-length aggregation assays for context and physiological relevance (lines 705–714).

      Finally, prompted by the reviewer’s feedback, we refined the discussion of therapeutic antibodies to move beyond a descriptive dataset and provide mechanistic clarity. Specifically, the dimerization-supported, valency-dependent binding mode of Aducanumab and the additional structural contributions required for Lecanemab binding to aggregated Aβ are now integrated into the reworked Conclusion (lines 725–741).

      Reviewer #2 (Public review):  

      This paper investigates binding epitopes of different anti-Abeta antibodies. Background information on the clinical outcome of some of the antibodies in the paper, which might be important for readers to know, is lacking. There are no references to clinical outcomes from antibodies that have been in clinical trials. This paper would be much more complete if the status of the antibodies were included. The binding characteristics of aducanumab, donanemab, and Lecanemab should be compared with data from clinical phase 3 studies. 

      Aducanumab was identified at Neurimmune in Switzerland and licensed to Biogen and Eisai. Aducanumab was retracted from the market due to a very high frequency of the side-effect amyloid-related imaging abnormalities-edema (ARIA-E). Gantenerumab was developed by Roche and had two failed phase 3 studies, mainly due to a high frequency of ARIA-E and low efficacy of Abeta clearance. Lecanemab was identified at Uppsala University, humanized by BioArctic, and licensed to Eisai, who performed the clinical studies. Eisai and Biogen are now marketing Lecanemab as Leqembi on the world market. Donanemab was developed by Eli Lilly and is sold in the US as Kisunla.

      We thank the reviewer for this valuable suggestion. In the revised manuscript, we have included a concise overview of the clinical status and outcomes of the therapeutic antibodies in the Introduction. This new section (lines 81–99) summarizes the origins, phase 3 trial outcomes, and current regulatory status of Aducanumab, Lecanemab, and Donanemab, as well as mentioning Gantenerumab as a comparator. Key aspects such as ARIA-E incidence, amyloid clearance efficacy, and regulatory decisions are now referenced to provide the necessary clinical context.

      These additions directly link our epitope mapping data with the clinical performance and safety profiles of the antibodies, thereby making the translational implications of our results clearer for both research and therapeutic applications.

      Limitations: 

      (1) Conclusions are based on Abeta antigens that may not be the primary targets for some conformational antibodies like aducanumab and Lecanemab. There is an absence of binding data for soluble aggregated species.

      We thank the reviewer for raising this important point. To address the absence of data on soluble aggregated species, we added IP-MS experiments using pooled human CSF as a physiologically relevant source of endogenous Aβ. Lecanemab enriched several endogenous soluble Aβ variants (Aβ1–40, Aβ1–38, Aβ1–37, Aβ1–39, and Aβ1–42), whereas Aducanumab did not yield detectable signals (Figure 4B; lines 544–547). These results directly distinguish between synthetic and patient-derived Aβ and highlight Lecanemab’s capacity to capture soluble Aβ species under biologically relevant conditions.

      (2) Quality controls and characterization of different Abeta species are missing. The authors need to verify if monomers remain monomeric in the blocking studies for Figures 5 and 6. 

      We thank the reviewer for this comment. In Figure 5 we show that pre-adsorption with monomeric Aβ1–42 does not prevent Aducanumab binding, whereas fibrillar Aβ1–42 completely abolishes staining, consistent with Aducanumab’s avidity-driven preference for higher-order aggregates.

      For Lecanemab (Figure 6), we observed a partial preference for aggregated Aβ1–42 over HFIP-treated monomeric and low-n oligomeric forms. We note, as now stated in the revised manuscript (lines 622–623), that monomeric preparations may partially re-aggregate under blocking conditions, which represents an inherent limitation of such experiments.

      To further address this, we performed additional blocking experiments using shorter Aβ peptides, which are less prone to aggregation. These peptides did not block immunohistochemical staining (Figure 6 – Supplement 1), underscoring that both epitope length and conformational state contribute to Lecanemab binding. This conclusion is also consistent with recent data presented at AAIC 2023.

      (3) The authors should discuss the limitations of studying synthetic Abeta species and how aggregation might hide or reveal different epitopes. 

      We thank the reviewer for this important comment. We now explicitly discuss the limitations of using synthetic Aβ peptides, including that aggregation state can mask or expose epitopes in ways that differ from endogenous species. This discussion has been added in the revised manuscript (lines 737–742).

      As noted in our replies to Points (2) and (4) here, and to Reviewer #1, we addressed this experimentally by complementing the high-resolution, aggregation-independent mapping with blocking studies using aggregated and monomeric Aβ preparations, and by validating key findings with IP-MS of human CSF as a physiologically relevant source of soluble Aβ. Together, these complementary approaches mitigate the limitations of synthetic peptides and provide a more comprehensive picture of antibody–Aβ interactions.

      (4) The authors should elaborate on the differences between synthetic Abeta and patient-derived Abeta. There is a potential for different epitopes to be available.

      We thank the reviewer for this comment. In the revised manuscript we now discuss how comparisons between synthetic and patient-derived Aβ species reveal additional, likely conformational epitopes that are not accessible in short or monomeric synthetic forms. To address this directly, we performed IP-MS with pooled human CSF. Lecanemab enriched a diverse set of endogenous soluble Aβ1–X species (Aβ1–40, Aβ1–38, Aβ1–37, Aβ1–39, and Aβ1–42), whereas Aducanumab did not yield measurable pull-down (Figure 4B; lines 544–547). These results emphasize that patient-derived Aβ displays distinct aggregation dynamics and epitope accessibility.

      We have expanded on this point in the Conclusion (lines 737–742), underscoring the importance of integrating both synthetic and native Aβ sources to capture the full range of antibody targets.

      Reviewer #1 (Recommendations for the authors): 

      This revision should prioritize the presentation of results obtained using the full-length Abeta peptide, given its more direct relevance to expected antibody recognition patterns in physiological contexts, and discuss the evidence for using synthetic Abeta. 

      We thank the reviewer for this recommendation. The revised manuscript now places stronger emphasis on results obtained with full-length Aβ peptides, particularly in Figures 5–7, which analyze binding preferences across monomeric, oligomeric, and fibrillar states (lines 585–599, 609–623, 668–684). We also expanded the Discussion to outline both the rationale and the limitations of using synthetic Aβ. The microarray approach provides high-resolution, aggregation-independent sequence and modification mapping, but must be complemented by experiments with full-length Aβ1–42 under physiologically relevant conditions, such as IP-MS from CSF (lines 544–547) and blocking in IHC (lines 585–599, 622–623, 684), to capture conformational epitopes and validate functional relevance.

      Figure 6: Please review/better explain the following statement: "Lecanemab recognized Aβ1-40, Aβ1-42, Aβ3-40, Aβ-3-40 and phosphorylated pSer8-Aβ1-40 on CIEF-immunoassay and Bicine-Tris SDS-PAGE/Western blot, indicating that the Lecanemab epitope is located in the N-terminal region of the Aβ sequence". Is it possible that N-truncated peptides do not form aggregates as efficiently as (or form aggregates conformationally distinct from) full-length ones?

      In the revised text we now clarify that Lecanemab recognized Aβ1-40, Aβ1-42, Aβ3-40, Aβ-3-40, and phosphorylated pSer8-Aβ1-40 on CIEF-immunoassay (Figure 6A; lines 612–619) and Bicine-Tris SDS-PAGE/Western blot (Figure 6C; lines 639–640). In contrast, shorter N-truncated variants such as Aβ4-40 and Aβ5-40 did not generate detectable signals under the tested conditions. This is consistent with our initial microarray data (Figure 1), which indicated that Lecanemab binding depends on residues 3–7 of the N-terminus.

      On gradient Bis-Tris SDS-PAGE/Western blot, Lecanemab showed a partial but not exclusive preference for aggregated Aβ1-42 over monomeric or low-n oligomeric forms in the HFIP-treated preparation (Figure 6B; lines 632–633). Immunohistochemical detection of Aβ deposits in AD brain sections was efficiently blocked by pre-adsorption with monomerized, oligomeric, or fibrillar Aβ1-42 (Figure 6E; lines 643–645), but not by shorter synthetic peptides such as Aβ1-16, Aβ1-34, or Aβ1-38 (Figure 6 – Supplement 1; lines 654–663).

      We also note, as now stated in the Results, that re-aggregation of HFIP-treated Aβ1-42 monomers during incubation cannot be entirely excluded (lines 622–623). Taken together, these experiments indicate that both N-terminal sequence length and conformational context are critical for Lecanemab binding, and that truncated peptides may indeed fail to reproduce the aggregate-associated conformations required for full recognition.

      Reviewer #2 (Recommendations for the authors): 

      Introduction: 

      (1) Include examples of Lecanemab, donanemab, and gantenerumab, along with relevant references. 

      We expanded the clinical-context paragraph that already covers Aducanumab, Lecanemab, and Donanemab (lines 81–96) and added Gantenerumab. 

      (2) Address why gantenerumab was not included in the study. 

      Due to the focus of our current study on antibodies with recently approved or late-stage clinical use (Aducanumab, Donanemab, Lecanemab), Gantenerumab was not included. 

      (3) Table 1: Correct the reference for Lecanemab, should be reference 44. 

      Table 1 has been updated to correct the Lecanemab reference.

      (4) Line 84: Add Uppsala University and Eisai alongside Biogen for Lecanemab. 

      Line 84 has been revised to acknowledge Uppsala University and Eisai alongside Biogen for the development of Lecanemab (lines 90–96).

      (5) Line 539: Include the reference: "Lecanemab, Aducanumab, and Gantenerumab - Binding Profiles to Different Forms of Amyloid-Beta Might Explain Efficacy and Side Effects in Clinical Trials for Alzheimer's Disease. doi: 10.1007/s13311-022-01308-6. 

      We thank the reviewer for drawing attention to this important reference (now cited as Ref. 83), which provides a state-of-the-art comparison of the binding profiles of Lecanemab, Aducanumab, and Gantenerumab; it is now properly incorporated into our manuscript.

      (6) Line 657-659: State that the findings are also applicable to Lecanemab. 

      Discrepancies between analysis of the short synthetic fragments and the full-length Abeta are now resolved for Aducanumab and Lecanemab and put into context in the results section and the conclusion lines 725-740. 

      (7) Figures 5 and 6: Discuss how to ensure that monomers remain monomers under the study conditions, considering the aggregation-prone nature of Abeta1-42. This aggregation could impact Lecanemab's binding to "monomers." To our knowledge, Lecanemab does not bind to monomers. The binding properties observed diverge from previously described properties for Lecanemab. Explore reasons for these discrepancies and suggest conducting complementary experiments using a solution-based assay, as per Söderberg et al, 2023. In Figure 6, note that Lecanemab is strongly avidity-driven, potentially causing densely packed monomers to expose Abeta as aggregated, affecting binding interpretation on SDS-PAGE. 

      We thank the reviewer for this important point. In the revised Results and Discussion we explicitly note that HFIP-treated Aβ1–42 monomers may partially re-aggregate during incubation, which cannot be fully excluded (lines 622–623).

      To complement these data, we show that Lecanemab successfully enriched soluble endogenous Aβ species (Aβ1–40, Aβ1–38, Aβ1–37, Aβ1–39, and Aβ1–42) in IP-MS from pooled CSF (lines 544–547; Fig. 4B), demonstrating its ability to bind soluble Aβ under physiologically relevant conditions.

      We also now cite the Söderberg et al. (2023, PMID: 36253511) study, which reported weak but detectable binding of Lecanemab to monomeric Aβ (their Fig. 1 and Table 6). This supports our interpretation that Lecanemab is aggregation-sensitive rather than strictly aggregation-dependent, in contrast to Aducanumab.

      To further address sequence and conformational contributions, we performed blocking experiments with shorter, non-HFIP-treated Aβ peptides (Aβ1–16, Aβ1–34, Aβ1–38). These peptides did not block Lecanemab staining in IHC (lines 654–657; Fig. 6 – Supplement 1), indicating that both extended sequence and conformational context are necessary for recognition.

      Finally, our findings are in line with preliminary data by Yamauchi et al. (AAIC 2023, DOI: 10.1002/alz.065104), who proposed that Lecanemab recognizes either a conformational epitope spanning the N-terminus and mid-region, or a structural change in the mid-region induced by the N-terminus.

    1. eLife Assessment

      This valuable work provides new insights into the role of lysine acetylation of alpha-synuclein, the protein involved in Parkinson's Disease. The evidence is mostly solid, but the claims around the potential disease relevance based on seeding assays and structural work need to be toned down, or else supported by additional experimental evidence. Overall, the work will be of interest to researchers in the fields of protein biophysics and post-translational modifications, as well as Parkinson's Disease.

    2. Reviewer #1 (Public review):

      Summary:

      This paper describes experiments with alpha-synuclein (aS) with acetylated lysines (acK) at various positions. Their findings on how to use non-canonical amino acid (ncAA) mutagenesis to generate aS with acetylated lysines are valuable. The paper then continues with a range of experiments to characterise the acetylated alpha-synuclein constructs at different positions, with the aim of providing insights into which sites are relevant to disease or their function inside cells. The paper concludes these experiments with the suggestion that inhibiting the Zn2+-dependent histone deacetylase HDAC8 to potentially increase acetylation at lysine 80 may have therapeutic benefit. However, the relevance of most of these experiments is unclear, mainly as the filaments that form from these constructs are different from those observed in human disease (but see below for more details). Moreover, using the recombinantly produced acetylated versions of alpha-synuclein to normalise mass-spectrometry data, the authors themselves report that acetylation of alpha-synuclein does not differ between individuals with Parkinson's disease or healthy controls.

      Strengths:

      The authors report difficulties with chemical synthesis, and then decide to make these constructs using non-canonical amino acid (ncAA) mutagenesis, which seems to work reasonably well (yields vary somewhat). In the Conclusion section, the authors report that they used these recombinant proteins to obtain quantitative insights into the levels of acetylation of lysines in individuals with PD versus healthy controls, for which they find no significant differences. This part of the work is valuable.

      Weaknesses:

      The authors then use circular dichroism to show that aSyn with acK at position 43 has less alpha-helical content. From this result, they deduce that "only this site could potentially perturb aS function in neurotransmitter trafficking", but no experiments on neurotransmitter trafficking were performed.

      Subsequently, they measure the aggregation speed of the variants in seeded aggregation experiments with preformed fibrils (PFFs) from WT aSyn, and conclude that acK at positions 12, 43, and 80 yields slower aggregation. They reach similar conclusions when measuring seeded aggregation in primary cultures. As far as I understand it, the seeding experiments in cells use seeds that are assembled from partially acetylated alpha-synuclein, but that are made of non-acetylated wildtype alpha-synuclein, and the alpha-synuclein that is endogenous in the cells is also non-acetylated (or at least not beyond what happens in these cells at endogenous levels). It is therefore unclear how the cellular seeding experiments relate to the in vitro aggregation assays with (partially) acetylated substrates. Anyway, both aggregation experiments ignore that the structures of aSyn filaments in Parkinson's disease (PD) or multiple system atrophy (MSA) are different from those formed in these experiments, and that, therefore, the observed aggregation kinetics are likely irrelevant for the speed with which disease-relevant filaments form in the brain.

      NMR and FCS experiments show that acK at positions 12 and 43 may reduce binding to vesicles, which then leaves only acK80.

      Finally, the authors describe the cryo-EM structure of mixtures of acK80:WT aSyn filaments, which are predominantly made of WT aSyn, with a previously described structure. Filaments made of only acK80 aSyn have a modified arrangement of this structure, where the now neutral side chain of residue 80 packs inside a hydrophobic pocket. The authors discuss differences between the acK80 structures and those of other structures from in vitro assembled aSyn filaments, none of which are the same as those observed from PD or MSA brains, nor are any attempts made to transfer observations from the in vitro experiments to the structures of disease. The relevance of the cryo-EM structures for human disease, therefore, remains unclear.

      The Conclusion on p.20 mentions an interesting and valuable result: the authors used the acetylated recombinant proteins to determine the extent of acetylation within human protein samples by quantitative liquid chromatography MS (SI, Figures S41-S49). Their conclusion is that "The level of acetylation was variable - no clear trend was observed between healthy control and patients - nor between patients of different diseases (SI, Table S4, Supplementary Data 1)" This result implies that acetylation of aS is not directly related to its pathogenicity, which again adds doubts on the disease-relevance of the results described in the rest of the paper.

    3. Reviewer #2 (Public review):

      Summary:

      Shimogawa et al. studied the effect of lysine acetylation at different sites in the alpha-synuclein (aS) sequence on the protein-membrane affinity, seeding capacity in the test tube and in cells, and on the structure of fibrils, using a range of biophysical methods. They use non-canonical amino acid (ncAA) mutagenesis to prepare aS lysine acetylated variant at different sites.

      Strengths:

      The major strength of this paper is the approach used for the production of site-specific lysine acetylated variants of aS using ncAA mutagenesis, as well as the combination of a range of biophysical methods together with cellular assays and structural biology to decipher the effect of lysine acetylation on aS-membrane binding, seeding propensity, and fibril structure. This approach allowed the authors to find that lysine acetylation at positions 12, 43, and 80 led to lower seeding capacity of aS in the test tube and in cells, but only acetylation at lysine 80 did not affect aS-membrane interaction. These results suggest that lysine acetylation at position 80 may be protective against aggregation without perturbing the proposed functional role of aS in synaptic plasticity.

      Weaknesses:

      SDS is not a good membrane model to investigate the effect of lysine acetylation on aS membrane binding because it is a harsh detergent that solubilizes membranes. Negatively charged vesicles, or vesicles made of a lipid mixture mimicking the composition of synaptic vesicles, are more widely accepted in the field for studying aS-membrane interactions. The authors used such vesicles for the FCS experiments, and they could also be used for the initial screening of the 12 lysine acetylated variants of aS.

      It would help the reader to have the experimental details (e.g., buffer, protein/lipid concentrations) for the different assays written in the figure legend.

      The authors use an assay consisting of mixing 10% fibrils + 90% monomer to investigate the effect of lysine acetylation on aS. However, this assay only probes fibril elongation and/or secondary processes. The current wording can be misleading, and the term aggregation could be replaced by seeding capacity for clarity. For example, the authors state that lysine acetylation at sites 12, 43, and 80 each inhibits aggregation, but this statement is not supported by the data. Instead, the data show that acetylation at these sites slows down fibril elongation and thus decreases the seeding capacity of aS fibrils. In order to state that lysine acetylation affects aS aggregation, i.e., fibril formation, the authors should use an assay in which the de novo formation of fibrils is assessed, such as in the presence of lipid vesicles or under shaking conditions.

      It is not clear from the EM data that the structures of the different lysine acetylated variants are different, unlike what is stated in the text.

    4. Reviewer #3 (Public review):

      Shimogawa et al. describe the generation of acetylated aSyn variants by genetic code expansion to elucidate effects on vesicle binding, aggregation, and seeding effects. The authors compared a semi-synthetic approach to obtain acetylated aSyn variants with genetic code expansion and concluded that the latter was more efficient in generating all 12 variants studied here, despite the low yields for some of them. Selected acetylated variants were used in advanced NMR, FCS, and cryo-EM experiments to elucidate structural and functional changes caused by acetylation of aSyn. Finally, site-specific differences in deacetylation by HDAC 8 were identified.

      The study is of high scientific quality, and the results are convincingly supported by the experimental data provided. The challenges the authors report regarding semi-synthetic access to aSyn are somewhat surprising, as this protein has been made by a variety of different semi-synthesis strategies in satisfactory yields and without similar problems being reported.

      The role of PTMs such as acetylation in neurodegenerative diseases is of high relevance for the field, and a particular strength of this study is the use of authentic acetylated aSyn instead of acetylation-mimicking mutations. The finding that certain lysine acetylations can slow down aggregation even when present only at 10-25% of total aSyn is exciting and bears some potential for diagnostics and therapeutic intervention.

    5. Author response:

      We thank you for your efforts in reviewing our manuscript.  We sincerely appreciate that the reviewers were all enthusiastic about our comparison of native chemical ligation (NCL) and non-canonical amino acid (ncAA) mutagenesis methods for installing acetyl lysine (AcK) in alpha-synuclein, as well as the wide variety of biochemical experiments enabled by our ncAA approach.  We respond to the critiques specific to each reviewer here.

      Reviewer #1:

      Expressed concern that in vitro studies of effects on membrane binding were not followed up with neurotransmitter trafficking experiments.  While we certainly think that such studies would be interesting, they would presumably require the use of acetylation mimic mutants (Lys-to-Gln mutations), which we would want to validate by comparison to our semi-synthetic proteins with authentic AcK.  Such experiments are planned for a follow-up manuscript, and we will investigate the reviewer’s suggested experiment at that time.

      Reviewer #1 noted that the method of in vitro seeding really reports on the impact of acetylation on the elongation phase of aggregation.  We will clarify this in our revisions.  They also expressed concern that this was different from the role that acetylation would play in seeding cellular aggregation with pre-acetylated fibrils.  We will also acknowledge and clarify this in our revisions.  Having the monomer population acetylated in cells presents technical challenges that might also be addressed with Gln mutant mimics, and we plan to pursue such experiments in the follow-up manuscript noted above.

      Reviewer #1 criticized the fact that the pre-formed fibrils used in seeding would not have the same polymorph as PD or MSA fibrils derived from patient material.  They were also critical of how our cryo-EM structure of AcK80 fibrils related to the PD and MSA polymorphs.  Finally, while the reviewer liked the MS experiments used to quantify acetylation levels from patient samples, they felt that our findings then threw the physiological relevance of our structural and biochemical experiments into question.  We believe that all of these critiques can be addressed by clarifying our purpose.  We are not necessarily trying to claim that our AcK80 fold is populated in health or disease, but that by driving Lys80 acetylation, one could push fibrils to adopt this conformation, which is less aggregation-prone.  A similar argument has been made in investigations of alpha-synuclein glycosylation and phosphorylation.  Our results in Figure 9 imply that this could be done with HDAC8 inhibition.  We will revise the manuscript to make these ideas clearer, while being sure to acknowledge the limitations noted by Reviewer #1.

      Reviewer #2:

      Expressed concern over our use of SDS micelles for initial investigation of the 12 AcK variants, rather than the phospholipid vesicles used in later FCS and NMR experiments.  We will note this shortcoming in revisions of our manuscript, but we do not believe that using vesicles instead would change the conclusions of these experiments (that only AcK43 produces an effect, and a modest one at that).

      We will add additional detail to the figure captions, as requested by Reviewer #2.

      Reviewer #2 shared some of the concerns of Reviewer #1 regarding the distinctions of which phase of aggregation we were investigating in our in vitro experiments.  As noted above, we will clarify this language.

      Finally, Reviewer #2 stated that “It is not clear from the EM data that the structures of the different lysine acetylated variants are different.”  We feel that it is quite clear from structures in Figure 8 and the EM density maps in Figure S38 that the AcK80 fold is indeed different.  Although the overall polymorphs are somewhat similar to WT, the position of K80 clearly changes upon acetylation, altering the local fold significantly and the global fold more moderately.

      Reviewer #3:

      Found the results convincing, including the potential therapeutic implications.  The only concern noted was that they found the difficulties in semi-synthesis of AcK-modified alpha-synuclein surprising given that it has been made many times before through NCL.  Indeed, our own laboratory has made alpha-synuclein through NCL, and the yields reported here are in keeping with our own previous results.  However, since NCL did not give higher yields than ncAA methods, and it is significantly easier to scan AcK positions using ncAAs, we felt that ncAAs are the method of choice in this case.  We will clarify this position in the revised manuscript.

      In conclusion, on behalf of all authors, I again thank the reviewers for both their positive and negative observations in helping us to improve our manuscript.  We will revise it to strive for greater clarity as we have noted in this letter.

    1. eLife Assessment

      This study seeks to expand the understanding of insulin and glucose responses in the brain, specifically by implicating a family of protein kinases responsive to insulin. The significance of the study to the field is valuable, given this study is very emblematic of the new field of interoception (Brain-Body physiology). The evidence supporting the conclusions about brain glucose utilization is convincing and is relevant to many age-related diseases, such as Alzheimer's disease.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Akita B. Jaykumar et al. explored the interesting and relevant hypothesis that serine/threonine With-No-lysine (K) kinases (WNK)-1, -2, -3, and -4 engage in insulin-dependent glucose transporter-4 (GLUT4) signaling in the murine central nervous system. The authors especially focused on the hippocampus, as this brain region exhibits high expression of insulin and GLUT4. Additionally, disrupted glucose metabolism in the hippocampus has been associated with anxiety disorders, while impaired WNK signaling has been linked to hypertension, learning disabilities, psychiatric disorders, and Alzheimer's disease. The study took advantage of the selective pan-WNK inhibitor WNK463 as the main tool to manipulate WNK1-4 activity, both in vivo by daily oral drug administration to wild-type mice, and in vitro by treating adult murine brain synaptosomes, hippocampal slices, primary cortical cultures, and human cell lines (HEK293, SH-SY5Y). Using a battery of standard behavioral paradigms such as the open field test, elevated plus maze test, and fear conditioning, the authors convincingly demonstrate that the inhibition of WNK1-4 results in behavioral changes, especially enhanced learning and memory in WNK463-treated mice. To shed light on the underlying molecular mechanism, the authors implemented multiple biochemical approaches, including immunoprecipitation, glucose-uptake assays, surface biotinylation assays, immunoblotting, and immunofluorescence. The data suggest that simultaneous insulin stimulation and WNK1-4 inhibition result in increased glucose uptake and increased activity of insulin's downstream effectors, phosphorylated Akt and phosphorylated AS160. Moreover, the authors demonstrate that insulin treatment enhances the physical interaction of the WNK effector OSR1/SPAK with the Akt substrate AS160. As a result, combined treatment with insulin and the WNK463 inhibitor synergistically increases the targeting of GLUT4 to the plasma membrane.

      Collectively, these data strongly support the initial hypothesis that neuronal insulin- and WNK-dependent pathways do interact and engage in cognitive functions.

      In response to our initial comments, the authors revised the manuscript only mildly, which did not sufficiently address the weaknesses. Our follow-up comments are labeled "Revisions 1".

      Strengths:

      Insulin-dependent signaling in the central nervous system is relatively understudied. This exploratory study delves into several interesting and clinically relevant possibilities, examining how insulin-dependent signaling and its crosstalk with WNK kinases might affect brain circuits involved in memory formation and/or anxiety. Therefore, these findings might inspire follow-up studies performed in disease models for disorders that exhibit impaired glucose metabolism, deficient memory, or anxiety, such as diabetes mellitus, Alzheimer's disease, or many psychiatric disorders.

      The graphical presentation of the figures is of high quality, which helps the reader to obtain a good overview and to easily understand the experimental design, results, and conclusions.

      The behavioral studies are well conducted and provide valuable insights into the role of WNK kinases in glucose metabolism and their effect on learning and memory. Additionally, the authors evaluate the levels of basal and induced anxiety in Figures 1 and 2, enhancing our understanding of how WNK signaling might engage in cognitive function and anxiety-like behavior, particularly in the context of altered glucose metabolism.

      The data presented in Figures 3 and 4 are notably valuable and robust. The authors effectively utilize a variety of in vivo and in vitro models, combining different treatments in a clear manner. The experimental design is well-controlled, efficiently communicated, and well-executed, providing the reader with clear objectives and conclusions. Overall, these data represent particularly solid and reproducible evidence on the enhanced glucose uptake, GLUT4 targeting, and downstream effectors' activation upon insulin and WNK/OSR1 signaling crosstalk.

      Weaknesses:

      (1) The study used the WNK463 inhibitor as the only tool to manipulate WNK1-4 activity. This inhibitor seems selective; however, it has been reported to inhibit the individual WNK kinases with different efficiencies (e.g. PMID: 31017050, PMID: 36712947). Additionally, the authors neither analyze nor report the expression profiles or activity levels of WNK1, WNK2, WNK3, and WNK4 within the relevant brain regions (i.e. hippocampus, cortex, amygdala). Combined, these weaknesses raise concerns about the direct involvement of WNK kinases within the selected brain regions and behavior circuits. It would be beneficial if the authors provided gene profiling for WNK1, -2, -3, and -4 (e.g. using the Allen Brain Atlas). To confirm the observations, the authors should either add results from other WNK inhibitors or, preferably, analyze knock-down or knock-out animals/tissue targeting the single kinases.

      Revisions 1: The authors added Fig. S1A during the revisions to show expression of WNK1-4. While the expression data from humans are interesting, the experimental part of the study is performed in mice. It would be more informative for the authors to add expression profiles from mice, or to summarize the expression pattern with suitable references in the introduction, to address this point. The authors did not add data from knock-down or knock-out tissue targeting the single kinases.

      (2) The authors do not report any data on whether the global inhibition of WNKs affects insulin levels as such. Since the authors demonstrate the synergistic effect of simultaneous insulin treatment and WNK1-4 inhibition, such data are missing.

      Revisions 1: The authors added Fig. S5A to address this point. It is appreciated that the authors performed the needed experiment. Unfortunately, no significant change was found; therefore, the authors still cannot conclude that they demonstrate a synergistic effect of simultaneous insulin treatment and WNK1-4 inhibition. It is a missed opportunity that the authors did not measure insulin in the CSF or tissue lysate to support the data.

      (3) The study discovered that the Sortilin receptor binds to OSR1, leading the authors to speculate that Sortilin may be involved in the insulin-dependent GLUT4 surface trafficking. The authors conclude in the result section that "WNK/OSR1/SPAK influences insulin-sensitive GLUT4 trafficking by balancing GLUT4 sequestration in the TGN via regulation of Sortilin with GLUT4 release from these vesicles upon insulin stimulation via regulation of AS160." However, the authors do not provide any evidence supporting Sortilin's involvement in such regulation, thus, this conclusion should be removed from the section. Accordingly, the first paragraph of the discussion should be also rephrased or removed.

      Revisions 1: The authors added Fig. 5M-N to address this point. The new experiment is appreciated. However, the authors still do not show that sortilin is involved in insulin- or WNK-dependent GLUT4 trafficking in their setup, since they do not demonstrate any changes in GLUT4 sorting or binding. The conclusions should therefore be rephrased or included purely in the discussion. Moreover, the discussion was not adjusted either, leading to overinterpretation of the available data.

      (4) The background relevant to Figure 5, as well as the results and conclusions presented in Figure 5 are quite challenging to follow due to the lack of a clear introduction to the signaling pathways. Consequently, understanding the conclusions drawn from the data is also difficult. It would be beneficial if the authors addressed this issue with either reformulations or additional sections in the introduction. Furthermore, the pulldown experiments in this figure lack some of the necessary controls.

      Revisions 1: The authors insufficiently addressed this point during the revisions and did not rewrite the introduction as suggested.

      (5) The authors lack proper independent loading controls (e.g. GAPDH levels) in their immunoblots throughout the paper, and thus their quantifications lack this important normalization step. The authors also did not add knock-out or knock-down controls in their co-IPs. This is disappointing since these improvements were central and suggested during the revision process.

      (6) The schemes that represent only hypotheses (Fig. 1K, 4A) are unnecessary and confusing and thus should be omitted or placed at the end of each figure if the conclusions align.

      (7) Low-quality images, such as Fig. 5H should be replaced with high-resolution photos, moved to the supplementary, or omitted.

    3. Reviewer #2 (Public review):

      This study by Jaykumar and colleagues seeks to expand the field's appreciation of insulin responses in the brain, specifically by implicating WNK kinase function in various neuronal responses, ranging from behavioral / memory changes to GLUT4 trafficking to the cell surface with subsequent glucose uptake. This revised study is now comprehensive and presents a logical and reasonably documented cascade of molecular interactions responsible in part for GLUT4 trafficking under the regulation of WNK and insulin. Additional data allow the authors to dissect a plausible WNK/OSR1/SPAK-sortilin pathway for the modulation of GLUT4 trafficking, in part by capitalizing on an overlay of various techniques and systems. The data, much of it in vivo or ex vivo, showing a potential role for WNK function in brain glucose utilization remain a compelling part of the story, with the dissection of the signaling cascade and a potential role for sortilin in mediating WNK function via effects on GLUT4 cellular localization now more convincing.

      Initially, the group shows that oral WNK463 treatment (WNK463 being a broad inhibitor of WNKs) augments a number of memory readouts in mice. These findings fit within the context of the overall story the authors present: that WNK function is critical to brain glucose utilization, which impacts learning. Multiple approaches are used to show that WNK463 treatment, i.e. inhibition of WNKs, increases glucose uptake, including labeled 2-deoxyglucose uptake in vivo in the brain and in isolated synaptosomes, and uptake in ex vivo hippocampal slices. These findings are solid and consistent. Apart from some relatively minor comments on data presentation, which the authors have now fully addressed, the findings showing that WNK463 treatment increases GLUT4-mediated glucose uptake and surface localization of GLUT4 are reasonable, with the hippocampal slice data being particularly relevant.

      While the details of the WNK signaling cascade are dense, in the revised manuscript one clearly appreciates the molecular interrogation and interactions the group is dissecting, supported by the use of multiple models. With the additional findings, these systems and the data now reinforce each other, presenting a strongly documented overall story.

      A limitation of the study in the initial submission was the authors' reliance upon a single pharmacological tool (WNK463) to inhibit WNK kinases. WNK463 apparently has substantial specificity for WNKs, and WNK463 treatment lessened phosphorylation of OSR1 (a WNK substrate). Nevertheless, the cohesiveness of the findings in terms of the broader pathway engagement (GLUT4 trafficking, glucose uptake) is consistent with the authors' proposed mechanisms and conclusions. The authors have additionally addressed this concern in the revised manuscript with more information supporting the specificity of WNK463 as well as multiple approaches to confirm the effect of WNK463 on the WNK signaling pathway of interest.

      The final few paragraphs of the discussion that weave the authors' findings into the field more broadly, including Sortilin function and neurological disorders, are appreciated. Additional clarity in the Methods section is also helpful.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Summary:

      The major issues are the need for more information concerning WNK expression in brain regions and additional confirmation of the role of sortilin in WNK signaling. There is a lack of sufficient evidence supporting sortilin's involvement in insulin- and WNK-dependent GLUT4 regulation. The recommendation is to examine which WNK kinase is selectively expressed in the region of interest and then explore its engagement with the sortilin and GLUT4 pathways. Further identification of components of the WNK/OSR1/SPAK-sortilin pathway that regulate GLUT4 in brain slices or primary neurons will be helpful in confirming the results. The use of knock-down or knock-out models would be helpful to explore the direct interaction of the pathways. Immortalized and primary cells also represent useful models.

      Together our results indicate that one or more WNK family members regulate insulin sensitivity.  As all WNK family members are expressed in relevant brain regions, whether the results are due to actions of a single WNK family member or more likely due to their combined impact will be an important question to ask in the future.  

      There are multiple publications describing how sortilin is involved in insulin-dependent Glut4 trafficking; thus, we did not further address that issue.  We have added data on an additional action of WNK463 which indicates that it can block association of OSR1 with sortilin.  While these results do not delve further into how sortilin works, they support the conclusion that WNK/OSR1/SPAK can influence insulin-dependent glucose transport via distinct cellular events (AS160, sortilin, Akt) which are WNK463 sensitive.  

      Altogether we added 12 new panels of data from new and previously performed experiments and we modified 3 existing subfigures in response to comments.

      Weaknesses:

      (1) The study used the WNK463 inhibitor as the only tool to manipulate WNK1-4 activity. This inhibitor seems selective; however, it has been reported to inhibit the individual WNK kinases with different efficiencies (e.g. PMID: 31017050, PMID: 36712947). Additionally, the authors neither analyze nor report the expression profiles or activity levels of WNK1, WNK2, WNK3, and WNK4 within the relevant brain regions (i.e. hippocampus, cortex, amygdala). Combined, these weaknesses raise concerns about the direct involvement of WNK kinases within the selected brain regions and behavior circuits. It would be beneficial if the authors provided gene profiling for WNK1, -2, -3, and -4 (e.g. using the Allen Brain Atlas). To confirm the observations, the authors should either add results from other WNK inhibitors or, preferably, analyze knock-down or knock-out animals/tissue targeting the single kinases.

      Thank you for the excellent suggestion to include mRNA data for the four WNKs. We have included a supplementary figure showing expression of WNK1-4 mRNAs in prefrontal cortex and the hippocampus curated from the Allen Brain Atlas. As per the Allen Brain Atlas, all four WNKs are detected in these regions with WNK4 mRNA the most highly expressed followed by WNK2, WNK3 and then WNK1 (Figure S1A).   

      With regard to the use of WNK463, we continue to use it because we have examined its actions in cell lines that express only WNK1, e.g. A549 (Hamon Center lung cancer RNA-seq data), and in A549 cells with WNK1 deleted using CRISPR, in which we saw no effects of WNK463 on several assays we use for WNK1, including suppression of autophagy. WNK463 was reported in the literature to inhibit only the four WNKs out of more than 400 kinases tested, indicating more selectivity than many small molecules used to target other enzymes. In other cell lines, we also use WNK1 knockdown, which replicates the effect of WNK463 (Figure S7A-D). However, in SH-SY5Y cells, WNK1 knockdown did not replicate the effect of WNK463 on pAKT levels (Figure S7E-F), suggesting cooperativity among WNK family members in neuronal cells. This makes WNK463 an ideal tool to test our hypotheses in this study, as it targets all four WNKs (WNK1-4).

      (2) The authors do not report any data on whether the global inhibition of WNKs affects insulin levels. Since the authors wish to demonstrate the synergistic effect of simultaneous insulin treatment and WNK1-4 inhibition, such data are missing.

      Thank you for this comment. To obtain this information, we treated C57BL/6J mice with WNK463 once daily for 3 days at a dose of 6 mg/kg and then fasted them overnight. Plasma insulin levels were measured. Plasma insulin levels trended upwards in the WNK463-treated animals compared to the vehicle-treated groups but did not reach statistical significance. We have now included these data in supplementary figure S5A.

      The study discovered that the Sortilin receptor binds to OSR1, leading the authors to speculate that Sortilin may be involved in the insulin-dependent GLUT4 surface trafficking. However, the authors do not provide any evidence supporting Sortilin's involvement in insulin- or WNK-dependent GLUT4 trafficking. Thus, this conclusion should be qualified, rephrased, or additional data included.

      Work from several groups has shown that sortilin is involved in insulin-dependent GLUT4 trafficking, for example [9-11,135-139], as we noted in the manuscript. We now show that WNK463 blocks co-immunoprecipitation of Flag-tagged sortilin with endogenous OSR1 in HEK293T cells. This result supports our model for WNK/OSR1/SPAK- and insulin-mediated regulation of sortilin. We included these data in figures 5M and 5N.

      Minor issues:

      (1) The method and result sections lack information regarding the gender and age of mice used in the behavioral experiments. This information should be added.

      Thank you for pointing this out. We apologize for the omission. The requested information has now been added in the methods section.

      (2) The authors present an analysis of relative protein levels in Figure 1B and Figure 4B, however, the original immunoblots (?) are not included in the study. These data should be added to provide complete and transparent evidence for the analysis.

      Thank you for this request. The blots have now been included in the supplementary figure S2A and Figure 4B, respectively.  

      (3) The basis for Figure 3A needs to be explained and supported with suitable references either in the background or in the result section.

      Thank you for pointing this out. Figure 3A has been moved to Figure 3H as it represents the model summary of the data presented in Figure 3. Other figure numbers have been changed accordingly.  This figure 3A (now 3H) and the model diagram of Figure 5 (now Figure 5O) are now cited in the Discussion, where the results are considered in detail.      

      (4) Figure 4E should be labeled as 'Primary cortical neurons' for clarity, as the major focus is on the hippocampus. To increase consistency, the authors should consider performing the same experiment on hippocampal cultures or explaining using cortical neurons.

      Thank you for the suggestion. Figure 4E (now 4F) has been labelled as Primary cortical neurons for clarity. The major focus of this study is to understand WNK-mediated regulation of insulin signaling in insulin-sensitive areas of the brain, such as the hippocampus and the prefrontal cortex. Therefore, we included cortical neurons to test this hypothesis.

      (5) Figure 5B: The use of whole brain extracts is inconsistent with the rest of the study, especially considering the indication of differing insulin activity in selected brain regions. The authors should explain why they could not use only hippocampal tissue.

      In this manuscript, we are trying to test our hypothesis in insulin-sensitive neuronal cells, including, but not limited to, the hippocampus. Figure 5B used whole brain extracts, which contain insulin-sensitive as well as insulin-insensitive brain regions, to show the association between OSR1 and AS160. However, this observation was replicated in the insulin-sensitive SH-SY5Y cell model, suggesting that the association of OSR1 and AS160 is modulated in the presence of insulin, as shown in Figure 5B, 5C. We added data from SH-SY5Y cells showing the effects of WNK463. These data support the concept that this interaction is modulated by WNKs and will occur as long as both OSR1/SPAK and AS160 are expressed.

      (6) Figure 5B-C - Knock-out or knock-down condition should be included in the co-IP experiment. This is especially straightforward to generate in the SH-SY5Y cells. Moreover, these figures lack loading controls.

      If we understand correctly, the issue with regard to including knockdown conditions stems from the issues raised regarding specificity of the antibody which we have addressed in point 10 below. We have now included input blots for both AS160 and OSR1 which serve as the loading control for the IP experiment in figure 5B and 5C.

      (7) Figure 5C-D - A condition with WNK463 inhibition alone is missing. This condition is necessary for evaluating the effects of WNK463 inhibition with and without insulin stimulation.

      Thank you for this observation. We have now added the data for that condition. The aim of this experiment in Figure 5C (now 5B and 5C) is to show that insulin is important to facilitate the interaction between OSR1 and AS160 in differentiated SH-SY5Y cells, and that WNK463 diminishes this insulin-dependent interaction. With WNK463 alone, there was minimal interaction between AS160 and OSR1, as now shown in Figure 5B, 5C.

      (8) Figure 5G - This figure shows the overexpression of plasmids in HEK cells, however, it lacks samples that overexpress the plasmid individually (single expression). Such data should be added, especially when the addition of the blocking peptide does not fully disable the interaction between AS160 and SPAK. Additionally, this figure also lacks a loading control, which is essential for validating the results.

      Thank you for this comment. Figure 5G (now Figure 5F, 5G) is an in vitro IP in which we mixed a purified Flag-SPAK fragment (residues 50-545) with a lysate from cells expressing Myc-AS160 (residues 193-446). Because this is an in vitro IP, and not an IP from lysates of cells in which we overexpressed these plasmids, it does not require a loading control of the kind such an experiment would. The lysates were divided in half; one half received the blocking peptide and the other did not, creating a control. In our experience, this blocking peptide does not completely block interactions between SPAK/OSR1 and NKCC2 fragments, which are well-characterized interacting partners [a]. The partial block could also be attributed to the multivalent nature of the interaction between these proteins. We recognize that our methodology was unclear, and we have tried to explain it more clearly in the methods, results and figure legend sections. Our Commun. Biol. paper [134], which describes this assay and uses it extensively, is now available online.

      (a) Piechotta K, Lu J, Delpire E. Cation chloride cotransporters interact with the stress-related kinases Ste20-related proline-alanine-rich kinase (SPAK) and oxidative stress response 1 (OSR1). J Biol Chem. 2002;277:50812–50819. doi: 10.1074/jbc.M208108200.

      (9) Figure 5J, L - These figures are missing negative controls. The authors should add Sortilin knock-down or knock-out conditions for the immunoprecipitation experiments. Also, the figures lack loading controls. Moreover, the label "Control" should be specified, as it is unclear what this condition represents.

      Thank you for noting the lack of clarity in the controls provided. "Control" in Figures 5J and 5L refers to the IgG control, which serves as the negative control in this case. This is now specified in the figures, and we have also added Figures 5M and 5N. The issue of OSR1 and sortilin antibody specificity and cross-reaction is addressed in point 10.

      (10) Figure 5I - The fluorescent signals for the individual channels of OSR1 and Sortilin appear identical (even within the background signal). This raises concerns about potential antibody cross-reaction. One potential solution would be to include additional stainings with different antibodies and perform staining of each protein alone to ensure the specificity of the colocalization.

      Thank you for pointing this out and giving us an opportunity to provide better images that address the concerns regarding antibody cross-reaction and specificity. We realize that the images we originally provided appeared to show that all puncta colocalize, which could raise concerns about antibody cross-reaction. We have replaced them with more appropriate representative images that clearly show some regions of common staining as well as regions with no overlap.

      (11) Figures 5D, 5F, 5H, 5L, 5M: These analyses should first be normalized to a loading control such as GAPDH.

      In Figure 5F (now 5E), the analysis has been normalized to total AS160 protein levels. Because we are reporting changes in pAS160, normalizing it to total AS160 better reflects changes in the phosphorylated form relative to the whole protein, and this is more appropriate than other loading controls such as GAPDH.

      In Figure 5H (now Figure 5G), the analysis is an in vitro IP assay using purified protein fragments. Therefore, using GAPDH as a control is not applicable in this case. Please refer to our response to comment 8 for details.

      In Figures 5L, 5M and 5D (now 5K, 5L, 5C), the IP proteins have been normalized to the input protein levels, which serve as the loading control for the IP experiment.

      (12) Figure 5K: The significance/meaning of the red star is unclear. It should be explained in the figure legend.

      Thank you for the opportunity to enhance the readability of our manuscript. The red star denotes the condition in the yeast two-hybrid assay that shows binding of the CCT domain of OSR1 to the C-terminus of sortilin. This has now been clarified in the figure legend.

      (13) Differences in WNK463 dosage and administration periods can affect the results. There is no explanation for the divergent WNK463 treatments of mice across the different behavioral tests (fear conditioning, the novel object test, and the elevated plus maze test). This should be considered.

      Thank you for pointing out that the explanation of the WNK463 dosage and timing was unclear. WNK463 was administered daily at 6 mg/kg, starting 3 days before the first behavioral experiment and continuing throughout the test protocol. The same protocol was used for all experiments. The description of the protocol has been reworded to clarify dosage and timing in the Methods and Results sections.
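      The normalization choices discussed in point (11) all reduce to the same ratio arithmetic. The sketch below is illustrative only: the function name and every band intensity in it are hypothetical placeholders, not values from this study. The reference band can be total AS160 (for pAS160), the input lane (for an IP), or a generic loading control such as GAPDH; only the biological question changes, not the calculation.

```python
def relative_ratio(target, reference, control_target, control_reference):
    """Return (target / reference) in a treated lane, expressed relative to
    the same ratio in the control lane (so the control lane equals 1.0)."""
    if reference <= 0 or control_reference <= 0 or control_target <= 0:
        raise ValueError("band intensities must be positive")
    treated = target / reference
    control = control_target / control_reference
    return treated / control


# Hypothetical densitometry values (arbitrary units), for illustration only.
p_as160_treated, total_as160_treated = 1200.0, 2400.0
p_as160_control, total_as160_control = 800.0, 2000.0

fold = relative_ratio(p_as160_treated, total_as160_treated,
                      p_as160_control, total_as160_control)
print(f"pAS160/total AS160, relative to control: {fold:.2f}")
```

      Because a phospho-band and its total-protein band scale together with the amount of lysate loaded, dividing one by the other cancels loading differences as well as lane-to-lane differences in AS160 abundance, so the ratio reports changes in the phosphorylated fraction.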

    1. eLife Assessment

      This study provides valuable data on the role of Hsd17b7, a gene involved in cholesterol biosynthesis, as a potential regulator of mechanosensory hair cell function. The authors used both zebrafish and the HEI cell line to examine the effects of deletion of Hsd17b7 on hair cell function and survival. While the results do show a reduction in hair cells in the lateral line neuromasts of Hsd17b7 mutant fish, the reduction was limited. The findings are considered incomplete, with additional experiments required to confirm the conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      This study identifies HSD17B7 as a cholesterol biosynthesis gene enriched in sensory hair cells, with demonstrated importance for auditory behavior and potential involvement in mechanotransduction. Using zebrafish knockdown and rescue experiments, the authors show that loss of hsd17b7 reduces cholesterol levels and impairs hearing behavior. They also report a heterozygous nonsense variant in a patient with hearing loss. The gene mutation has a complex and somewhat inconsistent phenotype, appearing to mislocalize, reduce mRNA and protein levels, and alter cholesterol distribution, supporting HSD17B7 as a potential deafness gene.

      While the study presents an interesting candidate and highlights an underexplored role for cholesterol in hair cell function, several important claims are insufficiently supported, and the mechanistic interpretations remain somewhat murky.

      Strengths:

      (1) HSD17B7 is a new candidate deafness gene with plausible biological relevance.

      (2) Cross-species RNAseq convincingly shows hair-cell enrichment.

      (3) Lipid metabolism, particularly cholesterol homeostasis, is an emerging area of interest in auditory function.

      (4) The connection between cholesterol levels and MET is potentially impactful and, if substantiated, would represent a significant advance.

      Weaknesses:

      (1) The pathogenic mechanism of the E182STOP variant is unclear: The mutant protein presumably does not affect WT protein localization, arguing against a dominant-negative effect. Yet, overexpression of HSD17B7-E182* alone causes toxicity in zebrafish, and it binds and mislocalizes cholesterol in HEI-OC1 cells, suggesting some gain-of-function or toxic effect. In addition, the mRNA of the variant has a low expression level, suggesting nonsense-mediated decay. This complexity and inconsistency need clearer explanation.

      (2) The link to human deafness is based on a single heterozygous patient with no syndromic features. Given that nearly all known cholesterol metabolism disorders are syndromic, this raises concerns about causality or specificity. The term "novel deafness gene" is premature without additional cases or segregation data.

      (3) The localization of HSD17B7 should be clarified better: In HEI-OC1 cells, HSD17B7 localizes to the ER, as expected. In mouse hair cells, the staining pattern is cytosolic and almost perfectly overlaps with the hair cell marker used, Myo7a. This needs to be discussed. Without KO tissue, HSD17B7 antibody specificity remains uncertain.

    3. Reviewer #2 (Public review):

      A summary of what the authors were trying to achieve.

      The authors aim to determine whether the gene Hsd17b7 is essential for hair cell function and, if so, to elucidate the underlying mechanism, specifically the metabolic role of HSD17B7 in cholesterol biosynthesis. They use animals, tissues, or data from zebrafish, mice, and human patients.

      Strengths:

      (1) This is the first study of Hsd17b7 in the zebrafish (a previous report identified this gene as a hair cell marker in the mouse utricle).

      (2) The authors demonstrate that Hsd17b7 is expressed in hair cells of zebrafish and the mouse cochlea.

      (3) In zebrafish larvae, a likely KO of the Hsd17b7 gene causes a mild phenotype in an acoustic/vibrational assay, which also involves a motor response.

      (4) In zebrafish larvae, a likely KO of the Hsd17b7 gene causes a mild reduction in lateral line neuromast hair cell number and a mild decrease in the overall mechanotransduction activity of hair cells, assayed with a fluorescent dye entering the mechanotransduction channels.

      (5) When HSD17B7 is overexpressed in a cell line, it localizes to the ER, and an increase in cytoplasmic cholesterol puncta is detected. In contrast, when a truncated version of HSD17B7 is overexpressed, it forms aggregates that colocalize with cholesterol.

      (6) It seems that the level of cholesterol in crista and neuromast hair cells decreases when Hsd17b7 is defective (but see comment below).

      Weaknesses:

      (1) The statement that HSD17B7 is "highly" expressed in sensory hair cells in mice and zebrafish seems incorrect for zebrafish:

      (a) The data do not support the notion that HSD17B7 is "highly expressed" in zebrafish. Compared to other genes (TMC1, TMIE, and others), the HSD17B7 expression level in neuromast hair cells is low (Figure 1F) and, by extension (Figure 1C), also in all hair cells. This interpretation is in line with the weak mRNA signal detected by ISH (Figure 1G I"). On this note, the staining reported in I" does not seem to label the cytoplasm of neuromast hair cells. A negative (sense-probe) control, along with a positive control (such as TMC1 or another), is necessary to interpret the ISH signal in the neuromast.

      (b) The claim does hold for mouse cochlear hair cells, based on published single-cell RNA-seq databases and the immunostaining performed in the study. However, the specificity of the anti-HSD17B7 antibody used in the study (for immunostaining and western blot) is not demonstrated. Additionally, the antibody stains some supporting cells or nerve terminals. Was that expression expected?

      (2) A previous report showed that HSD17B7 is expressed in mouse vestibular hair cells by single-cell RNAseq and immunostaining in mice, but it is not cited:

      Spatiotemporal dynamics of inner ear sensory and non-sensory cells revealed by single-cell transcriptomics.

      Jan TA, Eltawil Y, Ling AH, Chen L, Ellwanger DC, Heller S, Cheng AG.

      Cell Rep. 2021 Jul 13;36(2):109358. doi: 10.1016/j.celrep.2021.109358.

      (3) Overexpressed HSD17B7-EGFP C-terminal fusion in zebrafish hair cells shows a punctiform signal in the soma but apparently does not stain the hair bundles. One limitation is the consequence of the C-terminal EGFP fusion to HSD17B7 on its function, which is not discussed.

      (4) A CRISPR zebrafish mutant was generated, leading to a truncation after the first 96 of the 340 amino acids. It is unclear why the edit was not made closer to the ATG. This allele may retain some function, which is not discussed.

      (5) The hsd17b7 mutant allele has a slightly reduced number of genetically labeled hair cells (quantified as a 16% reduction, i.e., roughly 1-2 of the 9 HCs present per neuromast). In addition, it is unclear what criteria were used to select HCs in the images: some Brn3C:mGFP-positive cells are apparently not included in the quantifications (Figure 2F, Figure 5A).

      (6) The authors used FM4-64 staining to evaluate hair cell mechanotransduction activity indirectly. They found a 40% reduction in labeling intensity in the HCs of the lateral line neuromast. Because the reduction in hair cell number (16%) is smaller than the reduction in FM4-64 staining, the authors argue that the defect primarily affects mechanotransduction rather than the number of HCs. This argument is insufficient. For example, some HCs could have died and been eliminated, while others are on the same path and no longer perform the MET function; the numbers would then match. If single cells can be resolved in the staining, the FM4-64 intensity per cell could be determined. It would also be informative to evaluate whether cell death occurs in this mutant. In addition, the quantification of the FM4-64 fluorescence intensity and its normalization are not described in the methods. More importantly, an independent and more direct experimental assay is needed to confirm this point, for example Ca2+ imaging with a GCaMP6-T2A-RFP allele, using the RFP signal for normalization.

      (7) The authors used an acoustic startle response to elicit a behavioral response from the larvae and evaluate the "auditory response". They found a significant decrease in the response (movement trajectory, swimming velocity, distance) in the hsd17b7 mutant. The authors conclude that this gene is crucial for the "auditory function in zebrafish".

      This is an overstatement:

      (a) First, this test is adequate as a screening tool to identify animals that have completely lost the behavioral response to this acoustic and vibrational stimulation, which also involves a motor response. However, additional tests are required to confirm an auditory origin of the defect, such as Auditory Evoked Potential recordings or, for vestibular function, the Vestibulo-Ocular Reflex.

      (b) Second, although the behavioral differences between mutant and control are statistically significant, they are slight and fall within the standard deviation (20% for velocity, 25% for distance). On this point, the plots in Figure 2B and C are misleading because their y-axes do not start at 0.

      (8) Overexpression of HSD17B7 in cell line HEI-OC1 apparently "significantly increases" the intensity of cholesterol-related signal using a genetically encoded fluorescent sensor (D4H-mCherry). However, the description of this quantification (per cell or per surface area) and the normalization of the fluorescent signal are not provided.

      (9) When this experiment is conducted in vivo in zebrafish, a reduction in the "DH4 relative intensity" is detected (same issue with the absence of a detailed method description). However, as the difference is smaller than the standard deviation, this raises questions about the biological relevance of this result.

      (10) The authors identified a deaf child as a carrier of a nonsense mutation in HSD17B7, which is predicted to terminate the HSD17B7 protein before the transmembrane domain. However, as no genetic linkage analysis is possible, causality is not demonstrated.

      (11) Previous results obtained from mouse HSD17B7-KO (citation below) are not described in sufficient detail. This is critical because, in this paper, the mouse loss-of-function of HSD17B7 is embryonically lethal, whereas no apparent phenotype was reported in heterozygotes, which are viable and fertile. Therefore, it seems unlikely that heterozygous mice exhibit hearing loss or vestibular defects; however, it would be essential to verify this to support the notion that the truncated allele found in one patient is causal.

      Hydroxysteroid (17beta) dehydrogenase 7 activity is essential for fetal de novo cholesterol synthesis and for neuroectodermal survival and cardiovascular differentiation in early mouse embryos.

      Jokela H, Rantakari P, Lamminen T, Strauss L, Ola R, Mutka AL, Gylling H, Miettinen T, Pakarinen P, Sainio K, Poutanen M.

      Endocrinology. 2010 Apr;151(4):1884-92. doi: 10.1210/en.2009-0928. Epub 2010 Feb 25.

      (12) The authors used this truncated protein in their startle response and FM4-64 assays. First, they show that, unlike the WT version, the truncated form cannot rescue the phenotypes when overexpressed. Second, they tested whether the truncated protein could recapitulate the startle reflex and FM4-64 phenotypes of the mutant allele. At the homozygous level (which is not stated), it apparently does so, but to a lesser degree than the mutant allele. Again, the differences are within the standard deviation of the averages. The authors conclude that this mutation found in humans has a "negative effect" on hearing, which again is not supported by the data.

      (13) The authors looked at the distribution of HSD17B7 in a cell line. The WT version goes to the ER, while the truncated one forms aggregates. An interesting experiment co-expressed both constructs (Figure S6) to see whether the truncated version would mislocalize the WT version, which could be a mechanism for a dominant phenotype. However, this is not the case.

      (14) Through mass spectrometry of HSD17B7 proteins in the cell line, they identified RER1, a protein involved in ER retention. Using biochemical assays in a cell line, they show that truncation of HSD17B7 prevents its interaction with RER1, which would explain the subcellular localization.

      (15) Information on the HSD17B7 antibody and validation of its specificity are not presented. It appears to be the same antibody used in mice for IF and in zebrafish for Western blot. If so, the antibody could be used in zebrafish for IF to localize the endogenous protein (rather than the overexpressed protein, as done here). In addition, the specificity of the antibody should be verified on the mutant allele. That would bring confidence that the staining in mice is likely specific.
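      The arithmetic behind points (5) and (6) can be sketched briefly. The sketch assumes, as the comparison in point (6) implies but the manuscript does not state, that the 40% FM4-64 reduction refers to intensity summed over the whole neuromast and the 16% reduction to hair-cell number; if the reported intensity is already a per-cell mean, this calculation does not apply. The function name is illustrative.

```python
def per_cell_relative_intensity(total_intensity_ratio, cell_count_ratio):
    """Mean FM4-64 intensity per remaining hair cell, relative to control,
    assuming the measured intensity is summed over the whole neuromast."""
    return total_intensity_ratio / cell_count_ratio


hair_cells_per_neuromast = 9   # approximate count quoted in point (5)
cell_loss = 0.16               # reported reduction in hair-cell number
intensity_loss = 0.40          # reported reduction in FM4-64 labeling

cells_lost = cell_loss * hair_cells_per_neuromast
per_cell = per_cell_relative_intensity(1 - intensity_loss, 1 - cell_loss)
print(f"hair cells lost per neuromast: {cells_lost:.1f}")
print(f"per-cell FM4-64 relative to control: {per_cell:.2f}")
```

      Under these assumptions the surviving hair cells carry roughly 71% of control labeling on average. That average is equally consistent with a uniform partial loss of MET in every cell and with a subset of cells that no longer take up the dye at all, which is why per-cell intensities (or a ratiometric GCaMP/RFP readout) are needed to separate the two scenarios.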

    1. eLife Assessment

      Zebra finches are a prominent model system for vocal learning and auditory system function, yet little is known about the functional development of the auditory system. Here, the authors convincingly show that newly hatched zebra finches lack detectable auditory brainstem responses and that auditory neural signals emerge only days after hatching, challenging influential claims of prenatal acoustic communication in altricial birds. This important work clarifies the developmental timeline for auditory communication and highlights the value of neuroscientific methods for validating and complementing behavioral ecological studies of animal perception.

    2. Reviewer #1 (Public review):

      This work by Anttonen et al. was triggered by claims of auditory-mediated effects on altricial avian embryos, which were published without any direct evidence that the relevant parental vocalizations were actually heard. I agree with Anttonen et al. that, based on the available evidence about avian auditory development, those claims are highly speculative and therefore necessitate more direct experimental verification.

      Anttonen et al. have embarked on a comprehensive series of experiments to:

      (1) Better characterize acoustically the relevant parental vocalizations (heat whistles; in a separate preprint, not reviewed here)

      (2) Characterize the auditory sensitivity of zebra finches at various stages of their posthatching development. Despite the long-standing importance of the zebra finch as a songbird model in neuroethology of learned vocalizations, the auditory development of the species has not been studied so far.

      (3) Explore an alternative hypothesis of how the parental vocalizations might be perceived.

      The principal method used here is the non-invasive recording of ABR (auditory brainstem response), a standard neurophysiological method in auditory research. The click-evoked ABR provides a quick and objective assessment of basic hearing sensitivity that does not require animal training. Weaknesses of the technique include its limited frequency specificity and low signal-to-noise ratio. The authors are experienced with ABR measurements and well aware of those issues. ABR responses in zebra finches are shown to appear gradually during the first week posthatching and to mature in subsequent weeks, consistent with the auditory development of other altricial bird species studied previously. By matching the acoustic properties of parental heat whistles against auditory sensitivity, the authors convincingly excluded hearing of the parental heat whistles by zebra finch hatchlings. Although embryos were not directly measured, this conclusion also convincingly extends to them. Finally, the authors tested the hypothesis that parental heat whistles could induce perceptible vibrations of the egg and thus stimulate the embryo via a different modality. The method used here was laser Doppler vibrometry, an appropriate, state-of-the-art technique with which the authors also have proven experience. The induced vibrations were shown to be several orders of magnitude below known vibrotactile sensitivities in mammals and birds. Thus, although zebra finch vibrotactile thresholds were not obtained directly, the hypothesis of vibrotactile perception of parental heat whistles by zebra finch embryos could also be rejected convincingly.

      In summary, even when considering some weaknesses of the techniques (which the authors are aware of), the conclusions of the paper are well supported: Auditory and/or vibration perception of parental heat whistles can be excluded as an explanation for previous reports of developmental programming for high ambient temperatures. As a constructive suggestion towards resolving the apparent paradox, the authors recommend repeating some of the crucial, previous playback experiments at lower sound levels that better match the natural parental vocalizations.

    3. Reviewer #2 (Public review):

      This study by Anttonen, Christensen-Dalsgaard, and Elemans describes the development of hearing thresholds in an altricial songbird species, the zebra finch. The results are very clear and along what might have been expected for altricial birds: at hatch (2 days post-hatch), the chicks are functionally deaf. Auditory evoked activity in the form of auditory brainstem responses (ABR) can start to be detected at 4 days post-hatch, but only at very loud sound levels. The study also shows that ABR response matures rapidly and reaches adult-like properties around 25 days post-hatch. The functional development of the auditory system is also frequency dependent, with a low-to-high frequency time course. All experiments are very well performed. The careful study throughout development and with the use of multiple time-points early in development is important to further ensure that the negative results found right after hatching are not the result of the experimental manipulation. The results themselves could be classified as somewhat descriptive, but, as the authors point out, they are particularly relevant and timely. Since 2016, there have been a series of studies published in high-profile journals that have presumably shown the importance of prenatal acoustic communication in altricial birds, mostly in zebra finches. This early acoustic communication would serve various adaptive functions. Although acoustic communication between embryos in the egg and parents has been shown in precocial birds (and crocodiles), finding an important function for prenatal communication in altricial birds came as a surprise. Unfortunately, none of those studies performed a careful assessment of the chicks' hearing abilities. This is done here, and the results are clear: zebra finches at 2 and 6 days post-hatch are functionally deaf. 
      Since it is highly improbable that hearing in the egg is more developed than at hatching, one can only conclude that zebra finches in the egg (or at hatching) cannot hear the heat whistles. The paper also rules out the detection of egg vibrations as an alternative pathway. The prior literature will have to be corrected, or further studies conducted to resolve the discrepancies. For this purpose, the "companion" bioRxiv paper from the same group on the bioacoustic properties of heat calls will be particularly useful, because researchers from different groups will be able to compare their stimuli precisely.

      Beyond the quality of the experiments, I also found that the paper was very well written. The introduction was particularly clear and complete (yet concise).

      Weaknesses:

      My only minor criticism is that the authors do not discuss potential differences between behavioral audiograms and ABRs. Optimally, one would need to repeat the work of Okanoya and Dooling with your setup and using the same calibration. The ~20 dB difference might be real, or it might be due to SPL measured with different instruments, at different distances, etc. Either way, you could add a sentence in the discussion stating that, even with the 20 dB difference in audiogram, heat whistles would not be detected during the early days post-hatch. But adding a (novel) behavioral assay in young birds could further resolve the issue.

      More Minor Points:

      (1) As mentioned in the main text, the duration of pips (from pips to bursts) affects the effective bandwidth of the stimulus. I believe that the authors could give an estimate of this effective bandwidth, given what is known from bird auditory filters. I think that this estimate could be useful to compare to the effective bandwidth of the heat-call, which can now also be estimated.

      (2) Figure 5b. Label the green and pink areas as song and heat-call spectrum. Also note that in the legend the authors say: "Green and red areas display the frequency windows related to the best hearing sensitivity of zebra finches and to heat calls, respectively". I don't think this is what they meant. I agree that 1-4 kHz is the best frequency sensitivity of zebra finches, but they probably meant green == "song frequency spectrum" and pink == "heat call spectrum". In either case, the figure and the legend need clarification.

      (3) Figure 5c. Here also, I would change the song and heat-call labels to "song spectrum", "heat call spectrum". The authors would not want readers to think that they used song and heat calls in these experiments (maybe next time?). For the same reason, maybe in 5a you could add a cartoon of the oscillogram of a frequency sweep next to your speaker.

      (4) Methods. In the description of the stimulus, the authors describe "5ms long tone bursts", but these are the tone pips in the main part of the manuscript. Use the same terms.
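      Minor point (1) asks for an estimate of the effective bandwidth of the tone pips. As a rough sketch, the spectral width of a gated tone scales inversely with its duration. The code below assumes a Hann (raised-cosine) envelope spanning the whole pip and a 2 kHz carrier, neither of which is specified here, and measures the -3 dB bandwidth numerically; a different envelope (for example, a flat top with short ramps) changes the constant. It does not model avian auditory filters, which set the perceptually relevant bandwidth. The function name and parameters are illustrative.

```python
import numpy as np


def tone_pip_bandwidth(freq_hz, duration_s, fs=100_000, nfft=2**18):
    """-3 dB bandwidth (Hz) of a Hann-gated tone pip, measured from its
    zero-padded magnitude spectrum."""
    n = int(round(duration_s * fs))
    t = np.arange(n) / fs
    pip = np.hanning(n) * np.sin(2 * np.pi * freq_hz * t)

    spectrum = np.abs(np.fft.rfft(pip, nfft))
    freqs = np.fft.rfftfreq(nfft, d=1 / fs)

    peak = int(np.argmax(spectrum))
    threshold = spectrum[peak] / np.sqrt(2)  # half power, i.e. -3 dB

    # Walk outward from the spectral peak until the magnitude drops below -3 dB.
    lo = peak
    while lo > 0 and spectrum[lo - 1] >= threshold:
        lo -= 1
    hi = peak
    while hi < len(spectrum) - 1 and spectrum[hi + 1] >= threshold:
        hi += 1
    return freqs[hi] - freqs[lo]


for duration_ms in (5, 25):  # tone-pip vs. tone-burst durations discussed in the reviews
    bw = tone_pip_bandwidth(2000.0, duration_ms / 1000)
    print(f"{duration_ms} ms pip at 2 kHz: -3 dB bandwidth ~ {bw:.0f} Hz")
```

      For a Hann envelope the -3 dB bandwidth is roughly 1.44/T, so under this assumption a 5 ms pip spreads over a few hundred hertz and a 25 ms burst over a few tens of hertz. Comparing these values with the heat-call bandwidth and with avian critical bands is the comparison the reviewer suggests.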

    4. Reviewer #3 (Public review):

      Summary

      Following recent findings that exposure to natural sounds and anthropogenic noise before hatching affects development and fitness in an altricial songbird, this study attempts to estimate the hearing capacities of zebra finch nestlings and the perception of high frequencies in that species. It also tries to estimate whether airborne sound can make zebra finch eggs vibrate, although this is not relevant to the question.

      Strength

      That prenatal sounds can affect the development of altricial birds clearly challenges the long-held assumption that altricial avian embryos cannot hear. However, there is currently no data to support that expectation. Investigating the development of hearing in songbirds is therefore important, even though technically challenging. More broadly, there is accumulating evidence that some bird species use sounds beyond their known hearing range (especially towards high frequencies), which also calls for a reassessment of avian auditory perception.

      Weaknesses

      Rather than following validated protocols, the study presents many experimental flaws and two major methodological mistakes (see below), which invalidate all results on responses to frequency-specific tones in nestlings and those on vibration transmission to eggs, as well as largely underestimating hearing sensitivity. Accordingly, the study fails to detect a response in the majority of individuals tested with tones, including adults, and the results are overall inconsistent with previous studies in songbirds. The text throughout the preprint is also highly inaccurate, often presenting only part of the evidence or misrepresenting previous findings (both qualitatively and quantitatively; some examples are given below), which alters the conclusions.

      Conclusion and impact

      The conclusion from this study is not supported by the evidence. Even if the experiment had been performed correctly, there are well-recognised limitations and challenges of the method that likely explain the lack of response. The preprint fails to acknowledge that the method is well-known for largely underestimating hearing threshold (by 20-40dB in animals) and that it may not be suitable for a 1-gram hatchling. Unlike what is claimed throughout, including in the title, the failure to detect hearing sensitivity in this study does not invalidate all previous findings documenting the impacts of prenatal sound and noise on songbird development. The limitations of the approach and of this study are a much more parsimonious explanation. The incorrect results and interpretations, and the flawed representation of current knowledge, mean that this preprint regrettably creates more confusion than it advances the field.

      Detailed assessment

      For brevity, only some references are included below as examples, using, when possible, those cited in the preprint (DOI is provided otherwise). A full review of all the studies supporting the points below is beyond the scope of this assessment.

      (A) Hearing experiment

      The study uses the Auditory Brainstem Response (ABR), which measures minute electrical signals transmitted to the surface of the skull from the auditory nerve and nuclei in the brainstem. ABR is widely used, especially in humans, because it is non-invasive. However, ABR is also a lot less sensitive than other methods, and requires very specific experimental precautions to reliably detect a response, especially in extremely small animals and with high-frequency sounds, as here.

      (1) Results on nestling frequency sensitivity are invalid, for failing to follow correct protocols:

      The results on frequency testing in nestlings are invalid, since what might serve as a positive control did not work: in adults, no response was detected in a majority of individuals, at the core of their hearing range, with loud 95dB sounds (Figure S1), when testing frequency sensitivity with "tone burst".

      This is mostly because the study used a stimulus duration 5 times longer than the norm. It used 25 ms tone bursts, whereas all published avian studies (in altricial or precocial birds) used stimuli of 5 ms or less (when using subdermal electrodes as here; e.g., cited: Brittan-Powell et al 2004; not cited: Brittan-Powell et al 2002 (doi: 10.1121/1.1494807), Henry & Lucas 2008 (doi: 10.1016/j.anbehav.2008.08.003)). Long stimuli do not make sense and are indeed known to interfere with the detection of an ABR response, especially at high frequencies, as, for example, explicitly tested and stated in Lauridsen et al 2021 (cited).

      Adult responses were then re-tested with a correct 5 ms tone duration ("tone pip"), which showed that, for the few individuals that responded to 25 ms tones, thresholds were abnormally high (by ca. 30 dB; Figure 2C). Yet no nestlings were retested with a correct protocol. There are therefore no valid data to support any conclusion on nestling frequency hearing. Under these circumstances, the fact that some nestlings responded to 25 ms tones from day 8 would argue against their having very low sensitivity to sound.

      (2) Responses to clicks underestimate hearing onset by several days:

      Without any valid nestling responses to tones (see #1), the onset of hearing cannot be established from responses to clicks alone, since responses to clicks appear at least 4 days after responses to tones during development (Saunders et al, 1973). That 60% of 4-day-old individuals responded to clicks here means that most would have responded to tones at or before 2 days post-hatch, had the experiment been done correctly. Responses to tones are indeed observed in other songbirds at 1 day post-hatch (see #6).

      In budgerigars, hearing onset occurs before 5 days post hatch, since responses to both clicks and tones were detectable at the first age tested at 5dph (Brittan-Powell et al, 2004).

      (3) Experimental parameters chosen lower ABR detectability, specifically in younger birds:

      Very fast stimulus repetition rate inhibits the ABR response, especially in young:

      (a) The stimulus presentation rate (25 stim/sec) is 6 times faster than zebra finch heat-calls, and 5 to 25 times faster than most previous studies in young birds (e.g., cited: Saunders et al 1973, 1974: 1 stim/sec or less; Katayama 1985: 3.3 clicks/sec; Brittan-Powell et al 2004: 4 stim/sec). Faster rates saturate the neurons and are accordingly known to decrease ABR amplitude and increase ABR latency, especially in younger animals with an immature nervous system. In birds, this occurs especially in the range from 5 to 30 stim/sec (e.g., cited: Saunders et al 1973, Brittan-Powell et al 2004). Values obtained here with 25 rather than 1-4 stim/sec therefore underestimate true sensitivity.
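
      To make these timings concrete, the sketch below converts presentation rates into the interval between stimulus onsets and the silence left after each stimulus. The rates come from the point above; pairing the 25 stim/sec rate with the 25 ms tone-burst duration from point (1) assumes both were used in the same recordings, which the critique does not state explicitly. The function names are illustrative only.

```python
# Illustrative arithmetic only: stimulus period and silent gap implied by a
# presentation rate and a stimulus duration (names are hypothetical).

def stimulus_period_ms(rate_per_s: float) -> float:
    """Time from one stimulus onset to the next, in milliseconds."""
    return 1000.0 / rate_per_s


def silent_gap_ms(rate_per_s: float, duration_ms: float) -> float:
    """Silence between the end of one stimulus and the onset of the next."""
    return stimulus_period_ms(rate_per_s) - duration_ms


for rate in (1.0, 4.0, 25.0):
    print(f"{rate:>4} stim/s -> period {stimulus_period_ms(rate):.0f} ms")

# Assumed pairings of rate and duration, for illustration.
print(f"25 ms tone at 25 stim/s leaves {silent_gap_ms(25.0, 25.0):.0f} ms of silence")
print(f"5 ms tone at 4 stim/s leaves {silent_gap_ms(4.0, 5.0):.0f} ms of silence")
```

      At 25 stim/sec the onset-to-onset interval is 40 ms, against 250 ms at 4 stim/sec, which is the recovery-time difference the rate argument rests on.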

      (b) Averaging over only 400 measures is insufficient to reliably detect weak ABR signals:

      The study uses 2 to 3 times fewer measures per stimulation type than the recommended value of 1,000 (e.g., Brittan-Powell et al 2002, 2024; Henry & Lucas 2008). This specifically affects the detection of weak signals, as in small hatchlings with tiny brains (adult zebra finches are 12-14g).
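
      The effect of sweep count can be estimated under the standard idealisation that the evoked response is identical on every sweep while background noise is zero-mean and uncorrelated across sweeps, so that residual noise RMS falls as 1/sqrt(N). Real ABR noise is not perfectly stationary, so this is a sketch of the expected gain rather than a measured value.

```python
import math


def snr_gain_db(n_sweeps_a: int, n_sweeps_b: int) -> float:
    """Amplitude-SNR gain (dB) from averaging n_sweeps_a instead of n_sweeps_b.

    Assumes a fixed evoked waveform plus uncorrelated, zero-mean noise,
    so the residual noise RMS after averaging N sweeps scales as 1/sqrt(N).
    """
    return 20.0 * math.log10(math.sqrt(n_sweeps_a / n_sweeps_b))


# Recommended count vs the count used in the preprint.
print(f"1000 vs 400 sweeps: {snr_gain_db(1000, 400):.1f} dB better SNR")
```

      Under these assumptions, averaging 1,000 rather than 400 sweeps improves amplitude SNR by about 4 dB, which matters most for responses close to the noise floor.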

      (c) Body temperature is not specified and strongly affects the ABR:

      Controlling the body temperature of hatchlings of 1-4 grams (with a temperature probe under a 5mm-wide wing) would be very challenging. Low body temperature entirely eliminates the ABR, and even slight deviations from optimal temperature strongly increase wave latency and decrease wave amplitude (e.g., cited: Katayama 1985).

      (d) Other essential information is missing on parameters known to affect the ABR:

      This includes i) the weight of the animals, ii) whether and how the response signal was amplified and filtered, iii) how the automated S/N > 2 criterion compared to visual assessment for wave detection, and iv) what measures were taken to ensure correct electrode placement on hatchlings of less than 5 grams.

      (4) Results in adults largely underestimate sensitivity at high frequencies, and are not the correct reference point:

      (a) Thresholds measured here at high frequencies for adults (using the correct stimulus duration, only done on adults) are 10-30dB higher than in all 3 other published ABR studies in adult zebra finches (cited: Zevin et al 2004; Amin et al 2007; not cited: Noirot et al 2011 (10.1121/1.3578452)), for both 4 and 6 kHz tone pips.

      (b) The underlying assumption used throughout the preprint that hearing must be adult-like to be functional in nestlings does not make sense. Slower and smaller neural responses are characteristic of immature systems, but it does not mean signals are not being perceived.

      (5) Failure to account for ABR underestimation leads to false conclusions:

      (a) Whether the ABR method is suitable to assess hearing in very small hatchlings is unknown. No previous avian study has used ABR before 5 days post-hatch, and all have used larger bird species than the zebra finch.

      (b) Even when performed correctly on large enough animals, the ABR systematically underestimates actual auditory sensitivity by 20-40 dB, especially at high frequencies, compared to behavioural responses (e.g., none cited: Brittan-Powell et al 2002, Henry & Lucas 2008, Noirot et al 2011). Against common practice, the preprint fails to account for this, leading to wrong interpretations. For example, in Figure 1G (comparing to heat call levels), actual hearing thresholds would be 30-40dB below those displayed. In addition, the "heat whistle" level displayed here (from the same authors) is 15dB lower than their second measure that they do not mention, and than measures obtained by others (unpublished data). When these two corrections are made - or even just the first one - the conclusion that heat-call sound levels are below the zebra finch hearing threshold does not hold.
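
      The correction the critique calls for can be sketched as a simple subtraction, assuming the 20-40 dB ABR-to-behaviour offset quoted above applies uniformly across frequencies. That uniformity is itself an assumption, and the author response (point 3) argues that any such offset should be species- and age-specific. The threshold value below is hypothetical, not taken from the preprint.

```python
def behavioural_threshold_range(abr_threshold_db: float,
                                offset_min_db: float = 20.0,
                                offset_max_db: float = 40.0) -> tuple[float, float]:
    """Return (lowest, highest) plausible behavioural threshold in dB SPL.

    Assumes behavioural thresholds lie offset_min_db..offset_max_db below
    the ABR threshold (offset range taken from the critique, not measured).
    """
    if offset_min_db > offset_max_db:
        raise ValueError("offset_min_db must not exceed offset_max_db")
    return (abr_threshold_db - offset_max_db, abr_threshold_db - offset_min_db)


# Hypothetical ABR threshold, for illustration only.
print(behavioural_threshold_range(80.0))  # (40.0, 60.0)
```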

      (c) Rather than making appropriate corrections, the preprint uses a reference in humans (L180), where ABR is measured using a much more powerful method (multi-array EEG) than in animals, and from a larger brain. The shift of "10-20dB" obtained in humans is not applicable to animals.

      (6) Results are inconsistent with previous findings in developing songbirds:

      As expected from all of the above, results and conclusions in the preprint are inconsistent with findings in other songbirds, which, using other methods, show for example, auditory sensitivity in:

      (a) zebra finch embryos, in response to song vs silence (not cited: Rivera et al 2018, doi: 10.1097/WNR.0000000000001187)

      (b) flycatcher hatchlings at 2-3d post hatch (first age tested), across a wide range of frequencies (0.3 to 5kHz), at low to moderate sound levels (45-65dB) (cited: Aleksandrov and Dmitrieva 1992, not cited: Korneeva et al 2006 (10.1134/S0022093006060056)).

      (c) songbird nestlings at 2-6d post hatch, which discriminate and behaviourally respond to relevant parental calls or even complex songs. This level of discrimination requires good hearing across frequencies (e.g., not cited: Korneeva et al 2006; Schroeder & Podos 2023 (doi: 10.1016/j.anbehav.2023.06.015)).

      (d) zebra finch nestlings at 13d post-hatch, which show adult-like processing of songs in the auditory cortex (CNM) (Schroeder & Remage‐Healey 2021, doi: 10.1002/dneu.22802).

      (e) zebra finch juveniles, which are able to perceive and learn song syllables at 5-7kHz (fundamental frequency) with very similar acoustic properties to heat calls, and also produced during inspiration (Goller & Daley 2001, doi: 10.1098/rspb.2001.1805).

      NONE of these results - which contradict results and claims in the preprint - are mentioned. Instead, the preprint focuses on very slow-developing species (parrots and owls), which take 2-4 times longer than songbirds to fledge (cited: Brittan-Powell et al 2004; Köppl & Nickel 2007; Kraemer et al 2017).

      (7) Results in figures are misreported in the text, and conclusions in the abstract and headers are not supported by the data:

      For example:

      (a) The data on Figure 1E shows that at 4 days old, 8 out of 13 nestlings (60%) responded to clicks, but the text says only 5/13 responded (L89). When 60% (4dph) and 90% (6dph) of individuals responded, the correct term would be that "most animals", rather than "some animals" responded (L89). Saying that ABR to loud sound appeared "in the majority only after one week" (L93) is also incorrect, given the data. It follows that the title of the paragraph is also erroneous.

      (b) The hearing threshold is underestimated by 40 dB at 6 and 8 kHz in Fig 2C, not by "10-20dB" as reported in the text (L178).

      (B) Egg vibration experiment

      (8) Using airborne sound to vibrate eggs is biologically irrelevant:

      The measurement of airborne sound levels to vibrate eggs misunderstands bone conduction hearing and is not biologically meaningful: zebra finch parents are in direct contact with the eggs when producing heat calls during incubation, not hovering in front of the nest. This misunderstanding affects all extrapolations from this study to findings in studies on prenatal communication.

      (C) Misrepresentation of current knowledge

      (9) Values from published papers are misreported, which reverses the conclusions:

      Most critical examples:

      (a) Preprint: "Zebra finch most sensitive hearing range of 1-to-4 kHz (Amin et al., 2007; Okanoya and Dooling, 1987; Yeh et al., 2023)" (L173). Actual values in the studies cited are:

      1-to-7 kHz in Amin et al 2007 (the threshold [=50 dB with ABR] is the same at 7 kHz and 1 kHz).

      1-to-6 kHz in Okanoya and Dooling (the threshold [=30 dB with behaviour] is actually lower at 6 kHz than at 1 kHz).

      1-to-7 kHz in Yeh et al (the threshold [=35-38 dB with behaviour] is the same at 7 kHz and 1 kHz).

      Note that zebra finch nestlings' begging calls, which peak at 6 kHz (Elie & Theunissen 2015, doi: 10.1007/s10071-015-0933-6), would fall 2 kHz above the parents' best hearing range if that range extended only up to 4 kHz.

      (b) The preprint incorrectly states throughout (e.g., L139, L163, L248) that heat-calls are 7-10kHz, when the actual value is 6-10kHz in the paper cited (Katsis et al, 2018).

      (c) Using the correct values from these studies, and heat-calls at 45 dB SPL (as measured by others (unpublished data), or as measured by the authors themselves but not reported here (Anttonen et al., 2025)), the correct conclusion is that heat calls fall within the known zebra finch hearing range.

      (10) Published evidence towards high-frequency hearing, including in early development, is systematically omitted:

      (a) Other studies showing that birds use high frequencies above the known avian hearing range are ignored. These include oilbirds (7-23kHz; Brinklov et al 2017; by one of the preprint authors, doi: 10.1098/rsos.170255), hummingbirds (10-20kHz; Duque et al 2020, doi: 10.1126/sciadv.abb9393) and, to a lesser extent, zebra finches' inspiratory song syllables at 5-7kHz (Goller & Daley, 2001).

      (b) The discussion of anatomical development (L228-241) completely omits the well-known fact that the avian basilar papilla develops from high to low frequencies (i.e., base to apex), which - as many have pointed out - is opposite to the low-to-high development of sensitivity (e.g., cited: Cohen & Fermin 1978; Caus Capdevila et al 2021).

      (c) High frequency hearing in songbirds at hatching is several orders of magnitude better than in chickens and ducks at the same age, even though songbirds are altricial (e.g., at 4kHz, flycatcher: 47dB, chicken-duck: 90dB; at 5kHz, flycatcher: 65dB, chicken-duck: 115dB; Korneeva et al 2006, Saunders et al 1974). That is because Galliformes are low-frequency specialists, according to both anatomical and ecological evidence, with calls peaking at 0.8 to 1.2kHz rather than 2-6kHz in songbirds. It is incorrect to conclude that altricial embryos cannot perceive high frequencies because low-frequency specialist precocial birds do not (L250;261).
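
      Because decibels are logarithmic, the size of these threshold gaps depends on whether they are read as sound-pressure or intensity ratios. The sketch below converts the differences quoted above (flycatcher vs chicken-duck) using the standard definitions: 20·log10 for pressure and 10·log10 for intensity.

```python
def db_to_ratios(delta_db: float) -> tuple[float, float]:
    """Convert a level difference in dB to (pressure ratio, intensity ratio)."""
    pressure_ratio = 10 ** (delta_db / 20.0)
    intensity_ratio = 10 ** (delta_db / 10.0)
    return pressure_ratio, intensity_ratio


# Thresholds quoted in the text: (frequency, flycatcher dB, chicken-duck dB).
for label, songbird_db, chicken_duck_db in (("4 kHz", 47, 90), ("5 kHz", 65, 115)):
    delta = chicken_duck_db - songbird_db
    p, i = db_to_ratios(delta)
    print(f"{label}: {delta} dB -> pressure x{p:.0f}, intensity x{i:.0f}")
```

      A 43 dB gap is roughly a 140-fold difference in sound pressure and a 20,000-fold difference in intensity; a 50 dB gap is about 316-fold and 100,000-fold respectively.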

      The references used to support the statement on a very high threshold for precocial birds above 6kHz are also wrong (L250). Katayama 1985 did not test embryos, nor frequency tones. Neither of these two references tested ducks.

      (11) Incorrect statements do not reflect findings from the references cited

      For example:

      (a) "in altricial bird species hearing typically starts after hatching" (L12, in abstract), "with little to no functional hearing during embryonic stages (Woolley, 2017)." (L33).

      There is no evidence, in any species, to support these statements. This is only a - commonly repeated - assumption, not actually based on any data. On the contrary, the extremely limited evidence to date shows the opposite, with zebra finch embryos showing ZENK activation in the auditory cortex in response to song playback (Rivera et al, 2018, not cited).

      The book chapter cited (Woolley 2017) acknowledges this lack of evidence and, in the context of song learning, cites only two studies (prior to 2018), which show that songbirds do not develop normal song if the song tutor is removed before 10 days post-hatch. That nestlings cannot memorise (to later reproduce) complex signals heard before day 10 does not mean that they are deaf to all sound before day 10.

      Studies showing hearing in young songbird nestlings (see point 6 above) also contradict these statements.

      (b) "Zebra finch embryos supposedly are epigenetically guided to adapt to high temperatures by their parents high-frequency "heat calls" " (L36 and L135).

      This is an extremely vague and meaningless description of these results, which cannot be assessed by readers, even though these results are presented as a major justification for the present study. Rather than giving an interpretation of what "supposedly" may occur, it would be appropriate to simply synthesize the empirical evidence provided in these papers. They showed that embryonic exposure to heat-calls, as opposed to control contact calls, alters a suite of physiological and behavioural traits in nestlings, including how growth and cellular physiology respond to high temperatures. This also leads to carry-over effects on song learning and reproductive fitness in adulthood.

      (c) "The acoustic communication in precocial mallard ducks depends specifically on the low-frequency auditory sensitivity of the embryo (Gottlieb, 1975)" (L253)

      The study cited (Gottlieb, 1975) demonstrates exactly the opposite of this statement: it shows that duckling embryos not only perceive high-frequency sounds (relative to the species' frequency range), but also NEED this exposure to display normal audition and behaviour post-hatch. Specifically, it shows that duckling embryos deprived of exposure to their own high-frequency calls (at 2 kHz) failed to identify maternal calls post-hatch because of their abnormal insensitivity to higher frequencies, which was later confirmed by directly testing their auditory perception of tones (Dmitrieva & Gottlieb, 1994).

      (12) Considering all of the mistakes and distortions highlighted above, it would be very premature to conclude, based on these results and statements, that altricial avian embryos are not sensitive to sound. This study provides no actual scientific ground to support this conclusion.

    5. Author Response:

      We thank all reviewers for their time and effort to carefully review our paper and for the constructive comments on our manuscript. Below we outline our planned revisions to the public reviews of the three reviewers.

      In our revision, we will include more details regarding our ABR measurements (including temperature and animal metadata) and analysis (including filter settings), and we will lay out a much more detailed motivation for our ABR signal design. We will also provide a more detailed discussion of the caveats of the technique and of the interpretation of ABR data in general and of our data specifically, and add more discussion of differences between ABR-based audiograms and behavioural data. The authors have extensive experience with the ABR technique and are well aware of its limitations, but also of its strengths for use in animals that cannot be trained on behavioural tasks, such as the very young zebra finches in this study. These additions will strengthen our paper. We think our conclusions remain justified by our data.

      Reviewer #1 and #2:

      We thank both reviewers for their positive words and suggested improvements. The planned general improvements listed above will take care of all suggestions and comments in the public review.

      Reviewer #3:

      We thank the reviewer for the detailed critique of our manuscript and many suggestions for improvement. The planned general improvements listed above will take care of many of the suggestions and comments listed in the public review. Here we will highlight a few first responses that we will address in detail in our resubmission.

      The reviewer’s major critiques can be condensed to the following four points.

      (1) ABR cannot be done in such small animals.

      This critique is unfounded. ABR measures the summed activity in the auditory pathway, and because the distance from brainstem to electrodes is smaller in small animals, ABR signals are expected to have higher amplitude and consequently better SNR. We have successfully recorded ABR in animals smaller than 2 DPH zebra finches, supporting this claim (zebrafish (Jørgensen et al., 2012), 10 mm froglets (Goutte et al., 2017) and 5 mm salamanders (Capshaw et al., 2020)). It is more surprising that the technique still provides robust signals even in very large animals such as Minke whales (Houser et al., 2024).

      (2) The ABR methods used do not follow the protocols of other published work in birds. In particular, the 25 ms tone bursts may have underestimated high-frequency hearing.

      There is no fixed protocol for ABR measurements, and several studies of bird ABR have used as long or even longer durations. Longer-duration signals were chosen deliberately and are necessary to have a sufficient number of cycles and avoid frequency splatter at our lowest frequencies used (see Lauridsen et al., 2021).
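
      The trade-off behind this choice can be sketched in two quantities: how many carrier cycles a tone contains, and how widely its energy spreads around the nominal frequency. The sketch assumes rectangular gating for simplicity, where the spectral main lobe is 2/T wide (null to null). Real ABR stimuli use rise/fall ramps that change this shape, and the 500 Hz carrier is hypothetical, since the lowest frequency tested is not stated here.

```python
def cycles_in_tone(freq_hz: float, duration_s: float) -> float:
    """Number of carrier cycles contained in a tone of the given duration."""
    return freq_hz * duration_s


def main_lobe_width_hz(duration_s: float) -> float:
    """Null-to-null width of the spectral main lobe of a rectangularly gated
    tone (2 / T). Ramped gating, as used in practice, alters this."""
    return 2.0 / duration_s


# Hypothetical low-frequency carrier, for illustration only.
carrier_hz = 500.0
for duration_s in (0.005, 0.025):
    print(f"{duration_s * 1000:.0f} ms at {carrier_hz:.0f} Hz: "
          f"{cycles_in_tone(carrier_hz, duration_s):.1f} cycles, "
          f"main lobe ~{main_lobe_width_hz(duration_s):.0f} Hz wide")
```

      At a low carrier, a 5 ms tone holds only a few cycles and spreads energy over hundreds of hertz, while a 25 ms tone is spectrally narrower; the reviewer's counterpoint (point 1) is that longer stimuli degrade ABR detectability, particularly at high frequencies.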

      (3) Sensitivity data should be corrected from ABR to behavioural data.

      We present the results of our measurements of hearing sensitivity using ABR, and ABR-based thresholds are generally less sensitive than thresholds based on behavioural studies (presented in Fig 2c). Correcting these measurements to behavioural thresholds is of course possible, but presenting only the corrected thresholds would misrepresent our sensitivity data. Moreover, such a correction should be made only within a species and age group, and such data are currently not available. In our revision, we will include an elaborate discussion of this topic.

      (4) Results are inconsistent with papers in developing songbirds.

      We agree that our results do not support, and even question, the claims in earlier work. However, these papers either 1) do not measure hearing physiology or 2) do so in different species. To the best of our knowledge, there are presently no published data on the development of auditory physiology in songbird embryos. Our data are consistent with what is known about the physiology of auditory development in all birds studied so far. We will provide a detailed discussion of this topic in our revision.

      References

      Capshaw et al. (2020) J Exp Biol 223: jeb236489

      Goutte et al. (2017) Sci Rep 7: 12121. doi: 10.1038/s41598-017-12145-5

      Houser et al. (2024) Science 386: 902-906. doi: 10.1126/science.ado7580

      Jørgensen et al. (2012) Adv Exp Med Biol 730: 117-119

      Lauridsen et al. (2021) J Exp Biol 224: jeb237313. doi: 10.1242/jeb.237313

    1. eLife Assessment

      This study provides evidence for distinct neurotransmitter release modalities between two subclasses of dopaminergic neurons in the olfactory bulb. Specifically, it demonstrates dendritic neurotransmitter release in anaxonic neurons and axonal release in axon-bearing neurons. The presence of GABAergic self-inhibition in anaxonic neurons further underscores the functional divergence between these subtypes. Overall, the manuscript presents solid evidence and offers biologically important insights into the organization and function of dopaminergic circuits within the olfactory bulb.

    2. Reviewer #1 (Public review):

      Summary:

      Dorrego-Rivas et al. investigated two different DA neurons and their neurotransmitter release properties in the main olfactory bulb. They found that the two different DA neurons in mostly glomerular layers have different morphologies as well as electrophysiological properties. The anaxonic DA neurons are able to self-inhibit but the axon-bearing ones are not. The findings are interesting and important to increase the understanding both of the synaptic transmissions in the main olfactory bulb and the DA neuron diversity. However, there are some major questions that the authors need to address to support their conclusions.

      (1) It is known that there are two types of DA neurons in the glomerular layer with different diameters and capacitances (Kosaka and Kosaka, 2008; Pignatelli et al., 2005; Angela Pignatelli and Ottorino Belluzzi, 2017). In this manuscript, the authors need to articulate better in which layer the imaging and ephys recordings took place, all glomerular layers or with an exception. Meanwhile, they have to report the electrophysiological properties of their recordings, including capacitances, input resistance, etc.

      (2) It is understandable that recording the DA neurons in the glomerular layer is not easy. However, the authors still need to increase their n's and repeat the experiments at least three times to make their conclusion more solid. For example (but not limited to), Fig 3B, n=2 cells from 1 mouse. Fig.4G, the recording only has 3 cells.

      (3) The statistics also use pseudoreplicates. It might be better to present the biological replicates, too.

      (4) In Figure 4D, the authors report the values in the manuscript. It is recommended to make a bar graph to be more intuitive.

      (5) In Figure 4F and G, although the data with three cells suggest no phenotype, the kinetics looked different. So, the authors might need to explore that aside from increasing the n.

      (6) Similarly, for Figure 4I and J, L and M, it is better to present and analyze it like F and G, instead of showing only the after-antagonist effect.

      Comments on revisions:

      In the rebuttal, the authors argued that it had been extremely hard to obtain recordings stable enough for before-and-after effects on the same cell. Alternatively, they could perform the before-and-after comparison on different cells.

    3. Reviewer #2 (Public review):

      Summary:

      This study provides novel insights into the neurotransmitter release mechanisms employed by two distinct subclasses of dopaminergic neurons in the olfactory bulb (OB). The findings suggest that anaxonic neurons primarily release neurotransmitters through their dendrites, whereas axon-bearing neurons predominantly release neurotransmitters via their axons. Furthermore, the study reveals that anaxonic neurons exhibit self-inhibitory behavior, indicating that closely related neuronal subclasses may possess specialized roles in sensory processing.

      Strengths:

      This study introduces a novel and significant concept, demonstrating that two closely related neuron subclasses can exhibit distinct patterns of neurotransmitter release. Therefore, this finding establishes a valuable framework for future investigations into the functional diversity of neuronal subclasses and their contributions to sensory processing. Furthermore, these findings offer fundamental insights into the neural circuitry of the olfactory bulb, enhancing our understanding of sensory information processing within this critical brain region.

      Weaknesses:

      The reliance on synaptophysin-based presynaptic structures raises minor concerns about whether these structures represent functional synapses.

      Comments on revisions:

      Most of the concerns have been addressed by the authors, and there are no further comments about this manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This Reviewer was positive about the study, stating ‘The findings are interesting and important to increase the understanding both of the synaptic transmissions in the main olfactory bulb and the DA neuron diversity.’ They provided a number of helpful suggestions for improving the paper, which we have incorporated as follows:

      (1) It is known that there are two types of DA neurons in the glomerular layer with different diameters and capacitances (Kosaka and Kosaka, 2008; Pignatelli et al., 2005; Angela Pignatelli and Ottorino Belluzzi, 2017). In this manuscript, the authors need to articulate better in which layer the imaging and ephys recordings took place, all glomerular layers or with an exception. Meanwhile, they have to report the electrophysiological properties of their recordings, including capacitances, input resistance, etc.

      We thank the Reviewer for this clarification. Indeed, the two dopaminergic cell types we study here correspond directly to the subtypes previously identified based on cell size. Our previous work showed that axon-bearing OB DA neurons have significantly larger somas than their anaxonic neighbours (Galliano et al. 2018), and we replicate this important result in the present study (Figure 3D). In terms of electrophysiological correlates of cell size, we now provide full details of passive membrane properties in the new Supplementary Figure 4, as requested. Axon-bearing DA neurons have significantly lower input resistance and show a non-significant trend towards higher cell capacitance. Both features are entirely consistent with the larger soma size in this subtype. We apologise for the oversight in not fully describing previous categorisations of OB DA neurons, and have now added this information and the appropriate citations to the Introduction (lines 56 to 59 of the revised manuscript). 

      In terms of cell location, all cells in this study were located in the OB glomerular layer. We sampled the entire glomerular layer in all experiments, including the glomerular/EPL border where the majority of axon-bearing neurons are located (Galliano et al. 2018). This is now clarified in the Materials and Methods section (lines 535 to 537 and 614 to 616 of the revised manuscript).

      (2) It is understandable that recording the DA neurons in the glomerular layer is not easy. However, the authors still need to increase their n's and repeat the experiments at least three times to make their conclusion more solid. For example (but not limited to), Fig 3B, n=2 cells from 1 mouse. Fig.4G, the recording only has 3 cells.

      Despite the acknowledged difficulty of these experiments, we have now added substantial extra data to the study as requested. We have increased the number of cells and animals to further support the following findings:

      Fig 3B: we now have n=5 cells from N=3 mice. We have created a new Supplementary Figure 1 to show all the examples.

      Figure 4G: we now have n=6 cells from N=4 mice.

      Figure 5G: we now have n=3 cells from N=3 mice.

      The new data now provide stronger support for our original conclusions. In the case of auto-evoked inhibition after the application of D1 and D2 receptor antagonists, a nonsignificant trend in the data suggests that, while dopamine is clearly not necessary for the response, it may play a small part in its strength. We have now included this consideration in the Results section (lines 256 to 264 of the revised manuscript).

      (3) The statistics also use pseudoreplicates. It might be better to present the biological replicates, too.

      Indeed, in a study focused on the structural and functional properties of individual neurons, we performed all comparisons with cell as the unit of analysis. This did often (though not always) involve obtaining multiple data points from individual mice, but in these low-throughput experiments n was never hugely bigger than N. The potential impact of pseudoreplicates and their associated within-animal correlations was therefore low. We checked this in response to the Reviewer’s comment by running parallel nested analyses for all comparisons that returned significant differences in the original submission. These are the cases in which we would be most concerned about potential false positive results arising from intra-animal correlations, which nested tests specifically take into account (Aarts et al., 2013). In every instance we found that the nested tests also reported significant differences between anaxonic and axon-bearing cell types, thus fully validating our original statistical approach. We now report this in the relevant section of the Materials and Methods (lines 686 to 691 of the revised manuscript).
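
      The argument that pseudoreplication matters little when n is not much larger than N can be illustrated with Kish's design effect, 1 + (m - 1)·ICC, where m is the mean number of cells per animal and ICC is the within-animal correlation. This is a standard survey-sampling approximation used here only for illustration; it is not the nested analysis the authors report, and the ICC values below are hypothetical.

```python
def design_effect(n_cells: int, n_animals: int, icc: float) -> float:
    """Kish design effect for clustered data: 1 + (m - 1) * ICC,
    where m is the mean number of cells recorded per animal."""
    m = n_cells / n_animals
    return 1.0 + (m - 1.0) * icc


def effective_n(n_cells: int, n_animals: int, icc: float) -> float:
    """Approximate number of independent observations the sample is worth."""
    return n_cells / design_effect(n_cells, n_animals, icc)


# Cell/animal counts from the response (Figure 4G: n=6 cells, N=4 mice);
# ICC values are hypothetical.
for icc in (0.1, 0.5):
    print(f"ICC={icc}: design effect {design_effect(6, 4, icc):.2f}, "
          f"effective n {effective_n(6, 4, icc):.1f}")
```

      With about 1.5 cells per mouse, even a strong within-animal correlation shrinks the effective sample only modestly; with one cell per animal the design effect is exactly 1.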

      (4) In Figure 4D, the authors report the values in the manuscript. It is recommended to make a bar graph to be more intuitive.

      This plot does already exist in the original manuscript. We originally describe these data to support the observation that an auto-evoked inhibition effect exists in anaxonic neurons (corresponding to now lines 240 to 245 of the revised manuscript). We then show them visually in their entirety when we compare them to the lack of response in axon-bearing neurons, depicted in Figure 5C. We still believe that this order of presentation is most appropriate for the flow of information in the paper, so have maintained it in our revised submission.

      (5) In Figure 4F and G, although the data with three cells suggest no phenotype, the kinetics looked different. So, the authors might need to explore that aside from increasing the n.

      We thank the Reviewer for this suggestion. To quantify potential changes in the auto-evoked inhibition response kinetics, we fitted single exponential functions and compared changes in the rate constant (k; Methods, lines 650 to 652 of the revised manuscript). Overall, we observed no consistent or significant change in rate constant values after adding DA receptor antagonists. This finding is now reported in the Results section (lines 260 to 263 of the revised manuscript) and shown in a new Supplementary Figure 3.
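
      The fitting routine itself is not described in this response. One simple way to estimate a rate constant k, assuming the response magnitude is positive and decays to zero with no baseline offset, is a least-squares regression of ln(y) on t. The sketch below does this in plain Python with synthetic data; the function name and parameter values are hypothetical. For noisy traces or a non-zero baseline, a nonlinear fit with an offset term (for example scipy.optimize.curve_fit) is the more usual choice, and a negative-going trace would be fitted via its magnitude.

```python
import math


def fit_single_exponential(t, y):
    """Estimate A and k in y = A * exp(-k * t) by least-squares regression
    of ln(y) on t. Assumes y > 0 and no baseline offset."""
    if len(t) != len(y) or len(t) < 2:
        raise ValueError("need at least two paired samples")
    if any(v <= 0 for v in y):
        raise ValueError("log-linear fit requires strictly positive y")
    log_y = [math.log(v) for v in y]
    n = len(t)
    mean_t = sum(t) / n
    mean_ly = sum(log_y) / n
    sxx = sum((ti - mean_t) ** 2 for ti in t)
    if sxx == 0:
        raise ValueError("t values must not all be equal")
    sxy = sum((ti - mean_t) * (li - mean_ly) for ti, li in zip(t, log_y))
    slope = sxy / sxx
    intercept = mean_ly - slope * mean_t
    return math.exp(intercept), -slope  # (A, k)


# Synthetic, noise-free decay with hypothetical A = 2.0 and k = 0.05 per ms.
t_ms = [float(i) for i in range(0, 100, 5)]
trace = [2.0 * math.exp(-0.05 * ti) for ti in t_ms]
A, k = fit_single_exponential(t_ms, trace)
print(f"A = {A:.3f}, k = {k:.3f} per ms")
```

      Comparing k before and after antagonist application, cell by cell, is then a paired comparison of these fitted values.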

      (6) Similarly, for Figure 4I and J, L and M, it is better to present and analyze it like F and G, instead of showing only the after-antagonist effect.

      We agree that the ideal scenario would have been to perform the experiments in Figure 4J and 4M the same way as those in Figure 4G, with a before vs after comparison. Unfortunately, however, this was not practically possible. 

      When attempting to apply carbenoxolone to already-patched cells, we found that this drug severely disrupted the overall health and stability of our recordings immediately after its application. This is consistent with previous reports of similar issues with this compound (e.g. Connors 2012, Epilepsy Currents; Tovar et al., 2009, Journal of Neurophysiology). After many such attempts, the total yield of this experiment was one single cell from one animal. Even so, as shown in the traces below, we were able to show that the auto-evoked inhibition response was not eliminated in this specific case:

      Author response image 1.

      Traces of an AEI response recorded before (magenta) and after (green) the application of carbenoxolone (n=1 cell from N=1 mouse).

      In light of these issues, we instead followed published protocols in applying the carbenoxolone directly in the bath without prior recording for 20 minutes (following Samailova et al., 2003, Journal of Neurochemistry) and ran the protocol after that time. Given that our main question was to ask whether gap junctions were strictly necessary for the presence of any auto-evoked inhibition response, our positive findings in these experiments still allowed us to draw clear conclusions.

      In contrast, the issue with the NKCC1 antagonist bumetanide was time. As acknowledged by this Reviewer, obtaining and maintaining high-quality patch recordings from OB DA neurons is technically challenging. Bumetanide is a slow-acting drug when used to modify neuronal chloride concentrations, because in addition to the time it takes to reach the neurons and effectively block NKCC1, the intracellular levels of chloride subsequently change slowly. Studies using this drug in slice physiology experiments typically use an incubation time of at least 20 minutes (e.g. Huberfeld et al., 2007, Journal of Neuroscience), which was incompatible with productive data collection in OB DA neurons. Again, after many unsuccessful efforts, we were forced instead to include bumetanide in the bath without prior recording for 20-30 minutes. As with the carbenoxolone experiment, our goal here was to establish whether auto-evoked inhibition was in any way retained in the presence of this drug, so our positive result again allowed us to draw clear conclusions.

      Reviewer #1 (Recommendations for the authors):

      (1) I suggest the authors reconsider the terminology. For example, they use "strikingly" in their title. The manuscript reported two different transmitter release strategies but not the mechanisms, and the word "strikingly" is not professional, either.

      We appreciate the Reviewer’s attention to clarity and tone in the manuscript title, but have decided to retain the original wording. The almost all-or-nothing differences in structural and functional properties between the closely related cell types studied here (Figures 3F & 5C) are pronounced, extremely clear and easily spotted: all properties appropriate for the word ‘striking.’ In addition, we note that this term is not at all unprofessional; a PubMed search for ‘strikingly’ in publication titles returns over 200 hits.

      (2) Similarly, almost all confocal scopes are 3D because images can be taken at stacks. So "3D confocal" is misleading.

      We understand that this is misleading. We have now replaced the sentence ‘Example snapshot of a 3D confocal stack of…’ by ‘Example confocal images of…’ in all the figure legends that apply.

      (3) It is recommended to present the data in bar graphs with data dots instead of showing the numbers in the manuscript directly.

      We agree entirely, and now present data plots for all comparisons reported in the study (Supplementary Figures 2, 4 and 5).

      Reviewer #2 (Recommendations for the authors):

      (1) Several experiments report notably small sample sizes, such as in Figures 3B and 5G, where data from only 2 cells derived from 1-2 mice are presented. Figures 4E-G also report the experimental result only from 3 cells derived from 3 mice. To enhance the statistical robustness and reliability of the findings, these experiments should be replicated with larger sample sizes.

      As per our response to Reviewer 1’s comment #2 above, and to directly address the concern that some evidence was ‘incomplete’, we have now added significant extra data and analysis to this revised submission (Figures 4 and 5; and Supplementary Figure 1). We believe that this has further enhanced the robustness and reliability of our findings, as requested.

      (2) The authors utilize vGAT-Cre for Figures 1-3 and DAT-tdTomato for Figures 4-5, raising concerns about consistency in targeting the same population of dopaminergic neurons. It remains unclear whether all OB DA neurons express vGAT and release GABA. Clarification and additional evidence are needed to confirm whether the same neuronal population was studied across these experiments.

      Although we indeed used different mouse lines to investigate structural and functional aspects of transmitter release, we can be very confident that both approaches allowed us to study the same two distinct DA cell types being compared in this paper. Existing data to support this position are already clear and strong, so in this revision we have focused on the Reviewer’s suggestion to clarify the approaches we chose.

      First, it is well established that in mouse and many other species all OB DA neurons are also GABAergic. This has been demonstrated comprehensively at the level of neurochemical identity and in terms of dopamine/GABA co-release, and is true across both small-soma/anaxonic and large-soma/axon-bearing subclasses (Kosaka & Kosaka 2008; 2016; Maher & Westbrook 2008; Borisovska et al., 2013; Vaaga et al., 2016; Liu et al. 2013). To specifically confirm vGAT expression, we have also now provided additional single-cell RNAseq data and immunohistochemical label in a revised Figure 1 (see also Panzanelli et al., 2007, now referenced in the paper, who confirmed endogenous vGAT colocalisation in TH-positive OB neurons). Most importantly, by using vGAT-cre mice here we were able to obtain sufficient numbers of both anaxonic and axon-bearing DA neurons among the vGAT-cre-expressing OB population. We could unambiguously identify these cells as dopaminergic because of their expression of TH protein which, due to the absence of noradrenergic neurons in the OB, is a specific and comprehensive marker for dopaminergic cells in this brain region (Hökfelt et al., 1975; Rosser et al., 1986; Kosaka & Kosaka 2016). Crucially, both axon-bearing and anaxonic OB DA subtypes strongly express TH (Galliano et al., 2018, 2021). We have now added text to the relevant Results section (lines 99 to 108 of the revised manuscript) to clarify these reasons for studying vGAT-cre mice here.

      We were also able to clearly identify and sample both subtypes of OB DA neuron using DAT-tdT mice. Our previous published work has thoroughly characterised this exact mouse line at the exact ages studied in the present paper (Galliano et al., 2018; Byrne et al., 2022). We know that DAT-tdT mice provide fairly specific labelling of TH-expressing OB DA neurons (75% co-localisation; Byrne et al., 2022), but most importantly we know which non-DA neurons are labelled in this mouse line and how to avoid them. All non-TH-expressing but tdT-positive cells in juvenile DAT-tdT mice are small, dimly fluorescent and weakly spiking neurons of the calretinin-expressing glomerular subtype (Byrne et al., 2022). These cells are easily detected during physiological recordings, and were excluded from our study here. This information is now provided in the relevant Methods section (lines 616 to 619 of the revised manuscript, also referenced in lines 236 to 240 of the Results section), and we apologise for its previous omission. Finally, we have shown both structurally and functionally that both axon-bearing and anaxonic OB DA subtypes are labelled in DAT-tdT mice (Galliano et al., 2018, Tufo et al., 2025; present study). Overall, these additional clarifications firmly establish that the same neuronal populations were indeed studied across our experiments.

      (3) The low TH+ signal in Figure 1D raises questions regarding the successful targeting of OB DA neurons. Further validation, such as additional staining, is required to ensure that the targeted neurons are accurately identified.

      As noted in our response to the previous comment, TH is a specific marker for dopaminergic neurons in the mouse OB, and is widely used for this purpose. Labelling for TH in our tissue is extremely reliable, and in fact gives such strong signal that we were forced to reduce the primary antibody concentration to 1:50,000 to prevent bleedthrough into other acquisition channels. Even at this concentration it was extremely straightforward to unambiguously identify TH-positive cells based on somatic immunofluorescence. We recognise, however, that the original example image in Figure 1D was not sufficiently clear, and have now provided a new example which illustrates the TH-based identification of these cells much more effectively. 

      (4) Estimating the total number of dopaminergic neurons in the olfactory bulb, along with the relative proportions of anaxonic and axon-bearing neuron subtypes, would provide valuable context for the study. Presenting such data is crucial to underscore the biological significance of the findings.

      This information has already been well characterised in previous studies. Total dopaminergic cell number in the OB is ~90,000 (Maclean & Shipley, 1988; Panzanelli et al., 2007; Parrish-Aungst et al., 2007). In terms of proportions, anaxonic neurons make up the vast majority of these cells, with axon-bearing neurons representing only ~2.5% of all OB dopaminergic neurons at P28 (Galliano et al., 2018). Of course, the relatively low number of the axon-bearing subtype does not preclude its having a potentially large influence on glomerular networks and sensory processing, as demonstrated by multiple studies showing the functional effects of inter-glomerular inhibition (Kosaka & Kosaka, 2008; Liu et al., 2013; Whitesell et al., 2013; Banerjee et al., 2015). This information has now been added to the Introduction (line 47 and lines 59 to 62 of the revised manuscript).

      (5) The authors report that in-utero injection was performed based on the premise that the two subclasses of dopaminergic neurons in the olfactory bulb are generated during embryonic development. However, it remains unclear whether in-utero injection is essential for distinguishing between these two subclasses. While the manuscript references a relevant study, the explanation provided is insufficient. A more detailed justification for employing in-utero injection would enhance the manuscript's clarity and methodological rigor.

      We apologise for the lack of clarity in explaining the approach. In utero injection is not absolutely essential for distinguishing between the two subclasses, but it does have two major advantages. 1) Because infection happens before cells migrate to their final positions, it produces sparse labelling which permits later unambiguous identification of individual cells’ processes; and 2) Because both subclasses are generated embryonically (compared to the postnatal production of only anaxonic DA neurons), it allows effective targeting of both cell types. We have now expanded the relevant section of the Results to explain the rationale for our approach in more detail (lines 109 to 116 of the revised manuscript).

      (6) In Figures 1A and 4A, it appears that data from previously published studies were utilized to illustrate the differential mRNA expression in dopaminergic neurons of the olfactory bulb. However, the Methods section and the manuscript lack a detailed description of how these dopaminergic neurons were classified or analyzed. Given that these figures contribute to the primary dataset, providing additional explanation and context is essential to ensure clarity of the findings.

      We apologise for the lack of clarity. We have now extended the part of the methods referring to the RNAseq data analysis (lines 666 to 678 of the revised manuscript). 

      (7) In Figure 2C, anaxonic dopamine neurons display considerable variability in the number of neurotransmitter release sites, with some neurons exhibiting sparse sites while others exhibit numerous sites. The authors should address the potential biological or methodological reasons for this variability and discuss its significance.

      We thank the Reviewer for highlighting this feature of our data. We have now outlined potential methodological reasons for the variability, whilst also acknowledging that it is consistent with previous reports of presynaptic site distributions in these cells (Kiyokage et al., 2017; Results, lines 169 to 172 of the revised manuscript). We have also added a brief discussion of the potential biological significance (Discussion, lines 446 to 450).

      (8) In the images used to differentiate anaxonic and axon-bearing neurons, the soma, axons, and dendrites are intermixed, making it difficult to distinguish structures specific to each subclass. Employing subclass-specific labeling or sparse labeling techniques could enhance clarity and accuracy in identifying these structures.

      Distinguishing these structures is indeed difficult, and was the main reason we used viral label to produce sparse labelling (see response to comment #5 above). In all cases we were extremely careful, including cells only when we could be absolutely certain of their anaxonic or axon-bearing identity, and could also be certain of the continuity of all processes. Crucially, while the 2D representations we show in our figures may suggest a degree of intermixing, we performed all analyses on 3D image stacks, significantly improving our ability to accurately assign structures to individual cells. We have now added extra descriptions of this approach in the relevant Methods section (lines 546 to 548 of the revised manuscript).

      (9) In Figure 3, the soma area and synaptophysin puncta density are compared between axon-bearing and anaxonic neurons. However, the figure only presents representative images of axon-bearing neurons. To ensure a fair and accurate comparison, representative images of both neuron subtypes should be included.

      The original figures did include example images of puncta density (or lack of puncta) in both cell types (Figure 2B and Figure 3E). For soma area, we have now included representative images of axon-bearing and anaxonic neurons with an indication of soma area measurement in a new Supplementary Figure 2A.

      (10) In Figure 4B, the authors state that gephyrin and synaptophysin puncta are in 'very close proximity.' However, it is unclear whether this proximity is sufficient to suggest the possibility of self-inhibition. Quantifying the distance between gephyrin and synaptophysin puncta would provide critical evidence to support this claim. Additionally, analyzing the distribution and proportion of gephyrin-synaptophysin pairs in close proximity would offer further clarity and strengthen the interpretation of these findings.

      We thank the Reviewer for raising this issue. We entirely agree that the example image previously shown did not constitute sufficient evidence to claim either close proximity of gephyrin and synaptophysin puncta or the possibility of self-inhibition. We are not in a position to perform a full quantitative analysis of these spatial distributions, nor do we think this is necessary given previous direct evidence for auto-evoked inhibition in OB dopaminergic cells (Smith and Jahr, 2002; Murphy et al., 2005; Maher and Westbrook, 2008; Borisovska et al., 2013) and our own demonstration of this phenomenon in anaxonic neurons (Figure 4). We have therefore removed the image and the reference to it in the text.

      (11) In Figures 4J and 4M, the effects of the drugs are presented without a direct comparison to the control group (baseline control?). Including these baseline control data is essential to provide a clear context for interpreting the drug effects and to validate the conclusions drawn from these experiments.

      We appreciate the Reviewer’s attention to this important point. As this concern was also raised by Reviewer 1 (their point #6), we have provided a detailed response fully addressing it in our replies to Reviewer 1 above. 

      (12) In Lines 342-344, the authors claim that VMAT2 staining is notoriously difficult. However, several studies (e.g., Weihe et al., 2006; Cliburn et al., 2017) have successfully utilized VMAT2 staining. Moreover, Zhang et al., 2015 - a reference cited by the authors - demonstrates that a specific VMAT2 antibody effectively detects VMAT2. Providing evidence of VMAT2 expression in OB DA neurons would substantiate the claim that these neurons are GABA-co-releasing DA neurons and strengthen the study's conclusions.

      As noted in response to this Reviewer’s comment #2 above, there is clear published evidence that OB DA neurons are GABA- and dopamine-releasing cells. These cells are also known to express VMAT2 (Cave et al., 2010; Borisovska et al., 2013; Vergaño-Vera et al., 2015). We do not therefore believe that additional evidence of VMAT2 expression is necessary to strengthen our study’s conclusions. We did make every effort to label VMAT2-positive release sites in our neurons, but unfortunately all commercially available antibodies were ineffective. The successful staining highlighted by the Reviewer was either performed in the context of virally driven overexpression (Zhang et al., 2015) or was obtained using custom-produced antibodies (Weihe et al., 2006; Cliburn et al., 2017). We have now modified the Discussion text to provide more clarification of these points (lines 393 to 395 of the revised manuscript).

    1. eLife Assessment

      This important theoretical study shows that active hexatic topological defects in epithelia enable collective cell flows. Within the general limitations of coarse-grained hydrodynamic models in fully capturing cell-scale behavior, the study provides compelling evidence supporting its conclusions. These findings will be of interest to both biophysicists studying collective cell behaviors and biologists investigating epithelial flows during development.

    2. Reviewer #1 (Public review):

      Summary:

      This paper investigates the physical mechanisms underlying cell intercalation, which then enables collective cell flows in confluent epithelia. The authors show that T1 transitions (the topological transitions responsible for cell intercalation) correspond to the unbinding of groups of hexatic topological defects. Defect unbinding, and hence cell intercalation and collective cell flows, are possible when active stresses in the tissue are extensile. This result helps to rationalize the observation that many epithelial cell layers have been found to exhibit extensile active nematic behavior.

      Strengths:

      The authors obtain their results based on a combination of active hexanematic hydrodynamics and a multiphase field (MPF) model for epithelial layers, whose connection is a strength of the paper. With the hydrodynamic approach, the authors find the active flow fields produced around hexatic topological defects, which can drive defect unbinding. Using the MPF simulations, the authors show that T1 transitions tend to localize close to hexatic topological defects.

    3. Reviewer #2 (Public review):

      Summary:

      This paper studies the role of hexatic defects in the collective migration of epithelia. The authors emphasize that epithelial migration is driven by cell intercalation events and not just isolated T1 events, and analyze this through the lens of hexatic topological defects. Finally, the authors study the effect of active and passive forces on the dynamics of hexatic defects using analytical results, and numerical results in both continuum and phase-field models. The results are very interesting, and highlight new ways of studying epithelial cell migration through the analysis of the binding and unbinding of hexatic defects.

      Strengths:

      (1) The authors convincingly argue that intercalation events are responsible for collective cell migration, and that these events are accompanied by the formation and unbinding of hexatic topological defects.

      (2) The authors clearly explain the dynamics of hexatic defects during T1 transitions, and demonstrate the importance of active and passive forces during cell migration.

      (3) The paper thoroughly studies the T1 transition through the viewpoint of hexatic defects. A continuum model approach to study T1 transitions in cell layers is novel and can lead to valuable new insights.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This paper investigates the physical mechanisms underlying cell intercalation, which then enables collective cell flows in confluent epithelia. The authors show that T1 transitions (the topological transitions responsible for cell intercalation) correspond to the unbinding of groups of hexatic topological defects. Defect unbinding, and hence cell intercalation and collective cell flows, are possible when active stresses in the tissue are extensile. This result helps to rationalize the observation that many epithelial cell layers have been found to exhibit extensile active nematic behavior.

      Strengths

      The authors obtain their results based on a combination of active hexanematic hydrodynamics and a multiphase field (MPF) model for epithelial layers, whose connection is a strength of the paper. With the hydrodynamic approach, the authors find the active flow fields produced around hexatic topological defects, which can drive defect unbinding. Using the MPF simulations, the authors show that T1 transitions tend to localize close to hexatic topological defects.

      We are grateful to Reviewer #1 for appreciating and highlighting the strengths of our work.

      Weaknesses

      Citations are sometimes not comprehensive. Cases of contractile behavior found in collective cell flows, which would seemingly contradict some of the authors’ conclusions, are not discussed.

      I encourage the authors to address the comments and questions below.

      We are thankful to Reviewer #1 for their questions and comments. We have addressed them point by point below, and have amended the manuscript accordingly.

      (1) In Equation 1, what do the authors mean by the cluster’s size ℓ? How is this quantity defined? The calculations in the Methods suggest that ℓ indicates the distance between the p-atic defects and the center of the T1 cell cluster, but this is not clearly defined.

      We thank Reviewer #1 for their question. We define the cluster size as the initial distance between the center of the quadrupole and any defect (see Methods). In a primary cell cluster, where the cells themselves are the defects, the cluster’s size is the distance between the center of the central junction and the center of any cell in the cluster. This is half the diameter of a cell, which, for example, in a typical confluent MDCK epithelial monolayer would be about 10 µm. We have added this clarification to the definition of the cluster size, above Eq. (1).

      (2) The multiphase field model was developed and reviewed already, before the Loewe et al. 2020 paper that the authors cite. Earlier papers include Camley et al. PNAS 2014, Palmieri et al. Sci. Rep. 2015, Mueller et al. PRL 2019, and Peyret et al. Biophys. J. 2019, as reviewed in Alert and Trepat. Annu. Rev. Condens. Matter Phys. 2020.

      We thank the referee for their suggestion to incorporate further MPF literature. We have done so in the amended manuscript.

      (3) At what time lag is the mean-squared displacement in Figure 3f calculated? How does the choice of a lag time affect these data and the resulting conclusions?

      The scatter plot in Fig. 3f was constructed by dividing the system into square subregions of size ∆ℓ = 35 l.u., each containing approximately 4 cells. For each subregion, we analyzed a time window of ∆t = 25 × 10³ iterations, measuring both the normalized mean square displacement of cells (relative to the subregion area ∆ℓ²) and the average defect density. The normalized displacement is calculated as m.s.d. = ⟨|r(t* + ∆t) − r(t*)|²⟩ / ∆ℓ², where r is a cell’s position, the average runs over the cells in the subregion, and t* denotes the start time of the observation window. We chose the time window ∆t used to compute the mean square displacement to match the characteristic duration of T1 events and defect lifetimes in our simulations. Observation times much longer than the typical T1 event duration (∆t > 35 × 10³ iterations) would cause the two sets of data points to merge into a single group, suggesting no correlation between cell motility and defect density beyond the defect lifetime.
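      A minimal sketch of this subregion analysis in Python with NumPy, assuming the normalized m.s.d. is the mean squared displacement, between t* and t* + ∆t, of the cells found in a subregion at t*, divided by ∆ℓ², and that the defect density is the time-averaged defect count per unit area. The function name, the array layout, the periodic wrapping used to assign positions to subregions and the measurement of ∆t in stored frames rather than iterations are all illustrative assumptions, not the code used for Fig. 3f.

```python
import numpy as np


def subregion_msd_and_defect_density(cell_pos, defect_pos, box, dl, t_start, dt):
    """Normalized MSD and mean defect density for each square subregion (hypothetical helper).

    cell_pos   : array (T, N, 2) of unwrapped cell centroids, one slice per stored frame
    defect_pos : sequence of length T; entry t is an (M_t, 2) array of defect positions
    box        : side length of the square, periodic simulation domain
    dl         : side length of the square subregions (e.g. 35 lattice units)
    t_start    : frame index t* at which the observation window starts
    dt         : window length, in stored frames (assumption: not raw iterations)
    """
    n_side = int(box // dl)
    r0 = cell_pos[t_start]
    r1 = cell_pos[t_start + dt]

    # Assign each cell to the subregion it occupies at the start of the window
    cell_idx = np.clip(((r0 % box) // dl).astype(int), 0, n_side - 1)
    sq_disp = np.sum((r1 - r0) ** 2, axis=1)

    msd = np.full((n_side, n_side), np.nan)  # NaN marks subregions with no cells
    for i in range(n_side):
        for j in range(n_side):
            in_sub = (cell_idx[:, 0] == i) & (cell_idx[:, 1] == j)
            if in_sub.any():
                msd[i, j] = sq_disp[in_sub].mean() / dl**2

    # Count defects per subregion in every frame of the window, then average
    counts = np.zeros((n_side, n_side))
    for t in range(t_start, t_start + dt):
        d = np.asarray(defect_pos[t], dtype=float).reshape(-1, 2)
        if len(d) == 0:
            continue
        d_idx = np.clip(((d % box) // dl).astype(int), 0, n_side - 1)
        np.add.at(counts, (d_idx[:, 0], d_idx[:, 1]), 1.0)
    density = counts / (dt * dl**2)  # mean number of defects per unit area
    return msd, density
```

      Flattening the two returned arrays and plotting the non-NaN pairs against each other gives one scatter point per subregion; repeating the call for successive values of t_start adds points from later windows.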

      (4) The authors argue that their results provide an explanation for the extensile behavior of cell layers. However, there are also examples of contractile behavior, such as in Duclos et al., Nat. Phys., 2017 and in Pérez-González et al., Nat. Phys., 2019. In both cases, collective cell flows were observed, which in principle require cell intercalations. How would these observations be rationalized with the theory proposed in this paper? Can these experiments and the theory be reconciled?

      The contractile or extensile nature of stress in epithelia depends crucially on the specific tissue type and its biological context. Different cell populations, depending on their position along the epithelial/mesenchymal spectrum, can exhibit either contractile or extensile behaviors. Our theory applies to tissues where hexatic order dominates at the cellular scale, particularly in confluent systems where neighbor exchanges occur primarily through T1 transitions. In contrast, the systems studied by Duclos et al., Nat. Phys. (2018) and Pérez-González et al. (Nat. Phys., 2019) exhibit nematic order at the cellular level, meaning their dynamics are governed by fundamentally different mechanisms. Since our framework is derived for hexatic-dominated tissues, it does not directly apply to those cases, though the hybrid hexanematic description previously developed by some of the authors in Armengol-Collado et al. eLife 13:e86400 (2024) could help reconcile these observations. In general, a key distinction must be made between the contractility of individual cells and the extensile/contractile nature of the collective force network. To illustrate this, consider a cell exerting a 6-fold symmetric force distribution: each vertex force arises from an imbalance in junctional tensions with neighboring cells, which are themselves contractile due to actomyosin activity. However, the resulting vertex forces can be either contractile or extensile depending on network geometry and tension distribution. This is captured in our coarse-grained description [see Armengol-Collado et al. eLife 13:e86400 (2024)], where the active stress emerges from higher-order moments of cellular forces. Specifically, the coefficient of the deviatoric part of the hexatic active stress tensor is set by the cell radius, the cell number density and the intensity of cellular tension. The negative sign of this coefficient shows that the active stress is extensile, consistent with observations in various epithelial systems (e.g., Saw et al., Nature 2017; Blanch-Mercader et al., Phys. Rev. Lett. 2018). Finally, we note that the connection between cellular-scale forces and large-scale extensility has been rationalized in other contexts, such as active nematics (Balasubramaniam et al., Nat. Mater. 2021).

      Reviewer #2 (Public Review):

      This paper studies the role of hexatic defects in the collective migration of epithelia. The authors emphasize that epithelial migration is driven by cell intercalation events and not just isolated T1 events, and analyze this through the lens of hexatic topological defects. Finally, the authors study the effect of active and passive forces on the dynamics of hexatic defects using analytical results, and numerical results in both continuum and phase-field models.

      The results are very interesting and highlight new ways of studying epithelial cell migration through the analysis of the binding and unbinding of hexatic defects.

      We are grateful to Reviewer #2 for their interest and for emphasizing the novelty of our work.

      Strengths

      (1) The authors convincingly argue that intercalation events are responsible for collective cell migration, and that these events are accompanied by the formation and unbinding of hexatic topological defects.

      (2) The authors clearly explain the dynamics of hexatic defects during T1 transitions, and demonstrate the importance of active and passive forces during cell migration.

      (3) The paper thoroughly studies the T1 transition through the viewpoint of hexatic defects. A continuum model approach to study T1 transitions in cell layers is novel and can lead to valuable new insights.

      We thank the Reviewer for their kind and supportive words, and for highlighting the clarity, persuasiveness, and thoroughness of our work.

      Weaknesses

      (1) The authors could expand on the dynamics of existing hexatic defects during epithelial cell migration, in addition to how they are created during T1 transitions.

      We thank the referee for their comment. The detailed analysis of dislocation-pair unbinding modes and their statistical impact on the transition to collective migration is addressed comprehensively in our subsequent work (Puggioni et al., arXiv:2502.09554). In the present study, we focus specifically on the fundamental mechanism enabling dislocation unbinding: active extensile stresses generate flows that drive dislocation pairs apart, while passive elastic stresses tend to pull them together (Krommydas et al., Phys. Rev. Lett. 2023; Armengol-Collado et al., arXiv:2502.13104). When active forces dominate over passive restoring forces, the dislocations unbind. This is a crucial distinction from the classical Berezinskii–Kosterlitz–Thouless and Kosterlitz–Thouless–Halperin–Nelson–Young transitions, where thermal fluctuations drive defect unbinding; in our system, the process is fundamentally activity-driven. Nevertheless, the resulting state, characterized by unbound defects and collective migration, bears a strong analogy to the melting transition in equilibrium systems. We emphasize that the dynamics of passive defects have been examined previously in Krommydas et al., Phys. Rev. Lett. 2023. A discussion of these aspects can be found in the Appendix “Numerical simulations of defect annihilation and unbinding”.

      (2) The different terms in the MPF model used to study cell layer dynamics are not fully justified. In particular, it is not clear why the model includes self-propulsion and rotational diffusion in addition to nematic and hexatic stresses, and how these quantities are related to each other.

      We thank the referee for their comment. The terms of the MPF model (e.g., self-propulsion and rotational diffusion) reflect the stochastic, deformable nature of cells, modeled as active droplets migrating at near-constant speed. We emphasize that self-propulsion is the only non-equilibrium mechanism in our model: no additional active stresses (nematic or hexatic) are imposed. We have clarified this point in the revised manuscript and expanded our discussion of the MPF model.

      (3) The authors could provide some physical intuition on what an active extensile or contractile term in the hexatic order parameter means, and how this is related to extensility and contractility in active nematics and/or for cell layers.

      We thank the referee for their comment. As we explain in the reply to comment (4) of Reviewer #1, the contractile or extensile nature of stress in epithelia depends crucially on the specific tissue type and its biological context. Different cell populations, depending on their position along the epithelial/mesenchymal spectrum, can exhibit either contractile or extensile behaviors. Our theory applies to tissues where hexatic order dominates at the cellular scale, particularly in confluent systems where neighbor exchanges occur primarily through T1 transitions. In contrast, the systems studied by Duclos et al., Nat. Phys. (2018) and Pérez-González et al. (Nat. Phys., 2019) exhibit nematic order at the cellular level, meaning their dynamics are governed by fundamentally different mechanisms. Since our framework is derived for hexatic-dominated tissues, it does not directly apply to those cases, though the hybrid hexanematic description previously developed by some of the authors in Armengol-Collado et al. eLife 13:e86400 (2024) could help reconcile these observations. In general, a key distinction must be made between the contractility of individual cells and the extensile/contractile nature of the collective force network. To illustrate this, consider a cell exerting a 6-fold symmetric force distribution: each vertex force arises from an imbalance in junctional tensions with neighboring cells, which are themselves contractile due to actomyosin activity. However, the resulting vertex forces can be either contractile or extensile depending on network geometry and tension distribution. This is captured in our coarse-grained description [see Armengol-Collado et al. eLife 13:e86400 (2024)], where the active stress emerges from higher-order moments of cellular forces. Specifically, the coefficient of the deviatoric part of the hexatic active stress tensor is set by the cell radius, the cell number density and the intensity of cellular tension. The negative sign of this coefficient shows that the active stress is extensile, consistent with observations in various epithelial systems (e.g., Saw et al., Nature 2017; Blanch-Mercader et al., Phys. Rev. Lett. 2018). Finally, we note that the connection between cellular-scale forces and large-scale extensility has been rationalized in other contexts, such as active nematics (Balasubramaniam et al., Nat. Mater. 2021).

      Recommendations for the Authors: Reviewer #2 (Recommendations for the Authors):

      (1) The authors point out that hexatic topological defects are produced in quadrupoles (L109). Does this also mean that these defects can be annihilated only in quadrupoles as well? In the same vein, are hexatic defects always bound in pairs, as suggested by the schematics, or is it possible to observe an isolated hexatic defect?

      We thank the referee for their question. Hexatic disclinations (the defect monopoles discussed in this work), much like electrons and positrons, can annihilate in any charge-neutral configuration (dipole, quadrupole, octupole, etc.). Unbinding a pair of hexatic disclinations, however, costs much more energy than unbinding a quadrupole into two dipoles. Hence, isolated defects appear in abundance only in the late, fully disordered phase, where the system has completely “melted”. For more details on how defect unbinding modes affect tissue dynamics, please see our subsequent work Puggioni et al., arXiv:2502.09554.
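      As a minimal illustration of this charge bookkeeping (an editorial sketch, not code from the manuscript), the topological charge of a hexatic disclination can be measured as the winding of the 6-fold orientation angle around a closed loop. Because that angle is only defined modulo π/3, charges come in multiples of 1/6, and a group of defects can annihilate completely only if its charges sum to zero. The function names and the synthetic test fields below are hypothetical.

```python
import math
from fractions import Fraction
from typing import Sequence

SIXFOLD_PERIOD = math.pi / 3  # hexatic orientation is defined modulo 2*pi/6


def wrap_hexatic(delta: float) -> float:
    """Map an angle difference to [-pi/6, pi/6), the shortest 6-fold rotation."""
    return (delta + SIXFOLD_PERIOD / 2) % SIXFOLD_PERIOD - SIXFOLD_PERIOD / 2


def hexatic_charge(angles: Sequence[float]) -> Fraction:
    """Winding number of a hexatic orientation field sampled around a closed,
    counterclockwise loop; the result is a multiple of 1/6."""
    n = len(angles)
    total = sum(wrap_hexatic(angles[(k + 1) % n] - angles[k]) for k in range(n))
    return Fraction(round(6 * total / (2 * math.pi)), 6)


def is_charge_neutral(charges: Sequence[Fraction]) -> bool:
    """Topological charge is conserved, so full annihilation needs zero net charge."""
    return sum(charges, Fraction(0)) == 0


# Synthetic +1/6 and -1/6 defects: theta = +/- phi / 6 on a loop of 60 points.
loop = [2 * math.pi * k / 60 for k in range(60)]
plus = hexatic_charge([(phi / 6) % SIXFOLD_PERIOD for phi in loop])
minus = hexatic_charge([(-phi / 6) % SIXFOLD_PERIOD for phi in loop])
print(plus, minus)                                    # 1/6 -1/6
print(is_charge_neutral([plus, minus, plus, minus]))  # True (a quadrupole)
print(is_charge_neutral([plus, plus]))                # False
```

Exact fractions avoid floating-point ambiguity when checking neutrality of a defect cluster.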

      (2) Could you clarify if the flows described in Figures 2(a)-(b), panel (i) are driven by a passive backflow term without activity? Could you compare the magnitudes of these flows compared to the typical active terms?

      We thank the referee for their question. In panel 2(b), there is only passive backflow. In 2(a), instead, both terms are included, in a regime of parameters where the active flow overcomes the passive flow (and hence the active force overcomes the passive force, as delineated in the Discussion section). The magnitude of the passive flows is studied in detail in our previous work Krommydas et al. (Phys. Rev. Lett. 2023).

      (3) Could you clarify how the continuum hexatic model and MPF model are related to each other? What are the similarities and differences in the dynamics of these models?

      We thank the referee for this insightful question. A key point of our work is precisely that the continuum hexatic model and the MPF (Multi-Phase Field) model are distinct in nature.

      The MPF model is an established agent-based framework used to simulate tissue dynamics at the cellular level. It captures individual cell behaviors and interactions through phase-field variables. In our work, we use the MPF model as a benchmark to extract statistical features of tissue dynamics, such as defect motion and orientational correlations. In contrast, our continuum hexatic model is a coarse-grained hydrodynamic theory that describes the dynamics of orientational order in active tissues. It is built on symmetry principles and conservation laws, and it does not rely on microscopic cell-level details. Instead, it captures the collective behavior of the system through a hexatic order parameter and its coupling to flow and activity.

      Despite their conceptual differences, the MPF model and our hydrodynamic theory exhibit similar statistical features. This agreement, also observed in the independent study by Jain et al. (Phys. Rev. Res. 2024), provides strong support for the validity and generality of our continuum description.

      (4) When multiple references by the same author and year are cited using letters, the second letter is not in bold, e.g. Giomi et al., 2022b, a in Line 75, and others.

      We are grateful to the referee for carefully going through the manuscript and pointing out these typos. We have corrected them in the amended manuscript.

      Reviewer #3 (Public Review):

      In this manuscript, the authors discuss epithelial tissue fluidity from a theoretical perspective. They focus on the description of topological transitions whereby cells change neighbors (T1 transitions). They explain how such transitions can be described by following the fate of hexatic defects. They first focus on a single T1 transition and the surrounding cells using a hydrodynamic model of active hexatics. They show that successful T1 intercalations, which promote tissue fluidity, require a sufficiently large extensile hexatic activity in the neighborhood of the cells attempting a T1 transition. If such activity is contractile or not sufficiently extensile, the T1 is reversed, hexatic defects annihilate, and the epithelial network configuration is unchanged. They then describe a large epithelium, using a phase field model to describe cells. They show a correlation between T1 events and hexatic defect unbinding, and identify two populations of T1 cells: one performing T1 cycles (failed T1), and not contributing to tissue migration, and one performing T1 intercalation (successful T1) and leading to the collective cell migration.

      Strengths

      The manuscript is scientifically sound, and the variety of numerical and analytical tools they use is impressive. The approach and results are very interesting and highlight the relevance of hexatic order parameters and their defects in describing tissue dynamics.

      We thank the Reviewer for recognizing the scientific soundness of the manuscript, the breadth of numerical and analytical tools employed, as well as their interest in our work.

      Weaknesses

      (1) Goal and message of the paper. (a) In my opinion, the article is mainly theoretical and should be presented as such. For instance, their conclusions and the consequences of their analysis in terms of biology are not extremely convincing, although they would be sufficient for a theory paper oriented to physicists or biophysicists. The choice of journal and potential readership should be considered, and I am wondering whether the paper structure should be re-organized, in order to have side-by-side the methods and the results, for instance (see also below).

      We thank the referee for their criticism. In response, we have reworded certain parts of the manuscript. As with any theoretical study, the biological implications of our work can only be fully assessed through experimental validation, a prospect we look forward to. Nevertheless, we have submitted our work to the Physics of Life subsection, which we believe is perfectly suited to our content.

      (b) Currently, the two main results sections are somewhat disconnected, because they use different numerical models, and because the second section only marginally uses the results from the first section to identify/distinguish T1.

      We thank the referee for their comment. In the second section, we use statistics from the MPF model to support the analytical and numerical findings of our hydrodynamic theory of cell intercalation. Since our submission, further qualitative evidence has been brought to light by the work of Jain et al. (Phys. Rev. Res. 2024).

      (2) Quite surprisingly, the authors use a cell-based model to describe the macroscopic tissue-scale behavior, and a hydrodynamic model to describe the cell-based events. In particular, their hydrodynamic description (the active hexatic model) is supposed to be a coarse-grained description, valid to capture the mesoscopic physics, and yet, they use it to describe cell-scale events (T1 transitions). For instance, what is the meaning of the velocity field they are discussing in Figure 2? This makes me question the validity of the results of their first part.

      We thank the referee for their comment. There are many excellent discrete models of epithelial tissues in the literature (e.g., Bi et al., Phys. Rev. X 2016; Pasupalak et al., Soft Matter 2020; Graner et al., Phys. Rev. Lett. 1992), each capturing essential biological features such as cell division, apoptosis and sorting. While these models have provided invaluable insights, our work takes a different approach by developing a continuum theory aimed at describing epithelial dynamics at two levels: (1) mesoscopic intercalation events and (2) macroscopic collective migration. Crucially, our goal is not to replicate a specific discrete model, which would risk constructing a “model of a model”, but rather to derive a hydrodynamic description of tissue dynamics grounded in symmetry principles and conservation laws. Along this logic, the velocity field in our theory should be interpreted as an Eulerian (continuum) velocity, representing the coarse-grained flow of the tissue rather than the Lagrangian motion of individual cells. This distinction is central to our framework, which operates at scales where cellular details are averaged out, yet retains the essential physics of hexatic order and active stresses. We validate our predictions against the Multiphase Field (MPF) model. [We thank Reviewer 1 for their suggestion to incorporate further MPF literature.] Furthermore, Jain et al. (Phys. Rev. Res. 2024) have used the MPF to predict flow patterns around T1 transitions and obtained results compatible with those of our hydrodynamic theory. From this comparison we can conclude that both the MPF and our theory are able to capture the same aspect of cell intercalation in epithelial layers. This, however, does not imply that other discrete models of epithelia can reproduce this aspect too, nor that our theory is specifically tailored to the MPF model. We have clarified these points in the revised manuscript and expanded our discussion of the MPF model.

      (3) The quality of the numerical results presented in the second part (phase field model) could be improved. (a) In terms of analysis of the defects. It seems that they have all the tools to compare their cell-resolved simulations and their predictions about how a T1 event translates into defect unbinding. However, their analysis in Figure 3e is relatively minimal: it shows a correlation between T1 cells and defects. But it says nothing about the structure and evolution of the defects, which, according to their first section, should be quite precise.

      We thank the referee for their comment. Further qualitative evidence has been brought to light by the work of Jain et al. (Phys. Rev. Res. 2024), where the exact flow pattern predicted by our hydrodynamic theory is obtained in the MPF model around cells undergoing T1 rearrangements.

      (b) In terms of clarity of the presentation. For instance, in Figure 3f, they plot the mean-square displacement as a function of a defect density. I thought that MSD was a time-dependent quantity: they must therefore consider MSD at a given time, or averaged over time. They should be explicit about what their definition of this quantity is.

      We thank the referee for raising this point. As clarified in our response to Reviewer #1, point (3), the mean square displacement (MSD) plotted in Fig. 3f is computed over a fixed time window of ∆t = 25×10³ iterations, chosen to match the typical duration of T1 events and defect lifetimes. The MSD is normalized by the subregion area and averaged over time within each window. We have now made this explicit in the amended version of the manuscript.
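      One way to compute the quantity described above, sketched under assumptions: the hypothetical helper `msd_fixed_window` uses a fixed lag equal to the window ∆t, averages squared displacements over all available time origins and all cells, and then divides by the subregion area. Tracking a fixed set of cells in a fixed order is our assumption; the manuscript may treat cells entering or leaving a subregion differently.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def msd_fixed_window(frames: Sequence[Sequence[Point]], lag: int, area: float) -> float:
    """Area-normalized MSD at a fixed lag (in frames).

    frames[t][i] is the (x, y) position of cell i at frame t; the same cells,
    in the same order, are assumed to be present in every frame.
    """
    n_frames = len(frames)
    if not 0 < lag < n_frames:
        raise ValueError("lag must satisfy 0 < lag < number of frames")
    total, count = 0.0, 0
    # Average over every time origin t for which t + lag is still available.
    for t in range(n_frames - lag):
        for (x0, y0), (x1, y1) in zip(frames[t], frames[t + lag]):
            total += (x1 - x0) ** 2 + (y1 - y0) ** 2
            count += 1
    return total / count / area


# Toy check: one cell moving at unit speed along x, one cell at rest.
frames = [[(float(t), 0.0), (0.0, 0.0)] for t in range(5)]
print(msd_fixed_window(frames, lag=2, area=1.0))  # 2.0
```

In the simulations discussed above, `lag` would be the 25×10³-iteration window expressed in saved frames.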

      (c) In terms of statistics. For instance, Figure 3g is used to study the role of rotational diffusion on the average time between T1s. The error bars in this figure are huge and make their claims hardly supported. Their claim of a “monotonic decay” of the average time between intercalations is also not fully supported given their statistics.

      We appreciate the Reviewer’s comment regarding the statistical robustness of Fig. 3g. While we acknowledge that the error bars are substantial, reflecting the inherent variability in cell intercalation dynamics, the yellow curve does exhibit a consistent downward trend in the average time between T1 transitions as rotational diffusion increases. This decrease is visible across the entire range of variation of the rotational diffusion Dr, and is statistically supported when considering the trend over independent simulations. To address this concern, we have adjusted the wording in the main text: instead of stating that “the former is a monotonically decreasing function of Dr,” we now write that “the former displays a decreasing trend with Dr,” which better reflects the statistical variability while preserving the observed behavior.

      Reviewer #3 (Recommendations for the Authors):

      (1) Section 1 is difficult to follow due to multiple reasons: early but delayed definitions, unclear use of T1 intercalation vs. T1 cycles, disconnected figures and unclear simulation descriptions. We recommend including simulation setup details earlier and restructuring the flow of arguments.

      We thank the referee for their comment. We have made an effort to reword and clarify the text in our amended manuscript. We are slightly confused by what the referee means by “early but delayed definitions”. If they could clarify, we would be happy to amend the position and phrasing of these definitions accordingly.

      (2) It could be useful to have an additional figure early on defining schematically hexatic defects and an illustration showing an epithelium (or a simulation), similar to what the authors have produced in some of their other publications on this topic.

      We thank the referee for their comment. Figures 3c and 3d show what a hexatic defect looks like in a simulation of the epithelium. Following the referee’s recommendation, we have added a note in the caption of figure 3, citing our work where we show the same defects in MDCK epithelial monolayers (Armengol et al., Nat. Phys. 2023).

      (3) Minor points and typos:

      Line 88: the bond between vertices shrinks, not the vertices.

      Figure 1: the 1/6 is displayed as 1 6 (fraction bar missing).

      Line 232: “and order” → “one/an order”.

      Line 237: Fig. 3g) → Fig. 3g

      Line 298: “nu” and “v” hard to distinguish in eLife font.

      Methods: define all notation clearly (e.g., tensor product exponent, D/Dt in Eq. 3c).

      Methods: “cell orientation, coarse-graining and topological defects” section is difficult to follow, schematic would help.

      Line 457 onward: unclear how panels (ii-iv) of Fig. 2ab are obtained.

      Line 480 onward: not referenced in main text.

      Figure 2: “avalancHe” typo.

      Figure 2 caption: “cell intercalaTION” typo.

      Movies are neither referenced nor explained.

      Figure 5 and 6 are not referenced in the main text.

      We thank the referee for their detailed read of the paper. We have corrected all typos.

    1. eLife Assessment

      This important work attempts to understand observed variability in oral shedding of SARS-CoV-2 and suggests that routine clinical factors are not determinative. The evidence supporting the conclusion is solid, though the limited clinical heterogeneity of the included cohorts, the lack of COVID vaccination, and the absence of comprehensive viral load data for model training make the results difficult to generalize to contemporaneous COVID-19 conditions. This study may be of interest to virologists, public health officials and clinicians.

    2. Reviewer #1 (Public Review):

      Summary:

      This study by Park and colleagues uses longitudinal saliva viral load data from two cohorts (one in the US and one in Japan from a clinical trial) in the pre-vaccine era to subset viral shedding kinetics and then use machine learning to attempt to identify clinical correlates of different shedding patterns. The stratification method identifies three separate shedding patterns discriminated by peak viral load, shedding duration, and clearance slope. The authors also assess micro-RNAs as potential biomarkers of severity but do not identify any clear relationships with viral kinetics.

      Strengths:

      The cohorts are well developed, the mathematical model appears to capture shedding kinetics fairly well, the clustering seems generally appropriate, and the machine learning analysis is a sensible, albeit exploratory approach. The micro-RNA analysis is interesting and novel.

    3. Reviewer #2 (Public Review):

      Summary:

      This study argues it has found that it has stratified viral kinetics for saliva specimens into three groups by the duration of "viral shedding"; the authors could not identify clinical data or microRNAs that correlate with these three groups.

      Strengths:

      The question of whether there is a stratification of viral kinetics is interesting.

    4. Reviewer #3 (Public Review):

      The article presents a comprehensive study on the stratification of viral shedding patterns in saliva among COVID-19 patients. The authors analyze longitudinal viral load data from 144 mildly symptomatic patients using a mathematical model, identifying three distinct groups based on the duration of viral shedding. Despite analyzing a wide range of clinical data and micro-RNA expression levels, the study could not find significant predictors for the stratified shedding patterns, highlighting the complexity of SARS-CoV-2 dynamics in saliva. The research underscores the need for identifying biomarkers to improve public health interventions and acknowledges several limitations, including the lack of consideration of recent variants, the sparsity of information before symptom onset, and the focus on symptomatic infections.

      The manuscript is well-written, with the potential for enhanced clarity in explaining statistical methodologies. This work could inform public health strategies and diagnostic testing approaches.

      Comments on the revised version from the editor:

      The authors comprehensively addressed the concerns of all 3 reviewers. We are thankful for their considerable efforts to do so. Certain limitations remain unavoidable such as the lack of immunologic diversity among included study participants and lack of contemporaneous variants of concern.

      One remaining issue is the continued use of the target cell limited model which is sufficient in most cases, but misses key datapoints in certain participants. In particular, viral rebound is poorly described by this model. Even if viral rebound does not place these cases in a unique cluster, it is well understood that viral rebound is of clinical significance.

      In addition, the use of microRNAs as a potential biomarker is still not fully justified. In other words, are there specific microRNAs that have a pre-existing mechanistic basis for relating to higher or lower viral loads? As written it still feels like microRNA was included in the analysis simply because the data existed.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review)

      Summary:

      This study by Park and colleagues uses longitudinal saliva viral load data from two cohorts (one in the US and one in Japan from a clinical trial) in the pre-vaccine era to subset viral shedding kinetics and then use machine learning to attempt to identify clinical correlates of different shedding patterns. The stratification method identifies three separate shedding patterns discriminated by peak viral load, shedding duration, and clearance slope. The authors also assess micro-RNAs as potential biomarkers of severity but do not identify any clear relationships with viral kinetics.

      Strengths:

      The cohorts are well developed, the mathematical model appears to capture shedding kinetics fairly well, the clustering seems generally appropriate, and the machine learning analysis is a sensible, albeit exploratory approach. The micro-RNA analysis is interesting and novel.

      Weaknesses:

      The conclusions of the paper are somewhat supported by the data but there are certain limitations that are notable and make the study's findings of only limited relevance to current COVID-19 epidemiology and clinical conditions.

      We sincerely appreciate the reviewer’s thoughtful and constructive comments, which have been invaluable in improving the quality of our study. We have carefully revised the manuscript to address all points raised.

      (1) The study only included previously uninfected, unvaccinated individuals without the omicron variant. It has been well documented that vaccination and prior infection both predict shorter duration shedding. Therefore, the study results are no longer relevant to current COVID-19 conditions. This is not at all the authors' fault but rather a difficult reality of much retrospective COVID research.

      Thank you for your comment. We agree with the reviewer’s comment that some of our results cannot provide insight into current COVID-19 conditions, since most people have already been infected with COVID-19 or vaccinated against it. We revised our manuscript to discuss this (page 22, lines 364-368). Nevertheless, we believe that our extensive investigation of the relationship between viral shedding patterns in saliva and a wide range of clinical and microRNA data is novel, and that developing a method for such analyses remains important, as it can provide insight into early responses to novel emerging viral diseases in the future. Therefore, we still believe that our findings are valuable.

      (2) The target cell model, which appears to fit the data fairly well, has clear mechanistic limitations. Specifically, if such a high proportion of cells were to get infected, then the disease would be extremely severe in all cases. The authors could specify that this model was selected for ease of use and to allow clustering, rather than to provide mechanistic insight. It would be useful to list the AIC scores of this model when compared to the model by Ke.

      Thank you for your feedback and suggestion regarding our mathematical model. As the reviewer pointed out, in this study we adopted a simple model (the target cell-limited model) to focus on reconstructing viral dynamics and stratifying shedding patterns, rather than to explore the mechanism of viral infection in detail. Nevertheless, we believe that the target cell-limited model provides reasonable reconstructed viral dynamics, as it has been used in many previous studies. We revised the manuscript to clarify this point (page 10, lines 139-144). We also revised the manuscript to provide a more detailed description of the model comparison, including information about AIC (page 10, lines 130-135).

      (3) Line 104: I don't follow why including both datasets would allow one model to work better than the other. This requires more explanation. I am also not convinced that non-linear mixed effects approaches can really be used to infer early model kinetics in individuals from one cohort by using late viral load kinetics in another (and vice versa). The approach seems better for making population-level estimates when there is such a high amount of missing data.

      Thank you for your feedback. Your comment made us realize that our explanation was insufficient. Rather than comparing the performance of the two models, we intended to convey that both models fit the data comparably well when both datasets are included. We revised the manuscript to clarify this point (page 10, lines 135-139).

      We agree that nonlinear mixed effects models are a useful approach for population-level estimation when many data are missing. They also allow reasonable parameter estimation for individuals with too few data points, by taking into account the distribution of parameters across the other individuals. Given these advantages, we adopted a nonlinear mixed effects model in our study. We also revised the manuscript to clarify this (page 27, lines 472-483).
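      The way a mixed effects model borrows information across individuals can be illustrated with a deliberately simplified linear-Gaussian analogue. This is not the nonlinear mixed effects fit used in the study, which estimates the population distribution and the individual parameters jointly; here the population mean and both variances are treated as known. An individual's estimate then becomes a precision-weighted compromise between that individual's own data and the population mean, so individuals with few data points are pulled toward the population. All names and values are hypothetical.

```python
def partially_pooled_estimate(individual_mean: float, n_obs: int,
                              residual_var: float, population_mean: float,
                              between_var: float) -> float:
    """Normal-normal shrinkage estimate of one individual's parameter.

    individual_mean: mean of that individual's n_obs observations
    residual_var:    measurement-noise variance of a single observation
    between_var:     variance of the parameter across individuals
    """
    data_precision = n_obs / residual_var   # grows with the amount of data
    prior_precision = 1.0 / between_var     # information from other individuals
    return (data_precision * individual_mean + prior_precision * population_mean) / (
        data_precision + prior_precision
    )


# An individual with no usable data falls back to the population mean,
# one with a single observation sits halfway, and one with many data
# points is dominated by its own observations.
for n in (0, 1, 100):
    print(n, partially_pooled_estimate(4.0, n, 1.0, 2.0, 1.0))
```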

      (4) Along these lines, the three clusters appear to show uniform expansion slopes whereas the NBA cohort, a much larger cohort that captured early and late viral loads in most individuals, shows substantial variability in viral expansion slopes. In Figure 2D: the upslope seems extraordinarily rapid relative to other cohorts. I calculate a viral doubling time of roughly 1.5 hours. It would be helpful to understand how reliable of an estimate this is and also how much variability was observed among individuals.

      We appreciate your detailed feedback on the estimated up-slope of viral dynamics. As the reviewer noted, the pattern differs from that observed in the NBA cohort, which may be because that cohort measured viral load in upper respiratory tract swabs. In our estimation, the mean and standard deviation of the doubling time (defined as ln 2/(βT₀p/c − δ)) were 1.44 hours and 0.49 hours, respectively. Although direct validation of these values is challenging, several previous studies, including our own, have reported that viral loads in saliva increase more rapidly than in upper respiratory tract swabs and reach their peak sooner. Thus, we believe that our findings are consistent with those of previous studies. We revised our manuscript to discuss this point with additional references (page 20, lines 303-311).
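      The doubling-time expression above follows from the early phase of infection, when target cells are not yet depleted and viral load grows exponentially at rate βT₀p/c − δ. The sketch below assumes a reduced target cell-limited model in which f = T/T₀ obeys df/dt = −βfV and dV/dt = (βT₀p/c)fV − δV; this form is our assumption about Eqs. (1-2), and the parameter values are arbitrary, not fitted estimates. With rates in units of 1/day, multiplying the result by 24 gives hours.

```python
import math


def early_growth_rate(beta: float, T0: float, p: float, c: float, delta: float) -> float:
    """Exponential growth rate of viral load while target cells are still ~T0."""
    return beta * T0 * p / c - delta


def doubling_time(beta: float, T0: float, p: float, c: float, delta: float) -> float:
    """ln 2 / (beta*T0*p/c - delta), in the time unit of the rate parameters."""
    rate = early_growth_rate(beta, T0, p, c, delta)
    if rate <= 0:
        raise ValueError("no initial growth: beta*T0*p/c must exceed delta")
    return math.log(2) / rate


def simulate_viral_load(beta, T0, p, c, delta, V0, t_end, dt=1e-4):
    """Forward-Euler integration of the assumed reduced model
        df/dt = -beta * f * V
        dV/dt = gamma * f * V - delta * V,   gamma = beta * T0 * p / c
    starting from f = 1 (no target cells infected yet)."""
    gamma = beta * T0 * p / c
    f, V = 1.0, V0
    for _ in range(round(t_end / dt)):
        df = -beta * f * V
        dV = gamma * f * V - delta * V
        f, V = f + df * dt, V + dV * dt
    return V


# Arbitrary illustrative parameters (per day): growth rate 3 - 1 = 2 per day.
params = dict(beta=1e-6, T0=1e6, p=3.0, c=1.0, delta=1.0)
td = doubling_time(**params)
print(td * 24)  # doubling time in hours (about 8.3 h for these made-up values)
# While target-cell depletion is negligible, viral load doubles in about td.
print(simulate_viral_load(**params, V0=1.0, t_end=td))  # close to 2
```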

      (5) A key issue is that a lack of heterogeneity in the cohort may be driving a lack of differences between the groups. Table 1 shows SpO2 values and lab values that all look normal. All infections were mild. This may make identifying biomarkers quite challenging.

      Thank you for your comment regarding heterogeneity in the cohort. Because the NFV cohort was designed for COVID-19 patients with mild or asymptomatic infection, its clinical heterogeneity is limited; we have revised the manuscript to discuss this point (page 21, lines 334-337).

      (6) Figure 3A: many of the clinical variables such as basophil count, Cl, and protein have very low pre-test probability of correlating with virologic outcome.

      Thank you for your comment regarding some clinical information we used in our study. We revised our manuscript to discuss this point (page 21, lines 337-338).

      (7) A key omission appears to be microRNA from pre and early-infection time points. It would be helpful to understand whether microRNA levels at least differed between the two collection timepoints and whether certain microRNAs are dynamic during infection.

      Thank you for your comment regarding the collection of micro-RNA data. As suggested by the reviewer, we compared micro-RNA levels between two time points using pairwise t-tests and Mann-Whitney U tests with FDR correction. As a result, no micro-RNA showed a statistically significant difference. This suggests that micro-RNA levels remain relatively stable during the course of infection, at least for mild or asymptomatic infection, and may therefore serve as a biomarker independent of sampling time. We have revised the manuscript to include this information (page 17, lines 259-262).
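      The response does not state which FDR procedure was used; the Benjamini-Hochberg step-up adjustment is a common choice and is sketched here as an assumption. In practice, `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")` performs the same adjustment; the pure-Python version below only makes the steps explicit, and the example p-values are made up.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for four microRNAs compared between the two time points.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print([round(x, 3) for x in q])  # [0.02, 0.04, 0.04, 0.02]
```

A microRNA would be called significant if its adjusted value falls at or below the chosen FDR level (for example 0.05).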

      (8) The discussion could use a more thorough description of how viral kinetics differ in saliva versus nasal swabs and how this work complements other modeling studies in the field.

      We appreciate the reviewer’s thoughtful feedback. As suggested, we have added a discussion comparing our findings with studies that analyzed viral dynamics using nasal swabs, thereby highlighting the differences between viral dynamics in saliva and in the upper respiratory tract. To ensure a fair and rigorous comparison, we referred to studies that employed the same mathematical model (i.e., Eqs.(1-2)). Accordingly, we revised the manuscript and included additional references (page 20, lines 303-311).

      Furthermore, we clarified the significance of our study in two key aspects. First, it provides a detailed analysis of viral dynamics in saliva, reinforcing our previous findings from a single cohort by extending them across multiple cohorts. Second, this study uniquely examines whether viral dynamics in saliva can be directly predicted by exploring diverse clinical data and micro-RNAs. Notably, cohorts that have simultaneously collected and reported both viral load and a broad spectrum of clinical data from the same individuals, as in our study, are exceedingly rare. We revised the manuscript to clarify this point (page 20, lines 302-311).

      (9) The most predictive potential variables of shedding heterogeneity which pertain to the innate and adaptive immune responses (virus-specific antibody and T cell levels) are not measured or modeled.

      Thank you for your comment. We agree that antibody and T cell related markers may serve as the most powerful predictors, as supported by our own study [S. Miyamoto et al., PNAS (2023), ref. 24] as well as previous reports. While this point was already discussed in the manuscript, we have revised the text to make it more explicit (page 21, lines 327-328).

      (10) I am curious whether the models infer different peak viral loads, duration, expansion, and clearance slopes between the 2 cohorts based on fitting to different infection stage data.

      Thank you for your comment. We compared these features between the two cohorts, as the reviewer suggested. A statistically significant difference between the two cohorts (i.e., p-value ≤ 0.05 from the t-test) was observed only for the peak viral load, with overall trends being largely similar. The mean peak viral load was 7.5 log10 copies/mL in the Japan cohort and 8.1 log10 copies/mL in the Illinois cohort, with variances of 0.88 and 0.87, respectively, indicating comparable variability.

      Reviewer #2 (Public review)

      Summary:

      This study argues it has found that it has stratified viral kinetics for saliva specimens into three groups by the duration of "viral shedding"; the authors could not identify clinical data or microRNAs that correlate with these three groups.

      Strengths:

      The question of whether there is a stratification of viral kinetics is interesting.

      Weaknesses:

      The data underlying this work are not treated rigorously. The work in this manuscript is based on PCR data from two studies, with most of the data coming from a trial of nelfinavir (NFV) that showed no effect on the duration of SARS-CoV-2 PCR positivity. This study had no PCR data before symptom onset, and thus exclusively evaluated viral kinetics at or after peak viral loads. The second study is from the University of Illinois; this data set had sampling prior to infection, so has some ability to report the rate of "upswing." Problems in the analysis here include:

      We are grateful to the reviewer for the constructive feedback, which has greatly enhanced the quality of our study. In response, we have carefully revised the manuscript to address all comments.

      The PCR Ct data from each study is treated as equivalent and referred to as viral load, without any reports of calibration of platforms or across platforms. Can the authors provide calibration data and justify the direct comparison as well as the use of "viral load" rather than "Ct value"? Can the authors also explain on what basis they treat Ct values in the two studies as identical?

      Thank you for your comment regarding the description of the viral load data. Your comment made us realize that we had not explained how the viral load data from the two studies were integrated. We calculated viral load from Ct values using linear regression equations between Ct value and viral load, specific to each study's measurement method. We revised the manuscript to clarify this point in the Saliva viral load data section of the Methods.
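      A minimal sketch of such a per-assay standard curve, assuming that log10 viral load is regressed on Ct by ordinary least squares (the response does not give the regression direction or coefficients). The calibration points below are hypothetical. A slope near −0.30 log10 copies per cycle corresponds to roughly 100% amplification efficiency, since each cycle then doubles the template.

```python
from typing import Sequence, Tuple


def fit_standard_curve(ct: Sequence[float], log10_load: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for log10(copies/mL) = intercept + slope * Ct."""
    n = len(ct)
    if n != len(log10_load) or n < 2:
        raise ValueError("need at least two paired calibration points")
    mean_x = sum(ct) / n
    mean_y = sum(log10_load) / n
    sxx = sum((x - mean_x) ** 2 for x in ct)
    if sxx == 0:
        raise ValueError("calibration Ct values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ct, log10_load))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def ct_to_log10_load(ct_value: float, intercept: float, slope: float) -> float:
    """Convert a sample Ct value with an assay-specific standard curve."""
    return intercept + slope * ct_value


# Hypothetical dilution series for one assay, lying on log10 = 14 - 0.3 * Ct.
calib_ct = [20.0, 25.0, 30.0, 35.0]
calib_log10 = [8.0, 6.5, 5.0, 3.5]
intercept, slope = fit_standard_curve(calib_ct, calib_log10)
print(round(slope, 3), round(intercept, 3))                # -0.3 14.0
print(round(ct_to_log10_load(27.0, intercept, slope), 2))  # 5.9
```

Each platform would get its own fitted curve, so that Ct values from different assays are mapped onto a common log10 copies/mL scale.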

      The limit of detection for the NFV PCR data was unclear, so the authors assumed it was the same as the University of Illinois study. This seems a big assumption, as PCR platforms can differ substantially. Could the authors do sensitivity analyses around this assumption?

      Thank you for your comment regarding the detection limit for viral load data. As the reviewer suggested, we conducted a sensitivity analysis of the assumed detection limit for the NFV dataset. Specifically, we performed data fitting in the same manner for two scenarios, in which the detection limit of the NFV PCR was either lower (0 log10 copies/mL) or higher (2 log10 copies/mL) than that of the Illinois data (1.08 log10 copies/mL), and compared the results.

      As a result, we obtained largely comparable viral dynamics in most cases (Supplementary Fig 6). The AIC was 6836 when the NFV censoring threshold was the same as the Illinois threshold, whereas it increased to 7403 under the lower threshold and decreased to 6353 under the higher threshold. However, these differences may be attributable to the varying number of data points treated as below the detection limit: when the threshold is set higher, more data points are treated as censored, which may result in a more favorable error calculation and hence a lower AIC. To discuss this point, we have added a new figure (Supplementary Fig 6) and revised the manuscript accordingly (page 25, lines 415-418).
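      The effect of the censoring threshold on the likelihood can be illustrated with a toy left-censored model. In this hedged sketch (not the Monolix implementation), each log10 viral load is assumed to be Gaussian around the model prediction; quantified values contribute a density, while values at or below the detection limit contribute the probability of falling below it. Moving the threshold changes which points fall into which category, so the resulting log-likelihoods, and hence AICs, are not computed on equivalent terms. All numbers are made up.

```python
import math


def normal_logpdf(x: float, mu: float, sigma: float) -> float:
    """Log density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def normal_logcdf(x: float, mu: float, sigma: float) -> float:
    """Log of P(Y <= x) for Y ~ N(mu, sigma^2); adequate for moderate z."""
    z = (x - mu) / sigma
    return math.log(0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))


def censored_loglik(observed, predicted, lod, sigma):
    """Gaussian log-likelihood with left censoring at the detection limit."""
    total = 0.0
    for y, mu in zip(observed, predicted):
        if y <= lod:
            total += normal_logcdf(lod, mu, sigma)  # censored: P(Y <= LOD)
        else:
            total += normal_logpdf(y, mu, sigma)  # quantified: density
    return total


def aic(loglik: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 log L."""
    return 2.0 * n_params - 2.0 * loglik


if __name__ == "__main__":
    observed = [6.0, 4.5, 2.5, 1.5, 1.0]   # toy log10 viral loads
    predicted = [5.8, 4.7, 2.9, 1.7, 0.9]  # toy model predictions
    for lod in (0.0, 2.0):
        ll = censored_loglik(observed, predicted, lod, sigma=0.5)
        print(lod, round(aic(ll, n_params=3), 2))
```

      In this toy example, raising the threshold from 0 to 2 turns the last two points into censored observations and lowers the AIC, even though the predictions are unchanged.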

      The authors refer to PCR positivity as viral shedding, but it is viral RNA detection (very different from shedding live/culturable virus, as shown in the Ke et al. paper). I suggest updating the language throughout the manuscript to be precise on this point.

      We appreciate the reviewer’s feedback regarding the terminology used for viral shedding. In response, we have revised all instances of “viral shedding” to “viral RNA detection” throughout the manuscript as suggested.

      Eyeballing extended data in Figure 1, a number of the putative long-duration infections appear to be likely cases of viral RNA rebound (for examples, see S01-16 and S01-27). What happens if all the samples that look like rebound are reanalyzed to exclude the late PCR detectable time points that appear after negative PCRs?

      We sincerely thank the reviewer for the valuable suggestion. In response, we established a criterion to remove data that appeared to exhibit rebound and then refitted the data (see Author response image 1 below). The criterion was defined as: “any data that increase again after reaching the detection limit in two measurements are considered rebound and removed.” As a result, 15 of the 144 cases were excluded because too few usable data points remained, leaving 129 cases for analysis. A criterion based on a single measurement at the detection limit would have excluded too many data points, whereas a criterion based solely on the magnitude of the increase made it difficult to set an appropriate threshold for what counts as an increase.
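      One reading of the rebound criterion quoted above can be sketched as follows. We assume, for illustration only, that “reaching the detection limit in two measurements” means two consecutive measurements at or below the limit, after which any later value above the limit is dropped; the actual implementation may differ.

```python
def remove_rebound(times, log_vl, lod):
    """Drop values above the LOD that follow two consecutive values at or below it.

    Assumed reading of the criterion: once two consecutive measurements are at
    or below the detection limit, any later measurement above it is treated as
    rebound and removed; measurements at or below the limit are kept.
    """
    kept_t, kept_v = [], []
    consecutive_below = 0
    cleared = False  # becomes True after two consecutive values <= LOD
    for t, v in zip(times, log_vl):
        if cleared and v > lod:
            continue  # rebound point: excluded from fitting
        kept_t.append(t)
        kept_v.append(v)
        if v <= lod:
            consecutive_below += 1
            if consecutive_below >= 2:
                cleared = True
        else:
            consecutive_below = 0
    return kept_t, kept_v


if __name__ == "__main__":
    t = [1, 3, 5, 7, 9, 11]
    vl = [5.0, 3.0, 1.08, 1.08, 2.5, 1.08]  # toy data; LOD = 1.08 log10 copies/mL
    print(remove_rebound(t, vl, lod=1.08))
```

      Under this reading, a single value at the detection limit followed by a rise is kept, which mirrors the stated reason for not using a single-measurement criterion.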

      The fitting result indicates that the removal of rebound data may influence the fitting results; however, direct comparison of subsequent analyses, such as clustering, is challenging due to the reduced sample size. Moreover, the results can vary substantially depending on the criterion used to define rebound, and establishing a consistent standard remains difficult. Accordingly, we retained the current analysis and have added a discussion of rebound phenomena in the Discussion section as a limitation (page 22, lines 355-359). We once again sincerely appreciate the reviewer’s insightful and constructive suggestion.

      Author response image 1.

      Comparison of model fits before and after removing data suspected of rebound. Black dots represent observed measurements, and the black and yellow curves show the fitted viral dynamics for the full dataset and the dataset with rebound data removed, respectively.

      There's no report of uncertainty in the model fits. Given the paucity of data for the upslope, there must be large uncertainty in the up-slope and likely in the peak, too, for the NFV data. This uncertainty is ignored in the subsequent analyses. This calls into question the efforts to stratify by the components of the viral kinetics. Could the authors please include analyses of uncertainty in their model fits and propagate this uncertainty through their analyses?

      We sincerely appreciate the reviewer’s detailed feedback on model uncertainty. To address this point, we revised Extended Fig 1 (now renumbered as Supplementary Fig 1) to include 95% credible intervals computed using a bootstrap approach. In addition, to examine the potential impact of model uncertainty on the stratified analyses, we reconstructed the distance matrix underlying stratification by incorporating feature uncertainty. Specifically, for each individual, we sampled viral dynamics within the credible interval, averaged the resulting features, and built the distance matrix from these averages. We then compared this uncertainty-adjusted matrix with the original one using the Mantel test, which showed a strong correlation (r = 0.72, p < 0.001). Given this result, we did not replace the current stratification but revised the Results and Methods sections to report this analysis (page 11, lines 159-162 and page 28, lines 512-519). Once again, we are deeply grateful for this insightful comment.
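      For readers unfamiliar with the Mantel test, the following is a minimal, generic sketch of a permutation Mantel test between two distance matrices. The correlation type (Pearson on upper-triangle entries), the number of permutations, and the one-sided p-value are assumptions made for illustration; the software and settings used in the manuscript are not specified in this response.

```python
import random


def _upper_triangle(matrix):
    """Entries above the diagonal of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel_test(d1, d2, n_perm=999, seed=0):
    """Permutation Mantel test between two symmetric distance matrices.

    Returns (r, p): the Pearson correlation of the upper-triangle entries and a
    one-sided p-value from jointly permuting the rows and columns of d2.
    """
    rng = random.Random(seed)
    n = len(d1)
    x = _upper_triangle(d1)
    r_obs = _pearson(x, _upper_triangle(d2))
    count = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        permuted = [[d2[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if _pearson(x, _upper_triangle(permuted)) >= r_obs:
            count += 1
    return r_obs, count / (n_perm + 1)
```

      Rows and columns are permuted together because the individuals, not the individual pairwise distances, are the exchangeable units; this is why the Mantel test is preferred over a naive correlation test on the pairwise entries.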

      The clinical data are reported as a mean across the course of an infection; presumably vital signs and blood test results vary substantially, too, over this duration, so taking a mean without considering the timing of the tests or the dynamics of their results is perplexing. I'm not sure what to recommend here, as the timing and variation in the acquisition of these clinical data are not clear, and I do not have a strong understanding of the basis for the hypothesis the authors are testing.

      We appreciate the reviewer's feedback on the clinical data. The comment made us aware that the manuscript did not describe how the clinical data were handled. In this research, we focused on finding “early predictors” that could provide insight into viral shedding patterns. Thus, for each patient we used the clinical data measured at the earliest time point (the date of admission). Another reason is that the date of admission is practically the only time point at which complete clinical data, without any missing values, are available for all participants. We revised our manuscript to clarify this point (page 5, lines 90-95).

      It's unclear why microRNAs matter. It would be helpful if the authors could provide more support for their claims that (1) microRNAs play such a substantial role in determining the kinetics of other viruses and (2) they play such an important role in modulating COVID-19 that it's worth exploring the impact of microRNAs on SARS-CoV-2 kinetics. A link to a single review paper seems insufficient justification. What strong experimental evidence is there to support this line of research?

      We appreciate the reviewer’s comments regarding microRNA. Based on this feedback, we recognized the need to clarify our rationale for selecting microRNAs as the analyte. The primary reason was that our available specimens were saliva, and microRNAs are among the biomarkers that can be reliably measured in saliva. At the same time, previous studies have reported associations between microRNAs and various diseases, which led us to consider the potential relevance of microRNAs to viral dynamics, beyond their role as general health indicators. To better reflect this context, we have added supporting references (page 17, lines 240-243).

      Reviewer #3 (Public review)

      The article presents a comprehensive study on the stratification of viral shedding patterns in saliva among COVID-19 patients. The authors analyze longitudinal viral load data from 144 mildly symptomatic patients using a mathematical model, identifying three distinct groups based on the duration of viral shedding. Despite analyzing a wide range of clinical data and micro-RNA expression levels, the study could not find significant predictors for the stratified shedding patterns, highlighting the complexity of SARS-CoV-2 dynamics in saliva. The research underscores the need for identifying biomarkers to improve public health interventions and acknowledges several limitations, including the lack of consideration of recent variants, the sparsity of information before symptom onset, and the focus on symptomatic infections. 

      The manuscript is well-written, with the potential for enhanced clarity in explaining statistical methodologies. This work could inform public health strategies and diagnostic testing approaches. However, substantial new statistical analysis is needed, with major revisions to address the following points:

      We sincerely appreciate the thoughtful feedback provided by Reviewer #3, particularly regarding our methodology. In response, we conducted additional analyses and revised the manuscript accordingly. Below, we address the reviewer’s comments point by point.

      (1) Patient characterization & selection: Patient immunological status at inclusion (and if it was accessible at the time of infection) may be the strongest predictor for viral shedding in saliva. The authors state that the patients were not previously infected by SARS-COV-2. Was Anti-N antibody testing performed? Were other humoral measurements performed or did everything rely on declaration? From Figure 1A, I do not understand the rationale for excluding asymptomatic patients. Moreover, the mechanistic model can handle patients with only three observations, why are they not included? Finally, the 54 patients without clinical data can be used for the viral dynamics fitting and then discarded for the descriptive analysis. Excluding them can create a bias. All the discarded patients can help the virus dynamics analysis as it is a population approach. Please clarify. In Table 1 the absence of sex covariate is surprising.

      We appreciate the detailed feedback from the reviewer regarding patient selection. We relied on patients’ self-reports to determine their history of COVID-19 infection and have revised the manuscript to specify this (page 6, lines 83-84).

      In parameter estimation, we used each patient’s date of symptom onset to define the baseline of the time axis as clearly as possible, as in our previous work. Accordingly, asymptomatic patients, for whom no date of symptom onset is available, were excluded from the analysis. Additionally, in the cohort we analyzed, most of the patients excluded for having too few observations (i.e., fewer than 3 points) already had a viral load close to the detection limit at the first measurement. This reflects the design of the clinical trial: once two consecutive negative results were obtained, no further follow-up samples were collected. These patients were excluded because reasonable fits could not be obtained for them. Also, we included the 54 patients without clinical data in the viral dynamics fitting and used only the NFV cohort for the clinical data analysis. We acknowledge that our description may have confused readers, and we have revised the manuscript to clarify how patients were selected for data fitting (page 6, lines 96-102, page 24, lines 406-407, and page 7, lines 410-412). In addition, thanks to the reviewer’s comment, we realized that gender information was missing from Table 1. We appreciate this observation and have revised the table to include gender (which was used in our analysis).

      (2) Exact study timeline for explanatory covariates: I understand the idea of finding « early predictors » of long-lasting viral shedding. I believe it is key and a great question. However, some samples (Figure 4A) seem to be taken at the end of the viral shedding. I am not sure it is really easier to micro-RNA saliva samples than a PCR. So I need to be better convinced of the impact of the possible findings. Generally, the timeline of explanatory covariate is not described in a satisfactory manner in the actual manuscript. Also, the evaluation and inclusion of the daily symptoms in the analysis are unclear to me.

      We appreciate the reviewer’s feedback regarding the collection of explanatory variables. As noted, of the two microRNA samples collected from each patient, one was obtained near the end of viral shedding. This was intended to examine potential differences in microRNA levels between the early and late phases of infection. No significant differences were observed between the two time points, and using microRNA from either phase alone or both together did not substantially affect predictive accuracy for stratified groups. Furthermore, microRNA collection was motivated primarily by the expectation that it would be more sensitive to immune responses, rather than by ease of sampling. We have revised the manuscript to clarify these points regarding microRNA (page 17, lines 243-245 and 259-262).

      Furthermore, as suggested by the reviewer, we have strengthened the explanation of the collection schedule for clinical information and of how daily symptoms were used in the analysis (page 6, lines 90-95, and page 14, lines 218-220).

      (3) Early Trajectory Differentiation: The model struggles to differentiate between patients' viral load trajectories in the early phase, with overlapping slopes and indistinguishable viral load peaks observed in Figures 2B, 2C, and 2D. The question arises whether this issue stems from the data, the nature of Covid-19, or the model itself. The authors discuss the scarcity of pre-symptom data, primarily relying on Illinois patients who underwent testing before symptom onset. This contrasts with earlier statements on pages 5-6 & 23, where they claim the data captures the full infection dynamics, suggesting sufficient early data for pre-symptom kinetics estimation. The authors need to provide detailed information on the number or timing of patient sample collections during each period.

      Thank you for the reviewer’s thoughtful comments. The model used in this study [Eqs.(1-2)] has been employed in numerous prior studies and has successfully identified viral dynamics at the individual level. In this context, we interpret the rapid viral increase observed across participants as attributable to characteristics of SARS-CoV-2 in saliva, an interpretation that has also been reported by multiple previous studies. We have added the relevant references and strengthened the corresponding discussion in the manuscript (page 20, lines 303-311).
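      Eqs.(1-2) are not reproduced in this response. Purely as an illustration, the sketch below assumes the scaled target cell-limited form df/dt = -βfV, dV/dt = γfV - δV, where f is the fraction of uninfected target cells and V the viral load, and integrates it with a classical fourth-order Runge-Kutta scheme. With these hypothetical parameter values, the load grows roughly exponentially at γ - δ per day until target cells are depleted, which is the kind of steep early rise discussed above. Shifting the time axis by τ (here the 4.13-day population estimate reported below) places symptom onset at time zero.

```python
import math


def tcl_rhs(f, v, beta, gamma, delta):
    """Assumed model: df/dt = -beta*f*V, dV/dt = gamma*f*V - delta*V."""
    return -beta * f * v, gamma * f * v - delta * v


def simulate_tcl(beta, gamma, delta, v0, days=30.0, dt=0.01):
    """RK4 integration from infection (t = 0, f = 1); returns times and log10 V."""
    f, v = 1.0, v0
    times, log_v = [0.0], [math.log10(v0)]
    for step in range(1, int(round(days / dt)) + 1):
        k1 = tcl_rhs(f, v, beta, gamma, delta)
        k2 = tcl_rhs(f + 0.5 * dt * k1[0], v + 0.5 * dt * k1[1], beta, gamma, delta)
        k3 = tcl_rhs(f + 0.5 * dt * k2[0], v + 0.5 * dt * k2[1], beta, gamma, delta)
        k4 = tcl_rhs(f + dt * k3[0], v + dt * k3[1], beta, gamma, delta)
        f += dt / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v += dt / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        times.append(step * dt)
        log_v.append(math.log10(v))
    return times, log_v


if __name__ == "__main__":
    # Hypothetical parameters (per day); not the fitted values from the manuscript.
    times, log_v = simulate_tcl(beta=1e-6, gamma=3.0, delta=0.8, v0=10.0)
    tau = 4.13  # infection-to-onset delay; time since onset = t - tau
    peak = max(range(len(log_v)), key=log_v.__getitem__)
    print(f"peak log10 V = {log_v[peak]:.2f} at {times[peak] - tau:.2f} days after onset")
```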

      We acknowledge that our explanation of how the complementary relationship between the two cohorts contributes to capturing infection dynamics was not sufficiently clear. As described in the manuscript, the Illinois cohort provides pre-symptomatic data, whereas the NFV cohort offers abundant end-phase data, so each compensates for the other’s missing phase. By jointly analyzing the two cohorts with a nonlinear mixed-effects model, we estimated viral dynamics at the individual level. This approach first estimates population-level parameters (fixed effects) using data from all participants and then incorporates random effects to account for individual variability, yielding the most plausible parameter values for each individual.

      Thus, even when early-phase data are lacking in the NFV cohort, information from the Illinois cohort allows us to infer the most plausible dynamics, and the reverse holds for the end phase. In this sense, combining the two cohorts enables mathematical modeling to capture infection dynamics at the individual level. Recognizing that our earlier description could be misleading, we have carefully revised the relevant description (page 27, lines 472-483). In addition, as suggested by the reviewer, we have added the number of data samples available for each phase in both cohorts (page 7, lines 106-109).
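      As a hedged illustration of how fixed and random effects combine, the sketch below assumes the log-normal parametrization commonly used in nonlinear mixed-effects software, θ_i = θ_pop × exp(η_i) with η_i ~ N(0, ω²). It shows only the model structure, not the estimation procedure. The 4.13-day population value of τ is taken from this response; ω and the number of individuals are hypothetical.

```python
import math
import random


def individual_parameter(theta_pop: float, eta: float) -> float:
    """Log-normal parametrization: theta_i = theta_pop * exp(eta_i)."""
    return theta_pop * math.exp(eta)


def sample_individual_parameters(theta_pop, omega, n, seed=0):
    """Draw eta_i ~ N(0, omega^2) and map each draw to an individual parameter."""
    rng = random.Random(seed)
    return [individual_parameter(theta_pop, rng.gauss(0.0, omega)) for _ in range(n)]


if __name__ == "__main__":
    # tau_pop = 4.13 days is the population estimate quoted in this response;
    # omega = 0.3 is a hypothetical between-individual standard deviation.
    taus = sample_individual_parameters(theta_pop=4.13, omega=0.3, n=5)
    print([round(t, 2) for t in taus])
```

      The exponential keeps every individual parameter positive and makes θ_pop the median of the individual values, which is one reason this parametrization is a common default for rate constants and delays.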

      (4) Conditioning on the future: Conditioning on the future in statistics refers to the problematic situation where an analysis inadvertently relies on information that would not have been available at the time decisions were made or data were collected. This seems to be the case when the authors create micro-RNA data (Figure 4A). First, the sampling times need to be clarified by the authors (for clinical outcomes as well). Second, proper causal inference relies on the assumption that the cause precedes the effect. This conditioning on the future may result in overestimating the model's accuracy. This happens because the model has been exposed to the outcome it's supposed to predict. This could question the - already weak - relation with mir-1846 level.

      We appreciate the reviewer’s detailed feedback. As noted in our reply to Comment 2, we collected micro-RNA samples at two time points, near the peak of the infection dynamics and at the end stage, and found no significant differences between them. This suggests that micro-RNA levels are not substantially affected by sampling time. Indeed, analyses using samples from the peak, the late stage, or both yielded nearly identical results in relation to infection dynamics. To clarify this point, we revised the manuscript by integrating this explanation with our response to Comment 2 (page 17, lines 259-262). We have also revised the manuscript to clarify the sampling times of the clinical information and micro-RNA (page 6, lines 90-95).

      (5) Mathematical Model Choice Justification and Performance: The paper lacks mention of the practical identifiability of the model (especially for tau regarding the lack of early data information). Moreover, it is expected that the immune effector model will be more useful at the beginning of the infection (for which data are the more parsimonious). Please provide AIC for comparison, saying that they have "equal performance" is not enough. Can you provide at least in a point-by-point response the VPC & convergence assessments?

      We appreciate the reviewer’s detailed feedback regarding the mathematical model. We acknowledge the potential concern regarding the practical identifiability of tau (incubation period), particularly given the limited early-phase data. In our analysis, however, the nonlinear mixed-effects model yielded a population-level estimate of 4.13 days, which is similar to previously reported incubation periods for COVID-19. This agreement suggests that our estimate of tau is reasonable despite the scarcity of early data.

      For model comparison, we first added the AIC values of the two models to the manuscript, as suggested by the reviewer (page 10, lines 130-135). We would like to emphasize that we adopted a simple target cell-limited model in this study because our aim was to reconstruct viral dynamics and stratify shedding patterns rather than to explore the mechanism of viral infection in detail. Nevertheless, we believe that the target cell-limited model provides reasonably reconstructed viral dynamics, as it has been used in many previous studies. We revised the manuscript to clarify this (page 10, lines 135-144).

      Furthermore, as suggested, we have added the VPC and convergence assessment results for both models, together with explanatory text, to the manuscript (Supplementary Fig 2, Supplementary Fig 3, and page 10, lines 130-135). In the VPC, the observed 5th, 50th, and 95th percentiles were generally within the corresponding simulated prediction intervals across most time points. Although minor deviations were noted in certain intervals, the overall distribution of the observed data was well captured by the models, supporting their predictive performance (Supplementary Fig 2). In addition, the log-likelihood and SAEM parameter trajectories stabilized after the burn-in phase, confirming appropriate convergence (Supplementary Fig 3).
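      The per-bin check described above (observed 5th, 50th, and 95th percentiles against simulated prediction intervals) can be sketched generically as follows. This is not how Monolix produces its VPC plots; the percentile interpolation rule and the 5th-95th percentile bounds of the prediction intervals are assumptions for illustration, and the data are toy values.

```python
import math


def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in [0, 100])."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def vpc_bin(observed, simulated, qs=(5, 50, 95), pi=(5, 95)):
    """Compare observed percentiles with simulated prediction intervals in one time bin.

    `simulated` holds one list of simulated values per replicate of the study.
    Returns {q: (observed_q, pi_lower, pi_upper, inside)}.
    """
    result = {}
    for q in qs:
        sim_q = [percentile(rep, q) for rep in simulated]
        lower, upper = percentile(sim_q, pi[0]), percentile(sim_q, pi[1])
        obs_q = percentile(observed, q)
        result[q] = (obs_q, lower, upper, lower <= obs_q <= upper)
    return result


if __name__ == "__main__":
    base = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy log10 viral loads in one time bin
    replicates = [[x + s / 10 - 0.5 for x in base] for s in range(11)]
    print(vpc_bin(base, replicates))
```

      Repeating this over all time bins gives the kind of summary reported above, in which a few bins may fall outside their intervals while the overall distribution is still captured.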

      (6) Selected features of viral shedding: I wonder to what extent the viral shedding area under the curve (AUC) and normalized AUC should be added as selected features.

      We sincerely appreciate the reviewer’s valuable suggestion regarding the inclusion of additional features. Following this recommendation, we considered AUC (or normalized AUC) as an additional feature when constructing the distance matrix used for stratification. We then evaluated the similarity between the resulting distance matrix and the original one using the Mantel test, which showed a very high correlation (r = 0.92, p < 0.001). This indicates that incorporating AUC as an additional feature does not substantially alter the distance matrix. Accordingly, we have decided to retain the current stratification analysis, and we sincerely thank the reviewer once again for this interesting suggestion.
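      A minimal sketch of the AUC features discussed above, computed with the trapezoidal rule on log10 viral load. Both definitions are assumptions: this response does not state whether AUC was taken above the detection limit or how it was normalized, so here values below the limit count as zero and the normalized AUC is the AUC divided by the length of the time window.

```python
def trapezoid_auc(times, values):
    """Area under a piecewise-linear curve (trapezoidal rule)."""
    return sum(
        (t1 - t0) * (v0 + v1) / 2.0
        for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])
    )


def shedding_auc_features(times, log_vl, lod):
    """Return (AUC, normalized AUC) of log10 viral load above the LOD.

    Assumed definitions: values below the LOD count as zero, and the normalized
    AUC divides the AUC by the length of the observation window.
    """
    excess = [max(v - lod, 0.0) for v in log_vl]
    auc = trapezoid_auc(times, excess)
    window = times[-1] - times[0]
    return auc, (auc / window if window > 0 else 0.0)
```

      Such features would be appended to each individual's existing feature vector before the distance matrix is rebuilt and compared with the original one.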

      (7) Two-step nature of the analysis: First you fit a mechanistic model, then you use the predictions of this model to perform clustering and prediction of groups (unsupervised then supervised). Thus you do not propagate the uncertainty intrinsic to your first estimation through the second step, ie. all the viral load selected features actually have a confidence bound which is ignored. Did you consider a one-step analysis in which your covariates of interest play a direct role in the parameters of the mechanistic model as covariates? To pursue this type of analysis SCM (Johnson et al. Pharm. Res. 1998), COSSAC (Ayral et al. 2021 CPT PsP), or SAMBA ( Prague et al. CPT PsP 2021) methods can be used. Did you consider sampling on the posterior distribution rather than using EBE to avoid shrinkage?

      Thank you for the reviewer’s detailed suggestions regarding our analysis. We agree that the current approach does not adequately account for the impact of uncertainty in viral dynamics on the stratified analyses. As a first step, we have revised Extended Data Fig 1 (now renumbered as Supplementary Fig 1) to include 95% credible intervals computed using a bootstrap approach, to present the model-fitting uncertainty more explicitly. Then, to examine the potential impact of model uncertainty on the stratified analyses, we reconstructed the distance matrix underlying stratification by incorporating feature uncertainty. Specifically, for each individual, we sampled viral dynamics within the credible interval, averaged the resulting features, and built the distance matrix from these averages. We then compared this uncertainty-adjusted matrix with the original one using the Mantel test, which showed a strong correlation (r = 0.72, p < 0.001). Given this result, we did not replace the current stratification but revised the manuscript to provide this information (page 11, lines 159-162 and page 28, lines 512-519).

      Furthermore, we carefully considered the reviewer’s proposed one-step analysis. However, its implementation was constrained by data-fitting limitations. Specifically, clinical information is available only in the NFV cohort. Thus, if these variables were entered directly as covariates on the parameters, the Illinois cohort could not be included in the data-fitting process. Yet the NFV cohort lacks any pre-symptomatic observations, so fitting the model to that cohort alone does not yield well-identified, robust estimates. While we were unable to implement the suggestion under the current data constraints, we sincerely appreciate the reviewer’s thoughtful and stimulating proposal.

      (8) Need for advanced statistical methods: The analysis is characterized by a lack of power. This can indeed come from the sample size that is characterized by the number of data available in the study. However, I believe the power could be increased using more advanced statistical methods. At least it is worth a try. First considering the unsupervised clustering, summarizing the viral shedding trajectories with features collapses longitudinal information. I wonder if the R package « LongituRF » (and associated method) could help, see Capitaine et al. 2020 SMMR. Another interesting tool to investigate could be latent class models R package « lcmm » (and associated method), see Proust-Lima et al. 2017 J. Stat. Softw. But the latter may be more far-reaching.

      Thank you for the reviewer’s thoughtful suggestions regarding our unsupervised clustering approach. The R package “LongituRF” is designed for supervised analysis and requires a target outcome to guide the calculation of distances between individuals (i.e., between viral dynamics). In our study, however, the goal was purely unsupervised clustering, without any outcome variable, making direct application of “LongituRF” challenging.

      Our current approach (summarizing each dynamic into several interpretable features and then using Random Forest proximities) allows us to construct a distance matrix in an unsupervised manner. Here, the Random Forest is applied in “proximity mode,” focusing on how often dynamics are grouped together in the trees, independent of any target variable. This provides a practical and principled way to capture overall patterns of dynamics while keeping the analysis fully unsupervised.
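      The proximity computation described above can be sketched as follows, assuming the terminal-node (leaf) index of every sample in every tree is already available (for example, scikit-learn's RandomForestClassifier.apply returns an array of shape (n_samples, n_estimators) containing these indices). In Breiman's unsupervised mode, the forest is typically trained to distinguish the real feature vectors from a synthetic copy whose columns are permuted independently; whether the manuscript used exactly this scheme is not stated here. The proximity of two samples is the fraction of trees in which they share a leaf, and 1 − proximity serves as a distance.

```python
def rf_proximity(leaf_ids):
    """Proximity matrix from leaf indices.

    leaf_ids[i][k] is the terminal node reached by sample i in tree k;
    proximity(i, j) is the fraction of trees in which i and j share a leaf.
    """
    n, n_trees = len(leaf_ids), len(leaf_ids[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            shared = sum(leaf_ids[i][k] == leaf_ids[j][k] for k in range(n_trees))
            prox[i][j] = prox[j][i] = shared / n_trees
    return prox


def proximity_to_distance(prox):
    """Convert proximities in [0, 1] to dissimilarities (1 - proximity)."""
    return [[1.0 - p for p in row] for row in prox]


if __name__ == "__main__":
    # Toy leaf assignments for 3 samples in 3 trees (hypothetical).
    leaves = [[0, 1, 2], [0, 1, 3], [4, 5, 6]]
    print(proximity_to_distance(rf_proximity(leaves)))
```

      Some implementations use sqrt(1 − proximity) instead; either choice yields a dissimilarity matrix that can be passed to a standard clustering algorithm.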

      Regarding the suggestion to use latent class mixed models (R package “lcmm”), we also considered this approach. In our dataset, each subject has dense longitudinal measurements, and at many time points, trajectories are very similar across subjects, resulting in minimal inter-individual differences. Consequently, fitting multi-class latent class mixed models (ng ≥ 2) with random effects or mixture terms is numerically unstable, often producing errors such as non-positive definite covariance matrices or failure to generate valid initial values. Although one could consider using only the time points with the largest differences, this effectively reduces the analysis to a feature-based summary of dynamics. Such an approach closely resembles our current method and contradicts the goal of clustering based on full longitudinal information.

      Taken together, although we acknowledge that incorporating more longitudinal information is important, we believe that our current approach provides a practical, stable, and informative solution for capturing heterogeneity in viral dynamics. We would like to once again express our sincere gratitude to the reviewer for this insightful suggestion.

      (9) Study intrinsic limitation: All the results cannot be extended to asymptomatic patients and patients infected with recent VOCs. It definitively limits the impact of results and their applicability to public health. However, for me, the novelty of the data analysis techniques used should also be taken into consideration.

      We appreciate your positive evaluation of our research approach and acknowledge that, as noted in the first limitation in the Discussion section, our analysis may not provide valid insights into recent VOCs or all populations, including asymptomatic individuals. Nonetheless, we believe the novelty of this work lies in its extensive investigation of the relationship between viral shedding patterns in saliva and a wide range of clinical and micro-RNA data. Our findings contribute to a deeper and more quantitative understanding of heterogeneity in viral dynamics, particularly in saliva samples. To discuss this point, we revised our manuscript (page 22, lines 364-368).

      Strengths are:

      Unique data and comprehensive analysis.

      Novel results on viral shedding.

      Weaknesses are:

      Limitation of study design.

      The need for advanced statistical methodology.

      Reviewer #1 (Recommendations For The Authors):

      Line 8: In the abstract, it would be helpful to state how stratification occurred.

      We thank the reviewer for the feedback, and have revised the manuscript accordingly (page 2, lines 8-11).

      Line 31 and discussion: It is important to mention the challenges of using saliva as a specimen type for lab personnel.

      We thank the reviewer for the feedback, and have revised the manuscript accordingly (page 3, lines 36-41).

      Line 35: change to "upper respiratory tract".

      We thank the reviewer for the feedback, and have revised the manuscript accordingly (page 3, line 35).

      Line 37: "Saliva" is not a tissue. Please hazard a guess as to which tissue is responsible for saliva shedding and if it overlaps with oral and nasal swabs.

      We thank the reviewer for the feedback, and have revised the manuscript accordingly (page 3, lines 42-45).

      Line 42, 68: Please explain how understanding saliva shedding dynamics would impact isolation & screening, diagnostics, and treatments. This is not immediately intuitive to me.

      We thank the reviewer for the feedback, and have revised the manuscript accordingly (page 3, lines 48-50).

      Line 50: It would be helpful to explain why shedding duration is the best stratification variable.

      We thank the reviewer for the feedback. We acknowledge that our wording was ambiguous. The clear differences in the viral dynamics patterns pertain to findings observed following the stratification, and we have revised the manuscript to make this explicit (page 4, lines 59-61).

      Line 71: Dates should be listed for these studies.

      We thank the reviewer for the feedback, and have revised the manuscript accordingly (page 6, lines 85-86).

      Reviewer #2 (Recommendations For The Authors):

      Please make all code and data available for replication of the analyses.

      We appreciate the suggestion. Due to ethical considerations, it is not possible to make all data and code publicly available. We have stated this clearly in the Data availability section of the Methods.

      Reviewer #3 (Recommendations For The Authors):

      Here are minor comments / technical details:

      (1) Figure 1B is difficult to understand.

      Thank you for the comment. We updated Fig 1B to incorporate more information to aid interpretation.

      (2) Did you analyse viral load or the log10 of viral load? The latter is more common. You should consider it. In SI Figure 1, please plot in log10 and use a different point shape for censored data. The file quality of this figure should be improved. State in the Materials and Methods whether SEs with Monolix are computed with linearization or importance sampling.

      Thank you for the comment. We conducted our analyses using log10-transformed viral load. Also, we revised Supplementary Fig 1 (now renumbered as Supplementary Fig 4) as suggested. We also added Supplementary Fig 3 and clarified in the Methods that standard errors (SE) were obtained in Monolix from the Fisher information matrix using the linearization method (page 28, lines 498-499).

      (3) Table 1 and Figure 3A could be collapsed.

      Thank you for the comment, and we carefully considered this suggestion. Table 1 summarizes clinical variables by category, whereas Fig 3A visualizes them ordered by p-value of statistical analysis. Collapsing these into a single table would make it difficult to apprehend both the categorical summaries and the statistical ranking at a glance, thereby reducing readability. We therefore decided to retain the current layout. We appreciate the constructive feedback again. 

      (4) Figure 3 legend could be clarified to understand what is 3B and 3C.

      We thank the reviewer for the feedback and have reinforced the description accordingly.

      (5) Why use AIC instead of BICc?

      Thank you for your comment. We also think BICc is a reasonable alternative. However, because our objective is predictive adequacy (reconstruction of viral dynamics), we judged AIC more appropriate. In NLMEM settings, the effective sample size required by BICc is ambiguous, making the penalty somewhat arbitrary. Moreover, since the two models reconstruct very similar dynamics, our conclusions are not sensitive to the choice of criterion.

      (6) Bibliography. Most articles are with et al. (which is not standard) and some are with an extended list of names. Provide DOI for all.

      We thank the reviewer for the feedback, and have revised the manuscript accordingly.

      (7) Extended Table 1&2 - maybe provide a color code to better highlight some lower p-values (if you find any interesting).

      We thank the reviewer for the feedback. Since no clinical variable or micro-RNA other than mir-1846 showed a low p-value, we highlighted only mir-1846 in color to make it easier to locate.

      (8) Please make the replication code available.

      We appreciate the suggestion. Owing to ethical considerations, it is not possible to make all data and code publicly available. We state this explicitly in the Data availability section of the Methods.

    1. eLife Assessment

      The findings of this study are valuable as they demonstrate that when treatment is initiated during acute infection, HIV-specific CD8 T cell responses are maintained long term, and that continued proliferative capacity of these cells may play a role in reducing HIV DNA levels. The evidence supporting the conclusions is solid, with rigorous and advanced methodology; the major limitation is that the findings are associations and do not meet strict criteria for causality. The work is of interest to the HIV cure field and suggests that enhancing early HIV-specific CD8 T cell responses should be considered in the design of interventional cure strategies.

    2. Reviewer #2 (Public review):

      This study investigated the impact of early HIV-specific CD8 T cell responses on the viral reservoir size after 24 weeks and 3 years of follow-up in individuals who started ART during acute infection. Viral reservoir quantification showed that total and defective HIV DNA, but not intact, declined significantly between 24 weeks and 3 years post-ART. The authors also showed that functional HIV-specific CD8⁺ T-cell responses persisted over three years and that early CD8⁺ T-cell proliferative capacity was linked to reservoir decline, supporting early immune intervention in the design of curative strategies.

      The paper is well written, easy to read, and the findings are clearly presented. The study is novel as it demonstrates the effect of HIV-specific CD8 T cell responses on different states of the HIV reservoir, that is, HIV DNA (intact and defective) and the transcriptionally active and inducible reservoir. Although small, the study cohort was relevant and well characterized, as it included individuals who initiated ART during acute infection, 12 of whom were followed longitudinally for 3 years, providing unique insights into the beneficial effects of early treatment on both immune responses and the viral reservoir. The study uses advanced methodology. I enjoyed reading the paper.

      The study's limitations are minor and well acknowledged. While the cohort included only male participants, potentially limiting generalizability, the authors have clarified this limitation in the discussion. Although a chronic infection control group was not yet available, the authors explained that their protocol includes plans to add this comparison in future studies. These limitations are appropriately addressed and do not undermine the strength or validity of the study's conclusions.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      In this work, van Paassen et al. have studied how CD8 T cell functionality and levels predict HIV DNA decline. The article touches on interesting facets of HIV DNA decay, but ultimately comes across as somewhat hastily done and not convincing due to the major issues. 

      (1) The use of only 2 time points to make many claims about longitudinal dynamics is not convincing. For instance, the fact that the raw data show decay in defective and total, but not intact, HIV DNA suggests that the present data are underpowered. The authors speculate that rising intact levels could be due to patients whose reservoirs contain many proviruses with survival advantages, but this is less parsimonious than the explanation that the data are simply noisy without sufficient longitudinal follow-up. n=12 is fine, or even reasonably good for HIV reservoir studies, but mitigating these issues would likely require more time points measured per person.

      (1b) Relatedly, the timing of the first time point (6 months) could be causing a number of issues, because it is in the ballpark of when HIV DNA decay decelerates, as shown by many papers. Because of this study design, some of these participants may already have stabilized HIV DNA levels; earlier measurements would help to observe early kinetics, and later measurements would be critical to be confident about stability.

      The main goal of the present study was to understand the relationship between the HIV-specific CD8 T-cell responses early on ART and the reservoir changes across the subsequent 2.5-year period on suppressive therapy. We have revised the manuscript to clarify this. We chose these time points because the 24-week time point is past the initial steep decline of HIV DNA, which takes place in the first weeks after ART initiation. HIV DNA is known to continue to decay for years thereafter (Besson, Lalama et al. 2014, Gandhi, McMahon et al. 2017).

      (2) Statistical analysis is frequently not sufficient for the claims being made, such that overinterpretation of the data is problematic in many places. 

      (2a) First, although it is plausible that CD8s influence reservoir decay, much more rigorous statistical analysis would be needed to assert this directionality; this is an association, which could just as well be inverted (reservoir disappearance drives CD8 T cell disappearance).

      To correlate the different reservoir measures with each other and with CD8+ T-cell responses at 24 and 156 weeks, we now use non-parametric (Spearman) correlation analyses, which do not assume that the independent and dependent variables are normally distributed. Benjamini-Hochberg corrections for multiple comparisons (false discovery rate, 0.25) were included in the analyses and did not change the results.

      Following this comment, we note that the association between the T-cell response at 24 weeks and the subsequent decrease in the reservoir cannot be bidirectional (that is only possible when both variables are measured at the same time point). Therefore, to model the predictive value of T-cell responses measured at 24 weeks for the decrease in the reservoir between 24 and 156 weeks, we fitted generalized linear models (GLMs) with age, ART regimen, and three different measures of HIV-specific CD8+ T-cell responses as explanatory variables, and the changes in total, intact, and total defective HIV DNA between 24 and 156 weeks of ART as dependent variables.

      (2b) Words like "strong" for correlations must be justified by correlation coefficients, and these heat maps indicate many comparisons were made, such that p-values must be corrected appropriately. 

      We now use Spearman correlation analysis, report correlation coefficients to justify the wording, and have adjusted the p-values for multiple comparisons (Fig. 1, Fig. 3, Table 2). Benjamini-Hochberg corrections for multiple comparisons (false discovery rate, 0.25) were included in the analyses and did not change the results.
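      The Benjamini-Hochberg step-up procedure referred to above can be sketched as follows. This is a minimal, standard-library illustration of the textbook procedure at the stated false discovery rate of 0.25, not the authors' analysis code; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.25):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans in the original order; True marks a
    hypothesis (e.g. one correlation) rejected while controlling the
    false discovery rate at level q.
    """
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest 1-based rank k with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from four Spearman correlations (illustration only).
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20], q=0.25))  # → [True, True, True, True]
```

      With q = 0.05 instead, only the smallest p-value (0.01) would survive; the permissive threshold of 0.25 is what lets all four hypothetical correlations pass.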

      (3) There is not enough introduction and references to put this work in the context of a large/mature field. The impacts of CD8s in HIV acute infection and HIV reservoirs are both deep fields with a lot of complexity. 

      Following this comment we have revised and expanded the introduction to put our work more in the context of the field (CD8s in acute HIV and HIV reservoirs). 

      Reviewer #2 (Public review): 

      Summary: 

      This study investigated the impact of early HIV specific CD8 T cell responses on the viral reservoir size after 24 weeks and 3 years of follow-up in individuals who started ART during acute infection. Viral reservoir quantification showed that total and defective HIV DNA, but not intact, declined significantly between 24 weeks and 3 years post-ART. The authors also showed that functional HIV-specific CD8⁺ T-cell responses persisted over three years and that early CD8⁺ T-cell proliferative capacity was linked to reservoir decline, supporting early immune intervention in the design of curative strategies. 

      Strengths: 

      The paper is well written, easy to read, and the findings are clearly presented. The study is novel as it demonstrates the effect of HIV-specific CD8 T cell responses on different states of the HIV reservoir, that is, HIV DNA (intact and defective) and the transcriptionally active and inducible reservoir. Although small, the study cohort was relevant and well characterized, as it included individuals who initiated ART during acute infection, 12 of whom were followed longitudinally for 3 years, providing unique insights into the beneficial effects of early treatment on both immune responses and the viral reservoir. The study uses advanced methodology. I enjoyed reading the paper.

      Weaknesses: 

      All participants were male (acknowledged by the authors), potentially reducing the generalizability of the findings to broader populations. A control group receiving ART during chronic infection would have been an interesting comparison. 

      We thank the reviewer for their appreciation of our study. Although we had already acknowledged that all participants were male, we have now clarified why this is a limitation of the study (Discussion, lines 296-298). The reviewer raises the point that comparison with a control group would be useful. Unfortunately, these samples are not yet available, but our study protocol provides for a chronic-infection control group so that one can be included in the future.

      Reviewer #1 (Recommendations for the authors): 

      Minor: 

      On the introduction: 

      (1) One large topic that is almost entirely missing is the emerging evidence of selection on HIV proviruses during ART from the groups of Xu Yu and Matthias Lichterfeld, and Ya Chi Ho, among others.

      Previously, it was only touched upon in the Discussion. Now we have also included this in the Introduction (lines 77-80).

      (2) References 4 and 5 don't quite match with the statement here about reservoir seeding; we don't completely understand this process, and certainly, the tissue seeding aspect is not known. 

      Lines 61-62: the references were changed and this paragraph was rewritten for clarity.

      (3) Shelton et al. showed a strong relationship between HIV DNA size and timing of ART initiation across many studies. I believe Ananworanich also has several key papers on this topic.

      References by Ananworanich have been added (lines 91-94).

      (4) "the viral levels decline within weeks of AHI": this is imprecise; there is a peak, a decline, and an equilibrium.

      We agree and have rewritten the paragraph accordingly.

      (5) The impact of CD8 cells on viral evolution during primary infection is complex and likely not relevant for this paper. 

      We have left viral evolution out of the introduction in order to keep a focus on the current subject.

      (6) The term "reservoir" is somewhat polarizing, so it might be worth stating exactly what you consider the reservoir to be. As written, is your definition any HIV DNA in a person on ART?

      Indeed, we use "reservoir" to refer to the several aspects of the reservoir that we quantified with our assays (total HIV DNA, unspliced RNA, intact and defective proviral DNA, and replication-competent virus). In most instances we specify which measurement we are referring to. We have added an explanation of this definition to the Introduction (lines 55-58).

      (7) I think US might be used before it is defined. 

      We thank the reviewer for pointing this out; we now also define it in the Results section (line 131).

      (8) In Figure 1 it's also not clear how statistics were done to deal with undetectable values, which can be tricky but important. 

      We have now clarified this in the legend to Figure 2 (former Figure 1). Paired Wilcoxon tests were performed to test the significance of the differences between the time points. Pairs where both values were undetectable were always excluded from the analysis. Pairs where one value was undetectable and its detection limit was higher than the value of the detectable partner, were also excluded from the analysis. Pairs where one value was undetectable and its detection limit was lower than the value of the detectable partner, were retained in the analysis.
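      The exclusion rules for undetectable values can be written as a small filter applied before the paired test. In this hypothetical sketch, an undetectable measurement carries its detection limit as its value. The legend does not state which value enters the test for retained pairs, so substituting the detection limit is an assumption, as is dropping a pair whose detection limit exactly equals the partner value. The logic behind the rules: if the limit is below the detectable partner, the true value is also below it, so the sign of the paired difference is known; if the limit is above the partner, the sign is unknown.

```python
from typing import List, NamedTuple, Tuple


class Measurement(NamedTuple):
    value: float    # measured value, or the detection limit if undetectable
    detected: bool  # False when the sample was below the detection limit


def filter_pairs(pairs: List[Tuple[Measurement, Measurement]]) -> List[Tuple[float, float]]:
    """Apply the exclusion rules for undetectable values (hypothetical helper)."""
    kept = []
    for week24, week156 in pairs:
        if week24.detected and week156.detected:
            kept.append((week24.value, week156.value))
        elif not week24.detected and not week156.detected:
            continue  # both undetectable: always excluded
        else:
            undetected, detected = (week24, week156) if not week24.detected else (week156, week24)
            if undetected.value < detected.value:
                # True value < limit < partner, so the direction of change is known.
                # Assumption: the detection limit is substituted for the undetectable value.
                kept.append((week24.value, week156.value))
            # Otherwise (limit >= partner) the direction is unknown and the pair is dropped;
            # treating an exact tie this way is an assumption.
    return kept
```

      The retained pairs could then be passed to a paired Wilcoxon signed-rank test, for example `scipy.stats.wilcoxon(x, y)`.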

      In the discussion: 

      (1) "This confirms that the existence of a replication-competent viral reservoir is linked to the presence of intact HIV DNA." I think this statement is indicative of many of the overinterpretations without statistical justification. There are 4 of 12 individuals with QVOA+ detectable proviruses, which means there are 8 without. What are their intact HIV DNA levels? 

      We thank the reviewer for the question that is raised here. We have now compared the intact DNA levels (measured by IPDA) between participants with positive vs. negative QVOA output, and observed a significant difference. We rephrased the wording as follows: “We compared the intact HIV DNA levels at the 24-week timepoint between the six participants, from whom we were able to isolate replicating virus, and the fourteen participants, from whom we could not. Participants with positive QVOA had significantly higher intact HIV DNA levels than those with negative QVOA (p=0.029, Mann-Whitney test; Suppl. Fig. 3). Five of six participants with positive QVOA had intact DNA levels above 100 copies/10⁶ PBMC, while thirteen of fourteen participants with negative QVOA had intact HIV DNA below 100 copies/10⁶ PBMC (p=0.0022, Fisher’s exact test). These findings indicate that recovery of replication-competent virus by QVOA is more likely in individuals with higher levels of intact HIV DNA in IPDA, reaffirming a link between the two measurements.”
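      As a check on the arithmetic, the 2×2 table implied by the counts above (5 of 6 QVOA-positive and 1 of 14 QVOA-negative participants above 100 copies/10⁶ PBMC) can be run through a two-sided Fisher's exact test. The sketch below implements the standard test from the hypergeometric distribution using only the standard library; the function name is hypothetical and this is not the authors' code.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Rows: QVOA positive (n=6), QVOA negative (n=14).
# Columns: intact HIV DNA above / below 100 copies per 10^6 PBMC.
p = fisher_exact_two_sided(5, 1, 1, 13)
print(round(p, 4))  # → 0.0022
```

      Only the observed table and the single more extreme one (all six QVOA-positive participants above the threshold) contribute, giving 85/38760, which rounds to the reported p=0.0022.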

      (2) "To determine whether early HIV-specific CD8+ T-cell responses at 24 weeks were predictive for the change in reservoir size". This is a fundamental miss on correlation vs causation... it could be the inverse. 

      We thank the reviewer for the remark. We calculated the change in reservoir size (the difference between the reservoir size at 24 weeks and at 156 weeks of ART) and analyzed whether the HIV-specific CD8+ T-cell response at 24 weeks of ART is predictive of this change. We do not think the relationship can be inverted, as it is chronological (CD8+ responses at week 24 predict the subsequent change in the reservoir).

      (3) "This may suggest that active viral replication drives the CD8+ T-cell response." I think to be precise, you mean viral transcription drives CD8s, we don't know about the full replication cycle from these data. 

      We agree with the reviewer and have changed “replication” to “transcription” (line 280).

      (4) "Remarkably, we observed that the defective HIV DNA levels declined significantly between 24 weeks and 3 years on ART. This is in contrast to previous observations in chronic HIV infection (30)". I don't find this remarkable or in contrast: many studies have analyzed and/or modeled defective HIV DNA decay, most of which have shown some negative slope for defective HIV DNA, especially within the first year of ART. See White et al., Blankson et al., Golob et al., Besson et al., etc. In addition, do you mean in long-term suppressed individuals?

      Our point is that we found a significant, prominent decrease in defective DNA (but not in intact DNA) over the course of 3 years, in contrast to other studies, where the decrease in intact DNA is usually significant and the decrease in defective DNA less prominent. We have rephrased the wording (lines 227-230) as follows:

      “We observed that the defective HIV DNA levels decreased significantly between 24 and 156 weeks of ART. This differs from studies in CHI, which observed either no significant decrease during the first 7 years of ART (Peluso, Bacchetti et al. 2020, Gandhi, Cyktor et al. 2021), or a significant decrease only during the first 8 weeks on ART but not in the 8 years thereafter (Nühn, Bosman et al. 2025).”

      Reviewer #2 (Recommendations for the authors): 

      (1) Page 4, paragraph 2: it would be informative to report the statistics here.

      (2) Page 4, paragraph 4 - "General phenotyping of CD4+ (Suppl. Fig. 3A) and CD8+ (Supplementary Figure 3B) T-cells showed no difference in frequencies of naïve, memory or effector CD8+ T-cells between 24 and 156 weeks." - What did the CD4+ phenotyping show? 

      We thank the reviewer for the remark. Indeed, there were also no differences in frequencies of naïve, memory or effector CD4+ T-cells between 24 and 156 weeks. We have added this to the paragraph (now Suppl. Fig 4), lines 166-168.

      (3) Page 5, paragraph 3 - "Similarly, a broad HIV-specific CD8+ T-cell proliferative response to at least three different viral proteins was observed in the majority of individuals at both time points" - should specify n=? for the majority of individuals. 

      At time point 24 weeks, 6/11 individuals had a response to env, 10/11 to gag, 5/11 to nef, and 4/11 to pol. At 156 weeks, 8/11 to env, 10/11 to gag, 8/11 to nef and 9/11 to pol. We have added this to the text (lines 188-191).

      (4) Seven of 22 participants had non-subtype B infection. Can the authors explain the use of the IPDA designed by Bruner et al. for subtype B HIV, and how this may have affected the quantification in these participants?

      Intact HIV DNA was detectable in all 22 participants. We cannot completely exclude an influence of primer/probe-template mismatches on the quantification results; however, such mismatches could also have occurred in subtype B participants, and droplet digital PCR, on which IPDA is based, is generally much less sensitive to these mismatches than qPCR.

      (5) Page 7, paragraph 2 - the authors report a difference in findings from a previous study ("a decline in CD8 T cell responses over 2 years" - reference 21), but only provide an explanation for this on page 9. The authors should consider moving the explanation to this paragraph for easier understanding. 

      We agree with the reviewer that this causes confusion. Therefore, we have revised and changed the order in the Discussion.

      (6) Page 7, paragraph 2 - Following from above, the previous study (21) reported this contradicting finding "a decline in CD8 T cell responses over 2 years" in a CHI (chronic HIV) treated cohort. The current study was in an acute HIV treated cohort. The authors should explain whether this may also have resulted in the different findings, in addition to the use of different readouts in each study.

      We thank the reviewer for this careful observation. Indeed, the study by Takata et al. investigates the reservoir and HIV-specific CD8+ T-cell responses both in the RV254/SEARCH010 cohort, whose participants initiated ART during AHI, and in the RV304/SEARCH013 cohort, whose participants initiated ART during CHI. We had not realized that the decline in CD8 T cell responses was found only in RV304/SEARCH013 (the CHI cohort). In the AHI cohort, functional HIV-specific immune responses appear to have been measured only at 96 weeks, and we have clarified this in the Discussion.

      Besson, G. J., C. M. Lalama, R. J. Bosch, R. T. Gandhi, M. A. Bedison, E. Aga, S. A. Riddler, D. K. McMahon, F. Hong and J. W. Mellors (2014). "HIV-1 DNA decay dynamics in blood during more than a decade of suppressive antiretroviral therapy." Clin Infect Dis 59(9): 1312-1321.

      Gandhi, R. T., J. C. Cyktor, R. J. Bosch, H. Mar, G. M. Laird, A. Martin, A. C. Collier, S. A. Riddler, B. J. Macatangay, C. R. Rinaldo, J. J. Eron, J. D. Siliciano, D. K. McMahon and J. W. Mellors (2021). "Selective Decay of Intact HIV-1 Proviral DNA on Antiretroviral Therapy." J Infect Dis 223(2): 225-233.

      Gandhi, R. T., D. K. McMahon, R. J. Bosch, C. M. Lalama, J. C. Cyktor, B. J. Macatangay, C. R. Rinaldo, S. A. Riddler, E. Hogg, C. Godfrey, A. C. Collier, J. J. Eron and J. W. Mellors (2017). "Levels of HIV-1 persistence on antiretroviral therapy are not associated with markers of inflammation or activation." PLoS Pathog 13(4): e1006285.

      Nühn, M. M., K. Bosman, T. Huisman, W. H. A. Staring, L. Gharu, D. De Jong, T. M. De Kort, N. Buchholtz, K. Tesselaar, A. Pandit, J. Arends, S. A. Otto, E. Lucio De Esesarte, A. I. M. Hoepelman, R. J. De Boer, J. Symons, J. A. M. Borghans, A. M. J. Wensing and M. Nijhuis (2025). "Selective decline of intact HIV reservoirs during the first decade of ART followed by stabilization in memory T cell subsets." AIDS 39(7): 798-811.

      Peluso, M. J., P. Bacchetti, K. D. Ritter, S. Beg, J. Lai, J. N. Martin, P. W. Hunt, T. J. Henrich, J. D. Siliciano, R. F. Siliciano, G. M. Laird and S. G. Deeks (2020). "Differential decay of intact and defective proviral DNA in HIV-1-infected individuals on suppressive antiretroviral therapy." JCI Insight 5(4).

    1. eLife Assessment

      This manuscript provides evidence that mouse germline cysts develop an asymmetric Golgi, ER, and microtubule-associated structure, referred to as Visham, which in many ways resembles the fusome of Drosophila germline cysts. This is an important study that provides new evidence that fusome-like structures exist in germ cell cysts across species. While most of the data are solid, several instances remain in which conclusions regarding the dynamics and function of Visham should be restated, or additional experimental evidence should be provided to more fully support the authors' interpretations.

    2. Reviewer #1 (Public review):

      Summary:

      The authors attempt to study how oocyte incomplete cytokinesis occurs in the mouse ovary.

      Strengths:

      The finding that UPR components are highly expressed during zygotene is an interesting result that has broad implications for how germ cells navigate meiosis. The findings that proteasome activity increases in germ cells compared to somatic cells suggest that the germline might have a quantitatively different response for protein clearance.

      Weaknesses:

      (1) The microscopy images look saturated (e.g., Figure 1a, b). Is this a normal way to present fluorescence microscopy?

      (2) The authors should ensure that all claims of enrichment, or of higher versus lower values, are supported by indicated statistical tests.

      (a) In Figure 2f, the authors should indicate which comparison is made for this test. Is it comparing 2 vs 6 cyst numbers?

      (b) Figures 4d and 4e do not have a statistical test indicated.

      (3) Because the system is developmentally dynamic, the major conclusions of the work are somewhat unclear. Could the authors be more explicit about these and enumerate them more clearly in the abstract?

      (4) The references for specific prior literature are mostly missing (lines 184-195, for example).

      (5) The authors should define all acronyms when they are first used in the text (UPR, EGAD, etc.).

      (6) The jumping between topics (EMA, microtubule fragmentation, polarization proteins, UPR/ERAD/EGAD, GCNA, ER, Balbiani body, etc.) makes the narrative of the paper very difficult to follow.

      (7) The heading title "Visham participates in organelle rejuvenation during meiosis" in line 241 is speculative and/or not supported. Drawing upon the extensive, highly rigorous Drosophila literature, it is safe to extrapolate, but the claim about regeneration is not adequately supported.

    3. Reviewer #2 (Public review):

      This study identifies Visham, an asymmetric structure in developing mouse cysts resembling the Drosophila fusome, an organelle crucial for oocyte determination. Using immunofluorescence, electron microscopy, 3D reconstruction, and lineage labeling, the authors show that primordial germ cells (PGCs) and cysts, but not somatic cells, contain an EMA-rich, branching structure that they named Visham, which remains unbranched in male cysts. Visham accumulates in regions enriched in intercellular bridges, forming clusters reminiscent of fusome "rosettes." It is enriched in Golgi and endosomal vesicles and partially overlaps with the ER. During cell division, Visham localizes near centrosomes in interphase and early metaphase, disperses during metaphase, and reassembles at spindle poles during telophase before becoming asymmetric. Microtubule depolymerization disrupts its formation.

      Cyst fragmentation is shown to be non-random, correlating with microtubule gaps. The authors propose that 8-cell (or larger) cysts fragment into 6-cell and 2-cell cysts. Analysis of Pard3 (the mouse ortholog of Par3/Baz) reveals its colocalization with Visham during cyst asymmetry, suggesting that mammalian oocyte polarization depends on a conserved system involving Par genes, cyst formation, and a fusome-like structure.

      Transcriptomic profiling identifies genes linked to pluripotency and the unfolded protein response (UPR) during cyst formation and meiosis, supported by protein-level reporters monitoring Xbp1 splicing and 20S proteasome activity. Visham persists in meiotic germ cells at stage E17.5 and is later transferred to the oocyte at E18.5 along with mitochondria and Golgi vesicles, implicating it in organelle rejuvenation. In Dazl mutants, cysts form, but Visham dynamics, polarity, rejuvenation, and oocyte production are disrupted, highlighting its potential role in germ cell development.

      Overall, this is an interesting and comprehensive study of a conserved structure in the germline cells of both invertebrate and vertebrate species. Investigating these early stages of germ cell development in mice is particularly challenging. Although primarily descriptive, the study represents a remarkable technical achievement. The images are generally convincing, with only a few exceptions.

      Major comments:

      (1) Some titles contain strong terms that do not fully match the conclusions of the corresponding sections.

      (1a) Article title "Mouse germline cysts contain a fusome-like structure that mediates oocyte development":

      The term "mediates" could be misleading, as the functional data on Visham (comparing its absence to wild type) actually reflect either a microtubule defect or a Dazl mutant context. There is no loss-of-function experiment specific to Visham.

      (1b) Result title, "Visham overlaps centrosomes and moves on microtubules":

      The term "moves" implies dynamic behavior, which would require live imaging data that are not described in the article.

      (1c) Result title, "Visham associates with Golgi genes involved in UPR beginning at the onset of cyst formation":

      The presented data show that the presence of Visham in the cyst coincides temporally with the expression and activity of the UPR response; the term "associates" is unclear in this context.

      (1d) Result title, "Visham participates in organelle rejuvenation during meiosis":

      The term "participates" suggests that Visham is required for this process, whereas the conclusion is actually drawn from the Dazl mutant context, not from a loss-of-function experiment specific to Visham.

      (2) The authors aim to demonstrate that Visham is a fusome-like structure. I would suggest simply referring to it as a "fusome-like structure" rather than introducing a new term, which may confuse readers and does not necessarily help the authors' goal of showing the conservation of this structure in Drosophila and Xenopus germ cells. Interestingly, in a preprint from the same laboratory describing a similar structure in Xenopus germ cells, the authors refer to it as a "fusome-like structure (FLS)" (Davidian and Spradling, BioRxiv, 2025).

    4. Reviewer #3 (Public review):

      This manuscript provides evidence that mice have a fusome, a conserved structure best studied in Drosophila that is important for oocyte specification. Overall, a wide range of evidence is presented demonstrating the existence of a mouse fusome, which the authors term Visham. This work is important as it addresses a long-standing question in the field of whether mice have fusomes and sheds light on how oocytes are specified in mammals. The concerns that need to be addressed, listed below, involve several conclusions that are overstated or unclear.

      (1) Line 86 - the heading for this section is "PGCs contain a Golgi-rich structure known as the EMA granule" but there is nothing in this section that shows it is Golgi-rich. It does show that the structure is asymmetric and has branches.

      (2) Lines 105-106: how do we know whether what is seen by EM corresponds to the EMA1 granule?

      (3) Lines 106-107 state "Visham co-stained with the Golgi protein Gm130 and the recycling endosomal protein Rab11a1". This is not convincing, as there is only one example of each image, and both appear to be distorted.

      (4) Lines 132-133: while Visham formation is disrupted when microtubules are disrupted, I am not convinced that Visham moves on microtubules as stated in the heading of this section.

      (5) Line 156 - the heading for this section states that Visham associates with polarity and microtubule genes, including pard3, but only evidence for pard3 is presented.

      (6) Lines 196-210 - it's strange to say that UPR genes depend on DAZ, as they are upregulated in the mutants. I think there are important observations here, but it's unclear what is being concluded.

      (7) Lines 257-259: wave 1 and wave 2 follicles need to be explained in the introduction, and how they fit with the observations here should be clarified.

    5. Author response:

      Reviewer #1 (Public Review):

      Summary

      We thank the reviewer for the constructive and thoughtful evaluation of our work. We appreciate the recognition of the novelty and potential implications of our findings regarding UPR activation and proteasome activity in germ cells.

      (1) The microscopy images look saturated, for example, Figure 1a, b, etc. Is this a normal way to present fluorescent microscopy?

      The apparent saturation was not present in the original images but likely arose from image compression during PDF generation. Although the EMA granule remained apparent, in the revised submission we will provide high-resolution TIFF files to ensure accurate representation of fluorescence intensity, and we will carefully optimize image display settings to avoid any saturation artifacts.

      (2) The authors should ensure that all claims of enrichment, or of higher versus lower values, are supported by indicated statistical tests.

      We fully agree. In the revised version, for any quantitative comparison where a statistical test was not already indicated, we will state the test used and report p-values in the figure legends and text.

      (a) In Figure 2f, the authors should indicate which comparison is made for this test. Is it comparing 2 vs. 6 cyst numbers?

      We acknowledge that the description was not sufficiently detailed. The test did not compare 2-cell vs 6-cell cyst numbers; rather, it compared the observed outcomes with all the ways in which the 8-cell (or larger) cysts studied could fragment randomly into two pieces, asking how likely it is that 6-cell cysts would arise by chance in 13 of 15 observed examples. We will expand the legend and main text to clarify that a binomial test was used and showed that the proportion of cysts producing 6-cell fragments differed very significantly from chance.

      Revised text:

      “A binomial test was used to assess whether the observed frequency of 6-cell cyst products differed from random cyst breakage. Production of 6-cell cysts was strongly preferred (13/15 cysts; ****p < 0.0001).”
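      The binomial test described above can be sketched as an exact one-sided tail probability. The response does not state the null probability p0 that a random break yields a 6-cell fragment (it depends on cyst topology and on which cysts were included), nor whether the test was one- or two-sided, so both the one-sided form and the value of p0 below are hypothetical placeholders, not the values used in the study.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p0: float) -> float:
    """One-sided exact binomial p-value: P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


# 13 of 15 observed fragmentations produced a 6-cell cyst.
# p0 is a hypothetical placeholder, NOT the null probability used in the study.
p0 = 0.5
print(binomial_upper_tail(13, 15, p0))  # → 0.003692626953125
```

      The result is sensitive to p0: with the placeholder p0 = 0.5, 13 of 15 gives about 0.0037, so the reported p < 0.0001 rests on the specific null probability derived from the possible break points of the cysts.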

      (b) Figures 4d and 4e do not have a statistical test indicated.

      We will include the specific statistical test used and report the corresponding p-values directly in the figure legends.

      (3) Because the system is developmentally dynamic, the major conclusions of the work are somewhat unclear. Could the authors be more explicit about these and enumerate them more clearly in the abstract?

      We will revise the abstract to better clarify the findings of this study. We will also replace the term Visham with mouse fusome to reflect its functional and structural analogy to the Drosophila and Xenopus fusomes, making the narrative more coherent and conclusive.

      (4) The references for specific prior literature are mostly missing (lines 184-195, for example).

      We appreciate this observation; the references were inadvertently lost when an earlier version was shortened. We will add 3–4 relevant references to appropriately support this section.

      (5) The authors should define all acronyms when they are first used in the text (UPR, EGAD, etc).

      We will ensure that all acronyms are spelled out at first mention (e.g., Unfolded Protein Response (UPR), Endosome and Golgi-Associated Degradation (EGAD)).

      (6)  The jumping between topics (EMA, into microtubule fragmentation, polarization proteins, UPR/ERAD/EGAD, GCNA, ER, balbiani body, etc) makes the narrative of the paper very difficult to follow.

      We are not jumping between topics but following a narrative relevant to the central question of whether female mouse germ cells develop using a fusome. EMA, microtubule fragmentation, polarization proteins, ER, and Balbiani body are all topics with a known connection to fusomes. This is explained in the general introduction and in the relevant subsections. We appreciate the feedback that further explanation of these connections would be helpful. In the revised manuscript, use of the unified term mouse fusome will also help connect the narrative across sections. UPR/ERAD/EGAD are processes that have been studied in the repair and maintenance of somatic cells and in yeast meiosis. We show that the major regulator Xbp1 is found in the fusome, and that the fusome and these rejuvenation pathway genes are expressed and maintained throughout oogenesis, rather than only during limited late stages as suggested in previous literature.

      (7) The heading title "Visham participates in organelle rejuvenation during meiosis" in line 241 is speculative and/or not supported. Drawing upon the extensive, highly rigorous Drosophila literature, it is safe to extrapolate, but the claim about regeneration is not adequately supported.

      We believe this statement is accurate given the broad scope of the term "participates." It is supported by the localization of the UPR regulator Xbp1 to the fusome. Xbp1 is the ortholog of Hac1, a key gene mediating UPR-mediated rejuvenation during yeast meiosis. We also showed that rejuvenation pathway genes are expressed throughout most of meiosis (not previously known), and we expanded the cytological evidence of stage-specific organelle rejuvenation later in meiosis, such as mitochondrial-ER docking, in regions enriched in fusome antigens. However, we recognize the current limitations of this evidence in the mouse and want to convey them appropriately, without going to what we believe would be the unjustified extreme of saying there is no evidence.

      Reviewer #2 (Public Review):

      We thank the reviewer for the comprehensive summary and for highlighting both the technical achievement and biological relevance of our study. We greatly appreciate the thoughtful suggestions that have helped us refine our presentation and terminology.

      (1) Some titles contain strong terms that do not fully match the conclusions of the corresponding sections.

      (1a) Article title “Mouse germline cysts contain a fusome-like structure that mediates oocyte development”

      We will change the title to: “Mouse germline cysts contain a fusome that supports germline cyst polarity and rejuvenation.”

      (1b) Result title “Visham overlaps centrosomes and moves on microtubules”

      We acknowledge that “moves” implies dynamics. We will include additional supplementary images showing small vesicular components of the mouse fusome on spindle-derived microtubule tracks.

      (1c) Result title “Visham associates with Golgi genes involved in UPR beginning at the onset of cyst formation”

      We will revise this title to: “The mouse fusome associates with the UPR regulatory protein Xbp1 beginning at the onset of cyst formation” to reflect the specific UPR protein that was immunolocalized. 

      (1d) Result title “Visham participates in organelle rejuvenation during meiosis”

      We will revise this to: “The mouse fusome persists during organelle rejuvenation in meiosis.”

      (2) The authors aim to demonstrate that Visham is a fusome-like structure. I would suggest simply referring to it as a "fusome-like structure" rather than introducing a new term, which may confuse readers and does not necessarily help the authors' goal of showing the conservation of this structure in Drosophila and Xenopus germ cells. Interestingly, in a preprint from the same laboratory describing a similar structure in Xenopus germ cells, the authors refer to it as a "fusome-like structure (FLS)" (Davidian and Spradling, BioRxiv, 2025).

      We appreciate the reviewer’s insightful comment. To maintain conceptual clarity and align with existing literature, we will refer to the structure as the mouse fusome throughout the manuscript, avoiding introduction of a new term.

      Reviewer #3 (Public Review):

      We thank the reviewer for emphasizing the importance of our study and for providing constructive feedback that will help us clarify and strengthen our conclusions.

      (1) Line 86 - the heading for this section is "PGCs contain a Golgi-rich structure known as the EMA granule" 

      We agree that Golgi enrichment in the EMA granule of PGCs was not shown until the next section. We will revise this heading to:

      “PGCs contain an asymmetric EMA granule.”

      (2)  Line 105-106, how do we know if what's seen by EM corresponds to the EMA1 granule?

      We will clarify that this identification is based on co-localization with Golgi markers (GM130 and GS28) and on the response to Brefeldin A treatment, which will be included as supplementary data. These findings indicate that the mouse fusome is Golgi-derived and can therefore be visualized by EM. The Golgi regions in E13.5 cyst cells move close together and associate with ring canals as visualized by EM (Figure 1E), the same as the mouse fusomes identified by EMA.

      (3) Line 106-107-states "Visham co-stained with the Golgi protein Gm130 and the recycling endosomal protein Rab11a1". This is not convincing as there is only one example of each image, and both appear to be distorted.

      Space is at a premium in these figures, but we have ample data documenting this clear co-localization. We will replace the existing images with high-resolution, non-compressed versions in the final figures to clearly illustrate the co-staining patterns for GM130 and Rab11a1.

      (4) Line 132-133---while visham formation is disrupted when microtubules are disrupted, I am not convinced that visham moves on microtubules as stated in the heading of this section.

      We will include additional supplementary data showing small mouse fusome vesicles aligned along microtubules.

      (5) Line 156 - the heading for this section states that Visham associates with polarity and microtubule genes, including pard3, but only evidence for pard3 is presented.

      We agree and will revise the heading to: “Mouse fusome associates with the polarity protein Pard3.” We are adding data showing the association of small fusome vesicles with microtubules.

      (6)  Lines 196-210 - it's strange to say that UPR genes depend on DAZ, as they are upregulated in the mutants. I think there are important observations here, but it's unclear what is being concluded.

      UPR genes are not upregulated in Dazl mutants, in the sense that we have never documented their expression increasing. We show that during this period UPR genes behave like pluripotency genes and normally decline, but in Dazl mutants their decline is slowed. We will rephrase the paragraph to clarify that Dazl mutation partially decouples developmental processes that are normally linked, which alters UPR gene expression relative to cyst development.

      (7) Lines 257-259: wave 1 and 2 follicles need to be explained in the introduction, and how these fit with the observations here should be clarified.

      Follicle waves are too small a focus of the current study to explain in the introduction, but we will direct readers to the cited literature (Yin and Spradling, 2025) for further details.

      We sincerely thank all reviewers for their insightful and constructive feedback. We believe that the planned revisions—particularly the refined terminology, improved image quality, clarified statistics, and restructured abstract—will substantially strengthen the manuscript and enhance clarity for readers.

    1. eLife Assessment

      The authors combine experiments and mathematical modeling to determine how the infectivity of human cytomegalovirus scales with the viral concentration in the inoculum, i.e., considering the multiplicity of infection (MOI). They propose and test different model assumptions to explain a mechanism termed "apparent cooperativity" of virions based on an observed super-linear increase in the number of infected cells with increasing inocula. The authors present a solid study showing valuable findings for virologists and quantitative scientists working on the analysis and interpretation of viral infection dynamics. Some of the presented aspects would benefit from additional clarification.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, the authors conduct both experiments and modeling of human cytomegalovirus (HCMV) infection in vitro to study how the infectivity of the virus (measured by cell infection) scales with the viral concentration in the inoculum. A naïve thought would be that this is linear in the sense that doubling the virus concentration (and thus the total virus) in the inoculum would lead to doubling the fraction of infected cells. However, the authors show convincingly that this is not the case for HCMV, using multiple strains, two different target cells, and repeated experiments. In fact, they find that for some regimens (inoculum concentration), infected cells increase faster than the concentration of the inoculum, which they term "apparent cooperativity". The authors then provided possible explanations for this phenomenon and constructed mathematical models and simulations to implement these explanations. They show that these ideas do help explain the cooperativity, but they can't be conclusive as to what the correct explanation is. In any case, this advances our knowledge of the system, and it is very important when quantitative experiments involving MOI are performed.

      Strengths:

      Careful experiments using state-of-the-art methodologies and advancing multiple competing models to explain the data.

      Weaknesses:

      There are minor weaknesses in explaining the implementation of the model. However, some specific assumptions, which to this reviewer were unclear, could have a substantial impact on the results. For example, whether cell infection is independent or not. This is expanded below.

      Suggestions to clarify the study:

      (1) Mathematically, it is clear what "increase linearly" or "increase faster than linearly" (e.g., line 94) means. However, it may be confusing for some readers to then look at plots such as in Figure 2, which appear linear (but on the log-log scale) and about which the authors also say (line 326) "data best matching the linear relationship on a log-log scale".

      (2) One of the main issues that is unclear to me is whether the authors assume that cell infection is independent of other cells. This could be a very important issue affecting their results, both when analyzing the experimental data and running the simulations. One possible outcome of infection could be the generation of innate mediators that could protect (alter the resistance) of nearby cells. I can imagine two opposite results of this: i) one possibility is that resistance would lead to lower infection frequencies and this would result in apparent sub-linear infection (contrary to the observations); or ii) inoculums with more virus lead to faster infection, which doesn't allow enough time for the "resistance" (innate effect) to spread (potentially leading to results similar to the observations, supra-linear infection).

      (3) Another unclear aspect of cell infection is whether each cell only has one chance to be infected or multiple chances, i.e., do the authors run the simulation once over all the cells or more times?

      (4) On the other hand, the authors address the complementary issue of the virus acting independently or not, with their clumping model (which includes nice experimental measurements). However, it was unclear to me what the assumption of the simulation is in this case. In the case of infection by a clump of virus or "viral compensation", when infection is successful (the cell becomes infected), how many viruses "disappear" and what happens to the rest? For example, one of the viruses of the clump is removed by infection, but the others are free to participate in another clump, or they also disappear. The only thing I found about this is the caption of Figure S10, and it seems to indicate that only the infected virus is removed. However, a typical assumption, I think, is that viruses aggregate to improve infection, but then the whole aggregate participates in infection of a single cell, and those viruses in the clump can't participate in other infections. Viral cooperativity with higher inocula in this case would be, perhaps, the result of larger numbers of clumps for higher inocula. This seems in agreement with Figure S8, but was a little unclear in the interpretation provided.

      (5) In algorithm 1, how does P_i, as defined, relate to equation 1?

      (6) In line 228, and several other places (e.g., caption of Table S2), the authors refer to the probability of a single genome infecting a cell p(1)=exp(-lambda), but shouldn't it be p(1)=1-exp(-lambda) according to equation 1?

      (7) In line 304, the accrued damage hypothesis is defined, but it is stated as a triggering of an antiviral response; one would assume that exposure to a virion should increase the resistance to infection. Otherwise, the authors are saying that evolution has come up with intracellular viral resistance mechanisms that are detrimental to the cell. As I mentioned above, this could also be a mechanism for non-independent cell infection. For example, infected cells signal to neighboring cells to "become resistant" to infection. This would also provide a mechanism for saturation at high levels.

      (8) In Figure 3, and likely other places, t-tests are used for comparisons, but with only an n=5 (experiments). Many would prefer a non-parametric test.

    3. Reviewer #2 (Public review):

      In their article, Peterson et al. wanted to show to what extent the classical "single hit" model of virion infection, where one virion is required to infect a cell, does not match empirical observations based on human cytomegalovirus in vitro infection model, and how this would have practical impacts in experimental protocols.

      They first used a very simple experimental assay, where they infected cells with serially diluted virions and measured the proportion of infected cells with flow cytometry. From this, they could elegantly show how the proportion of infected cells differed from a "single hit" model, which they simulated using a simple mathematical model ("powerlaw model"), and better fit a model where virions need to cooperate to infect cells. They then explore which mechanism could explain this apparent cooperation:

      (1) Stochasticity alone cannot explain the results, although I am unsure how generalizable the results are, because the mathematical model chosen cannot, by design, explain such observations only by stochasticity.

      (2) Virion clumping seemed not to be enough either to generally explain such a pattern. For that, they first use a mathematical model showing that the apparent cooperation would be small. However, I am unsure how extreme the scenario of simulated virion clumping is. They then used dynamic light scattering to measure the distribution of the sizes of clumps. From these estimates, they show that virion clumps cannot reproduce the observed virion cooperation in serial dilution assays. However, the authors remain imprecise on how the uncertainty of these clumps' size distribution would impact the results, as most clumps have a size smaller than a single virion, leaving therefore a limited number of clumps truly containing virions.

      The two models remain unidentifiable from each other but could explain the apparent virion cooperativity: either due to an increase in susceptibility of the cell each time a virion tries to infect it, or due to viral compensation, where lesser fit viruses are able to infect cells in co-infection with a better fit virion. Unfortunately, the authors here do not attempt to fit their mathematical model to the experimental data but only show that theoretical models and experimental data generate similar patterns regarding virion apparent cooperation.

      Finally, the authors show that this virion cooperation could make the relationship between the estimated multiplicity of infection and viruses/cell deviate from the 1:1 relationship. Consequently, the dilution of a virion stock would lead to an even stronger decrease in infectivity, as more diluted virions can cooperate less for infection.

      Overall, this work is very valuable as it raises the general question of how the estimate of infectivity can be biased if extrapolated from a single virus titer assay. The observation that HCMV virions often cooperate and that this cooperation varies between contexts seems robust. The putative biological explanations would require further exploration.

      This topic is very well known in the case of segmented viruses and the semi-infectious particles, leading to the idea of studying "sociovirology", but to my knowledge, this is the first time that it was explored for a nonsegmented virus, and in the context of MOI estimation.

    4. Reviewer #3 (Public review):

      Summary:

      The authors dilute fluorescent HCMV stocks in small steps (df ≈ 1.3-1.5) across 23 points, quantify infections by flow cytometry at 3 dpi, and fit a power-law model to estimate a cooperativity parameter n (n > 1 indicates apparent cooperativity). They compare fibroblasts vs epithelial cells and multiple strains/reporters, and explore alternative mechanisms (clumping, accrued damage, viral compensation) via analytical modeling and stochastic simulations. They discuss implications for titer/MOI estimation and suggest a method for detecting "apparent cooperativity," noting that for viruses showing this behavior, MOI estimation may be biased.

      Strengths:

      (1) High-resolution titration & rigor: The small-step dilution design (23 serial dilutions; tailored df) improves dose-response resolution beyond conventional 10× series.

      (2) Clear quantitative signal: Multiple strain-cell pairs show n > 1, with appropriate model fitting and visualization of the linear regime on log-log axes.

      (3) Mechanistic exploration: Side-by-side modeling of clumping vs accrued damage vs compensation frames testable hypotheses for cooperativity.

      Weaknesses:

      (1) Secondary infection control: The authors argue that 3 dpi largely avoids progeny-mediated secondary infection; this claim should be strengthened (e.g., entry inhibitors/control infections) or add sensitivity checks showing results are robust to a small secondary-infection contribution.

      (2) Discriminating mechanisms: At present, simulations cannot distinguish between accrued damage and viral compensation. The authors should propose or add a decisive experiment (e.g., dual-color coinfection to quantify true coinfection rates versus "priming" without coinfection; timed sequential inocula) and outline expected signatures for each mechanism.

      (3) Decline at high genomes/cell: Several datasets show a downturn at high input. Hypotheses should be provided (cytotoxicity, receptor depletion, and measurement ceiling) and any supportive controls.

      (4) Include experimental data: In Figure 6, please include the experimentally measured titers (IU/mL), if available.

      (5) MOI guidance: The practical guidance is important; please add a short "best-practice box" (how to determine titer at multiple genomes/cell and cell densities; when single-hit assumptions fail) for end-users.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this paper, the authors conduct both experiments and modeling of human cytomegalovirus (HCMV) infection in vitro to study how the infectivity of the virus (measured by cell infection) scales with the viral concentration in the inoculum. A naïve thought would be that this is linear in the sense that doubling the virus concentration (and thus the total virus) in the inoculum would lead to doubling the fraction of infected cells. However, the authors show convincingly that this is not the case for HCMV, using multiple strains, two different target cells, and repeated experiments. In fact, they find that for some regimens (inoculum concentration), infected cells increase faster than the concentration of the inoculum, which they term "apparent cooperativity". The authors then provided possible explanations for this phenomenon and constructed mathematical models and simulations to implement these explanations. They show that these ideas do help explain the cooperativity, but they can't be conclusive as to what the correct explanation is. In any case, this advances our knowledge of the system, and it is very important when quantitative experiments involving MOI are performed.

      Strengths:

      Careful experiments using state-of-the-art methodologies and advancing multiple competing models to explain the data.

      Weaknesses:

      There are minor weaknesses in explaining the implementation of the model. However, some specific assumptions, which to this reviewer were unclear, could have a substantial impact on the results. For example, whether cell infection is independent or not. This is expanded below.

      Suggestions to clarify the study:

      (1) Mathematically, it is clear what "increase linearly" or "increase faster than linearly" (e.g., line 94) means. However, it may be confusing for some readers to then look at plots such as in Figure 2, which appear linear (but on the log-log scale) and about which the authors also say (line 326) "data best matching the linear relationship on a log-log scale". 

      This is a good point. In our revision, we will clarify that a relationship that is linear on the log-log scale does not imply a linear relationship on the linear-linear scale.
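      The distinction can be illustrated with a minimal Python sketch. It assumes that at low doses the fraction of infected cells behaves as a pure power law y = a * x^n in genomes per cell x; the values of a, n, and the dose grid are hypothetical and not fitted to the data. Such a curve is a straight line of slope n on log-log axes for any n, but doubling the dose doubles the infected fraction only when n = 1.

```python
import math


def power_law(x: float, a: float, n: float) -> float:
    """Low-dose approximation: fraction infected y = a * x**n at x genomes/cell."""
    return a * x**n


def loglog_slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of log(y) versus log(x)."""
    lx = [math.log(v) for v in xs]
    ly = [math.log(v) for v in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((u - mx) * (w - my) for u, w in zip(lx, ly))
    den = sum((u - mx) ** 2 for u in lx)
    return num / den


# Hypothetical dose grid: 10 steps of a 1.4-fold dilution series.
doses = [0.01 * 1.4**i for i in range(10)]

for n in (1.0, 1.5):
    ys = [power_law(x, a=0.05, n=n) for x in doses]
    slope = loglog_slope(doses, ys)  # straight line on log-log axes for any n
    fold = power_law(2.0, 0.05, n) / power_law(1.0, 0.05, n)  # equals 2**n
    print(f"n = {n}: log-log slope = {slope:.3f}, fold change on doubling dose = {fold:.3f}")
```

      In this sketch the fitted log-log slope recovers n in both cases, while the fold change on doubling the dose is 2^n, which is 2 only for the single-hit case.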

      (2) One of the main issues that is unclear to me is whether the authors assume that cell infection is independent of other cells. This could be a very important issue affecting their results, both when analyzing the experimental data and running the simulations. One possible outcome of infection could be the generation of innate mediators that could protect (alter the resistance) of nearby cells. I can imagine two opposite results of this: i) one possibility is that resistance would lead to lower infection frequencies and this would result in apparent sub-linear infection (contrary to the observations); or ii) inoculums with more virus lead to faster infection, which doesn't allow enough time for the "resistance" (innate effect) to spread (potentially leading to results similar to the observations, supra-linear infection). 

      In our models we assumed cells to be independent of each other (see also responses to other similar points). Because we measure infection in individual cells, assuming cells are independent is a reasonable first approximation. However, the reviewer makes an excellent point that there may be some between-cell signaling in the culture that “alerts” or “conditions” cells to change their “resistance”. It is also possible that at higher genome/cell numbers, exposure of cells to virions or virion debris may change the state of cells in the culture, so that more cells become “susceptible” to infection. We will list this point in the Limitations subsection of the Discussion; it is a good hypothesis to test in future experiments.

      (3) Another unclear aspect of cell infection is whether each cell only has one chance to be infected or multiple chances, i.e., do the authors run the simulation once over all the cells or more times? 

      Each cell has only one chance to be infected. This is stated in Algorithm 1, and we will add a sentence to the “Agent-based simulations” section to make it explicit.

      (4) On the other hand, the authors address the complementary issue of the virus acting independently or not, with their clumping model (which includes nice experimental measurements). However, it was unclear to me what the assumption of the simulation is in this case. In the case of infection by a clump of virus or "viral compensation", when infection is successful (the cell becomes infected), how many viruses "disappear" and what happens to the rest? For example, one of the viruses of the clump is removed by infection, but the others are free to participate in another clump, or they also disappear. The only thing I found about this is the caption of Figure S10, and it seems to indicate that only the infected virus is removed. However, a typical assumption, I think, is that viruses aggregate to improve infection, but then the whole aggregate participates in infection of a single cell, and those viruses in the clump can't participate in other infections. Viral cooperativity with higher inocula in this case would be, perhaps, the result of larger numbers of clumps for higher inocula. This seems in agreement with Figure S8, but was a little unclear in the interpretation provided. 

      This is a good point. We did not remove the clump if one of the virions in the clump manages to infect a cell, and indeed, this could be the reason why in some simulations we observe apparent cooperativity when modeling viral clumping. This is something we will explore in our revision.

      (5) In algorithm 1, how does P_i, as defined, relate to equation 1? 

      These are unrelated: eqn. (1) is a phenomenological model that links infection per cell to genomes per cell, whereas P_i in Algorithm 1 is a “physics-inspired” potential barrier.

      (6) In line 228, and several other places (e.g., caption of Table S2), the authors refer to the probability of a single genome infecting a cell p(1)=exp(-lambda), but shouldn't it be p(1)=1-exp(-lambda) according to equation 1?

      Indeed, it was a typo, p(1)=1-exp(-lambda) per eqn 1. Thank you, it will be corrected in the revised paper.
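      The corrected expression can be checked numerically. The sketch below assumes, for illustration only, that eqn 1 has the form f(x) = 1 - exp(-lambda * x^n), with x the number of genomes per cell; this assumed form gives p(1) = 1 - exp(-lambda) for any n. It also shows that at small x the infected fraction is approximately lambda * x^n, which is linear in x for the single-hit case n = 1.

```python
import math


def frac_infected(x: float, lam: float, n: float) -> float:
    """ASSUMED form of eqn (1): f(x) = 1 - exp(-lam * x**n), x = genomes per cell."""
    # expm1 keeps precision when lam * x**n is small.
    return -math.expm1(-lam * x**n)


lam, n = 0.02, 1.5  # hypothetical parameter values

# Probability that a single genome infects a cell: the exponent at x = 1 is -lam,
# independent of n, so p(1) = 1 - exp(-lam), not exp(-lam).
p1 = frac_infected(1.0, lam, n)
print(f"p(1) = {p1:.5f}; exp(-lam) = {math.exp(-lam):.5f}")

# Low-dose regime: f(x) is close to lam * x**n, i.e. linear in x when n = 1.
x_small = 1e-3
ratio = frac_infected(x_small, lam, 1.0) / (lam * x_small)
print(f"single-hit f(x)/(lam*x) at x = {x_small}: {ratio:.6f}")
```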

      (7) In line 304, the accrued damage hypothesis is defined, but it is stated as a triggering of an antiviral response; one would assume that exposure to a virion should increase the resistance to infection. Otherwise, the authors are saying that evolution has come up with intracellular viral resistance mechanisms that are detrimental to the cell. As I mentioned above, this could also be a mechanism for non-independent cell infection. For example, infected cells signal to neighboring cells to "become resistant" to infection. This would also provide a mechanism for saturation at high levels.

      We do not know how exposure of a cell to one virion would change its “antiviral state”, i.e., whether it becomes more or less resistant to the next infection. If a cell becomes more resistant, apparent cooperativity in the infection of cells cannot arise, so this hypothesis cannot explain our observations with n > 1. Whether this mechanism contributes to the saturation of the infected-cell fraction below 1 at large genome/cell numbers is unclear, but it is a possibility. We will add this point to the Discussion in the revision.

      (8) In Figure 3, and likely other places, t-tests are used for comparisons, but with only an n=5 (experiments). Many would prefer a non-parametric test. 

      We repeated the analyses in Fig 3 with the Mann-Whitney test; the results were the same, so we would prefer to keep the t-test results in the paper.
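      For context on this choice, a small standard-library sketch of an exact two-sided Mann-Whitney (Wilcoxon rank-sum) test follows. It assumes two independent groups of five experiments each, which may not match how the Figure 3 comparisons are structured, and the values are hypothetical. With 5 vs. 5 observations there are only C(10, 5) = 252 ways to assign the pooled values to the two groups, so the smallest attainable two-sided exact p-value is 2/252, about 0.0079, even for completely separated groups.

```python
from itertools import combinations


def mann_whitney_u(x, y) -> float:
    """U statistic for x: number of (xi, yj) pairs with xi > yj, ties counted as 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_two_sided_p(x, y) -> float:
    """Exact permutation p-value for U, enumerating every relabelling of the pooled data."""
    pooled = list(x) + list(y)
    n1, total = len(x), len(x) + len(y)
    mean_u = len(x) * len(y) / 2
    observed = abs(mann_whitney_u(x, y) - mean_u)
    extreme = count = 0
    for idx in combinations(range(total), n1):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(total) if i not in chosen]
        count += 1
        # Count relabellings at least as far from the null mean as the observed U.
        if abs(mann_whitney_u(gx, gy) - mean_u) >= observed - 1e-12:
            extreme += 1
    return extreme / count


# HYPOTHETICAL values for two groups of five experiments (not data from Figure 3).
group_a = [1.9, 2.1, 1.8, 2.4, 2.0]
group_b = [1.2, 1.4, 1.1, 1.5, 1.3]
print(exact_two_sided_p(group_a, group_b))
```

      Because every value in group_a exceeds every value in group_b, this example returns the minimum attainable p-value, 2/252.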

      Reviewer #2 (Public review):

      In their article, Peterson et al. wanted to show to what extent the classical "single hit" model of virion infection, where one virion is required to infect a cell, does not match empirical observations based on human cytomegalovirus in vitro infection model, and how this would have practical impacts in experimental protocols.

      They first used a very simple experimental assay, where they infected cells with serially diluted virions and measured the proportion of infected cells with flow cytometry. From this, they could elegantly show how the proportion of infected cells differed from a "single hit" model, which they simulated using a simple mathematical model ("powerlaw model"), and better fit a model where virions need to cooperate to infect cells. They then explore which mechanism could explain this apparent cooperation:

      (1) Stochasticity alone cannot explain the results, although I am unsure how generalizable the results are, because the mathematical model chosen cannot, by design, explain such observations only by stochasticity. 

      Our null model simulations are not just about stochasticity; they also include variability in virion infectivity and in cell resistance to infection. We agree that simulations cannot truly prove that such variability cannot result in apparent cooperativity; however, we also provide a mathematical proof that the increase in the frequency of infected cells should be linear in virion concentration at small genome/cell numbers.

      (2) Virion clumping seemed not to be enough either to generally explain such a pattern. For that, they first use a mathematical model showing that the apparent cooperation would be small. However, I am unsure how extreme the scenario of simulated virion clumping is. They then used dynamic light scattering to measure the distribution of the sizes of clumps. From these estimates, they show that virion clumps cannot reproduce the observed virion cooperation in serial dilution assays. However, the authors remain imprecise on how the uncertainty of these clumps' size distribution would impact the results, as most clumps have a size smaller than a single virion, leaving therefore a limited number of clumps truly containing virions.

      As we stated in the paper, clumping may explain apparent cooperativity in simulations, depending on how stock dilution affects the distribution of virions per clump. This could be explored further; however, better experimental measurements of virions per clump would be highly informative, and we do not currently have the resources to perform these experiments. Our point is that the degree of apparent cooperativity depends on the target cell used (n is smaller on epithelial cells than on fibroblasts), which is difficult to explain by clumping, a property of the virions. Per the comment by Reviewer 1, we will further analyze the clumping model to investigate how removing a clump after each successful infection affects the detected degree of apparent cooperativity.

      Two models remain that cannot be distinguished from each other but could explain the apparent virion cooperativity: either an increase in the susceptibility of a cell each time a virion tries to infect it, or viral compensation, in which less fit viruses are able to infect cells when co-infecting with a fitter virion. Unfortunately, the authors do not attempt to fit their mathematical model to the experimental data; they only show that the theoretical models and the experimental data generate similar patterns of apparent virion cooperation. 

      In the revision, we will provide examples of simulations that “match” experimental data with a relatively high degree of apparent cooperativity; we have run such simulations before but excluded them from the current version because they are somewhat messy. Fitting simulations to the data may be overkill.

      Finally, the authors show that this virion cooperation could make the relationship between the estimated multiplicity of infection and viruses/cell deviate from 1:1. Consequently, diluting a virion stock would lead to an even stronger decrease in infectivity, as virions at higher dilution are less able to cooperate to infect cells.
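      The titer bias described above follows from simple arithmetic. In this sketch, the dose-response 1 - exp(-(V/k)^n) and the scale k are hypothetical choices for illustration, not the authors' model; the stock MOI is back-calculated from a single dilution using the usual single-hit (Poisson) inversion.

```python
import math

def infected_fraction(genomes_per_cell, k, n):
    """Hypothetical dose-response 1 - exp(-(V/k)**n); n = 1 is single-hit."""
    return -math.expm1(-((genomes_per_cell / k) ** n))

def inferred_stock_moi(dilution, stock_genomes_per_cell, k, n):
    """MOI of the undiluted stock, back-calculated from one dilution
    under the single-hit (Poisson) assumption."""
    f = infected_fraction(stock_genomes_per_cell / dilution, k, n)
    moi_at_dilution = -math.log1p(-f)   # invert f = 1 - exp(-MOI)
    return moi_at_dilution * dilution    # scale back up to the stock

dilutions = [10, 100, 1000]
for n in (1, 2):
    estimates = [inferred_stock_moi(d, stock_genomes_per_cell=100, k=10, n=n)
                 for d in dilutions]
    print(n, [round(e, 3) for e in estimates])
```

      With n = 1, every dilution returns the same stock MOI; with n = 2, each 10-fold dilution lowers the back-calculated value 10-fold, because (V0/(k·d))^2 · d = V0^2/(k^2·d).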

      Overall, this work is very valuable as it raises the general question of how the estimate of infectivity can be biased if extrapolated from a single virus titer assay. The observation that HCMV virions often cooperate and that this cooperation varies between contexts seems robust. The putative biological explanations would require further exploration.

      This topic is well known for segmented viruses and semi-infectious particles, and has led to the idea of studying "sociovirology"; to my knowledge, however, this is the first time it has been explored for a nonsegmented virus and in the context of MOI estimation. 

      Thank you.

      Reviewer #3 (Public review): 

      Summary:

      The authors dilute fluorescent HCMV stocks in small steps (df ≈ 1.3-1.5) across 23 points, quantify infections by flow cytometry at 3 dpi, and fit a power-law model to estimate a cooperativity parameter n (n > 1 indicates apparent cooperativity). They compare fibroblasts vs epithelial cells and multiple strains/reporters, and explore alternative mechanisms (clumping, accrued damage, viral compensation) via analytical modeling and stochastic simulations. They discuss implications for titer/MOI estimation and suggest a method for detecting "apparent cooperativity," noting that for viruses showing this behavior, MOI estimation may be biased.
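      The estimation step summarized above can be sketched as follows. This is an illustrative simulation, not the authors' pipeline: the top dose, the power-law constant c, the number of cells, and the window used to define the linear regime are hypothetical, and n is estimated by ordinary least squares on log-log axes, which is one simple way to fit a power law. It also shows why a dilution factor of 1.4 helps: it yields about 6.8 dilutions per decade of dose, versus one for a 10× series.

```python
import numpy as np

def dilution_series(top_dose, df, n_points):
    """Genomes/cell at each step of a serial dilution with factor df."""
    return top_dose / df ** np.arange(n_points)

def estimate_n(doses, frac, lo=1e-3, hi=0.1):
    """Slope of log10(infected fraction) vs log10(dose) inside a chosen window.

    The window [lo, hi) is a hypothetical stand-in for the "linear regime":
    above the noise floor and below saturation.
    """
    mask = (frac >= lo) & (frac < hi)
    n_hat, _ = np.polyfit(np.log10(doses[mask]), np.log10(frac[mask]), 1)
    return n_hat, int(mask.sum())

rng = np.random.default_rng(1)
doses = dilution_series(top_dose=3.0, df=1.4, n_points=23)
print(f"points per decade: {1 / np.log10(1.4):.1f} (vs 1 for a 10x series)")

# Simulate flow-cytometry counts from a hypothetical power law f = c * dose**n.
true_n, c, cells = 2.0, 0.01, 100_000
p = np.clip(c * doses**true_n, 0.0, 1.0)
frac = rng.binomial(cells, p) / cells

n_hat, used = estimate_n(doses, frac)
print(f"estimated n = {n_hat:.2f} from {used} dilutions in the window")
```

      Binomial sampling mimics counting infected cells by flow cytometry; the lower cutoff keeps points dominated by counting noise out of the fit.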

      Strengths:

      (1) High-resolution titration & rigor: The small-step dilution design (23 serial dilutions; tailored df) improves dose-response resolution beyond conventional 10× series.

      (2) Clear quantitative signal: Multiple strain-cell pairs show n > 1, with appropriate model fitting and visualization of the linear regime on log-log axes.

      (3) Mechanistic exploration: Side-by-side modeling of clumping vs accrued damage vs compensation frames testable hypotheses for cooperativity. 

      Thank you.

      Weaknesses:

      (1) Secondary infection control: The authors argue that 3 dpi largely avoids progeny-mediated secondary infection; this claim should be strengthened (e.g., with entry inhibitors or control infections), or sensitivity checks should be added showing that the results are robust to a small secondary-infection contribution. 

      This is an important point. We believe that current knowledge of HCMV virion production time (it takes 3-4 days to make virions according to multiple papers; see Fig 7 in Vonka and Benyesh-Melnick JB 1966; Fig 3B in Stanton et al JCI 2010; and Fig 1A in Li et al. PNAS 2015) is sufficient to justify our experimental design, but we agree that an additional control to block new infections would be useful. We previously performed experiments with an HCMV TB-gL-KO strain that cannot make infectious virions (the stock virions can be produced in complemented target cells). We will investigate whether our titration experiments with this strain have sufficient resolution to detect apparent cooperativity. However, at present we do not have the resources to perform new experiments.  

      (2) Discriminating mechanisms: At present, simulations cannot distinguish between accrued damage and viral compensation. The authors should propose or add a decisive experiment (e.g., dual-color coinfection to quantify true coinfection rates versus "priming" without coinfection; timed sequential inocula) and outline expected signatures for each mechanism. 

      Excellent suggestion. Because infection of a cell is a result of the joint viral infectivity and cell resistance, it may be hard to discriminate between these alternatives unless we specify them as particular molecular mechanisms. But we will try our best and list potential future experiments in the revised version of the paper.

      (3) Decline at high genomes/cell: Several datasets show a downturn at high input. Hypotheses (e.g., cytotoxicity, receptor depletion, or a measurement ceiling) should be provided, together with any supporting controls. 

      Another good point. We do not have a good explanation, but we do not believe this is due to saturation of available target cells. It seemed to happen only (or was most pronounced) with the ME stocks, which typically have lower titers, so the highest MOIs used nearly undiluted stock. It may be an effect of the conditioned medium. Alternatively, non-infectious particles such as dense bodies (enveloped particles that lack a capsid and genome) and non-infectious enveloped particles (NIEPs) may compete for receptors or otherwise damage cells, and these are not diluted out at the higher doses. We plan to include these points in the Discussion of the revised version of the paper.

      (4) Include experimental data: In Figure 6, please include the experimentally measured titers (IU/mL), if available. 

      This is a model-simulated scenario, and as such, there are no measured titers.

      (5) MOI guidance: The practical guidance is important; please add a short "best-practice box" (how to determine titer at multiple genomes/cell and cell densities; when single-hit assumptions fail) for end-users. 

      Good suggestion. We will include a best-practice box, based on guidelines developed in the Ryckman lab over the years, in the revised version of the paper.

      Overall note to all reviewers: We have deposited our code and data on GitHub; however, none of the reviewers commented on them.

    1. eLife Assessment

      This manuscript reports on the application of ribosome profiling (EZRA-seq and eRF1-seq) combined with massively parallel reporter assays to identify and characterize a GA-rich element associated with ribosome pausing during translation termination. While the development of eRF1-seq is useful and the identification of GA-rich elements upstream of stop codons is convincing, the level of support for other claims is inadequate. Specifically, the evidence that GA-rich sequences upstream of stop codons can base-pair with the 3′ end of 18S rRNA to prolong ribosome dwell time, and the evidence that Rps26 interferes with this interaction to regulate translation termination, are not adequate.

    2. Reviewer #1 (Public review):

      Summary:

      The authors use high-resolution ribosome profiling (Ezra-seq) and eRF1 pulldown-based ribosome profiling (eRF1-seq), developed in their lab, to identify a GA-rich sequence motif located upstream of the stop codon that is responsible for termination pausing. They then perform a massively parallel assay with randomly generated sequences to further characterize this motif. Using mouse tissues, they show that termination pausing signatures can be tissue-specific. They use a series of published ribosome structures, 18S rRNA mutants, and eS26 knockdown experiments to propose that the GA-rich sequence interacts with the 3′-end of the 18S rRNA.

      Strengths:

      (1) Robust ribosome profiling data and clear analyses clarify the subtle behavior of terminating ribosomes near the stop codon.

      (2) Novel termination or "false termination" sites revealed by eRF1-seq in the 5′-UTR, 3′-UTR, and CDS highlight a previously underappreciated facet of translation dynamics.

      Weakness:

      (1) The modest effects of ABCE1 knockdown do not seem to support the level of regulation claimed. The authors state "ABCE1 regulates terminating ribosomes independent of the sequence context" on pg 9, and "ABCE1 modulates termination pausing independent of the mRNA sequence context" in the caption of Figure S4. Given the modest effect of the knockdown, such phrasing is most likely not supported. The statement that "ABCE1 plays a generic role in translation termination" needs further clarification.

      (2) The authors propose that the GA-rich sequence element upstream of the stop codon on the mRNA could base-pair with the 3′-end of the 18S rRNA. In the PDBs the authors reference in their paper, and also in 3JAG, 3JAH, and 3JAI (structures of terminating ribosomes with the stop codon in the A-site and eRF1), the mRNA exiting the ribosome and the 3′-end of the 18S rRNA are about 25-30 Å apart. In addition, a segment of eS26 is wedged between these two RNA segments. This reviewer noted the same arrangement in a random sampling of 5 other PDBs of mammalian and human 80S ribosome structures. How do the authors envision the proposed base pairing occurring in light of these steric hindrances? Rps26 is known to be released by Tsr2 in yeast during very specific stresses. Do the authors expect termination pausing in human/mammalian cells to occur only under stress conditions?

      (3) The authors say, "It is thus likely that mRNA undergoes post-decoding scanning by 18S rRNA." (pg. 10). It is unclear what the authors mean by "scanning." Do they mean that the mRNA gets scanned in a manner similar to scanning during initiation? There is no evidence presented to support that particular conclusion.

      (4) Role of termination pausing in the testis is highly speculative. The authors state: "It is thus conceivable that the wide range of ribosome density at stop codons in testis facilitates functional division of ribosome occupancy beyond the coding region." It is unclear what type of functional division they are referring to.

    3. Reviewer #2 (Public review):

      Summary:

      This paper presents results interpreted to indicate that sequences upstream of stop codons capable of base-pairing with the 3' end of 18S rRNA prolong the dwell time of 80S ribosomes at stop codons in a manner impeded by Rps26 in the 40S subunit exit channel, which leads to the proper completion of termination and ribosome recycling and prevents spurious translation of 3'UTR sequences by one or more unconventional mechanisms.

      Strengths:

      The standard 80S and selective eRF1 80S ribosome profiling data obtained using EZRA-Seq are of high quality, allowing the authors to detect an enrichment for purine-rich sequences upstream of stop codons at sites where termination is relatively slow and ribosomal complexes are paused with eRF1 still engaged in the A site.

      Weaknesses:

      There are many weaknesses in the experimental design, in the description of the assay design and assumptions, in the data obtained, and in the interpretation of results, all of which detract from the scientific quality and significance of this work. In fact, a large proportion of the paragraphs and figure panels are difficult to follow, either because it is unclear how the experiment or data analysis was conducted or what the authors wish to conclude from the results, or because the conclusions overinterpret the findings or fail to consider other equally likely explanations.

    4. Reviewer #3 (Public review):

      Summary:

      This study from Jia et al carried out a variety of analyses of terminating ribosomes, including the development of eRF1-seq to map termination sites, identification of a GA-rich motif that promotes ribosome pausing, characterization of tissue-specific termination dynamics, and elucidation of the regulatory roles of 18S rRNA and RPS26. Overall, the study is thoughtfully designed, and its biological conclusions are well supported by complementary experiments. The tools and datasets generated provide valuable resources for researchers investigating the mechanisms of RNA translation.

      Strengths:

      (1) The study introduces eRF1-seq, a novel approach for mapping translation termination sites, providing a methodological advance for studying ribosome termination.

      (2) Through integrative bioinformatic analyses and complementary MPRA experiments, the authors demonstrate that GA-rich motifs promote ribosome pausing at termination sites and reveal possible regulatory roles of 18S rRNA in this process.

      (3) The study characterizes tissue-specific ribosome termination dynamics, showing that the testis exhibits stronger ribosome pausing at stop codons compared to other tissues. Follow-up experiments suggest that RPS26 may contribute to this tissue specificity.

      Weaknesses:

      The biological significance of ribosome pausing regulation at translation termination sites or of translational readthrough, for example, across different tissue types, remains unclear. Nevertheless, this question lies beyond the primary scope of the current study.

    5. Reviewer #4 (Public review):

      Summary:

      This manuscript by Qian and colleagues utilizes ribosome profiling and reporter assays to dissect translation termination. Unfortunately, the data do not support the conclusions of the paper, controls are missing, and several assays are not well validated and do not reproduce previous findings from others.

      Specific comments:

      • Translation termination has been studied in several organisms, including mammalian cells and yeast. In those cases, what is analyzed is not the peak height at the stop codon but rather the difference in ribosome density before and after the stop. Thus, analyzing peak height is not validated. I understand that this is relevant only for the ribosome profiling experiments (and Ezra-seq), not for the eRF1 profiling. But much of the data was acquired that way.

      • Moreover, the data do not reproduce previous findings and no effort is made to connect them to previous data. Previous data has shown that stop codon efficacy varies. This is not reproduced (S1C). Similarly, an effect from the +1 residue is not reproduced. The data isn't even stratified by different stop codons as previous work has shown that different surrounding residues have different effects in the context of different stop codons. Thus, none of the sequencing data is validated or trusted and does not reproduce previous findings.

      • The GA-rich sequences identified by Ezra-seq and eRF1-seq are not the same, and they differ from previously reported sequences (Wangen & Green).

      • The authors claim that the majority of eRF1 peaks are at stop codons, but that is not true: only about 30% of the peaks are. Also, not all mRNAs have peaks at their stop codons. That is at best problematic. Finally, some mRNAs are known to "suffer" from NMD; what do these look like in the Ezra-seq and eRF1-seq data? What about mRNAs that have programmed frameshifts? This raises questions about the validity of the eRF1 data.

      • Figure 4: First, instead of the M/P ratio, one should analyze M/(M+P), to normalize out differences in loading and effects from collisions, which are guaranteed to occur here but are not considered or analyzed. Second, the data are analyzed as if what matters are the codons in the P and E sites (and beyond, where codons are definitely NOT recognized). While there is evidence for some interactions, an additional sequence-based analysis would be helpful. Also, the supplemental data indicate that reciprocal changes are very rare, although they should occur, as seen for stop codons.

      • Regarding the HiBiT reporter assay: the two sequences clearly affect translation independently of the stop codon context (Figure 4C), and this needs to be taken into account. Also, the effect of the sequences differs between the assays in 4C and 4D (2-fold vs 0.5-fold), further questioning the assay. Moreover, the authors claim that re-initiation cannot account for HiBiT levels, but that is clearly incorrect. The western blot in Figure 4E does not reproduce the data in 4D: while HiBiT goes up (as in 4D), the putative GFP fusion goes down. Finally, why the second reading frame should be more efficient is not explained, which further argues for an artifact. Previous work (and work herein) suggests that readthrough occurs equally in each reading frame. No controls for these assays are presented, e.g., stimulation by antibiotics, ABCE1 depletion, etc.

      • Figure 5 has similar problems. I don't understand how the Figure in 5A is made, but when you overlay the cited structures on Rps26, the molecules are identical. I guess the authors used some fantasy to build non-existing sequences differently into the structure. There is no basis for that. In panel C and the same in Figure 7, the number of analyzed mRNAs varies. This could influence the outcome and the EXACT same set of mRNAs should be analyzed. But the main problem here is that the authors need to analyze readthrough and not peak height as detailed above. Essential controls are missing that show what fraction of the 18S rRNA is mutated. Previous work has shown that 2 nt truncated 18S rRNA is actively degraded. It is hard to believe how 15% of altered ribosomes can abolish 100% of the effect from the C-rich sequences. Important validation is missing: the authors should analyze rRNA sequences in their ribo-seq dataset to demonstrate that they have the mutated rRNAs, and that these enrich and de-enrich as predicted.

      • In Figure 5-7 the authors develop a model that the sequence selectivity arises from base pairing between 18S rRNA and the mRNA. If so, then they should really stratify the data by number of WC pairs that can be formed. And only WC pairs, as GU pairs have a totally different geometry that will likely be discriminated against in this context. Also, the mutation is in a part of the helix that has no effect (Figure S3G). Thus, the data within the manuscript are inconsistent.

      • Figure 6 does not agree with published data (Li et al., Nature 2022). Previous work did not show testis depletion of Rps26 in purified ribosomes. This is the critical difference, as the authors here did not purify ribosomes. Also, another Rps protein is an essential control, even if purified ribosomes are used. The validity of this dataset is thus questionable. Depletion from polysomes is hard to believe, as there is less signal in the polysomes overall.

      • Figure 7 has similar problems as figure 5. Different pools of mRNAs are analyzed; peak height is not validated. Overexpression of Rps26 is not shown, as only Myc is shown, not Rps26. Beyond that, increased occupancy in ribosomes needs to be shown for the effect to come from ribosomes. Given how sick the cells are it is most likely that all effects are secondary and arise from whatever else is going on in the overexpression or depletion of Rps26. No controls are presented to show specific effects from Rps26.

      • The authors need to check Rli1/ABCE1 levels in their cells. Their data have features that are indicative of low ABCE1 levels, including a very small effect from ABCE1 depletion. This could be responsible for some of the effects they observe.
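      The stratification suggested in the bullet on Figures 5-7 could be implemented along these lines. This is a minimal sketch under stated assumptions: pairing is ungapped and antiparallel, only Watson-Crick pairs count (G-U wobble excluded, as the reviewer suggests), and the length of the longest contiguous WC duplex is used as the stratification variable (one possible choice; the total number of WC pairs per register would be another). The rRNA segment, window sequences, and function names are hypothetical placeholders, not the actual 18S rRNA 3′-end sequence.

```python
from collections import defaultdict

# Watson-Crick pairs only; G-U wobble is deliberately excluded.
WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def max_wc_duplex(mrna: str, rrna: str) -> int:
    """Longest run of contiguous Watson-Crick pairs between two RNAs.

    Both sequences are written 5'->3'. For antiparallel pairing, mrna is
    walked forward while rrna is walked backward, so every ungapped register
    of mrna is compared against the reversed rrna.
    """
    mrna = mrna.upper().replace("T", "U")
    rev = rrna.upper().replace("T", "U")[::-1]
    best = 0
    for shift in range(-(len(rev) - 1), len(mrna)):
        run = 0
        for i, base in enumerate(mrna):
            j = i - shift
            if 0 <= j < len(rev) and (base, rev[j]) in WC_PAIRS:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best

def stratify(windows: dict, rrna: str) -> dict:
    """Group transcript IDs by the WC duplex length of their pre-stop window."""
    groups = defaultdict(list)
    for tx_id, window in windows.items():
        groups[max_wc_duplex(window, rrna)].append(tx_id)
    return dict(groups)

# Toy inputs: the rRNA segment and the windows are hypothetical, not real sequences.
toy_rrna = "UCCUUC"
windows = {"tx1": "CCGAAGGACC", "tx2": "GAAGGG", "tx3": "CCCCCC"}
print(stratify(windows, toy_rrna))
```

      In the toy example, "tx2" differs from a perfect 6-bp duplex only by a terminal G-U wobble, which is why it falls into the 5-pair stratum rather than the 6-pair one.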

    6. Author response:

      We thank the editor and reviewers for their thoughtful feedback. We agree with eLife’s overall assessment that, while profiling terminating ribosomes is informative in revealing termination dynamics, the underlying mechanisms require more evidence. Our revision will focus on three conceptual points.

      (1) We will tone down the statement that putative mRNA:rRNA interaction contributes to sequence-specific termination pausing.

      (2) We will clarify the potential role of Rps26 in regulating translation termination.

      (3) We will expand the discussion of tissue-specific termination pausing.

      Reviewer #1 (Public Review):

      (1) We admit that the modest effects of ABCE1 were partly due to the incomplete ABCE1 knockdown in HEK293 cells. Since the elevated ribosome density occurred at all stop codons, we argue that the action of ABCE1 is likely independent of the sequence context. We will rephrase relevant statements in the revised manuscript.

      (2) In terms of Rps26 structures, we agree the structural rearrangement in the absence of Rps26 is highly speculative. However, we do not believe the Rps26 stoichiometry is solely dependent on stress. We will clarify this important point in the revised manuscript.

      (3) We apologize for the confusion about 18S rRNA “scanning” and will revise the sentence in the main text.

      (4) We agree that functional significance of testis-specific termination dynamics is unclear. Since other reviewers raised similar concern, we will expand the discussion of tissue-specific termination pausing in the revised manuscript.

      Reviewer #2 (Public Review):

      We appreciate the Reviewer’s time and efforts in reviewing our manuscript. We are grateful for the insightful comments and many recommendations made by the reviewer to improve our manuscript. We feel that the reviewer may have some misunderstanding in terms of the sequence motif associated with the termination pausing, partly because of the lack of clarity in our original description of the results from MPRA and reporter assays. We will ensure that the reviewer’s points are fully addressed in the revised manuscript.

      Reviewer #3 (Public Review):

      We thank the reviewer for the positive comments on our manuscript. We agree that the tissue-specific termination differences were poorly described in the main text. Notably, other reviewers raised similar concerns. We will expand the relevant discussion in the revised manuscript, outlining this as a limitation and a future direction.

      Reviewer #4 (Public Review):

      We believe the reviewer mixed the public review with the recommendation comments. The reviewer appears to be preoccupied with previous studies and questioned some inconsistencies in our results. With the development of new technology such as eRF1-seq, we are encouraged to present “new” and “different” findings. All other reviewers appreciate the development of eRF1-seq to profile terminating ribosomes. In fact, we do not believe our data is fundamentally different from the established principles. Rather, our data provides new perspectives to further our understanding of ribosome dynamics at stop codons. We thank the reviewer for understanding.

      The reviewer is quite confused by our sequencing analysis based on peak height, or read density, which is commonly used to infer ribosome dynamics such as pausing. Regarding the sequencing analysis and reporter assays in cells expressing 18S mutant (Figure 5) and Rps26 (Figure 7), we feel that the reviewer has some misunderstanding. In the revised manuscript, we will do our best to clarify those relevant issues. Finally, the reviewer’s comment on base pairing is well-received and we will thoroughly revise the main text and discussion in the revised manuscript.

    1. eLife Assessment

      The authors investigated the epigenetic mechanisms regulating the differentiation of circulating monocytes that infiltrate the CNS and adopt microglia-like characteristics. The work is useful to the field, as the contribution of circulating myeloid cell-derived microglia remains controversial. However, the evidence presented is inadequate as the analyses are based on a very limited set of genes, which does not sufficiently support the authors' central claims.

    2. Reviewer #1 (Public review):

      Microglia are mononuclear phagocytes in the CNS and play essential roles in physiology and pathology. Under some conditions, circulating monocytes may infiltrate the CNS and differentiate into microglia or microglia-like cells. However, the specific mechanism is largely unknown. In this study, the authors explored the epigenetic regulation of this process. The quality of this study would be significantly improved if a few questions were addressed.

      (1) Whether circulating myeloid cells can give rise to microglia is controversial. In this study, the authors used CX3CR1-GFP/CCR2-DsRed (hetero) mice as a lineage-tracing line. However, this line is not an appropriate tool for this purpose. For example, undifferentiated CX3CR1-GFP/CCR2-DsRed donor cells are GFP+ and DsRed+; when their fate changes to microglia, they become GFP+ and DsRed-. However, this process depends on busulfan conditioning and on bone marrow cells artificially introduced into the circulation, which does not occur under physiological or pathological conditions. These manipulations may introduce artifacts and confound the conclusion, as happened with the classical but incorrect textbook theory of bone marrow-derived microglia, which was subsequently corrected by the Fabio Rossi lab1,2. This is the greatest risk in drawing this conclusion. The strongest evidence comes from parabiosis models. Therefore, before drawing this conclusion, the authors should perform a parabiosis study combining a CX3CR1-GFP (hetero) mouse with a WT mouse without busulfan conditioning, and examine whether GFP+ microglia appear in the brain of the GFP- WT mouse. If there are no GFP+ microglia, the authors should clarify that this is not a physiological or pathological condition but a defined artificial host condition, as a previous study did3.

      (2) Under some conditions, peripheral myeloid cells can infiltrate and replace brain microglia4,5. Discussing this would help readers better understand the mechanism of microglia replacement.

      References:

      (1) Ajami, B., Bennett, J.L., Krieger, C., Tetzlaff, W., and Rossi, F.M. (2007). Local self-renewal can sustain CNS microglia maintenance and function throughout adult life. Nature neuroscience 10, 1538-1543. 10.1038/nn2014.

      (2) Ajami, B., Bennett, J.L., Krieger, C., McNagny, K.M., and Rossi, F.M.V. (2011). Infiltrating monocytes trigger EAE progression, but do not contribute to the resident microglia pool. Nature neuroscience 14, 1142-1149. 10.1038/nn.2887.

      (3) Mildner, A., Schmidt, H., Nitsche, M., Merkler, D., Hanisch, U.K., Mack, M., Heikenwalder, M., Bruck, W., Priller, J., and Prinz, M. (2007). Microglia in the adult brain arise from Ly-6ChiCCR2+ monocytes only under defined host conditions. Nature neuroscience 10, 1544-1553. 10.1038/nn2015.

      (4) Wu, J., Wang, Y., Li, X., Ouyang, P., Cai, Y., He, Y., Zhang, M., Luan, X., Jin, Y., Wang, J., et al. (2025). Microglia replacement halts the progression of microgliopathy in mice and humans. Science 389, eadr1015. 10.1126/science.adr1015.

      (5) Xu, Z., Rao, Y., Huang, Y., Zhou, T., Feng, R., Xiong, S., Yuan, T.F., Qin, S., Lu, Y., Zhou, X., et al. (2020). Efficient strategies for microglia replacement in the central nervous system. Cell reports 32, 108041. 10.1016/j.celrep.2020.108041.

    3. Reviewer #2 (Public review):

      Mouse fate mapping studies have established that the bulk of microglia derives from cells that seed the brain early during development. However, monocytes were also shown to give rise to parenchymal CNS macrophages and thus are potential candidates for microglia replacement therapy. Whether monocyte-derived cells adopt bona fide microglia identities has remained under debate. The study of Liu et al addresses this important outstanding question, focusing on the retina.

      Specifically, the authors investigate monocyte-derived macrophages that arise upon challenges in the murine retina using scRNAseq and ATACseq analyses, combined with flow cytometry and histology. They complement this approach with analyses of BM chimeras. The authors conclude that monocyte-derived cells acquire markers that have originally been proposed to be microglia-specific, including P2ry12, Tmem119, and Fcrls.

      In 2018, four comprehensive independent studies reported the analyses of monocyte-derived CNS macrophages (PMID 30451869, 30523248, 29643186, 29861285). Following transcriptome and epigenome analyses, these teams came to the collective conclusion that HSC-derived cells remain distinct from microglia. Using advanced fate mapping and better isolation and profiling tools, a more recent study, however, concluded that, if given sufficient time of CNS residence, most monocyte-derived macrophages can, at the transcriptome level, become essentially identical to microglia (PMID 40279248, https://www.biorxiv.org/content/10.1101/2023.11.16.567402v1).

      Given this controversy, the study of Paschalis and colleagues, which focuses largely on retinal monocyte-derived cells, could have been a valuable resource and complement for clarification. Indeed, their data interestingly suggest that microglia adaptation of monocyte-derived macrophages might be faster in the retina than in the CNS. However, for the reasons outlined below, the study in its present form falls short of providing significant new insight and is a missed opportunity.

      Comments:

      The major shortcoming of the study is that the authors decided to focus on a very limited number of genes to make their case, rather than performing a more informative, unbiased, and detailed global analysis. In contrast to what the authors state, much of the microglia community is, I believe, aware of experimental limitations and the problem with markers. Showing gain of microglia marker expression on monocyte-derived cells, or loss of monocyte markers, such as Ly6C, is not novel.

      This is highlighted in Fig. 3F. No one today disputes that monocyte-derived tissue macrophages differ from blood monocytes (although the authors repeatedly present this as novel). However, the heatmap shows that the engrafted cells clearly differ from naïve and injured microglia. What are these genes, and what pathways are they associated with?

      Also, what about the expression of Sall1, which encodes a repressor considered important for maintaining microglia identity (PMID 37322178, 27776109)? Somewhat surprisingly, Sall1 was recently also shown to be expressed by monocyte-derived CNS macrophages (PMID 40279248). It would be valuable if the authors could corroborate this finding.

      The authors state in their discussion that monocyte-derived macrophages seem 'hardwired for inflammatory responses'. While this is an interesting suggestion, the NFkB motif enrichment is insufficient and should be complemented with a target list. Again, it would be important to be aware of heterogeneity.

      A critical factor when analyzing CNS macrophages is the exclusion of perivascular CNS border-associated macrophages (BAM), which also applies to the retina (see PMID 38596358). This should be addressed. Can the authors discriminate BAM from microglia in their scRNAseq data set, for instance, by CD206 expression or other markers? BAM have been shown to display distinct transcriptomes and, even as contaminants, could introduce significant bias.

      Even for the genes the authors focus on, the way the data are presented makes it hard to understand what fraction of cells is positive. This would be critical information, since there could be some heterogeneity. Flow cytometry analysis, including double staining for P2ry12, Tmem119, and Fcrls to assess correlations, would be valuable here.

      The authors state in their title that 'epigenetic adaptation drives monocyte differentiation'. However, since all gene expression is governed by the epigenome, this is trivial. I would argue that to gain meaningful insight and justify such a statement, it would require an in-depth global comparative analysis of the chromatin status of yolk sac microglia and monocyte-derived CNS macrophages, including CUT&RUN analysis for specific histone marks and methylation patterns.

      Please cite and discuss PMID 30451869, 30523248, 29643186, 29861285, and in particular the more recent highly relevant study PMID 40279248.

    1. eLife Assessment

      This study presents a role for heparan sulfate in SARS-CoV-2 entry that runs counter to prevailing data in the field. If the conclusions were firmly supported by the data, the work would be a significant contribution to the field. While the use of diverse cellular models, virological tools, and robust microscopy approaches constitutes a useful data set, the proposed model remains incomplete: substantiating it against established entry models requires clarification of entry mechanisms, host factors, and variant-specific fusion pathways.

    2. Reviewer #1 (Public review):

      This paper investigates how heparan sulfate (HS) engagement functions in the cellular entry of SARS-CoV-2. A prevailing model, developed over the last five years by many laboratories using a variety of biochemical, structural, and microscopic approaches, is that HS acts as a co-receptor for SARS-CoV-2: its binding to SARS-CoV-2 both concentrates virus on the surface of target cells and allosterically alters the spike protein to promote an "up/open" RBD conformation that enables engagement of the proteinaceous receptor human ACE2 on the cell surface (PMID: 32970989, 35926454, 38055954, 39401361, 40548749). These two events enable either plasma membrane fusion (after a cleavage event promoted by plasma membrane TMPRSS2) or endocytosis and subsequent pH-dependent fusion (which requires a cathepsin L-mediated cleavage of the spike).

      The authors in this study used a series of microscopy techniques, labeled pseudoviruses and authentic SARS-CoV-2 strains, and cells lacking or expressing HS and/or hACE2 to re-examine the specific stage(s) at which HS and hACE2 function in the entry process. They suggest that HS mediates SARS-CoV-2 cell-surface attachment and endocytosis, and that hACE2 functions "downstream" of this to facilitate productive infection. Their results also suggest that SARS-CoV-2 binds clusters of HS molecules projecting 60-410 nm, which act as docking sites for viral attachment. Blocking HS binding with pixantrone, a drug under clinical evaluation for cancer (due to its anti-topoisomerase II activity), inhibited the SARS-CoV-2 Omicron JN.1 variant from attaching to and infecting human airway cells. The authors conclude that their work establishes a revised entry paradigm in which HS clusters mediate SARS-CoV-2 attachment and endocytosis, with ACE2 acting at some stage downstream. They speculate that this idea might apply broadly to other viruses known to engage HS and that it has translational implications for developing antiviral agents that target HS interactions.

      The strengths of this interesting and technically well-executed study include the use of multiple high-resolution microscopy modalities, the tracking of labelled viruses, the use of both pseudoviruses and authentic SARS-CoV-2, and the use of primary airway cells. Nonetheless, there are issues that need to be addressed to buttress the proposed model compared to earlier ones. These include: (a) the distinction between macropinocytosis and receptor-mediated endocytosis and what this might mean for productive SARS-CoV-2 infection; (b) the need to account for TMPRSS2 expression and plasma membrane fusion; (c) the addition of genetic studies in which hACE2 is expressed in cells lacking HS; (d) an unclear picture of exactly where downstream hACE2 functions; and (e) a need for comparative/additional study of earlier SARS-CoV-2 variants, which preferentially fuse at the plasma membrane.

    3. Reviewer #2 (Public review):

      In this manuscript by Han et al, the authors assess the binding of SARS-CoV-2 to heparan sulfate clusters via advanced light microscopy of viral particles. The authors claim that the SARS-CoV-2 spike (in the context of pseudovirus and in authentic virus) engages heparan sulfate clusters on the cell surface, which then promotes endocytosis and subsequent infection. The finding that HSPGs are important for SARS-CoV-2 entry in some cell types is well-described, but the authors attempt to make the claim here that HS represents an alternative "receptor" and that HS engagement is far more important than the field appreciates. The data itself appears to be of appropriate quality and would be of interest to the field, but the overly generalized conclusions lack adequate experimental support. This significantly diminishes enthusiasm for this manuscript as written. The manuscript is imprecise and far overstates the actual findings shown by the data. Additional controls would be of great benefit.

      Further, it is this reviewer's opinion that the findings do not represent a novel paradigm as claimed. HS has been well described for SARS-CoV-2 and other viruses to serve as attachment factors to promote initial virus attachment. While the manuscript provides new insight into the details of this process, the manuscript attempts to oversell this finding by applying new words rather than new molecular details. The authors would be better served by presenting a more balanced and nuanced view of their interesting data. In this reviewer's opinion, the salesmanship significantly detracts from the data and manuscript.

      Major Comments:

      The authors need to rigorously define a "receptor" vs an "attachment factor." They also should avoid ambiguous terms such as "receptor underlying ...attachment" and "attachment receptor" (or at least clearly define them). Much of their argument hinges on the specific definition of these terms. This reviewer would argue that a receptor is a host factor that is necessary and sufficient for active promotion of viral entry (genome release into the cytoplasm), while an attachment factor is a host factor that enhances initial viral attachment/endocytosis but is neither necessary nor sufficient. The evidence does NOT implicate HS as a receptor under this fairly textbook definition. This is proven in Figure 1 (and elsewhere) in which ACE2 is absolutely required for viral entry.

      The authors should genetically perturb HS biosynthesis in their key assays to demonstrate necessity. HS biosynthesis genes have been shown to be important for SARS-CoV-2 entry into some cells but not others (Huh7.5 cells, PMID 33306959; but not Vero cells, PMID 33147444; Calu3 cells, PMID 35879413; A549 cells, PMID 33574281; and others, PMID 36597481). The authors need to discuss this important information and reconcile it with their data and model if they want to claim that HS is broadly important.

      Is targeting HS really a compelling anti-viral strategy? The data show a ~5-fold reduction, which likely won't excite a drug company. The strengths and limitations of HS targeting should be presented in a more balanced discussion. Animal data showing anti-viral activity of PIX is warranted. This would enhance this claim and also provide key evidence of a relevant role for HS in a more physiologic model.

      The authors provide little discussion of the fact that these studies rely exclusively on cell lines (which also happen to be TMPRSS2-deficient). The contribution of proteases to HS dependence should be tested in the cell lines and primary cells used, as protease expression is a key determinant of the site of fusion.

      The claim that "SARS-CoV2 JN.1 variant binds to heparan sulfate, not hACE2, in primary human airway cells" is extraordinary and thus requires extraordinary evidence.

      First, PIX reduces attachment by 5-fold, which is not the same as "nearly abolished." Also, anti-ACE2 "nearly abolished" entry in 7D, while PIX did not. If the authors want to make these claims, an alternative method to disrupt HS (other than PIX) is needed in primary airway cells. A genetic approach would be much more convincing. The authors should also demonstrate whether entry in their primary cell assays is TMPRSS2 vs Cathepsin L dependent (using E64d and camostat, for instance) as mentioned above.

      Each figure should clearly state how many independent experiments and replicates per experiment were performed. What does "3 experiments" mean? Are these three independent experiments or three wells on one day?

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors define a new paradigm for the attachment and endocytosis of SARS-CoV-2 in which cell surface heparan sulfate (HS) is the primary receptor, with ACE2 having a downstream role within endocytic vesicles. This has implications for the importance of targeting virion-HS interactions as a therapeutic strategy.

      Strengths:

      The authors show that viruses are internalized via dynamin-dependent endocytosis and that endocytic internalization is the major pathway for pseudotyped SARS-CoV-2 genome expression. They show that HS-mediated viral attachment is a critical step preceding viral endocytosis and also subsequent genome expression. Further, they show that hACE2 acts downstream of endocytosis to promote viral infection, and may be co-internalised with virions after HS attachment. Pseudotyped virus and authentic SARS-CoV-2 provide similar results. In addition, the authors demonstrate that remarkable clusters of multiple HS chains exist on the cell surface, visualised by a number of elegant microscopy methods, and that these represent the docking sites for virions. These visualisations are an important general contribution in themselves to understanding the nanoscale interactions of HS at the cell surface.

      The use of a complementary range of methods, virus constructs, and cell models is a strength, and the results clearly support the conclusions.

      Overall, the results convincingly demonstrate a different model from the currently accepted mechanism, in which the ACE2 protein is regarded as the cell surface receptor for SARS-CoV-2. Here, the authors provide compelling evidence that cell surface clusters of HS are the primary docking site, with ACE2 interactions occurring later, after endocytosis (while still being essential for viral genome expression). This is exciting and important landmark evidence that supports the view that HS-virion interactions should be viewed as a key site for anti-viral drug targeting, likely in strategies that also target the downstream ACE2-based mechanism of viral entry within endosomes.

      Weaknesses:

      This reviewer identified only minor points regarding citing and discussing other studies and typos, which can be corrected.

    1. eLife Assessment

      This work reports that at least one subunit of the heterohexameric prefoldin chaperone complex, Pfdn5, has functions in Drosophila beyond its contribution to cytoskeletal protein folding. The authors provide convincing evidence that it is a hitherto unknown microtubule-associated protein, in addition to regulating microtubule organization and levels of tubulin monomers. The important findings show that Pfdn5 loss exaggerates pathological manifestations of mutant human Tau bearing FTDP-17-linked mutations in Drosophila, while its overexpression suppresses them, suggesting that the latter may constitute a future therapeutic approach.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Bisht et al address the hypothesis that protein folding chaperones may be implicated in aggregopathies and in particular Tau aggregation, as a means to identify novel therapeutic routes for these largely neurodegenerative conditions.

      The authors conducted a genetic screen in the Drosophila eye, which facilitates identification of mutations that either enhance or suppress a visible disturbance in the nearly crystalline organization of the compound eye. They screened all 64 known Drosophila chaperones by RNA interference and revealed that mutations in 20 of them exaggerate the Tau-dependent phenotype, while 15 ameliorate it. The enhancer of degeneration group included 2 subunits of the typically heterohexameric prefoldin complex and other co-translational chaperones.

      The authors characterized one of the prefoldin subunits, Pfdn5, in depth, and convincingly demonstrated that this protein functions in the regulation of microtubule organization, likely through its role in the proper folding of tubulin monomers. Using both immunohistochemistry in larval motor neurons and microtubule binding assays, they demonstrate convincingly that Pfdn5 is a bona fide microtubule-associated protein contributing to the stability of the axonal microtubule cytoskeleton, which is significantly disrupted in the mutants.

      Similar phenotypes were observed in larvae expressing the human Tau mutations V377M and R406W, which are associated with frontotemporal dementia with parkinsonism linked to chromosome 17 (FTDP-17). On the strength of the phenotypic evidence and the enhancement of the TauV377M-induced eye degeneration, they demonstrate that loss of Pfdn5 exaggerates the synaptic deficits upon expression of the Tau mutants. Conversely, overexpression of Pfdn5 or Pfdn6 ameliorates the synaptic phenotypes in the larvae, the vacuolization phenotypes in the adult, and even memory defects upon TauV377M expression.

      Strengths:

      The phenotypic analyses of the mutant and its interactions with TauV377M at the cell biological, histological, and behavioral levels are precise, extensive, and convincing and achieve the aims of characterization of a novel function of Pfdn5.

      Regarding the memory defect upon V377M Tau expression: Kosmidis et al (2010), pmid: 20071510, demonstrated that pan-neuronal expression of TauV377M disrupts the organization of the mushroom bodies, the seat of long-term memory in odor/shock and odor/reward conditioning. If the novel memory assay the authors use depends on these adult brain structures, then the memory deficit can be explained in this manner.

      If the mushroom bodies are defective upon TauV377M expression, does overexpression of Pfdn5 or 6 reverse this deficit? This would argue strongly in favor of the microtubule stabilization explanation.

      The discovery that Pfdn5 (and most likely Pfdn6) affects TauV377M toxicity is indeed novel and important for the Tauopathies field. It is important to determine whether this interaction affects only the FTDP-17-linked mutations or also WT Tau isoforms, which are linked to the rest of the Tauopathies. Also, insights into the mode(s) by which Pfdn5/6 affect Tau toxicity, such as those the suggestions above aim at, will likely be helpful towards therapeutic interventions.

      Weaknesses:

      What is unclear, however, is how Pfdn5 loss or even overexpression affects the pathological Tau phenotypes.

      Does Pfdn5 (or 6) interact directly with TauV377M? Colocalization within tissues is a start, but immunoprecipitations would provide additional independent evidence that this is so.

      Does Pfdn5 loss exacerbate TauV377M phenotypes because it destabilizes microtubules, which are already at least partially destabilized by Tau expression? Rescue of the phenotypes by overexpression of Pfdn5 agrees with this notion.

      However, Cowan et al (2010) pmid: 20617325 demonstrated that wild-type Tau accumulation in larval motor neurons indeed destabilizes microtubules in a Tau phosphorylation-dependent manner.

      So, is TauV377M hyperphosphorylated in the larvae? What happens to TauV377M phosphorylation when Pfdn5 is missing and, presumably, more Tau is soluble and subject to hyperphosphorylation, as predicted by the above?

      Expression of WT human Tau (which is associated with most common Tauopathies other than FTDP-17) has, as Cowan et al suggest, significant effects on microtubule stability, but such Tau-expressing larvae are largely viable. Will one mutant copy of the Pfdn5 knockout enhance the phenotype of these larvae? Will it result in lethality? Such data will serve to generalize the effects of Pfdn5 beyond the two FTDP-17 mutations utilized.

      Does the loss of Pfdn5 affect TauV377M (and WT Tau) levels? Could the loss of Pfdn5 simply result in increased Tau levels? And conversely, does overexpression of Pfdn5 or 6 reduce Tau levels? This would explain the enhancement and suppression of TauV377M (and possibly WT Tau) phenotypes. It is an easily addressed, trivial explanation at the observational level, which, if true, calls for a distinct mechanistic approach.

      Finally, the authors argue that TauV377M forms aggregates in the larval brain based on large puncta observed especially upon loss of Pfdn5. This may be so, but protocols are available to validate molecularly the presence of insoluble Tau aggregates (for example, pmid: 36868851) or soluble Tau oligomers, as these apparently affect Tau toxicity differently. Does Pfdn5 loss exaggerate the toxic oligomers, and does overexpression promote the more benign large aggregates?

      Comments on revisions:

      In the revised manuscript, Bisht et al have provided extensive new experimental evidence in support of previously more tenuous claims. These fully satisfy my comments and suggestions and, in my view, have significantly strengthened the manuscript with compelling new evidence.

    3. Reviewer #2 (Public review):

      Bisht et al detail a novel interaction between the chaperone, Prefoldin 5, microtubules, and tau-mediated neurodegeneration, with potential relevance for Alzheimer's disease and other tauopathies. Using Drosophila, the study shows that Pfdn5 is a microtubule-associated protein, which regulates tubulin monomer levels and can stabilize microtubule filaments in the axons of peripheral nerves. The work further suggests that Pfdn5/6 may antagonize Tau aggregation and neurotoxicity. While the overall findings may be of interest to those investigating the axonal and synaptic cytoskeleton, the detailed mechanisms for the observed phenotypes remain unresolved and the translational relevance for tauopathy pathogenesis is yet to be established. Further, a number of key controls and important experiments are missing that are needed to fully interpret the findings.

      The strength of this study is the data showing that Pfdn5 localizes to axonal microtubules and the loss-of-function phenotypic analysis revealing disrupted synaptic bouton morphology. The major weakness relates to the experiments and claims of interactions with Tau-mediated neurodegeneration. In particular, it is unclear whether knockdown of Pfdn5 may cause eye phenotypes independent of Tau. Further, the GMR>tau phenotype appears to have been incorrectly utilized to examine age-dependent, neurodegeneration.

      This manuscript argues that its findings may be relevant to thinking about mechanisms and therapies applicable to tauopathies; however, this is premature given that many questions remain about the interactions in Drosophila, the detailed mechanisms remain unresolved, and evidence is lacking that tau and Pfdn interact similarly in the mammalian neuronal context. Therefore, this work would be strongly enhanced by experiments in human or murine neuronal culture or supportive evidence from analyses of human data.

      Comments on revisions:

      The revision adequately addresses most of the previously raised concerns, resulting in a significantly improved manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Bisht et al address the hypothesis that protein folding chaperones may be implicated in aggregopathies and in particular Tau aggregation, as a means to identify novel therapeutic routes for these largely neurodegenerative conditions.

      The authors conducted a genetic screen in the Drosophila eye, which facilitates the identification of mutations that either enhance or suppress a visible disturbance in the nearly crystalline organization of the compound eye. They screened all 64 known Drosophila chaperones by RNA interference and revealed that mutations in 20 of them exaggerate the Tau-dependent phenotype, while 15 ameliorate it. The enhancer of degeneration group included 2 subunits of the typically heterohexameric prefoldin complex and other co-translational chaperones.

      The authors characterized in depth one of the prefoldin subunits, Pfdn5, and convincingly demonstrated that this protein functions in the regulation of microtubule organization, likely due to its regulation of proper folding of tubulin monomers. They demonstrate convincingly using both immunohistochemistry in larval motor neurons and microtubule binding assays that Pfdn5 is a bona fide microtubule-associated protein contributing to the stability of the axonal microtubule cytoskeleton, which is significantly disrupted in the mutants.

      Similar phenotypes were observed in larvae expressing the human Tau mutations V377M and R406W, which are associated with frontotemporal dementia with parkinsonism linked to chromosome 17 (FTDP-17). On the strength of the phenotypic evidence and the enhancement of the TauV377M-induced eye degeneration, they demonstrate that loss of Pfdn5 exaggerates the synaptic deficits upon expression of the Tau mutants. Conversely, the overexpression of Pfdn5 or Pfdn6 ameliorates the synaptic phenotypes in the larvae, the vacuolization phenotypes in the adult, and even memory defects upon TauV377M expression.

      Strengths

      The phenotypic analyses of the mutant and its interactions with TauV377M at the cell biological, histological, and behavioral levels are precise, extensive, and convincing and achieve the aims of characterization of a novel function of Pfdn5. 

      Regarding the memory defect upon V377M Tau expression: Kosmidis et al (2010), PMID: 20071510, demonstrated that pan-neuronal expression of Tau<sup>V377M</sup> disrupts the organization of the mushroom bodies, the seat of long-term memory in odor/shock and odor/reward conditioning. If the novel memory assay the authors use depends on these adult brain structures, then the memory deficit can be explained in this manner.

      (1) If the mushroom bodies are defective upon Tau<sup>V377M</sup> expression, does overexpression of Pfdn5 or 6 reverse this deficit? This would argue strongly in favor of the microtubule stabilization explanation.

      We thank the reviewer for this insightful comment. Consistent with Kosmidis et al. (2010), we confirm that expression of hTau<sup>V337M</sup> disrupts the architecture of the mushroom bodies. In addition, as the reviewer suggested, we find that coexpression of either Pfdn5 or Pfdn6 with hTau<sup>V337M</sup> significantly restores the organization of the mushroom bodies. These new findings strongly support the hypothesis that Pfdn5 or Pfdn6 mitigates hTau<sup>V337M</sup>-induced memory deficits by preserving the structure of the mushroom bodies, likely through stabilizing the microtubule network. These data have now been included in the revised manuscript (Figure 7H-O).

      (2) The discovery that Pfdn5 (and most likely Pfdn6) affects TauV377M toxicity is indeed novel and important for the Tauopathies field. It is important to determine whether this interaction affects only the FTDP-17-linked mutations or also WT Tau isoforms, which are linked to the rest of the Tauopathies. Also, insights into the mode(s) by which Pfdn5/6 affect Tau toxicity, such as those the suggestions above aim at, will likely be helpful towards therapeutic interventions.

      We agree that determining whether prefoldin modulates the toxicity of both mutant and wild-type Tau is critical for understanding its broader relevance to Tauopathies. We have now performed the additional experiments required to address this issue. These new data show that loss of Pfdn5 also exacerbates toxicity associated with wild-type Tau (hTau<sup>WT</sup>), in a manner similar to that observed with hTau<sup>V337M</sup> or hTau<sup>R406W</sup>. Specifically, overexpression of hTau<sup>WT</sup> in a Pfdn5 mutant background leads to Tau aggregate formation (Figure S7G-I), and coexpression of Pfdn5 with hTau<sup>WT</sup> reduces the associated synaptic defects (Figure S11F-L). These findings underscore a general role for Pfdn5 in modulating diverse Tauopathy-associated phenotypes and suggest that it could be a broadly relevant therapeutic target.

      Weakness

      (3) What is unclear, however, is how Pfdn5 loss or even overexpression affects the pathological Tau phenotypes. Does Pfdn5 (or 6) interact directly with TauV377M? Colocalization within tissues is a start, but immunoprecipitations would provide additional independent evidence that this is so.

      We appreciate this important suggestion. To investigate a potential direct interaction between Pfdn5 and Tau<sup>V337M</sup>, we performed co-immunoprecipitation experiments using lysates from adult fly brains expressing hTau<sup>V337M</sup>. Under the conditions tested, we did not detect a direct physical interaction. While this does not support a direct interaction, it does not strongly refute one either. We note that Pfdn5 and Tau colocalize within axons (Figure S13J-K). At this stage, we are unable to resolve whether the association is direct or indirect. If indirect, then Tau and Pfdn5 act within the same subcellular compartment (the axon); if direct, then either only a small fraction of the total cellular protein is in the Tau-Pfdn5 complex and therefore difficult to detect in bulk-protein Western blots, or the interactions are dynamic or occur under conditions that we have not been able to mimic in vitro.

      (4) Does Pfdn5 loss exacerbate Tau<sup>V377M</sup> phenotypes because it destabilizes microtubules, which are already at least partially destabilized by Tau expression? Rescue of the phenotypes by overexpression of Pfdn5 agrees with this notion. 

      However, Cowan et al (2010) pmid: 20617325 demonstrated that wild-type Tau accumulation in larval motor neurons indeed destabilizes microtubules in a Tau phosphorylation-dependent manner. So, is Tau<sup>V377M</sup> hyperphosphorylated in the larvae? What happens to Tau<sup>V377M</sup> phosphorylation when Pfdn5 is missing and, presumably, more Tau is soluble and subject to hyperphosphorylation, as predicted by the above?

      We completely agree that it is important to link Tau-induced phenotypes with microtubule destabilization and the phosphorylation state of Tau. We performed immunostaining with an anti-Futsch antibody to assess microtubule organization at the NMJ and observed a severe reduction in Futsch intensity when Tau<sup>V337M</sup> was expressed in the Pfdn5 mutant (Elav-Gal4>Tau<sup>V337M</sup>; ∆Pfdn5<sup>15/40</sup>). This suggests that the absence of Pfdn5 exacerbates hTau<sup>V337M</sup> defects through greater microtubule destabilization (Figure S6F-J).

      We have performed additional experiments to examine the phosphorylation state of hTau in Drosophila larval axons. Immunocytochemistry indicated that only a subset of hTau aggregates in Pfdn5 mutants (Elav-Gal4>Tau<sup>V337M</sup>; ∆Pfdn5<sup>15/40</sup>) are recognized by phospho-hTau antibodies. For instance, the AT8 antibody (targeting pSer202/pThr205) (Goedert et al., 1995) labelled only a subset of the aggregates identified by the total hTau antibody (D5D8N) (Figure S9A-E). Moreover, these larvae (Elav-Gal4>Tau<sup>V337M</sup>; ∆Pfdn5<sup>15/40</sup>) still showed robust Tau aggregation when fed LiCl, which blocks GSK3β (Figure S9F-J).

      These results imply that: a) soluble phospho-hTau levels in Pfdn5 mutants are low and not reliably detected with a single phosphorylation-specific antibody; b) loss of Pfdn5 results in Tau aggregation in a hyperphosphorylation-independent manner, similar to what has been reported earlier (LI et al. 2022); and c) the destabilization of microtubules in Elav-Gal4>Tau<sup>V337M</sup>; ∆Pfdn5<sup>15/40</sup> larvae results in Tau dissociation and aggregate formation. These data and conclusions have been incorporated into the revised manuscript.

      (5) Expression of WT human Tau (which is associated with most common Tauopathies other than FTDP-17) has, as Cowan et al suggest, significant effects on microtubule stability, but such Tau-expressing larvae are largely viable. Will one mutant copy of the Pfdn5 knockout enhance the phenotype of these larvae? Will it result in lethality? Such data will serve to generalize the effects of Pfdn5 beyond the two FTDP-17 mutations utilized.

      We have now examined whether heterozygous loss of Pfdn5 (∆Pfdn5/+) enhances the effect of Tau expression. Each genotype alone (hTau<sup>V337M</sup>, hTau<sup>WT</sup>, or ∆Pfdn5/+) is viable, and Elav-Gal4-driven expression of hTau<sup>V337M</sup> or hTau<sup>WT</sup> in the Pfdn5 heterozygous background likewise does not cause lethality.

      (6) Does the loss of Pfdn5 affect TauV377M (and WT Tau) levels? Could the loss of Pfdn5 simply result in increased Tau levels? And conversely, does overexpression of Pfdn5 or 6 reduce Tau levels? This would explain the enhancement and suppression of Tau<sup>V377M</sup> (and possibly WT Tau) phenotypes. It is an easily addressed, trivial explanation at the observational level, which, if true, calls for a distinct mechanistic approach.

      To test whether Pfdn5 modulates Tau phenotypes by altering Tau protein levels, we performed Western blot analysis under Pfdn5 or Pfdn6 overexpression conditions and observed no change in hTau<sup>V337M</sup> levels (Figure 6O). However, in the absence of Pfdn5, both hTau<sup>V337M</sup> and hTau<sup>WT</sup> form large, insoluble aggregates that are not detected in soluble lysates by standard Western blotting but are visualized by immunocytochemistry (Figure S7G-I). Thus, the apparent reduction in Tau levels on Western blots reflects a solubility shift, not an actual decrease in Tau expression. These findings argue against a simple model in which Pfdn5 regulates Tau abundance and instead support a mechanism in which Pfdn5 loss leads to a change in Tau conformation, resulting in its sequestration away from already destabilized microtubules.

      (7) Finally, the authors argue that Tau<sup>V377M</sup> forms aggregates in the larval brain based on large puncta observed especially upon loss of Pfdn5. This may be so, but protocols are available to validate molecularly the presence of insoluble Tau aggregates (for example, pmid: 36868851) or soluble Tau oligomers, as these apparently affect Tau toxicity differently. Does Pfdn5 loss exaggerate the toxic oligomers, and does overexpression promote the more benign large aggregates?

      We have performed additional experiments to analyze the nature of these aggregates using 1,6-hexanediol (1,6-HD), which can dissolve Tau aggregate seeds formed by Tau droplets but cannot dissolve stable Tau aggregates (WEGMANN et al. 2018). We observed that 5% 1,6-HD failed to dissolve these Tau aggregates (Figure S8), demonstrating the formation of stable, filamentous, flame-shaped NFT-like aggregates in the absence of Pfdn5 (Figure 5D and Figure S9).

      Reviewer #2 (Public review):

      Bisht et al detail a novel interaction between the chaperone Prefoldin 5, microtubules, and tau-mediated neurodegeneration, with potential relevance for Alzheimer's disease and other tauopathies. Using Drosophila, the study shows that Pfdn5 is a microtubule-associated protein, which regulates tubulin monomer levels and can stabilize microtubule filaments in the axons of peripheral nerves. The work further suggests that Pfdn5/6 may antagonize Tau aggregation and neurotoxicity. While the overall findings may be of interest to those investigating the axonal and synaptic cytoskeleton, the detailed mechanisms for the observed phenotypes remain unresolved and the translational relevance for tauopathy pathogenesis is yet to be established. Further, a number of key controls and important experiments are missing that are needed to fully interpret the findings.

      The strength of this study is the data showing that Pfdn5 localizes to axonal microtubules and the loss-of-function phenotypic analysis revealing disrupted synaptic bouton morphology. The major weakness relates to the experiments and claims of interactions with Tau-mediated neurodegeneration. 

      In particular, it is unclear whether knockdown of Pfdn5 may cause eye phenotypes independent of Tau. 

      Our new experiments confirm that knockdown of Pfdn5 alone does not cause eye phenotypes.

      Further, the GMR>tau phenotype appears to have been incorrectly utilized to examine age-dependent neurodegeneration.

      In response, we have modulated and explained our conclusions in this regard as described later in our “rebuttal.”

      This manuscript argues that its findings may be relevant to thinking about mechanisms and therapies applicable to tauopathies; however, this is premature given that many questions remain about the interactions from Drosophila, the detailed mechanisms remain unresolved, and absent evidence that Tau and Pfdn may similarly interact in the mammalian neuronal context. Therefore, this work would be strongly enhanced by experiments in human or murine neuronal culture or supportive evidence from analyses of human data.

      The reviewer is correct that the impact would be greater if Pfdn5-Tau interactions were also examined in human tissue. While we have not attempted these experiments ourselves, we hope that our observations will stimulate others to test the conservation of the phenomena we describe. There are, however, several lines of circumstantial evidence from human Alzheimer's disease datasets that implicate PFDN5 in disease pathology. For example, recent compilations and analyses of proteomic data show reductions of CCT components, TBCE, as well as Prefoldin subunits, including PFDN5, in AD tissue (HSIEH et al. 2019; TAO et al. 2020; JI et al. 2022; ASKENAZI et al. 2023; LEITNER et al. 2024; SUN et al. 2024). Furthermore, whole blood mRNA expression data from Alzheimer's patients revealed downregulation of the PFDN5 transcript (JI et al. 2022). Together, these findings from human data are consistent with a role for PFDN5 in suppressing diverse neurodegenerative processes. We have incorporated these points into the discussion section of the revised manuscript.

      Reviewer #1 (Recommendations for the authors):

      See public review for experimental recommendations focusing on the Tau-Pfdn interactions. I would refrain from using the word aggregates and would call them puncta, unless there is molecular or visual (i.e., AFM) evidence that they are indeed insoluble aggregates. Finally, although including the full genotypes written out below the axis in the bar graphs is appreciated, it nevertheless makes them difficult to read due to crowding in most cases and is somewhat distracting from the figure.

      In my opinion, a more reader-friendly manner of reporting the phenotypes will be highly helpful. For example, listing each component of the genotype on the left of each bar graph and adding a cross or a filled circle under the bar to inform of the full genotype of the animals used.

      As described in the response to the previous comment, we now have strong direct evidence supporting our view that the observed puncta are stable Tau aggregates. Thus, we feel justified in using the term Tau aggregates in preference to Tau puncta.

      We have tried to write the genotypes to make them more reader-friendly.

      Reviewer #2 (Recommendations for the authors):

      (1) Lines 119-121: 35 modifiers from 64 seem like an unusually high hit rate. Are these individual genes or lines? Were all modifiers supported by at least 2 independent RNAi strains targeting non-overlapping sequences? A supplemental table should be included detailing all genes and specific strains tested, with corresponding results.

      We agree with the reviewer that 35 modifiers from 64 genes is an unusually high hit rate. However, since the genes knocked down in this study are chaperones that are crucial for maintaining proteostasis, a high hit rate may be expected. The information related to individual genes and lines is provided in Supplemental Table 1. We have now included an additional Supplemental Table 3, which lists the genes and the RNAi lines used in Figure 1, detailing the sequence target information. The table also specifies the number of independent RNAi strains used and the corresponding results.

      (2) Figure 1: The authors quantify the areas of ommatidial fusion and necrosis as degeneration, but it is difficult to appreciate the aberrations in the photos provided. Was any consideration given to also quantifying eye size?

      We have processed the images to enhance their contrast and make the aberrations clearer. The percentage of degenerated eye area (Figure 1M) was normalized with total eye area. The method for quantifying degenerated area has been explained in the materials and methods section.

      (3) Figure 1: (a) Only enhancers of rough eyes are shown, but no controls are included to evaluate whether knockdown of these genes causes eye toxicity in the absence of Tau. These are important missing controls. All putative Tau enhancers, including Pfdn5/6, need to be tested with GMR-GAL4 independently of Tau to determine whether they cause a rough eye. In a previous publication from some of the same investigators (Raut et al. 2017), knockdown of Pfdn using ey-GAL4 was shown to induce severe eye morphology defects; this raises questions about the results shown here.

      We agree that assessing the effects of HSP knockdown independent of Tau is essential to confirm modifier specificity. We have now performed these knockdowns, and the data are reported in Supplemental Table 1. For RNAi lines represented in Figure 1, which enhanced Tau-induced degeneration/eye developmental defect, except for one of the RNAi lines against Pfdn6 (GD34204), no detectable eye defects were observed when knocked down with GMR-Gal4 at 25°C, suggesting that enhancement is specific to the Tau background. 

      The use of the more eye-specific GMR-Gal4 driver at 25°C here, versus the more broadly expressed ey-Gal4 at 29°C in prior work (Raut et al. 2017), likely accounts for the differences in the eye morphological defects.

      (b) Besides RNAi, do the classical Pfdn5 deletion alleles included in this work also enhance the tau rough eye when heterozygous? Please also consider moving the Pfdn5/6 overexpression studies to evaluate possible suppression of the Tau rough eye to Figure 1, as it would enhance the interpretation of these data (but see also below).

      GMR-Gal4-driven expression of hTau<sup>V337M</sup> or hTau<sup>WT</sup> in a Pfdn5 heterozygous background does not enhance the rough eye phenotype.

      (4) For genes of special interest, such as Pfdn5, and other genes mentioned in the results, the main figures, or the discussion, it is also important to perform quantitative PCR to confirm that the RNAi lines used actually knock down mRNA expression, and by how much. These studies will establish specificity.

      We agree that quantitative PCR (qPCR) is essential for validating RNAi knockdown efficiency. We have now included qPCR data for key modifiers, confirming effective knockdown (Figure S2).

      (5) Lines 235-238: how do you conclude whether the tau phenotype is "enhanced" when Pfdn5 causes a similar phenotype on its own? Could the combination simply be additive? Did overexpression of Pfdn5 suppress the UAS-hTau NMJ bouton phenotype (see below)?

      Although Pfdn5 mutants and hTau expression individually increase satellite boutons, their combination leads to a significantly more severe phenotype with additional defects, such as significantly decreased bouton size and increased bouton number, indicating an enhancing rather than purely additive interaction (Figure 4 and Figure S6C). Moreover, we now show that overexpression of Pfdn5 significantly suppresses the hTau<sup>V337M</sup>-induced NMJ phenotypes. These new data have been incorporated as Figure S11F-L in the revised manuscript.

      Alternatively, did the authors consider reducing fly tau in the Pfdn5 mutant background?

      In additional new experiments, we observe that double mutants for Drosophila Tau (dTau) and Pfdn5 also exhibit severe NMJ defects, suggesting a genetic interaction between dTau and Pfdn5. These data are shown below for the reviewer.

      Author response image 1.

      A double mutant combination of dTau and Pfdn5 aggravates the synaptic defects at the Drosophila NMJ. (A-D') Confocal images of NMJ synapses at muscle 4 of A2 hemisegment showing synaptic morphology in (A-A') control, (B-B') ΔPfdn5<SUP>15/40</SUP>, (C-C') dTauKO/dTauKO (Drosophila Tau mutant), (D-D') dTauKO/dTauKO; ∆Pfdn5<SUP>15/40</SUP> double immunolabeled for HRP (green), and CSP (magenta). The scale bar in D for (A-D') represents 10 µm. 

      (6) It may be important to further extend the investigation to the actin cytoskeleton. It is noted that Pfdn5 also stabilizes actin. Importantly, tau-mediated neurodegeneration in Drosophila also disrupts the actin cytoskeleton, and many other regulators of actin modify tau phenotypes.

      We appreciate the suggestion to examine the actin cytoskeleton. While prior studies indicate that Pfdn5 might regulate the actin cytoskeleton and that Tau<sup>V337M</sup> hyperstabilizes the actin cytoskeleton, we did not observe altered actin levels in Pfdn5 mutants (Figure 2G). However, actin dynamics may represent an additional mechanism through which Pfdn5 could influence Tau pathology. Future work will address potential actin-related mechanisms in tauopathy.

      (7) Figure 2: in the provided images, it is difficult to appreciate the Futsch loops. Please include an image with increased magnification. It appears that fly strains harboring a genomic rescue BAC construct are available for Pfdn; this would be a complementary reagent to test besides Pfdn overexpression.

      We have updated Figure 2 to include high magnification NMJ images as insets, clearly showing the Futsch loops. While we have not yet tested a genomic rescue BAC construct for Pfdn5, we plan to use the fly line harboring this construct in future work.

      (8) Figure 3: Some of the data are not adequately explained. The use of Ran as a loading control seems rather unusual. What is the justification? Pfdn appears to only partially colocalize with α-tubulin in the axon; can the authors discuss or explain this? Further, in Pfdn5 mutants, there appears to be a loss of α-tubulin staining (3b'); this should also be discussed.

      We appreciate the reviewer's concern regarding the choice of loading control for our Western blot analysis. Importantly, since Tubulin levels and related pathways were the focus of our analysis, traditional loading controls such as α- or β-tubulin or actin were deemed unsuitable due to potential co-regulation. Ran, a nuclear GTPase involved in nucleocytoplasmic transport, is not known to be transcriptionally or post-translationally regulated by Tubulin-associated signaling pathways. To ensure its reliability as a loading control, we confirmed by densitometric analysis that Ran expression showed minimal variability across all samples. Hence, we used Ran for accurate normalization in the Western blot data represented in this manuscript. We have also used GAPDH as a loading control and found no difference with respect to Ran as a loading control across samples.

      We appreciate the reviewer's comment regarding the interpretation of our Pearson's correlation coefficient (PCC) results. While the mean colocalization value of 0.6 represents a moderate positive correlation (MUKAKA 2012), which may not reach the conventional threshold for "high positive" colocalization (usually considered 0.7-0.9), it nonetheless indicates substantial spatial overlap between the proteins of interest. Importantly, colocalization analysis provides supportive but indirect evidence for molecular proximity.  To further validate the interaction, we performed a microtubule binding assay, which directly demonstrates the binding of Pfdn5 to stabilized microtubules.
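      To make the colocalization arithmetic concrete, a minimal pure-Python sketch follows. It assumes the two fluorescence channels (for example, Pfdn5 and α-tubulin) are supplied as equal-length lists of pixel intensities from the same region of interest; the function names, inputs, and lower-inclusive handling of band edges are illustrative assumptions, not the pipeline used for the manuscript. The banding follows the rule-of-thumb intervals in MUKAKA (2012).

```python
import math
from typing import Sequence


def pearson_r(ch1: Sequence[float], ch2: Sequence[float]) -> float:
    """Pixel-wise Pearson correlation between two equal-length intensity lists."""
    n = len(ch1)
    if n != len(ch2) or n < 2:
        raise ValueError("channels must have the same length (at least 2 pixels)")
    mean1 = sum(ch1) / n
    mean2 = sum(ch2) / n
    # Co-deviation and squared deviations around each channel's mean intensity
    sxy = sum((a - mean1) * (b - mean2) for a, b in zip(ch1, ch2))
    sxx = sum((a - mean1) ** 2 for a in ch1)
    syy = sum((b - mean2) ** 2 for b in ch2)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a channel has constant intensity")
    return sxy / math.sqrt(sxx * syy)


def mukaka_band(r: float) -> str:
    """Label r using the rule-of-thumb bands of Mukaka (2012).

    Assumption (hypothetical choice): band edges are lower-inclusive,
    so exactly 0.70 is labelled "high".
    """
    magnitude = abs(r)
    sign = "positive" if r >= 0 else "negative"
    if magnitude >= 0.9:
        return f"very high {sign}"
    if magnitude >= 0.7:
        return f"high {sign}"
    if magnitude >= 0.5:
        return f"moderate {sign}"
    if magnitude >= 0.3:
        return f"low {sign}"
    return "negligible"


if __name__ == "__main__":
    # A mean PCC of 0.6, as reported above, falls in the "moderate positive" band
    print(mukaka_band(0.6))
```

      Because PCC is computed over all pixels in the region, partial spatial overlap between two signals lowers r even when the proteins do associate where they overlap, which is one reason an orthogonal binding assay is a useful complement.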

      In accordance with the western blot analysis shown in Figure 2G-I, the levels of Tubulin are reduced in the Pfdn5 mutants (Figure 3B''). We have incorporated and discussed this in the revised manuscript.

      (9) Figure 4: Overexpression of Pfdn appears to rescue the supernumerary satellite bouton numbers induced by human Tau; however, interpretation of this experiment is somewhat complicated as it is performed in a Pfdn mutant genetic background. Can overexpression of Pfdn on its own rescue the Tau bouton defect in an otherwise wildtype background?

      We have now coexpressed Pfdn5 and hTau<SUP>V337M</SUP> in an otherwise wild-type background. As shown in Figure S11F-L, Pfdn5 overexpression suppresses Tau-induced bouton defects. We have incorporated the data in the Results section to support the role of Pfdn5 as a modifier of Tau toxicity.

      (10) Lines 256-263 / Figure 5: (a) What exactly are these tau-positive structures (puncta) being stained in larval brains in Fig 5C-E? Most prior work on tau aggregation using Drosophila models has been done in the adult brain, and human wildtype or mutant Tau is not known to form significant numbers of aggregates in neurons (although aggregates have been described following glial tau expression).

      Therefore, the results need to be further clarified. Besides the provided schematic, a zoomed-out image showing the whole larval brain is needed here for orientation. Have these aggregates been previously characterized in the literature? 

      We agree with the reviewer that the expression of the wildtype or mutant form of human Tau in Drosophila is not known to form aggregates in the larval brain, in contrast to the adult brain (JACKSON et al. 2002; OKENVE-RAMOS et al. 2024). Consistent with previous reports, we also observed that Tau expression on its own does not form aggregates in the Drosophila larval brain.

      However, in the absence of Pfdn5, microtubule disruption is severe, leading to reduced Tau-microtubule binding and the formation of globular/round or flame-shaped, tangle-like aggregates in the larval brain. Previous studies have reported that 1,6-hexanediol can dissolve Tau aggregate seeds formed by Tau droplets but cannot dissolve stable Tau aggregates (WEGMANN et al. 2018). We observed that 5% 1,6-hexanediol failed to dissolve these Tau puncta, demonstrating the formation of stable aggregates in the absence of Pfdn5. Additionally, we have now performed a Tau solubility assay and show that, in the absence of Pfdn5, a significant amount of Tau shifts to the pellet fraction; on western blots, this pelleted Tau was detected by the total hTau antibody (D5D8N) but not by the phospho-specific AT8 Tau antibody (targeting pSer202/pThr205) (Figure S8). These data further reinforce our conclusion that Pfdn5 prevents the transition of hTau from a soluble and/or microtubule-associated state to an aggregated, insoluble, and pathogenic state. These new data have been incorporated into the revised manuscript.

      (b) Can additional markers (nuclei, cell membrane, etc.) be used to highlight whether the tau-positive structures are present in the cell body or at synapses?

      We performed co-staining of Tau and Elav to assess the localization of aggregated Tau. We found that in the presence of Pfdn5, Tau is predominantly cytoplasmic and localized to the cell body and axons. In the absence of Pfdn5, Tau forms aggregates but is still localized to the cell body or axons. However, some of the aggregates are very large, and their subcellular localization could not be determined (Figure S8M-N'). These might represent brain regions of possible nuclear breakdown and cell death (JACKSON et al. 2002).

      (c) It would also be helpful to perform western blots from larval (and adult) brains examining tau protein levels, phospho-tau species, possible higher-molecular weight oligomeric forms, and insoluble vs. soluble species. These studies would be especially important to help interpret the potential mechanisms of observed interactions.

      Western blot analysis revealed that overexpression of Pfdn5 does not alter total Tau levels (Figure 6O). In Pfdn5 mutants, however, hTau<sup>V337M</sup> levels were reduced in the supernatant fraction and increased in the pellet fraction, indicating a shift from soluble monomeric Tau to aggregated Tau.

      (d) Does overexpression of Pfdn5 (UAS-Pfdn5) suppress the formation of tau aggregates? I would therefore recommend that additional experiments be performed looking at adult flies (perhaps in Pfdn5 heterozygotes or using RNAi due to the larval lethality of Pfdn5 null animals).

      Overexpression of Pfdn5 (Elav-Gal4/UAS-Tau<sup>V337M</sup>; UAS-Pfdn5; ΔPfdn5<sup>15/40</sup>) significantly reduced the Tau aggregates observed in Pfdn5 mutants (Figure 5E). Coexpression of Pfdn5 and hTau<sup>V337M</sup> also suppresses the Tau aggregates/puncta in the 30-day-old adult brain. Since heterozygous ΔPfdn5<sup>15</sup>/+ animals did not show a reduction in Pfdn5 levels, we did not test the suppression of Tau aggregates in ΔPfdn5<sup>15</sup>/+; Elav>UAS-Pfdn5, UAS-Tau<sup>V337M</sup>.

      (11) Figure 6, panels A-N: The GMR>Tau rough eye is not a "neurodegenerative" but rather a predominantly developmental phenotype. It results from aberrant retinal developmental patterning and the subsequent secretion/formation of the overlying eye cuticle (lenslets). I am confused by the data shown suggesting a "shrinking eye size" and increasing roughened surface over time (a GMR>tau eye similar to that shown in panel B cannot change to appear like the one in panel H with aging). The rough eye can be quite variable among a population of animals, but it is usually fixed at the time the adult fly ecloses from the pupal case, and quite stable over time in an individual animal. Therefore, any suppression of the Tau rough eye seen at 30 days should be appreciable as soon as the animals eclose. These results need to be clarified. If indeed there is robust suppression of Tau rough eye, it may be more intuitive and clearer to include these data with Figure 1, when first showing the loss-of-function enhancement of the Tau rough eye. Also, why is Pfdn6 included in these experiments but not in the studies shown in Figures 2-5?

      We thank the reviewer for their careful and knowledgeable assessment of the GMR>Tau rough eye model. We appreciate the clarification that the rough eye phenotype could be “developmental” rather than “neurodegenerative.” Our initial observations regarding "shrinking eye size" and "increased surface roughness" clearly show an age-related progression of structural change. Such progression has been observed and reported by others (IIJIMA-ANDO et al. 2012; PASSARELLA AND GOEDERT 2018). We observed an age-dependent increase in the number of fused ommatidia in GMR-Gal4>Tau flies, which was rescued by Pfdn5 or Pfdn6 expression. We noted that adult-specific induction of hTau<sup>V337M</sup> using the Gal80<sup>ts</sup> and GMR-GeneSwitch (GMR-GS) systems was not sufficient to induce a significant eye phenotype; thus, early expression of Tau in the developing eye imaginal disc appears to be required for the adult progressive phenotype that we observe. We feel that “developmental” does not adequately describe this adult progressive phenotype, although whether it can be termed “degenerative” is admittedly arguable.

      To address neurodegeneration more directly, we focused on 30-day-old adult fly brains and demonstrated that Pfdn5 overexpression suppresses age-dependent Tau-induced neurodegeneration in the central nervous system (Figure 6H-N and Figure S12). This supports our central conclusion regarding the neuroprotective role of Pfdn5 in age-associated Tau pathology. Since we found an enhancement in the Tau-induced synaptic and eye phenotypes by Pfdn6 knockdown, we also generated CRISPR/Cas9-mediated loss-of-function mutants for Pfdn6. However, loss of Pfdn6 resulted in embryonic/early first instar lethality, which precluded its detailed analysis at the larval stages.

      (12) Figure 6, panels O-T: the elav>tau image appears to show a different frontal section plane compared to the other panels. It is advisable to show images at a similar level in all panels since vacuolar pathology can vary by region. It is also useful to be able to see the entire brain at a lower power, but the higher power inset view is obscuring these images. I would recommend creating separate panels rather than showing them as insets.

      In the revised figure, we now display the low- and high-magnification images as separate, clearly labeled panels instead of using insets. This improves visibility of the brain morphology while providing detailed views of the vacuolar pathology (Figure 6H-L).

      (13) Figure 6/7: For the experiments in which Pfdn5/6 is overexpressed and possibly suppresses tau phenotypes (brain vacuoles and memory), it is important to use controls that normalize the number of UAS binding sites, since increased UAS sites may dilute GAL4 and reduce Tau expression levels/toxicity. Therefore, it would be advisable to compare with Elav>Tau flies that also include a chromosome with an empty UAS site or other transgenes, such as UAS-GFP or UAS-lacZ.

      We thank the reviewer for the suggestion. We have now incorporated proper controls in the brain vacuolization, mushroom body, and ommatidial fusion rescue experiments. We have also independently verified whether Gal4 dilution has any effect on the Tau phenotypes (Figure 6H-L, Figure 7, and Figure S11A-B).

      (14) Lines 311-312: the authors say vacuolization occurs in human neurodegenerative disease, which is not really true to my knowledge and definitely not stated in the citation they use. Please re-phrase.

      We have now made the appropriate changes in the revised manuscript.

      (15) Figure 7: The authors claim that Pfdn5/6 expression does not impact memory behavior, but there in fact appears to be a decrease in preference index (panel D vs panel B). Does this result complicate the interpretation of the potential interaction with Tau (panel F)? Are data from wildtype control flies available?

      In our memory assay, a decrease in the performance index (PI) of trained flies compared with naïve flies indicates memory formation (normal memory in control flies, Figure 7B). In contrast, the lack of a significant difference in PI indicates a memory defect (Figure 7C: hTau<sup>V337M</sup>-overexpressing flies). The "decrease in preference index (panel D vs panel B)" is not a sign of a memory defect; if anything, it may be interpreted as better memory. Hence, neuronal overexpression of Pfdn5 (Figure 7D) or Pfdn6 (Figure 7E) in wildtype neurons does not cause memory deficits. In addition, coexpression of Pfdn5/6 and hTau<sup>V337M</sup> rescues the Tau-induced memory defect (significant drop in PI compared with the PI of naïve flies in Figure 7F-G). Moreover, the almost complete rescue of the Tau-induced mushroom body defect upon Pfdn5 or Pfdn6 expression further supports an interaction between Pfdn5/6 and Tau. These data have been incorporated into the revised manuscript.

      The memory assay itself, with extensive data on wildtype flies and various other genotypes, will shortly be submitted for publication in another manuscript (Majumder et al., manuscript in preparation). However, we can confirm for the reviewer that wildtype flies, trained and assayed by the protocol described, show a significant decrease in performance index compared to the naïve flies, indicative of strong learning and memory performance, very similar to the control genotype data shown in Figure 7B.

      Additional minor considerations

      (16) Lines 50-52: there are many therapeutic interventions for treating tauopathies, but not curative or particularly effective ones.

      We have now made the appropriate changes in the revised manuscript.

      (17) Lines 87-106 seem like a duplication of the abstract. Consider deleting or condensing.

      We have made the appropriate changes in the revised manuscript.

      (18) Where is pfdn5 expressed? Development v. adult? Neuron v. glia? Conservation?

      Prefoldin5 is expressed throughout development but strongly localized to the larval trachea and neuronal axons. Drosophila Pfdn5 shows 35% overall identity with human PFDN5. 

      (19) Line 187: is pfdn5 truly "novel"?

      The role of Pfdn5 as microtubule-binding and stabilizing is a new finding and has not been predicted or described before. Hence, it is a novel neuronal microtubule-associated protein.  

      (20) Figure 5, panel F, genotype labels on the x-axis are confusing; consider simplifying to Control, ΔPfdn, and Rescue.

      We have made appropriate changes in the figure for better readability.

      (21) Figures 5/8: it might be preferable to use consistent colors for Tau/HRP--Tau is labeled green in Figure 5 and then purple in Figure 8.

      We have made these changes where possible. 

      (22) Lines 311-312: Vacuolar neuropathology is NOT typically observed in human Tauopathy.

      We thank the reviewer for pointing this out. We have made the appropriate changes in the revised manuscript.

      (23) Lines 328-349: The explanation could be made more clear. Naïve flies should not necessarily be called controls. Also, a more detailed explanation of how the preference index is computed would be helpful. Why are some datapoints negative values?

      (a) We have rewritten this paragraph to make the description and explanation clearer. The detailed method and formula to calculate the Preference index have been incorporated in the Materials and Methods section.

      (b) We have replaced the term Control with Naïve. 

      (c) Datapoints with negative values appeared in some of the 'Trained' groups. They indicate that, after CuSO<sub>4</sub> training, some groups showed repulsion towards the otherwise attractive odor 2,3B. As 2,3B is an attractive odorant, naïve or control flies are attracted to it relative to air, which is evident from a higher number of flies in the Odor arm (O) than in the Air arm (A) of the Y-maze; thus, the PI = [(O-A)/(O+A)]*100 is positive for naïve fly groups. Training associated the attractive odorant with bitter food, decreasing attraction and, in a few instances, even producing repulsion towards the odorant, so that fewer flies entered the odor arm than the air arm. In such instances (O-A) is negative, and hence so is the PI. Thus, negative values are not an anomaly but indicate strong learning.
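      The PI arithmetic described above can be sketched in a few lines. This is a minimal illustration, assuming O and A are the raw fly counts in the odor and air arms of the Y-maze for a single trial; the function name and the example counts are hypothetical and are not data from the study.

```python
def preference_index(odor_arm: int, air_arm: int) -> float:
    """PI = ((O - A) / (O + A)) * 100 for one Y-maze trial (hypothetical helper)."""
    total = odor_arm + air_arm
    if total == 0:
        raise ValueError("no flies entered either arm")
    # Multiply before dividing so integer counts give exact results where possible
    return 100 * (odor_arm - air_arm) / total


if __name__ == "__main__":
    # Hypothetical naive group: more flies choose the attractive odor, so PI > 0
    print(preference_index(30, 10))  # 50.0
    # Hypothetical trained group: odor now avoided, so (O - A) < 0 and PI < 0
    print(preference_index(12, 18))  # -20.0
```

      The PI is bounded between -100 (all flies in the air arm) and +100 (all flies in the odor arm), so a negative value simply reflects a majority choosing air after training.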

      (24) Line 403: misspelling "Pdfn"

      We have corrected this.

      (25) Lines 423-425: recommend re-phrasing, since tauopathies are human diseases. Mice and other animal models may be susceptible to tau-mediated neuronal dysfunction but not tauopathy per se.

      We have made the appropriate changes in the revised manuscript.

      (26) Lines 468-469: "tau neuropathology" rather than "tau associated neuropathies".

      We have made the appropriate changes in the revised manuscript. 

      References

      Askenazi, M., T. Kavanagh, G. Pires, B. Ueberheide, T. Wisniewski et al., 2023 Compilation of reported protein changes in the brain in Alzheimer's disease. Nat Commun 14: 4466.

      Hsieh, Y. C., C. Guo, H. K. Yalamanchili, M. Abreha, R. Al-Ouran et al., 2019 Tau-Mediated Disruption of the Spliceosome Triggers Cryptic RNA Splicing and Neurodegeneration in Alzheimer's Disease. Cell Rep 29: 301-316 e310.

      Iijima-Ando, K., M. Sekiya, A. Maruko-Otake, Y. Ohtake, E. Suzuki et al., 2012 Loss of axonal mitochondria promotes tau-mediated neurodegeneration and Alzheimer's disease-related tau phosphorylation via PAR-1. PLoS Genet 8: e1002918.

      Jackson, G. R., M. Wiedau-Pazos, T. K. Sang, N. Wagle, C. A. Brown et al., 2002 Human wildtype tau interacts with wingless pathway components and produces neurofibrillary pathology in Drosophila. Neuron 34: 509-519.

      Ji, W., K. An, C. Wang and S. Wang, 2022 Bioinformatics analysis of diagnostic biomarkers for Alzheimer's disease in peripheral blood based on sex differences and support vector machine algorithm. Hereditas 159: 38.

      Leitner, D., G. Pires, T. Kavanagh, E. Kanshin, M. Askenazi et al., 2024 Similar brain proteomic signatures in Alzheimer's disease and epilepsy. Acta Neuropathol 147: 27.

      Li, L., Y. Jiang, G. Wu, Y. A. R. Mahaman, D. Ke et al., 2022 Phosphorylation of Truncated Tau Promotes Abnormal Native Tau Pathology and Neurodegeneration. Mol Neurobiol 59: 6183-6199.

      Mershin, A., E. Pavlopoulos, O. Fitch, B. C. Braden, D. V. Nanopoulos et al., 2004 Learning and memory deficits upon TAU accumulation in Drosophila mushroom body neurons. Learn Mem 11: 277-287.

      Mukaka, M. M., 2012 Statistics corner: A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71.

      Okenve-Ramos, P., R. Gosling, M. Chojnowska-Monga, K. Gupta, S. Shields et al., 2024 Neuronal ageing is promoted by the decay of the microtubule cytoskeleton. PLoS Biol 22: e3002504.

      Passarella, D., and M. Goedert, 2018 Beta-sheet assembly of Tau and neurodegeneration in Drosophila melanogaster. Neurobiol Aging 72: 98-105.

      Sun, Z., J. S. Kwon, Y. Ren, S. Chen, C. K. Walker et al., 2024 Modeling late-onset Alzheimer's disease neuropathology via direct neuronal reprogramming. Science 385: adl2992.

      Tao, Y., Y. Han, L. Yu, Q. Wang, S. X. Leng et al., 2020 The Predicted Key Molecules, Functions, and Pathways That Bridge Mild Cognitive Impairment (MCI) and Alzheimer's Disease (AD). Front Neurol 11: 233.

      Wegmann, S., B. Eftekharzadeh, K. Tepper, K. M. Zoltowska, R. E. Bennett et al., 2018 Tau protein liquid-liquid phase separation can initiate tau aggregation. EMBO J 37.

    1. eLife Assessment

      In this valuable study, the authors used an elegant genetic approach to delete EED at the post-neural crest induction stage. The usage of the single-cell RNA-seq analysis method is extremely suitable to determine changes in the cell type-specific gene expression during development. Results backed by solid evidence demonstrate that Eed is required for craniofacial osteoblast differentiation and mesenchymal proliferation after the induction of the neural crest.

    2. Reviewer #2 (Public review):

      Summary:

      The role of PRC2 after neural crest induction was not well understood. This work developed an elegant mouse genetic system to conditionally deplete EED upon SOX10 activation. Substantial developmental defects were identified for craniofacial and bone development. The authors also performed extensive single-cell RNA sequencing to analyze differentiation gene expression changes upon conditional EED disruption.

      Strengths:

      (1) Elegant genetic system to ablate EED after neural crest induction.

      (2) Single-cell RNA-seq analysis is extremely suitable for studying the cell type specific gene expression changes in developmental systems.

      Original Weaknesses:

      (1) Although this study is well designed and contains state-of-the-art single-cell RNA-seq analysis, it lacks mechanistic depth regarding EED/PRC2-mediated epigenetic repression. This is largely because no epigenomic data were shown.

      (2) The mouse model of conditional loss of EZH2 in neural crest has been previously reported, as the authors pointed out in the discussion. What is the novelty of disrupting EED in this study? Perhaps a more detailed comparison of the two mouse models would be beneficial.

      (3) The presentation of the single-cell RNA-seq data may need improvement. The complexity of the many cell types blurs the importance of which cell types are affected the most by EED disruption.

      (4) While it's easy to identify PRC2/EED target genes using published epigenomic data, it would be nice to tease out the direct versus indirect effects in the gene expression changes (e.g., Fig. 4e).

      Comments on latest version:

      The authors have addressed weaknesses 2 and 3 of my previous comments very well. For weaknesses 1 and 4, the authors have added a main Fig. 5 and its associated supplemental materials, which definitely strengthen the mechanistic depth of the story. However, I think the audience would appreciate it if the following questions/points regarding the CUT&Tag data (mostly related to main Figure 5) could be further addressed:

      (1) The authors described that Sox10-Cre would be expressed at E8.75, and in theory, EED-FL would be ablated soon after that. Why would E16.5 exhibit a much smaller loss in H3K27me3 than E12.5? Shouldn't a prolonged loss of EED lead to even worse consequences?

      (2) The gene expression change at E12.5 upon loss of EED (shown in Fig. 4h) seems to be massive, including many PRC2-target genes. However, the H3K27me3 alteration seems to be mild even at E12.5. Does this imply a PRC2- or H3K27 methylation-independent role of EED? To address this, I suggest the authors revisit my previously raised weakness #4 regarding the correlation between RNA-seq and CUT&Tag changes, for example, with a gene-level scatter plot of RNA-seq changes (x-axis) versus H3K27me3 level changes (y-axis).

      (3) The CUT&Tag experiments seem to include replicates according to the figure legend, but no statistical analysis was presented, including in the new supplemental tables. Also, for Fig. 5c-d, instead of showing the MRRs in individual conditions, I think the audience would really want to know the differential MRRs between Fl/WT and Fl/Fl. In other words, how many genes/MRRs have statistically lower H3K27me3 levels upon EED loss?
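      The analyses requested in points (2) and (3) could be prototyped along the lines of the minimal Python sketch below. It assumes per-gene log2 fold changes (Fl/Fl vs Fl/WT) for expression and for H3K27me3 signal, and per-region p-values from a replicate-aware differential test (for example, the count models in DESeq2 or edgeR). Those inputs, the function names, and all numbers are hypothetical placeholders, not values from the study; Spearman correlation and Benjamini-Hochberg false discovery rate control are choices made here for illustration, not methods the reviewer specifies.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def count_reduced_regions(regions, alpha=0.05):
    """Regions with FDR < alpha AND lower H3K27me3 in Fl/Fl than in Fl/WT."""
    qvalues = benjamini_hochberg([r["p"] for r in regions])
    return sum(1 for r, q in zip(regions, qvalues) if q < alpha and r["log2fc"] < 0)

# Hypothetical placeholder values, NOT results from the study.
# gene -> (RNA log2FC Fl/Fl vs Fl/WT, H3K27me3 log2FC Fl/Fl vs Fl/WT)
genes = {"gene_a": (2.1, -1.8), "gene_b": (0.2, -0.1),
         "gene_c": (1.5, -0.3), "gene_d": (-0.5, 0.4)}
rna = [v[0] for v in genes.values()]
k27 = [v[1] for v in genes.values()]
print("Spearman rho:", spearman(rna, k27))

# Hypothetical per-region results from a replicate-aware differential test
regions = [{"p": 0.01, "log2fc": -1.2}, {"p": 0.04, "log2fc": -0.8},
           {"p": 0.03, "log2fc": 0.5}, {"p": 0.2, "log2fc": -0.1}]
print("MRRs with significantly reduced H3K27me3:", count_reduced_regions(regions))
```

      A negative rank correlation, with up-regulated genes losing H3K27me3, would be consistent with direct PRC2 targets, whereas genes whose expression changes without a matching H3K27me3 change would be candidates for indirect effects.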

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The epigenetic regulatory complex PRC2 is essential for neural crest specification, and its misregulation has been shown to cause severe craniofacial defects. This study shows that Eed, a core PRC2 component, is critical for craniofacial osteoblast differentiation and mesenchymal proliferation after neural crest induction. Using mouse genetics and single-cell RNA sequencing, the researchers found that conditional knockout of Eed leads to significant craniofacial hypoplasia, impaired osteogenesis, and reduced proliferation of mesenchymal cells in post-migratory neural crest populations.

      Overall, the study is superficial and descriptive. No in-depth mechanism was analyzed and the phenotype analysis is not comprehensive.

      We thank the reviewer for sharing their expertise and for taking the time to provide helpful suggestions to improve our study. We are gratified that the striking phenotypes we report from Eed loss in post-migratory neural crest craniofacial tissues were appreciated. The breadth and depth of our phenotyping techniques, including skeletal staining, micro-CT, echocardiogram, immunofluorescence, histology, and primary craniofacial cell culture, provide comprehensive data in support of our hypothesis that PRC2 is required for epigenetic control of craniofacial osteoblast differentiation. To provide mechanistic data in support of this hypothesis, we have now performed CUT&Tag H3K27me3 chromatin profiling on nuclei harvested from E12.5 or E16.5 Sox10-Cre Eed^Fl/WT and Sox10-Cre Eed^Fl/Fl craniofacial tissue. These new data, which are presented in Fig. 5, Supplementary Fig. 9, and Supplementary Tables 7-10 of our revised manuscript, validate our hypothesis that epigenetic regulation of chromatin architecture downstream of PRC2 activity underlies craniofacial osteoblast differentiation. In particular, we now show that Eed-dependent H3K27 trimethylation is associated with correct temporal expression of transcription factors that are necessary for craniofacial differentiation and patterning, such as Msx1, Pitx1, and Pax7, which were initially nominated by single-cell RNA sequencing of E12.5 Sox10-Cre Eed^Fl/WT and Sox10-Cre Eed^Fl/Fl craniofacial tissues in Fig. 4, Supplementary Fig. 5-7, and Supplementary Tables 1-6.

      Reviewer #2 (Public review):

      Summary:

      The role of PRC2 in post-neural crest induction was not well understood. This work developed an elegant mouse genetic system to conditionally deplete EED upon SOX10 activation. Substantial developmental defects were identified for craniofacial and bone development. The authors also performed extensive single-cell RNA sequencing to analyze differentiation gene expression changes upon conditional EED disruption.

      Strengths:

      (1) Elegant genetic system to ablate EED post-neural crest induction.

      (2) Single-cell RNA-seq analysis is extremely suitable for studying the cell type-specific gene expression changes in developmental systems.

      We thank the reviewer for their generous and helpful comments on our study. We are happy that our mouse genetic and single-cell RNA sequencing approaches were considered appropriate for pairing the craniofacial phenotypes we report with distinct gene expression changes in post-migratory neural crest tissues upon Eed deletion.

      Weaknesses:

      (1) Although this study is well designed and contains state-of-the-art single-cell RNA-seq analysis, it lacks mechanistic depth regarding EED/PRC2-mediated epigenetic repression. This is largely because no epigenomic data were shown.

      Thank you for this suggestion. As described in response to Reviewer #1, we have now performed CUT&Tag H3K27me3 chromatin profiling on nuclei harvested from E12.5 or E16.5 Sox10-Cre Eed^Fl/WT and Sox10-Cre Eed^Fl/Fl craniofacial tissues to provide mechanistic epigenomic data in support of our hypothesis that PRC2 is required for craniofacial osteoblast differentiation. These new data, which are presented in Fig. 5, Supplementary Fig. 9, and Supplementary Tables 7-10 of our revised manuscript, integrate genome-wide and targeted metaplot visualizations across genotypes with in-depth analyses of methylation-rich regions and genes associated with methylation-rich loci. Broadly, these new data reveal that changes in H3K27me3 occupancy correlate with gene expression changes from single-cell RNA sequencing of E12.5 Sox10-Cre Eed^Fl/WT and Sox10-Cre Eed^Fl/Fl craniofacial tissues in Fig. 4, Supplementary Fig. 5-7, and Supplementary Tables 1-6.

      (2) The mouse model of conditional loss of EZH2 in neural crest has been previously reported, as the authors pointed out in the discussion. What is novel in this study to disrupt EED? Perhaps a more detailed comparison of the two mouse models would be beneficial.

      We acknowledge and cite the study the reviewer has indicated (Schwarz et al. Development 2014) in our initial and revised manuscripts. This elegant investigation uses Wnt1-Cre to delete Ezh2 and reports a phenotype similar to the one we observed with Sox10-Cre deletion of Eed, but our study adds depth to the understanding of PRC2’s vital role in neural crest development by ablating Eed, which has a unique function in the PRC2 complex by binding to H3K27me3 and allosterically activating Ezh2. In this sense, our study sheds light on whether phenotypes arising from deletion of Eed, the PRC2 “reader”, differ from phenotypes arising from deletion of Ezh2, the PRC2 “writer”, in neural crest derived tissues. Moreover, we provide the first single-cell RNA sequencing and epigenomic investigations of craniofacial phenotypes arising from PRC2 activity in the developing neural crest. Due to limitations associated with the Wnt1-Cre transgene (Lewis et al. Developmental Biology 2013), which targets pre-migratory neural crest cells, our investigations used Sox10-Cre, which targets the migratory neural crest and is completely recombined by E10.5. We have included a detailed comparison of these mouse models in the Discussion section of our revised manuscript, and we thank the reviewer for this thoughtful suggestion.

      (3) The presentation of the single-cell RNA-seq data may need improvement. The complexity of the many cell types blurs the importance of which cell types are affected the most by EED disruption.

      We thank the reviewer for the opportunity to improve the presentation of our single-cell RNA sequencing data. In response, we have added Supplementary Fig. 8 to our revised manuscript, which shows the cell clusters most affected by EED disruption in UMAP space across genotypes. Because we wanted to capture the full diversity of cell types underlying the phenotypes we report, we did not sort Sox10+ cells (via FACS, for example) from craniofacial tissues before single-cell RNA sequencing. Our resulting single-cell RNA sequencing data are therefore inclusive of a diversity of cell types in UMAP space, and the prevalence of many of these cell types was unaffected by epigenetic disruption of neural crest derived tissues. The prevalence of the cell clusters that are most affected across genotypes and which are most relevant to our analyses of the developing neural crest are shown in Fig. 4c (and now also in Supplementary Fig. 8), including C0 (differentiating osteoblasts), C4 (mesenchymal stem cells), C5 (mesenchymal stem cells), and C7 (proliferating mesenchymal stem cells). Marker genes and pseudobulked differential expression analyses across these clusters are shown in Fig. 4d and Fig. 4e-h, respectively.

      (4) While it's easy to identify PRC2/EED target genes using published epigenomic data, it would be nice to tease out the direct versus indirect effects in the gene expression changes (e.g., Figure 4e).

      We agree with the reviewer that the single-cell RNA sequencing data in our initial submission do not provide insight into direct versus indirect changes in gene expression downstream of PRC2. In contrast, the CUT&Tag chromatin profiling data that we have generated for this revision provide mechanistic insight into H3K27me3 occupancy and direct effects on gene expression resulting from PRC2 inactivation in our mouse models.

      REVIEWING EDITOR COMMENTS

      The following are recommended as essential revisions

      (1) The study is overall superficial and primarily descriptive, lacking in-depth mechanistic analysis and comprehensive phenotype evaluation.

      Please see responses to Reviewer #1 and Reviewer #2 (weaknesses 1 and 4) above. 

      (2) The authors did not investigate the temporal and spatial expression of Eed during cranial neural crest development, which is crucial for explaining the observed phenotypes.

      The temporal and spatial expression of Eed during embryogenesis is well studied. Eed is ubiquitously expressed starting at E5.5, peaks at E9.5, and is downregulated but maintained at a high basal expression level through E18.5 (Schumacher et al. Nature 1996). Although comprehensive analysis of Eed expression in neural crest tissues has not been reported (to our knowledge), Eed physically and functionally interacts with Ezh2 (Sewalt et al. Mol Cell Biol 1998), which is enriched at a diversity of timepoints throughout all developing craniofacial tissues (Schwarz et al. Development 2014). In our study, we confirmed enrichment of Eed expression in craniofacial tissues throughout development using qPCR, and have provided a more detailed description of these published and new findings in the Discussion section of our revised manuscript.

      (3) There is no apoptosis analysis provided for any of the samples.

      We evaluated the presence of apoptotic cells in E12.5 craniofacial sections using immunofluorescence for Cleaved Caspase 3 in Supplementary Fig. 3d. Although we found a modest increase in the labeling index of apoptotic cells, there was insufficient evidence to conclude that apoptosis is a substantial factor in craniofacial hypoplasia resulting from Eed loss in post-migratory neural crest craniofacial tissues. We have clarified these findings in the Results and Discussion sections of our revised manuscript. 

      (4) As Eed is a core component of the PRC2 complex, were any other components altered in the Eed cKO mutant? How does Eed regulation influence osteogenic differentiation and proliferation through known pathways?

      We thank the editors for this thoughtful inquiry. Although we did not specifically investigate expression or stability of other PRC2 components in Eed conditional mutants, and little is known about how Eed regulates osteogenic differentiation or proliferation through any pathway, our single-cell RNA sequencing data presented in Fig. 4, Supplementary Fig. 5-7, and Supplementary Tables 1-6 provide a significant conceptual advance with mechanistic implications for understanding bone development downstream of Eed and do not reveal any alterations in the expression of other PRC2 components across genotypes. We have clarified these important details in the Discussion section of our revised manuscript. 

      (5) The authors may compare the Eed cKO phenotype with that of the previous EZH2 cKO mouse model since both Eed and EZH2 are essential subunits of PRC2.

      Please see responses to editorial comment 2 above and the last paragraph of the Discussion section of our revised manuscript for comparisons between Eed and Ezh2 knockout phenotypes.

    1. eLife Assessment

      This useful study explores the role of RAP2A in asymmetric cell division (ACD) regulation in glioblastoma stem cells (GSCs), drawing parallels to Drosophila ACD mechanisms and proposing that an imbalance toward symmetric divisions drives tumor progression. While findings on RAP2A's role in GSC expansion are promising, and the reviewers found the study innovative and technically solid, the study relies on neurosphere models without in vivo confirmation and will therefore need to be further validated in the future.

    2. Reviewer #1 (Public review):

      Summary:

      The authors validate the contribution of RAP2A to GB progression. RAP2A participates in asymmetric cell division and in the localization of several cell polarity markers, including Cno and Numb.

      Strengths:

      The use of human data, Drosophila models and cell culture or neurospheres is a good scenario to validate the hypothesis using complementary systems.

      Moreover, the mechanisms that determine GB progression, and in particular glioma stem cell biology, are relevant to our knowledge of glioblastoma and open new possibilities for future clinical strategies.

      Weaknesses:

      While the manuscript presents a well-supported investigation into RAP2A's role in GBM, some methodological aspects could benefit from further validation. The major concern is the reliance on a single GB cell line (GB5); including multiple GBM lines, particularly primary patient-derived 3D cultures with known stem-like properties, would significantly enhance the study's robustness.

      Several specific points raised in previous reviews have improved this version of the manuscript:

      • The specificity of Rap2l RNAi has been further confirmed by using several different RNAi tools.

      • Quantification of phenotypic penetrance and survival rates in Rap2l mutants would help determine the consistency of ACD defects. The authors have substantially increased the number of samples analyzed including three different RNAi lines (both the number of NB lineages and the number of different brains analyzed) to confirm the high penetrance of the phenotype.

      • The observations on neurosphere size and Ki-67 expression require normalization (e.g., Ki-67+ cells per total cell number or per neurosphere size). This is included in the manuscript and now clarified in the text.

      • The discrepancy in Figures 6A and 6B requires further discussion. The authors have included a new analysis and further explanations, and they conclude that in 2-cell neurospheres there are more cases of asymmetric division in the experimental condition (RAP2A) than in the control.

      • Live imaging of ACD events would provide more direct evidence. Live imaging was not done due to technical limitations. Although it would be a valuable addition to the manuscript, the current conclusions are supported by the existing data, and live-imaging experiments are dispensable.

      • Clarification of terminology and statistical markers (e.g., p-values) in Figure 1A would improve clarity. This has been improved.

      Comments on revisions:

      The manuscript has improved in clarity in general, and I think that it is suitable for publication. However, for future experiments and projects, I would like to insist on the relevance of validating the results in vivo using xenografts with 3D primary patient-derived cell lines or GB organoids.

    3. Reviewer #2 (Public review):

      This study investigates the role of RAP2A in regulating asymmetric cell division (ACD) in glioblastoma stem cells (GSCs), bridging insights from Drosophila ACD mechanisms to human tumor biology. Their focus on RAP2A, a human homolog of Drosophila Rap2l, as a novel ACD regulator in GBM is innovative, given its underexplored role in cancer stem cells (CSCs). The hypothesis that ACD imbalance (favoring symmetric divisions) drives GSC expansion and tumor progression introduces a fresh perspective on differentiation therapy. However, the dual role of ACD in tumor heterogeneity (potentially aiding therapy resistance) requires deeper discussion to clarify the study's unique contributions against existing controversies.

      Comments on revisions:

      More experiments as suggested in the original assessment of the submission are needed to justify the hypothesis drawn in the manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors validate the contribution of RAP2A to GB progression. RAP2A participates in asymmetric cell division and in the localization of several cell polarity markers, including Cno and Numb.

      Strengths:

      The use of human data, Drosophila models, and cell culture or neurospheres is a good scenario to validate the hypothesis using complementary systems.

      Moreover, the mechanisms that determine GB progression, and in particular glioma stem cell biology, are relevant to our knowledge of glioblastoma and open new possibilities for future clinical strategies.

      Weaknesses:

      While the manuscript presents a well-supported investigation into RAP2A's role in GBM, several methodological aspects require further validation. The major concern is the reliance on a single GB cell line (GB5), which limits the generalizability of the findings. Including multiple GBM lines, particularly primary patient-derived 3D cultures with known stem-like properties, would significantly enhance the study's relevance.

      Additionally, key mechanistic aspects remain underexplored. Further investigation into the conservation of the Rap2l-Cno/aPKC pathway in human cells through rescue experiments or protein interaction assays would be beneficial. Similarly, live imaging or lineage tracing would provide more direct evidence of ACD frequency, complementing the current indirect metrics (odd/even cell clusters, Numb asymmetry).

      Several specific points require attention:

      (1) The specificity of Rap2l RNAi needs further confirmation. Is Rap2l expressed in neuroblasts or intermediate neural progenitors? Can alternative validation methods be employed?

      There are no available antibodies/tools to determine whether Rap2l is expressed in NB lineages, nor have we been able to develop any. However, to further prove the specificity of the Rap2l phenotype, we have now analyzed two additional and independent RNAi lines of Rap2l along with the original RNAi line analyzed. We have validated the results observed with this line and found a similar phenotype in the two additional RNAi lines now analyzed. These results have been added to the text ("Results section", page 6, lines 142-148) and are shown in Supplementary Figure 3.

      (2) Quantification of phenotypic penetrance and survival rates in Rap2l mutants would help determine the consistency of ACD defects.

      In the experiment previously mentioned (repetition of the original Rap2l RNAi line analysis along with two additional Rap2l RNAi lines) we have substantially increased the number of samples analyzed (both the number of NB lineages and the number of different brains analyzed). With that, we have been able to determine that the penetrance of the phenotype was 100% or almost 100% in the 3 different RNAi lines analyzed (n>14 different brains/larvae analyzed in all cases). Details are shown in the text (page 6, lines 142-148), in Supplementary Figure 3 and in the corresponding figure legend.

      (3) The observations on neurosphere size and Ki-67 expression require normalization (e.g., Ki-67+ cells per total cell number or per neurosphere size). Additionally, apoptosis should be assessed using Annexin V or TUNEL assays.

      The Ki-67 experiment was quantified as the percentage of Ki-67+ cells relative to the total cell number in each neurosphere. This is stated in the "Materials and methods" section: "The number of Ki67+ cells with respect to the total number of nuclei labelled with DAPI within a given neurosphere were counted to calculate the Proliferative Index (PI), which was expressed as the % of Ki67+ cells over total DAPI+ cells"

      Perhaps this was not clearly shown in the graph of Figure 5A. We have now changed the "Y axis" label to "% of Ki67+ cells/ neurosphere".

      Unfortunately, we currently cannot carry out neurosphere cultures to address the apoptosis experiments. 
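      The Proliferative Index quoted from the Methods is a simple per-neurosphere ratio, which can be sketched in a few lines of Python. The function names and counts below are hypothetical, and averaging the per-neurosphere values to summarize a condition is an assumption made here, since the response does not state how neurospheres were aggregated.

```python
from statistics import mean

def proliferative_index(ki67_positive, dapi_total):
    """PI (%) for one neurosphere: Ki67+ nuclei over all DAPI+ nuclei."""
    if dapi_total <= 0:
        raise ValueError("a neurosphere needs at least one DAPI+ nucleus")
    if not 0 <= ki67_positive <= dapi_total:
        raise ValueError("Ki67+ count must lie between 0 and the DAPI+ count")
    return 100.0 * ki67_positive / dapi_total

def condition_pi(neurospheres):
    """Mean PI across neurospheres, given (Ki67+, DAPI+) count pairs.

    Assumption: each neurosphere is weighted equally.
    """
    return mean(proliferative_index(k, d) for k, d in neurospheres)

# Hypothetical counts, not data from the study.
control = [(3, 8), (1, 4), (5, 10)]
print(condition_pi(control))  # 37.5
```

      Weighting each neurosphere equally keeps large neurospheres from dominating the summary; pooling all nuclei across neurospheres would be an alternative choice.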

      (4) The discrepancy in Figures 6A and 6B requires further discussion.

      We agree that those pictures can lead to confusion. In the analysis of the "% of neurospheres with even or odd number of cells", we included the neurospheres with 2 cells in both the control and the experimental condition (RAP2A). The number of these "2-cell neurospheres" was very similar in both conditions (27.7% and 27% of the total neurospheres analyzed in each condition), and they can be the result of a previous symmetric or asymmetric division; we cannot distinguish between the two (except when they are stained with Numb, for example, as shown in Figure 6B). As a consequence, in both the control and the experimental condition, these 2-cell neurospheres included in the "even" group (Figure 6A) can represent symmetric or asymmetric divisions. However, the experiment shown in Figure 6B shows that among these 2-cell neurospheres there are more cases of asymmetric division in the experimental condition (RAP2A) than in the control.

      Nevertheless, to make the conclusions more accurate and clearer, we have reanalyzed the data taking into account only the neurospheres with 3, 5, or 7 cells (as odd) or 4, 6, or 8 cells (as even). Likewise, we have now added further clarification of how the experiment was analyzed in the Methods.
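      The reanalysis described above scores only neurospheres of 3, 5, or 7 cells as odd and 4, 6, or 8 cells as even, leaving all other sizes (including 2-cell neurospheres) out. A minimal Python sketch of that filter follows; the function names and the cell counts are hypothetical and are not data from the study.

```python
from collections import Counter

ODD_SIZES = {3, 5, 7}
EVEN_SIZES = {4, 6, 8}

def classify(cell_count):
    """Label a neurosphere 'odd' or 'even', or None if its size is not scored."""
    if cell_count in ODD_SIZES:
        return "odd"
    if cell_count in EVEN_SIZES:
        return "even"
    return None  # e.g. 2-cell neurospheres are left out of this reanalysis

def odd_even_percentages(cell_counts):
    """Percent odd/even among scored neurospheres only."""
    labels = Counter(classify(n) for n in cell_counts)
    scored = labels["odd"] + labels["even"]
    if scored == 0:
        raise ValueError("no neurospheres in the scored size range")
    odd_pct = 100.0 * labels["odd"] / scored
    return {"odd": odd_pct, "even": 100.0 - odd_pct, "n_scored": scored}

# Hypothetical cell counts per neurosphere, not data from the study.
print(odd_even_percentages([2, 3, 4, 5, 2, 6, 9]))
```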

      (5) Live imaging of ACD events would provide more direct evidence.

      We agree that live imaging would provide further evidence. Unfortunately, we currently cannot carry out neurosphere cultures to approach those experiments.

      (6) Clarification of terminology and statistical markers (e.g., p-values) in Figure 1A would improve clarity.

      We thank the reviewer for pointing out this issue. To improve clarity, we have now included a Supplementary Figure (Fig. S1) with the statistical parameters used. Additionally, we have performed a hierarchical clustering of genes showing significant or not-significant changes in their expression levels.

      (7) Given the group's expertise, an alternative to mouse xenografts could be a Drosophila genetic model of glioblastoma, which would provide an in vivo validation system aligned with their research approach.

      The established Drosophila genetic model of glioblastoma is an excellent model system for gaining deep insight into different aspects of human GBM. However, the main aim of our study was to determine whether an imbalance in the mode of stem cell division, favoring symmetric divisions, could contribute to the expansion of the tumor. We chose neurospheres derived from human GBM cell lines because the existence of cancer stem cells (glioblastoma or glioma stem cells, GSCs) has been demonstrated in human GBM. And these GSCs, like all stem cells, can divide symmetrically or asymmetrically. In the case of the Drosophila model of GBM, the neoplastic transformation observed after overexpressing the EGF receptor and activating PI3K signaling is due to the activation of downstream genes that promote cell cycle progression and inhibit cell cycle exit. It has also been suggested that the neoplastic cells in this model come from committed glial progenitors, not from stem-like cells.

      Taken together, it would be difficult to determine the causes of any effects of manipulating Rap2l levels in this Drosophila GBM system. We do not rule out this analysis in the future (we have all the "set up" in the lab). However, this would probably require a new project to comprehensively analyze and understand the mechanism by which Rap2l (and other ACD regulators) might be acting in this context, if it has any effect.

      However, as we mentioned in the Discussion, we agree that the results we have obtained in this study must definitely be validated in vivo in the future using xenografts with 3D primary patient-derived cell lines.

      Reviewer #2 (Public review):

      This study investigates the role of RAP2A in regulating asymmetric cell division (ACD) in glioblastoma stem cells (GSCs), bridging insights from Drosophila ACD mechanisms to human tumor biology. Their focus on RAP2A, a human homolog of Drosophila Rap2l, as a novel ACD regulator in GBM is innovative, given its underexplored role in cancer stem cells (CSCs). The hypothesis that ACD imbalance (favoring symmetric divisions) drives GSC expansion and tumor progression introduces a fresh perspective on differentiation therapy. However, the dual role of ACD in tumor heterogeneity (potentially aiding therapy resistance) requires deeper discussion to clarify the study's unique contributions against existing controversies. Some limitations and questions need to be addressed.

      (1) Validation of RAP2A's prognostic relevance using TCGA and Gravendeel cohorts strengthens clinical relevance. However, differential expression analysis across GBM subtypes (e.g., MES, DNA-methylation subtypes) should be included to confirm specificity.

      We have now included a Supplementary figure (Supplementary Figure 2), in which we show the analysis of RAP2A levels in the different GBM subtypes (proneural, mesenchymal and classical) and their prognostic relevance (i.e. the proneural subtype, which presents significantly higher RAP2A levels than the others, is also the subtype with the better prognosis).

      (2) Rap2l knockdown-induced ACD defects (e.g., mislocalization of Cno/Numb) are well-designed. However, phenotypic penetrance and survival rates of Rap2l mutants should be quantified to confirm consistency.

      We have now analyzed two additional and independent RNAi lines of Rap2l along with the original RNAi line. We have validated the results observed with this line and found a similar phenotype in the two additional RNAi lines now analyzed. To determine the phenotypic penetrance, we have substantially increased the number of samples analyzed (both the number of NB lineages and the number of different brains analyzed). With that, we have been able to determine that the penetrance of the phenotype was 100% or almost 100% in the 3 different Rap2l RNAi lines analyzed (n>14 different brains/larvae analyzed in all cases). These results have been added to the text ("Results section", page 6, lines 142-148) and are shown in Supplementary Figure 3 and in the corresponding figure legend. 

      (3) While GB5 cells were effectively used, justification for selecting this line (e.g., representativeness of GBM heterogeneity) is needed. Experiments in additional GBM lines (especially the addition of 3D primary patient-derived cell lines with known stem cell phenotype) would enhance generalizability.

      We tried to explain this point in the paper (Results). As we mentioned, we tested six different GBM cell lines, finding similar mRNA levels of RAP2A in all of them, and significantly lower levels than in control Astros (Fig. 3A). We decided to focus on the GBM cell line called GB5 for further analyses, as it grew well (better than the others) in neurosphere culture conditions. We agree that repeating at least some of the analyses performed with the GB5 line in other lines (ideally primary patient-derived cell lines, as the reviewer mentions) would reinforce the results. Unfortunately, we cannot perform experiments in cell lines in the lab currently. We will consider all of this for future experiments.

      (4) Indirect metrics (odd/even cell clusters, NUMB asymmetry) are suggestive but insufficient. Live imaging or lineage tracing would directly validate ACD frequency.

      We agree that live imaging would provide further evidence. Unfortunately, we cannot approach those experiments in the lab currently.

      (5) The initial microarray (n=7 GBM patients) is underpowered. While TCGA data mitigate this, the limitations of small cohorts should be explicitly addressed and need to be discussed.

      We completely agree with this comment. The microarray was available to us, so we used it as a first approach, simply to find out whether (and how) the expression levels of those human homologs of Drosophila ACD regulators were affected in this small sample, as a starting point for the study. We were aware of the limitations of this analysis, which is why we followed it up with analyses of larger datasets. We already mentioned the limitations of the array in the Discussion:

      "The microarray we interrogated with GBM patient samples had some limitations. For example, not all the human genes homologs of the Drosophila ACD regulators were present (i.e. the human homologs of the determinant Numb). Likewise, we only tested seven different GBM patient samples. Nevertheless, the output from this analysis was enough to determine that most of the human genes tested in the array presented altered levels of expression"[....] In silico analyses, taking advantage of the existence of established datasets, such as the TCGA, can help to more robustly assess, in a bigger sample size, the relevance of those human genes expression levels in GBM progression, as we observed for the gene RAP2A."

      (6) Conclusions rely heavily on neurosphere models. Xenograft experiments or patient-derived orthotopic models are critical to support translational relevance, and such basic research work needs to be included in journals.

      We completely agree. As we already mentioned in the Discussion, the results we have obtained in this study must definitely be validated in vivo in the future using xenografts with 3D primary patient-derived cell lines.

      (7) How does RAP2A regulate NUMB asymmetry? Is the Drosophila Rap2l-Cno/aPKC pathway conserved? Rescue experiments (e.g., Cno/aPKC knockdown with RAP2A overexpression) or interaction assays (e.g., Co-IP) are needed to establish molecular mechanisms.

      The mechanism by which RAP2A is regulating ACD is beyond the scope of this paper. We do not even know how Rap2l is acting in Drosophila to regulate ACD. In past years, we did analyze the function of another Drosophila small GTPase, Rap1 (homolog to human RAP1A) in ACD, and we determined the mechanism by which Rap1 was regulating ACD (including the localization of Numb): interacting physically with Cno and other small GTPases, such as Ral proteins, and in a complex with additional ACD regulators of the "apical complex" (aPKC and Par-6). Rap2l could be also interacting physically with the "Ras-association" domain of Cno (domain that binds small GTPases, such as Ras and Rap1). We have added some speculations regarding this subject in the Discussion:

      "It would be of great interest in the future to determine the specific mechanism by which Rap2l/RAP2A is regulating this process. One possibility is that, as it occurs in the case of the Drosophila ACD regulator Rap1, Rap2l/RAP2A is physically interacting or in a complex with other relevant ACD modulators."

      (8) Reduced stemness markers (CD133/SOX2/NESTIN) and proliferation (Ki-67) align with increased ACD. However, alternative explanations (e.g., differentiation or apoptosis) must be ruled out via GFAP/Tuj1 staining or Annexin V assays.

      We agree with these possibilities. Regarding differentiation, an increase in differentiation markers would in fact be a logical consequence of an increase in ACD divisions/reduced stemness markers. Unfortunately, we are currently unable to perform those experiments in our lab.

      (9) The link between low RAP2A and poor prognosis should be validated in multivariate analyses to exclude confounding factors (e.g., age, treatment history).

      We have now added this information to the Results section (page 5, lines 114-123).

      (10) The broader ACD regulatory network in GBM (e.g., roles of other homologs like NUMB) and potential synergies/independence from known suppressors (e.g., TRIM3) warrant exploration.

      The present study was designed as a "proof-of-concept" study to start analyzing the hypothesis that the expression levels of human homologs of known Drosophila ACD regulators might be relevant in human cancers that contain cancer stem cells, if those human homologs were also involved in modulating the mode of (cancer) stem cell division. 

      Extending the findings of this work to the whole ACD regulatory network would be the logical and ideal path to follow in the future.

      We already mentioned this point in the Discussion:

      "....it would be interesting to analyze in the future the potential consequences that altered levels of expression of the other human homologs in the array can have in the behavior of the GSCs. In silico analyses, taking advantage of the existence of established datasets, such as the TCGA, can help to more robustly assess, in a bigger sample size, the relevance of those human genes expression levels in GBM progression, as we observed for the gene RAP2A."

      (11) The figures should be improved. Statistical significance markers (e.g., p-values) should be added to Figure 1A; timepoints/culture conditions should be clarified for Figure 6A.

      Regarding the statistical significance markers, we have now included a Supplementary Figure (Fig. S1) with the statistical parameters used. Additionally, we have performed a hierarchical clustering of genes showing significant or non-significant changes in their expression levels.

      Regarding the experimental conditions corresponding to Figure 6A, those have now been added in more detail in "Materials and Methods" ("Pair assay and Numb segregation analysis" paragraph).

      (12) Redundant Drosophila background in the Discussion should be condensed; terminology should be unified (e.g., "neurosphere" vs. "cell cluster").

      As we did not discuss Drosophila ACD and NBs much in the Introduction, we needed to explain at least some basic concepts and information about them in the Discussion, especially for "non-drosophilists". We have reviewed the Discussion to keep this information to the minimum necessary.

      We have also reviewed the terminology that the Reviewer mentions and have unified it.

      Reviewer #1 (Recommendations for the authors):

      To improve the manuscript's impact and quality, I would recommend:

      (1) Expand Cell Line Validation: Include additional GBM cell lines, particularly primary patient-derived 3D cultures, to increase the robustness of the findings.

      (2) Mechanistic Exploration: Further examine the conservation of the Rap2l-Cno/aPKC pathway in human cells using rescue experiments or protein interaction assays.

      (3) Direct Evidence of ACD: Implement live imaging or lineage tracing approaches to strengthen conclusions on ACD frequency.

      (4) RNAi Specificity Validation: Clarify Rap2l RNAi specificity and its expression in neuroblasts or intermediate neural progenitors.

      (5) Quantitative Analysis: Improve quantification of neurosphere size, Ki-67 expression, and apoptosis to normalize findings.

      (6) Figure Clarifications: Address inconsistencies in Figures 6A and 6B and refine statistical markers in Figure 1A.

      (7) Alternative In Vivo Model: Consider leveraging a Drosophila glioblastoma model as a complementary in vivo validation approach.

      Addressing these points will significantly enhance the manuscript's translational relevance and overall contribution to the field.

      We have been able to address points 4, 5 and 6. The others are either beyond the scope of this work (2) or cannot be carried out in our lab at this moment (1, 3 and 7). However, we will complete these requests/recommendations in future investigations.

      Reviewer #2 (Recommendations for the authors):

      Major revision (current version insufficient) required to address methodological and mechanistic gaps.

      (1) Enhance Clinical Relevance

      Validate RAP2A's prognostic significance across multiple GBM subtypes (e.g., MES, DNA-methylation subtypes) using datasets like TCGA and Gravendeel to confirm specificity.

      Perform multivariate survival analyses to rule out confounding factors (e.g., patient age, treatment history).

      (2) Strengthen Mechanistic Insights

      Investigate whether the Rap2l-Cno/aPKC pathway is conserved in human GBM through rescue experiments (e.g., RAP2A overexpression with Cno/aPKC knockdown) or interaction assays (e.g., Co-IP).

      Use live-cell imaging or lineage tracing to directly validate ACD frequency instead of relying on indirect metrics (odd/even cell clusters, NUMB asymmetry).

      (3) Improve Model Systems & Experimental Design

      Justify the selection of GB5 cells and include additional GBM cell lines, particularly 3D primary patient-derived cell models, to enhance generalizability.

      It is essential to perform xenograft or orthotopic patient-derived models to support translational relevance.

      (5) Address Alternative Interpretations

      Rule out other potential effects of RAP2A knockdown (e.g., differentiation or apoptosis) using GFAP/Tuj1 staining or Annexin V assays.

      Explore the broader ACD regulatory network in GBM, including interactions with NUMB and TRIM3, to contextualize findings within known tumor-suppressive pathways.

      (6) Improve Figures & Clarity

      Add statistical significance markers (e.g., p-values) in Figure 1A and clarify timepoints/culture conditions for Figure 6A.

      Condense redundant Drosophila background in the discussion and ensure consistent terminology (e.g., "neurosphere" vs. "cell cluster").

      We have been able to address points 1 and 6, and partially point 3. The others are either beyond the scope of this work or cannot be carried out in our lab at this moment. However, we are very interested in completing these requests/recommendations and will pursue these types of experiments in future investigations.

    1. eLife Assessment

      This important study reveals that connexin43 (Cx43) hemichannels are directly activated by CO₂ through a conserved carbamylation motif, extending a mechanism previously described for β-connexins to α-connexins. The evidence is convincing, supported by complementary biochemical and electrophysiological analyses showing CO₂-induced hemichannel opening and ATP release in cultured cells and hippocampal slices. These findings advance our understanding of connexin regulation by metabolic gases and will be of broad interest to researchers studying cell communication, neural signaling, and gasotransmitter biology.

    2. Reviewer #1 (Public review):

      Summary:

      This study builds on previous work demonstrating that several beta connexins (Cx26, Cx30 and Cx32) have a carbamylation motif which renders them sensitive to CO2. In response to CO2, hemichannels composed of these connexins open, enabling diffusion of small molecules (such as ATP) between the cytosol and extracellular environment. Here, the authors have identified that an alpha connexin, Cx43, also contains a carbamylation motif, and they demonstrate that CO2 opens Cx43 hemichannels. Most of the study involves using transfected cells expressing wild-type and mutant Cx43 to define amino acids required for CO2 sensitivity. Hippocampal tissue slices in culture were used to show that CO2-induced synaptic transmission was affected by Cx43 hemichannels, providing a physiological context. The authors point out that the Cx43 gene significantly diverges from the beta connexins that are CO2 sensitive, suggesting that the conserved carbamylation motif was present before the alpha and beta connexin genes diverged.

      Strengths:

      The molecular analysis defining the amino acids which contribute to the CO2 sensitivity of Cx43 is a major strength of the study. The rigor of analysis was strengthened by using three independent assays for hemichannel opening: dye uptake, patch clamp channel measurements and ATP secretion. The resulting analysis identified key lysines in Cx43 that were required for CO2-mediated hemichannel opening. A double K-to-E Cx43 mutant produced hemichannels that were constitutively open, which further strengthened the analysis.

      Using hippocampal tissue sections to demonstrate that CO2 can influence field excitatory postsynaptic potentials (fEPSPs) provides a native context for CO2 regulation of Cx43 hemichannels. Cx43 mutations associated with Oculodentodigital Dysplasia (ODDD) inhibited CO2-induced hemichannel opening, although the mechanism by which this occurs was not elucidated.

      Cytosolic pH was measured and it was further demonstrated that Cx43 hemichannels composed of untagged Cx43 are sensitive to CO2.

      A molecular phylogenetic survey was performed which identified several other non-beta connexins that have a putative carbamylation motif. How this relates to connexin evolution was added to the discussion.

      Weaknesses:

      Cultured cells are typically grown in incubators containing 5% CO2 which is ~40 mmHg. Determining compensatory mechanisms that enable the cells to be viable if Cx43 hemichannels are open at this PCO2 would strengthen the study.
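      As a rough check on the "~40 mmHg" figure: the partial pressure of CO<sub>2</sub> in an incubator is the CO<sub>2</sub> fraction multiplied by the pressure of the dry gas. The sketch below is illustrative only and assumes sea-level barometric pressure (760 mmHg) and a water-saturated atmosphere at 37 °C (water vapour pressure of about 47 mmHg); neither value comes from the manuscript, and the function name is hypothetical.

```python
# Illustrative sketch: PCO2 in a cell-culture incubator from its CO2 fraction.
# Assumptions (not from the manuscript): sea-level barometric pressure and a
# water-saturated atmosphere at 37 °C.

BAROMETRIC_MMHG = 760.0        # assumed ambient pressure
WATER_VAPOUR_37C_MMHG = 47.0   # approximate saturated water vapour pressure at 37 °C


def pco2_mmhg(co2_fraction: float, humidified: bool = True) -> float:
    """Return PCO2 (mmHg) for a gas mix with the given CO2 fraction (0 to 1)."""
    # In humidified gas, water vapour displaces part of the total pressure,
    # so the CO2 fraction applies only to the dry-gas pressure.
    dry_pressure = BAROMETRIC_MMHG - (WATER_VAPOUR_37C_MMHG if humidified else 0.0)
    return co2_fraction * dry_pressure


for humid in (False, True):
    print(f"5% CO2, humidified={humid}: {pco2_mmhg(0.05, humid):.1f} mmHg")
```

      Under these assumptions, 5% CO<sub>2</sub> corresponds to 38 mmHg for dry gas and about 35.7 mmHg (0.05 × 713 mmHg) for humidified gas, which is in the same range as the approximate figure quoted in the comment.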

      Experiments using Gap26 to inhibit Cx43 hemichannels in fEPSP measurements used a scrambled peptide as a control. Including gap peptides specifically targeting Cx26, Cx30 and Cx32 as additional controls would strengthen the study, since the tissue sections have a complex pattern of connexin expression.

    3. Reviewer #2 (Public review):

      Summary:

      This paper examines the CO2 sensitivity of Cx43 hemichannels and gap junctional channels in transiently transfected HeLa cells using several different assays including ethidium dye uptake, ATP release, whole cell patch clamp recordings and an imaging assay of gap junctional dye transfer. The results show that raising pCO2 from 20 to 70 mmHg (at a constant pH of 7.3) causes an increase in opening of Cx43 hemichannels but does not block Cx43 gap junctions. This study also showed that raising pCO2 from 20 to 35 mmHg resulted in an increase in synaptic strength in hippocampal rat brain slices, presumably due to downstream ATP release, suggesting that the CO2 sensitivity of Cx43 may be physiologically relevant. As a further test of the physiological relevance of the CO2 sensitivity of Cx43, it was shown that two pathological mutations of Cx43 that are associated with ODDD caused loss of Cx43 CO2-sensitivity. Cx43 has a potential carbamylation motif that is homologous to the motif in Cx26. To understand the structural changes involved in CO2 sensitivity, a number of mutations were made in Cx43 sites thought to be the equivalent of those known to be involved in the CO2 sensitivity of Cx26 and the CO2 sensitivity of these mutants was investigated.
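      Holding pH at 7.3 while raising PCO<sub>2</sub> from 20 to 70 mmHg requires a proportional increase in bicarbonate. This follows from the Henderson-Hasselbalch relation pH = pK<sub>a</sub> + log<sub>10</sub>([HCO<sub>3</sub><sup>-</sup>] / (s · PCO<sub>2</sub>)). The sketch below uses assumed textbook values for 37 °C (pK<sub>a</sub> ≈ 6.1, CO<sub>2</sub> solubility s ≈ 0.03 mM/mmHg). These are not the solution recipes used in the study, both constants shift with temperature and ionic strength, and the function names are hypothetical.

```python
import math

# Assumed textbook constants for the CO2/HCO3- buffer at 37 °C (not from the study).
PKA = 6.1              # apparent pKa of the CO2/HCO3- system
CO2_SOLUBILITY = 0.03  # mM of dissolved CO2 per mmHg of PCO2


def bicarbonate_for_ph(pco2_mmhg: float, ph: float) -> float:
    """[HCO3-] (mM) needed to hold the given pH at the given PCO2 (Henderson-Hasselbalch)."""
    dissolved_co2_mm = CO2_SOLUBILITY * pco2_mmhg
    return dissolved_co2_mm * 10 ** (ph - PKA)


def ph_of(pco2_mmhg: float, hco3_mm: float) -> float:
    """pH of a CO2/bicarbonate-buffered solution."""
    return PKA + math.log10(hco3_mm / (CO2_SOLUBILITY * pco2_mmhg))


# Bicarbonate needed at each PCO2 to keep pH at 7.3, with a round-trip pH check.
for pco2 in (20, 35, 55, 70):
    hco3 = bicarbonate_for_ph(pco2, 7.3)
    print(f"PCO2 {pco2} mmHg -> [HCO3-] {hco3:.1f} mM (pH {ph_of(pco2, hco3):.2f})")
```

      Under these assumed constants, [HCO<sub>3</sub><sup>-</sup>] rises from about 9.5 mM at 20 mmHg to about 33 mM at 70 mmHg. This 3.5-fold increase matches the 3.5-fold rise in PCO<sub>2</sub>, because the ratio [HCO<sub>3</sub><sup>-</sup>]/PCO<sub>2</sub> must stay fixed for pH to stay fixed.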

      Strengths:

      This study shows that the apparent lack of functional Cx43 hemichannels observed in a number of previous in vitro functional studies may be due to the use of HEPES to buffer the external pH. When Cx43 hemichannels were studied in external solutions in which CO2/bicarbonate was used to buffer pH instead of HEPES, Cx43 hemichannels showed significantly higher levels of dye uptake, ATP release, and ionic conductance. These findings may have major physiological implications since Cx43 hemichannels are found in many organs throughout the body including the brain, heart and immune system.

      Weaknesses:

      Interpretation of the site-directed mutation studies is complicated. Although Cx43 has a potential carbamylation motif that is homologous to the motif in Cx26, the results of site-directed mutation studies were inconsistent with a simple model in which K144 and K105 interact following carbamylation to cause the opening of Cx43 hemichannels.

      Secondly, although it is shown that two Cx43 ODDD associated mutations show a loss of CO2 sensitivity, there is no evidence that the absence of CO2 sensitivity is involved in the pathology of ODDD.

    4. Reviewer #3 (Public review):

      In this paper, the authors aimed to investigate carbamylation effects on the function of Cx43-based hemichannels. Such effects have previously been characterized for other connexins, e.g. for Cx26, which display increased hemichannel (HC) opening and closure of gap junction channels upon exposure to increased CO2 partial pressure (accompanied by increased bicarbonate to keep pH constant). The authors used HeLa cells transiently transfected with Cx43 to investigate CO2-dependent carbamylation effects on Cx43 HC function. In contrast to Cx43-based gap junction channels that are here reported to be insensitive to PCO2 alterations, they provide evidence that Cx43 HC opening is highly dependent on the PCO2 in the bath solution, over a range of 20 up to 70 mmHg encompassing the physiologically normal resting level of around 40 mmHg. They furthermore identified several Cx43 residues involved in Cx43 HC sensitivity to PCO2: K105, K109, K144 & K234; mutation of 2 or more of these AAs is necessary to abolish CO2 sensitivity. The subject is interesting and the results indicate that a fraction of HCs is open at a physiological 40 mmHg PCO2, which differs from the situation under HEPES-buffered solutions where HCs are mostly closed under resting conditions. The mechanism of HC opening with CO2 gassing is linked to carbamylation, and the authors pinpointed several Lys residues involved in this process. Overall, the work is interesting as it shows that Cx43 HCs have a significant open probability under resting conditions of physiological levels of CO2 gassing, probably relevant to the brain, heart and other Cx43-expressing organs. The paper gives a detailed account of various experiments performed (dye uptake, electrophysiology, ATP release to assess HC function) and results concluded from those. They further consider many candidate carbamylation sites by mutating them to negatively charged Glu residues. The paper ends with hippocampal slice work showing evidence for connexin-dependent increases of the EPSP amplitude that could be inhibited by HC inhibition with Gap26 (Fig. 10). Another line of evidence comes from the Cx43-linked ODDD genetic disease, whereby the L90V as well as the A44V mutations of Cx43 prevented the CO2-induced hemichannel opening response (Fig. 11). Although the paper is interesting, in its present state it suffers from (i) a problematic Fig. 3, precluding interpretation of the data shown, and (ii) the poor use of hemichannel inhibitors that are necessary to strengthen the evidence in the crucial experiment of Fig. 2 and others.

      Comments on revisions:

      The traces in Fig. 2B show that the HC current is inward at 20 mmHg PCO2, while it switches to an outward current at 55 mmHg PCO2. HCs are non-selective channels, so their current should reverse direction around 0 mV, not around -50 mV. As such, the -50 mV switching point indicates the involvement of another channel distinct from non-selective Cx43 hemichannels. This problem has been neither solved nor addressed in the revised version. Additionally, I identified another problem: the experimental traces shown lack a trace at the baseline condition of 35 mmHg PCO2, while the summary graph depicts a data point for it. Not showing a trace at the baseline PCO2 of 35 mmHg renders interpretation of the summary graph questionable.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      This study builds on previous work demonstrating that several beta connexins (Cx26, Cx30, and Cx32) have a carbamylation motif which renders them sensitive to CO<sub>2</sub>. In response to CO<sub>2</sub>, hemichannels composed of these connexins open, enabling diffusion of small molecules (such as ATP) between the cytosol and extracellular environment. Here, the authors have identified that an alpha connexin, Cx43, also contains a carbamylation motif, and they demonstrate that CO<sub>2</sub> opens Cx43 hemichannels. Most of the study involves using transfected cells expressing wildtype and mutant Cx43 to define amino acids required for CO<sub>2</sub> sensitivity. Hippocampal tissue slices in culture were used to show that CO<sub>2</sub>-induced synaptic transmission was affected by Cx43 hemichannels, providing a physiological context. The authors point out that the Cx43 gene significantly diverges from the beta connexins that are CO<sub>2</sub> sensitive, suggesting that the conserved carbamylation motif was present before the alpha and beta connexin genes diverged. 

      Strengths: 

      (1) The molecular analysis defining the amino acids that contribute to the CO<sub>2</sub> sensitivity of Cx43 is a major strength of the study. The rigor of analysis was strengthened by using three independent assays for hemichannel opening: dye uptake, patch clamp channel measurements, and ATP secretion. The resulting analysis identified key lysines in Cx43 that were required for CO<sub>2</sub>-mediated hemichannel opening. A double K-to-E Cx43 mutant produced hemichannels that were constitutively open, which further strengthened the analysis. 

      (2) Using hippocampal tissue sections to demonstrate that CO<sub>2</sub> can influence field excitatory postsynaptic potentials (fEPSPs) provides a native context for CO<sub>2</sub> regulation of Cx43 hemichannels. Cx43 mutations associated with Oculodentodigital Dysplasia (ODDD) inhibited CO<sub>2</sub>-induced hemichannel opening, although the mechanism by which this occurs was not elucidated. 

      Weaknesses: 

      (1) Cx43 channels are sensitive to cytosolic pH, which will be affected by CO<sub>2</sub>. Cytosolic pH was not measured, and how this affects CO<sub>2</sub>-induced Cx43 hemichannel activity was not addressed. 

      We have now addressed this with intracellular pH measurements and by removing the C-terminal pH sensor from Cx43: the hemichannel remains CO<sub>2</sub> sensitive.

      (2) Cultured cells are typically grown in incubators containing 5% CO<sub>2</sub>, which is ~40 mmHg. It is unclear how cells would be viable if Cx43 hemichannels are open at this PCO2. 

      The cells look completely healthy with normal morphology and no sign of excessive cell death in the cultures. Presumably they have ways of compensating for the effects of partially open Cx43 hemichannels.

      (3) Experiments using Gap26 to inhibit Cx43 hemichannels in fEPSP measurements used a scrambled peptide as a control. Analysis should also include Gap peptides specifically targeting Cx26, Cx30, and Cx32 as additional controls. 

      We don’t feel this is necessary given the extensive prior literature in the hippocampus showing the effect of ATP release via open Cx43 hemichannels on fEPSP amplitude, which used astrocyte-specific knockout of Cx43 and Gap26 (doi: 10.1523/jneurosci.0015-14.2014).

      (4) The mechanism by which ODDD mutations impair CO2-mediated hemichannel opening was not addressed. Also, the potential roles for inhibiting Cx43 hemichannels in the pathology of ODDD are unclear. 

      These pathological mutations that alter CO<sub>2</sub> sensitivity are similar to pathological mutations in Cx26 and Cx32, which also remove CO<sub>2</sub> sensitivity. Our cryo-EM studies on Cx26 give clues as to why these mutations have this effect: they alter the conformational mobility of the channel (Brotherton et al 2022 doi: 10.1016/j.str.2022.02.010 and Brotherton et al 2024 doi: 10.7554/eLife.93686). We assume that similar considerations apply to Cx43, but this requires improved cryo-EM structures of Cx43 hemichannels at differing levels of PCO<sub>2</sub>.

      We agree that the link between loss of CO<sub>2</sub> sensitivity of Cx43 and ODDD is not established and have revised the text to make this clear.

      (5) CO2 has no effect on Cx43-mediated gap junctional communication as opposed to Cx26 gap junctions, which are inhibited by CO2. The molecular basis for this difference was not determined. 

      Cx26 gap junction channels are so far unique amongst CO<sub>2</sub>-sensitive connexins in being closed by CO<sub>2</sub>. We have addressed the mechanism by which this occurs in Nijjar et al 2025 (DOI: 10.1113/JP285885): GJC closure requires carbamylation of K108 in Cx26 (in addition to K125).

      (6) Whether there are other non-beta connexins that have a putative carbamylation motif was not addressed. Additional discussion/analysis of how the evolutionary trajectory for Cx43 maintaining a carbamylation motif is unique for non-beta connexins would strengthen the study. 

      We have performed a molecular phylogenetic survey showing that the carbamylation motif occurs across the alpha connexin clade, and we have shown that Cx50 is indeed CO<sub>2</sub> sensitive (doi: 10.1101/2025.01.23.634273). The survey is now in Fig 12.

      Reviewer #2 (Public review): 

      Summary: 

      This paper examines the CO<sub>2</sub> sensitivity of Cx43 hemichannels and gap junctional channels in transiently transfected HeLa cells using several different assays, including ethidium dye uptake, ATP release, whole cell patch clamp recordings, and an imaging assay of gap junctional dye transfer. The results show that raising pCO<sub>2</sub> from 20 to 70 mmHg (at a constant pH of 7.3) causes an increase in opening of Cx43 hemichannels but does not block Cx43 gap junctions. This study also showed that raising pCO<sub>2</sub> from 20 to 35 mmHg resulted in an increase in synaptic strength in hippocampal rat brain slices, presumably due to downstream ATP release, suggesting that the CO<sub>2</sub> sensitivity of Cx43 may be physiologically relevant. As a further test of the physiological relevance of the CO<sub>2</sub> sensitivity of Cx43, it was shown that two pathological mutations of Cx43 that are associated with ODDD caused loss of Cx43 CO<sub>2</sub>-sensitivity. Cx43 has a potential carbamylation motif that is homologous to the motif in Cx26. To understand the structural changes involved in CO<sub>2</sub> sensitivity, a number of mutations were made in Cx43 sites thought to be the equivalent of those known to be involved in the CO<sub>2</sub> sensitivity of Cx26, and the CO<sub>2</sub> sensitivity of these mutants was investigated. 

      Strengths: 

      This study shows that the apparent lack of functional Cx43 hemichannels observed in a number of previous in vitro functional studies may be due to the use of HEPES to buffer the external pH. When Cx43 hemichannels were studied in external solutions in which CO<sub>2</sub>/bicarbonate was used to buffer pH instead of HEPES, Cx43 hemichannels showed significantly higher levels of dye uptake, ATP release, and ionic conductance. These findings may have major physiological implications since Cx43 hemichannels are found in many organs throughout the body, including the brain, heart, and immune system. 

      Weaknesses: 

      (1) Interpretation of the site-directed mutation studies is complicated. Although Cx43 has a potential carbamylation motif that is homologous to the motif in Cx26, the results of site-directed mutation studies were inconsistent with a simple model in which K144 and K105 interact following carbamylation to cause the opening of Cx43 hemichannels. 

      The mechanism of opening of Cx43 is more complex than that of Cx26, Cx32 and Cx50 and involves more Lys residues. The 4 Lys residues in Cx43 that are involved in opening the hemichannel have their equivalents in Cx26, but in Cx26 these additional residues seem to be involved in the closing of the GJC rather than opening of the hemichannel (see above). Cx50 is simpler and involves only two Lys residues (doi: 10.1101/2025.01.23.634273), which are equivalent to those in Cx26.

      (2) Secondly, although it is shown that two Cx43 ODDD-associated mutations show a loss of CO<sub>2</sub> sensitivity, there is no evidence that the absence of CO<sub>2</sub> sensitivity is involved in the pathology of ODDD.

      We agree, but this is probably because this has not been directly tested by experiment, as the CO<sub>2</sub> sensitivity of Cx43 was not previously known. As mentioned above, we have revised the text to ensure that this is clear.

      Reviewer #3 (Public review): 

      In this paper, the authors aimed to investigate carbamylation effects on the function of Cx43-based hemichannels. Such effects have previously been characterized for other connexins, e.g., for Cx26, which display increased hemichannel (HC) opening and closure of gap junction channels upon exposure to increased CO<sub>2</sub> partial pressure (accompanied by increased bicarbonate to keep pH constant). 

      The authors used HeLa cells transiently transfected with Cx43 to investigate CO<sub>2</sub>-dependent carbamylation effects on Cx43 HC function. In contrast to Cx43-based gap junction channels that are reported here to be insensitive to PCO<sub>2</sub> alterations, they provide evidence that Cx43 HC opening is highly dependent on the PCO<sub>2</sub> in the bath solution, over a range of 20 up to 70 mmHg encompassing the physiologically normal resting level of around 40 mmHg. They furthermore identified several Cx43 residues involved in Cx43 HC sensitivity to PCO<sub>2</sub>: K105, K109, K144 & K234; mutation of 2 or more of these AAs is necessary to abolish CO<sub>2</sub> sensitivity. The subject is interesting and the results indicate that a fraction of HCs is open at a physiological 40 mmHg PCO<sub>2</sub>, which differs from the situation under HEPES-buffered solutions where HCs are mostly closed under resting conditions. The mechanism of HC opening with CO<sub>2</sub> gassing is linked to carbamylation, and the authors pinpointed several Lys residues involved in this process. 

      Overall, the work is interesting as it shows that Cx43 HCs have a significant open probability under resting conditions of physiological levels of CO<sub>2</sub> gassing, probably applicable to the brain, heart, and other Cx43 expressing organs. The paper gives a detailed account of various experiments performed (dye uptake, electrophysiology, ATP release to assess HC function) and results concluded from those. They further consider many candidate carbamylation sites by mutating them to negatively charged Glu residues. The paper ends with hippocampal slice work showing evidence for connexin-dependent increases of the EPSP amplitude that could be inhibited by HC inhibition with Gap26 (Figure 10). Another line of evidence comes from the Cx43-linked ODDD genetic disease, whereby L90V as well as the A44V mutations of Cx43 prevented the CO<sub>2</sub>-induced hemichannel opening response (Figure 11). Although the paper is interesting, in its present state, it suffers from (i) a problematic Figure 3, precluding interpretation of the data shown, and (ii) the poor use of hemichannel inhibitors that are necessary to strengthen the evidence in the crucial experiment of Figure 2 and others. 

      The panels in Figure 3 were mislabelled in the accompanying legend, possibly leading to some confusion. This has now been corrected.

      We disagree that hemichannel blockers are needed to strengthen the evidence in Figure 2 and other figures. Our controls show that the CO<sub>2</sub>-sensitive responses absolutely require expression of Cx43 and are modified by mutations of Cx43. It is hard to see how this evidence would be strengthened by the use of peptide inhibitors or other hemichannel blockers that may not be completely selective.

      Reviewing Editor Comments:

      (1) Improve electrophysiological evidence, addressing concerns about the initial experiment and including peptide inhibitor data where applicable. 

      We think the concerns about the electrophysiological evidence arise from a misunderstanding because we gave insufficient information about how we conducted the experiments. We have now provided a much more complete legend, added explanations in the text and given more detail in the Methods. We further respond to the reviewer below.

      We do not agree on the necessity of the peptide inhibitor to demonstrate dependence on Cx43. We have shown that parental HeLa cells do not release ATP in response to changes in PCO<sub>2</sub> or voltage (Fig 2D; Butler & Dale 2023, 10.3389/fncel.2023.1330983; Lovatt et al 2025, 10.1101/2025.03.12.642803, 10.1101/2025.01.23.634273). Our previous papers have shown many times that parental HeLa cells do not load with dye in response to CO<sub>2</sub> or zero Ca<sup>2+</sup> (e.g. Huckstepp et al 2010, 10.1113/jphysiol.2010.192096; Meigh et al 2013, 10.7554/eLife.01213; Meigh et al 2014, 10.7554/eLife.04249), and we have shown that parental HeLa cells do not exhibit the same CO<sub>2</sub>-dependent change in whole cell conductance that the Cx43-expressing cells do (Fig 2B). In addition, we have shown that mutating key residues in Cx43 alters both CO<sub>2</sub>-sensitive release of ATP and CO<sub>2</sub>-dependent dye loading without affecting the respective positive control. To bolster this, we have included data for the K144R mutation as a supplement to Fig 3. Given the expense of Gap26, it is impractical to include it as a standard control, and it is unnecessary given the comprehensive controls outlined.

      Collectively, these data show that the responses to CO<sub>2</sub> require expression of Cx43 and can be modified by mutation of Cx43.

      (2) Strengthen the manuscript by measuring the effects of CO<sub>2</sub> on cytosolic pH and Cx43 hemichannel opening. Consider using tail truncation mutants to assess the role of the C-terminal pH sensor in CO<sub>2</sub>-mediated channel opening.

      We agree and have performed the suggested experiments to address this issue.

      (3) Investigate the effect of expressing the K105E/K109E Cx43 double mutant on cell viability.

      In our experiments the cells look completely healthy based on their morphology in brightfield microscopy and growth rates. 

      (4) Discuss and analyze the uniqueness of Cx43 among alpha connexins in maintaining the carbamylation motif.

      We now discuss this: Cx43 is not unique. We have added a molecular phylogenetic survey of the alpha connexin clade in Fig 12. Apart from Cx37, the carbamylation motif appears in all the other members of the clade (but not necessarily in the human orthologue). In a separate manuscript, currently posted on bioRxiv, we have documented the CO<sub>2</sub> sensitivity of Cx50 and its dependence on the motif.

      (5) Consider omitting data on ODDD-associated mutations unless there is evidence linking CO<sub>2</sub> sensitivity to disease pathology.

      This experiment is observational, and we are not making claims that there is a direct causal link. Removing the ODDD mutant findings would lose potentially useful information for anyone studying how these mutations alter channel function. We have reworded the text to ensure that we say that the link between loss of CO<sub>2</sub> sensitivity and ODDD remains unproven.

      (6) Justify the choice of high K<sup>⁺</sup> and low external calcium as a positive control in ATP release experiments.

      These two manipulations can open the hemichannel independently of the CO<sub>2</sub> stimulus. Extracellular Ca<sup>2+</sup> is well known to block all connexin hemichannels, and Cx43 is known to be voltage sensitive. The depolarisation from high K<sup>+</sup> is effective at opening the hemichannel and we preferred this as a more physiological way of opening the Cx43 hemichannel. We have added some explanatory text.

      (7) Clarify whether Cx43A44V or Cx43L90V mutations block gap junctional coupling.

      This is an interesting point. Since Cx43 gap junction channels (GJCs) are not CO<sub>2</sub> sensitive, we feel this is beyond the scope of our paper.

      (8) Discuss the potential implications of pCO₂ changes on myocardial function through alterations in intracellular pH.

      We have modified the discussion to consider this point.

      Reviewer #1 (Recommendations for the authors):

      (1) Measurements of the effects of CO<sub>2</sub> on cytosolic pH/Cx43 hemichannel opening would strengthen the manuscript. Since the pH sensor of Cx43 is on the C terminus, the authors could consider making tail truncation mutants to see how this affects CO<sub>2</sub>-mediated Cx43 channel opening.

      We have done this (truncating after residue 256); the channel remains highly CO<sub>2</sub> and voltage sensitive. We have also documented the effect of the hypercapnic solutions on intracellular pH measured with BCECF. These new data are now included as figure supplements to Figure 2.

      (2) What is the impact of expressing the K105E / K109E Cx43 double mutant on cell viability?

      We observed no obvious impact: cell density was as expected (no evidence of increased cell death), and brightfield and fluorescence visualisation indicated normal, healthy cells. We have added a movie (Fig 9, movie supplement 1) to show the effect of La<sup>3+</sup> on the GRAB<sub>ATP</sub> signal in cells expressing Cx43<sup>K105E, K109E</sup> so readers can appreciate the morphology and its stability during the recording.

      (3) A quick look at other alpha connexins suggested that Cx43 was unique among alpha connexins in maintaining the carbamylation motif. This merits additional discussion/ analysis.

      This is an interesting point. Cx43 is not unique in the alpha clade in having the carbamylation motif: a number of other human alpha connexins (Cx50, Cx59 and Cx62) also possess it, as do some non-human alpha connexins (Cx40, Cx59, Cx46). We have shown that Cx50 is CO<sub>2</sub> sensitive. We have performed a brief molecular phylogenetic analysis of the alpha connexin clade to highlight the occurrence of the carbamylation motif. This is now presented as Fig 12 to go with the accompanying discussion.

      (4) There were some minor writing issues that should be addressed. For instance, fEPSP is not defined. Also, insets showing positive controls in some experiments were not described in the figure legends.

      We have corrected these issues.

      Reviewer #2 (Recommendations for the authors):

      (1) I would omit the data on the ODDD-associated mutations since there is no evidence that loss of CO<sub>2</sub> sensitivity plays an important role in the underlying disease pathology.

      We are not claiming that loss of CO<sub>2</sub> sensitivity leads to the underlying pathology, and we have reviewed the text to ensure that we clearly express that this is a correlation, not a cause. We think this is worth retaining because many pathological mutations in other CO<sub>2</sub>-sensitive connexins (Cx26, Cx32 and Cx50) cause loss of CO<sub>2</sub> sensitivity, and this information may be helpful to other researchers.

      (2) Why is high K+ rather than low external calcium used as a positive control in ATP release experiments?

      We used high K<sup>+</sup> (depolarisation) as a positive control because we regard this as a more physiological stimulus than low external Ca<sup>2+</sup>.

      (3) Does Cx43A44V or Cx43L90V block gap junctional coupling?

      An interesting question but we have not examined this.

      (4) Provide references for biophysical recordings of Cx43 hemichannels performed in HEPES-buffered salines, which document Cx43 hemichannels as being shut.

      We have added the original and some later references which examine Cx43 hemichannel gating in HEPES buffer and show the need for substantial depolarisation to induce channel opening.

      (5) In the heart muscle, changes in PCO<sub>2</sub> have long been hypothesized to cause changes in myocardial function by changing pHi.

      This is true, and we now add some discussion of this point. Now that we know that Cx43 is directly sensitive to CO<sub>2</sub>, a direct action of CO<sub>2</sub> cannot be ruled out, and careful experimentation is required to test this possibility.

      Reviewer #3 (Recommendations for the authors):

      (1) Page 3: "... homologs of K125 and R104 ... ": the context is linked to Cx26, so Cx26 needs to be added here.

      Done

      (2) Page 4 text and related Figure 2:

      (a) Figure 2A&B: PCO2-dependent Cx43 HC opening is clearly present in the carboxy-fluorescein dye uptake experiments (Figure 2A) as well as in the electrophysiological experiments (Figure 2B). The curves look quite different between these two distinct readouts: dye uptake doubles from 20 to 70 mmHg in Figure 2A while the electrophysiological data double from 45 to 70 mmHg in Figure 2B. These responses look quite distinct and may be linked to a non-linearity of the dye uptake assay or a problem in the electrophysiological measurements of Figure 2B discussed in the next point.

      Different molecules/ions may have different permeabilities through the channel, which could explain the observed difference. Also, there is some contamination of the whole cell conductance change with another conductance (evident in recordings from parental HeLa cells). This is evident particularly at 70 mmHg. If this contaminating conductance were subtracted from the total conductance in the Cx43 expressing cells, then the dose response relations would be more similar. However, we are reluctant to add this additional data processing step to the paper.

      (b) The traces in Figure 2B show that the HC current is inward at 20 mmHg PCO2, while it switches to an outward current at 55 mmHg PCO2. HCs are non-selective channels, so their current should switch direction around 0 mV but not at -50 mV. As such, the -50 mV switching point indicates involvement of another channel distinct from non-selective Cx43 hemichannels.

      We think that our incomplete description in the legend led to this misunderstanding. We used a baseline of 35 mmHg (where the channels will be slightly open) and changed to 20 mmHg to close them (or to higher PCO<sub>2</sub> to open them from this baseline), hence a decrease in conductance and loss of outward current for 20 mmHg. The holding potential for the recordings and voltage steps were the same in all recordings. We have now edited the legend and added more information into the methods to clarify this and how we constructed the dose response curve.

      We agree that Cx43 hemichannels are relatively nonselective and would normally be expected to have a reversal potential around 0 mV, but we are using K-Gluconate and the lowered reversal potential (~-65 mV) is likely due to poor permeation of this anion via Cx43.

      (c) A Hill slope of 6 is reported for this curve, which is extremely steep. The paper does not provide any further consideration, making this an isolated statement without any theoretical framework to understand the present finding in such context (i.e., in relation to the PCO2 dependency of Cx channels).

      Yes, we agree. For all the CO<sub>2</sub>-sensitive connexins that we have examined, the Hill coefficient for CO<sub>2</sub> appears to be >4. Hemichannels are of course hexameric, so there is potential for 6 CO<sub>2</sub> molecules to be bound and for extensive cooperativity. We have modified the text to give greater context.
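      To make the steepness concrete, assume the dose-response curves follow the standard Hill form (the exact fitting equation is an assumption here, not taken from the manuscript), with P the PCO<sub>2</sub>, EC<sub>50</sub> the half-maximal PCO<sub>2</sub> and n the Hill coefficient. Solving for the 10% and 90% response points gives the PCO<sub>2</sub> range over which the response goes from nearly off to nearly fully on:

```latex
R(P) = \frac{P^{n}}{\mathrm{EC}_{50}^{\,n} + P^{n}}
\quad\Longrightarrow\quad
P_{10} = \mathrm{EC}_{50}\, 9^{-1/n}, \qquad
P_{90} = \mathrm{EC}_{50}\, 9^{1/n}, \qquad
\frac{P_{90}}{P_{10}} = 81^{1/n}

n = 1:\ \frac{P_{90}}{P_{10}} = 81, \qquad
n = 6:\ \frac{P_{90}}{P_{10}} \approx 2.08, \qquad
n = 10:\ \frac{P_{90}}{P_{10}} \approx 1.55
```

      Under this assumption, a non-cooperative site (n = 1) would need an 81-fold change in PCO<sub>2</sub> to span 10% to 90% of the response, whereas n = 6 compresses this to roughly a twofold change, and n = 10 (the value noted for Figure 4, Supplement 1) to about 1.5-fold.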

      (d) A further remark to Figure 2 is that it does not contain any experiment showing the effect of Cx43 hemichannel inhibition with a reliable HC inhibitor such as Gap26, which is only used in the penultimate illustration of Figure 10. Gap26 should be used in Figure 2 and most of the other figures to show evidence of HC contribution. The lanthanum ions used in Figure 9 are a very non-specific hemichannel blocker and should be replaced by experiments with Gap26.

      We have addressed the first part of this comment above.

      We agree that La<sup>3+</sup> blocks all hemichannels, but in the context of our experiments and the controls we have performed, it is entirely adequate and supports our conclusions. Our controls (mentioned above and below) show that the expression of Cx43 is absolutely required for CO<sub>2</sub>-dependent ATP release (and dye loading). In Figure 9, our use of La<sup>3+</sup> was to show the presence of a constitutively open Cx43 mutant hemichannel. Gap26 would add little to this. Our further controls show that with expression of Cx43<sup>WT</sup>, La<sup>3+</sup> did nothing to the ATP signal under baseline conditions (20 mmHg), supporting our conclusion that the mutant channels are constitutively open.

      (e) As the experiments of Figure 2 form the basis of what is to follow, the above remarks cast doubt on the robustness of the experiments and the data produced.

      We disagree; our results are extremely robust: 1) we have used three independent assays to confirm the presence of the response; 2) parental HeLa cells do not release ATP, dye load or show large conductance changes to CO<sub>2</sub>, showing the absolute requirement for expression of Cx43; 3) mutations of Cx43 (in the carbamylation motif) alter the CO<sub>2</sub>-evoked ATP release and dye loading, giving further confirmation of Cx43 as the conduit for ATP release and dye loading; and 4) we use standard positive controls (0 Ca<sup>2+</sup>, high K<sup>+</sup>) to confirm that cells still have functional channels for those mutations that modified CO<sub>2</sub> sensitivity.

      (f) The sentence "Cells transfected with GRAB-ATP only, showed ... " should be modified to "In contrast, cells not expressing Cx43 showed no responses to any applied CO2 concentration as concluded from GRAB-ATP experiments"

      We have modified the text.

      (3) Page 5 and Figures 3 & 4:

      (a) Figure 3 illustrates results obtained with mutations of 4 distinct Lys residues. However, the corresponding legend indicates mutations that are different from the ones shown in the corresponding illustrations, making it impossible to reliably understand and interpret the results shown in panels A-E.

      Thanks for pointing this out. Our apologies, we modified the figure so that the order of the images matched the order of the graph (and the legend) but then forgot to put the new version of the figure in the text. We have now corrected this so that Figure and legend match.

      (b) Figure 4 lacks control WT traces!

      The controls for this (showing that parental HeLa cells do not release ATP in response to CO<sub>2</sub> or depolarisation) are shown in Figure 2.

      (c) Figure 4, Supplement 1: High Hill coefficients of 10 are shown here, but they are not discussed anywhere, as is also the case for the remark on p.4. A Hill steepness of 10 is huge and points to many processes potentially involved. As reported above, these data are floating around in the manuscript without any connection.

      Yes, we agree this is very high and surprising. As mentioned above, it may reflect the hexameric nature of the channel and the involvement of 4 Lys residues. We have used this equation to give some quantitative understanding of the effect of the mutations on CO<sub>2</sub> sensitivity and still think this is useful. We have no further evidence to interpret these values one way or the other.

      (4) Page 6: Carbamate bridges are proposed to be formed between K105 and K144, and between K109 and K234. The first three of these Lysine residues are located in the 55aa long cytoplasmic loop of Cx43, while K234 is in the juxta membrane region involved in tubulin interactions. Both K144 and K234 are involved in Cx43 HC inhibition: K144 is the last aa of the L2 peptide (D119-K144 sequence) that inhibits Cx43 hemichannels while K234 is the first aa of the TM2 peptide that reduces hemichannel presence in the membrane (sequence just after TM4, at the start of the C-tail). This context should be added to increase insight and understanding of the CO2 carbamylation effects on Cx43 hemichannel opening.

      Thanks for suggesting this. We have added some discussion of CT to CL interactions in the context of regulation by pH and [Ca<sup>2+</sup>].

      (5) Page 7: The Cx43 ODDD A44V and L90V mutations lead to loss of pCO2 sensitivity in dye loading and ATP assays. However, A44V located in EL1 is reportedly associated with Cx43 HC activation, while L90V in TM2 is associated with HC inhibition. Remarkably, these mutations are focused on non-Lys residues, which brings up the question of how to link this to the paper's main thread.

      This follows the pattern that we have seen for other mutations such as A40V, A88V in Cx26 and several CMTX mutations of Cx32. Our cryoEM structures of Cx26 suggest that these mutations alter the flexibility of the molecule and hence abolish CO<sub>2</sub> sensitivity. We have reworded the text to avoid giving the impression that there is a demonstrated link between loss of CO<sub>2</sub> sensitivity of Cx43 and pathology.

      (6) Page 8: HCs constitutively open - 'constitutively' perhaps does not have the best connotation as it is not related to HC constitution but CO2 partial pressure.

      Yes, we agree and have reworded this.

      (7) Page 9: "in all subtypes" -> not clear what is meant - do you mean "in all cell types"?

      We agree this is unclear; it refers to all astrocytic subtypes. We have amended the text.

      (8) Page 10: Composition of hypocapnic recording solution: bubbling description is incomplete "95%O2/5%" and should be "95%O2/5%CO2".

      Changed.

      (9) Page 11: Composition of zero Ca<sup>²⁺</sup> hypocapnic recording solution: perhaps better to call this "nominally Ca<sup>²⁺</sup>-free hypocapnic recording solution" as no Ca<sup>²⁺</sup> buffer is included in this solution

      Thanks for pointing this out. We did in fact add 1 mM EGTA to the solutions but omitted this from the recipe, this has now been corrected.

      (10) Page 11: in M&M I found that the NaHCO3- is lowered to 10 mM in the zero Ca<sup>²⁺</sup> condition, while the control experimental condition has 26 mM NaHCO3-. The zero Ca condition should be kept at a physiologically normal 26 mM NaHCO3- concentration, so why was this done? Lowering NaHCO3- during hemichannel stimulation may result in smaller responses and introduce non-linearities.

      For the dye loading we used 20 mmHg as the baseline condition and increased PCO<sub>2</sub> from this. Hence for the zero Ca<sup>2+</sup> positive control we modified the 20 mmHg hypocapnic solution by substituting Mg<sup>2+</sup> for Ca<sup>2+</sup> and adding EGTA. We have modified the text in the Methods to clarify this.

      Further remarks on the figures:

      (1) Figure 2A: Add 20 & 70 mmHg to the images, to improve the readability of this illustration.

      Done

      (2) Figure 3: WT responses are shown in panel F, but experimental data (images and curves) are lacking and should be included in a revised version.

      The wild-type data are shown in Fig 2A. We have some sympathy for the comment, but we felt that Fig 2 should document CO<sub>2</sub> sensitivity, and that the subsequent Figs should analyse its basis. Hence the separation of Cx43<sup>WT</sup> data from the mutant data. In panel F, we state that we have recalculated the WT data from Fig 2A to allow the comparison.

      (3) Figures 4, 6, 8: Color codes for mmHg CO<sub>2</sub> pressure make reading these figures difficult; perhaps better to add mmHg values directly in relation to the traces.

      We have considered this suggestion but feel that the figures would become very cluttered with the additional labelling.

      (4) I wouldn't use colored lines when not necessary, e.g., Figure 9 100 µM La3+; Figure 10 (add 20->35 mmHg PCO2 switch; add scrGap26 above blue bars); Figure 11C & D.

      We agree and can see that in Figs 9 and 10 this muddles our colour scheme in other figures so have modified these figures. There was not space to put the suggested labels.

      (5) The mechanism of increased HC opening is not clear.

      We agree and have discussed various options and the analogy with what we know about Cx26. Ultimately new cryo-EM data is required.

      (6) Figure 10: 35G/35S are weird abbreviations for 35 mmHg Gap26 and scrambled Gap26.

      Yes, but we used these to fit into the available space.

      (7) Figure 11, legend: '20 mmHg PCO2 for each transfection for 70 mmHg PCO2'. It is not clear what is meant here.

      Thanks for pointing this out, we have reworded this to ensure clarity.

    1. eLife Assessment

      The authors develop an important microfluidic microvascular model called "Vessel-on-Chip", which they use to study Neisseria meningitidis interactions within this in vitro vascular system. Compelling evidence shows that the fabricated channels are lined by endothelial cells, and these can be colonized by N. meningitidis that in turn triggers neutrophil recruitment. This model has advantages over the human skin xenograft mouse model, which requires complex surgical techniques, however, it also carries limitations in that only endothelial cells and supplied specific immune cells in the microfluidics are present, while true vasculature contains a number of other cell types including smooth muscle cells, pericytes, and components of the immune system.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      The work by Pinon et al describes the generation of a microvascular model to study Neisseria meningitidis interactions with blood vessels. The model uses a novel and relatively high throughput fabrication method that allows full control over the geometry of the vessels. The model is well characterized from the vascular standpoint and shows improvements when exposed to flow. The authors show that Neisseria binds to the 3D model in a geometry similar to that in the animal xenograft model, induces an increase in permeability shortly after bacterial perfusion, and induces endothelial cytoskeleton rearrangements including a honeycomb actin structure. Finally, the authors show neutrophil recruitment to bacterial microcolonies and phagocytosis of Neisseria.

      Strengths:

      The article is overall well written, and it is a great advancement in the bioengineering and sepsis infection field. The authors achieved their aim at establishing a good model for Neisseria vascular pathogenesis and the results support the conclusions. I support the publication of the manuscript. I include below some clarifications that I consider would be good for readers.

      One of the most novel things of the manuscript is the use of a relatively quick photoablation system. Could this technique be applied in other laboratories? While the revised manuscript includes more technical details as requested, the description remains difficult to follow for readers from a biology background. I recommend revising this section to improve clarity and accessibility for a broader scientific audience.

      The authors suggest that in the animal model, early 3h infection with Neisseria does not show an increase in vascular permeability, contrary to their findings in the 3D in vitro model. However, they show a non-significant increase in permeability of 70 kDa Dextran in the animal xenograft early infection. As a bioengineer, this seems to me to indicate that if the experiment had been done with a lower molecular weight tracer, significant increases in permeability could have been detected. I would suggest doing this experiment, which could capture early events in vascular disruption.

      One of the great advantages of the system is the possibility of visualizing infection-related events at high resolution. The authors show the formation of actin in a honeycomb structure beneath the bacterial microcolonies. This only occurred in 65% of the microcolonies. Is this result similar to in vitro 2D endothelial cultures in static and under flow? Also, the group has shown in the past positive staining of other cytoskeletal proteins, such as ezrin in the ERM complex. Does this also occur in the 3D system?

      Significance:

      The manuscript is comprehensive, complete and represents the first bioengineered model of sepsis. One of the major strengths is the careful characterization and benchmarking against the animal xenograft model. Beyond the technical achievement, the manuscript is also highly quantitative and includes advanced image analysis that could benefit many scientists. The authors show a quick photoablation method that would be useful for the bioengineering community and improve the state of the art, providing a new experimental model for sepsis.

      My expertise is on infection bioengineered models.

      Comments on revised version:

      The authors have addressed all my concerns.

    3. Reviewer #2 (Public review):

      Pinon and colleagues have developed a Vessel-on-Chip model showcasing geometrical and physical properties similar to the murine vessels used in the study of systemic infections. The authors succeed in their aim of developing a complex, humanized, in vitro model that can faithfully recapitulate the hallmarks of systemic infections.

      The vessel was created via highly controllable laser photoablation in a collagen matrix, subsequent seeding of human endothelial cells, and flow perfusion to induce mechanical cues. This model could be infected with Neisseria meningitidis as a model of systemic infection. In this model, microcolony formation and dynamics, and effects on the host were very similar to those described for the human skin xenograft mouse model (the current gold standard for systemic studies) and were consistent with observations made in patients. The model could also recapitulate the neutrophil response upon N. meningitidis systemic infection.

      The claims and the conclusions are supported by the data, the methods are properly presented, and the data are analyzed adequately. The most important strength of this manuscript is the technology developed to build this model, which is impressive and very innovative. The Vessel-on-Chip can be tuned to acquire complex shapes and, according to the authors, the process has been optimized to produce models very quickly. This is a great advancement compared with the technologies used to produce other equivalent models. This model proves to be equivalent to the most advanced model used to date (skin xenograft mouse model). The human skin xenograft mouse model requires complex surgical techniques and has the practical and ethical limitations associated with the use of animals. However, the Vessel-on-Chip model is free of ethical concerns, can be produced quickly, and allows precise tuning of the vessel's geometry and higher resolution microscopy. Both models were comparable in terms of the hallmarks defining the disease, suggesting that the presented model can be an effective replacement for animal use in this area. In addition, the Vessel-on-Chip allows microscopy to be performed with higher resolution and greater ease, which can in turn allow more complex and precise image-based analysis. The authors leverage the image-based analysis to obtain further insights into the infection, highlighting the capabilities of the model in this aspect.

      A limitation of this model is that it lacks the multicellularity that characterizes other similar models, which could be useful to research disease more extensively. However, the authors discuss the possibilities of adding other cells to the model, for example, fibroblasts. The methodology would allow for integrating many different types of cells into the model, which would increase the scope of scientific questions that can be addressed. In addition, the technology presented in the current paper is difficult to adapt for standard biology labs. The methodology is complex and requires specialized equipment and personnel, which might hinder widespread use of this model by researchers in the field.

      This manuscript will be of interest for a specialized audience focusing on the development of microphysiological models. The technology presented here can be of great interest to researchers whose main area of interest is the endothelium and the blood vessels, for example, researchers on the study of systemic infections, atherosclerosis, angiogenesis, etc. This manuscript can have great applications for a broad audience focusing on vasculature research. Due to the high degree of expertise required to produce these models, this paper can present an interesting opportunity to begin collaborations with researchers dealing with a wide range of diseases, including atherosclerosis, cancer (metastasis), and systemic infections of all kinds.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript Pinon et al. describe the development of a 3D model of human vasculature within a microchip to study Neisseria meningitidis (Nm)-host interactions and validate it through its comparison to the current gold-standard model consisting of human skin engrafted onto a mouse. There is a pressing need for robust biomimetic models with which to study Nm-host interactions because Nm is a human-specific pathogen for which research has been primarily limited to simple 2D human cell culture assays. Their investigation relies primarily on data derived from microscopy and its quantitative analysis, which support the authors' goal of validating their Vessel-on-Chip (VOC) as a useful tool for studying vascular infections by Nm, and by extension, other pathogens associated with blood vessels.

      Strengths:

      • Introduces a novel human in vitro system that promotes control of experimental variables and permits greater quantitative analysis than previous models
      • The VOC model is validated by direct comparison to the state-of-the-art human skin graft on mouse model
      • The authors make significant efforts to quantify, model, and statistically analyze their data
      • The laser ablation approach permits defining custom vascular architecture
      • The VOC model permits the addition and/or alteration of cell types and microbes added to the model
      • The VOC model permits the establishment of an endothelium developed by shear stress and active infusion of reagents into the system

      Weaknesses:

      • The VOC model contains one cell type, human umbilical cord vascular endothelial cells (HUVECs), while true vasculature contains a number of other cell types that associate with and affect the endothelium, such as smooth muscle cells, pericytes, and components of the immune system. However, adding such complexity may be a future goal of this VOC model.

      Impact:

      The VOC model presented by Pinon et al. is an exciting advancement in the set of tools available to study human pathogens interacting with the vasculature. This manuscript focuses on validating the model, and as such sets the foundation for impactful research in the future. Of particular value is the photoablation technique that permits the custom design of vascular architecture without the use of artificial scaffolding structures described in previously published works.

      Comments on revised version:

      The authors have nicely addressed my (and other reviewers') comments.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      One of the most novel things of the manuscript is the use of a relatively quick photoablation system. Could this technique be applied in other laboratories? While the revised manuscript includes more technical details as requested, the description remains difficult to follow for readers from a biology background. I recommend revising this section to improve clarity and accessibility for a broader scientific audience.

      As suggested, we have adapted the paragraph related to the photoablation technique in the Material & Method section, starting line 1147. We believe it is now easier to follow.

      The authors suggest that in the animal model, early 3h infection with Neisseria do not show increase in vascular permeability, contrary to their findings in the 3D in vitro model. However, they show a non-significant increase in permeability of 70 KDa Dextran in the animal xenograft early infection. As a bioengineer this seems to point that if the experiment would have been done with a lower molecular weight tracer, significant increases in permeability could have been detected. I would suggest to do this experiment that could capture early events in vascular disruption.

      Comparing permeability under healthy and infected conditions using Dextran smaller than 70 kDa is challenging. Previous research (1) has shown that molecules below 70 kDa already diffuse freely in healthy tissue. Given this high baseline diffusion, we believe that no significant difference would be observed before and after N. meningitidis infection, and these experiments were not carried out. As discussed in the manuscript, bacteria-induced permeability in mice occurs at later time points, 16h post-infection, as shown previously (2). This difference between the xenograft model and the chip could reflect the absence of various cell types present in the tissue parenchyma or simply vessel maturation time.

      One of the great advantages of the system is the possibility of visualizing infection-related events at high resolution. The authors show the formation of actin in a honeycomb structure beneath the bacterial microcolonies. This only occurred in 65% of the microcolonies. Is this result similar to in vitro 2D endothelial cultures in static and under flow? Also, the group has shown in the past positive staining of other cytoskeletal proteins, such as ezrin, in the ERM complex. Does this also occur in the 3D system?

      We imaged monolayers of endothelial cells in the flat regions of the chip (the two lateral channels) using the same microscopy conditions (i.e., Obj. 40X N.A. 1.05) that have been used to detect honeycomb structures in the 3D vessels in vitro. We showed that more than 56% of infected cells present these honeycomb structures in 2D, which is 13% less than in 3D; this difference is not statistically significant given the spread of both distributions. Thus, we conclude that under both in vitro conditions, 2D and 3D, the proportion of infected cells exhibiting cortical plaques is similar. These results are in Figure 4E and S4B.

      We also performed staining of ezrin in the chip and imaged both the 3D and 2D regions. Although ezrin staining was visible in 3D (Author response image 1), it was not as obvious as other markers under these infected conditions, and we did not include it in the main text. Interpretation of this result is not straightforward, as the substrate of the cells is different, and it would require further studies on the behavior of ERM proteins in these different contexts.

      Author response image 1.

      F-actin (red) and ezrin (yellow) staining after 3h of infection with N. meningitidis (green) in 2D (top) and 3D (bottom) vessel-on-chip models.

      Recommendation to the authors:

      Reviewer #1 (Recommendation to the authors):

      I appreciate that the authors addressed most of my comments; of special relevance are the change of the title and the references to infection-on-chip. I think that the current choice of words better acknowledges the incipient but strong bioengineering infection community. I also appreciate the inclusion of a limitation paragraph that better frames the current work and proposes future advancements.

      The addition of more methodological details has improved the manuscript, although, as mentioned earlier, the wording needs to be accessible to the biology community. I also appreciate the addition of the quantification of binding under the WSS gradient in the different geometries, shown in Fig 3H. However, the description of the figure and the legend are not clear. What do "vessel" on the graph and "normalized histograms ...(blue)" in the figure legend mean? Could the authors rephrase them?

      In Figure 3F, we investigated whether Neisseria meningitidis exhibits preferential sites of infection. We hypothesized that, if bacteria preferentially adhered to specific regions, the local shear stress at these sites would differ from the overall distribution. To test this, we compared the shear stress at bacterial adhesion sites in the VoC (orange dots and curve) with the shear stress along the entire vascular edges (blue dots and curve). The high Spearman correlation indicates that there is no distinct shear stress value associated with bacterial adhesion. This suggests that bacteria can adhere across all regions, independently of local shear stress. To enhance clarity, the legend of Figure 3 and the related text have been rephrased in the revised manuscript (L289-314).
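      The comparison described above can be sketched as follows, as one plausible reading of the analysis rather than the authors' exact procedure (their binning and correlation details are not given in this response): bin the shear-stress values at adhesion sites and along all vessel edges on a common grid, normalize each histogram, and compute Spearman's rank correlation between the two. All data, bin edges, and function names are hypothetical. In the illustrative data, adhesion sites are drawn at random from the edge values, i.e. adhesion is independent of shear stress, which is the scenario the authors' interpretation corresponds to.

```python
import random


def normalized_histogram(values, lo, hi, n_bins):
    """Fraction of values in each of n_bins equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so that values at or beyond the range fall into the end bins.
        idx = min(max(int((v - lo) / width), 0), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def average_ranks(xs):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical shear-stress values (arbitrary units) along all vessel edges.
rng = random.Random(0)
edge_wss = [rng.gauss(5.0, 1.5) for _ in range(5000)]
# Hypothetical adhesion sites drawn at random from the edges: no preference.
adhesion_wss = rng.sample(edge_wss, 500)

h_edges = normalized_histogram(edge_wss, 0.0, 10.0, 10)
h_adhesion = normalized_histogram(adhesion_wss, 0.0, 10.0, 10)
rho = spearman_rho(h_edges, h_adhesion)
print(f"Spearman rho between histograms: {rho:.2f}")
```

      A rank correlation between histograms checks whether the two distributions order their bins in the same way; a two-sample Kolmogorov-Smirnov test is a common complementary check of whether two samples share a distribution.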

      Line 415 should reference Fig S5B, not Fig 5B. Also, the titles of Supplementary Figures 4 and 5 are duplicated, and the description in the legend of Fig S5 seems a bit off: A and B seem to be swapped.

      Indeed, the figure reference has been corrected. Also, the title of Figure S4 has been adapted to its contents, and the legend of Figure S5 has been corrected.

      Reviewer #2 (Recommendation to the authors):

      Minor comments to the authors:

      Line 163 "they formed" instead of "formed".

      Line 212 "two days" instead of "two day"

      Line 269 a space between two words is missing.

      These three comments have been addressed in the revised manuscript.

      In addition, I appreciate the answers to the comments, especially those requiring hypothesizing about the inclusion of further cells. However, when discussing which other cells could be relevant for the model (lines 631 to 632), it would be beneficial to discuss not only the role of those cells but also how they could be included in the model. Readers may see the inclusion of further cells as a challenge or limitation, and addressing these technical points in the discussion could be helpful.

      We thank Reviewer #2 for the insightful suggestion. Indeed, the method of introducing cells into the VoC depends on their type. Fibroblasts and dendritic cells, which are resident tissue cells, should be embedded in the collagen gel before polymerization and UV carving. This requires careful optimization to preserve chip integrity, as these cells exert pulling forces while migrating within the collagen matrix. In contrast, T cells and macrophages should be introduced through the vessel lumen to mimic their circulation in vivo. Pericytes can be co-seeded with endothelial cells, as they have been shown to self-organize within a few hours post-seeding. This information is now included in the manuscript (L577-587).

      Reviewer #3 (Recommendation to the authors):

      Suggestions and Recommendations

      Some suggestions related to the VOC itself:

      Figure 1, Fig S1, paragraph starting line 1071: More information would be helpful for the laser photoablation. For instance, is a non-standard UV laser needed? Which form of UV light is used? What is the frequency of laser pulsing? How many pulses/how long is needed to ablate the region of interest?

      The photoablation process requires a focused UV laser with a high repetition rate (10 kHz) to reduce the carving time while providing the intensity required to degrade the collagen gel. To carve 30 µm-wide vessels reproducibly, we used a 2 µm-wide laser beam at a power of 10 mW and moved the stage (i.e., the sample) at a maximum speed of 1 mm/s. This information has been added to the related paragraph starting on line 1147 of the revised manuscript.
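      The quoted parameters can be related with simple arithmetic. This is a minimal sketch, assuming that 10 mW is the average laser power, that the stage moves at its 1 mm/s maximum, and that the 2 µm beam width is the spot diameter: successive pulses are then spaced by speed divided by repetition rate, and each pulse carries average power divided by repetition rate. The function name is illustrative only.

```python
def pulse_metrics(rep_rate_hz, avg_power_w, stage_speed_m_s, spot_diameter_m):
    """Return pulse spacing (m), pulses per spot diameter, and energy per pulse (J)."""
    spacing_m = stage_speed_m_s / rep_rate_hz      # distance travelled between pulses
    pulses_per_spot = spot_diameter_m / spacing_m  # pulses landing within one spot width
    energy_j = avg_power_w / rep_rate_hz           # average power divided among pulses
    return spacing_m, pulses_per_spot, energy_j


spacing, overlap, energy = pulse_metrics(
    rep_rate_hz=10e3, avg_power_w=10e-3, stage_speed_m_s=1e-3, spot_diameter_m=2e-6
)
print(f"pulse spacing: {spacing * 1e6:.2f} um")    # 0.10 um
print(f"pulses per spot width: {overlap:.1f}")     # 20.0
print(f"energy per pulse: {energy * 1e6:.2f} uJ")  # 1.00 uJ
```

      Under these assumptions, halving the stage speed doubles the number of pulses delivered per spot width and doubles the time needed to trace a given path.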

      It is difficult to understand the geometry of the VOC. In Figure 1C, is the light coloration representing open space through which medium can flow, and the dark section the collagen? On a single chip, how many vessels are cut through the collagen? It looks as if at least two are cut in Figure 1C in the righthand photo.

      In Figure 1C, the light coloration is the F-actin staining. The horizontal upper and lower parts are the 2D lateral channels that also contain endothelial cells, and are connected to inlets and outlets, respectively. In the middle, two vertically carved 3D vessels are shown in the confocal image.

      Technically, we designed the PDMS structures to allow carving of 1 to 3 channels, maximizing the number of vessels that can be imaged while minimizing any loss of permeability at the PDMS/collagen/cells interface. This information has been added in the revised manuscript (L. 1147).

      If multiple vessels are cut in the center channel between the lateral channels, how do you ensure that medium flow is even between all vessels? A single chip with multiple different vessel architectures through the center channel would be expected to have different hydrostatic resistance with different architectures, thereby causing differences in flow rates in each vessel.
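      The reviewer's point about hydraulic resistance can be made concrete. This is a minimal sketch, assuming fully developed laminar (Poiseuille) flow in rigid circular channels and a medium viscosity of 1 mPa·s: each vessel's resistance is R = 8μL/(πr⁴), parallel vessels share the same pressure drop, so an imposed total flow Q divides in proportion to the conductance 1/R, and the wall shear stress in each vessel is τ = 4μQᵢ/(πr³). The vessel lengths, flow rate, and function names are hypothetical, and real carved geometries are neither perfectly circular nor uniform.

```python
import math

MU = 1e-3  # assumed medium viscosity in Pa*s (roughly that of water)


def poiseuille_resistance(length_m, radius_m, mu=MU):
    """Hydraulic resistance of a circular channel: R = 8*mu*L / (pi*r^4)."""
    return 8 * mu * length_m / (math.pi * radius_m ** 4)


def split_flow(total_flow, resistances):
    """Flow in each parallel branch when the total flow is imposed."""
    conductances = [1 / r for r in resistances]
    g_total = sum(conductances)
    return [total_flow * g / g_total for g in conductances]


def wall_shear_stress(flow_m3_s, radius_m, mu=MU):
    """Poiseuille wall shear stress: tau = 4*mu*Q / (pi*r^3)."""
    return 4 * mu * flow_m3_s / (math.pi * radius_m ** 3)


# Two hypothetical 30 um-diameter vessels, one twice as long as the other.
radius = 15e-6
resistances = [poiseuille_resistance(1e-3, radius), poiseuille_resistance(2e-3, radius)]
q_total = 1e-9 / 60  # 1 uL/min expressed in m^3/s (illustrative value)
flows = split_flow(q_total, resistances)
for q in flows:
    print(f"fraction {q / q_total:.3f}, WSS {wall_shear_stress(q, radius):.2f} Pa")
```

      With identical vessels the split is even; differences in length or diameter make it uneven, and because R scales with r⁻⁴, diameter differences have the largest effect.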

      To ensure a consistent flow rate regardless of the number of carved vessels, we opted to control the flow rate directly across the chip with a syringe pump. During experiments, one inlet and one outlet were closed, and flow was imposed with the syringe pump. Because the carved vessels are arranged in parallel, the flow rate remains the same in each vessel. If a pressure controller had been used instead, the flow would have been distributed evenly across the different channels. This has been added to the revised manuscript in the paragraph starting on line 1210.

      The figures imply that the laser ablation can be performed at depth within the collagen gel, rather than just etching the surface. If this is the case, it should be stated explicitly. If not, this needs to be clarified.

      One of the main advantages of the photoablation technique is carving the collagen gel in volume, and not only etching the surface. Thanks to the 3D UV degradation, we can form the 3D architecture surrounded by the bulk collagen. This has been added to the revised manuscript, lines 154-155.

      Is the in-vivo-like vessel architecture connected to the lateral channel at an oblique angle, or is the image turned to fit the entire structure? (Figure 1F and 3E). Is that why there is high shear stress at its junction with the lateral channel depicted in Figure 3E?

      All structures require connection to the lateral channels to ensure media circulation and nutrient supply. The in vivo-like design must be rotated to allow the upper and lower branches of the complex structure to pass between the fixed PDMS pillars. To remain consistent with the image and the flow direction, we have kept the same orientation as in the COMSOL simulation. This leads to a locally higher shear stress at the top of the architecture. This has been added in the revised manuscript, in the paragraph starting on line 1474.

      Figure S1F,G: In the legend, shapes are circles, not squares. On the graphs, what do the numbers in parentheses mean?

      Indeed, the terms "squares" have been replaced by "circles" in Figure 1. (1) and (2) refer to the providers of the collagen, FujiFilm and Corning, respectively. We have added this mention in the legend in Figure S1.

      Figure 3B: how do the images on the left and right differ? Each of the 4 images needs to be explained.

      The four images represent the infected VoC from different viewing angles, illustrating the three-dimensional spread of infection throughout the vessel. A more detailed description has been added in the legend of Figure 3.

      Figure S3C is not referenced but should be, likely before sentence starting on line 299.

      Indeed, the reference to Figure S3C has been added at line 301 of the revised manuscript.

      Results in Figure 3 with the pilD mutant are very interesting. It is worth commenting in the Discussion about how T4P functionality in addition to the presence of T4P contributes to Nm infection, and how in the future this could be probed with pilT mutants.

      We thank Reviewer #3 for this relevant insight. Following adhesion, a key functionality of Neisseria meningitidis for colony formation and enhanced infection is twitching motility. As suggested, we have added in the Discussion the idea of using a PilT mutant, which can adhere but cannot retract its pili, in the VoC model to investigate the role of motility in colonization in vitro under flow conditions (L611–623).

      Which vessel design was used for the data presented in Figures 4, 5, and 6 and associated supplemental figures?

      Straight channels were used for most experiments in Figures 4, 5, and 6. Occasionally, we used the branched in vivo-like designs to check whether infection patterns and the related neutrophil activity resembled those observed in vivo. This has been added in the revised manuscript, lines 1435-1439.

      Figure 4B-D: the images presented in Figure 4C are not representative of the averages presented in Figures 4B,D. For instance, the aggregates appear much larger and more elongated in the animal model in Figure 4C, but the animal model and VOC have the same colony doubling time (implying the same size) in Figure 4B, and the same average aggregate elongation in Figure 4D.

      The images in Figure 4C were selected to illustrate the elongation of colonies quantified in Figure 4D. The elongation angles are consistent between both images and align with the channel orientation. Representative images of colony expansion over time, corresponding to Figure 4A and 4B, are provided in Figure S4A.

      Figures 4E-F: dextran does not appear to diffuse in the VOC in response to histamine in these images, yet there is a significant increase in histamine-induced permeability in Figure 4F. Dotted lines should be used to indicate vessel walls for histamine, and/or a more representative image should be selected. A control set of images should also be included for comparison.

      We thank Reviewer #3 for the insightful comment. We confirm that we have carefully selected representative images for the histamine condition and adjusted them to display the same range of gray levels. The apparent increase in permeability with histamine is explained by a slight rise in background fluorescence, combined with the smaller channel size shown in Figure 4E.

      Figure S4 title is a duplicate of Figure S5 and is unrelated to the content of Figure S4. Suggest rewording to mention changes in permeability induced by Nm infection in the VOC and animal model.

      Indeed, the title of Figure S4 did not correspond to its content. We have, thus, changed it in the revised manuscript.

      Line 489 "...our Vessel-on-Chip model has the potential to fully capture the human neutrophil response during vascular infections, in a species-matched microenvironment", is an overstatement. As presented, the VOC model only contains endothelial cells and neutrophils. Many other cell types and structures can affect neutrophil activity. Thus, it is an overstatement to claim that the model can fully capture the human neutrophil response.

      We agree with Reviewer #3 that fully recapitulating neutrophil activity would require other cell types, such as platelets, pericytes, macrophages, dendritic cells, and fibroblasts, which secrete important molecules such as cytokines, chemokines, TNF-α, and histamine. In our simplified model, we were able to reconstitute the complex interaction of neutrophils with endothelial cells and with bacteria. The text was modified accordingly.

      Supplemental Figure 6 - Does CD62E staining overlap with sites of Nm attachment?

      E-selectin staining does not systematically colocalize with Neisseria meningitidis colonies, although bacterial adhesion is required. Its induced expression is heterogeneous across the tissue and varies from cell to cell, as seen in vivo.

      Line 475, Figure 6E- Phagocytosis of Nm is described, but it is difficult to see. An arrow should be added to make this clear. Perhaps the reference should have been to Figure 6G? Consider changing the colors in Figure 6G away from red/green to be more color-blind friendly.

      Indeed, the correct reference is Figure 6G, where the phagocytosis event is shown enlarged. We have changed it in the text. Adapting the colors of Figure 6G would require changing all the color codes of the manuscript, as red has been used for actin and green for Neisseria meningitidis throughout.

      Lines 621-632 - This important discussion point should be reworked. Some suggested references to cite and discuss include PMID: 7913984, 15186399, 17991045, 18640287, 19880493.

      We have introduced the suggested references (3–7) in the Discussion and expanded on the importance of introducing immune cells to study immune cell-bacteria interactions and the related immune response (L659-678).

      Minor corrections:

      •  Line 8 - suggest "photoablation-generated" instead of "photoablation-based"

      •  Line 57- remove the word "either", or modify the sentence

      •  Sentence on lines 162-165 needs rewording

      •  Lines 204-205- "loss of vascular permeability" should read "increase in vascular permeability"

      •  Line 293- "Measured" shear stress, should be "computed", since it was not directly measured (according to the Materials & Methods)

      •  Line 304- "consistently" should be "consistent"

      •  Fig. 3 legend, second line: replace "our" with "the VoC"

      •  Line 371, change "our" to "the"

      •  Line 415- Figure 5B doesn’t appear to show 2-D data. Is this in Figure S5B? Some clarification is needed. The quantification of Nm vessel association in both the VOC and the animal model should be shown in Figure 5, for direct comparison.

      •  Supplementary Figure 5C: correlation coefficient with statistical significance should be calculated.

      •  Figure 6 title, rephrase to "The infected VOC model"

      •  Line 450, replace "important" with "statistically significant"

      •  Line 459, suggest rephrasing to "bacterial pilus-mediated adhesion"

      •  Line 533- grammar needs correction

      •  Line 589- should be "sheds"

      •  Line 1106- should be "pellet"

      •  Lines 1223-1224 - is the antibody solution introduced into the inlet of the VOC for staining? Please clarify.

      •  Line 1295-unclear why Figure 2B is being referenced here

      All the suggested minor corrections have been taken into account in the revised manuscript.

      References

      (1) Gyohei Egawa, Satoshi Nakamizo, Yohei Natsuaki, Hiromi Doi, Yoshiki Miyachi, and Kenji Kabashima. Intravital analysis of vascular permeability in mice using two-photon microscopy. Scientific Reports, 3(1):1932, Jun 2013. ISSN 2045-2322. doi: 10.1038/srep01932.

      (2) Valeria Manriquez, Pierre Nivoit, Tomas Urbina, Hebert Echenique-Rivera, Keira Melican, Marie-Paule Fernandez-Gerlinger, Patricia Flamant, Taliah Schmitt, Patrick Bruneval, Dorian Obino, and Guillaume Duménil. Colonization of dermal arterioles by Neisseria meningitidis provides a safe haven from neutrophils. Nature Communications, 12(1):4547, Jul 2021. ISSN 2041-1723. doi: 10.1038/s41467-021-24797-z.

      (3) Katherine A. Rhodes, Man Cheong Ma, María A. Rendón, and Magdalene So. Neisseria genes required for persistence identified via in vivo screening of a transposon mutant library. PLOS Pathogens, 18(5):1–30, May 2022. doi: 10.1371/journal.ppat.1010497.

      (4) Heli Uronen-Hansson, Liana Steeghs, Jennifer Allen, Garth L. J. Dixon, Mohamed Osman, Peter Van Der Ley, Simon Y. C. Wong, Robin Callard, and Nigel Klein. Human dendritic cell activation by Neisseria meningitidis: phagocytosis depends on expression of lipooligosaccharide (LOS) by the bacteria and is required for optimal cytokine production. Cellular Microbiology, 6(7):625–637, 2004. doi: 10.1111/j.1462-5822.2004.00387.x.

      (5) M. C. Jacobsen, P. J. Dusart, K. Kotowicz, M. Bajaj-Elliott, S. L. Hart, N. J. Klein, and G. L. Dixon. A critical role for ATF2 transcription factor in the regulation of E-selectin expression in response to non-endotoxin components of Neisseria meningitidis. Cellular Microbiology, 18(1):66–79, 2016. doi: 10.1111/cmi.12483.

      (6) Andrea Villwock, Corinna Schmitt, Stephanie Schielke, Matthias Frosch, and Oliver Kurzai. Recognition via the class A scavenger receptor modulates cytokine secretion by human dendritic cells after contact with Neisseria meningitidis. Microbes and Infection, 10(10):1158–1165, 2008. ISSN 1286-4579. doi: 10.1016/j.micinf.2008.06.009.

      (7) Audrey Varin, Subhankar Mukhopadhyay, Georges Herbein, and Siamon Gordon. Alternative activation of macrophages by IL-4 impairs phagocytosis of pathogens but potentiates microbial-induced signalling and cytokine secretion. Blood, 115(2):353–362, Jan 2010. ISSN 0006-4971. doi: 10.1182/blood-2009-08-236711.

    1. eLife Assessment

      This important study characterises the morphogenesis of cortical folding in the ferret and human cerebral cortex using complementary physical and computational modelling. Notably, these approaches are applied to charting, in the ferret model, known abnormalities of cortical folding in humans. The study finds convincing evidence that variation in cortical thickness and expansion account for deviations in morphology, and supports these findings using cutting-edge approaches from both physical gel models and numerical simulations. The study will be of broad interest to the field of developmental neuroscience.

    2. Reviewer #1 (Public review):

      The manuscript by Choi and colleagues investigates the impact of variation in cortical geometry and growth on cortical surface morphology. Specifically, the study uses physical gel models and computational models to evaluate the impact of varying specific features/parameters of the cortical surface. The study makes use of this approach to address the topic of malformations of cortical development and finds that cortical thickness and cortical expansion rate are the drivers of differences in morphogenesis.

      The study is composed of two main sections. First, the authors validate numerical simulation and gel model approaches against real cortical postnatal development in the ferret. Next, the study turns to modelling malformations in cortical development using modified tangential growth rate and cortical thickness parameters in numerical simulations. The study examines three genetically linked cortical malformations observed in the human brain to demonstrate the impact of the two physical parameters on folding in the ferret brain.

      This is a tightly presented study that demonstrates a key insight into cortical morphogenesis and the impact of deviations from normal development. The dual physical and computational modeling approach offers the potential for unique insights into mechanisms driving malformations. This study establishes a strong foundation for further work directly probing the development of cortical folding in the ferret brain.

    3. Reviewer #2 (Public review):

      Summary:

      Based on MRI data of the ferret (a gyrencephalic non-primate animal, in whom folding happens postnatally), the authors create in vitro physical gel models and in silico numerical simulations of typical cortical gyrification. They then use genetic manipulations of animal models to demonstrate that cortical thickness and expansion rate are primary drivers of atypical morphogenesis. These observations are then used to explain cortical malformations in humans.

      Strengths:

      The paper is very interesting and original, and combines physical gel experiments, numerical simulations, as well as observations in MCD. The figures are informative, and the results appear to have good overall face validity.

      Comment on the revised version from the Reviewing Editor:

      The reviewers are happy with the authors replies and the eLife Assessment has been amended accordingly.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The manuscript by Choi and colleagues investigates the impact of variation in cortical geometry and growth on cortical surface morphology. Specifically, the study uses physical gel models and computational models to evaluate the impact of varying specific features/parameters of the cortical surface. The study makes use of this approach to address the topic of malformations of cortical development and finds that cortical thickness and cortical expansion rate are the drivers of differences in morphogenesis.

      The study is composed of two main sections. First, the authors validate numerical simulation and gel model approaches against real cortical postnatal development in the ferret. Next, the study turns to modelling malformations in cortical development using modified tangential growth rate and cortical thickness parameters in numerical simulations. The study examines three genetically linked cortical malformations observed in the human brain to demonstrate the impact of the two physical parameters on folding in the ferret brain.

      This is a tightly presented study that demonstrates a key insight into cortical morphogenesis and the impact of deviations from normal development. The dual physical and computational modeling approach offers the potential for unique insights into mechanisms driving malformations. This study establishes a strong foundation for further work directly probing the development of cortical folding in the ferret brain. One weakness of the current study is that the interpretation of the results in the context of human cortical development is at present indirect, as the modelling results are solely derived from the ferret. However, these modelling approaches demonstrate proof of concept for investigating related alterations more directly in future work through similar approaches to models of the human cerebral cortex.

      We thank the reviewer for the very positive comments. While the current gel and organismal experiments focus on the ferret only, we want to emphasize that our analysis does consider previous observations of human brains and morphologies therein (Tallinen et al., Proc. Natl. Acad. Sci. 2014; Tallinen et al., Nat. Phys. 2016), which we compare and explain. This allows us to draw broader implications from our study for explaining cortical malformations in humans, using the ferret as a motivating model. Further analysis of normal human brain growth using computational and physical gel models can be found in our companion paper (Yin et al., 2025), now also published in eLife: S. Yin, C. Liu, G. P. T. Choi, Y. Jung, K. Heuer, R. Toro, L. Mahadevan, Morphogenesis and morphometry of brain folding patterns across species. eLife, 14, RP107138, 2025. doi:10.7554/eLife.107138

      In future work, we plan to obtain malformed human cortical surface data, which would allow us to further investigate related alterations more directly. We have added a remark on this in the revised manuscript (please see pages 8–9).

      Reviewer #2 (Public review):

      Summary:

      Based on MRI data of the ferret (a gyrencephalic non-primate animal, in whom folding happens postnatally), the authors create in vitro physical gel models and in silico numerical simulations of typical cortical gyrification. They then use genetic manipulations of animal models to demonstrate that cortical thickness and expansion rate are primary drivers of atypical morphogenesis. These observations are then used to explain cortical malformations in humans.

      Strengths:

      The paper is very interesting and original, and combines physical gel experiments, numerical simulations, as well as observations in MCD. The figures are informative, and the results appear to have good overall face validity.

      We thank the reviewer for the very positive comments.

      Weaknesses:

      On the other hand, I perceived some lack of quantitative analyses in the different experiments, and currently, there seems to be rather a visual/qualitative interpretation of the different processes and their similarities/differences. Ideally, the authors also quantify local/pointwise surface expansion in the physical and simulation experiments, to more directly compare these processes. Time courses of eg, cortical curvature changes, could also be plotted and compared for those experiments. I had a similar impression about the comparisons between simulation results and human MRI data. Again, face validity appears high, but the comparison appeared mainly qualitative.

      We thank the reviewer for the comments. Besides the visual and qualitative comparisons between the models, we would like to point out that we have included the quantification of the shape difference between the real and simulated ferret brain models via spherical parameterization and the curvature-based shape index, as detailed in main text Fig. 4 and SI Section 3. We have also used spherical harmonics representations to compare the real and simulated ferret brains at different maximum orders N. In our revision, we have included more calculations comparing the real and simulated ferret brains at additional time points in the SI (please see SI page 6). As for the comparison between the malformation simulation results and human MRI data in the current work, since the human MRI data are two-dimensional while our computational models are three-dimensional, we focus on a qualitative comparison between them. In future work, we plan to obtain malformed human cortical surface data, from which we can then perform the parameterization-based and curvature-based shape analysis for a more quantitative assessment.
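      For readers unfamiliar with the curvature-based shape index mentioned above, here is a minimal sketch. It assumes the commonly used Koenderink-van Doorn definition, which maps the two principal curvatures κ1 ≥ κ2 at each surface point to a single value in [-1, 1]; sign conventions vary with the orientation of the surface normal, and the manuscript's exact convention is not stated in this response. The function name is illustrative.

```python
import math


def shape_index(k1, k2):
    """Shape index S = (2/pi) * atan2(k1 + k2, k1 - k2), with k1 >= k2.

    Under this convention S = 1 for a convex cap, 0.5 for a ridge, 0 for a
    symmetric saddle, -0.5 for a valley, and -1 for a cup. Planar points
    (k1 = k2 = 0) have no defined shape index; atan2(0, 0) returns 0 here.
    """
    if k1 < k2:  # enforce the ordering k1 >= k2
        k1, k2 = k2, k1
    return (2 / math.pi) * math.atan2(k1 + k2, k1 - k2)


examples = {
    "cap": (1.0, 1.0),
    "ridge": (1.0, 0.0),
    "saddle": (1.0, -1.0),
    "valley": (0.0, -1.0),
    "cup": (-1.0, -1.0),
}
for name, (k1, k2) in examples.items():
    print(f"{name:>6}: {shape_index(k1, k2):+.2f}")
```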

      I felt that MCDs could have been better contextualized in the introduction.

      We thank the reviewer for the comment. In our revision, we have revised the description of MCDs in the introduction (please see page 2).

      Reviewer #1 (Recommendations for the authors):

      The study is beautifully presented and offers an excellent complement to the work presented by Yin et al. In its current form, the malformation portion of the study appears predominantly reliant on the numerical simulations rather than the gel model. It might be helpful, therefore, to further incorporate the results presented in Figure S5 into the main text, as this seems to be a clear application of the physical gel model to modelling malformations. Any additional use of the gel models in the malformation portion of the study would help to further justify the necessity and complementarity of the dual methodological approaches.

      We thank the reviewer for the suggestion. We have moved Fig. S5 and the associated description to the main text in the revised manuscript (please see the newly added Figure 5 on page 6 and the description on pages 5–7). In particular, we have included a new section on the physical gel and computational models for ferret cortical malformations right before the section on the neurology of ferret and human cortical malformations.

      One additional consideration is that the analyses in the current study focus entirely on the ferret cortex. Given the emphasis in the title on the human brain, it may be worthwhile to either consider adding additional modelling of the human cortex or to consider modifying the title to more accurately align with the focus of the methods/results.

      We thank the reviewer for the suggestion. While the current gel and organismal experiments focus on the ferret only, we want to emphasize that our analysis does consider previous observations of human brains and morphologies therein (Tallinen et al., Proc. Natl. Acad. Sci. 2014; Tallinen et al., Nat. Phys. 2016), which we compare and explain. This allows us to draw broader implications from our study for explaining cortical malformations in humans, using the ferret as a motivating model. Therefore, we think that the title of the paper is reasonable. To further highlight the connection between the ferret brain simulations and human brain growth, we have included an additional comparison between human brain surface reconstructions adapted from a prior study and the ferret simulation results in the SI (please see SI Section S4 and SI Fig. S5 on pages 9–10).

      Two additional minor points:

      Table S1 seems sufficiently critical to the motivation for the study and organization of the results section to justify inclusion in the main text. Of course, I would leave any such minor changes to the discretion of the authors.

      We thank the reviewer for the suggestion. We have moved Table S1 and the associated description to the main text in the revised manuscript (please see Table 1 on page 7).

      Page 7, Column 1: “macacques” → “macaques”.

      We thank the reviewer for pointing out the typo. We have fixed it in the revised manuscript (please see page 8).

      Reviewer #2 (Recommendations for the authors):

      The methods lack details on the human MRI data and patients.

      We thank the reviewer for the comment. Note that the human MRI data and patients were from prior works (Smith et al., Neuron 2018; Johnson et al., Nature 2018; Akula et al., Proc. Natl. Acad. Sci. 2023) and were used for the discussion on cortical malformations in Fig. 6. In the revision, we have included a new subsection in the Methods section and provided more details and references on the MRI data and patients (please see pages 9–10).

    1. eLife Assessment

      This study thoroughly assesses tactile acuity on women's breasts, for which no dependable data currently exists. The study provides two important contributions, by convincingly showing that tactile acuity on the breast is poor in comparison to other body parts, and that acuity is worst in larger breasts, indicating that the number of tactile sensors is fixed. However, further arguments concerning the role of the nipple in spatial localisation are not well supported by the current evidence, therefore diluting the overall contribution of the study. This study will be of interest to the broader community of touch, as well as those interested in breast reconstruction and sexual function.

    2. Reviewer #1 (Public review):

      The authors investigated tactile spatial perception on the breast using discrimination, categorization, and direct localization tasks. They reach four main conclusions:

      (1) The breast has poor tactile spatial resolution.

      This conclusion is based on comparing just noticeable differences, a marker of tactile spatial resolution, across four body regions, two on the breast. The data compellingly support the conclusion; the study outshines other studies on tactile spatial resolution that tend to use problematic measures of tactile resolution, such as two-point-discrimination thresholds. The result will interest researchers in the field and possibly in other fields due to the intriguing tension between the finding and the sexually arousing function of touching the breast.

      The manuscript incorrectly describes the result as poor spatial acuity. Acuity measures the average absolute error, and acuity is good when response biases are absent. Precision relates to the error variance. It is common to see high precision with low acuity or vice versa. Just noticeable differences assess precision or spatial resolution, while points of subjective equality evaluate acuity or bias. Similar confusions between these terms appear throughout the manuscript.

      A paragraph within the next section seems to follow up on this insight by examining the across-participant consistency of the differences in tactile spatial resolution between body parts. To this aim, pairwise rank correlations between body sites are conducted. This analysis raises red flags from a statistical point of view. 1) An ANOVA and its follow-up tests assume no variation in the size of the tested effect but varying base values across participants. Thus, if significant differences between conditions are confirmed by the original statistical analysis, most participants will have better spatial resolution in one condition than in the other, and the difference between body sites will be similar across participants. 2) Correlations are power-hungry, as are non-parametric tests. Thus, the number of participants needed for a reliable rank correlation analysis far exceeds that of the study. In sum, a correlation should emerge between body sites associated with significantly different tactile JNDs; however, owing to the sample size, these correlations might only be significant for body sites with pronounced differences.
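      To make the precision/acuity distinction concrete, here is a minimal sketch that assumes a cumulative-Gaussian psychometric function (an assumption; the manuscript's fitting procedure is not described in this review). The location parameter gives the PSE (bias, i.e. acuity in the sense used above) and the slope parameter gives the JND (precision). The function name `pse_and_jnd` and both observers' parameter values are hypothetical.

```python
from statistics import NormalDist

def pse_and_jnd(mu, sigma):
    """Return (PSE, JND) for a cumulative-Gaussian psychometric function.

    Assumes p(x) = Phi((x - mu) / sigma), where x is the physical difference
    between comparison and standard and p(x) is the probability of judging the
    comparison as "larger". The PSE is the x at which p = 0.5 (it equals mu and
    reflects bias); the JND is taken as the distance from p = 0.5 to p = 0.75,
    i.e. sigma * z(0.75), which reflects precision (one common convention).
    """
    z75 = NormalDist().inv_cdf(0.75)
    return mu, sigma * z75

# Hypothetical observers (arbitrary units, e.g. mm):
# A is precise but biased; B is unbiased but imprecise.
pse_a, jnd_a = pse_and_jnd(mu=8.0, sigma=2.0)
pse_b, jnd_b = pse_and_jnd(mu=0.0, sigma=10.0)
print(f"A: PSE={pse_a:.1f}, JND={jnd_a:.2f}")
print(f"B: PSE={pse_b:.1f}, JND={jnd_b:.2f}")
```

      Observer A has the smaller JND (better spatial resolution) despite the larger bias, which is why a JND comparison speaks to precision rather than acuity.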

      (2) Larger breasts are associated with lower tactile spatial resolution

      This conclusion is based on a strong correlation between participants' JNDs and the size of their breasts. The depicted correlation convincingly supports the conclusion. The sample size is below that recommended for correlations based on power analyses, but simulations show that spurious correlations of the reported size are extremely unlikely at N=18. Moreover, visual inspection rules out that outliers drive these correlations. Thus, they are convincing. This result is of interest to the field, as it aligns with the hypothesis that nerve fibers are more sparsely distributed across larger body parts.
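      The kind of simulation referred to above can be sketched as follows: draw N = 18 independent Gaussian pairs many times and count how often the absolute Pearson correlation reaches a given size by chance alone. The threshold of 0.7 is an illustrative placeholder, not the coefficient reported in the manuscript, and the helper names are hypothetical.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def null_exceedance(n, r_threshold, n_sims=20000, seed=1):
    """Fraction of simulated samples (n independent Gaussian pairs, true r = 0)
    whose absolute correlation is at least r_threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        if abs(pearson_r(x, y)) >= r_threshold:
            hits += 1
    return hits / n_sims

# 0.7 is a placeholder effect size for illustration only.
p_spurious = null_exceedance(n=18, r_threshold=0.7)
print(f"P(|r| >= 0.7 with no true correlation, N=18) ~ {p_spurious:.4f}")
```

      Substituting the reported coefficient for the placeholder gives the chance probability relevant to the study.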

      (3) The nipple is a unit

      The data do not support this conclusion. The conclusion that the nipple is perceived as a unit is based on poor tactile localization performance for touches on the nipple compared to the areola. The problem is that the localization task is a quadrant identification task with the center being at the nipple. Quadrants for the areola could be significantly larger due to the relative size of the areola and the nipple; the results section seems to suggest this was accounted for when placing the tactile stimuli within the quadrants, but the methods section suggests otherwise. Additionally, the areola has an advantage because of its distance from the nipple, which leads to larger Euclidean distances between the centers of the quadrants than for the nipple. Thus, participants should do better for the areola than for the nipple even if both sites have the same tactile resolution.
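      The geometric confound can be illustrated with a simple model. Assume (hypothetically) that stimuli sit at the centroid of each quadrant, that the percept equals the true location plus isotropic Gaussian noise with the same SD at both sites, and that the quadrant response is correct only if neither coordinate changes sign. Accuracy is then Phi(offset / sigma) squared, where offset is the centroid's distance from each quadrant border. The radii and noise level below are illustrative assumptions, not measurements from the study.

```python
import math
from statistics import NormalDist

def quadrant_centroid_offset(r_inner, r_outer):
    """x (= y) coordinate of the centroid of one quadrant of an annulus centred
    on the nipple; r_inner = 0 gives a quarter disk (the nipple itself)."""
    num = 4 * (r_outer ** 3 - r_inner ** 3)
    den = 3 * math.pi * (r_outer ** 2 - r_inner ** 2)
    return num / den

def quadrant_accuracy(offset, sigma):
    """P(correct quadrant) when independent Gaussian noise (SD sigma) is added to
    x and y: both coordinates must keep their sign, hence Phi(offset/sigma)**2."""
    return NormalDist().cdf(offset / sigma) ** 2

# Hypothetical dimensions (cm) and a noise SD that is identical for both sites.
NIPPLE_R, AREOLA_R, SIGMA = 0.6, 2.0, 0.5
acc_nipple = quadrant_accuracy(quadrant_centroid_offset(0.0, NIPPLE_R), SIGMA)
acc_areola = quadrant_accuracy(quadrant_centroid_offset(NIPPLE_R, AREOLA_R), SIGMA)
print(f"nipple: {acc_nipple:.2f}, areola: {acc_areola:.2f}, chance: 0.25")
```

      With identical tactile noise, the areola still yields much higher quadrant accuracy purely because its quadrant centres lie farther from the borders.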

      To justify the conclusion that the nipple is a unit, additional data would be required. 1) One could compare psychometric curves with the nipple as the center and psychometric curves with a nearby point on the areola as the center. 2) Performance in the quadrant task could be compared for the nipple and an equally sized portion of the areola and tactile locations that have the same distance to the border between quadrants in skin coordinates. 3) Tactile resolution could be directly measured for both body sites using a tactile orientation task with either a two-dot probe or a haptic grating.

      Categorization accuracy in each area was tested against chance using a Monte Carlo test, which is fine, though the calculation of the test statistic, Z, should be reported in the Methods section, as there are several options. Localization accuracies are then compared between areas using a paired t-test. It is a bit confusing that a distribution-approximating test is used in one case and, in the other, a test that assumes Gaussian distributions even though the data are Bernoulli/binomially distributed. Sampling-based tests and t-tests are very robust, so these surprising choices should have hardly any effect on the results.
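      As an illustration of why the definition of Z matters, here is one possible Monte Carlo test against chance, sketched under assumptions: four quadrants (chance = 0.25), independent trials, and Z defined as the observed count standardized by the mean and SD of the simulated null distribution. The function name and the counts are hypothetical, not the study's data or procedure.

```python
import random
from statistics import mean, stdev

def monte_carlo_z(n_correct, n_trials, p_chance=0.25, n_sims=10000, seed=0):
    """One possible Monte Carlo test of categorization accuracy against chance.

    Simulates n_sims experiments in which every response is a random guess,
    expresses the observed number of correct trials as a z-score relative to
    that null distribution, and returns it with a one-sided Monte Carlo p-value.
    """
    rng = random.Random(seed)
    null = [sum(rng.random() < p_chance for _ in range(n_trials))
            for _ in range(n_sims)]
    z = (n_correct - mean(null)) / stdev(null)
    # Add-one correction so the Monte Carlo p-value is never exactly zero.
    p = (1 + sum(k >= n_correct for k in null)) / (n_sims + 1)
    return z, p

# Hypothetical counts for one participant and one area.
z, p = monte_carlo_z(n_correct=22, n_trials=40)
print(f"z = {z:.2f}, p = {p:.4f}")
```

      Other reasonable choices, such as the normal approximation to the binomial, an exact binomial test, or pooling trials across participants, give different values, which is why the chosen definition should be stated.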

      A correlation based on N=4 participants is dangerously underpowered. A quick simulation shows that correlation coefficients of randomly sampled numbers are uniformly distributed at such a low sample size. This likely spurious correlation is not analyzed but is featured quite prominently in a figure and discussed in the text, which is worrisome.
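      The quick simulation mentioned above can be reproduced with a short sketch: sample four independent Gaussian pairs many times and tabulate the resulting Pearson coefficients. This matches the known null density of r for Gaussian data, which is proportional to (1 - r^2)^((n - 4)/2) and is therefore flat when n = 4. Helper names are hypothetical.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_null_r(n, n_sims=20000, seed=2):
    """Correlations of n independent Gaussian pairs (true r = 0), n_sims times."""
    rng = random.Random(seed)
    rs = []
    for _ in range(n_sims):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        rs.append(pearson_r(x, y))
    return rs

rs = simulate_null_r(n=4)
# If r is uniform on [-1, 1], each of five equal-width bins holds about 20%.
bins = [sum(lo <= r < lo + 0.4 for r in rs) / len(rs)
        for lo in (-1.0, -0.6, -0.2, 0.2, 0.6)]
frac_strong = sum(abs(r) > 0.8 for r in rs) / len(rs)
print("bin fractions:", [round(b, 3) for b in bins])
print(f"P(|r| > 0.8) with no true correlation: {frac_strong:.3f}")
```

      Under these assumptions, about one in five samples of four unrelated participants shows |r| > 0.8.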

      (4) Localization of tactile events on the breast is biased towards the nipple

      The conclusion that tactile percepts are drawn toward the nipple is based on localization biases for tactile stimuli on the breast compared to the back. Unfortunately, the way participants reported the tactile locations introduces a major confound. Participants indicated the perceived locations of the tactile stimulus on 3D models of these body parts. The nipple is a highly distinctive and cognitively represented landmark, far more so than the scapula, making it very likely that responses were biased toward the nipple regardless of the actual percepts. One imperfect but better alternative would have been to ask participants to identify locations on a neutral grey patch and help them relate this patch to their skin by repeatedly tracing its outline on the skin.

      Participants also saw their localization responses for the previously touched locations. This is unlikely to induce bias towards the nipple, but it renders any estimate of the size and variance of the errors unreliable. Participants will always make sure that the marked locations are sufficiently distant from each other.

      The statistical analysis is again a homebrew solution and hard to follow. It remains unclear why standard and straightforward measures of bias, such as regressing reported against actual locations, were not used.
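      A sketch of the standard alternative, under stated assumptions: express actual and reported locations in coordinates centred on the nipple (one axis shown) and regress reported on actual. A slope below 1 indicates compression of reports toward the nipple, a slope of 1 indicates no compression, and the intercept captures any constant shift; the same fit on the back provides the comparison. The data below are simulated for illustration, with a hypothetical 30% compression built in.

```python
import random

def ols_fit(x, y):
    """Ordinary least-squares intercept and slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical data: locations (cm) along one axis, origin at the nipple.
rng = random.Random(3)
actual = [rng.uniform(-6, 6) for _ in range(200)]
# Simulated observer whose reports are compressed 30% toward the nipple,
# plus Gaussian response noise.
reported = [0.7 * a + rng.gauss(0, 0.8) for a in actual]

intercept, slope = ols_fit(actual, reported)
print(f"intercept = {intercept:.2f} cm, slope = {slope:.2f}")
```

      Fitting both axes (or a 2D affine model) per participant and comparing slopes between breast and back with a standard test would give an interpretable measure of bias.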

      Null-hypothesis significance testing only lets scientists either reject the null hypothesis or not. The latter does NOT mean the Null hypothesis is true, i.e., it can never be concluded that there is no effect. This rule applies to every NHST test. However, it raises particular concerns with distribution tests. The only conclusion possible is that the data are unlikely from a population with the tested distribution; these tests do not provide insight into the actual distribution of the data, regardless of whether the result is significant or not.