    1. eLife Assessment

      This valuable study investigates the neural noise hypothesis of developmental dyslexia using electroencephalography (EEG) and 7T magnetic resonance spectroscopy (MRS). Solid results were reported that indicate no evidence of an imbalance between excitatory and inhibitory (E/I) brain activity in adolescents and young adults with dyslexia compared to controls, thereby challenging the neural noise hypothesis. This research advances our understanding of the neural mechanisms underlying dyslexia and offers broader insights into the neural processes involved in reading development.

    2. Reviewer #1 (Public review):

      Summary:

      "Neural noise", here operationalized as an imbalance between excitatory and inhibitory neural activity, has been posited as a core cause of developmental dyslexia, a prevalent learning disability that impacts reading accuracy and fluency. This study is the first to systematically evaluate the neural noise hypothesis of dyslexia. Neural noise was measured using neurophysiological (electroencephalography [EEG]) and neurochemical (magnetic resonance spectroscopy [MRS]) measures in adolescents and young adults with and without dyslexia. The authors did not find evidence of elevated neural noise in the dyslexia group from EEG or MRS measures, and Bayes factors generally informed against including the grouping factor in the models. Although the comparisons between groups with and without dyslexia did not support the neural noise hypothesis, a mediation model that quantified phonological processing and reading abilities continuously revealed that EEG beta power in the left superior temporal sulcus was positively associated with reading ability via phonological awareness. This finding lends support for analysis of associations between neural excitatory/inhibitory factors and reading ability along a continuum, rather than with a case/control approach, and indicates the relevance of phonological awareness as an intermediate trait that may provide a more proximal link between neurobiology and reading ability. Further research is needed across developmental stages and over a broader set of brain regions to more comprehensively assess the neural noise hypothesis of dyslexia, and alternative neurobiological mechanisms of this disorder should be explored.

      Strengths:

      The inclusion of multiple methods of assessing neural noise (neurophysiological and neurochemical) is a major advantage of this paper. MRS at 7T confers an advantage of more accurately distinguishing and quantifying glutamate, which is a primary target of this study. In addition, the subject-specific functional localization of the MRS acquisition is an innovative approach. MRS acquisition and processing details are noted in the supplementary materials according to the experts' consensus-recommended checklist (https://doi.org/10.1002/nbm.4484). Commenting on the rigor of the EEG methods is beyond my expertise as a reviewer.

      Participants recruited for this study included those with a clinical diagnosis of dyslexia, which strengthens confidence in the accuracy of the diagnosis. The assessment of reading and language abilities during the study further confirms the persistently poorer performance of the dyslexia group compared to the control group.

      The correlational analysis and mediation analysis provide complementary information to the main case-control analyses, and the examination of associations between EEG and MRS measures of neural noise is novel and interesting.

      The authors follow good practice for open science, including data and code sharing. They also apply statistical rigor, using Bayes Factors to support conclusions of null evidence rather than relying only on non-significant findings. In the discussion, they acknowledge the limitations and generalizability of the evidence and provide directions for future research on this topic.

      Weaknesses:

      Though the methods employed in the paper are generally strong, the MRS acquisition was not optimized to quantify GABA, so the findings (or lack thereof) should be interpreted with caution. Specifically, while 7T MRS affords the benefit of quantifying metabolites, such as GABA, without spectral editing, this quantification is best achieved with echo times (TE) of 68 or 80 ms in order to minimize the spectral overlap between glutamate and GABA and reduce contamination from the macromolecular signal (Finkelman et al., 2022, https://doi.org/10.1016/j.neuroimage.2021.118810). The data in the present study were acquired at TE=28 ms, and are therefore likely affected by overlapping Glu and GABA peaks at 2.3 ppm that are much more difficult to resolve at this short TE, which could directly affect the measures that are meant to characterize the Glu/GABA+ ratio/imbalance. In future research, MRS acquisition schemes should be optimized for the acquisition of Glutamate, GABA, and their relative balance.

      As the authors note in the discussion, additional factors such as MRS voxel location, participant age, and participant sex could influence associations between neural noise and reading abilities and should be considered in future studies.

      Appraisal:

      The authors present a thorough evaluation of the neural noise hypothesis of developmental dyslexia in a sample of adolescents and young adults using multiple methods of measuring excitatory/inhibitory imbalances as an indicator of neural noise. The authors concluded that there was no support for the neural noise hypothesis of dyslexia in their study, based on non-significant group effects and Bayes factors. This conclusion is justified, and further research is called for to more broadly evaluate the neural noise hypothesis in developmental dyslexia.

      Impact:

      This study provides an exemplary foundation for the evaluation of the neural noise hypothesis of dyslexia. Other researchers may adopt the model applied in this paper to examine neural noise in various populations with/without dyslexia, or across a continuum of reading abilities, to more thoroughly examine evidence (or lack thereof) for this hypothesis. Notably, the lack of evidence here does not rule out the possibility of a role for neural noise in dyslexia, and the authors point out that presentation with co-occurring conditions, such as ADHD, may contribute to neural noise in dyslexia. Dyslexia remains a multi-faceted and heterogeneous neurodevelopmental condition, and many genetic, neurobiological and environmental factors play a role. This study demonstrates one step toward evaluating neurobiological mechanisms that may contribute to reading difficulties.

    3. Reviewer #2 (Public review):

      Summary:

      This study utilized two complementary techniques (EEG and 7T MRI/MRS) to directly test a theory of dyslexia: the neural noise hypothesis. The authors report finding no evidence to support an excitatory/inhibitory imbalance, as quantified by beta power in EEG and the Glutamate/GABA ratio in MRS. This is important work and speaks to one potential mechanism by which increased neural noise may occur in dyslexia.

      Strengths:

      This is a well-conceived study with in-depth analyses and publicly available data for independent review. The authors provide transparency with their statistics and display the raw data points along with the averages in figures for review and interpretation. The data suggest that an E/I balance issue may not underlie deficits in dyslexia, and the study is a meaningful and needed test of a possible mechanism for increased neural noise.

      Weaknesses:

      The researchers did not include a visual print task in the EEG task, which limits analysis of reading specific regions such as the visual word form area, which is a commonly hypoactivated region in dyslexia. This region is a common one of interest in dyslexia, yet the researchers measured the I/E balance in only one region of interest, specific to the language network.

    4. Reviewer #3 (Public review):

      Summary:

      This study by Glica and colleagues utilized EEG (i.e., Beta power, Gamma power, and aperiodic activity) and 7T MRS (i.e., MRS IE ratio, IE balance) to reevaluate the neural noise hypothesis in dyslexia. Supported by Bayesian statistics, their results show convincing evidence of no differences in EI balance between groups, challenging the neural noise hypothesis.

      Strengths:

      Combining EEG and 7T MRS, this study utilized both indirect (i.e., Beta power, Gamma power, and aperiodic activity) and direct (i.e., MRS IE ratio, IE balance) measures to reevaluate the neural noise hypothesis in dyslexia.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      "Neural noise", here operationalized as an imbalance between excitatory and inhibitory neural activity, has been posited as a core cause of developmental dyslexia, a prevalent learning disability that impacts reading accuracy and fluency. This study is the first to systematically evaluate the neural noise hypothesis of dyslexia. Neural noise was measured using neurophysiological (electroencephalography [EEG]) and neurochemical (magnetic resonance spectroscopy [MRS]) measures in adolescents and young adults with and without dyslexia. The authors did not find evidence of elevated neural noise in the dyslexia group from EEG or MRS measures, and Bayes factors generally informed against including the grouping factor in the models. Although the comparisons between groups with and without dyslexia did not support the neural noise hypothesis, a mediation model that quantified phonological processing and reading abilities continuously revealed that EEG beta power in the left superior temporal sulcus was positively associated with reading ability via phonological awareness. This finding lends support for analysis of associations between neural excitatory/inhibitory factors and reading ability along a continuum, rather than with a case/control approach, and indicates the relevance of phonological awareness as an intermediate trait that may provide a more proximal link between neurobiology and reading ability. Further research is needed across developmental stages and over a broader set of brain regions to more comprehensively assess the neural noise hypothesis of dyslexia, and alternative neurobiological mechanisms of this disorder should be explored.

      Strengths:

      The inclusion of multiple methods of assessing neural noise (neurophysiological and neurochemical) is a major advantage of this paper. MRS at 7T confers an advantage of more accurately distinguishing and quantifying glutamate, which is a primary target of this study. In addition, the subject-specific functional localization of the MRS acquisition is an innovative approach. MRS acquisition and processing details are noted in the supplementary materials according to the experts' consensus-recommended checklist (https://doi.org/10.1002/nbm.4484). Commenting on the rigor of the EEG methods is beyond my expertise as a reviewer.

      Participants recruited for this study included those with a clinical diagnosis of dyslexia, which strengthens confidence in the accuracy of the diagnosis. The assessment of reading and language abilities during the study further confirms the persistently poorer performance of the dyslexia group compared to the control group.

      The correlational analysis and mediation analysis provide complementary information to the main case-control analyses, and the examination of associations between EEG and MRS measures of neural noise is novel and interesting.

      The authors follow good practice for open science, including data and code sharing. They also apply statistical rigor, using Bayes Factors to support conclusions of null evidence rather than relying only on non-significant findings. In the discussion, they acknowledge the limitations and generalizability of the evidence and provide directions for future research on this topic.

      Weaknesses:

      Though the methods employed in the paper are generally strong, there are certain aspects that are not clearly described in the Materials & Methods section, such as a description of the statistical analyses used for hypothesis testing.

      Thank you for pointing this out. A description of the statistical models used in the analyses of EEG biomarkers has been added to the Materials and Methods:

      “First, exponent and offset values were averaged across all electrodes and analyzed using a 2x2 repeated measures ANOVA with group (dyslexic, control) as a between-subjects factor and condition (resting state, language task) as a within-subjects factor. Age was included in the analyses as a covariate due to the correlation between variables. Next, exponent and offset values were averaged across electrodes corresponding to the left (F7, FT7, FC5) and right inferior frontal gyrus (F8, FT8, FC6), and to the left (T7, TP7, TP9) and right superior temporal sulcus (T8, TP8, TP10). The electrodes were selected based on the analyses outlined by Giacometti and colleagues (2014) and Scrivener and Reader (2022). For these analyses, a 2x2x2x2 repeated measures ANOVA with age as a covariate was conducted with group (dyslexic, control) as a between-subjects factor and condition (resting state, language task), hemisphere (left, right), and region (frontal, temporal) as within-subjects factors. Results for the alpha and beta bands were calculated for the same clusters of frontal and temporal electrodes and analyzed with a similar 2x2x2x2 repeated measures ANOVA; however, for these analyses, age was not included as a covariate due to a lack of significant correlations.”
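The cluster averaging described in the quoted Methods text can be sketched as follows. This is only an illustration, not the authors' analysis code: the electrode groupings are taken from the quoted passage, while the function name `cluster_means` and the exponent values are hypothetical.

```python
from statistics import mean

# Electrode clusters as listed in the quoted Methods text.
CLUSTERS = {
    ("left", "frontal"): ["F7", "FT7", "FC5"],
    ("right", "frontal"): ["F8", "FT8", "FC6"],
    ("left", "temporal"): ["T7", "TP7", "TP9"],
    ("right", "temporal"): ["T8", "TP8", "TP10"],
}


def cluster_means(values_by_electrode):
    """Average a per-electrode measure (e.g. the aperiodic exponent) within each cluster."""
    return {
        key: mean(values_by_electrode[ch] for ch in channels)
        for key, channels in CLUSTERS.items()
    }


# Hypothetical exponent values for one participant in one condition.
example = {
    "F7": 1.0, "FT7": 1.2, "FC5": 1.4,
    "F8": 1.1, "FT8": 1.1, "FC6": 1.1,
    "T7": 0.9, "TP7": 1.0, "TP9": 1.1,
    "T8": 1.3, "TP8": 1.2, "TP10": 1.1,
}
means = cluster_means(example)
```

Each participant and condition would yield four such values (hemisphere by region), which correspond to the within-subjects cells of the 2x2x2x2 ANOVA described above.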

      We also expanded the description of the statistical models used in the analyses of MRS biomarkers:

      “To analyze the metabolite results, separate univariate ANCOVAs were conducted for Glu, GABA+, Glu/GABA+ ratio and Glu/GABA+ imbalance measures with group (control, dyslexic) as a between-subjects factor and voxel gray matter volume (GMV) as a covariate. Additionally, for the Glu analysis, age was included as a covariate due to a correlation between variables. Both frequentist and Bayesian statistics were calculated. Glu/GABA+ imbalance measure was calculated as the square root of the absolute residual value of a linear relationship between Glu and GABA+ (McKeon et al., 2024).”
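A minimal sketch of the imbalance measure described in the quoted text, under stated assumptions: the passage only says the residual comes from "a linear relationship between Glu and GABA+", so regressing Glu on GABA+ (rather than the reverse) is an assumption here, as are the function name `glu_gaba_imbalance` and the sample values. Covariates such as GMV are not modeled in this sketch.

```python
import math


def glu_gaba_imbalance(glu, gaba):
    """Return sqrt(|residual|) per participant from a least-squares line Glu ~ GABA+.

    Assumption: Glu is treated as the outcome and GABA+ as the predictor.
    """
    n = len(glu)
    mean_x = sum(gaba) / n
    mean_y = sum(glu) / n
    sxx = sum((x - mean_x) ** 2 for x in gaba)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gaba, glu))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [math.sqrt(abs(y - (intercept + slope * x))) for x, y in zip(gaba, glu)]


# Hypothetical values (arbitrary units) for four participants.
gaba_plus = [0.0, 1.0, 2.0, 3.0]
glutamate = [1.0, 3.0, 1.0, 3.0]
imbalance = glu_gaba_imbalance(glutamate, gaba_plus)
```

Larger values indicate participants whose Glu level departs further, in either direction, from what the group-level Glu and GABA+ relationship would predict.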

      With regard to metabolite quantification, it is unclear why the authors chose to analyze and report metabolite values in terms of creatine ratios rather than quantification based on a water reference given that the MRS acquisition appears to support using a water reference.

      We have decided to use the ratio of Glu and GABA to total creatine (tCr), as this is still a common practice in MRS studies at 7T (e.g., Nandi et al., 2022; Smith et al., 2021). This approach normalizes the signal, reducing the impact of intensity variations across different regions and tissue compositions. Additionally, total creatine concentration is considered relatively stable across different brain regions, which is particularly important in our study, where a functional localizer was used to establish the left STS region individually. Our decision was further influenced by previous studies on dyslexia (Del Tufo et al., 2018; Pugh et al., 2014) which have reported creatine ratios and included GM volume as a covariate in their models, thus providing comparability. It is now indicated in the Results:

      “For comparability with previous studies in dyslexia (Del Tufo et al., 2018; Pugh et al., 2014) we report Glu and GABA as a ratio to total creatine (tCr).”

      and in the Methods section:

      “Glu and GABA+ concentrations were expressed as a ratio to total-creatine (tCr; Creatine + Phosphocreatine) following previous MRS studies in dyslexia (Del Tufo et al., 2018; Pugh et al., 2014).”

      We did not estimate absolute concentrations using water signals as a reference, as this would require accounting for water relaxation times, which may vary across our age range. Nevertheless, our dataset has been made publicly available for future researchers to calculate and compare absolute values.
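For readers less familiar with MRS, the creatine-ratio normalization described in this response amounts to dividing each metabolite estimate by the sum of creatine and phosphocreatine. The sketch below is illustrative only; the function name `ratio_to_tcr` and the numeric values are hypothetical and do not come from the study.

```python
def ratio_to_tcr(metabolite, creatine, phosphocreatine):
    """Express a metabolite estimate relative to total creatine (tCr = Cr + PCr)."""
    tcr = creatine + phosphocreatine
    if tcr <= 0:
        raise ValueError("total creatine must be positive")
    return metabolite / tcr


# Hypothetical fitted amplitudes in arbitrary units.
glu_over_tcr = ratio_to_tcr(10.0, creatine=4.0, phosphocreatine=4.0)
gaba_over_tcr = ratio_to_tcr(2.0, creatine=4.0, phosphocreatine=4.0)
```

Because both values share the same denominator, the Glu/GABA+ ratio is unchanged by this normalization; only the individual Glu and GABA+ measures depend on the choice of reference.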

      Del Tufo, S. N., Frost, S. J., Hoeft, F., Cutting, L. E., Molfese, P. J., Mason, G. F., Rothman, D. L., Fulbright, R. K., & Pugh, K. R. (2018). Neurochemistry Predicts Convergence of Written and Spoken Language: A Proton Magnetic Resonance Spectroscopy Study of Cross-Modal Language Integration. Frontiers in Psychology, 9, 1507. https://doi.org/10.3389/fpsyg.2018.01507

      Nandi, T., Puonti, O., Clarke, W. T., Nettekoven, C., Barron, H. C., Kolasinski, J., Hanayik, T., Hinson, E. L., Berrington, A., Bachtiar, V., Johnstone, A., Winkler, A. M., Thielscher, A., Johansen-Berg, H., & Stagg, C. J. (2022). tDCS induced GABA change is associated with the simulated electric field in M1, an effect mediated by grey matter volume in the MRS voxel. Brain Stimulation, 15(5), 1153–1162. https://doi.org/10.1016/j.brs.2022.07.049

      Pugh, K. R., Frost, S. J., Rothman, D. L., Hoeft, F., Del Tufo, S. N., Mason, G. F., Molfese, P. J., Mencl, W. E., Grigorenko, E. L., Landi, N., Preston, J. L., Jacobsen, L., Seidenberg, M. S., & Fulbright, R. K. (2014). Glutamate and choline levels predict individual differences in reading ability in emergent readers. Journal of Neuroscience, 34(11), 4082–4089. https://doi.org/10.1523/JNEUROSCI.3907-13.2014

      Smith, G. S., Oeltzschner, G., Gould, N. F., Leoutsakos, J. S., Nassery, N., Joo, J. H., Kraut, M. A., Edden, R. A. E., Barker, P. B., Wijtenburg, S. A., Rowland, L. M., & Workman, C. I. (2021). Neurotransmitters and Neurometabolites in Late-Life Depression: A Preliminary Magnetic Resonance Spectroscopy Study at 7T. Journal of Affective Disorders, 279, 417–425. https://doi.org/10.1016/j.jad.2020.10.011

      GABA is typically quantified using J-editing sequences at lower field strengths (~3T), and there is some evidence that the GABA signal can be reliably measured at 7T without editing. However, the authors should discuss potential limitations, such as the reliability of Glu and GABA measurements with short-TE semi-LASER at 7T.

      In addition, MRS measurements of GABA are known to be influenced by macromolecules, and GABA is often denoted as GABA+ to indicate that other compounds contribute to the measured signal, especially at a short TE and in the absence of symmetric spectral editing.

      A general discussion of the strengths and limitations of unedited Glu and GABA quantification at 7T is warranted given the interest of this work to researchers who may not be experts in MRS.

      While we agree with the Reviewer that at 3T, it is recommended to use J-edited MRS to measure GABA (Mullins et al., 2014), the better spectral resolution at 7T allows for more reliable results for both metabolites using moderate echo-time, non-edited MRS (Finkelman et al., 2022). In this study, we used a short echo time (TE), which is optimal for Glu but not ideal for GABA, as it interferes with other signals. We are grateful to the Reviewer for suggesting the addition of a short paragraph to the Discussion, describing the practicalities of 3T and 7T MRS and changing the abbreviation to GABA+ to inform readers of possible macromolecule contamination:

      “We chose ultra-high-field MRS to improve data quality (Özütemiz et al., 2023), as the increased sensitivity and spectral resolution at 7T allows for better separation of overlapping metabolites compared to lower field strengths. Additionally, 7T provides a higher signal-to-noise ratio (SNR), improving the reliability of metabolite measurements and enabling the detection of small changes in Glu and GABA concentrations. Despite these theoretical advantages, several practical obstacles should be considered, such as susceptibility artifacts and inhomogeneities at higher field strengths that can impact data quality. Interestingly, actual methodological comparisons (Pradhan et al., 2015; Terpstra et al., 2016) show only a slight practical advantage of 7T single-voxel MRS compared to optimized 3T acquisition. For example, fitting quality yielded reduced estimates of variance in concentration of Glu in 7T (CRLB) and slightly improved reproducibility levels for Glu and GABA (at both fields below 5%). Choosing the appropriate MRS sequence involves a trade-off between the accuracy of Glu and GABA measurements, as different sequences are recommended for each metabolite. J-edited MRS is recommended for measuring GABA, particularly with 3T scanners (Mullins et al., 2014). However, at 7T, more reliable results can be obtained using moderate echo-time, non-edited MRS (Finkelman et al., 2022). We have opted for a short-echo-time sequence, which is optimal for measuring Glu. However, this approach results in macromolecule contamination of the GABA signal (referred to as GABA+).”

      Finkelman, T., Furman-Haran, E., Paz, R., & Tal, A. (2022). Quantifying the excitatory-inhibitory balance: A comparison of SemiLASER and MEGA-SemiLASER for simultaneously measuring GABA and glutamate at 7T. NeuroImage, 247, 118810. https://doi.org/10.1016/j.neuroimage.2021.118810

      Mullins, P. G., McGonigle, D. J., O'Gorman, R. L., Puts, N. A., Vidyasagar, R., Evans, C. J., Cardiff Symposium on MRS of GABA, & Edden, R. A. (2014). Current practice in the use of MEGA-PRESS spectroscopy for the detection of GABA. NeuroImage, 86, 43–52. https://doi.org/10.1016/j.neuroimage.2012.12.004

      Özütemiz, C., White, M., Elvendahl, W., Eryaman, Y., Marjańska, M., Metzger, G. J., Patriat, R., Kulesa, J., Harel, N., Watanabe, Y., Grant, A., Genovese, G., & Cayci, Z. (2023). Use of a Commercial 7-T MRI Scanner for Clinical Brain Imaging: Indications, Protocols, Challenges, and Solutions-A Single-Center Experience. AJR. American Journal of Roentgenology, 221(6), 788–804. https://doi.org/10.2214/AJR.23.29342

      Pradhan, S., Bonekamp, S., Gillen, J. S., Rowland, L. M., Wijtenburg, S. A., Edden, R. A., & Barker, P. B. (2015). Comparison of single voxel brain MRS AT 3T and 7T using 32-channel head coils. Magnetic Resonance Imaging, 33(8), 1013–1018. https://doi.org/10.1016/j.mri.2015.06.003

      Terpstra, M., Cheong, I., Lyu, T., Deelchand, D. K., Emir, U. E., Bednařík, P., Eberly, L. E., & Öz, G. (2016). Test-retest reproducibility of neurochemical profiles with short-echo, single-voxel MR spectroscopy at 3T and 7T. Magnetic Resonance in Medicine, 76(4), 1083–1091. https://doi.org/10.1002/mrm.26022

      Further, the single MRS voxel location is a limitation of the study as neurochemistry can vary regionally within individuals, and the putative excitatory/inhibitory imbalance in dyslexia may appear in regions outside the left temporal cortex (e.g., network-wide or in frontal regions involved in top-down executive processes). While the functional localization of the MRS voxel is a novelty and a potential advantage, it is unclear whether voxel placement based on left-lateralized reading-related neural activity may bias the experiment to be more sensitive to small, activity-related fluctuations in neurotransmitters in the CON group vs. the DYS group who may have developed an altered, compensatory reading strategy.

      We agree that including only one region of interest for the MRS measurements is a potential limitation of our study, and we have now added this information to the Discussion:

      “Moreover, since the MRS data was collected only from the left STS, it is plausible that other areas might be associated with differences in Glu or GABA concentrations in dyslexia.”

      However, differences in Glu and GABA concentrations in this region were directly predicted by the neural noise hypothesis of dyslexia. We acknowledge that this information was missing in the previous version of the manuscript. It is now included in the Results:

      “Moreover, the neural noise hypothesis of dyslexia identifies perisylvian areas as being affected by increased glutamatergic signaling, and directly predicts associations between Glu and GABA levels in the superior temporal regions and phonological skills (Hancock et al., 2017).”

      as well as in the Discussion:

      “Nevertheless, the neural noise hypothesis predicted increased glutamatergic signaling in perisylvian regions, specifically in the left superior temporal cortex (Hancock et al., 2017).”

      Figure 1 contains a lot of information, and it may be helpful to split it into 2 figures (EEG vs. MRS) so that the plots could be made larger and the reader could more easily digest the information.

      (a) I would also recommend displaying separate metabolite fit plots for each group, since the current presentation in panel F makes it appear that the MRS data is examined by testing differences between groups across the full spectrum (where the lines diverge), which really isn't the case.

      (b) The GABA peak is not visible in the spectrum, and Glutamate and GABA both have multiple peaks that should be shown on the spectrum. This may be best achieved by displaying the individual metabolite sub-spectra below the full spectrum

      Thank you for these suggestions. We have split the information into two Figures following the Reviewer’s recommendations.

      It is not clear why the 3T structural images were used for segmentation and calculation of tissue fraction if 7T structural images were also acquired (which would presumably have higher resolution).

      Generally, T1-weighted images from the 7T scanner exhibit more artifacts than those from the 3T scanner due to higher magnetic field inhomogeneity. These artifacts are especially pronounced in regions near air-tissue interfaces, such as the temporal lobes. Therefore, we chose the 3T structural images for segmentation and tissue fraction calculations and clarified this in the Method section:

      “Voxel segmentation was performed on structural images from a 3T scanner, coregistered to 7T structural images in SPM12, as the latter exhibited excessive artifacts and intensity bias in the temporal regions”.

      The basis set includes a large number of metabolites (27), including many low-concentration metabolites/compounds (e.g., bHG, bHB, Citrate, Threonine, ethanol) that are typically only included in studies targeting specific metabolites in disease/pathology. Please justify the inclusion of this maximal set of metabolites in the basis set, given that the inclusion of overlapping low-concentration metabolites may influence metabolite measurements of interest (https://doi.org/10.1002/mrm.10246).

      There is still no consensus in the MR community on which metabolites should be included in the model of human cerebral 1H-MR spectra. Typically, only major contributors such as NAA, Cr, Cho, Lac, mI, and possibly Glx are evaluated. Some studies also include additional metabolites like Ace, Ala, Asp, GABA, Glc, Gly, sI, NAAG, and Tau. In this study, as in a few others, further metabolites such as PCh, GPC, PCr, GSH, PE, and Thr were introduced and this approach seems suitable for high-field spectra (Hofmann et al., 2002).

      Hofmann, L., Slotboom, J., Jung, B., Maloca, P., Boesch, C., & Kreis, R. (2002). Quantitative 1H-magnetic resonance spectroscopy of human brain: Influence of composition and parameterization of the basis set in linear combination model-fitting. Magnetic Resonance in Medicine, 48(3), 440–453. https://doi.org/10.1002/mrm.10246

      Please provide a figure indicating the localization of the MRS voxel for a sample subject.

      A figure indicating the localization of the MRS voxel for a sample subject was added to the MRS checklist.

      It would be helpful to include Table S1 in the main article.

      Table S1 from the Supplementary Material has now been added to the main manuscript as Table 1 in the Results section.

      Please report descriptive statistics for EEG and MRS measures in Table S1.

      We have added a new Table S1 in the Supplementary Material, providing descriptive statistics for EEG and MRS E/I balance measures, presented separately for the dyslexic and control groups.

      I recommend avoiding using the terms "direct" and "indirect" to contrast MRS and EEG measures of E/I balance. Both of these measures are imperfect and it is misleading to say that MRS is a "direct" measure of neurotransmitters. There is also ambiguity in what is meant by "direct": in contrast to EEG, MRS does not measure neural activity and does not provide high-resolution temporal information, so in a sense, it is less direct.

      Thank you for this suggestion. We have replaced the terms 'direct' and 'indirect' biomarkers with 'MRS' and 'EEG' biomarkers throughout the text.

      There are many cases throughout the results in which Bayes and frequentist stats seem to contradict each other in terms of significance and what should be included in the models, especially with regard to the interaction effects (the Bayes factors appear to favor non-significant interactions). I think this is worth considering and describing to offer more clarity for the readers.

      We agree that a discussion of the divergent results between Bayesian and frequentist models was missing in the previous version of the manuscript. To provide greater clarity for the readers, we have conducted follow-up Bayesian t-tests in every case where the results indicated the inclusion of non-significant interactions with the effect of group in the model. These additional analyses have been performed for the exponent, offset, as well as for beta bandwidth in the Supplementary Material. We have also added a paragraph addressing these discrepancies in the Discussion:

      “Remarkably, in some models, results from Bayesian and frequentist statistics yielded divergent conclusions regarding the inclusion of non-significant effects. This was observed in more complex ANOVA models, whereas no such discrepancies appeared in t-tests or correlations. Given reports of high variability in Bayesian ANOVA estimates across repeated runs of the same analysis (Pfister, 2021), these results should be interpreted with caution. Therefore, following the recommendation to simplify complex models into Bayesian t-tests for more reliable estimates (Pfister, 2021), we conducted follow-up Bayesian t-tests in every case that favored the inclusion of non-significant interactions with the group factor. These analyses provided further evidence for the lack of differences between the dyslexic and control groups. Another source of discrepancy between the two methods may stem from the inclusion of interactions between covariates and within-subject effects in frequentist ANOVA, which were not included in Bayesian ANOVA to adhere to the recommendation for simpler Bayesian models (Pfister, 2021).”

      Pfister, R. (2021). Variability of Bayes factor estimates in Bayesian analysis of variance. The Quantitative Methods for Psychology, 17(1), 40-45. doi:10.20982/tqmp.17.1.p040

      It would be helpful to indicate whether participants in the DYS group had a history of reading intervention/remediation. In addition to showing that the DYS group performed lower than the CON group on reading assessments as a whole and given their age, was the performance on the reading assessments at an individual level considered for inclusion in the study? (i.e., were participants' persistent poor reading abilities confirmed with the research assessments?)

      We were unable to assess individual reading skills due to the lack of standardized diagnostic norms for adult dyslexia in Poland. Therefore, participants in the dyslexic group were recruited based on a previous clinical diagnosis of dyslexia, and reading and reading-related tasks were used for group-level comparisons only. This information has been added to the Methods section:

      “Since there are no standardized diagnostic norms for dyslexia in adults in Poland, individuals were assigned to the dyslexic group based on a past diagnosis of dyslexia.”

      Unfortunately, we did not collect information about participants' history of reading intervention or remediation. In this context, we acknowledge that including a sample of adult participants is a potential limitation of our study; however, this was already mentioned in the Discussion.

      Regarding the fMRI task, please indicate whether the participants whose threshold and/or contrast was changed for localization were from the DYS or CON group.

      This information has now been added to the Methods section:

      “For 6 participants (DYS n = 2, CON n = 4), the threshold was lowered to p < .05 uncorrected, while for another 6 participants (DYS n = 3, CON n = 3) the contrast from the auditory run was changed to auditory words versus fixation cross due to a lack of activation for other contrasts.”

      Reviewer #2 (Public Review):

      Summary:

      This study utilized two complementary techniques (EEG and 7T MRI/MRS) to directly test a theory of dyslexia: the neural noise hypothesis. The authors report finding no evidence to support an excitatory/inhibitory balance, as quantified by beta in EEG and Glutamate/GABA ratio in MRS. This is important work and speaks to one potential mechanism by which increased neural noise may occur in dyslexia.

      Strengths:

      This is a well-conceived study with in-depth analyses and publicly available data for independent review. The authors provide transparency with their statistics and display the raw data points along with the averages in figures for review and interpretation. The data suggest that an E/I balance issue may not underlie deficits in dyslexia and is a meaningful and needed test of a possible mechanism for increased neural noise.

      Weaknesses:

      The researchers did not include a visual print task in the EEG task, which limits analysis of reading-specific regions such as the visual word form area, which is a commonly hypoactivated region in dyslexia. This region is a common one of interest in dyslexia, yet the researchers measured the I/E balance in only one region of interest, specific to the language network.

      We agree with the Reviewer that including different tasks for the EEG biomarkers assessment would be valuable. However, this limitation was already addressed in the Discussion:

      “Importantly, our study focused on adolescents and young adults, and the EEG recordings were conducted during rest and a spoken language task. These factors may limit the generalizability of our results. Future research should include younger populations and incorporate a broader array of tasks, such as reading and phonological processing, to provide a more comprehensive evaluation of the E/I balance hypothesis.”

      Further, this work does not consider prior studies reporting neural inconsistency; a potential consequence of increased neural noise, which has been reported in several studies and linked with candidate-dyslexia gene variants (e.g., Centanni et al., 2018, 2022; Hornickel & Kraus, 2013; Neef et al., 2017). While E/I imbalance may not be a cause of increased neural noise, other potential mechanisms remain and should be discussed.

      Thank you for referring us to other works reporting neural variability in dyslexia. We agree that a broader context regarding sources of reduced neural synchronization, beyond E/I imbalance, was missing in the previous version of the manuscript. We have now included these references in the Discussion:

      “Furthermore, although our results do not support the idea of E/I balance alterations as a source of neural noise in dyslexia, they do not preclude other mechanisms leading to less synchronous neural firing posited by the hypothesis. In this context, there is evidence showing increased trial-to-trial inconsistency of neural responses in individuals with dyslexia (Centanni et al., 2022) or poor readers (Hornickel and Kraus, 2013) and its associations with specific dyslexia risk genes (Centanni et al., 2018; Neef et al., 2017). At the same time, the observed trial-to-trial inconsistency was either present only in a subset of participants (Centanni et al., 2018), limited to some experimental conditions (Centanni et al., 2022), or specific brain regions – e.g., brainstem in Hornickel and Kraus (2013), left auditory cortex in Centanni et al. (2018), or left supramarginal gyrus in Centanni et al. (2022).”

      A better description of the exponent and offset components is needed at the beginning of the results, given that the methods are presented in detail at the end. I also do not see a clear description of these components in the methods.

      A description of the aperiodic components is now included in the Results:

      “In the initial step of the analysis, we analyzed the aperiodic (exponent and offset) components of the EEG spectrum. The exponent reflects the steepness of the EEG power spectrum, with a higher exponent indicating a steeper signal; while the offset represents a uniform shift in power across frequencies, with a higher offset indicating greater power across the entire EEG spectrum (Donoghue et al., 2020).”

      as well as in the Materials and Methods:

      “Two broadband aperiodic parameters were extracted: the exponent, which quantifies the steepness of the EEG power spectrum, and the offset, which indicates signal’s power across the entire frequency spectrum.”
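      To make the two parameters concrete, the following is a minimal sketch, not the authors' pipeline. It assumes the simple no-knee form of the aperiodic model, log10 P(f) = offset - exponent * log10(f), and fits it by ordinary least squares in log-log space. The method cited in the response (Donoghue et al., 2020) additionally models periodic peaks, which this sketch ignores. The function name `fit_aperiodic` and the synthetic spectrum are hypothetical and chosen only for illustration.

```python
import math

def fit_aperiodic(freqs, powers):
    """Fit log10(P) = offset - exponent * log10(f) by ordinary least squares.

    Simplified illustration: no knee and no periodic peaks are modelled.
    """
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(p) for p in powers]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A steeper spectrum has a more negative slope, i.e. a larger exponent.
    return -slope, intercept

# Synthetic, noise-free spectrum with exponent 1.5 and offset 1.0 (assumed values).
freqs = [float(f) for f in range(1, 44)]
powers = [10 ** 1.0 / f ** 1.5 for f in freqs]
exponent, offset = fit_aperiodic(freqs, powers)
```

      On this synthetic spectrum the fit recovers the values used to generate it: a larger exponent corresponds to a steeper spectrum, and a larger offset shifts power upward at all frequencies.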

      Reviewer #3 (Public Review):

      Summary:

      This study by Glica and colleagues utilized EEG (i.e., Beta power, Gamma power, and aperiodic activity) and 7T MRS (i.e., MRS IE ratio, IE balance) to reevaluate the neural noise hypothesis in Dyslexia. Supported by Bayesian statistics, their results show solid 'no evidence' of EI balance differences between groups, challenging the neural noise hypothesis. The work will be of broad interest to neuroscientists, and educational and clinical psychologists.

      Strengths:

      Combining EEG and 7T MRS, this study utilized both the indirect (i.e., Beta power, Gamma power, and aperiodic activity) and direct (i.e., MRS IE ratio, IE balance) measures to reevaluate the neural noise hypothesis in Dyslexia.

      Weaknesses:

      The authors may need to provide more data to assess the quality of the MRS data.

      We have addressed the Reviewer's specific recommendations below by providing more data about the quality of the MRS data.

      The authors may need to explain how the number of subjects is determined in the MRS section.

      We have clarified the MRS sample description in the Results section:

      “Due to financial and logistical constraints, 59 out of the 120 recruited subjects, selected progressively as the study unfolded, were examined with MRS. Subjects were matched by age and sex between the dyslexic and control groups. Due to technical issues and to prevent delays and discomfort for the participants, we collected 54 complete sessions. Additionally, four datasets were excluded based on our quality control criteria, and three GABA+ estimates exceeded the selected CRLB threshold. Ultimately, we report 50 estimates for Glu (21 participants with dyslexia) and 47 for GABA+ and Glu/GABA+ ratios (20 participants with dyslexia).”

      Is there a reason why theta and gamma peaks were not observed in the majority of participants? What are the possible reasons that likely caused the discrepancy between this study and previously reported relevant studies?

      We have now added a discussion about the absence of oscillatory peaks in the theta and gamma bands to the Discussion section:

      “We could not perform analyses for the gamma oscillations since in the majority of participants the gamma peak was not detected above the aperiodic component. Due to the 1/f properties of the EEG spectrum, both aperiodic and periodic components should be disentangled to analyze ‘true’ gamma oscillations; however, this approach is not typically recognized in electrophysiology research (Hudson and Jones, 2022). Indeed, previous studies that analyzed gamma activity in dyslexia (Babiloni et al., 2012; Lasnick et al., 2023; Rufener and Zaehle, 2021) did not separate the background aperiodic activity. For the same reason, we could not analyze results for the theta band, which often does not meet the criteria for an oscillatory component manifested as a peak in the power spectrum (Klimesch, 1999). Moreover, results from a study investigating developmental changes in both periodic and aperiodic components suggest that theta oscillations in older participants are mostly observed in frontal midline electrodes (Cellier et al., 2021), which were not analyzed in the current study.”

      Hudson, M. R., & Jones, N. C. (2022). Deciphering the code: Identifying true gamma neural oscillations. Experimental Neurology, 357, 114205. https://doi.org/10.1016/j.expneurol.2022.114205

      Klimesch, W. (1999). EEG alpha and theta oscillations reflect cognitive and memory performance: A review and analysis. Brain Research Reviews, 29(2-3), 169-195. https://doi.org/10.1016/S0165-0173(98)00056-3

      Based on Figure 1F, the quality of the MRS data may be contaminated by the lipid signal, especially for the DYS group. To better evaluate the MRS data, especially the GABA measurements, the authors need to show:

      (a) the placement of the MRS voxel on the anatomical images;

      Averaged MRS voxel placement was already presented in Figure 1 (now Figure 2) in the manuscript. Now, we have also added exemplary single-subject images to the MRS checklist in the Supplement.

      (b) Glu and GABA model functions

      We have now provided more meaningful Glu and GABA indications in Figure 2.

      (c) CRLB for GABA

      We have added respective estimates to the Supplement:

      %CRLB of Glu: mean 2.96, SD = 0.79

      %CRLB of GABA: mean 10.59, SD = 2.76

      %CRLB of NAA: mean 1.76, SD = 0.46

      Further, the authors added voxel's gray matter volume as a covariate when performing separate ANCOVAs. The authors may need to use alpha correction or 1-fCSF correction to corroborate these results.

      We chose to use the ratio of Glu and GABA to total creatine (tCr), as this remains a common practice in MRS studies at 7T (e.g., Nandi et al., 2022; Smith et al., 2021). This decision was also influenced by previous dyslexia studies (Del Tufo et al., 2018; Pugh et al., 2014) and is now clarified in the Results and Methods sections.

      Regarding alpha correction, a recent paper (García-Pérez, 2023) recommends: 'In general, avoid corrections for multiple testing if statistical claims are to be made for each individual test, in the absence of an omnibus null hypothesis.' Since we report null findings, further alpha correction would not significantly impact the results.

      García-Pérez, M. A. (2023). Use and misuse of corrections for multiple testing. Methods in Psychology, 8, 100120. https://doi.org/10.1016/j.metip.2023.100120

    1. eLife Assessment

      This study reported that cold exposure induced mRNA expression of genes related to lipid metabolism in the paraventricular nucleus of the hypothalamus (PVH). The authors provide useful data highlighting the potential role of lipid metabolism in the brain during cold exposure. In the revised manuscript, the authors made adequate additions, such as new immunostaining and immunoblotting of ATGL and HSL in the PVH, and pharmacological inhibition of lipid peroxidation and lipolysis. The authors also increased the sample size of some experiments and revised the text to limit their data interpretation. However, the reviewers consider that some key issues, such as cell type specificity and the functional role of lipids in the PVH, remain incompletely addressed. Thus, the main conclusion is only partially supported by the data presented.

    2. Reviewer #1 (Public review):

      Summary:

      This study focuses on metabolic changes in the paraventricular hypothalamic (PVH) region of the brain during acute periods of cold exposure. The authors point out that in comparison to the extensive literature on the effects of cold exposure in peripheral tissues, we know relatively little about its effects on the brain. They specifically focus on the hypothalamus, and identify the PVH as having changes in Atgl and Hsl gene expression during cold exposure. They then go on to show accumulation of lipid droplets, increased Fos expression, and increased lipid peroxidation during cold exposure. Further, they show that neuronal activation is required for the formation of lipid droplets and lipid peroxidation.

      Strengths:

      A strength of the study is trying to better understand how metabolism in the brain is a dynamic process, much like how it has been viewed in other organs. The authors also use a creative approach to measuring in vivo lipid peroxidation via delivery of BD-C11 sensor through a cannula to the region in conjunction with fiber photometry to measure fluorescence changes deep in the brain.

      Comments on revised version:

      The authors have attempted to address concerns brought to their attention in the initial review. They have performed one or two additional experiments to address concerns (e.g. adding fiber photometry of PVH neurons and trying to manipulate lipid peroxidation) though many of the concerns from the original review stand. The authors have also revised the text to limit the extent of their claims and to improve clarity, which is appreciated.

    3. Author response:

      The following is the authors’ response to the original reviews.

      We were pleased that many of the critical comments of the reviewers have allowed us to improve our manuscript. In addition to revising the originally submitted figures, we performed new experiments (e.g. new Fig.2, Fig.3, Fig.4, and Fig.6) and revised the manuscript substantially following the reviewers’ comments and suggestions on our initial submission. A point-by-point response to the reviewers’ critiques is provided below, and new supportive data are provided in this revised manuscript. Per the Reviewers’ comments and revisions, we revised the title to be “Cold induces brain region-selective cell activity-dependent lipid metabolism”.

      Reviewer #1:

      Strengths:

      A strength of the study is trying to better understand how metabolism in the brain is a dynamic process, much like how it has been viewed in other organs. The authors also use a creative approach to measuring in vivo lipid peroxidation via delivery of a BD-C11 sensor through a cannula to the region in conjunction with fiber photometry to measure fluorescence changes deep in the brain.

      We thank the Reviewer so much for the positive comments on this interesting study on metabolism in the brain.

      Weaknesses:

      One weakness was many of the experiments were done in a manner that could not distinguish between the contributions of neurons and glial cells, limiting the extent of conclusions that could be made. While this is not easily doable for all experiments, it can be done for some. For example, the Fos experiments in Figure 3 would be more conclusive if done with the labeling of neuronal nuclei with NeuN, as glial cells can also express Fos. To similarly show more conclusively that neurons are being activated during cold exposure, the calcium imaging experiments in Figure S3 can be done with cold exposure. 

      We agree with the Reviewer’s comments. We revised the original Figure 3 (new Figure 6) and Figure S3 (new Figure S4). Our data show that cold increased Fos-positive cells in the PVH (Figure 6) and increased neuronal Ca2+ signals (new Figure S4). Because it is difficult to exclude the involvement of astrocytes in cold-induced lipid metabolism, we revised the title and the text, replacing “neuronal” activity with “cell” activity, and we now conclude that cold induces lipid metabolism depending on “cell activity” rather than “neuronal activity”. Studying cell type-specific contributions to the cold-induced effects on lipid metabolism, to which we assume both neurons and glial cells contribute, will require considerable effort beyond the scope of this study.

      Additionally, many experiments are only done with the minimal three animals required for statistics and could be more robust with additional animals included.  

      We thank this reviewer for the comments. We increased the sample sizes accordingly in this revised manuscript.

      Another weakness is that the authors do not address whether manipulating lipid droplet accumulation or lipid peroxidation has any effect on PVH function (e.g. does it change neuronal activity in the region?).

      We thank this reviewer for bringing up this interesting point. The focus of this study was to examine how cold modulates lipid metabolism in the brain. How brain lipid metabolism (e.g. manipulating LD accumulation or lipid peroxidation) modulates neuronal activity is another interesting project, which will however require considerable effort beyond the scope of this study. Manipulating LDs or peroxidation would affect multiple cellular signaling pathways, and physiological experimental conditions would need to be developed. However, to address this reviewer’s question, we performed preliminary studies in which we treated brain slices with the lipid peroxidation inhibitor a-TP and recorded PVH neurons, but did not observe differences in firing rates between a-TP-treated brain slices and controls (data not shown).

      Reviewer #2:

      Strengths:

      A set of relatively novel and interesting observations. Creative use of several in vivo sensors and techniques.

      We thank the Reviewer so much for the positive comments on our studies in both concept and techniques. 

      Weaknesses:  

      (1) The physiological relevance of lipolysis and thermogenesis genes in the PVH. The authors need to provide quantitative and substantial characterizations of lipid metabolism in the brain beyond a panel of qPCRs, especially considering these genes are likely expressed at very low levels. mRNA and protein level quantification of genes in Fig 1, in direct comparison to BAT/iWAT, should be provided. Besides bulk mRNA/protein, IHC/ISH-based characterization should be added to confirm the cellular expression of these genes.

      We agree with the Reviewer’s comments and thank this reviewer for the constructive suggestions. To address them, we performed additional experiments to verify the cold-induced expression of lipolytic genes and proteins. For example, we stained ATGL and HSL in both neurons and astrocytes in the PVH. Matching the increased gene expression, cold increased protein expression of ATGL (new Figure 2) and HSL (new Figure 3) in both neurons and astrocytes. We also performed western blots of p-HSL and HSL and observed that cold increased the expression level of p-HSL (new Figure 4). These new results support our conclusions and further demonstrate that cold increases lipid metabolism in the PVH.

      (2) The fiber photometry work they cited (Chen 2022, Andersen 2023, Sun 2018) used well-established, genetically encoded neuropeptide sensors (e.g., GRABs). The authors need to first quantitatively demonstrate that adapting BD-C11 and EnzCheck for in vivo brain FP could effectively and accurately report peroxidation and lipolysis. For example, the sensitivity, dynamic range, and off-time should all be calibrated with mass spectrometry measurements before any conclusions can be made based on plots in Figures 4, 5, and 6. This is particularly important because the main hypothesis heavily relies on this unvalidated technique.

      We thank this reviewer for the comments. Fiber photometry has been well demonstrated to detect fluorescently labelled biomolecules in our laboratory and other labs, as indicated in the publications cited above. In this study, we combined photometry with commercially developed and validated fluorescent biomarkers of lipid metabolism to monitor lipid metabolic dynamics in vivo. We verified this approach in both the brain (this study) and peripheral adipose tissues (another project). In particular, our data in this study show that the lipid peroxidation inhibitor a-TP blocked the cold-induced lipid peroxidation signals (Fig. 7A-C) and the pan-lipase inhibitor DEUP blocked the cold-induced lipolytic signals (Fig. 8A-C). These results demonstrate that the signals detected by photometry indeed reflect lipid peroxidation and lipolysis, respectively, in the brain. We agree with the reviewer’s suggestion regarding mass spectrometry measurements, but it is not feasible for us to perform mass spectrometry in the brain in vivo at this moment.

      (3) Generally, the histology data need significant improvement. It was not convincing, for example, in Figure 3, how the Fos+ neurons can be quantified based on the poor IF images where most red signals were not in the neurons. 

      We thank this reviewer for this comment. We performed additional experiments to increase the sample size and now present high-quality images.

      (4) The hypothesis regarding the direct role of brain temperature in cold-induced lipid metabolism is puzzling. From the introduction and discussion, the authors seem to suggest that there are direct brain temperature changes in responses to cold, which could be quite striking. However, this was not supported by any data or experiments. The authors should consolidate their ideas and update a coherent hypothesis based on the actual data presented in the manuscript. 

      We thank this reviewer for bringing up this comment and constructive suggestions. To make this study more concise on the cold-induced lipid metabolism, we removed the statements related to the brain temperature.

      Reviewer #1 (Recommendations For The Authors):

      An additional minor weakness is that the authors are redundant in their discussion, sometimes repeating sections from the introduction (e.g. this line in the discussion "Evidence shows that the brain's energy expenditure efficiency largely depends on the temperature (Yu et al., 2012), and temperature gradients between different brain regions exist (Anderson and Moser, 1995; Delgado and Hanai, 1966; Hayward and Baker, 1968; McElligott and Melzak, 1967; Moser and Mathiesen, 1996; Thornton, 2003)"). 

      We thank the Reviewer for these comments. We revised the text following the suggestions accordingly and removed the statements and references related to brain temperatures.

    1. eLife Assessment

      This is a useful analysis of STORM data that characterizes the clustering of active zones in retinogeniculate terminals across ages and in the absence of retinal waves. The design makes it possible to relate fixed time point structural data to a known outcome of activity-dependent remodeling. However, the evidence is incomplete, weakening the claims the authors make regarding how activity influences the clustering of these synapses.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript addresses the question of whether spontaneous activity contributes to the clustering of retinogeniculate synapses before eye opening. The authors re-analyze a previously published dataset to answer the question. The authors conclude that synaptic clustering is eye-specific and activity dependent during the first postnatal week. While there is useful information in this manuscript, I don't see how the data meaningfully supports the claims made about clustering.

      In adult retinogeniculate connections, functional specificity is supported by select pairings of retinal ganglion cells and thalamocortical cells forming dozens of synaptic connections in subcellular microcircuits called glomeruli. In this manuscript, the authors measure whether the frequency of nearby synapses is higher in the observed data than in a model where synapses are randomly distributed throughout the volume. Any real anatomical data will deviate from such a model. The interesting biological question is not whether a developmental state deviates from random. The interesting question is how much of the adult clustering occurs before eye opening. In trying to decode the analysis in this manuscript, I can't tell if the answer is 99% or 0.001%.

      Strengths:

      The source dataset is high resolution data showing the colocalization of multiple synaptic proteins across development. Added to this data is labeling that distinguishes axons from the right eye from axons from the left eye. The first order analysis of this data showing changes in synapse density and in the occurrence of multi-active zone synapses is useful information about the development of an important model system.

      Weaknesses:

      I don't think the analysis of clustering within this dataset improves our understanding of how the system works. It is possible that the result is clear to the authors based on looking at the images. As a reader trying to interpret the analysis, I ran into the following problems:

      • It is not possible to estimate biologically meaningful effect sizes from the data provided. Spontaneous activity in the postnatal week could be responsible for 99% or 0.001% of RGC synapse clustering.

      • There is no clear biological interpretation of the core measure of the publication, the normalized clustering index. The normalized clustering index starts with counting the fraction of single active zone synapses within various distances to the edge of synapses. This frequency is compared to a randomization model in which the positions of synapses are randomized throughout a volume. The authors found the biggest deviation between the observed and randomized proximity frequency using a distance threshold of 1.5 µm. They consider the deviation from the random model to be a sign of clustering. However, two RGC synapses 1.5 µm apart have a good chance of coming from the same RGC axon. At this scale, real observations will, therefore, always look more clustered than a model where synapses are randomly placed in a volume. If you randomly place synapses on an axon, they will be much closer together than if you randomly place synapses within a volume. The authors normalize their clustering measure by dividing by the frequency of clustering in the normalized model. That makes the measure of clustering an ambiguous mix of synapse clustering, axon morphology, and synaptic density.

      • Other measures are also very derived. For instance, one argument is based on determining that the cumulative distribution of the distance of dominant-eye multi-active zone synapses with nearby single-active zone synapses from dominant-eye multi-active zone synapses is statistically different from the cumulative distribution of the distance of dominant-eye multi-active zones without nearby single-active zone synapses from dominant-eye multi-active zones. Multiple permutations of this measure are compared.

      • The sample size is too small for the kinds of comparisons being made. The authors point out that many STORM studies use an n of 1 while the authors have n = 3 for each of their six experimental groups. However, the critical bit is what kinds of questions you are trying to answer with a given sample size. This study depends on determining whether the differences between groups are due to age, genotype, or individual variation. This study also makes multiple comparisons of many different noisy parameters that test the same or similar hypothesis. In this context, it is unlikely that n = 3 sufficiently controls for individual variation.

      • There are major biological differences between groups that are difficult to control for. Between P2, P4, and P8, there are changes in cell morphology and synaptic density. There are also large differences in synapse density between wild type and KO mice. It is difficult to be confident that these differences are not responsible for the relatively subtle changes in clustering indices.

      • Many claims are based on complicated comparisons between groups rather than the predominating effects within the data. It is noted that: "In KO mice, dominant eye projections showed increased clustering around mAZ synapses compared to sAC synapses suggesting partial maintenance of synaptic clustering despite retinal wave defects". In contrast, I did not notice any discussion of the fact that the most striking trend in those measures is that the clustering index decreases from P2 to P8.

      • Statistics are improperly applied. In my first review I tried to push the authors to calculate confidence intervals for two reasons. First, I believed the reader should be able to answer questions such as whether 99% or 0.01% of RGC synaptic clustering occurred in the first postnatal week. Second, I wanted the authors to deal with the fact that n=3 is underpowered for many of the questions they were asking. While many confidence intervals can now be found leading up to a claim, it is difficult to find claims that are directly supported by the correct confidence interval. Many claims are still incorrectly based on which combinations of comparisons produced statistically significant differences and which combinations did not.
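      The randomization-based index discussed above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' analysis code: distances are measured centre to centre rather than to the synapse edge, only the sAZ positions are randomized (uniformly within a rectangular box), and the names `fraction_near` and `normalized_index` as well as all coordinates are hypothetical.

```python
import math
import random

def fraction_near(saz, maz, radius):
    """Fraction of sAZ points within `radius` of at least one mAZ point."""
    hits = sum(1 for s in saz if any(math.dist(s, m) <= radius for m in maz))
    return hits / len(saz)

def normalized_index(saz, maz, radius, box, n_shuffles=200, seed=0):
    """Observed proximity fraction divided by its mean under a uniform null.

    Assumption: the null places each sAZ uniformly at random inside `box`
    while mAZ positions stay fixed.
    """
    rng = random.Random(seed)
    observed = fraction_near(saz, maz, radius)
    null_fracs = []
    for _ in range(n_shuffles):
        shuffled = [tuple(rng.uniform(0, side) for side in box) for _ in saz]
        null_fracs.append(fraction_near(shuffled, maz, radius))
    null_mean = sum(null_fracs) / n_shuffles
    if null_mean == 0:
        raise ValueError("null fraction is zero; index is undefined")
    return observed, null_mean, observed / null_mean

# Toy data (micrometres): sAZ points are deliberately placed next to mAZ points,
# mimicking synapses that share a local structure such as one axon.
rng = random.Random(1)
box = (10.0, 10.0, 10.0)
maz = [tuple(rng.uniform(0, side) for side in box) for _ in range(20)]
saz = [tuple(c + rng.uniform(-0.5, 0.5) for c in rng.choice(maz)) for _ in range(30)]
observed, null_mean, index = normalized_index(saz, maz, radius=1.5, box=box)
```

      In this toy case the index exceeds 1 simply because the sAZ points were generated next to mAZ points. Any anatomical constraint that keeps synapses close together would have the same effect against a uniform-volume null, which is the ambiguity the reviewer raises.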

    3. Reviewer #2 (Public review):

      Summary:

      This study provides a valuable data set showing changes in the spatial organization of synaptic proteins at the retinogeniculate connection during a developmental period of active axonal and synaptic remodeling. The data collected by STORM microscopy is state-of-the-art in terms of the high-resolution view of the presynaptic components of a plastic synapse. The revision has addressed many, but not all, of the initial concerns about the authors' interpretation of their data. However, with the revisions, the manuscript has become very dense and difficult to follow.

      Strengths:

      The data presented is of good quality and provides an unprecedented view at high resolution of the presynaptic components of the retinogeniculate synapse during active developmental remodeling. This approach offers an advance to the previous mouse EM studies of this synapse because the CTB label allows identification of the eye from which the presynaptic terminal arises.

      Weaknesses:

      From these data the authors conclude that eye-specific increases in mAZ synapse density occur over retinogeniculate refinement, that sAZ synapses cluster close to mAZ synapses over age, and that this process depends on spontaneous activity and proximity to eye-specific mAZ synapses. While the interpretation of this data set is much more grounded in this revised submission, some of the authors' conclusions/statements still lack convincing supporting evidence. This includes:

      (1) The conclusion that multi-active zone synapses are loci for synaptic clustering. This statement, or similar ones (e.g., line 407), suggests that mAZ synapses actively or through some indirect way influence the clustering of sAZ synapses. There is no evidence for this. Clustering of retinal synapses is in part due to the fact that retinal inputs synapse on the proximal dendrites. With increased synaptogenesis, there will be increased density of retinal terminals that are closely localized. And with development, perhaps sAZ synapses mature into mAZ synapses. This scenario could also explain a large part of this data set.

      (2) The conclusion that "clustering depends on spontaneous retinal activity" could be misleading to the reader given that the authors acknowledge (in the rebuttal) that their data is most consistent with a failure of synaptogenesis in the mutant mice. Additionally, clustering does occur in CTB+ projections around mAZ synapses.

      (3) Line 403: "Since mAZ synapses are expected to have a higher release probability, they likely play an important role in driving plasticity mechanisms reliant on neurotransmission." What evidence do the authors have that mAZ synapses are expected to have higher release probability?

    4. Reviewer #3 (Public review):

      This study is a follow-up to a recent study of synaptic development based on a powerful data set that combines anterograde labeling, immunofluorescence labeling of synaptic proteins, and STORM imaging (Cell Reports, 2023). Specifically, they use anti-Vglut2 label to determine the size of the presynaptic structure (which they describe as the vesicle pool size), anti-Bassoon to label active zones with the resolution to count them, and anti-Homer to identify postsynaptic densities. Their previous study compared the detailed synaptic structure across the development of synapses made with contra-projecting vs. ipsi-projecting RGCs and compared this developmental profile with a mouse model with reduced retinal waves. In this study, they produce a new detailed analysis on the same data set in which they classify synapses into "multi-active zone" vs. "single-active zone" synapses and assess the number and spacing of these synapses. The authors use measurements to make conclusions about the role of retinal waves in the generation of same-eye synaptic clusters, providing key insight into how neural activity drives synapse maturation.

      Strengths:

      This is a fantastic data set for describing the structural details of synapse development in a part of the brain undergoing activity-dependent synaptic rearrangements. The fact that they can differentiate eye of origin is what makes this data set unique over previous structural work. The addition of example images from an EM data set provides confidence in their categorization scheme.

      Weaknesses:

      Though the descriptions of synaptic clusters are important and represent a significant advance, the authors' conclusions regarding the biological processes driving these clusters are not testable by such a small sample. This limitation is expected given the massive effort that goes into generating this data set. Of course the authors are free to speculate, but many of the conclusions of the paper are not statistically supported.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      This publication applies 3D super-resolution STORM imaging to understanding the role of developmental neural activity in the clustering of retinal inputs to the mouse dorsal lateral geniculate nucleus (dLGN). The authors argue that retinal ganglion cell (RGC) synaptic boutons start forming clusters early in postnatal development (P2). They then argue that these clusters contribute to eye-specific segregation of retinal inputs by activity-dependent stabilization of nearby boutons from the same eye. The data provided is N=3 animals for each condition of P2, P4, and P8 animals in wild-type mice and in mice where early patterns of structured retinal activity are blocked.

      Strengths:

      The 3D STORM imaging of pre- and postsynaptic elements provides convincing high-resolution localization of synapses.

      The experimental design of comparing ipsilateral and contralateral RGC axon boutons in a region of the dLGN that is known to become contralateral is elegant. The design makes it possible to relate fixed time point structural data to a known outcome of activity-dependent remodeling.

      Weaknesses:

      Based on previous literature, it is known that synapse density, synapse clustering, and synaptic specificity increase during postnatal development. Previous work has also shown that both the changes in synaptic clustering and synaptic specificity are affected by retinal activity. The data and analysis provided by the authors add little unambiguous evidence that advances this understanding.

      We agree with the reviewer that previous literature shows that synapse density, synapse clustering, and synaptic specificity increase during postnatal development and that these processes are affected by retinal activity. The majority of studies on synaptic refinement have been performed after eye-opening, when eye-specific segregation is already complete. In contrast, most studies of eye-specific segregation focus on axonal refinement phenotypes. To our knowledge, only a small number of experiments have examined retinogeniculate synaptic properties at the nanoscale during eye-specific segregation (1-4). Our broad goal is to understand the mechanisms of synaptogenesis and competition at the earliest stages of eye-specific refinement, when spontaneous retinal activity is a major driver of activity-dependent remodeling. We hope that readers will appreciate that there is still much to discover in this fascinating model system of synaptic competition.

      General problem 1: Most of the statistical analysis is limited to ANOVA comparison of axons from the contralateral and ipsilateral retina in the contralateral dLGN. The hypothesis that ipsilateral and contralateral axons would be statistically identical in the contralateral dLGN is not a plausible hypothesis so rejecting the hypothesis with P < X does not advance the authors' arguments beyond what was already known.

      General problem 2: Most of the interpretation of data is qualitative. While error bars are provided, these error bars are not used to draw conclusions. Given the small sample size (N=3), there is a large degree of uncertainty regarding the magnitude of changes (synapse size, number, specificity). The authors base their conclusions on the averages of these values when the likely degree of uncertainty could allow for the opposite interpretation.

      We appreciate the reviewer’s concerns regarding the use of ANOVA for statistical testing in the original submission. We have generated new figures that show confidence intervals for each analysis in the manuscript and these are included in the response to reviewers document below. To address the underlying concern that our N=3 sample size limits the interpretation of our results, we have revised the manuscript to be cautious in our interpretations and to discuss additional possibilities that are consistent with the anatomical data.

      General problem 3: Two of the four results sections depend on using the frequency of single active zone vGlut2 clusters near multiple active zone vGlut2 as a proxy for synaptic stabilization of the single active zone vGlut2 clusters by the multiple active zone vGlut2 clusters. The authors argue that the increased frequency of same-eye single active zone clusters relative to opposite-eye single active zone clusters means that multiple active zone vGlut2 clusters are selectively stabilizing single active zone clusters. There are other plausible explanations for this observation that are not eliminated. An increased frequency of nearby single active zone clusters would also occur if RGC axons form more than one synapse in the dLGN. Eye-specific segregation is, by definition, a relative increase in the frequency of nearby boutons from the same eye. The authors were, therefore, guaranteed to observe a non-random relationship between boutons from the same eye. The authors do compare their measures to a random model, but I could not find a description of the model. I would expect that the model would need to account for RGC arbor size, arbor structure, bouton number, and segregation independent of multi-active-zone vGlut2 clusters. The most common randomization for the type of analysis described here, a shift in the positions of single-active zone boutons, would not be adequate.

      In discussing the claimed cluster-induced stabilization of nearby boutons, the authors state that the specificity increases with age due to activity-dependent refinement. Their quantification does not support an increase in specificity with age. In fact, the high degree of clustering "specificity" they observe at P2 argues for the trivial same axon explanation.

      We agree with the reviewer that individual RGC axons form multiple synapses and that, over time, eye-specific segregation must increase the frequency of like-eye synapses relative to opposite-eye synapses. Indeed, our previous study of eye-specific refinement showed that at P8, the density of eye-specific inputs had increased for the dominant-eye and decreased for the non-dominant-eye (1). However, at postnatal day 4, contralateral and ipsilateral input densities were the same in the future contralateral-eye territory. One of our goals in this study was to determine if the process of synaptic clustering begins at these earliest stages of synaptic competition and, if so, whether it is influenced by retinal wave activity. It is plausible that the RGC axons from the same eye could initially form synapses randomly and, at some later stage, synapses may be selectively added to produce mature glomeruli. Consistent with this possibility, previous analysis of JAM-B RGC axon refinement showed the progressive clustering of axonal boutons at later stages of development after eye-specific segregation (5).

      Regarding the randomization that we employed, we performed a repositioning of synapse centroids within the volume of the neuropil after accounting for neuronal soma volumes and edge effects. We agree that this type of randomization cannot account for the fine scale structure of axons and dendrites, which we did not have access to in this four-color volumetric super-resolution data set. To address this, we have performed additional clustering analyses surrounding both single-active zone and multi-active zone synapses. This new analysis showed that there is a modest clustering effect around single-active zone synapses compared to complete randomization described above. We now present this information using a normalized clustering index for direct comparison of clustering between multi-active zone and single-active zone synapses. We have measured effect sizes and confidence intervals, which we present in point-by-point responses below. We have restructured the manuscript figures and discussion to provide a balanced interpretation of our results and the limitations of our study.

      Analysis of specific claims:

      Result Section 1

      Most of the figures show mean, error bars, and asterisks, but not the three data points from which these statistics are derived. Large changes in variance from condition to condition suggest that displaying the data points would provide more useful information.

      We thank the reviewer for their suggestion. We have updated all figures to display the means of all biological replicates as individual data points.

      Claim 1: Contralateral density increases more than ipsilateral in the contralateral region over the course of development. This claim is supported by the qualitative comparison of means and error bars in Figure 2D. The argument could be made quantitative by providing a confidence interval for synapse density increase for dominant and non-dominant synapse density. A confidence interval could then be generated for the difference in this change between the two groups. Currently, the most striking effect is a big difference in variance between P4 and P8 for dominant eye complex synapses. Given that N=3, I assume there is one extreme outlier here.

      We appreciate the comment and believe the reviewer was referring to the data presented in the original Figure 1D, rather than Figure 2D.

      We agree with the reviewer that our comment on the change in synapse density across ages was not quantitatively supported by the figure as we did not perform a proper age-wise statistical comparison. We have removed this claim in the revised manuscript.

      We also appreciate the suggestions to clarify the presentation of our statistical analyses and to utilize confidence interval measurements wherever possible. We present Author response image 1 below, showing the density of multi-AZ synapses in the contralateral-eye territory over time (P2-P8), for both CTB(+) contralateral (black) and CTB(-) ipsilateral inputs (red) featuring 5/95% confidence intervals:

      Author response image 1.

      More broadly, the reviewer has raised the concern that the low number of biological replicates (N=3) presents challenges in the use of ANOVA for statistical testing. We agree with the concern and have revised the manuscript to be cautious in our statistical tests and resulting claims. We have chosen to use paired T-tests to compare measurements of eye-specific synapse properties because these measurements were always made within each individual biological replicate (paired measurements). Below, we discuss our logic for this change and the effects on the results we present in the revised manuscript.

      Considering the above image:

      (1) ANOVA: In our initial submission, we used an ANOVA test which showed P<0.05 for the CTB(+) P4 vs. P8 comparison above, leading to our statement about an age-dependent increase in multi-AZ density. However, the figure above shows that P8 data has higher variance. Thus, the homogeneity of variance assumption of ANOVA may lead to false positives in this comparison.

      (2) Confidence interval for N=3: We calculated confidence intervals for P4 and P8 data (5/95% CI shown above). Overlap between the two groups indicates the true mean values of the two groups could be identical. However, the P8 confidence intervals (as well as other confidence intervals across other comparisons in the manuscript) also include the value of 0. This indicates there actually might be no multi-active zone synapses in the mouse dLGN. The failure arises because the low number of biological replicates (N=3 data points) precludes a reliable confidence interval measurement. CI measurements require sufficient sample sizes to determine the true population variance.

      (3) Difficulty in achieving sufficient sample sizes for CI analysis in ultrastructural studies of the brain: volumetric STORM experiments are technically complex and make use of sample preparation and analysis methods that are similar to volumetric electron microscopy (physical ultrathin sectioning and computational 3D stack alignment). For these technical reasons, it is difficult to collect imaging data from >10 mice for each group of data (e.g. age and tissue location) in one single project. Because of the technical challenges, most ultrastructural studies published to date present results from single biological replicates. In our STORM dataset, we collected imaging data of N=3 biological replicates for each age and genotype. We agree that in the future the collection of additional replicates will be important for improving the reliability of statistical comparisons in super-resolution and electron-microscopy studies. Continued advances in the throughput of imaging/analysis should help to make this easier over time. 

      (4) The use of paired T-tests: In this study, we have eye-specific CTB(+) and CTB(-) synapse imaging data from the same STORM fields within single biological replicates. When there is only one measurement from each replicate (e.g. synapse density, ratio of total synapses), using paired tests to compare these groups increases statistical power and does not assume similar variance. However, this limits our analysis to comparisons within each age, and not between ages. Accordingly, we have revised our discussion of the results and interpretations throughout the manuscript. When there are thousands of measurements of synapses from each replicate (e.g. Figure 2A-B on synapse volumes), we use a mixed linear model to analyze the variance. In the revised figures we present the results using standard error of the mean and link measurements from within the same individual replicates to show the paired data structure. In cases where specific comparisons are made across ages, we present 5/95% confidence interval measurements.
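
      The points above about interval width at N=3 and about paired testing can be illustrated with a short sketch. It is not the authors' analysis code. It assumes that "5/95% confidence interval" means a t-based interval bounded by the 5th and 95th percentiles, it relies on the closed-form quantile of Student's t that holds only for 2 degrees of freedom (three replicates), and the density values are invented placeholders rather than measurements from the study.

```python
import math
import statistics


def t_quantile_df2(p):
    """Exact quantile of Student's t with 2 degrees of freedom (N = 3 only)."""
    u = 2.0 * p - 1.0
    return u * math.sqrt(2.0 / (1.0 - u * u))


def mean_interval_n3(values, lower=0.05, upper=0.95):
    """t-based interval for the mean of exactly three replicates."""
    if len(values) != 3:
        raise ValueError("closed-form quantile assumes exactly three replicates")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(3)
    return mean + t_quantile_df2(lower) * sem, mean + t_quantile_df2(upper) * sem


def paired_t_n3(a, b):
    """Paired t statistic and two-sided p-value for three matched replicates."""
    diffs = [x - y for x, y in zip(a, b)]
    if len(diffs) != 3:
        raise ValueError("closed-form p-value assumes exactly three pairs")
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(3))
    # Two-sided p-value from the closed-form CDF of t with 2 degrees of freedom.
    p = 1.0 - abs(t) / math.sqrt(2.0 + t * t)
    return t, p


# Placeholder densities (hypothetical, one value per animal, same STORM field).
ctb_pos = [0.010, 0.020, 0.060]
ctb_neg = [0.004, 0.012, 0.050]

low, high = mean_interval_n3(ctb_pos)
t_stat, p_value = paired_t_n3(ctb_pos, ctb_neg)
print(f"interval for CTB(+) mean: [{low:.4f}, {high:.4f}]")
print(f"paired t = {t_stat:.3f}, two-sided p = {p_value:.4f}")
```

      With one high-variance replicate, the interval for a single group can extend below zero even though every measurement is positive, while the paired comparison remains sensitive because it works on within-animal differences.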

      Claim 2: The fraction of multiple-active zone vGlut2 clusters increases with age. This claim is weakly supported by a qualitative reading of panel 1E. The error bars overlap so it is difficult to know what the range of possible increases could be. In the text, the authors report mean differences without confidence intervals (or any other statistics). The reported results should, therefore, be interpreted as a description of their three mice and not as evidence about mice in general.

      We appreciate the reviewer’s concern that statistical accuracy of our synapse density comparisons over age is limited by the small sample size as discussed above. We have removed all strong claims about age-dependent changes in the density of multi-active zone and single-active zone synapses. Instead, we focus our analyses on comparisons between CTB(+) and CTB(-) synapse measurements, which are paired within each biological replicate. To specifically address the reviewer’s concern about figure panel 1E, we present Author response image 2 with confidence intervals below.

      Author response image 2.

      Figure S1. Panel A makes the point that the study could not be done without STORM by comparing the STORM images to "Conventional" images. The images are over-saturated low-resolution images. A reasonable comparison would be to a high-quality confocal image acquired with a high NA objective (~1.4) and low laser power (PSF ~ 0.2 x 0.2 x 0.6 um) that was acquired over the same amount of time it takes to acquire a STORM volume.

      We agree with the reviewer that the presentation of low-resolution conventional images is not necessary. We have deleted the panel and modified the text accordingly.

      Result section 2.

      Claim 1: The ipsi/contra (in contra LGN) difference in VGluT2 cluster volume increases with development. While there are many p-values listed, the main point is not directly quantified. A reasonable way to quantify the relative increase in volume could be in the form: the non-dominant volumes were 75%-95%(?) of the dominant volume at P2 and 60%-80% (?) at P8. The difference in change was -5 to 15%(?).

      We thank the reviewer for their helpful suggestion to improve the clarity of the results presented in this analysis of eye-specific synapse volumes. In our original report, we found differences in eye-specific VGluT2 volume at each time point (P2/P4/P8) in control mice (1). The original measurements used the entire synapse population. Here, we aimed to determine whether eye-specific differences in VGluT2 volumes were present for both multi-AZ synapses and single-AZ synapses, and whether one population may have a greater contribution to the previous population measurement that we reported. We found that at P4 (a time when the overall eye-specific synapse density is equivalent for both eyes in the dLGN), WT multi-AZ synapses showed a greater difference (372%) in eye-specific VGluT2 volume compared with single-AZ synapses (135%). In β2KO mice multi-AZ synapses showed a greater difference (110%) in eye-specific VGluT2 volume compared with single-AZ synapses (41%). In our initial manuscript submission, we included statistical comparisons of eye-specific volume differences across ages, but we did not highlight these differences in our discussion of the results. For clarity, we have removed all statistical comparisons across ages in the revised manuscript. We have modified the text to focus on eye-specific VGluT2 volume differences at P4 described above. To specifically address the reviewer’s question, we provide the percentage differences between multi- and single-AZ eye-specific synapses for each age/genotype below:

      Author response table 1.

      Claim 2: Complex synapses (vGlut2 clusters with multiple active zones) represent clusters of simple synapses and not single large boutons with multiple active zones. The authors argue that because vGlut2 cluster volume scales roughly linearly with active zone number, the vGlut2 clusters are composed of multiple boutons each containing a single active zone. Their analysis does not rule out the (known to be true) possibility that RGC bouton sizes are much larger in boutons with multiple active zones. The correlation of volume and active zone number, by itself, does not resolve the issue. A good argument for multiple boutons might be that the variance is smallest in clusters with 4 active zones (looks like it in the plot) since they would be the average of four active zones to vesicle pool ratios. It is very likely that the multi-active zone vGlut2 clusters represent some clustering and some multi-synaptic boutons. The reference cited by the authors as evidence for the presence of single active zone boutons in young tissue does not rule out the existence of multiple active zone boutons.

      We agree with the reviewer’s comments on the challenges of classifying multi-active zone synapses in STORM images as single terminals versus aggregates of terminals. To help address this, we have performed electron microscopy imaging of genetically labeled RGC axons and identified the existence of single retinogeniculate terminals with multiple active zones. Our EM imaging was limited to 2D sections and does not rule out the clustering of small, single-active zone synapses within 3D volumes. Future volumetric EM reconstructions will be informative for this question. We have significantly updated the figures and text to discuss the new results and provide a careful interpretation of the nature of multi-AZ synapses in STORM imaging data.

      Several arguments are made that depend on the interpretation of "not statistically significant" (n.s.) meaning that "two groups are the same" instead of "we don't know if they are different". This interpretation is incorrect and materially impacts the conclusions.

      Several arguments are made that interpret statistical significance for one group and a lack of statistical significance for another group meaning that the effect was bigger in the first group. This interpretation is incorrect and materially impacts the conclusions.

      We thank the reviewer for raising these concerns. We have extensively revised the manuscript text to report the data in a more precise way without overinterpreting the results. All references to “N.S.” and associated conclusions have been either removed or substantiated with 5/95% confidence interval testing.

      Result Section 3.

      Claim 1: Complex synapses stabilize simple synapses. There are alternative explanations (mentioned above) for the observed clustering that negate the conclusions. 1) Boutons from the same axon tend to be found near one another. 2) Any form of eye-specific segregation would produce non-random associations in the analysis as performed. The authors compare each observation to a random model, but I cannot determine from the text if the model adequately accounts for alternative explanations.

      We thank the reviewer for their suggestion to consider alternative explanations for our results. We agree that our study does not provide direct molecular mechanistic data demonstrating synaptic stabilization effects. We have significantly revised the manuscript to be more cautious in our interpretations and specifically address alternative biological mechanisms that are consistent with the non-random arrangement of retinogeniculate synapses in our data.

      We agree with the reviewer that individual RGC axons form multiple synapses, however, nascent synapses might not always form close together. If synapses are initially added randomly within RGC axons, eye-specific segregation may conclude with a still-random pattern of dominant-eye inputs. At some later stage, synapses may be selectively refined to produce mature glomeruli. Consistent with this, individual RGCs undergo progressive clustering of axonal boutons at later stages of development after eye-specific segregation (5). One of our goals in this work was to determine if the process of synaptic clustering begins at the earliest stages of synapse formation and, if so, whether it is influenced by retinal wave activity.

      To measure synaptic clustering in our STORM data, we used a randomization of single-AZ synapse centroids within the volume of the neuropil after accounting for neuronal soma volumes and edge effects. Multi-AZ centroid positions were held fixed. Comparing the randomized result to the original distribution, we found a higher fraction of single-AZ synapses associated with multi-AZ synapses in the original data, arguing for a non-random clustering effect. However, we agree with the reviewer’s concern that this type of randomization cannot account for the fine scale structure of axons, which we did not have access to in this four-color volumetric super-resolution data set. Thus, there could still be errors in a purely volumetric randomization (e.g. the assignment of synapses to regions in the volume that would not be synaptic locations in the original neuropil), which would effectively decrease the measured degree of clustering after the randomization. To address this, we have revised our analysis to measure the degree of synapse clustering nearby both multi-AZ and single-AZ synapses after an equivalent randomization of single-AZ synapse positions in the volume.

      We now present the revised results as a “clustering index” for both multi-AZ and single-AZ synapses. This measurement was performed in several steps: 1) randomization of single-AZ positions within the imaging volume while holding multi-AZ centroid positions fixed, 2) independent measurements of the fraction of single-AZ synapses within the local shell (1.5 μm search radius) around multi-AZ and single-AZ synapses within the random distribution, 3) comparison of the result from (2) with the actual fractional measurements in the raw STORM data to compute a “clustering index” value. 4) Because the randomization is equivalent for both multi-AZ and single-AZ synapse measurements, any measured differences in the degree of clustering reflect the synapse type.
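
      The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: the response does not give the formula for the index, so the sketch assumes it is the ratio of the observed fraction to the mean randomized fraction; the randomization is uniform within a rectangular box and omits the soma-volume and edge-effect corrections described in the response; and all function names and coordinates are hypothetical.

```python
import math
import random

SEARCH_RADIUS_UM = 1.5  # local shell radius quoted in the response


def fraction_near(points, anchors, radius=SEARCH_RADIUS_UM, skip_self=False):
    """Fraction of `points` that have at least one anchor within `radius`."""
    if not points:
        return 0.0
    hits = 0
    for i, p in enumerate(points):
        for j, a in enumerate(anchors):
            if skip_self and i == j:
                continue  # a synapse is not its own neighbour
            if math.dist(p, a) <= radius:
                hits += 1
                break
    return hits / len(points)


def randomize(n, box, rng):
    """Uniform random centroids in a box (assumption: no soma exclusion)."""
    return [tuple(rng.uniform(0.0, side) for side in box) for _ in range(n)]


def clustering_indices(single_az, multi_az, box, n_shuffles=100, seed=0):
    """Return (index around multi-AZ, index around single-AZ) synapses."""
    rng = random.Random(seed)
    obs_multi = fraction_near(single_az, multi_az)
    obs_single = fraction_near(single_az, single_az, skip_self=True)
    rand_multi = rand_single = 0.0
    for _ in range(n_shuffles):
        shuffled = randomize(len(single_az), box, rng)  # multi-AZ held fixed
        rand_multi += fraction_near(shuffled, multi_az)
        rand_single += fraction_near(shuffled, shuffled, skip_self=True)
    rand_multi /= n_shuffles
    rand_single /= n_shuffles
    # Assumed definition of the index: observed / randomized fraction.
    ratio = lambda obs, rand: obs / rand if rand > 0 else float("nan")
    return ratio(obs_multi, rand_multi), ratio(obs_single, rand_single)


# Toy coordinates in micrometres (hypothetical, not data from the study).
box = (10.0, 10.0, 10.0)
multi_az = [(3.0, 3.0, 3.0), (7.0, 7.0, 7.0)]
offsets = [(0.5, 0, 0), (-0.5, 0, 0), (0, 0.5, 0), (0, -0.5, 0), (0, 0, 0.5)]
single_az = [(c[0] + dx, c[1] + dy, c[2] + dz)
             for c in multi_az for dx, dy, dz in offsets]
single_az += [(1, 9, 1), (9, 1, 1), (1, 1, 9), (9, 9, 1), (9, 1, 9), (1, 9, 9)]

idx_multi, idx_single = clustering_indices(single_az, multi_az, box)
print(f"clustering index around multi-AZ: {idx_multi:.2f}")
print(f"clustering index around single-AZ: {idx_single:.2f}")
```

      Because the same shuffled positions feed both measurements, a difference between the two indices cannot come from the randomization itself. The reviewer's concern still applies to this sketch: a uniform shuffle ignores axon and dendrite geometry, so an index above 1 shows non-random placement but does not identify its cause.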

      We have updated Figure 3 in the revised manuscript to present the relative clustering index described above. We have updated the results, discussion, and methods sections accordingly.

      The authors claim that specificity increases over time. Figure 3b (middle) shows that the number of synapses near complex synapses might increase with time (needs confidence interval for effect size), but does not show that specificity (original relative to randomized) increases with time. The fact that nearby simple synapse density is always (P2) very different from random suggests a primarily non-activity-dependent explanation. The simplest explanation is that same-side boutons could be from the same axon whereas different-side axons could not be.

      We have significantly revised the analysis and presentation of results in Figure 3 to include a comparative measurement of synaptic clustering between multi-AZ and single-AZ synapses (discussed above). The data presented in the original Figure 3B have been moved to Supplemental Figure 4. Statistical comparisons in Figure S4 between the original and randomized synapse distributions are limited to within-age measurements. Cross-age comparisons were not performed or presented. To address the reviewer’s question concerning CI analysis in the original Figure 3B, we provide Author response image 3 below showing 5/95% confidence intervals for WT mice:

      Author response image 3.

      Claim 2: vGlut2 clusters more than 1.5 um away from multi-active zone vGlut2 clusters are not statistically significantly different in size than vGlut2 clusters within 1.5 um of multi-active zone vGlut2 clusters. Therefore "activity-dependent synapse stabilization mechanisms do not impact simple synapse vesicle pool size". The specific measure of 1.5 um from multi-active zone vGlut2 clusters does not represent all possible synapse stabilization mechanisms.

      We agree with the reviewer that this specific measure does not capture all possible synapse stabilization mechanisms. We have modified the text in the revised manuscript throughout to be more cautious in our data interpretation and have included additional discussion of alternative mechanisms consistent with our results.

      Result Section 4.

      Claim: The proximity of complex synapses with nearby simple synapses to other complex synapses with nearby simple synapses from the same eye is used to argue that activity is responsible for all this clustering.

      It is difficult to derive anything from the quantification besides 'not-random'. That is a problem because we already know that axons from the left and right eye segregate during the period being studied. All the measures in Section 4 are influenced by eye-specific segregation. Given this known bias, demonstrating a non-random relationship (P<X) doesn't mean anything. The test will reveal any non-random spatial relationship between same-eye and opposite-eye synapses.

      The results can be stated as: If you are a contralateral complex synapse, contralateral complex synapses that are also close to contralateral simple synapses will, on average, be slightly closer to you than contralateral complex synapses that are not close to contralateral simple synapses. That would be true if there is any eye-specific segregation (which there is).

      We appreciate the reviewer’s comments that our anatomical data are consistent with several possible mechanisms, suggesting the need for alternative interpretations of the results. In the original writing, we interpreted our results in the context of activity-dependent mechanisms of like-eye stabilization and opposite-eye competition. However, our results are also consistent with other mechanisms, including non-random molecular specification of eye-specific inputs onto subregions of postsynaptic target cells (e.g. distinct relay neuron dendrites). We have rewritten the manuscript to be more cautious in our interpretations and to provide a balanced discussion of alternative possibilities.

      Regarding the concern that the data in section four are influenced by eye-specific segregation, we previously found that synapse density from both eyes is equivalent in the contralateral region at the P4 time point presented (1), which is consistent with binocular axonal overlap at this age. Within our imaging volumes, ipsilateral and contralateral inputs were broadly intermingled throughout the volume, and we did not find evidence for regional segregation within the imaging fields. By these metrics, retraction of ipsilateral inputs from the contralateral territory has not yet occurred.

      It is an overinterpretation of the data to claim that the lack of a clear correlation between vGlut2 cluster volume and distance to vGlut2 clusters with multiple active zones provides support for the claim that "presynaptic protein organization is not influenced by mechanisms governing synaptic clustering".

      We agree with the reviewer that our original language was imprecise in referring to presynaptic protein organization broadly. We have revised this text to present a more accurate description of the results.

      Reviewer #2 (Public Review):

      In this manuscript, Zhang and Speer examine changes in the spatial organization of synaptic proteins during eye-specific segregation, a developmental period when axons from the two eyes initially mingle and gradually segregate into eye-specific regions of the dorsal lateral geniculate. The authors use STORM microscopy and immunostain presynaptic (VGluT2, Bassoon) and postsynaptic (Homer) proteins to identify synaptic release sites. Activity-dependent changes in this spatial organization are identified by comparing the β2KO mice to WT mice. They describe two types of presynaptic organization based on Bassoon clustering, the complex and the simple synapse. By analyzing the relative densities and distances between these proteins over age, the authors conclude that the complex synapses promote the clustering of simple synapses nearby to form the future mature glomerular synaptic structure.

      Strengths:

      The data presented are of good quality and provide an unprecedented high-resolution view of the presynaptic components of the retinogeniculate synapse during active developmental remodeling. This approach offers an advance over previous mouse EM studies of this synapse because the CTB label allows identification of the eye from which the presynaptic terminal arises. Using this approach, the authors find that simple synapses cluster close to complex synapses over age and that complex synapse density increases with age.

      Weaknesses:

      From these data, the authors conclude that the complex synapse serves to "promote clustering of like-eye synapses and prohibit synapse clustering from the opposite eye". However, the authors show no causal data to support these ideas. There are a number of issues that the authors should consider:

      (1) Clustering of retinal synapses is in part due to the fact that retinal inputs synapse on the proximal dendrites. With increased synaptogenesis, there will be increased density of retinal terminals that are closely localized. And with development, perhaps simple synapses mature into complex synapses. Simple synapses may also represent ones that are in the process of being eliminated as previously described by Campbell and Shatz, JNeurosci 1992 (consider citing). Can the authors distinguish these scenarios from the ones that they conclude?

      We thank the reviewer for their thoughtful commentary and suggestions to improve our manuscript. We agree with the reviewer that our original interpretation of synaptic clustering by activity-dependent stabilization and punishment mechanisms is not directly supported by causal data. We have extensively revised the manuscript to take a more cautious view of the results and to discuss alternative mechanisms that are consistent with our data.

      During eye-specific circuit development, there is indeed increased synaptogenesis and, ultimately, RGC terminals are closely clustered within synaptic glomeruli. This process involves the selective addition and elimination of synapses. Bouton clustering has been shown to occur within individual RGC axons after eye-opening in the mouse (5). The convergence of other RGC types into clustered boutons has been shown at eye-opening by light and electron microscopy (3). There is also qualitative evidence that synaptic clusters may form earlier during eye-specific segregation in the cat (4). Our data provide additional evidence that synaptic clustering begins prior to eye-opening in the mouse (P2-P8). Although synapse numbers also increase during this period, the distribution of synapse addition is non-random. 

      Single-active zone synapses (we previously called these “simple”) may indeed mature into multi-active zone synapses (we previously called these “complex”). At the same time, single-active zone synapses may be eliminated. We believe that each of these events occurs as part of the synaptic refinement process. Our STORM images are static snapshots of eye-specific refinement, and we cannot infer the dynamic developmental trajectory of an individual synapse in our data. Future live imaging experiments in vivo/in situ will be needed to track the maturation and pruning of individual connections. We have expanded our discussion of these limitations and future directions in the manuscript.

      (2) The argument that "complex" synapses are the aggregate of "simple" synapses (Fig 2, S2) is not convincing.

      We agree with the reviewer’s concern about the ambiguous identity of complex synapses. To clarify the nature of multi-active zone synapses, we have performed RGC-specific dAPEX2 labeling to visualize retinogeniculate terminals by electron microscopy (EM). These experiments revealed the presence of synaptic terminals with multiple active zones. We have added images and text to the results section describing these findings. Our 2D EM images do not rule out the possibility that some multi-active zone synapses observed in STORM images are in fact clusters of individual RGC terminals. We have revised the text to provide a more accurate discussion of the nature of multi-active zone synapses.  

      (3) The authors use the β2KO mice to assess changes in the organization of synaptic proteins in retinal terminals that have disrupted retinal waves. However, β2-nAChRs are also expressed in the dLGN and other areas of the brain, and glutamatergic synapse development has been reported in the CNS independent of the disruption in retinal waves. This issue should be considered when interpreting the total reduced retinal synapse density in the dLGN of the mutant.

      We thank the reviewer for their suggestion to consider non-retinal effects of the germline deletion of the beta 2 subunit of the nicotinic acetylcholine receptor. Previously, Xu and colleagues reported the development of a conditional transgenic mouse model lacking β2-nAChR expression specifically in the retina (6). These retina-specific β2-nAChR mutant mice (Rx-β2cKO) have disrupted retinal wave properties and defects in eye-specific axonal segregation in binocular anterograde tracing experiments. This work suggests that the defects seen in germline β2-nAChR KO mice arise from defects in retinal wave activity rather than the loss of nicotinic receptors elsewhere in the brain. Additionally, the development of brainstem cholinergic inputs to the dLGN is delayed until the closure of the eye-specific segregation period (7), further suggesting a limited role for cholinergic transmission in the retinogeniculate refinement process.

      (4) Outside of a total synapse density difference between WT and β2KO mice, the changes in the spatial organization of synaptic proteins over development do not seem that different. In fact % simple synapses near complex synapses from the non-dominant eye in the mutant is not that different from WT at P8 (Fig 3C), an age when eye-specific segregation is very different between the genotypes. Can the authors explain this discrepancy?

      We thank the reviewer for their question concerning differences between synapse organization in WT versus β2KO mice. In the original presentation of Figure 3C, the percentage of non-dominant eye single-AZ synapses near multi-AZ synapses increased at P4 in WT mice, but this did not occur in β2KO mice. This is consistent with our previous results showing that there is an increase in non-dominant eye synaptic density at this age, which does not occur in β2KO mice (1). At P8, this clustering effect is lost in WT as eye-specific segregation has taken place and non-dominant eye inputs have been eliminated. However, in β2KO mice, the overall synapse density is still low at this age. We interpret this result as a failure of synaptogenesis in the β2KO line, which leads to increased growth of individual RGC axons (8) and eye-specific overlap at P8 (9, 10). Evidence in support of this interpretation comes from live dynamic imaging studies of RGC axon branching in Xenopus and Zebrafish, showing that synapse formation stabilizes local axon branching and that disruptions of synapse formation or neurotransmission lead to enlarged axons (11-13).

      Our anatomical results do not provide a specific biological mechanism for the remaining clustering observed in the β2KO mice. We have revised our discussion of the fact that individual RGC axons may form multiple synaptic connections leading to clustering, which may be independent of changes in retinal wave properties in the β2KO mouse. We have also extensively revised the analysis and presentation of results in Figure 3 to directly compare synaptic clustering around both multi-AZ synapses and single-AZ synapses within the same imaging volumes.

      (5) The authors use nomenclature that has been previously used and associated with other aspects of retinogeniculate properties. For example, the phrases "simple" and "complex" synapses have been used to describe single boutons or aggregates of boutons from numerous retinal axons, whereas in this manuscript the phrases are used to describe vesicle clusters/release sites with no knowledge of whether they are from single or multiple boutons. Likewise, the word "glomerulus" has been used in the context of the retinogeniculate synapse to refer to a specific pattern of bouton aggregates that involves inhibitory and neuromodulatory inputs. It is not clear how the release sites described by the authors fit in this picture. Finally, the word "punishment" is associated with a body of literature regarding the immune system and retinogeniculate refinement, which is not addressed in this study. This double use of the phrases can lead to confusion in the field and should be clarified by clear definitions of how they are used in the current study.

      We appreciate the reviewer’s concern that the terminology we used in the initial submission may cause confusion. We have revised the text throughout for clarity. “Simple” synapses are now referred to as “single-active zone synapses”. “Complex” synapses are now referred to as “multi-active zone synapses”. We have removed all text that previously referred to synaptic clusters in STORM images as glomeruli. We agree that we have not provided causal evidence for synaptic stabilization and punishment mechanisms, which would require additional molecular genetic studies. We have restructured the manuscript to remove these references and discuss our anatomical results impartially.  

      Reviewer #3 (Public Review):

      This manuscript is a follow-up to a recent study of synaptic development based on a powerful data set that combines anterograde labeling, immunofluorescence labeling of synaptic proteins, and STORM imaging (Cell Reports 2023). Specifically, they use anti-Vglut2 label to determine the size of the presynaptic structure (which they describe as the vesicle pool size), anti-Bassoon to label a number of active zones, and anti-Homer to identify postsynaptic densities. In their previous study, they compared the detailed synaptic structure across the development of synapses made with contra-projecting vs ipsi-projecting RGCs and compared this developmental profile with a mouse model with reduced retinal waves. In this study, they produce a new analysis on the same data set in which they classify synapses into "complex" vs. "simple" and assess the number and spacing of these synapses. From these measurements, they make conclusions regarding the processes that lead to synapse competition/stabilization.

      Strengths:

      This is a fantastic data set for describing the structural details of synapse development in a part of the brain undergoing activity-dependent synaptic rearrangements. The fact that they can differentiate eye of origin is also a plus.

      Weaknesses:

      The lack of details provided for the classification scheme as well as the interpretation of small effect sizes limit the interpretations that can be made based on these findings.

      We thank the reviewer for their reading of the manuscript and helpful comments to improve the work. We provide details on how single-active zone and multi-active zone synapses are classified in the methods section. We agree with the suggestion to be more careful in interpreting the results. We have extensively revised the manuscript to 1) include additional electron microscopy data demonstrating the presence of multi-active zone retinogeniculate synapses, 2) extend the synaptic clustering analysis to both single-active zone and multi-active zone synapses for comparison, and 3) improve the clarity and accuracy of the discussion throughout the manuscript.

      (1) The criteria used to classify synapses as simple vs. complex are critical for all of the analysis in this study. Therefore, these criteria should be much more explicit and tested for robustness. As stated in the methods, the classification is based on the number of active zones, which are designated by the number of Bassoon clusters associated with a Vglut2 cluster (line 697). A second part of the criteria is the size of the presynaptic terminal as assayed by "greater Vglut2 signal" (line 116). So how are these thresholds determined? For Bassoon clusters, is one voxel sufficient? Two? If it's one, how often do they see a Bassoon-positive voxel with no Vglut2 cluster, which may therefore represent "noise"? No distribution of Bassoon volumes is provided that might be the basis for selecting this number of sites. Unfortunately, the images are not helpful. For example, does P8 WT in Figure 1B have 7 or 2? According to Figure 2C, it appears the numbers are closer to 2-4.

      The Vglut volume measurements also do not seem to provide a clear criterion. Figure 2 shows that the distributions of Vglut2 cluster volumes for complex and for simple synapses are significantly overlapping.

      The authors need to clarify the quantitative approach used for this classification strategy and test how robust the results of the study are to this strategy.

      We thank the reviewer for their question concerning the STORM data analysis. Here we provide a brief overview of the complete analysis details, which are provided in the methods section.

      Our raw STORM data sets consisted of spectrally separate volumetric imaging channels of VGluT2, Bassoon, and Homer1 signals. For each of these channels, raw STORM data were processed in five steps: 1) the corresponding low-resolution conventional image of each physical section was applied to the STORM data to filter artifacts in the STORM image that do not appear in the conventional image; 2) STORM images were thresholded using a 2-factor Otsu threshold that removes low-intensity background noise while preserving all single-molecule localizations that correspond to genuine antibody labeling as well as non-specific antibody labeling in the tissue; 3) the MATLAB function “conncomp” was applied to identify connected-component voxels in 3D across the image stack, and clusters were kept for further analysis only if they were connected across at least 2 continuous physical sections (140 nm Z depth); 4) for every connected component (clusters corresponding to genuine antibody labeling and background labeling), we measured the volume and signal density (intensity/volume); 5) a threshold was applied to retain clusters that have a higher volume and lower signal density. We excluded signals that have low volume and high density, which correspond to single antibody labels. This analysis retains larger clusters that correspond to synaptic objects and excludes non-specific antibody background.
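
      As an illustration of step 3 only, the sketch below groups thresholded voxels into 3D connected components and keeps clusters that span at least two physical sections. It is written in Python rather than MATLAB and is not the authors' code; the 26-neighbor connectivity, the `(z, y, x)` voxel tuples, and the function names are assumptions made for the example.

```python
from collections import deque

# All 26 neighbor offsets in 3D (assumed connectivity; the response does not state it).
OFFSETS = [(dz, dy, dx)
           for dz in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
           if (dz, dy, dx) != (0, 0, 0)]

def label_clusters(voxels):
    """Group above-threshold voxels (z, y, x) into connected 3D clusters."""
    remaining = set(voxels)
    clusters = []
    while remaining:
        seed = remaining.pop()
        queue, cluster = deque([seed]), [seed]
        while queue:  # breadth-first flood fill from the seed voxel
            z, y, x = queue.popleft()
            for dz, dy, dx in OFFSETS:
                neighbor = (z + dz, y + dy, x + dx)
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    queue.append(neighbor)
                    cluster.append(neighbor)
        clusters.append(cluster)
    return clusters

def spans_min_sections(cluster, min_sections=2):
    """True if the cluster occupies at least `min_sections` distinct z sections."""
    return len({z for z, _, _ in cluster}) >= min_sections

# Hypothetical voxels: two clusters cross sections, one sits in a single section.
example = [(0, 0, 0), (1, 0, 0), (1, 1, 1), (0, 5, 5), (3, 9, 9), (4, 9, 9)]
kept = [c for c in label_clusters(example) if spans_min_sections(c)]
```

      Because neighbors differ by at most one section in z, the sections occupied by any connected cluster are necessarily adjacent, so counting distinct z values is enough to test the two-section criterion.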

      The average size of WT synaptic Bassoon clusters ranges from 55 - 3532 voxels (0.00092~0.059 μm<sup>3</sup>), with a median size of 460 voxels (0.0077 μm<sup>3</sup>).

      The average size of WT synaptic VGluT2 clusters ranges from 50 -73752 voxels (0.00084~1.2 μm<sup>3</sup>), with a median size of 980 voxels (0.016 μm<sup>3</sup>).

      The average size of WT synaptic Homer1 clusters ranges from 63 - 7118 voxels (0.0010~0.12 μm<sup>3</sup>), with a median size of 654 voxels (0.011 μm<sup>3</sup>).

      In practice, any Bassoon/VGluT2/Homer1 clusters with <10 voxels are immediately filtered at the Otsu thresholding step (2) above.
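
      The voxel counts and volumes quoted above can be cross-checked with a short conversion. The response does not state the voxel size, so the per-voxel volume below is inferred from the quoted Bassoon median (460 voxels, 0.0077 μm<sup>3</sup>) and should be treated as an assumption.

```python
# Per-voxel volume inferred from the quoted Bassoon median; not a stated value.
VOXEL_UM3 = 0.0077 / 460

def voxels_to_um3(n_voxels):
    """Convert a cluster size in voxels to a volume in cubic micrometers."""
    return n_voxels * VOXEL_UM3

# Compare against the other quoted medians (VGluT2: 980 voxels, Homer1: 654 voxels).
vglut2_median_um3 = voxels_to_um3(980)
homer1_median_um3 = voxels_to_um3(654)
```

      Under this inferred voxel volume, the VGluT2 and Homer1 medians come out close to the quoted 0.016 and 0.011 μm<sup>3</sup>, which suggests the three channels share one voxel size.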

      The reviewer is correct that we often see Bassoon(+) clusters that are not associated with VGluT2, and these may reflect synapses of non-retinal origin or retinogeniculate synapses that lack VGluT2 expression. To identify retinogeniculate synapses containing VGluT2, we performed a synapse pairing analysis that measured the association between VGluT2 and Bassoon clusters after the synapse cluster filtering described above. We first measured the centroid-centroid distance from each VGluT2 cluster to the closest cluster in the Bassoon channel. We next quantified the signal intensity of the Bassoon channel within a 140 nm shell surrounding each VGluT2 cluster. A 2D histogram was plotted based on the measured centroid-centroid distances and opposing channel signal densities of each cluster. Paired clusters with closely positioned centroids and high intensities of apposed channel signal were identified using the OPTICS algorithm (14).
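
      The first measurement in this pairing analysis, the centroid-centroid distance from each VGluT2 cluster to its closest Bassoon cluster, can be sketched as follows. This is a minimal brute-force illustration and not the authors' implementation; the function name and coordinates are hypothetical, and the shell-intensity measurement and OPTICS clustering steps are not shown.

```python
import math

def nearest_partner(vglut2_centroids, bassoon_centroids):
    """For each VGluT2 centroid, return (index, distance) of the closest Bassoon centroid."""
    pairs = []
    for v in vglut2_centroids:
        distances = [math.dist(v, b) for b in bassoon_centroids]
        best = min(range(len(distances)), key=distances.__getitem__)
        pairs.append((best, distances[best]))
    return pairs

# Hypothetical centroids in arbitrary units.
vglut2 = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
bassoon = [(0.0, 3.0, 4.0), (10.0, 0.0, 1.0)]
pairs = nearest_partner(vglut2, bassoon)
```

      For data sets with many thousands of clusters, a spatial index such as a k-d tree would replace the inner loop; the brute-force form is kept here only for readability.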

      In the original Figure 1B, the multi-active zone synapse in WT at P8 had two Bassoon clusters. To clarify this, we have revised the images in Figure 1 to include arrowheads that point to individual active zones. We have also revised Supplemental Figure 1 to show volumetric renderings of individual example synapses that help illustrate the 3D structure of these multi-active zone inputs. All details about synapse analysis and synapse pairing are provided in the methods section.

      (2) Effect sizes are quite small and all comparisons are made on medians of distributions. This leads to an n=3 biological replicates for all comparisons. Hence this small n may lead to significant results based on ANOVAS/t-tests, but the statistical power of these effects is quite weak. To accurately represent the variance in their data, the authors should show all three data points for each category (with a SD error bar when possible). They should also include the number of synapses in each category (e.g. the numerators in Figure 1D and the denominators for Figure 1E). For other figures, there are additional statistical questions described below.

      We thank the reviewer for their suggestion to improve the presentation of our results. We have added all three data points (individual biological replicates) to each figure plot when applicable. We have also included a supplemental table (Table S1) listing total eye-specific synapse numbers of each type (mAZ and sAZ) and AZ number for each biological replicate in both genotypes.

      (3) The authors need to add a caveat regarding their classification of synapses as "complex" vs. "simple" since this is a terminology that already exists in the field and it is not clear that these STORM images are measuring the same thing. For example, in EM studies, "complex" refers to multiple RGCs converging on the same single postsynaptic site. The authors here acknowledge that they cannot assign different AZs to different RGCs so this comparison is an assumption. In Figure 2 they argue this is a good assumption based on the finding that the Vglut volume/active zone is constant and therefore each represents a single RGC. However, the authors should acknowledge that they are actually seeing quite different percentages than those in EM studies. For example, in Monavarfeshani et al, eLife 2018, there were no complex synapses found at P8. (Note this study also found many more complex vs. simple synapses in the adult - 70% vs. the 20% found in the current study - but this difference could be a developmental effect). In the future, the authors may want to take another data set in the adult dLGN to make a direct comparison based on numbers and see if their classification method for complex/simple maps onto the one that currently exists in the literature.

      We appreciate the reviewer’s comment that the use of the terms “complex” and “simple” may cause confusion. We have significantly revised the manuscript for clarity: 1) We now refer to “complex” synapses as “multi-active zone synapses” and “simple” synapses as “single-active zone synapses”. 2) We have performed electron microscopy analysis of dAPEX2-labeled retinogeniculate projections to confirm the existence of large synaptic terminals with multiple active zones. 3) We have expanded our discussion of previous electron microscopy results describing a lack of axonal convergence at P8 (3). 4) We have added a discussion on how individual RGCs may form multiple synapses in close proximity within their axonal arbor, which would create a clustering effect.

      We agree that it will be informative to collect a STORM data set in the adult mouse dLGN and we look forward to working on this project to compare with EM results in the future.  

      (4) Figure 3 assays the relative distribution of simple vs. complex synapses. They found that a larger percentage of simple synapses were within 1.5 microns of complex synapses than you would expect by chance for both ipsi and contra projecting RGCs, and hence conclude that complex synapses are sites of synaptic clustering. In contrast, there was no clustering of ipsi-simple to contra-complex synapses and vice versa. The authors also argue that this clustering decreases between P4 and P8 for ipsi projecting RGCs.

      This analysis needs much more rigor before any conclusions can be drawn. First, the authors need to justify the 1.5-micron criteria for clustering and how robust their results are to variations in this distance. Second, these age effects need to be tested for statistical significance with an ANOVA (all the stats presented are pairwise comparisons to means expected by random distributions at each age). Finally, the authors should consider what n's to use here - is it still grouped by biological replicate? Why not use individual synapses across mice? If they do biological replicates, then they should again show error bars for each data point in their biological replicates. And they should include the number of synapses that went into these measurements in the caption.

      We appreciate the suggestion to improve the rigor of our analysis of synaptic clustering presented in Figure 3. We have revised our analysis to measure the degree of synapse clustering nearby both multi-AZ and single-AZ synapses after an equivalent randomization of single-AZ synapse positions in the volume. 

      We now present the revised results as a “clustering index” for both multi-AZ synapses and single-AZ synapses. This measurement was performed in several steps: 1) randomization of single-AZ positions within the imaging volume while holding multi-AZ centroid positions fixed, 2) independent measurements of the fraction of single-AZ synapses within the local shell (1.5 μm search radius) around multi-AZ and single-AZ synapses within the random distribution, 3) comparison of the result from (2) with the actual fractional measurements in the raw STORM data to compute a “clustering index” value. 4) Because the randomization is equivalent for both multi-AZ and single-AZ synapse measurements, the measured differences in the degree of clustering reflect a synapse type-specific effect.
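
      The randomization steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the response does not give the formula for the clustering index, so the ratio of observed to randomized fractions used here is an assumption, as are the uniform placement within a box, the number of shuffles, and all names.

```python
import math
import random

def fraction_near(points, centers, radius, exclude_self=False):
    """Fraction of `points` lying within `radius` of at least one center.

    Set exclude_self=True when `points` and `centers` are the same list,
    so that a synapse is not counted as its own neighbor.
    """
    near = 0
    for i, p in enumerate(points):
        for j, c in enumerate(centers):
            if exclude_self and i == j:
                continue
            if math.dist(p, c) <= radius:
                near += 1
                break
    return near / len(points)

def clustering_index(single_az, multi_az, bounds, radius=1.5, n_shuffles=200, seed=0):
    """Observed fraction of single-AZ synapses near multi-AZ synapses,
    divided by the mean fraction after randomizing single-AZ positions
    (multi-AZ positions stay fixed). The ratio form is an assumption."""
    rng = random.Random(seed)
    observed = fraction_near(single_az, multi_az, radius)
    shuffled_fractions = []
    for _ in range(n_shuffles):
        shuffled = [tuple(rng.uniform(0.0, b) for b in bounds) for _ in single_az]
        shuffled_fractions.append(fraction_near(shuffled, multi_az, radius))
    expected = sum(shuffled_fractions) / n_shuffles
    if expected == 0:
        return math.inf if observed > 0 else float("nan")
    return observed / expected
```

      The same `fraction_near` call with `exclude_self=True` gives the corresponding measurement around single-AZ synapses, which is what allows the two synapse types to be compared against an equivalent randomization.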

      We have also updated Supplemental Figure 3 showing the results of varying the search radius from 1-4 μm for both contralateral- and ipsilateral-eye synapses. The results showed that a search radius of 1.5 μm resulted in the largest difference between the original synapse distribution and a randomized synapse distribution (shuffling of single-active zone synapse position while holding multi-active zone synapse position fixed).

      Finally, we have removed all statistical comparisons of single measurements (means or ratios) across ages from the manuscript. We focus our statistical analysis on paired data comparisons within individual biological replicates.

      For the analysis of synapse clustering, we grouped the data by biological replicates (N=3) to look for a global effect on synapse clustering. In the revised manuscript, we added data points for each replicate in the figure and included the number of synapses in Supplementary Table 1.

      (5) Line 211-212 - the authors conclude that the absence of clustered ipsi-simple synapses indicates a failure to stabilize (Figure 3). Yet, the link between this measurement and synapse stabilization is not clear. In particular, the conclusion that "isolated" synapses are the ones that will be eliminated seems to be countered by their finding in Figure 3D/E which shows that there is no difference in vesicle pool volume between near and far synapses. If isolated synapses are indeed the ones that fail to stabilize by P8, wouldn't you expect them to be weaker/have fewer vesicles? Also, it's hard to tell if there is an age-dependent effect since the data presented in Figures 3D/E are merged across ages.

      We thank the reviewer for their suggestion to clarify the results in Figure 3. Based on the measured eye-specific differences in vesicle pool size and organization, we also expected that synapses outside of clusters would show a reduced vesicle population. However, across all ages, we found no differences in the vesicle pool size of single-active zone synapses based on their proximity to multi-active zone synapses. Below, we show cumulative distributions of these results across all ages (P2/P4/P8) for WT mice CTB(+) data. Kolmogorov-Smirnov tests show no significant differences (P = 0.880, 0.767, and 0.494, respectively). Separate 5/95% confidence interval calculations showed overlap between far and near populations at each age.
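
      For readers unfamiliar with the test, the two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between the two empirical cumulative distributions. The sketch below computes only that statistic, not the P value, and uses hypothetical inputs rather than the study's data.

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:  # the maximum gap is always attained at an observed value
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

# Hypothetical vesicle pool volumes for "near" and "far" single-AZ synapses.
near = [0.011, 0.014, 0.016, 0.020]
far = [0.010, 0.015, 0.017, 0.021]
d = ks_statistic(near, far)
```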

      Author response image 4.

      To clarify the presentation of the results, we have changed the text to state that the “vesicle pool size of sAZ synapses is independent of their distance to mAZ synapses”. We have removed references to stabilization and punishment from the results section of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Because none of the phenomena being measured can be expected to behave randomly (given what is already known about the system) and the sample size is small, I believe quantification of the data requires confidence intervals for effect sizes. Resolving the multi-bouton vs multi-active zone bouton with EM would also help.

      We thank the reviewer for their thorough reading of the manuscript and many helpful suggestions. We provide analysis with confidence intervals in a point-by-point response below. In the manuscript we revised our results and focused our statistical analyses on comparisons within the same biological replicate (paired effects). In addition, we have performed electron microscopy of RGC inputs to the dLGN at postnatal day 8 to demonstrate the presence of retinogeniculate synapses with multiple active zones.

      Figure 1:

      Please show data points in scatter bar plots and not just error bars.

      We have updated all plots to show data points for independent biological replicates.

      Please describe the image processing in more detail and provide an image in which the degree of off-target labeling can be evaluated.

      We have updated the description of the image processing in the methods sections. We have made all the code used in this analysis freely available on GitHub (https://github.com/SpeerLab). We have uploaded the raw STORM images of the full data set to the open-access Brain Imaging Library (16). These images can be accessed here: https://api.brainimagelibrary.org/web/view?bildid=ace-dud-lid (WTP2A data for example). All 18 datasets are currently searchable on the BIL by keyword “dLGN” or PI last name “Speer” and a DOI for the grouped dataset is pending.

      How does panel 1D get very small error bars with N = 3? Please provide scatter plots.

      We have updated panel 1D to show the means for each independent biological replicate.

      Line 129: over what volume is density measured? What are the n's? What is the magnitude (with confidence intervals) of increase?

      The volume we collected from each replicate was ~80 μm × 80 μm × 7 μm (total volume ~44,800 μm<sup>3</sup>). N=3 biological replicates for each age, genotype, and tissue location. Because of concerns with the use of ANOVA for low sample numbers, we have removed a majority of the age-wise comparisons from the manuscript and instead focus on within-replicate paired data comparisons. Author response image 5 shows 5/95% confidence intervals for WT data (left panel) and β2KO data (right panel):

      Author response image 5.

      The 5/95% CI for the increase in synapse density from P2 to P8 for CTB(+) synapses is approximately -0.001 to 0.037 synapses / μm<sup>3</sup>.

      Line 131: You say that non-dominant increases and then decreases. It appears that the error bars argue that you do not have enough information to reliably determine how much or little density changes.

      Line 140: No confidence intervals. It appears the error bars allow both for the claimed effect of increased fraction and the opposite effect of decreased density.

      Because of concerns with the use of ANOVA for low sample numbers, we have removed age-wise comparisons of single-measurements (means and ratios) from the manuscript and instead focus on within-replicate paired data comparisons.

      Line 144: Confidence intervals would be a reasonable way to argue that fraction is not changed in KO: normal fraction XX%-XX%. KO fraction XX%-XX%.

      Author response image 6 shows panels for WT (left) and β2KO mice (right) with 5/95% CIs.

      Author response image 6.

      In the revised manuscript, we have updated the text to report the measurements, but we do not draw conclusions about changes over development.

      I find it hard to estimate magnitudes on a log scale.

      We appreciate the reviewer’s concern with the presentation of results on a log scale. Because the measured synapse properties are distributed logarithmically, we have elected to present the data on a log scale so that the distribution(s) can be seen clearly. Lognormal distributions enable us to use a mixed linear model for statistical analysis.
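
      One way to read magnitudes off a log<sub>10</sub> axis is to convert a difference in log units into a linear fold change. The helper below is a minimal sketch; the function name and input values are hypothetical and are not taken from the manuscript.

```python
def fold_change(log10_a, log10_b):
    """Linear ratio a/b given the log10 values of a and b."""
    return 10 ** (log10_a - log10_b)

# Hypothetical example: a gap of 0.3 log10 units is roughly a two-fold difference.
ratio = fold_change(-1.5, -1.8)
```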

      Line 156: Needs confidence interval for difference.

      Line 158: Needs confidence interval for difference of differences.

      Line 160: Needs confidence interval for difference of differences.

      Why only compare at P4 where there is the biggest difference? The activity hypothesis would predict an even bigger effect at P8.

      Below is a table listing the mean volume (log<sub>10</sub> μm<sup>3</sup>) and [5/95%] confidence intervals for comparisons of VGluT2 signal between CTB(+) and CTB(-) synapses from Figure 2A and 2B:

      Author response table 2.

      Based on the values given above, the mean difference of differences and [5/95%] confidence intervals are listed below:

      Author response table 3.

      We added these values to the manuscript. We have also reported the difference in median values on a linear scale (as below) so that the readers can have a straightforward understanding of the magnitude.

      Author response table 4.

      We elected to highlight the results at P4 based on our previous finding that the synapse density from each eye-of-origin is similar at this time point (1).

      At P8, there is a decrease in the magnitude of the difference between CTB(+)/CTB(-) synapses compared to P4. This may be due to an increase in VGluT2 volume within non-dominant eye synapses that survive competition between P4-P8.

      At P8 in the mutant, there is an increase in the magnitude of the difference between CTB(+)/CTB(-) synapses compared to P4. This may be due to delayed synaptic maturation in β2KO mice.

      Line 171: The correct statistical comparison was not performed for the claim. Lack of * at P2 does not mean they are the same. Why do you get the same result for KO?

      We have revised the statistical analysis, figure presentation, and text to remove discussion of changes in the number of active zones per synapse over development based on ANOVA. We now report eye-specific differences at each time point using paired t-test analysis, which is mathematically equivalent to comparing the 5/95% confidence interval of the difference.
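
      The link between a paired t-test and a confidence interval of the paired differences can be sketched for N = 3 replicates as follows. This assumes a Student's t interval, which the response does not specify, and the replicate values are hypothetical. Note that a 5/95% interval is a 90% two-sided interval, so zero falling outside it corresponds to a two-sided P below 0.10 (one-sided P below 0.05).

```python
import math
import statistics

# 95th percentile of Student's t with 2 degrees of freedom (valid only for N = 3 pairs).
T_CRIT_95_DF2 = 2.920

def paired_t_and_ci(a, b):
    """Return the paired t statistic and the 5/95% CI of the mean difference."""
    diffs = [x - y for x, y in zip(a, b)]
    if len(diffs) != 3:
        raise ValueError("critical value is hard-coded for exactly 3 pairs")
    mean = statistics.mean(diffs)
    sem = statistics.stdev(diffs) / math.sqrt(len(diffs))
    t_stat = mean / sem
    ci = (mean - T_CRIT_95_DF2 * sem, mean + T_CRIT_95_DF2 * sem)
    return t_stat, ci

# Hypothetical per-replicate measurements for the two eyes.
t_stat, (ci_low, ci_high) = paired_t_and_ci([2.0, 3.0, 4.5], [1.0, 1.5, 2.5])
```

      The interval excludes zero exactly when the absolute t statistic exceeds the critical value, which is the sense in which the two presentations carry the same information.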

      Line 175: Qualitative claim. Correlation coefficients and magnitudes of correlation coefficients are not reported.

      Linear fitting slope and R-squared values are listed below:

      Author response table 5.

      The values are added to the manuscript to support the conclusions.
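
      For reference, the slope and R-squared of an ordinary least-squares line can be computed as in the sketch below. The response does not state which fitting routine was used, so this is a generic illustration with hypothetical inputs.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Hypothetical data: active zone number per synapse vs. VGluT2 cluster volume.
az_counts = [1, 2, 3, 4]
volumes = [0.02, 0.04, 0.06, 0.08]
slope, intercept, r_squared = linear_fit(az_counts, volumes)
```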

      Line 177: n.s. does not mean that you have demonstrated the values are the same. An argument for similarity could be made by calculating a confidence interval for a potential range of differences. Example: Complex were 60%-170% of Simple.

      Author response image 7 with 5/95% CI is shown below (WT and β2KO):

      Author response image 7.

      Comparing the difference between multi-AZ synapse and single-AZ synapse revealed that the difference in average VGluT2 cluster volume per AZ is:

      Author response table 6.

      The values are added to the manuscript for discussion.
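      The kind of interval the reviewer describes for Line 177 (for example, "Complex were 60%-170% of Simple") could be obtained in several ways. The sketch below uses a percentile bootstrap, which is an assumption and not necessarily the method used for the values above. The function name and sample values are hypothetical, and the two groups are resampled independently, which assumes unpaired measurements.

```python
import random
import statistics


def bootstrap_ratio_ci(complex_vals, simple_vals, n_boot=2000, seed=0):
    """Percentile bootstrap for mean(complex) / mean(simple), in percent.

    Returns (point_estimate, 5th percentile, 95th percentile).
    Assumes positive-valued measurements so the denominator is never zero.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping the group sizes.
        c = rng.choices(complex_vals, k=len(complex_vals))
        s = rng.choices(simple_vals, k=len(simple_vals))
        ratios.append(100.0 * statistics.mean(c) / statistics.mean(s))
    ratios.sort()
    low = ratios[int(0.05 * n_boot)]
    high = ratios[min(int(0.95 * n_boot), n_boot - 1)]
    point = 100.0 * statistics.mean(complex_vals) / statistics.mean(simple_vals)
    return point, low, high


# Hypothetical per-animal volumes (illustrative only, not data from the study).
complex_volumes = [0.42, 0.55, 0.38, 0.61, 0.47]
simple_volumes = [0.40, 0.49, 0.44, 0.52, 0.41]

point, low, high = bootstrap_ratio_ci(complex_volumes, simple_volumes)
print(f"Complex is {point:.0f}% of Simple (5/95% interval: {low:.0f}%-{high:.0f}%)")
```

      A narrow interval around 100% would support a claim of similarity, whereas a wide interval only shows that the data do not constrain the difference.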

      Line 178: There is no reason to think that the vesicle pool for a single bouton does not scale with active zone number within the range of uncertainty presented here.

      We have collected EM images of multi-AZ synapses and modified our discussion and conclusions in the revised text.

      Line 196: "non-random clustering increased progressively" is misleading. The density of the boutons increases for both the Original and Randomized. Given the increase in variance at P8, it is unlikely that the data supports the claim that the non-randomness increased. Would be easy to quantify with confidence intervals for a measure of specificity (O/R).

      We have revised the manuscript to remove analysis and discussion of changes in clustering over development. We have modified this section of the manuscript and figures to present a normalized clustering index that describes the non-random clustering effect present at each time point.

      Line 209: Evidence is for correlation, not causation and there is a trivial potential explanation for correlation.

      We appreciate the reviewer’s concern with overinterpretation of the results. We have changed the text to more accurately reflect the data.

      Lines 238-239: Authors failed to show effect is activity-dependent. Near/Far distinction is not necessarily a criterion for the effect of activity. The claim is likely false in other systems.

      We agree with the reviewer that the original text overinterpreted the results. We have changed the text to more accurately reflect the data. 

      Line 265-266: Assumes previous result is correct and measure of vGlut2 provides information about all presynaptic protein organization.

      We thank the reviewer for pointing out the incorrect reference to all presynaptic protein organization. We have corrected the text to reference only the VGluT2 and Bassoon signals that were measured.

      Line 276: There are many other interpretations that include trivial causes. It is unclear what the measure indicates about the biology and there is no interpretable magnitude of effect.

      We agree with the reviewer that the original text overinterpreted the results. We have changed the text to remove references to mechanisms of synaptic stabilization.

      Line 289: Differences cannot be demonstrated by comparing P-values. Try comparing confidence intervals for effect size or generate a confidence interval for the difference between the two groups.

      5/95% confidence intervals are given below for Figure 4C/D:

      Author response table 7.

      We have added these values to the manuscript to support our conclusion.

      Line 305: "This suggests that complex synapses from the non-dominant-eye do not exert a punishment effect on synapses from the dominant-eye" Even if all the other assumptions in this claim were true, "n.s." just means you don't know something. It cannot be compared with an asterisk to claim a lack of effect.

      We thank the reviewer for raising this concern. We have modified the text to remove references to synaptic punishment mechanisms in the results section.

      Below are the 5/95% confidence intervals for the results in Figure 4F:

      Author response table 8.

      We have added these values to the manuscript to support our conclusion.

      Line 308: "mechanisms that act locally". 6 microns is introduced based on differences in curves above(?). I don't see any analysis that would argue that longer-distance effects were not present.

      The original reference referred to the differences in the cumulative distribution measurements between multi-active zone synapses versus single-active zone synapses in their distance to the nearest neighboring multi-active zone synapse. For clarity, we have deleted the reference to the 6 micron distance in the revised text.

      Reviewer #2 (Recommendations For The Authors):

      (1) This data set would be valuable to the community. However, unless the authors can show experiments that manipulate the presence of complex synapses to test their concluding claims, the manuscript should be rewritten with a reassessment of the conclusions that is more grounded in the data.

      We thank the reviewer for their careful reading of the manuscript and we agree the original interpretations were not causally supported by the experimental results. We have made substantial changes to the text throughout the introduction, results, and discussion sections so that the conclusions accurately reflect the data.

      (2) To convincingly address the claim that "complex synapses" are aggregates of simple synapses, the authors should perform experiments at the EM level showing what the bouton correlates are to these synapses.

      We thank the reviewer for their suggestion to perform EM to gain a better understanding of retinogeniculate terminal structure. We generated an RGC-specific transgenic line expressing the EM reporter dAPEX2 localized to mitochondria. We have collected EM images of retinogeniculate terminals that demonstrate the presence of multiple active zones within individual synapses. These results are now presented in Figure 1. The text has been updated to reflect the new results.

      (3) Experiments using the conditional β2KO mice would help address questions of the contribution of β2-nAChRs in dLGN to the synaptic phenotype.

      We appreciate the reviewer’s concern that the germline β2KO model may show effects that are not retina-specific. To address this, Xu and colleagues generated a retina-specific conditional β2KO transgenic and characterized wave properties and defective eye-specific segregation at the level of bulk axonal tracing (6). The results from the conditional mutant study suggest that the main effects on eye-specific axon refinement in the germline β2KO model are likely of retinal origin through impacts on retinal wave activity. Additionally, anatomical data shows that brainstem cholinergic axons innervate the dLGN toward the second half of eye-specific segregation and are not fully mature at P8 when eye-specific refinement is largely complete (7). We agree with the reviewer that future synaptic studies of previously published wave mutants, including the conditional reporter line, would be needed to conclusively assess a contribution of non-retinal nAChRs. These experiments will take significant time and resources and we respectfully suggest this is beyond the scope of the current manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors need to be more transparent that they are using the same data set from the previous publication (right now it does not appear until line 471) and clarify what was found in that study vs what is being tested here.

      We thank the reviewer for their thoughtful reading of the manuscript and helpful recommendations to improve the clarity of the work. We have edited the text to make it clear that this study is a reanalysis of an existing data set. We have revised the text to discuss the results from our previous study and more clearly define how the current analysis builds upon that initial work. 

      (2) The authors restricted their competition argument in Figure 4 to complex synapses, but why not include the simple ones? This seems like a straightforward analysis to do.

      We appreciate the reviewer’s suggestion to measure spatial relationships between “clustered” and “isolated” single-AZ synapses as we have done for multi-AZ synapses in Figure 4. However, we are not able to perform a direct and interpretable comparison with the results shown for multi-AZ synapses. First, we would need to classify “clustered” and “isolated” single-AZ synapses. This classification convolves two effects: 1) a distance threshold to define clustering and 2) subsequent distance measurements between clustered synapses.

      If we apply an equivalent 1.5 μm distance threshold (or any other threshold) to define clustered synapses, the distance from each “clustered” single-AZ synapse to the nearest other single-AZ synapse will always be smaller than the defined threshold (1.5 μm). Alternatively, if all of the single-AZ synapses within each local 1.5 μm shell are excluded from the subsequent intersynaptic distance measurements, this will set a hard lower boundary on the distance between synaptic clusters (1.5 μm minimum). The two effects discussed above were separated in our original analysis of multi-AZ synapses defined as “clustered” and “isolated” based on their relationship to single-AZ synapses, but these effects cannot be separated when analyzing single-AZ distributions alone.
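      The first effect described above can be illustrated with a minimal sketch: once "clustered" single-AZ synapses are defined by a nearest-neighbor threshold, their nearest-neighbor distances fall below that threshold by construction. The function names and coordinates are hypothetical, and the classification rule is a simplified stand-in for the analysis discussed here.

```python
import math


def nearest_neighbor_distances(points):
    """Distance from each point to its nearest other point (needs >= 2 points)."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


def split_by_threshold(points, threshold):
    """Split nearest-neighbor distances into 'clustered' and 'isolated' groups."""
    nn = nearest_neighbor_distances(points)
    clustered = [d for d in nn if d < threshold]
    isolated = [d for d in nn if d >= threshold]
    return clustered, isolated


# Hypothetical synapse centroids in micrometers (illustrative only).
synapses = [(0, 0, 0), (1, 0, 0), (5, 0, 0), (5, 1.2, 0), (10, 10, 10)]
clustered, isolated = split_by_threshold(synapses, threshold=1.5)

# Every "clustered" distance is below the threshold simply because of how
# the class was defined, so it carries no independent spatial information.
print("clustered:", clustered)
print("isolated:", isolated)
```

      Because the same distance both defines the class and serves as the measurement, any comparison between the two groups would be circular.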

      (3) The Discussion seems much too long and speculative from the current data that is represented - particularly without verification of complex synapses actually being inputs from different RGCs. Along the same lines, figure captions are misleading. For example, for Figure 4 - the title indicates that the complex synapses are driving the rearrangements. But of course, these are static images. The authors should use titles that are more reflective of their findings rather than this interpretation.

      We thank the reviewer for these helpful suggestions. We have changed each of the figure captions to more accurately reflect the results. We have deleted all of the speculative discussion and revised the remaining text to improve the accuracy of the presentation.

      (4) In the future, the authors may want to consider an analysis as to whether ipsi and contra projections contribute to the same synapses.

      We agree with the reviewer that it is of interest to investigate the contribution of binocular inputs to retinogeniculate synaptic clusters during development. At maturity, some weak binocular input remains in the dominant-eye territory (15). To look for evidence of binocular synaptic interactions, we measured the percentage of the total small single-active zone synapses that were within 1.5 micrometers of larger multi-active zone synapses of the opposite eye. On average, ~10% or less of the single-active zone synapses were near multi-active zone synapses of the opposite eye. This analysis is presented in Supplemental Figure S3C/D.

      It is possible that some large mAZ synapses might reflect the convergence of two or more smaller inputs from the two eyes. Our current analyses do not rule this out. However, previous EM studies have found limited evidence for convergence of multiple RGCs (3) at P8 and our own EM images show that larger terminals with multiple active zones are formed by a single RGC bouton. Future volumetric EM reconstructions with eye-specific labels will be informative to address this question.
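      The percentage measurement described in this response (single-active zone synapses within 1.5 micrometers of opposite-eye multi-active zone synapses) can be sketched as follows. This is a minimal illustration with hypothetical function names and coordinates. It measures centroid-to-centroid distance, which is an assumption about how proximity was defined.

```python
import math


def fraction_near(singles, multis, radius=1.5):
    """Fraction of single-AZ centroids within `radius` of any multi-AZ centroid.

    `singles` and `multis` are sequences of (x, y, z) coordinates in micrometers,
    with `multis` taken from the opposite eye.
    """
    if not singles:
        raise ValueError("no single-AZ synapses provided")
    near = sum(
        1 for s in singles if any(math.dist(s, m) <= radius for m in multis)
    )
    return near / len(singles)


# Hypothetical centroids in micrometers (illustrative only, not study data).
single_az = [(0, 0, 0), (3, 0, 0), (10, 0, 0), (20, 0, 0)]
opposite_eye_multi_az = [(1, 0, 0), (10.5, 0, 0)]

pct = 100.0 * fraction_near(single_az, opposite_eye_multi_az)
print(f"{pct:.0f}% of single-AZ synapses lie near an opposite-eye multi-AZ synapse")
```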

      References

      (1) Zhang C, Yadav S, Speer CM. The synaptic basis of activity-dependent eye-specific competition. Cell Rep. 2023;42(2):112085.

      (2) Bickford ME, Slusarczyk A, Dilger EK, Krahe TE, Kucuk C, Guido W. Synaptic development of the mouse dorsal lateral geniculate nucleus. J Comp Neurol. 2010;518(5):622-35.

      (3) Monavarfeshani A, Stanton G, Van Name J, Su K, Mills WA, 3rd, Swilling K, et al. LRRTM1 underlies synaptic convergence in visual thalamus. Elife. 2018;7.

      (4) Campbell G, Shatz CJ. Synapses formed by identified retinogeniculate axons during the segregation of eye input. J Neurosci. 1992;12(5):1847-58.

      (5) Hong YK, Park S, Litvina EY, Morales J, Sanes JR, Chen C. Refinement of the retinogeniculate synapse by bouton clustering. Neuron. 2014;84(2):332-9.

      (6) Xu HP, Burbridge TJ, Chen MG, Ge X, Zhang Y, Zhou ZJ, et al. Spatial pattern of spontaneous retinal waves instructs retinotopic map refinement more than activity frequency. Dev Neurobiol. 2015;75(6):621-40.

      (7) Sokhadze G, Seabrook TA, Guido W. The absence of retinal input disrupts the development of cholinergic brainstem projections in the mouse dorsal lateral geniculate nucleus. Neural Dev. 2018;13(1):27.

      (8) Dhande OS, Hua EW, Guh E, Yeh J, Bhatt S, Zhang Y, et al. Development of single retinofugal axon arbors in normal and beta2 knock-out mice. J Neurosci. 2011;31(9):3384-99.

      (9) Rossi FM, Pizzorusso T, Porciatti V, Marubio LM, Maffei L, Changeux JP. Requirement of the nicotinic acetylcholine receptor beta 2 subunit for the anatomical and functional development of the visual system. Proc Natl Acad Sci U S A. 2001;98(11):6453-8.

      (10) Muir-Robinson G, Hwang BJ, Feller MB. Retinogeniculate axons undergo eye-specific segregation in the absence of eye-specific layers. J Neurosci. 2002;22(13):5259-64.

      (11) Fredj NB, Hammond S, Otsuna H, Chien C-B, Burrone J, Meyer MP. Synaptic Activity and Activity-Dependent Competition Regulates Axon Arbor Maturation, Growth Arrest, and Territory in the Retinotectal Projection. J Neurosci. 2010;30(32):10939.

      (12) Hua JY, Smear MC, Baier H, Smith SJ. Regulation of axon growth in vivo by activity-based competition. Nature. 2005;434(7036):1022-6.

      (13) Rahman TN, Munz M, Kutsarova E, Bilash OM, Ruthazer ES. Stentian structural plasticity in the developing visual system. Proc Natl Acad Sci U S A. 2020;117(20):10636-8.

      (14) Ankerst M, Breunig MM, Kriegel H-P, Sander J. OPTICS: ordering points to identify the clustering structure. SIGMOD Rec. 1999;28(2):49–60.

      (15) Bauer J, Weiler S, Fernholz MHP, Laubender D, Scheuss V, Hübener M, et al. Limited functional convergence of eye-specific inputs in the retinogeniculate pathway of the mouse. Neuron. 2021;109(15):2457-68.e12.

      (16) Benninger K, Hood G, Simmel D, Tuite L, Wetzel A, Ropelewski A, et al. Cyberinfrastructure of a Multi-Petabyte Microscopy Resource for Neuroscience Research. Practice and Experience in Advanced Research Computing; Portland, OR, USA: Association for Computing Machinery; 2020. p. 1–7.

    1. eLife Assessment

      This is a fundamental study that addresses the key question of how the tetraspanin Tspan12 functions biochemically as a co-receptor for Norrin to initiate β-catenin signaling. The strength of the work lies in the rigorous and compelling binding analyses involving various purified receptors, co-receptors, and ligands, as well as molecular modeling by AlphaFold that was subsequently validated by an extensive series of mutagenesis experiments. The study advances the field by providing a novel mechanism of co-receptor function and shedding new light on how signaling specificity is achieved in the complex Wnt/Norrin signaling system.

    2. Joint Public Review:

      Though the Norrin protein is structurally unrelated to the Wnt ligands, it can activate the Wnt/β-catenin pathway by binding to the canonical Wnt receptors Fzd4 and Lrp5/6, as well as the tetraspanin Tspan12 co-receptor. Understanding the biochemical mechanisms by which Norrin engages Tspan12 to initiate signaling is important, as this pathway plays an important role in regulating retinal angiogenesis and maintaining the blood-retina-barrier. Numerous mutations in this signaling pathway have also been found in human patients with ocular diseases. The overarching goal of the study is to define the biochemical mechanisms by which Tspan12 mediates Norrin signaling. Using purified Tspan12 reconstituted in lipid nanodiscs, the authors conducted detailed binding experiments to document the direct, high-affinity interactions between purified Tspan12 and Norrin. To further model this binding event, they used AlphaFold to dock Norrin and Tspan12 and identified four putative binding sites. They went on to validate these sites through mutagenesis experiments. Using the information obtained from the AlphaFold modeling and through additional binding competition experiments, it was further demonstrated that Tspan12 and Fzd4 can bind Norrin simultaneously, but Tspan12 binding to Norrin is competitive with other known co-receptors, such as HSPGs and Lrp5/6. Collectively, the authors proposed that the main function of Tspan12 is to capture low concentrations of Norrin at the early stage of signaling, and then "hand over" Norrin to Fzd4 and Lrp5/6 for further signal propagation. Overall, the study is comprehensive and compelling, and the conclusions are well supported by the experimental and modeling data.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Though the Norrin protein is structurally unrelated to the Wnt ligands, it can activate the Wnt/β-catenin pathway by binding to the canonical Wnt receptors Fzd4 and Lrp5/6, as well as the tetraspanin Tspan12 co-receptor. Understanding the biochemical mechanisms by which Norrin engages Tspan12 to initiate signaling is important, as this pathway plays an important role in regulating retinal angiogenesis and maintaining the blood-retina-barrier. Numerous mutations in this signaling pathway have also been found in human patients with ocular diseases. The overarching goal of the study is to define the biochemical mechanisms by which Tspan12 mediates Norrin signaling. Using purified Tspan12 reconstituted in lipid nanodiscs, the authors conducted detailed binding experiments to document the direct, high-affinity interactions between purified Tspan12 and Norrin. To further model this binding event, they used AlphaFold to dock Norrin and Tspan12 and identified four putative binding sites. They went on to validate these sites through mutagenesis experiments. Using the information obtained from the AlphaFold modeling and through additional binding competition experiments, it was further demonstrated that Tspan12 and Fzd4 can bind Norrin simultaneously, but Tspan12 binding to Norrin is competitive with other known co-receptors, such as HSPGs and Lrp5/6. Collectively, the authors proposed that the main function of Tspan12 is to capture low concentrations of Norrin at the early stage of signaling, and then "hand over" Norrin to Fzd4 and Lrp5/6 for further signal propagation. Overall, the study is comprehensive and compelling, and the conclusions are well supported by the experimental and modeling data.

      Strengths: 

      • Biochemical reconstitution of Tspan12 and Fzd4 in lipid nanodiscs is an elegant approach for testing the direct binding interaction between Norrin and its co-receptors. The proteins used for the study seem to be of high purity and quality. 

      • The various binding experiments presented throughout the study were carried out rigorously. In particular, BLI allows accurate measurement of equilibrium binding constants as well as on and off rates. 

      • It is nice to see that the authors followed up on their AlphaFold modeling with an extensive series of mutagenesis studies to experimentally validate the potential binding sites. This adds credence to the AlphaFold models. 

      • Table S1 is a further testament to the rigor of the study. 

      • Overall, the study is comprehensive and compelling, and the conclusions are well supported by the experimental and modeling data. 

      Suggestions for improvement: 

      • It would be helpful to show Coomassie-stained gels of the key mutant Norrin and Tspan12 proteins presented in Figures 2E and 2F. 

      We have included Stain-Free SDS-PAGE gels from the purification of the Norrin and Tspan12 mutants in a new Figure S4.

      • Many Norrin and Tspan12 mutations have been identified in human patients with FEVR. It would be interesting to comment on whether any of the mutations might affect the Norrin-Tspan12 binding sites described in this study.

      Thank you for this suggestion. We have inspected human mutation databases gnomAD, ClinVar, and HGMD for known mutations in the predicted Tspan12-Norrin binding interface and their occurrence in human patients with FEVR or Norrie disease.

      While a number of Tspan12 residues that we predict to interact with Norrin are impacted by rare mutations in humans (e.g., L169M, E170V, E173K, D175N, E196G, S199C, as found in the gnomAD database), these alleles are of unknown clinical significance (as found in ClinVar or HGMD databases). It is possible that mutations that slightly weaken the Norrin-Tspan12 interface may not produce a strong phenotype, especially given the avidity we expect from this system. By our examination, the missense variants of clinical significance that have been found in the Tspan12 LEL would be expected to destabilize the protein (i.e., mutations to or from cysteine or proline, or mutations to residues involved in packing interactions within the LEL fold), and therefore these mutations may produce a disease phenotype by impacting Tspan12 protein expression levels.  

      Several Norrin mutations that are associated with Norrie disease, FEVR, or other diseases of the retinal vasculature have been found in the predicted Tspan12 binding site. For example, Norrin mutations at positions L103 (L103Q, L103V), K104 (K104N, K104Q), and A105 (A105T, A105P, A105E, A105S, A105V) have been found in patients, all of which may disrupt binding to Tspan12. However, the deleterious effect of K104 mutations on Norrin-stimulated signaling could also be explained by a weakened Norrin-Fzd4 binding interface. Norrin mutations at R115 (R115L and R115Q), as well as R121 (R121L, R121G, R121Q, and R121W) have also been found in patients with various diseases of the retinal vasculature. Additionally, the Norrin mutation T119P has been found in patients with Norrie disease, but we would expect this mutation to destabilize Norrin in addition to disrupting the Tspan12 binding site.

      While we commented briefly on mutations R115L and R121W in the original draft (page 5, paragraphs 4 and 1, respectively), we have updated the manuscript with more comments on disease-associated mutations to the predicted Tspan12 binding site on Norrin (page 5, first partial paragraph; page 9, first partial paragraph). 

      • Some of the negative conclusions (e.g. the lack of involvement of Tspan12 in the formation of the Norrin-Lrp5/6-Fzd4-Dvl signaling complex) can be difficult to interpret. There are many possible reasons as to why certain biological effects are not recapitulated in a reconstitution experiment. For instance, the recombinant proteins used in the experiment may not be presented in the correct configurations, and certain biochemical modifications, such as phosphorylation, may also be missing. 

      We agree that different Tspan12 and Fzd4 stoichiometries, lipid compositions, and posttranslational modifications could impact the results of our study, and that it is important to mention these possibilities. We have added these caveats to the discussion section (page 10, last paragraph).  

      Reviewer #2 (Public Review): 

      This is an interesting study of high quality with important and novel findings. Bruguera et al. report a biochemical and structural analysis of the Tspan12 co-receptor for norrin. Major findings are that Norrin directly binds Tspan12 with high affinity (this is consistent with a report on BioRxiv: Antibody Display of cell surface receptor Tetraspanin12 and SARS-CoV-2 spike protein) and a predicted structure of Tspan12 alone or in complex with Norrin. The Norrin/Tspan12 binding interface is largely verified by mutational analysis. An interaction of the Tspan12 large extracellular loop (LEL) with Fzd4 cannot be detected and interactions of full-length Tspan12 and Fzd4 cannot be tested using nano-disc based BLI, however, Fzd4/Tspan12 heterodimers can be purified and inserted into nanodiscs when aided by split GFP tags. An analysis of a potential composite binding site of a Fzd4/Tspan12 complex is somewhat inconclusive, as no major increase in affinity is detected for the complex compared to the individual components. A caveat to this data is that affinity measurements were performed for complexes with approximately 1 molecule Tspan12 and FZD4 per nanodisc, while the composite binding site could potentially be formed only in higher order complexes, e.g., 2:2 Fzd4/Tspan12 complexes. Interestingly, the authors find that the Norrin/Tspan12 binding site and the Norrin/Lrp6 binding site partially overlap and that the Lrp6 ectodomain competes with Tspan12 for Norrin binding. This result leads the authors to propose a model according to which Tspan12 captures Norrin and then has to "hand it off" to allow for Fzd4/Lrp6 formation. By increasing the local concentration of Norrin, Tspan12 would enhance the formation of the Fzd4/Lrp5 or Fzd4/Lrp6 complex.

      Thank you for pointing out the BioRxiv report showing Norrin-Tspan12 LEL binding. We have cited this in the introduction of our revised manuscript (page 2, paragraph 3).

      The experiments based on membrane proteins inserted into nano-discs and the structure prediction using AlphaFold yield important new insights into a protein complex that has critical roles in normal CNS vascular biology, retinal vascular disease, and is a target for therapeutic intervention. However, it remains unclear how Norrin would be "handed off" from Tspan12 or Tspan12/Fzd4 complexes to Fzd4/Lrp6 complexes, as the relatively high affinity of Norrin to Fzd4/Tspan12 dimers likely does not favor the "handing off" to Fzd4/Lrp6 complexes. 

      While the Fzd4-Tspan12 interaction is strong, our data suggest that Fzd4 and Tspan12 bind Norrin with negative cooperativity, suggesting that Fzd4 binding may enhance Norrin-Tspan12 dissociation to facilitate handoff. This model is based on 1) the dissociation of Norrin from bead-bound Tspan12 in the presence of saturating Fzd4 CRD (Figure 3D), and 2) a weaker measured affinity of Norrin-Tspan12 LEL in the presence of saturating Fzd4 CRD (Figure 3F). We have now added wording to emphasize this in the discussion section (page 9, end of first full paragraph).

      However, as you note, the Norrin-Tspan12 affinity that we measured in the presence of Fzd4 CRD (tens of nM) is still much stronger than the known Norrin-LRP6 affinity (0.5-1 µM), which predicts that the efficiency of this handoff may be low. We have now commented on this in the discussion section and mentioned an alternative model in which Tspan12 presents the second Norrin protomer to LRP5/6 for signaling, instead of dissociating (page 9, paragraph 2). However, the handoff efficiency could also be impacted by other factors such as the relative abundance and surface distribution of Tspan12, Fzd4, LRP6 and HSPGs.

      Areas that would benefit from further experiments, or a discussion, include: 

      -  The authors test a potential composite binding site of Fzd4/Tspan12 heterodimers for norrin using nanodiscs that contain on average about 1 molecule Fzd4 and 1 molecule Tspan12. The Fzd4/Tspan12 heterodimer is co-inserted into the nanodiscs supported by split-GFP tags on Fzd4 and Tspan12. The authors find no major increase in affinity, although they find changes to the Hill slope, reflecting better binding of norrin at low norrin concentrations. In 293F cells overexpressing Fzd4 and Tspan12 (which may result in a different stoichiometry) they find more pronounced effects of norrin binding to Fzd4/Tspan12. This raises the possibility that the formation of a composite binding requires Fzd4/Tspan12 complexes of higher order, for example, 2:2 Fzd4/Tspan12 complexes, where the composite binding site may involve residues of each Fzd4 and Tspan12 molecule in the complex. This could be tested in nanodiscs in which Fzd4 and Tspan12 are inserted at higher concentrations or using Fzd4 and Tspan12 that contain additional tags for oligomerization. 

      It is quite possible that Tspan12 and Fzd4 cluster into complexes with a stoichiometry greater than 1:1 in cells (this is supported by e.g., BRET experiments in (Ke et al., 2013)), and we mention in the discussion that receptor clustering may be an additional mechanism by which Tspan12 exerts its function (page 10, paragraph 4). We would be quite interested to know the stoichiometry of Fzd4 and Tspan12 complexes in cells at endogenous expression levels, both in the presence and absence of Norrin, and to biochemically characterize these putative larger complexes in the future. We have amended the discussion to mention the caveat that our reconstitution experiments do not test higher-stoichiometry Fzd4/Tspan12 complexes (page 10, last paragraph).

      - While Tspan12 LEL does not bind to Fzd4, the successful reconstitution of GFP from Tspan12 and Fzd4 tagged with split GFP components provides evidence for Fzd4/Tspan12 complex formation. As a negative control, e.g., Fzd5, or Tspan11 with split GFP tags (Fzd5/Tspan12 or Fzd4/Tspan11) would clarify if FZD4/Tspan12 heterodimers are an artefact of the split GFP system. 

      The split-GFP system allows us to co-purify receptors that do not normally co-localize (for example, as we have shown with Fzd4 and LRP6 in the absence of ligand (Bruguera et al., 2022)) so we do not mean to claim that it provides evidence for Fzd4/Tspan12 complex formation. In fact, we were unable to co-purify co-expressed Fzd4 and Tspan12 unless they were tethered with the split GFP system, and separately-purified Fzd4 and Tspan12 did not incorporate into nanodiscs together unless they were tethered by split GFP. Based on these experiments, we expect that the purported Fzd4-Tspan12 interaction that others have found by co-IP or co-localization is easily disrupted by detergent, may require a specific lipid, and/or may not be direct.

      To clarify this point, we have noted in the results section that without the split GFP tags, Tspan12 and Fzd4 did not co-purify or co-reconstitute into nanodiscs, and that co-reconstitution was enabled by the split GFP system (page 6, first full paragraph).   

      - Fzd4/Tspan12 heterodimers stabilized by split GFP may be locked into an unfavorable orientation that does not allow for the formation of a composite binding site of FZD4 and Tspan12; this is another caveat for the interpretation that Fzd4/Tspan12 do not form a composite binding site. This is not discussed.

      While the split GFP does enforce a Fzd4/Tspan12 dimer, the split GFP is removed by protease cleavage during the final step of the purification process, after the dimer is contained in a nanodisc. This should allow Fzd4 and Tspan12 to freely adopt any pose and to diffuse within the confines of the nanodisc lipid bilayer. However, it has been shown that the phospholipid bilayer in small nanodiscs is not as fluid as the physiological plasma membrane, and although we used the slightly larger belt protein (MSP1E3D1, 13 nm diameter nanodiscs), perhaps the receptors are indeed locked in some unfavorable state for this reason. Additionally, the nanodiscs are planar, so if the formation of a composite binding site requires membrane curvature, this would not be recapitulated in our system. We have cited these caveats in the discussion section (page 10, last paragraph).  

      - Mutations that affect the affinity of norrin/fzd4 are not used to further test if Fzd4 and Tspan12 form a composite binding site. Norrin R41E or Fzd4 M105V were previously reported to reduce norrin/frizzled4 interactions and signaling, and both interaction and signaling were restored by Tspan12 (Lai et al. 2017). Whether a Fzd4/Tspan12 heterodimer has increased affinity for Norrin R41E was not tested. Similarly, affinity of FZD4 M105V vs a Fzd4 M105V/Tspan12 heterodimer were not tested. 

      Since the high affinity of Norrin for both Fzd4 and Tspan12 may have obscured any enhancement of Norrin affinity for Fzd4/Tspan12 compared to either receptor alone, we did consider weakening Fzd-Norrin affinity to sensitize this experiment, inspired by the experiments you mention in (Lai et al., 2017). However, we suspected that the slight increase in Norrin affinity for the Fzd4/Tspan12 dimer compared to Fzd4 alone was driven mainly by increased avidity that enhanced binding of low Norrin concentrations, and this avidity effect would likely confound the interpretation of any experiment monitoring 2:2 complex formation. Additionally, on the basis that soluble Fzd4 extracellular domain and Tspan12 bind Norrin with negative cooperativity (Figures 3D and 3F), we concluded that this composite binding site was unlikely.

      - An important conclusion of the study is that Tspan12 or Lrp6 binding to Norrin is mutually exclusive. This could be corroborated by an experiment in which LRP5/6 is inserted into nanodiscs for BLI binding tests with Norrin, or Tspan12 LEL, or a combination of both. Soluble LRP6 may remove norrin from equilibrium binding/unbinding to Tspan12, therefore presenting LRP6 in a non-soluble form may yield different results. 

      We agree that testing this conclusion in an orthogonal experiment would be a valuable addition to this study. We have now performed a similar experiment to the one you described, but with Norrin immobilized on biosensors, and with LRP6 in detergent competing with Tspan12 LEL for Norrin binding (Figure S12, discussed on page 8, first full paragraph). The results of this experiment show that biosensor-immobilized Norrin will bind LRP6, and that soluble Tspan12 inhibits LRP6 binding in a concentration-dependent manner. The LRP6 construct we use (residues 20-1439) includes the transmembrane domain but has a truncated C terminus, since LRP6 constructs containing the full C terminus tend to aggregate during purification. We chose to immobilize Norrin to make the experiment as interpretable as possible, since immobilizing LRP6 and competing Norrin off with the LEL could result in an increase in signal (from the LEL binding the second available Norrin protomer) as well as a decrease (from Norrin being competed off of the immobilized LRP6). We conducted the experiment in detergent (DDM) instead of nanodiscs to be able to test higher concentrations of LRP6.
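
      The concentration-dependent inhibition described above is the qualitative behavior expected from a simple mutually exclusive (competitive) binding model. The following is a minimal sketch of that textbook model, not the analysis used in the study: it assumes 1:1 binding, free concentrations approximately equal to total concentrations, and it ignores the fact that Norrin is a dimer with two protomers. The function name and all numerical values are hypothetical placeholders, not measured affinities.

```python
def fraction_bound(analyte_nM, kd_analyte_nM, competitor_nM=0.0, kd_competitor_nM=1.0):
    """Fraction of immobilized ligand occupied by the analyte at equilibrium.

    Assumes the analyte (e.g. LRP6) and a soluble competitor (e.g. Tspan12 LEL)
    bind the immobilized ligand (e.g. Norrin) in a mutually exclusive manner.
    All concentrations and dissociation constants are in nM and are
    illustrative placeholders only.
    """
    # A competitor raises the apparent Kd of the analyte by (1 + [I]/Ki).
    apparent_kd = kd_analyte_nM * (1.0 + competitor_nM / kd_competitor_nM)
    return analyte_nM / (analyte_nM + apparent_kd)


if __name__ == "__main__":
    # Hypothetical titration: fixed analyte, increasing competitor.
    for competitor in (0.0, 10.0, 100.0, 1000.0):
        occupancy = fraction_bound(10.0, 10.0, competitor, 10.0)
        print(f"competitor = {competitor:7.1f} nM -> fraction bound = {occupancy:.3f}")
```

      Under these assumptions, occupancy by the analyte falls monotonically as the competitor concentration rises, which is the pattern a competition experiment of this kind is designed to detect.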

      - The authors use LRP6 instead of LRP5 for their experiments. Tspan12 is less effective in increasing the Norrin/Fzd4/Lrp6 signaling amplitude compared to Norrin/Fzd4/Lrp5 signaling, and human genetic evidence (FEVR) implicates LRP5, not LRP6, in Norrin/Frizzled4 signaling. The authors find that Norrin binding to LRP6 and Tspan12 is mutually exclusive, however this may not be the case for Lrp5. 

      This is an important point which we have now addressed in the text (page 8, end of first full paragraph). LRP5 is indeed the receptor implicated in FEVR and expressed in the relevant tissues for Tspan12/Norrin signaling. Unfortunately, LRP5 expresses poorly and we are unable to purify sufficient quantities to perform these experiments. However, LRP5 and LRP6 both transduce Tspan12-enhanced Norrin signaling in TOPFLASH assays (as you mention and as shown by (Zhou and Nathans, 2014)), bind Norrin, and are highly similar (they share 71% sequence identity overall and 73% sequence identity in the extracellular domain), so we expect their Norrin-binding sites to be conserved.

      - The biochemical data are largely not correlated with functional data. The authors suggest that the Norrin R115L FEVR mutation could be due to reduced norrin binding to tspan12, but do not test if Tspan12-mediated enhancement of the norrin signaling amplitude is reduced by the R115L mutation. Similarly, the impressive restoration of binding by charge reversal mutations in site 3 is not corroborated in signaling assays. 

      We agree that testing the impact of Norrin mutations in cell-based signaling assays would be an informative way to further test our model. However, the Norrin mutants we tested generated poor TopFlash signals in all conditions tested. This may be due to general protein instability, weakened affinity for LRP, or weaker interactions with HSPGs. Whatever the cause, the low signal made it challenging to conclusively say whether the Norrin mutations affected Tspan12-mediated signaling enhancement.

      When expressed for purification, Tspan12 mutants generally expressed poorly compared to WT Tspan12, so we were concerned that differences in protein stability or trafficking would lead to lower cell-surface levels of mutant Tspan12 relative to WT in TopFlash signaling assays, which would confound interpretation of mutant Tspan12’s ability to enhance Norrin signaling.

      Because of these challenges, follow-up experiments to investigate the signaling capabilities of Norrin and Tspan12 mutants were not informative and we have not included them in the revised manuscript.

      Reviewer #3 (Public Review): 

      Bruguera et al. present an impressive series of biochemical experiments that address the question of how Tspan12 acts to promote signaling by Norrin, a highly divergent TGF-beta family member that serves as a ligand for Fzd4 and Lrp5/6 to promote canonical Wnt signaling during CNS (and especially retinal) vascular development. The present study is distinguished from those of the past 15 years by its quantitative precision and its high-quality analyses of concentration dependencies, its use of well-characterized nano-disc-incorporated membrane proteins and various soluble binding partners, and its use of structure prediction (by AlphaFold) to guide experiments. The authors start by measuring the binding affinity of Norrin to Tspan12 in nanodiscs (~10 nM), and they then model this interaction with AlphaFold and test the predicted interface with various charge and size swap mutations. The test suggests that the prediction is approximately correct, but in one region (site 1) the experimental data do not support the model. [As noted by the authors, a failure of swap mutations to support a docking model is open to various interpretations. As AlphaFold docking predictions come increasingly into common use, the compendium of mutational tests and their interpretations will become an important object of study.] Next, the authors show that Tspan12 and Fzd4 can simultaneously bind Norrin, with modest negative cooperativity, and that together they enhance Norrin capture by cells expressing both Tspan12 and Fzd4 compared to Fzd4 alone, an effect that is most pronounced at low Norrin concentration. Similarly, at low Norrin concentration (~1 nM), signaling is substantially enhanced by Tspan12. By contrast, the authors show that LRP6 competes with Tspan12 for Norrin binding, implying a hand-off of Norrin from a Tspan12+Fzd4+Norrin complex to a LRP5/6+Fzd4+Norrin complex. 
Thanks to the authors' careful dose-response analyses, they observed that Norrin-induced signaling and Tspan12 enhancement of signaling both have bell-shaped dose-response curves, with strong inhibition at higher levels of Norrin or Tspan12. The implication is that the signaling system has been built for optimal detection of low concentrations of Norrin (most likely the situation in vivo), and that excess Tspan12 can titrate Norrin at the expense of LRP5/6 binding (i.e., reduction in the formation of the LRP5/6+Fzd4+Norrin signaling complex). In the view of this reviewer, the present work represents a foundational advance in understanding Norrin signaling and the role of Tspan12. It will also serve as an important point of comparison for thinking about signaling complexes in other ligand-receptor systems. 

      Recommendations for the authors: 

      Reviewer #2 (Recommendations For The Authors):   

      - In Figure 5F high concentrations of transfected Tspan12 plasmid inhibit signaling, which the authors interpret to support the model that Tspan12/Norrin binding prevents Norrin/LRP6/FZD4 complex formation. Alternatively, the cells do not tolerate the expression of the tetraspanin at high levels, for example, due to misfolding and aggregate formation. To distinguish these possibilities: Do high levels of Tspan12 overexpression also inhibit signaling induced by Wnt3a and appropriate Frizzled receptors, even though Tspan12 has no influence on Wnt/LRP6 binding? 

      We thank the reviewer for suggesting this important control experiment. We have added the Wnt-stimulated TOPFLASH values to Figure 5F for all conditions. In repeating this experiment, we noticed that high levels of transfected Tspan12 may decrease cell viability and therefore have adjusted the range of transfected Tspan12 in the new Figure 5F (discussed on page 8, second full paragraph). Under this new protocol, both Norrin- and Wnt-stimulated signaling were inhibited by the highest amount of transfected Tspan12. However, Norrin-stimulated signaling is inhibited by lower amounts of transfected Tspan12 than Wnt-stimulated signaling, and to a greater extent, supporting our proposed model that Tspan12 competes with LRP for Norrin binding.

      - Is Tspan12 with c-terminal rho-tag (the form incorporated into nanodiscs) also used for functional luciferase assays, or was untagged Tspan12 used for the luciferase assays in Fig 4D and 5F? Does the c-terminal tag interfere with Tspan12-mediated enhancement of Norrin signaling? 

      For the luciferase assays included in this manuscript, wildtype, full-length, untagged Tspan12 is used. We have clarified this in our methods section. When we tested the wildtype vs C-terminally rho1D4-tagged version of Tspan12 in TOPFLASH assays, we saw that the enhancement of Norrin signaling by Tspan12-1D4 was weaker than enhancement by untagged Tspan12. This is consistent with the finding reported in Cell Reports (Lai et al., 2017) that a chimeric Tspan12 receptor with its C-terminus replaced with that of Tspan11 was still capable of enhancing Norrin signaling, though to a lesser extent than WT Tspan12. The deficiency of signaling by our rho1D4-tagged Tspan12 could be due to a difference in receptor expression level or trafficking, but in the absence of a reliable antibody against Tspan12, we were unable to assess the expression levels or localization of the untagged Tspan12 to compare it to the rho1D4-tagged version. (For binding experiments, we reasoned that the C-terminal tag should not affect Tspan12’s ability to bind Norrin extracellularly, especially as we found that purified full-length Tspan12 and Tspan12∆C (residues 1-252) bound Norrin equally well; we have added this comparison to Table S1.)  

      Reviewer #3 (Recommendations For The Authors): 

      Minor comments. 

      Based on the Fzd4-Dvl binding experiment, the authors might state explicitly the possibility that Tspan12's relevance is entirely accounted for by extracellular ligand capture. 

      We have stated this possibility explicitly in the discussion section (page 9, last paragraph). 

      Page 4, 3rd paragraph. I suggest "To experimentally test this structural prediction..." rather than "validate". 

      Thank you for this suggestion; we have replaced this wording. 

      This next item is optional, but I hope that the authors will consider it. This manuscript provides an opportunity for the authors to be more expansive in their thinking, and to put their work into the larger context of ligand+receptor+accessory protein interactions. The authors describe the Wnt7a/7b-Gpr124-RECK system and the role of HSPs in Norrin and Wnt signaling, but perhaps they can also comment on non-Wnt ligand-receptor systems where accessory proteins are found. They might add a figure (or supplemental figure) with a schematic showing the roles of HSP and Gpr124-RECK, and some non-Wnt ligand-receptor systems. This would help to make the present work more widely influential.

      Thank you for this suggestion. We have added a figure (Figure 6, discussed on page 10, paragraphs 2 and 3) and expanded our discussion to include other co-receptor systems. We have specifically focused on co-receptors that both capture ligands and interact with their primary receptor(s), thus delivering ligands to their receptors, as we have proposed for Tspan12. Within Wnt signaling, other co-receptor systems with this mechanism are RECK/Gpr124 (for Wnt7a/b) and Glypican-3. We found it interesting that this mechanism is also shared by several growth factor pathways with cystine knot ligands (like Norrin), so we have illustrated and mentioned three of these examples.

    1. eLife Assessment

      This important study provides insights and strategies for assessing laminar structure in vivo in the visual cortex of the macaque monkey with high-density linear electrode arrays. The paper provides convincing evidence demonstrating that signals in higher frequency bands, related to the discharge of action potentials, are of substantially better use for achieving well-resolved cortical layer identification than are signals in lower frequency bands typically associated with local field potentials and standard-practice Current Source Density (CSD) analyses. These findings are of interest to a wide range of neuroscientists making comparisons between cortical layers or recording with array electrodes.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Zhang et al. presented an electrophysiological method to identify the layers of macaque visual cortex with a high-density Neuropixels 1.0 electrode. They found several electrophysiological signal profiles for high-resolution laminar discrimination and described a set of signal metrics for fine cortical layer identification.

      Strengths:

      There are two major strengths. One is the use of high-density electrodes. The Neuropixels 1.0 probe has electrodes at 20 um spacing, which can provide high resolution for cortical laminar identification. The second strength is the analysis. They found multiple electrophysiological signal profiles which can be used for laminar discrimination. Using this new method, they could identify the thinnest layer in macaque V1. The data support their conclusion.

      Weaknesses:

      While this electrophysiology strategy is much easier to perform even in awake animals compared to histological staining methods, it provides an indirect estimation of cortical layers. A parallel histological study can provide a direct matching between the electrode signal features and cortical laminar locations. However, there are technical challenges; for example, distortions in both electrode penetration and tissue preparation may prevent a precise matching between electrode locations and cortical layers. In this case, additional microwire electrodes bound to the Neuropixels probe could be used to inject current and mark the locations of different depths in cortical tissue after recording.

    3. Reviewer #2 (Public review):

      Summary:

      This paper documents a compelling attempt to accurately determine the locations and boundaries of the anatomically and functionally defined layers in macaque primary visual cortex using voltage signals recorded from a high-density electrode array that spans the full depth of cortex with contacts at 20 um spacing. First, the authors attempt to use current source density (CSD) analysis to determine layer locations, but they report a striking failure because the results vary greatly from one electrode penetration to the next and because the spatial resolution of the underlying local field potential (LFP) signal is coarse compared to the electrical contact spacing. The authors therefore turn to examining higher frequency signals related to action potentials and provide evidence that these signals reflect changes in neuronal size and packing density, response latency and visual selectivity, which taken together can advance the state-of-the-art accuracy in making layer assignments from in vivo recordings.

      Strengths:

      There is a lot of nice data to look at in this paper that show interesting quantities as a function of depth in V1. Bringing all of these together offers the reader a rich data set: CSD, action potential shape, response power and coherence spectrum, and post-stimulus time response traces. Furthermore, data are displayed as a function of eye (dominant or non-dominant) and for achromatic and cone-isolating stimuli.

      This paper takes a strong stand in pointing out weaknesses in the ability of CSD analysis to make consistent determinations about cortical layering in V1. Many researchers have found CSD to be problematic, and the observations here may be important to motivate other researchers to carry out rigorous comparisons and publish their results, even if they reflect negatively on the value of CSD analysis.

      The paper provides a thoughtful, practical and comprehensive recipe for assigning traditional cortical layers based on easily-computed metrics from electrophysiological recordings in V1, and this is likely to be useful for electrophysiologists who are now more frequently using high-density electrode arrays.

      Weaknesses:

      Considerable space is taken in pointing out features that are well known, for example, the latency difference associated with different retinogeniculate pathways, the activity level differences associated with input layers, and the action potential shape differences associated with white vs. gray matter. These have been used for decades as indicators of depth and location of recordings in visual cortex as electrodes were carefully advanced. High density electrodes allow this type of data to now be collected in parallel, but at discrete, regular sampling points. Perhaps more emphasis could be placed on developing a rigorous analysis of how variable vs. reproducible are quantitative metrics of these features across penetrations, as a function of distance or functional domain, and from animal to animal, but this paper certainly makes a substantial push in this direction.

      Another important piece of information for assessing the ability to determine layers from spiking activity is to carry out post-mortem histological processing so that the layer determination made in vivo can be compared to anatomical layering. However, histological methods also suffer from distortion and noise, thus it remains to be seen how much can ultimately be gained by integrating histology with the physiological methods explored here.

      Overall

      Overall, this paper makes a compelling argument in favor of using action potentials and stimulus driven responses, instead of CSD measurements, to assign cortical layers to electrode contacts in V1. The rich presentation of data, combined with the authors' highly educated interpretation and speculation about how useful such measurements will be for layer assignment make this an important paper for many labs using high-density electrodes. It is easy to agree with much of what is postulated here and to hope that we will soon have reliable, quantitative methods to make layer assignments that will be meaningful in terms of the differentiated roles of single neurons in cortical computation. How much this will end up corresponding to the canonical layer numbering that has been used for many decades will be interesting to see.

      Comments on revisions:

      I found that the authors addressed my main concerns to the degree they were able. They improved the consistency of language and figures, and they added some useful quantification.

    4. Reviewer #3 (Public review):

      Summary:

      Zhang et al. explored strategies for aligning electrophysiological recordings from high-density laminar electrode arrays (Neuropixels) with the pattern of lamination across cortical depth in macaque primary visual cortex (V1), with the goal of improving the spatial resolution of layer identification based on electrophysiological signals alone. The authors compare the current commonly used standard in the field - current source density (CSD) analysis - with a new set of measures largely derived from action potential (AP) frequency band signals. Individual AP band measures provide distinct cues about different landmarks or potential laminar boundaries, and together they are used to subdivide the spatial extent of array recordings into discrete layers, including the very thin layer 4A, at a level of resolution unavailable when relying on CSD analysis alone for laminar identification. The authors compare the widths of the resulting subdivisions with previously reported anatomical measurements as evidence that layers have been accurately identified. This is a bit circular, given that they also use these anatomical measurements as guidelines limiting the boundary assignments; however, the strategy is overall sensible and the electrophysiological signatures used to identify layers are generally convincing. Furthermore, by varying the pattern of visual stimulation to target chromatically sensitive inputs known to be partially segregated by layer in V1, they show localized response patterns that lend confidence to their identification of particular sublayers.

      The authors compellingly demonstrate the insufficiency of CSD analysis for precisely identifying fine laminar structure, and in some cases its limited accuracy at identifying coarse structure. CSD analysis produced inconsistent results across array penetrations and across visual stimulus conditions and was not improved in spatial resolution by sampling at high density with Neuropixels probes. Instead, in order to generate a typical, informative pattern of current sources and sinks across layers, the LFP signals from the Neuropixels arrays required spatial smoothing or subsampling to approximately match the coarser (50-100 µm) spacing of other laminar arrays. Even with smoothing, the resulting CSDs in some cases predicted laminar boundaries that were inconsistent with boundaries estimated using other measures and/or unlikely given the typical sizes of individual layers in macaque V1. This point alone provides an important insight for others seeking to link their own laminar array recordings to cortical layers.
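
      For readers unfamiliar with the procedure discussed above, the standard one-dimensional CSD estimate is the negative second spatial derivative of the LFP across depth, scaled by tissue conductivity. The following is a minimal sketch of that textbook estimate, together with the channel subsampling step mentioned in the review. It is not the authors' pipeline: the function names, the homogeneous unit conductivity, the sign convention, and the synthetic input are all assumptions made for illustration.

```python
import numpy as np


def subsample_channels(lfp, step):
    """Keep every `step`-th channel, e.g. step=5 turns 20 um spacing into 100 um."""
    return np.asarray(lfp, dtype=float)[::step]


def csd_second_difference(lfp, spacing_um, conductivity=1.0):
    """Estimate CSD for the interior channels of a linear array.

    lfp          : array of shape (n_channels, n_samples), ordered by depth
    spacing_um   : distance between adjacent channels in micrometers
    conductivity : assumed homogeneous tissue conductivity (arbitrary units)
    Returns an array of shape (n_channels - 2, n_samples).
    """
    lfp = np.asarray(lfp, dtype=float)
    # Three-point finite-difference approximation of d2V/dz2.
    second_diff = lfp[:-2] - 2.0 * lfp[1:-1] + lfp[2:]
    return -conductivity * second_diff / spacing_um**2


if __name__ == "__main__":
    # Synthetic example: a potential that varies quadratically with depth.
    depth_um = 20.0 * np.arange(20)              # 20 channels at 20 um spacing
    lfp = (depth_um**2)[:, None] * np.ones((1, 3))
    coarse = subsample_channels(lfp, step=5)     # 4 channels at 100 um spacing
    print(csd_second_difference(coarse, spacing_um=100.0))
```

      Because the second difference amplifies channel-to-channel noise, estimates computed at very fine spacing tend to be noisier than those computed after smoothing or subsampling, which is one reason a coarser effective spacing is often used in practice.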

      They next offer a set of measures based on analysis of AP band signals. These measures include analyses of the density, average signal spread, and spike waveforms of units identified through spike sorting, as well as analyses of AP band power spectra and local coherence profiles across recording depth. The power spectrum measures in particular yield compact peaks at particular depths, albeit with some variation across penetrations, whereas the waveform measures most convincingly identified the layer 6-white matter transition. In general, some of the new measures yield inconsistent patterns across penetrations, and some of the authors' explanations of these analyses draw intriguing but rather speculative connections to properties of anatomy and/or responsivity. However, taken as a group, the set of AP band analyses appear sufficient to determine the layer 6-white matter transition with precision and to delineate intermediate transition points likely to correspond to actual layer boundaries, and the strategy serves as a substantial advancement over consideration of CSD signals alone to match electrophysiological recordings with cortical layers.

      Strengths:

      The authors convincingly demonstrate the potential to resolve putative laminar boundaries using only electrophysiological recordings from Neuropixels arrays. This is particularly useful given that histological information is often unavailable for chronic recordings. They make a clear case that CSD analysis is insufficient to resolve the lamination pattern with the desired precision and offer a thoughtful set of alternative analyses, along with an order in which to consider multiple cues in order to facilitate others' adoption of the strategy. The suggested analyses can be used to reliably identify certain landmarks (the positions of layer 4c and the layer 6-white matter transition), which provide very useful constraints for specifying the remaining laminar boundaries, and consideration of average anatomical patterns makes it unlikely that the remaining laminar boundaries will be far from their true locations. Overall, the widths of the resulting layers bear a sensible resemblance to the expected widths identified by prior anatomical measurements, and at least in some cases there are satisfying signatures of chromatic visual sensitivity and latency differences across layers that are predicted by the known connectivity of the corresponding layers. Thus, the proposed analytical toolkit appears to work well for macaque V1 and has strong potential to generalize to use in other cortical regions, though area-targeted selection of stimuli may be required.

      Weaknesses:

      The waveform measures, in particular the unit density distribution, are likely to be sensitive to the methods and criteria used for spike sorting, which differ among experimenters/groups, and this may limit the usefulness of this particular measure for others in the community.

      More generally, although the sizes of identified layers comport with typical sizes identified anatomically, a more powerful confirmation would be a direct comparison with histologically identified boundaries along each penetration's trajectory. Ultimately, the absence of this type of independent confirmation limits the strength of the claim that veridical laminar boundaries can be precisely identified from electrophysiological signals alone.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Zhang et al. presented an electrophysiological method to identify the layers of macaque visual cortex with a high-density Neuropixels 1.0 electrode. They found several electrophysiological signal profiles for high-resolution laminar discrimination and described a set of signal metrics for fine cortical layer identification.

      Strengths:

      There are two major strengths. One is the use of high-density electrodes. The Neuropixels 1.0 probe has electrodes at 20 um spacing, which can provide high resolution for cortical laminar identification. The second strength is the analysis. They found multiple electrophysiological signal profiles which can be used for laminar discrimination. Using this new method, they could identify the thinnest layer in macaque V1. The data support their conclusion.

      Weaknesses:

      While this electrophysiology strategy is much easier to perform even in awake animals compared to histological staining methods, it provides an indirect estimation of cortical layers. A parallel histological study can provide a direct matching between the electrode signal features and cortical laminar locations. However, there are technical challenges; for example, distortions in both electrode penetration and tissue preparation may prevent a precise matching between electrode locations and cortical layers. In this case, additional microwire electrodes bound to the Neuropixels probe could be used to inject current and mark the locations of different depths in cortical tissue after recording.

      While we agree that it would be helpful to adopt a more direct method for linking laminar changes observed with electrophysiology to anatomical layers observed in postmortem histology, we do not believe that the approach suggested by the reviewer would be particularly helpful. The approach suggested involves making lesions, which are known to be quite variable in size, asymmetric in shape, and do not have a predictable geometry relative to the location of the electrode tip. In contrast, our electrophysiology measures have identified clear boundaries which precisely match the known widths and relative positions of all the layers of V1, including layer 4A, which is only 50 microns thick, much smaller than the resolution of lesion methods.

      Reviewer #2 (Public Review):

      Summary:

      This paper documents an attempt to accurately determine the locations and boundaries of the anatomically and functionally defined layers in macaque primary visual cortex using voltage signals recorded from a high-density electrode array that spans the full depth of cortex with contacts at 20 um spacing. First, the authors attempt to use current source density (CSD) analysis to determine layer locations, but they report a striking failure because the results vary greatly from one electrode penetration to the next and because the spatial resolution of the underlying local field potential (LFP) signal is coarse compared to the electrical contact spacing. The authors thus turn to examining higher frequency signals related to action potentials and provide evidence that these signals reflect changes in neuronal size and packing density, response latency and visual selectivity.

      Strengths:

      There is a lot of nice data to look at in this paper that shows interesting quantities as a function of depth in V1. Bringing all of these together offers the reader a rich data set: CSD, action potential shape, response power and coherence spectrum, and post-stimulus time response traces. Furthermore, data are displayed as a function of eye (dominant or non-dominant) and for achromatic and cone-isolating stimuli.

      This paper takes a strong stand in pointing out weaknesses in the ability of CSD analysis to make consistent determinations about cortical layering in V1. Many researchers have found CSD to be problematic, and the observations here may be important to motivate other researchers to carry out rigorous comparisons and publish their results, even if they reflect negatively on the value of CSD analysis.

      The paper provides a thoughtful, practical and comprehensive recipe for assigning traditional cortical layers based on easily-computed metrics from electrophysiological recordings in V1, and this is likely to be useful for electrophysiologists who are now more frequently using high-density electrode arrays.

      Weaknesses:

      Much effort is spent pointing out features that are well known, for example, the latency difference associated with different retinogeniculate pathways, the activity level differences associated with input layers, and the action potential shape differences associated with white vs. gray matter. These have been used for decades as indicators of depth and location of recordings in visual cortex as electrodes were carefully advanced. High density electrodes allow this type of data to now be collected in parallel, but at discrete, regular sampling points. Rather than showing examples of what is already accepted, the emphasis should be placed on developing a rigorous analysis of how variable vs. reproducible are quantitative metrics of these features across penetrations, as a function of distance or functional domain, and from animal to animal. Ultimately, a more quantitative approach to the question of consistency is needed to assess the value of the methods proposed here.

      We thank the reviewer for suggesting the addition of quantitative metrics to allow more substantive comparisons between various measures within and between penetrations. We have added quantification and describe this in the context of more specific comments made by this reviewer. We have retained descriptions of metrics that are well established because they provide an important validation of our approaches and laminar assignments.

      Another important piece of information for assessing the ability to determine layers from spiking activity is to carry out post-mortem histological processing so that the layer determination made in this paper could be compared to anatomical layering.

      We are not aware of any approach that would provide such information at sufficient resolution. For example, it is well known that electrolytic lesions often do not match to the locations expected from electrophysiological changes observed with single electrodes. As noted above, our observation that the laminar changes in electrophysiology precisely match the known widths and relative positions of all the layers of V1, including layer 4A, provides confidence in our laminar assignments.

      On line 162, the text states that there is a clear lack of consistency across penetrations, but why should there be consistency: how far apart in the cortex were the penetrations? How long were the electrodes allowed to settle before recording, how much damage was done to tissue during insertion? Do you have data taken over time - how consistent is the pattern across several hours, and how long was the time between the collection of the penetrations shown here?

      Answers to most of these questions can be found within the manuscript text. We have added text describing the distance between electrode penetrations (at least 1 mm, typically far more) and added a figure which shows a map of the penetration locations. The Methods section describes electrode penetration methods to minimize damage and settling times of penetrations. Data are provided regarding changes in recordings over time (see Methods, Drift Correction). The stimuli used to generate the data described are presented within a total of 30 minutes or less, minimizing any changes that might occur due to electrode drift. There is a minimum of 3 hours between different penetrations from the same animal.

      The impact of the paper is lessened because it emphasizes consistency but not in a consistent manner. Some demonstrations of consistency are shown for CSDs, but not quantified. Figure 4A is used to make a point about consistency in cell density, but across animals, whereas the previous text was pointing out inconsistency across penetrations. What if you took a 40 or 60 um column of tissue and computed cell density, then you would be comparing consistency across potentially similar scales. Overall, it is not clear how all of these different metrics compare quantitatively to each other in terms of consistency.

      As noted above, we have now added quantitative comparisons of consistency between different metrics. It is unclear why the reviewer felt that we use Figure 4A to describe consistency. That figure was a photograph from a previous publication simply showing the known differences in neuron density that are used to define layers in anatomical studies. This was intended to introduce the reader to known laminar differences. At any rate, we have been unable to contact the previous publishers of that work to obtain permission to use the figure. So we have removed that figure as it is unnecessary to illustrate the known differences in cell density that are used to define layers. We have kept the citation so that interested readers can refer to the publication.

      In many places, the text makes assertions that A is a consistent indicator of B, but then there appear to be clear counterexamples in the data shown in the figures. There is some sense that the reasoning is relying too much on examples, and not enough on statistical quantities.

      Without reference to specific examples we are not able to address this point.

      Overall

      Overall, this paper makes a solid argument in favor of using action potentials and stimulus driven responses, instead of CSD measurements, to assign cortical layers to electrode contacts in V1. It is nice to look at the data in this paper and to read the authors' highly educated interpretation and speculation about how useful such measurements will be in general to make layer assignments. It is easy to agree with much of what they say, and to hope that in the future there will be reliable, quantitative methods to make meaningful segmentations of neurons in terms of their differentiated roles in cortical computation. How much this will end up corresponding to the canonical layer numbering that has been used for many decades now remains unclear.

      Reviewer #3 (Public Review):

      Summary:

      Zhang et al. explored strategies for aligning electrophysiological recordings from high-density laminar electrode arrays (Neuropixels) with the pattern of lamination across cortical depth in macaque primary visual cortex (V1), with the goal of improving the spatial resolution of layer identification based on electrophysiological signals alone. The authors compare the current commonly used standard in the field - current source density (CSD) analysis - with a new set of measures largely derived from action potential (AP) frequency band signals. Individual AP band measures provide distinct cues about different landmarks or potential laminar boundaries, and together they are used to subdivide the spatial extent of array recordings into discrete layers, including the very thin layer 4A, a level of resolution unavailable when relying on CSD analysis alone for laminar identification. The authors compare the widths of the resulting subdivisions with previously reported anatomical measurements as evidence that layers have been accurately identified. This is a bit circular, given that they also use these anatomical measurements as guidelines limiting the boundary assignments; however, the strategy is overall sensible and the electrophysiological signatures used to identify layers are generally convincing. Furthermore, by varying the pattern of visual stimulation to target chromatically sensitive inputs known to be partially segregated by layer in V1, they show localized response patterns that lend confidence to their identification of particular sublayers.

      The authors compellingly demonstrate the insufficiency of CSD analysis for precisely identifying fine laminar structure, and in some cases its limited accuracy at identifying coarse structure. CSD analysis produced inconsistent results across array penetrations and across visual stimulus conditions and was not improved in spatial resolution by sampling at high density with Neuropixels probes. Instead, in order to generate a typical, informative pattern of current sources and sinks across layers, the LFP signals from the Neuropixels arrays required spatial smoothing or subsampling to approximately match the coarser (50-100 µm) spacing of other laminar arrays. Even with smoothing, the resulting CSDs in some cases predicted laminar boundaries that were inconsistent with boundaries estimated using other measures and/or unlikely given the typical sizes of individual layers in macaque V1. This point alone provides an important insight for others seeking to link their own laminar array recordings to cortical layers.
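      The subsampling point above can be illustrated with the conventional second-spatial-difference estimate of CSD. The following is a minimal sketch, not the authors' pipeline: the function names, the unit conductivity, and the synthetic noise input are all assumptions made for illustration. It shows one plausible reason why dense sampling does not help by itself: for a fixed noise level per channel, the second difference scales with the inverse square of the contact spacing.

```python
import numpy as np

def csd_second_difference(lfp, spacing_um):
    """Estimate CSD as the negative second spatial difference of the LFP.

    lfp: array of shape (n_channels, n_samples), ordered by depth.
    spacing_um: distance between adjacent channels in micrometres.
    Conductivity is omitted (assumed constant), so units are arbitrary.
    Returns an array of shape (n_channels - 2, n_samples).
    """
    lfp = np.asarray(lfp, dtype=float)
    return -(lfp[2:] - 2.0 * lfp[1:-1] + lfp[:-2]) / spacing_um**2

def subsample_depth(lfp, step):
    """Keep every `step`-th channel, e.g. 20 um -> 100 um spacing for step=5."""
    return np.asarray(lfp)[::step]

# Synthetic example: independent unit-variance noise on 50 channels (assumed
# 20 um apart). No real data are used here.
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, size=(50, 20000))

csd_dense = csd_second_difference(noise, spacing_um=20.0)
csd_coarse = csd_second_difference(subsample_depth(noise, 5), spacing_um=100.0)

# Ratio of noise amplitudes in the two CSD estimates.
noise_ratio = csd_dense.std() / csd_coarse.std()
print(f"noise amplification at 20 um relative to 100 um: {noise_ratio:.1f}x")
```

      Under these assumptions the ratio should come out near (100/20)² = 25, which is consistent with the need to smooth or subsample before a typical sink/source pattern emerges. Real LFP noise is spatially correlated, so the actual factor will differ.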

      They next offer a set of measures based on analysis of AP band signals. These measures include analyses of the density, average signal spread, and spike waveforms of single- and multi-units identified through spike sorting, as well as analyses of AP band power spectra and local coherence profiles across recording depth. The power spectrum measures in particular yield compact peaks at particular depths, albeit with some variation across penetrations, whereas the waveform measures most convincingly identified the layer 6-white matter transition. In general, some of the new measures yield inconsistent patterns across penetrations, and some of the authors' explanations of these analyses draw intriguing but rather speculative connections to properties of anatomy and/or responsivity. However, taken as a group, the set of AP band analyses appear sufficient to determine the layer 6-white matter transition with precision and to delineate intermediate transition points likely to correspond to actual layer boundaries.

      Strengths:

      The authors convincingly demonstrate the potential to resolve putative laminar boundaries using only electrophysiological recordings from Neuropixels arrays. This is particularly useful given that histological information is often unavailable for chronic recordings. They make a clear case that CSD analysis is insufficient to resolve the lamination pattern with the desired precision and offer a thoughtful set of alternative analyses, along with an order in which to consider multiple cues in order to facilitate others' adoption of the strategy. The widths of the resulting layers bear a sensible resemblance to the expected widths identified by prior anatomical measurements, and at least in some cases there are satisfying signatures of chromatic visual sensitivity and latency differences across layers that are predicted by the known connectivity of the corresponding layers. Thus, the proposed analytical toolkit appears to work well for macaque V1 and has strong potential to generalize to use in other cortical regions, though area-targeted selection of stimuli may be required.

      Weaknesses:

      The waveform measures, and in particular the unit density distribution, are likely to be sensitive to the criteria used for spike sorting, which differ widely among experimenters/groups, and this may limit the usefulness of this particular measure for others in the community. The analysis of detected unit density yields fluctuations across cortical depth which the authors attribute to variations in neural density across layers; however, these patterns seemed particularly variable across penetrations and did not consistently yield peaks at depths that should have high neuronal density, such as layer 2. Therefore, this measure has limited interpretability.

      While we agree that our electrophysiological measure of unit density does not strictly reflect anatomical neuronal density, we would like to remind the reader that we use this measure only to roughly estimate the correspondence between changes in density and likely layer assignments. We rely on other measures (e.g. AP power, AP power changes in response to visual stimuli) that have sharp borders and more clear transitions to assign laminar boundaries. Further, as noted in the reviewer’s list of strengths, the laminar assignments made with these measures are cross validated by differences in response latencies and sensitivity to different types of stimuli that are observed at different electrode depths.

      More generally, although the sizes of identified layers comport with typical sizes identified anatomically, a more powerful confirmation would be a direct per-penetration comparison with histologically identified boundaries. Ultimately, the absence of this type of independent confirmation limits the strength of their claim that veridical laminar boundaries can be identified from electrophysiological signals alone.

      As we have noted in response to similar comments from other reviewers, we are not aware of a method that would make this possible with sufficient resolution.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      The reviewers have indicated that their assessment would potentially be stronger if their advice for quantitative, statistically validated comparisons was followed, for example, to demonstrate variability or consistency of certain measures that are currently only asserted. Also, if available, some histological confirmation would be beneficial. It was requested that the use and modification of the layering from Balaram & Kaas is addressed, as well as dealing with inconsistencies in the scale bars on those figures. There are two figure permission issues that need to be resolved prior to publication: Balaram & Kaas 2014 in Fig 1A, Kelly & Hawken 2017 in Fig. 4A.

      Please see detailed responses to reviewer comments below. We have added new supplemental figures to quantitatively compare variability among metrics. As noted above, the suggested addition of data linking the electrophysiology directly to anatomical observations of laminar borders from the same electrode penetration is not feasible. The figure reused in Figure 1A is from an open-access (CC BY) publication (Balaram & Kaas 2014). After reexamining the figure in the original study, we found that the inferred scale bar would give an obviously inaccurate result, so we decided to remove the scale bar in Figure 1A. We have not received a reply from Springer Nature regarding permission for Figure 4A, so we decided to remove the reused figure from our article (Kelly & Hawken 2017).

      Reviewer #1 (Recommendations For The Authors):

      Figure 4A has a different scale to Figure 4B-4F. It is better to add dashed lines to indicate the relationship between the cortical layers or overall range from Figure 4A to the corresponding layers in 4B to 4F.

      The reused figure in Figure 4A has been removed due to a permission issue. See also comments above.

      Reviewer #2 (Recommendations For The Authors):

      General comments

      This paper demonstrates that voltage signals in frequency bands higher than those used for LFP/CSD analysis can be used from high-density electrical contact recording to generate a map of cortical layering in macaque V1 at a higher spatial resolution than previously attained.

      My main concern is that much of this paper seems to show that properties of voltage signals recorded by electrodes change with depth in V1. This of course is well known and has been mapped by many who have advanced a single electrode micron-by-micron through the cortex, listening and recording as they go. Figure 4 shows that spike shapes can give a clear indication of GM to WM borders, and this is certainly true and well known. Figures 5 and 6 show that activity level on electrodes can indicate layers related to LGN input, and this is known. Figure 7 shows that latencies vary with layer, and this is certainly true as we know. A main point seems to be that CSD is highly inconsistent. This is important to know if CSD is simply never going to be a good measure for layering in V1, but it would require quantification and statistics to make a fair comparison.

      We are glad to see that the reviewer understands that changes in electrical signals across layers are well known and are expected to have particular traits that change across layers. We do not claim that we have discovered anything that is unexpected or unknown. Instead, we introduce quantitative measures that are sensitive to these known differences (historically, often just heard with an audio monitor, e.g. “LGN axon hash”). While the primary aim of this paper is not to show that Neuropixels probes can record voltage signal properties that could not previously be recorded with a single electrode, we would like to point out that multi-electrode arrays have a very different sampling bias and also allow comparisons of simultaneous recordings across contacts with known fixed distances between them. For example, our measure of “unit spread” could not be estimated with a single electrode.

      We’ve added Figure S3 to show quantitative comparison of variation between CSD and AP metrics. These figures add support to our prior, more anecdotal descriptions showing that CSDs are inconsistent and lack the resolution needed to identify thin layers.

      Some things are not explained very clearly. Like achromatic regions, and eye dominance - these are not quantified, and we don't know if they are mutually consistent - are achromatic/chromatic the same when tested through separate eyes? How consistent are these basic definitions? How definitive are they?

      The quantitative definitions of achromatic region/COFD and eye dominance column can be found in our previous paper (Li et al., 2022) cited in this article. The main aim of this study is to develop a strategy for accurately identifying layers; more detailed functional analyses will be described in future publications.

      Specific comments

      The abstract refers to CSD analysis and CSD signals. Can you be more precise - do you aim to say that LFP signals in certain frequency bands are already known to lack spatial localization, or are you claiming to be showing that LFP signals lack spatial resolution? A major point of the results appears to be lack of consistency of CSD, but I do not see that in the Abstract. The first sentence in the abstract appears to be questionable based on the results shown here for V1.

      We have updated the Abstract to minimize confusion and misunderstanding.

      Scale bar on Fig 1A implies that layers 2-5 are nearly 3 mm thick. Can you explain this thickness? Other figures here suggest layers 1-6 is less than 2 mm thick. Note, in a paper by the same authors (Balaram et al) the scale bar (100 um, Figure 4) on similar macaque tissue suggests that the cortex is much thinner than this. Perhaps neither is correct, but you should attempt to determine an approximately accurate scale. The text defines granular as Layer 4, but the scale bar in A implies layer 4 is 1 mm thick, but this does not match the ~0.5 mm thickness consistent with Figure 1E, F. The text states that L4A is less than 100 um thick, but the markings and scale bar in Figure 1A suggest that it could be more than 100 um thick.

      We thank the reviewer for pointing out that there are clearly errors in the scale bars used in these previously published figures from another group. In the original Figure 1 (Balaram & Kaas 2014), histological slices were all scaled to one of the samples (Chimpanzee) without a scale bar. After reexamining the scale bar we derived based on Figure 2 of the original study, we found the same problem. Since relative widths of layers are more important than absolute widths in our study, we decided to remove the scale bar that we had derived and added to Figure 1A.

      Line 157. Fix "The most commonly visual stimulus"

      Text has been changed

      Line 161. Fix "through dominate eye"

      Text has been changed

      Line 166. Please specify if the methods established and validated below are histological, or tell something about their nature here.

      The Abstract and Introduction already described the nature of our methods

      Line 184. Text is mixing 'dominant' and 'dominate', the former is better.

      Text has been changed accordingly

      Line 188. Can you clarify "beyond the time before a new stimulus transition". Are you generally referring to the fact that neuronal responses outlast the time between changes in the stimulus?

      That is correct. We are referring to the fact that neuronal responses outlast the time between changes in the stimulus. We have edited the text for clarity.

      Line 196. Fix "dominate eye" in two places.

      Text has been changed

      Line 196. The text seems to imply it is striking to find different response patterns for the two eyes, but given the OD columns, why should this be surprising?

      Since we did not find a systematic comparison of CSD depth profiles for dominant/non-dominant eyes, or for black/white stimuli, in past studies, we simply describe what we saw in our data. The rationale for testing each eye is that LGN projections from the two eyes are known to remain separated in the direct input layers of V1, so comparing CSDs from the two eyes could potentially help identify input layers, such as L4C. Here we provide evidence showing that CSD profiles from the two eyes deviate from naive expectations. For example, CSDs from the black stimulus show less variation between the two eyes, whereas CSDs from the white stimulus range from similar profiles to drastically different ones across eyes.

      Line 198. Text like, "The most consistent..." is stating overall conclusions drawn by the authors before pointing the reader specifically to the evidence or the quantification that supports the statement.

      We’ve adjusted the text to point to Figure S2, where depth profiles of all penetrations are visualized, and to a newly added Figure S3, where the coefficients of variation for several metric profiles are shown.
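      As a concrete reading of "coefficient of variation" for a depth profile, the sketch below computes the across-penetration CV at each depth bin. It is an illustration under stated assumptions, not the computation behind Figure S3: the function name is hypothetical, and it assumes the profiles have already been resampled onto a common normalized depth axis.

```python
import numpy as np

def cv_across_penetrations(profiles):
    """Coefficient of variation at each depth bin.

    profiles: array of shape (n_penetrations, n_depth_bins), all on the same
    normalized depth axis (an assumption of this sketch).
    Returns std / |mean| per depth bin, with NaN where the mean is zero.
    """
    profiles = np.asarray(profiles, dtype=float)
    mean = profiles.mean(axis=0)
    std = profiles.std(axis=0, ddof=1)  # sample standard deviation
    return np.divide(std, np.abs(mean),
                     out=np.full_like(mean, np.nan), where=mean != 0)

# Toy input with made-up numbers: two penetrations, two depth bins.
toy = [[1.0, 2.0],
       [3.0, 2.0]]
print(cv_across_penetrations(toy))
```

      One caveat for any such comparison: CV is poorly behaved for signed quantities whose mean is near zero, as CSD values can be at some depths, so a CSD-versus-AP comparison needs some care in how the profiles are normalized.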

      Line 200. "white stimulus is more variable" - the text does not tell us where/how this is supported with quantitative analysis/statistics.

      We’ve adjusted the text pointing to Figure S2, S3

      The metric in 4B is not explained, the text mentions the plot but the reader is unable to make any judgement without knowledge of the method, nor any estimate of error bars.

      The figure is first mentioned in section: Unit Density, and text in this section already described the definition of neuron density and unit density.  We’ve also modified the text pointing to the method section for details.

      Line 236. The text states the peak corresponds to L4C, but does not explain how the layer lines were determined.

      As described early in the CSD section, all layer boundaries are determined following the guide, which lays out the strategy for drawing borders by combining all metrics.

      At Line 296 the spike metrics section ends without providing a clear quantification of how useful the metrics will be. It is clear that the GM to WM boundary can be identified, but that can be found with single electrodes as well, as neurophysiologists get to see/hear the change in waveform as the electrode is advanced in even finer spatial increments than the 20 um spacing of the contacts here.

      The aim of this study is to develop an approach for accurately delineating all layers simultaneously. The metrics we explored are considered estimates of well-known properties, so they can support the correctness we hope to achieve. Here we first demonstrate their usefulness and later show the average across penetrations (Figure 9C-F). We are less concerned with quantifying how different factors affect the precision and consistency of these metrics, or how useful a single metric is, than with whether we can delineate all layers given all metrics, as described in the guide section.

      Line 302-306. Why this statement is made here is unclear, it interrupts the flow for a reason that perhaps will be explained later.

      This statement notes the insensitivity of this measure to temporal differences, introducing the value of incorporating a measure of how AP power changes over time in the next section of the manuscript.

      Line 311. What is the reason to speculate about no canceling because of temporal overlap? Are you assuming a very sparse multi unit firing rate such that collisions do not happen?

      Here we describe a simple theoretical model in which spike waveforms only add without cancelling, so that the power would be proportional to the number of spikes. In reality, spike waveforms sometimes cancel, causing the theoretical relationship to deteriorate to some degree.

      Lines 327-346. There is a considerable amount of speculation and arguing based on particular examples and there is a lack of quantification. Neuron density is mentioned, but not firing rate. would responses from fewer neurons with higher firing rate not be similar to more neurons with lower firing rates?

      According to the theoretical model we described, power is proportional to the number of spikes, which in turn depends on both neuron density and firing rate. So fewer neurons with a higher firing rate would generate similar power to more neurons with a lower firing rate. We’ve expanded the explanation of the model and added Figure S4 showing the depth profile of firing rate. Text has also been adjusted to point to Figures S2 and S3 for quantitative comparisons of variability.
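      The trade-off between neuron count and firing rate can be checked with a small simulation. This is a hedged sketch rather than the authors' generative model: the function name, the one-cycle sine used as a spike waveform, the 30 kHz sampling rate, and the independent random spike times are all assumptions chosen for illustration.

```python
import numpy as np

def simulate_ap_power(n_neurons, rate_hz, duration_s=10.0, fs=30000, seed=0):
    """Mean squared amplitude of a synthetic AP-band trace on one channel.

    Every neuron fires independently at `rate_hz` and contributes the same
    hypothetical zero-mean, 1 ms waveform (one sine cycle, 30 samples).
    """
    rng = np.random.default_rng(seed)
    n_samples = int(duration_s * fs)
    waveform = np.sin(2 * np.pi * np.arange(30) / 30)

    # Spike count per sample, accumulated over neurons.
    spikes = np.zeros(n_samples)
    for _ in range(n_neurons):
        spikes += rng.random(n_samples) < rate_hz / fs

    # The trace is the sum of waveforms placed at the spike times.
    trace = np.convolve(spikes, waveform, mode="same")
    return np.mean(trace ** 2)

few_fast = simulate_ap_power(n_neurons=10, rate_hz=20.0, seed=1)
many_slow = simulate_ap_power(n_neurons=40, rate_hz=5.0, seed=2)
doubled = simulate_ap_power(n_neurons=20, rate_hz=20.0, seed=3)

print(f"10 neurons at 20 Hz: {few_fast:.3f}")
print(f"40 neurons at  5 Hz: {many_slow:.3f}")
print(f"20 neurons at 20 Hz: {doubled:.3f}")
```

      In this toy setting the first two configurations have the same total spike rate and should yield similar power, while doubling the number of neurons at a fixed rate should roughly double it. This is why the power profile alone cannot separate density from firing rate, and why a separate firing-rate profile is informative.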

      Line 348 states there is a precise link between properties and cortical layers, but the manuscript has not, up to this point, shown how that link was determined or quantified it.

      Through our generative model of power and the similarity between depth profile of firing rate and depth profile of neuron density (Figure S4), depth profile of power can be used to approximate depth profile of neuron density which is known to be closely correlated to cortical layering.

      Line 350. What is meant by "stochastic variability"?

      The text essentially says distances from electrode contact to nearby cell bodies were random, so closer cells have higher spike amplitudes and in turn result in higher power on a channel.

      The figures showing the two metrics, Pf and Cf, should be shown for the same data sets. The markings indicate that Fig 5 and Fig 6 show results from non-overlapping data sets. This does not build confidence about the results in the paper.

      Here we use typical profiles to demonstrate the characteristics of the power spectrum/coherence spectrum because of the variation across penetrations. We show later, in the guide section, all metrics for one penetration (another two cases in supplemental figures) and how to combine all metrics to derive layer delineations.

      Line 375 the statement is somewhat vague, "there are nevertheless sometimes cases where they can resolve uncertainties," can you please provide some quantitative support?

      We provided 3 examples in Figure 6, and more examples are shown in Figure 8, Figure S5, S6.

      Line 379. I believe the change you want to describe here is a change associated with a transition in the visual stimulus. It would be good to clarify this in the first several sentences here. Baseline can mean different things. I got the impression that your stimuli flip between states at a rate fast enough that signals do not really have time to return to a baseline.

      We rephrased the sentence to describe the metric more precisely. A pair of uniform colors flipping in 1.5 second intervals is usually long enough for spiking activities to decay to a saturated level.

      This section (379 - 398) continues a qualitative show-and-tell feel. There appears to be a lot of variability across the examples in Figure 7. How could you try to quantify this variability versus the variability in LFP? And, in this section overall, the text and figure legend don't really describe what the baseline is.

      Text adjustments are made to briefly describe the baseline window and point to the Method section where definitions are described in detail. We’ve added Figure S3 together with Figure S2 to address the variability across penetrations, stimuli, and metrics.

      Line 405 - 415. The discussion here does not consider that layers may not have well defined boundaries, the text gives the impression that there is some ultimate ground truth to which the metrics are being compared, but that may not be accurate.

      Except for a few layers/sublayers, such as L2, L3A, L3B, most layer boundaries of neocortex are well defined (Figure 1A) and histological staining of neurons/density and correlated changes in chemical content show very sharp transitions. The best of these staining methods is cytochrome oxidase, which shows sharp borders at the top and bottom of layer 4A, top and bottom of layer 4C, and the layer 5/6 border. There is also a sharp transition in neuronal cell body size and density at the top and bottom of layer 4Cb. The definition and delineation of all possible layers are constantly being refined, especially by accumulated knowledge of genetic markers of different cell types and connection patterns. In our study, we develop metrics to estimate well known anatomical and functional properties of different layers. We have also discussed layer boundaries that have been ambiguous to date and explained the reason and criteria to resolve them.

      Line 423. The text references Figure 1A in stating that relative thickness and position is crucial, but Figure 1A does not provide that information and does not explain how it might be determined, or how much of a consensus there is. Also, the text does not consider that the electrode may go through the cortex at oblique angles, and not the same angle in each layer, and the relative thickness may not be a dependable reference.

      There are numerous studies that describe criteria to delineate cortical layers, the referenced article (Balaram & Kaas 2014) is used here as an example. We are not aware of any publication that has systematically compared the relative thickness of layers across the V1 surface of a given animal or across animals. Nevertheless, it is clear from the literature that there is considerable similarity across animals. Accordingly, we cannot know what the source of variability in overall cortical thickness in our samples is, but we do see considerable consistency in the relative thickness of the layers we infer from our measures. We illustrate the differences that we see across penetrations and consider likely causes, such as the extent to which the coverslip pressing down on the cortex might differentially compress the cortex at different locations within the chamber.

      The angle deviation of probe from surface will not change the relative thickness of layers, and the rigid linear probe is unlikely to bend in the cortex.
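      The geometric argument can be made explicit with a short sketch. It assumes flat, parallel layer boundaries and a straight probe at a single angle from the surface normal; the layer thicknesses below are made-up values for illustration only, and the function names are hypothetical.

```python
import numpy as np

def path_lengths(thicknesses_um, angle_deg):
    """Length of probe track inside each layer for a straight probe tilted
    `angle_deg` away from the surface normal (flat, parallel layers assumed)."""
    t = np.asarray(thicknesses_um, dtype=float)
    return t / np.cos(np.deg2rad(angle_deg))

def relative_thickness(lengths):
    """Each layer's share of the total track length."""
    lengths = np.asarray(lengths, dtype=float)
    return lengths / lengths.sum()

# Hypothetical layer thicknesses in micrometres (not measured values).
layers_um = [100.0, 400.0, 500.0, 300.0]

for angle in (0, 15, 30):
    lengths = path_lengths(layers_um, angle)
    print(angle, lengths.round(1), relative_thickness(lengths).round(3))
```

      Every layer is stretched by the same factor 1/cos(θ), so the relative thicknesses are unchanged while the absolute track lengths grow with the angle. The argument would not hold if the angle differed from layer to layer, for example where the cortex curves, which is the situation the reviewer raises.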

      Line 433. The term "Coherence" is used; please clarify whether this is your Cf from Figure 6. The text states, "marked decrease at the bottom of layer 6". Please clarify this, I do not see that in Figure 6.

      Text has been adjusted.

      In Figure 6, the locations of the lines between L1 and 2 do not seem to be consistent with respect to the subtle changes in light blue shading, across all three examples, yet the text on line 436 states that there is a clear transition.

      We feel that the language used accurately reflects what is shown in the figure. While the transition is not sharp, it is clear that there is a transition. This transition is not used to define this laminar border. We have edited the text to clarify that the L1/2 border is better defined based on the change in AP power which shows a sharp transition (Figure 7). 

      The text states that the boundary is also "always clear" from metrics... and cites Figure 5, but I do not see that this boundary is clear for all three examples in Figure 5.

      Text has been adjusted.

      Line 438. The text states that "it is not unusual for unit density to fall to zero below the L1/2 border (Figure 8E)", but surprisingly, the line in Figure 8 E does not even cover the indicated boundary between L1 and L2.

      At this point, the number of statements in the text that do not clearly and precisely correlate to the data in the figures is worrisome, and I think you could lose the confidence of readers at this point.

      We do not see any inconsistency between what is stated in our text and what is noted by the reviewer. The termination of the blue line corresponds to the location where no units are detected. This is the location where “unit density falls to zero”. In this example, no units were resolved through spike sorting until ~100 µm beneath the L1/L2 boundary, which corresponds to exactly zero unit density (Figure 8E). That there are electrical signals in this region is clear from the AP power change (Figure 8C), which also shows the location of the L1/L2 border.

      Line 448. Text states that the 6A/B border is defined by a sharp boundary in AP power, but Figure 8A "AP power spectrum" does not show a sharp change at the A/B line. There is a peak in this metric in the middle to upper middle of 6A, but nothing so sharp to define a boundary between distinct layers, at least for penetration A2.

      Text has been adjusted.

      In Figure 8, the layer labels are not clear, whereas they are reasonably clear in the other figures.

      This is a technical problem regarding vector graphics that were not properly converted in PDF generation. We will upload each high-quality vector graphics when we finalize the version of record.

      The text emphasizes differences in L4B and L4C with respect to average power and coherence, but the transition seems a bit gradual from layer 3B to 4C in some examples in Figure 6. And in Figure 5, A3, there doesn't appear to be any particular transition along the line between 4B and 4C.

      In this guide section, we pointed out early that some metrics are good for some boundaries and that variation exists between penetrations. We’ve expanded the text emphasizing the importance of timing differences in ΔP/P for differentiating sublayers in L4. Lastly, where several boundaries remain unresolvable given all the metrics, prior knowledge of relative thickness should be used.

      Line 466 provides prescriptions in absolute linear distances, but this is unwise given that cortex may be crossed at oblique angles by electrodes, particularly for parts of V1 that are not on the surface of the brain. Other parts of the text have emphasized relative measurements.

      Text has been changed using relative measurements.

      Line 507. The text says 9C and 4A are a good match, but the match does not look that good (4A has substantial dips at 0.5 and 0.75, and substantial peaks), and there is no quantification of fit. The error bars on 9C do not help show the variability across penetrations, they appear to be SEM, which shows that error bars get smaller as you average more data. It would seem more important to understand what is the variance in the density from one penetration to the next compared to the variance in density across layers.

      We have replaced “good match” with “roughly corresponds”. We note that we do not use unit density as a metric for identification of laminar borders and instead show that the expected locations of layers with higher neuronal density correspond to the locations where there are similar changes in unit density. It should be noted that Figure 9C is an average across many penetrations so should not be expected to show transitions that are as sharp in individual penetrations. Because of the figure permission issue, we have removed Figure 4A, and changed the text accordingly.
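
      The reviewer's point about SEM error bars can be illustrated with a minimal sketch. The density values below are hypothetical, not data from the study; the sketch only shows that the standard error of the mean shrinks as more penetrations are averaged, while the standard deviation (the penetration-to-penetration spread) stays roughly the same.

```python
import statistics

def sd_and_sem(values):
    """Return (sample standard deviation, standard error of the mean)."""
    sd = statistics.stdev(values)
    sem = sd / len(values) ** 0.5
    return sd, sem

# Hypothetical unit densities (arbitrary units) at one cortical depth,
# one value per penetration. These numbers are illustrative only.
few = [0.7, 1.0, 1.3, 0.9, 1.1]
many = few * 100  # same spread, 100 times as many penetrations

sd_few, sem_few = sd_and_sem(few)
sd_many, sem_many = sd_and_sem(many)

print(f"n={len(few)}:   SD={sd_few:.3f}  SEM={sem_few:.4f}")
print(f"n={len(many)}: SD={sd_many:.3f}  SEM={sem_many:.4f}")
```

      Under these assumptions the SD barely changes between the two cases, whereas the SEM drops by roughly a factor of ten, which is why SEM bars alone say little about variability across penetrations.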

      Figure 9C-F show a lot of variability in the individual curves (dim gray lines) compared to the overall average. Does this show that these metrics are not reliable indicators at the level of single penetration, but show some trends across larger averages?

      At the beginning of the guide, we emphasized that all metrics should be combined for each individual penetration, because some metrics are only reliable for delineating certain layer boundaries and the quality of data for the various measures varies between penetrations. The penetration average serves the purpose explained in the previous response: it is an indicator that our layer delineation was not far off.

      The discussion mentions improvements in layer identification made here. Did this work check the assignments for these penetrations against assignments made based on some form of ground truth? Previous methods would advance electrodes steadily, and make lesions, and carry out histology. Is there any way to tell how this method would compare to that?

      Even electrolytic lesions do not necessarily reveal ground truth and can be quite misleading, and their resolution is limited by lesion size. Lesions are typically variable in size, asymmetric, and variable in shape and position relative to the electrode tip, likely because of the quality and location of electrical grounding and variations in current flow due to the locations of blood vessels. A review of the published literature using electrode lesions shows that electrophysiological transitions are likely a far more accurate indicator of recording locations than post-mortem histology from electrolytic lesions. It is extremely rare for the locations of lesions to be precisely aligned to expected laminar transitions. See, for example, Chatterjee et al. (Nature 2004) and several manuscripts from the Shapley lab. The lone exception of which we are aware is Blasdel and Fitzpatrick (1984), in which consistently small and round lesions were produced; even these would be too large (~100 microns) to accurately identify layers were it not for the fact that the electrode penetrations were very long and tangential to the cortical layers.

      Reviewer #3 (Recommendations For The Authors):

      - The authors say (lines 360-362) that "Assuming spikes of a neuron spread to at least two adjacent recording channels, then the coherence between the two channels would be directly proportional to number of spikes, independent of spike amplitude." Has this been demonstrated? Very large amplitude spikes should show up on more channels than small amplitude spikes. Do waveform amplitudes and unit densities from the spike waveform analyses show consistent relationships to the power and/or coherence distributions over depth across penetrations?

      This part of the manuscript provides a theoretical rationale for what might be expected to affect the measures that we have derived. That is why we begin by stating that we are making an assumption. The answers to the reviewer’s questions are not known and have not been demonstrated. By beginning with this theoretical preface, we can point to cases where the data match these expectations as well as other cases where the data differ from the theoretical expectations.

      Coherence, by definition, is a normalized metric that is insensitive to amplitude. Spike amplitude depends mainly on how close the signal source is to the electrode, whereas spike spread, for a given distance to the electrode, depends mainly on cell body size and shape. Therefore, a very large spike amplitude could stem from a small cell very close to the electrode, yet it would produce a small spike spread; this is especially true of axonal spikes (Figure 4B, red spike). Spike amplitudes are on average higher in L4C, which matches the expectation that higher cell density places cell bodies, on average, closer to the electrode (Figure S4A). Nonetheless, the densely packed small cell bodies in L4C result in a small spike spread (Figure 9D).
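
      The statement that coherence is normalized with respect to amplitude can be checked with a small sketch. This is not the analysis pipeline used in the study: it assumes the standard magnitude-squared coherence, |Sxy|^2 / (Sxx Syy), estimated at a single frequency bin by averaging over non-overlapping segments, and the signals are synthetic. The function names are illustrative.

```python
import cmath
import math
import random

def dft_bin(segment, k):
    """Complex DFT coefficient of `segment` at frequency bin k."""
    n = len(segment)
    return sum(v * cmath.exp(-2j * math.pi * k * i / n)
               for i, v in enumerate(segment))

def coherence(x, y, seg_len, k):
    """Magnitude-squared coherence at bin k, averaged over
    non-overlapping segments of length seg_len."""
    sxx = syy = 0.0
    sxy = 0j
    for start in range(0, len(x) - seg_len + 1, seg_len):
        fx = dft_bin(x[start:start + seg_len], k)
        fy = dft_bin(y[start:start + seg_len], k)
        sxx += abs(fx) ** 2          # auto-spectrum of x
        syy += abs(fy) ** 2          # auto-spectrum of y
        sxy += fx * fy.conjugate()   # cross-spectrum
    return abs(sxy) ** 2 / (sxx * syy)

# Synthetic example: two channels share a sinusoid and carry independent noise.
random.seed(1)
n, seg_len, k = 512, 64, 4
shared = [math.sin(2 * math.pi * k * i / seg_len) for i in range(n)]
x = [s + random.gauss(0, 0.5) for s in shared]
y = [s + random.gauss(0, 0.5) for s in shared]
y_big = [10.0 * v for v in y]  # same signal, ten times the amplitude

c = coherence(x, y, seg_len, k)
c_big = coherence(x, y_big, seg_len, k)
print(c, c_big)
```

      Scaling one channel by a constant multiplies both |Sxy|^2 and Syy by the same factor, so the two printed values agree up to floating-point error. The sketch covers only a uniform scaling of a channel; it does not address how spikes of different sizes spread across channels.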

      - I suggest clarifying what is defined as the baseline window for the ΔP/P measure - is it the entire 10-150 ms response window used for the power spectrum analysis?

      Text adjustments are made in the Methods where the time windows are defined at the beginning of the CSD section. Only temporal change metrics (ΔCSD and ΔP/P) use the baseline window ([-40, 10]ms). The other two spectrum metrics (Power and Coherence) use the response window ([10, 150]ms).
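
      The two windows quoted above can be made concrete with a short sketch. It assumes that ΔP/P denotes the fractional change in power relative to the mean power in the baseline window; the response does not spell out the formula, so that definition, the function names, and the power trace are all hypothetical.

```python
BASELINE_MS = (-40.0, 10.0)   # baseline window quoted in the response
RESPONSE_MS = (10.0, 150.0)   # response window quoted in the response

def window_mean(times_ms, values, window):
    """Mean of the samples whose time falls in [window start, window end)."""
    lo, hi = window
    picked = [v for t, v in zip(times_ms, values) if lo <= t < hi]
    return sum(picked) / len(picked)

def relative_power_change(times_ms, power, baseline=BASELINE_MS):
    """Assumed definition: (P(t) - P_baseline) / P_baseline at every sample."""
    base = window_mean(times_ms, power, baseline)
    return [(p - base) / base for p in power]

# Hypothetical power trace sampled every 10 ms from -40 ms to 140 ms:
# flat at 2.0 during the baseline, stepping to 3.0 afterwards.
times = list(range(-40, 150, 10))
power = [2.0 if t < 10 else 3.0 for t in times]

dpp = relative_power_change(times, power)
mean_response_power = window_mean(times, power, RESPONSE_MS)
print(dpp[0], dpp[-1], mean_response_power)
```

      In this toy trace the temporal-change metric is zero throughout the baseline and 0.5 afterwards, while the spectrum-style summary uses only the samples in the response window.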

      - Firing rate differs by cell type and, on average, differs by layer in V1. Many layer 2/3 neurons, for example, have low maximum firing rates when driven with optimized achromatic grating stimuli. To the extent that the generative models explaining the sources of power and coherence signals rely on the assumption that firing rates are matched across cortical depth, these models may be inaccurate. This assumption is declared only subtly, and late in the paper, but it is relevant to earlier claims.

      The text has been adjusted to explicitly describe the possibility that an uneven depth profile of firing rate could counteract the depth profile of neuron density, resulting in a distorted or even flat depth profile of power/coherence that deviates far from the depth profile of neuron density. In a newly added Figure S4, we first show the average firing rate profile during a set of stimuli (uniform color, static/drifting, achromatic/chromatic gratings), then specifically the PSTHs for the same stimuli shown in this study. Layers receiving direct LGN inputs (L4C, L6A) tend to fire at a higher rate. Firing rates in the PSTHs either roughly match across layers or are much higher in the densely packed layers. Therefore, the depth profile of firing rate reinforces rather than counteracts that of neuron density, enhancing the utility of the power/coherence profile for identifying the correct layer boundaries.

      - Given the acute preparation used for recordings, I wonder whether tissue is available for histological evaluation. Although the layers identified are generally appropriate in relative size, it would be particularly compelling if the authors could demonstrate that the fraction of the cortical thickness occupied by each layer corresponded to the proportion occupied by that layer along the probe trajectory in histological sections. This would lend strength to the claim that these analyses can be used to identify layers in the absence of histology. Furthermore, variations in apparent cortical thickness could arise from different degrees of deviation from surface normal approach angles, which might be apparent by evaluation of histological material. I would add that variation in thickness on the scale shown in Fig. S4 is more likely to have an explanation of this kind.

      To serve other purposes unrelated to this study (identification of CO blobs), we cut the postmortem tissue in horizontal slices, so the suggested histological comparison cannot be made. The cortical thickness measured in this study was affected not only by the angular deviation from the surface normal but also by swelling and compression of the cortex. Nevertheless, evaluating the absolute thickness of cortex is not the main purpose of this study.

      Text and figure suggestions:

      - Fig 1A has been modified from Balaram & Kaas (2014) to revert to the Brodmann nomenclature scheme they argue against using in that paper; I wonder if they would object to this modification without explanation. Related, in the main text the authors initially refer to layers using Brodmann's labels with a secondary scheme (Hassler's) in parentheses and later drop the parenthetical labels; these conventions are not described or explained. Readers less familiar with the multiple nomenclature schemes for monkey V1 layers might be confused by the multiple labels without context, and could benefit from a brief description of the convention the authors have adopted.

      Throughout our article, we use only Brodmann’s naming convention because it has historically been adopted for Old World monkeys, which we use in our study, whereas Hassler’s naming convention is more commonly used for New World monkeys. Different naming conventions do not change our results, and it is beyond the scope of our study to discuss which nomenclature is more appropriate.

      - References to "dominate eye" throughout the text and figure legends should be replaced with "dominant eye."

      It has been changed throughout the article.

      - It is a bit odd to duplicate the same example in Fig. 2C and 2E. Perhaps a unique example would be a better use of the space.

      Here we first demonstrate the filtering effect, then compare profiles across different penetrations. The same example bridges the transition allowing side-by-side comparison.

      - The legend for Fig. 3 might be clearer if it simply listed the stimulus transitions for each column left to right, i.e. "black to white (non-dominant eye), white to black (non-dominant eye), black to white (dominant eye), ..."

      We feel that the icons are helpful. Here we want to show the stimulus colors directly to readers.

      - The misalignment between Fig. 4A vs. 4B-F, combined with the very small font size of the layer labels in Fig. 4B-F, make the visual comparison difficult. In Figs. 7 and 8, layer labels (and most labels in general) are much too small and/or low resolution to read easily. Overall, I would recommend increasing font size of labels in figures throughout the paper.

      The reused figure in Figure 4A has been removed due to a permission issue. Font sizes have been adjusted.

      - Line 591 "using of high-density probes" should be "using high-density probes"

      Text has been changed accordingly.

    1. eLife Assessment

      This important study provides solid evidence for a non-genomic action of progesterone in Xenopus oocyte activation. The findings demonstrate that two non-genomic progesterone receptors, ABHD2 and mPRb, function as a novel progesterone-stimulated phospholipase A2. The findings will be of broad interest to reproductive endocrinologists and physiologists.

    2. Reviewer #1 (Public review):

      Summary:

      Numerous pathways have been proposed to elucidate the nongenomic actions of progesterone within both male and female reproductive tissues. The authors employed the Xenopus oocyte system to investigate the PLA2 activity of ABHD2 and the downstream lipid mediators in conjunction with mPRb and P4, on their significance in meiosis. The research has been conducted extensively and is presented clearly.

      Strengths:

      While the interaction between membranous PR and ABHD2 is not a novel concept, this present study exhibits several strengths:

      (1) mPRbeta, a member of the PAQR family, has been elusive in terms of detailed signal transduction. Through mutation studies involving the Zn binding domain, the authors discovered that the hydrolase activity of mPRbeta is not essential for meiosis and oocyte maturation. Instead, they suggest that ABHD2, acting as a coreceptor of mPRbeta, demonstrates phospholipase activity, indicating that downstream lipid mediators may play a dominant role when stimulated by progesterone.

      (2) Extensive exploration of downstream signaling pathways and the identification of several potential meiotic activity-related lipid mediators make this aspect of the study novel and potentially significant.

      Weaknesses:

      However, there are some weaknesses and areas that need further clarification:

      (1) The mechanism governing the molecular assembly of mPRbeta and ABHD2 remains unclear. Are they constitutively associated or is their association ligand-dependent? Does P4 bind not only to mPRbeta but also to ABHD2, as indicated in Figure 6J? In the latter case, the reviewer suggests that the authors conduct a binding experiment using labeled P4 with ABHD2 to confirm this interaction and assess any potential positive or negative cooperativity with a partner receptor.

      (2) The authors have diligently determined the metabolite profile using numerous egg cells. However, the interpretation of the results appears incomplete, and inconsistencies were noted between Figure 2F and Supplementary Figure 2C. Furthermore, PGE2 and D2 serve distinct roles and have different elution patterns by LC-MS/MS, thus requiring separate measurements. In addition, the extremely short half-life of PGI2 necessitates the measurement of its stable metabolite, 6-keto-PGF1a, instead. The authors also need to clarify why they measured PGF1a but not PGF2a. Unfortunately, even in the revision, the authors did not adequately address the last issue (differential measurements of PGD2 and PGE2; 6-keto-PGF1alpha to be determined instead of PGI2).

      (3) Although they propose PGs, LPA and S1P are important downstream mediators, the exact roles of the identified lipid mediators have not been clearly demonstrated, as receptor expression and activation were not demonstrated. While the authors showed S1PR3 expression and its importance by genetic manipulation, there was no observed change in S1P levels following P4 treatment (Supplementary Figure 2D). It is essential to identify which receptors (subtypes) are expressed and how downstream signaling pathways (PKA, Ca, MAPK, etc.) relate to oocyte phenotypes.

      These clarifications and further experiments would enhance the overall impact and comprehensiveness of the study.

      Comments on revisions:

      Need correction and addition for differential analyses of PGD2 and PGE2, and measurement of 6-keto-PGF1alpha instead of PGI2 (Figure 2F). PGI2 is extremely unstable (T1/2, 1 min in neutral buffer) and rapidly converted nonenzymically to 6-keto-PGF1a.

    3. Reviewer #2 (Public review):

      Summary:

      This interesting paper examines the earliest steps in progesterone-induced frog oocyte maturation, an example of non-genomic steroid hormone signaling that has been studied for decades but is still very incompletely understood. In fish and frog oocytes it seems clear that mPR proteins are involved, but exactly how they relay signals is less clear. In human sperm, the lipid hydrolase ABHD2 has been identified as a receptor for progesterone, and so the authors here examine whether ABHD2 might contribute to progesterone-induced oocyte maturation as well. The main results are:

      (1) Knocking down ABHD2 makes oocytes less responsive to progesterone, and ectopically expressing ABHD2.S (but not the shorter ABHD2.L gene product) partially rescues responsiveness. The rescue depends upon the presence of critical residues in the protein's conserved lipid hydrolase domain, but not upon the presence of critical residues in its acyltransferase domain.

      (2) Treatment of oocytes with progesterone causes a decrease in sphingolipid and glycerophospholipid content within 5 min. This is accompanied by an increase in LPA content and arachidonic acid metabolites. These species may contribute to signaling through GPCRs. Perhaps surprisingly, there was no detectable increase in sphingosine-1-phosphate, which might have been expected given the apparent substantial hydrolysis of sphingolipids. The authors speculate that S1P is formed and contributes to signaling but diffuses away.

      (3) Pharmacological inhibitors of lipid-metabolizing enzymes support, for the most part, the inferences from the lipidomics studies, although there are some puzzling findings. The puzzling findings may be due to uncertainty about whether the inhibitors are working as advertised.

      (4) Pharmacological inhibitors of G-protein signaling support a role for G-proteins and GPCRs in this signaling, although again there are some puzzling findings.

      (5) Reticulocyte expression supports the idea that mPRβ and ABHD2 function together to generate a progesterone-regulated PLA2 activity.

      (6) Knocking down or inhibiting ABHD2 inhibited progesterone-induced mPRβ internalization, and knocking down ABHD2 inhibited SNAP25∆20-induced maturation.

      Strengths:

      All in all, this could be a very interesting paper and a nice contribution. The data add a lot to our understanding of the process, and, given how ubiquitous mPR and AdipoQ receptor signaling appear to be, something like this may be happening in many other physiological contexts.

      Weaknesses:

      I have several suggestions for how to make the main points more convincing.

      Main criticisms:

      (1) The ABHD2 knockdown and rescue, presented in Fig 1, is one of the most important findings. It can and should be presented in more detail to allow the reader to understand the experiments better. E.g.: the antisense oligos hybridize to both ABHD2.S and ABHD2.L, and they knock down both (ectopically expressed) proteins. Do they hybridize to either or both of the rescue constructs? If so, wouldn't you expect that both rescue constructs would rescue the phenotype, since they both should sequester the AS oligo? Maybe I'm missing something here.

      In addition, it is critical to know whether the partial rescue (Fig 1E, I, and K) is accomplished by expressing reasonable levels of the ABHD2 protein, or only by greatly overexpressing the protein. The author's antibodies do not appear to be sensitive enough to detect the endogenous levels of ABHD2.S or .L, but they do detect the overexpressed proteins (Fig 1D). The authors could thus start by microinjecting enough of the rescue mRNAs to get detectable protein levels, and then titer down, assessing how low one can go and still get rescue. And/or compare the mRNA levels achieved with the rescue construct to the endogenous mRNAs.

      Finally, please make it clear what is meant by n = 7 or n = 3 for these experiments. Does n = 7 mean 7 independently lysed oocytes from the same frog? Or 7 groups of, say, 10 oocytes from the same frog? Or different frogs on different days? I could not tell from the figure legends, the methods, or the supplementary methods. Ideally one wants to be sure that the knockdown and rescue can be demonstrated in different batches of oocytes, and that the experimental variability is substantially smaller than the effect size.

      (2) The lipidomics results should be presented more clearly. First, please drop the heat map presentations (Fig 2A-C) and instead show individual time course results, like those shown in Fig 2E, which make it easy to see the magnitude of the change and the experiment-to-experiment variability. As it stands, the lipidomics data really cannot be critically assessed.

      [Even as heat map data go, panels A-C are hard to understand. The labels are too small, especially on the heat map on the right side of panel B. And the 25 rows in panel C are not defined (the legend makes me think the panel is data from 10 individual oocytes, so are the 25 rows 25 metabolites? If so, are the individual oocyte data being collapsed into an average? Doesn't that defeat the purpose of assessing individual oocytes?) And those readers with red-green colorblindness (8% of men) will not be able to tell an increase from a decrease. But please don't bother improving the heat maps; they should just be replaced with more-informative bar graphs or scatter plots.]

      (3) The reticulocyte lysate co-expression data are quite important, and are both intriguing and puzzling. My impression had been that to express functional membrane proteins, one needed to add some membrane source, like microsomes, to the standard kits. Yet it seems like co-expression of mPR and ABHD2 proteins in a standard kit is sufficient to yield progesterone-regulated PLA2 activity. I could be wrong here-I'm not a protein expression expert-but I was surprised by this result, and I think it is critical that the authors make absolutely certain that it is correct. Do you get much greater activities if microsomes are added? Are the specific activities of the putative mPR-ABHD2 complexes reasonable?

      Comments on revisions:

      The authors have satisfied my concerns with their response letter and revisions.

    4. Reviewer #3 (Public review):

      Summary:

      The authors report two P4 receptors, ABHD2 and mPRβ, that function as co-receptors to induce PLA2 activity and thus drive meiosis. In their experimental studies, the authors knocked down ABHD2 and demonstrated inhibition of oocyte maturation and inactivation of Plk1, MAPK, and MPF, which indicated that ABHD2 is required for P4-induced oocyte maturation. Next, they showed that three residues (S207, D345, H376) in the lipase domain are crucial for ABHD2 P4-mediated oocyte maturation in functional assays. They performed global lipidomics analysis on mPRβ or ABHD2 knockdown oocytes, in which downregulation of GPL and sphingolipid species was observed and enrichment in LPA was also detected using their metabolomics method. Furthermore, they investigated pharmacological profiles of enzymes predicted to be important for maturation based on their metabolomic analyses and ascertained the central role for PLA2 in inducing oocyte maturation downstream of P4. They showed the modulation of the S1P/S1PR3 pathway on oocyte maturation and a potential role for Gαs signaling and potentially Gβγ downstream of P4.

      Strengths:

      The authors make a very interesting finding that ABHD2 has PLA2 catalytic activity but only in the presence of mPRβ and P4. Finally, they provided supporting data for a relationship between ABHD2/PLA2 activity and mPRβ endocytosis and further downstream signaling. Collectively, this research report defines early steps in nongenomic P4 signaling, which is of broad physiological implications.

      Weaknesses:

      There were concerns with the pharmacological studies presented. Many of these inhibitors are used at high (double digit micromolar) concentrations that could result in non-specific pharmacological effects and the authors have provided very little data in support of target engagement and selectivity under the multiple experimental paradigms. In addition, the use of an available ABHD2 small molecule inhibitor was lacking in these studies.

      Comments on revisions:

      In the revised manuscript, the authors have addressed my major concerns.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment:

      “…However, the findings are reliant on high concentrations of inhibitor drugs, and mechanistic details about the molecular interaction and respective functions of ABHD2 and mPRb are incomplete.”

      As discussed below in the response to the Reviewers, the drug concentrations used span the full dose response of the active range of each drug. In cases where the drug concentrations required to block oocyte maturation were significantly higher than those reported in the literature, we considered those drugs ineffective. In terms of the molecular details of the mechanistic interaction between mPRb and ABHD2, we now provide additional data confirming their molecular interaction to produce PLA2 activity where each protein alone is insufficient. Although these new studies provide more mechanistic insights, there remain details of the ABHD2-mPR interactions that would need to be addressed in future studies and that are beyond the scope of the current, already extensive study.

      Public Reviews:

      Reviewer 1

      (1) The mechanism governing the molecular assembly of mPRbeta and ABHD2 remains unclear. Are they constitutively associated or is their association ligand-dependent? Does P4 bind not only to mPRbeta but also to ABHD2, as indicated in Figure 6J? In the latter case, the reviewer suggests that the authors conduct a binding experiment using labeled P4 with ABHD2 to confirm this interaction and assess any potential positive or negative cooperativity with a partner receptor.

      The co-IP experiments presented in Figure 5E argue that the two receptors are constitutively associated at rest before exposure to P4, but at low levels, since addition of P4 increases the association between mPRβ and ABHD2 ~2-fold. Importantly, we know from previous work (Nader et al., 2020) and from imaging experiments in this study that mPR recycles in immature oocytes between the PM and the endosomal compartment. It is not clear at this point within which subcellular compartment the basal association of mPR and ABHD2 occurs. We have tried to elucidate this point but have not been able to generate a functional tagged ABHD2. We generated GFP-tagged ABHD2 at both the N- and C-terminus, but these constructs were not functional in terms of their ability to rescue ABHD2 knockdown. This prevented us from testing the association dynamics between ABHD2 and mPR.

      Regarding whether ABHD2 in the oocyte directly binds P4, we had no data directly supporting this in the initial submission; rather, we based the cartoon in Fig. 6J on the findings of Miller et al. (Science 2016), who showed that ABHD2 in sperm binds biotinylated P4. With the use of a new expression system to produce ABHD2 in vitro (please see below), we were able to try the experiment suggested by the Reviewer. In vitro expressed ABHD2 was incubated with biotinylated P4, and binding was tested on a streptavidin column. Under these conditions we could not detect any specific binding of P4 to ABHD2. However, these experiments remain somewhat preliminary and would require validation using additional approaches to conclusively test whether Xenopus ABHD2 binds P4. The discrepancy with the Miller et al. findings could be species specific, as they tested mammalian ABHD2.

      (2) The authors have diligently determined the metabolite profile using numerous egg cells. However, the interpretation of the results appears incomplete, and inconsistencies were noted between Figure 2B and Supplementary Figure 2C. Furthermore, PGE2 and D2 serve distinct roles and have different elution patterns by LC-MS/MS, thus requiring separate measurements. In addition, the extremely short half-life of PGI2 necessitates the measurement of its stable metabolite, 6-keto-PGF1a, instead. The authors also need to clarify why they measured PGF1a but not PGF2a.

      We believe the Reviewer meant to indicate discrepancies between Fig. 2E (not 2B) and Supp. Fig. 2C. Indeed, the Reviewer is correct, and this is because Fig. 2E shows pooled data normalized per PG species and per frog, whereas Supp. Fig. 2E shows an example of absolute raw levels from a single frog to illustrate the relative basal abundance of the different PG species. We had failed to clarify this in the Supp. Fig. 2E figure legend, which we have now done in the revised manuscript. So, the discrepancies are due to variation between different donor animals, which is highlighted in Supp. Fig. 2A. Furthermore, to minimize confusion, in the revised manuscript we revised Supp. Fig. 2C to show only PG levels at rest, to illustrate basal levels of the different PG species relative to each other, which is the goal of this supplemental figure.

      (3) Although they propose PGs, LPA, and S1P are important downstream mediators, the exact roles of the identified lipid mediators have not been clearly demonstrated, as receptor expression and activation were not demonstrated. While the authors showed S1PR3 expression and its importance by genetic manipulation, there was no observed change in S1P levels following P4 treatment (Supplementary Figure 2D). It is essential to identify which receptors (subtypes) are expressed and how downstream signaling pathways (PKA, Ca, MAPK, etc.) relate to oocyte phenotypes.

      We agree conceptually with the Reviewer that identifying the details of the signaling of the different GPCRs involved in oocyte maturation would be interesting. However, our lipidomic data argue that the activation of a PLA2 early in the maturation process in response to P4 leads to the production of multiple lipid messengers that would activate GPCRs and branch out the signaling pathway to activate various pathways required for the proper and timely progression of oocyte maturation. Preparing the egg for fertilization is complex; so, it is not surprising that a variety of pathways are activated simultaneously to properly initiate both cytoplasmic and nuclear maturation to transition the egg from its meiotic arrest state to be ready to support the rapid growth during early embryogenesis. We focus on the S1P signaling pathway specifically because, as pointed out by the Reviewer, we could not detect an increase in S1P even though our metabolomic data collectively argued for an increase. Our results on the S1P pathway -as well as a plethora of other studies historically in the literature that we allude to in the manuscript- argue that these different GPCRs support and regulate oocyte maturation, but they are not essential for the early maturation signaling pathway. For example, for S1P, as shown in Figure 4, the delay/inhibition of oocyte maturation due to S1PR3 knockdown can be reversed at high levels of P4, which presumably leads to higher levels of other lipid mediators that would bypass the need for signaling through S1PR3. This is reminiscent of the kinase cascade driving oocyte maturation where there is significant redundancy and feedback regulation. Therefore, analyzing each receptor subtype that may regulate the different PG species, LPA, and S1P would be a tedious and time-consuming undertaking that goes beyond the scope of the current manuscript. 
      More importantly, based on the above arguments, we suggest that findings from such an analysis, similar to the conclusions from the S1PR3 studies (Fig. 4), would show a modulatory role in oocyte maturation rather than a core requirement for the maturation process as observed with mPR and ABHD2. Thus, they would provide relatively little insight into the core signaling pathway driving P4-mediated oocyte maturation.

      Reviewer 2:

      (1) The ABHD2 knockdown and rescue, presented in Fig 1, is one of the most important findings. It can and should be presented in more detail to allow the reader to understand the experiments better. E.g.: the antisense oligos hybridize to both ABHD2.S and ABHD2.L, and they knock down both (ectopically expressed) proteins. Do they hybridize to either or both of the rescue constructs? If so, wouldn't you expect that both rescue constructs would rescue the phenotype since they both should sequester the AS oligo? Maybe I'm missing something here.

      For the ABHD2 rescue experiment, the ABHD2 constructs (S or L) were expressed 48 hrs before the antisense was injected. The experiment was conducted in this way to avoid the potential confounding issue of both constructs sequestering the antisense. The assumption is that the injected RNA after protein expression would be degraded thus allowing the injected antisense to target endogenous ABHD2. The idea is to confirm that ABHD2.S expression alone is sufficient to rescue the antisense knockdown as confirmed experimentally.

      However, to further confirm the rescue, we performed the experiment in a different chronological order: we started by injecting the antisense to knock down endogenous ABHD2, and this was followed 24 hrs later by expressing wild-type ABHD2.S. As shown in Author response image 1, this also rescues the knockdown.

      Author response image 1.

      ABHD2 knockdown and rescue. Oocytes were injected with control antisense (Ctrl AS) or specific ABHD2 antisense (AS) oligonucleotides and incubated at 18 °C for 24 hours. Oocytes were then injected with mRNA to overexpress ABHD2.S for 48 hours and then treated with P4 overnight. The histogram shows % GVBD in naïve oocytes and in oocytes injected with control or ABHD2 antisense, with or without mRNA to overexpress ABHD2.S.

      In addition, it is critical to know whether the partial rescue (Fig 1E, I, and K) is accomplished by expressing reasonable levels of the ABHD2 protein, or only by greatly overexpressing the protein. The authors' antibodies do not appear to be sensitive enough to detect the endogenous levels of ABHD2.S or .L, but they do detect the overexpressed proteins (Fig 1D). The authors could thus start by microinjecting enough of the rescue mRNAs to get detectable protein levels, and then titer down, assessing how low one can go and still get rescue. And/or compare the mRNA levels achieved with the rescue construct to the endogenous mRNAs.

      The dose response of ABHD2 protein expression in correlation with rescue of the ABHD2 knockdown is shown indirectly in Figure 1I and 1J. In these experiments, ABHD2 knockdown was rescued using either the WT protein or one of two mutants (H120A and N125A). All three constructs rescued ABHD2 KD with equal efficiency (Fig. 1I), even though their expression levels varied (Fig. 1J). The WT protein was expressed at significantly higher levels than both mutants, and N125A was expressed at higher levels than H120A (Fig. 1J; note the similar tubulin loading control). Crude estimation from the WBs argues for WT protein expression being ~3x that of H120A and ~2x that of N125A, yet all three rescue the ABHD2 knockdown similarly (Fig. 1I). This argues that low levels of ABHD2 expression are sufficient to rescue the knockdown, consistent with the catalytic nature of the ABHD2 PLA2 activity.

      Finally, please make it clear what is meant by n = 7 or n = 3 for these experiments. Does n = 7 mean 7 independently lysed oocytes from the same frog? Or 7 groups of, say, 10 oocytes from the same frog? Or different frogs on different days? I could not tell from the figure legends, the methods, or the supplementary methods. Ideally one wants to be sure that the knockdown and rescue can be demonstrated in different batches of oocytes, and that the experimental variability is substantially smaller than the effect size.

      The n reflects the number of independent female frogs. We have added this information to the figure legends. For each donor frog at each time point 10-30 oocytes were used.

      (2) The lipidomics results should be presented more clearly. First, please drop the heat map presentations (Fig 2A-C) and instead show individual time course results, like those shown in Fig 2E, which make it easy to see the magnitude of the change and the experiment-to-experiment variability. As it stands, the lipidomics data really cannot be critically assessed.

      [Even as heat map data go, panels A-C are hard to understand. The labels are too small, especially on the heat map on the right side of panel B. The 25 rows in panel C are not defined (the legend makes me think the panel is data from 10 individual oocytes, so are the 25 rows 25 metabolites? If so, are the individual oocyte data being collapsed into an average? Doesn't that defeat the purpose of assessing individual oocytes?) And those readers with red-green colorblindness (8% of men) will not be able to tell an increase from a decrease. But please don't bother improving the heat maps; they should just be replaced with more informative bar graphs or scatter plots.]

      We have revised the lipidomics data as requested by the Reviewer. The Reviewer asked that we show the data as a time course with each individual frog as in Fig. 2E. This turns out to be confusing and not a good way to present the data (please see Author response image 2).

      Author response image 2.

      Metabolite levels from 5 replicates of 10 oocytes each at each time point were measured and averaged per frog and per time point. Fold change was measured as the ratio at the 5- and 30-min time points relative to untreated oocytes (T0). FCs that are not statistically significant are shown as faded. Oocytes with mPR knockdown (KD) are boxed in green and ABHD2-KD in purple.

      We therefore revised the metabolomics data as follows to improve clarity. The changes in the glycerophospholipids and sphingolipids determined on the Metabolon CLP platform (specific for lipids) are now shown as single metabolites clustered at the levels of species and pathways and arranged for the 5- and 30-min time points sequentially on the same heatmap as requested (Fig. 2B). This allows for a quick visual overview of the data that clearly shows the decrease in the lipid species following P4 treatment in the control oocytes and not in the mPR-KD or ABHD2-KD cells (Fig. 2B). The individual species are listed in Supplemental Tables 1 and 2. We also revised the Supplemental Tables to include the values for the non-significant changes, which were omitted from the previous submission.

      We revised the metabolomics data from the HD4 platform in a similar fashion, but because the lipid data were complementary to and less extensive than those from the CLP platform, we moved that heatmap to Supplemental Fig. 2B.

      For the single oocyte metabolomics, we now show the data as the correlation between FC and p value, which clearly shows the upregulated (including LPA) and downregulated metabolites at T30 relative to T0 (Fig. 2C). The raw data is now shown in a new Supplemental Table 7.  

      (3) The reticulocyte lysate co-expression data are quite important and are both intriguing and puzzling. My impression had been that to express functional membrane proteins, one needed to add some membrane source, like microsomes, to the standard kits. Yet it seems like co-expression of mPR and ABHD2 proteins in a standard kit is sufficient to yield progesterone-regulated PLA2 activity. I could be wrong here - I'm not a protein expression expert - but I was surprised by this result, and I think it is critical that the authors make absolutely certain that it is correct. Do you get much greater activities if microsomes are added? Are the specific activities of the putative mPR-ABHD2 complexes reasonable?

      We thank the Reviewer for this insightful comment. We agree that this is a critical result that would benefit from cross validation, especially given the low level of PLA2 activity detected in the reticulocyte lysate expression system. We have therefore expanded these studies using another in vitro expression system with microsomal membranes based on tobacco extracts (ALiCE®Cell-Free Protein Synthesis System, Sigma Aldrich) to enhance production and stability of the expressed receptors as suggested by the Reviewer. We further prepared virus-like particles (VLPs) from cells expressing each receptor individually or both receptors together. However, we could not detect any PLA2 activity from the VLPs. We thus focused on the coupled in vitro transcription/translation tobacco extracts that allow the expression of difficult-to-produce membrane proteins in microsomes. This kit targets membrane proteins directly to microsomes using a microsome-targeting melittin signal peptide. This system took significant time and effort to troubleshoot and adapt to mPR and ABHD2 expression. We were, however, ultimately able to produce significantly higher amounts of both ABHD2 and mPRβ, which were readily detected by WBs (Supplemental Fig. 4I). In contrast, we could not reliably detect mPR or ABHD2 using WBs from reticulocyte lysates given the limited amounts produced.

      Similarly to our previous findings with proteins produced in reticulocyte lysates, expression of ABHD2 or mPRβ alone was not associated with an increase in PLA2 activity over a two-hour incubation period (Fig. 5C). It is worth noting here that the tobacco lysates had high endogenous PLA2 activity. However, co-expression of both mPRβ and ABHD2 produced robust PLA2 activity that was significantly higher than that detected in the reticulocyte lysate system (Fig. 5C). Surprisingly, however, this PLA2 activity was P4 independent, as it was observed when both receptors were co-expressed in the absence of P4.

      These results validate our earlier conclusion that PLA2 activity requires both mPR and ABHD2, so their interaction is needed for enzymatic activity. It is interesting, however, that in the tobacco expression system this mPR-ABHD2 PLA2 activity becomes for the most part P4 independent. As the tobacco expression system forces both ABHD2 and mPR into microsomes using a signal sequence, the two receptors are enriched in the same vesicular compartment. As they can interact independently of P4, as shown in the co-IP experiments in immature oocytes (Fig. 5D), their forced co-expression in the same microsomal compartment could lead to their association and thus PLA2 activity. This is an attractive possibility that fits the current data, but it would need independent validation.

      Reviewer 3:

      There were concerns with the pharmacological studies presented. Many of these inhibitors are used at high (double-digit micromolar) concentrations that could result in non-specific pharmacological effects and the authors have provided very little data in support of target engagement and selectivity under the multiple experimental paradigms. In addition, the use of an available ABHD2 small molecule inhibitor was lacking in these studies.

      For the inhibitors used we performed a full dose response to define the active concentrations. So, inhibitors were not used at one high dose. We then compared the EC50 for each active inhibitor to the reported EC50 in the literature (Table 1). The inhibitors were deemed effective only if they inhibited oocyte maturation within the range reported in the literature. This despite the fact that frog oocytes are notorious in requiring higher concentrations of drug given their high lipophilic yolk content, which acts as a sponge for drugs. So our criteria for an effective inhibitor are rather stringent.  

      Based on these criteria, only 3 inhibitors were ‘effective’ in inhibiting oocyte maturation: Ibuprofen, ACA and MP-A08, with IC50s relative to those reported in the literature of 0.7, 1.1, and 1.6 respectively. Ibuprofen targets Cox enzymes, which produce prostaglandins. We independently confirmed an increase in PGs in response to P4 in oocytes, thus validating the drug's inhibitory effect. ACA blocks PLA2 and inhibits maturation, a role supported by the metabolomics analyses that show a decrease in the PE/PE/LPE/LPC species, and by the ABHD2-mPR PLA2 activity following in vitro expression. Finally, MP-A08 blocks sphingosine kinase activity, a role supported by the metabolomics showing a decrease in sphingosine levels in response to P4, and by our functional studies validating a role for the S1P receptor 3 in oocyte maturation.

      As pointed out by the Reviewer, other inhibitors did block maturation at very high concentration, but we do not consider these as effective and have not implicated the blocked enzymes in the early steps of oocyte maturation. To clarify this point, we edited the summary panel (now Fig. 2D) to simplify it and highlight the inhibitors with an effect in the reported range in red and those that don’t inhibit based on the above criteria in grey. Those with intermediate effects are shown in pink. We hope these edits clarify the inhibitors studies.
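
      The comparison described above reduces to a ratio of the IC50 measured in oocytes to the IC50 reported in the literature. The following is a minimal sketch of that arithmetic, not the authors' analysis: the function names and the numeric values are hypothetical and chosen only for illustration. A ratio near 1 corresponds to an inhibitor acting within the reported range, while a log10 value of about 3 corresponds to a requirement three orders of magnitude above the reported IC50.

```python
import math


def relative_ic50(observed_molar: float, reported_molar: float) -> float:
    """Ratio of the IC50 measured in oocytes to the literature IC50 (both in M)."""
    if observed_molar <= 0 or reported_molar <= 0:
        raise ValueError("IC50 values must be positive")
    return observed_molar / reported_molar


def orders_of_magnitude(ratio: float) -> float:
    """Express a relative IC50 as a log10 fold difference."""
    return math.log10(ratio)


# Hypothetical (observed, reported) IC50 pairs in M; not values from Table 1.
examples = {
    "inhibitor_A": (1.1e-5, 1.0e-5),  # acts close to the reported range
    "inhibitor_B": (2.0e-4, 2.0e-7),  # needs a far higher concentration
}

for name, (observed, reported) in examples.items():
    ratio = relative_ic50(observed, reported)
    print(f"{name}: relative IC50 = {ratio:.1f}, "
          f"log10 fold difference = {orders_of_magnitude(ratio):.1f}")
```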

      Recommendations For the Authors

      Reviewer 2:

      (1) Introduction, para 1. Please change "mPRs mediated" to "mPR-mediated".

      Done

      (2) Introduction, para 2. Please change "cyclin b" to "cyclin B".

      Done

      (3) Introduction, para 2. Please change "that serves" to "which serves".

      Done

      (4) Introduction, para 4. I know that the authors have published evidence that "a global decrease in cAMP levels is not detectable" (2016), but old work from Maller and Krebs (JBC 1979) did see an early, transient decrease after P4 treatment, and subsequent work from Maller said that there was both a decrease in adenylyl cyclase activity and an increase in cAMP activity. Perhaps it would be better to say something like "early work showed a transitory drop in cAMP activity within 1 min of P4 treatment (Maller), although later studies failed to detect this drop and showed that P4-dependent maturation proceeds even when cAMP is high (25)".

      We agree and thank the Reviewer for this recommendation. The text was revised accordingly.

      (5) Results, para 1. Based on the results in Fig 1B, one should probably not assert that ABHD2 is expressed "at levels similar to those of mPRβ in the oocyte"-with different mRNAs and different PCR primers, it's hard to say whether they are similar or not. The RNAseq data from Xenbase in Supp Fig 1 supports the idea that the ABHD2 and mPRβ mRNAs are expressed at similar levels at the message level, although of course mRNA levels and protein levels do not correlate well when different gene products are compared (Wuhr's 2014 Curr Biol paper reported correlation coefficients of about 0.3).

      We agree and have changed the text as follows to refer specifically to RNA: “we confirmed that ABHD2 RNA is expressed in the oocyte at levels similar to those of mPRβ RNA (Fig. 1B).”

      (6) Results, para 2. It would be worth pointing out that since an 18 h incubation with microinjected antisense oligos was sufficient to substantially knock down both the ABHD2 mRNAs (Fig 1C) and the ectopically-expressed proteins (Fig 1D), the mRNA and protein half-lives must be fairly short, on the order of a few hours or less.

      Done
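
      The Reviewer's half-life argument in point (6) can be illustrated with a short calculation. This is a sketch under two stated assumptions that are not tested in the manuscript: decay is first order, and the antisense blocks new synthesis completely from the time of injection. The half-life values are hypothetical.

```python
def fraction_remaining(hours: float, half_life_hours: float) -> float:
    """Fraction of a molecule left after `hours` of first-order decay."""
    return 0.5 ** (hours / half_life_hours)


INCUBATION_HOURS = 18.0  # antisense incubation time cited by the Reviewer

# Hypothetical half-lives, used only to show how the argument scales.
for half_life in (3.0, 6.0, 18.0):
    left = fraction_remaining(INCUBATION_HOURS, half_life)
    print(f"half-life {half_life:>4.1f} h: {100 * left:.1f}% remaining")
```

      Under these assumptions, a substantial knockdown within 18 h is only expected when the half-life is a small fraction of the incubation time, which is the Reviewer's point.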

      (7) Figure 1. Please make the western blots (especially Fig 1D) and their labeling larger. These are key results and as it stands the labeling is virtually unreadable on printed copies of the figures. I'm not sure about eLife's policy, but many journals want the text in figures to be no smaller than 5-7 points at 100% size.

      Likewise for many of the western blots in subsequent figures.

      As requested by the Reviewer we have increased the font and size of all Western blots in the Figures.

      (8) Figure 1E, G. I am not sure one should compare the effectiveness of the ABHD2 rescue (Fig 1E) and the mPRβ rescue (Fig 1G). Even if these were oocytes from the same frog, we do not know how the levels of the overexpressed ABHD2 and mPRβ proteins compare. E.g. maybe ABHD2 was highly overexpressed and mPRβ was overexpressed by a tiny amount.

      Although this is a possibility, the expression levels of the proteins here are not of much concern because we previously showed that mPRβ expression effectively rescues mPRβ antisense knockdown, which inhibits maturation (please see Nader et al., 2020). This argues that, at the levels of mRNA injected, mPR is functional in supporting maturation, yet it does not rescue ABHD2 knockdown to the same levels (Fig. 1G). It is therefore fair to argue that mPRβ is not as effective at rescuing maturation after ABHD2 KD.

      (9) Inhibitor studies: There are two likely problems in comparing the observed potencies with legacy data - in vitro vs in vivo data and frog vs. mammalian data. Please make it clear what is being compared to what when you are comparing legacy data.

      The legacy data are from the literature based on the early studies that defined the IC50 for inhibition primarily using in vivo models (cell line mostly) but not oocytes. Typically, frog oocytes require significantly higher concentrations of inhibitors to mediate their effect because of the high lipophilic yolk content which acts as a sponge for some drugs. So, the fact that the drugs that are effective in inhibiting oocyte maturation (ACA, MP-A08, and Ibuprofen) work in a similar or lower concentration range to the published IC<sub>50</sub> gives us confidence as to the specificity of their effect. We have revised Table 1 to include the reference for each IC<sub>50</sub> value from the literature to allow the reader to judge the exact model and context used.

      (10) Isn't it surprising that Gαs seems to promote maturation, given the Maller data (and data from others) that cAMP and PKA oppose maturation (see also the authors' own Fig 1A) and the authors' previous data sees no positive effect (minor point 7 above)?

      We show that a specific Gαs inhibitor NF-449 inhibits maturation (although at relatively high concentrations), which is consistent with a positive role for Gαs in oocyte maturation. We argue based on the lipidomics data and the inhibitors data that GPCRs play a modulatory role and not a central early signaling role in terms of releasing oocyte meiotic arrest. They are likely to have effects on the full maturation of the egg in preparation for embryonic development. The actions of the multiple lipid messengers generated downstream of mPRβ activation are likely to act through GPCRs and could signal through Gαs or other Gα or even through Gβγ. Minor point 7 refers to the size of Western blots.

      (11) Page 9, bottom: "...one would predict activation of sphingosine kinases...." Couldn't it just be the activity of some constitutively active sphingosine kinase? Maybe replace "activation" with "activity".

      A constitutively active sphingosine kinase would not make sense, as the kinase activity needs to be activated by P4.

      (12) Sometimes the authors refer to concentrations in molar units plus a power of 10 (e.g. 10-5 M) and sometime in µM or nM, sometimes even within the same paragraph. This makes it unnecessarily difficult to compare. Please keep consistent.

      We converted all the concentrations throughout the text to M with scientific notation for consistency as requested by the Reviewer.

      (13) Fig 3I: "Sphingosine kinase" is misspelled.

      This has been corrected. We thank the Reviewer for catching it.

      (14) Legend to Fig. 5: Please change "after P4 treatment in reticulocytes" to "after P4 treatment in reticulocyte lysates".

      Done

      (15) Fig 6J. Doesn't the MAPK cascade inhibit MYT1? I.e. shouldn't the arrow be -| rather than ->?

      Yes the Reviewer is correct. This has been changed. We thank the Reviewer for noticing this error.

      (16) Materials and Methods, second paragraph. Please change "inhibitor's studies" to "inhibitor studies".

      Corrected thanks.

      (17) Table 1: Please be consistent in how you write Cox-2.

      Done.

      Reviewer #3:

      The findings are of potential broad interest, but I have some concerns with the pharmacological studies presented. Many of these inhibitors are used at high (double-digit micromolar) concentrations that could result in non-specific pharmacological effects and the authors have provided very little data in support of target engagement and selectivity under the multiple experimental paradigms. Importantly, several claims regarding lipid metabolism signaling in the context of oocyte maturation are made without critical validation that the intended target is inactivated with reasonable selectivity across the proteome. Several of the inhibitors used for pharmacology and metabolomics are known covalent inhibitors (JZL184 and MJN110) that can readily bind additional lipases depending on the treatment time and concentration.

      I did not find any data using the reported ABHD2 inhibitor (compound 183; PMID: 31525885). Is there a reason not to include this compound to complement the knockdown studies? I believe this is an important control given that not all lipid effects were reversed with ABHD2 knockdown. The proper target engagement and selectivity studies should be performed with this ABHD2 inhibitor.

      We obtained aliquots of the reported ABHD2 inhibitor compound 183 from Dr. Van Der Stelt and tested its effect on oocyte maturation at 10<sup>-4</sup>M using either low (10<sup>-7</sup>M) or high (10<sup>-5</sup>M) P4 concentration. Compound 183 partially inhibited P4-mediated oocyte maturation. The new data were added to the manuscript as Supplemental Figure 3D.

      Additional comments:

      (1) Pristimerin was tested at low P4 concentration for effects on oocyte maturation. Authors should also test JZL184 and MJN110 under this experimental paradigm.

      We have tested the effect of a high concentration (2 × 10<sup>-5</sup>M) of JZL184 or MJN110 on oocyte maturation at low P4 concentration (Author response image 3). MJN110 did not have a prominent effect on oocyte maturation at low P4, whereas JZL184 inhibited maturation by 50%. However, this inhibition of maturation required concentrations of JZL184 that are 10 times higher than those reported in rat and human cells (Cui et al., 2016; Smith et al., 2015), arguing against an important role for a monoacylglycerol enzymatic activity in inducing oocyte maturation.

      Author response image 3.

      The effect of MJN110 and JZL184 compounds on oocyte maturation at low P4 concentration. Oocytes were pre-treated for 2 hours with the vehicle or with the highest concentration of 2 × 10<sup>-5</sup>M of either JZL184 or MJN110, followed by overnight treatment with P4 at 10<sup>-7</sup>M. Oocyte maturation was measured as % GVBD normalized to control oocytes (treated with vehicle) (mean + SEM; n = 2 independent female frogs for each compound).

      (2) Figure 4A showed different ct values of ODC between Oocytes and spleen, please explain them in the text. There is not any description regarding spleen information in Figure 4A, please make it clear in the text.

      We thank the Reviewer for this recommendation. The text was revised accordingly.

      (3) For Figures 3A, E, and I, there are different concentration settings for comparing the activity, is it possible to get the curves based on the same set of concentrations? The concentration gradient didn't include higher concentration points in these figures, thus the related values are incorrect. Please set more concentration points to improve the figures. And for the error bar, there are different display formats like Figure 4c and 4d, etc. Please uniform the format for all the figures. Additionally, for the ctrl. or veh., please add an error bar for all figures.

      Some of the drugs tested were toxic to oocytes at high concentrations so the dose response was adjusted accordingly. The graphs were plotted to encompass the entire tested dose response. We could have plotted the data on the same x-axis range but that would make the figures uneven and awkward.

      We are not clear what the Reviewer means by “The concentration gradient didn't include higher concentration points in these figures, thus the related values are incorrect.”

      The error bars for all dose responses are consistent throughout all the Figures. They are different from those on bar graphs to improve clarity. If the Reviewer wishes to have the error bars on the bar graphs and dose response the same, we are happy to do so. 

      For the inhibitor studies the data were normalized on a per frog basis to control for variability in the maturation rate in response to P4, which varies from frog to frog. It is thus not possible to add error bars for the controls.
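
      The reason per-frog normalization leaves no error bar on the control can be seen in a short sketch. The % GVBD values below are hypothetical and the function names are illustrative, not taken from the manuscript: once each frog's data are divided by that frog's own control, the control is 100% for every frog by construction, so its spread across frogs is zero.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical raw % GVBD per donor frog: (vehicle control, inhibitor-treated).
raw_gvbd = {
    "frog_1": (90.0, 45.0),
    "frog_2": (60.0, 24.0),
    "frog_3": (75.0, 30.0),
}


def normalize_per_frog(raw):
    """Express each condition as a percentage of the same frog's control."""
    control = [100.0 * ctrl / ctrl for ctrl, _ in raw.values()]
    treated = [100.0 * trt / ctrl for ctrl, trt in raw.values()]
    return control, treated


def sem(values):
    """Standard error of the mean across frogs."""
    return stdev(values) / sqrt(len(values))


control, treated = normalize_per_frog(raw_gvbd)
print("control:", mean(control), "SEM:", sem(control))
print("treated:", mean(treated), "SEM:", sem(treated))
```

      The frog-to-frog variability in the raw control values (90, 60 and 75 in this made-up example) is absorbed by the normalization and is carried only by the treated conditions.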

      (4) Please check the sentence "However, the concentration of HA130...... higher that......'; Change "IC50" to "IC<sub>50</sub>" in the text and tables. Table 1 lists IC50 values in the literature, but the references are not cited. Please include the references properly. For the IC50 value obtained in the research, please include the standard deviation in the table. For reference parts, Ref 1, 27, 32, 46, double-check the title format.

      We edited the sentence as follows to make it clearer: “However, this inhibition of maturation required high concentrations of HA130 -at least 3 orders of magnitude higher than the reported HA130 IC<sub>50</sub>-…”

      We changed IC50 to subscript in Table 1.

      We added the relevant references in Table 1 to provide context for the cited IC50 values for the different inhibitors used.

      We added SEM to the IC<sub>50</sub> for inhibition of oocyte maturation values in Table 1.

      We checked the titles on the mentioned references and cannot identify any problems.

      References

      Cui, Y., Prokin, I., Xu, H., Delord, B., Genet, S., Venance, L., and Berry, H. (2016). Endocannabinoid dynamics gate spike-timing dependent depression and potentiation. eLife 5, e13185.

      Nader, N., Dib, M., Hodeify, R., Courjaret, R., Elmi, A., Hammad, A.S., Dey, R., Huang, X.Y., and Machaca, K. (2020). Membrane progesterone receptor induces meiosis in Xenopus oocytes through endocytosis into signaling endosomes and interaction with APPL1 and Akt2. PLoS Biol 18, e3000901.

      Smith, M., Wilson, R., O'Brien, S., Tufarelli, C., Anderson, S.I., and O'Sullivan, S.E. (2015). The Effects of the Endocannabinoids Anandamide and 2-Arachidonoylglycerol on Human Osteoblast Proliferation and Differentiation. PloS one 10, e0136546.

    1. eLife Assessment

      This work is of fundamental significance to the field of nervous system development as it advances our mechanistic understanding of axon guidance. The rigorous biochemical and genetic approaches are compelling, experiments are well-controlled, and the major claims are supported by convincing data. The study should be of general interest to the developmental neurobiology community.

    2. Reviewer #1 (Public review):

      Summary:

      This study is focused on an important aspect of axon guidance at the central nervous system (CNS) midline: how neurons extend axons that either do or do not cross the CNS midline. The authors here address contradictory work in the field relating to how cell surface expression of the slit receptor Robo1 is regulated so as to generate crossed and non-crossed axon trajectories during Drosophila neural development. They use fly genetics, cell lines, and biochemical assessments to define a complex consisting of the commissureless, Nedd4 and Robo1 proteins necessary for regulating Robo1 protein expression. This work resolves certain remaining questions in the field regarding midline axon guidance, with strengths outweighing weaknesses; however, addressing some of these weaknesses would strengthen this study.

      Strengths:

      Strengths include:

      - The use of well controlled genetic gain-of-function (over expression) approaches in vivo in Drosophila to show that phosphorylation sites (there are 2, and this study allows for assessment of the contributions made by each) in the commissureless (Comm) protein are indeed required for Comm function with respect to regulating axon midline guidance via their role in directing Comm-mediated Robo1 ubiquitination and degradation in the lysosome.

      - The demonstration that in vitro, and in a sensitized genetic background in vivo, the Nedd4 ubiquitin ligase regulates Robo1 protein cell surface distribution and also midline axon crossing in vivo.

      - Important evidence here that serves to resolve many questions raised by previous studies (not from these authors) regarding how Robo1 is regulated by Comm and Nedd4 family ubiquitin ligases. Further, these results are likely to have implications for thinking about the regulation of midline guidance in more complex nervous systems.

      Weaknesses:

      - A weakness beyond the purview of revision but important to mention is that the authors chose not to complement their GOF experiments with gene editing approaches to generate endogenous PY mutant alleles of Comm that might have been useful in genetic interaction experiments directed toward revealing roles for endogenous Comm in the regulation of Robo1.

      Comments on revised version:

      In this revised manuscript the authors provide new experiments and also reasonable explanations to address concerns raised in the initial review. I am satisfied that these efforts address satisfactorily the points raised in the initial review and that this study has been strengthened. This is an interesting body of work that adds to our understanding of CNS midline guidance molecular mechanisms.

    3. Reviewer #2 (Public review):

      Summary:

      Sullivan and Bashaw delve into the mechanisms that drive neural circuit assembly, and specifically, into the regulation of cell surface proteins that mediate axon pathfinding. During nervous system development, axons must traverse a molecularly and physically complex extracellular milieu to reach their synaptic targets. A fundamental, conserved repulsive signaling pathway is initiated by the Slit-Robo ligand-receptor pair. Robo, expressed on axon growth cones, binds Slit, secreted by midline cells, to prevent "pre-crossing" and "re-crossing" of axons at the midline. To control this repulsion, Robo surface levels are tightly regulated. In Drosophila, Commissureless (Comm) downregulates Robo surface levels and is required for axon crossing at the midline. Several studies suggest that PY motifs in Comm are required to localize Robo to endosomes. PY motifs have been shown to bind WW-domain containing proteins including the ubiquitin ligase Nedd4 family, so the authors propose that Comm may regulate Robo through Nedd4 interactions. Previous studies have hinted at a role for Nedd4-mediated ubiquitination of Comm in regulation of Robo localization, but there have also been conflicting data. For example, Comm mutants that are unable to be ubiquitinated mimic wild-type Comm, suggesting that ubiquitination of Comm is not required for regulation of Robo. The current study utilizes a suite of genetic analyses in Drosophila to resolve discrepancies pertaining to the mode of Comm-dependent regulation of Robo1 and proposes that Comm acts as an adapter for the Nedd4 ubiquitin ligase to recognize Robo1 as a substrate. The authors also demonstrate that Nedd4 is indeed required for midline crossing.

      Strengths:

      While this work is more incremental than field-shifting, it is nonetheless an excellent example of a rigorous, thorough analysis that culminates in enriching our mechanistic understanding of how neurons regulate cell-surface receptors in a spatiotemporal manner to control fundamental steps of circuit wiring. The experimental approach is thorough, and the manuscript is extremely well-written.

      Weaknesses:

      Some key experiments (e.g. complex formation) were performed in cell culture in an overexpression background. However, updated experiments demonstrated complex formation using immunoprecipitation in tissues overexpressing the corresponding components. Also, there was a missed opportunity to bolster the model proposed by using Comm PY mutants in several experiments.

      Comments on revised version:

      The revised manuscript bolsters the authors' conclusions and now provides evidence for interactions in tissue. No additional experiments are needed.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Response to Editor and Reviewer Comments:

      Many thanks to the editor and reviewers for the thoughtful assessment of our manuscript “Commissureless acts as a substrate adapter in a conserved Nedd4 E3 ubiquitin ligase pathway to promote axon growth across the midline.” Thank you also for the positive comments about the quality of our writing, and for deeming our study rigorous and thorough. We are very pleased that, overall, you believe our combination of genetic and biochemical approaches offers useful insight into the mechanism of Robo regulation at the Drosophila embryonic midline and effectively reconciles the contradictory findings of previous studies done in this field.

      Response to the previous Public Reviews:

      We appreciate the concerns expressed by the reviewers and the suggestions of areas in which the study and manuscript could be improved. The reviewer suggestions were very helpful as we revised our manuscript in order to strengthen our mechanistic understanding of Robo downregulation and better characterize the role Nedd4 plays in this process. We strongly agree with Reviewer 1 that our insight into the mechanism of Robo downregulation via Comm would be much stronger had we not solely relied on overexpression experiments to investigate the effects of PY motif mutations on Comm function. While it is outside the scope of this particular paper, we appreciate your suggestion to use gene editing to investigate the role of PY motif mutation on endogenous comm function and believe this would be a useful question to address in future papers. In addition to this concern, both reviewers identified additional opportunities to strengthen the paper. We have done our best to incorporate reviewer suggestions and will outline how we addressed the following four areas that were identified by both reviewers as areas where additional data could strengthen our conclusions:

      (1) Additional experiments to examine Comm and Robo1 localization in vivo: Characterizing Robo localization in vivo when co-expressed with PY-mutant Comm variants.

      (2) Testing biochemical interactions in embryonic protein extracts: Examining the biochemical interaction between Robo, Comm, and Nedd4 in a more biologically relevant context than cell culture.

      (3) Additional genetic interaction experiments: A) Investigating whether Nedd4 overexpression enhances the Comm G.O.F phenotype of enhanced ectopic crossing. B) Testing for additional genetic interactions with comm.

      (4) Editing the text of the manuscript for clarity.

      (1) Characterizing Robo localization in vivo when co-expressed with Comm variants.

      In the first version of our manuscript, we characterized the localization of wild-type and PY mutant Comm variants expressed in apterous neurons (Figure 5C), but did not examine how these variants of Comm affected localization of their cargo Robo1. To address this gap, we co-expressed 10X UAS Comm-myc (WT, 1PY, 2PY) with 10X UAS Robo-HA under the ap gal4 driver, visualized Comm and Robo by immunostaining for Myc and HA, and measured colocalization between Comm and Robo. We found that Robo colocalizes equally with all Comm variants and that its expression pattern mimics that of the Comm variant with which it is expressed. We observe that Robo is restricted to cell bodies when overexpressed with WT Comm but “leaks out” into axons when co-expressed with Comm 1PY or 2PY. This finding suggests that PY motifs are required not only for effective Comm localization to the appropriate cellular areas, but also for proper routing of its cargo, Robo1. These new data are presented in a new supplemental figure: Figure S3.

      (2) Examining the biochemical interaction between Robo, Comm, and Nedd4 in vivo.

      To examine biochemical interaction between Comm, Robo, and Nedd4 in a more biologically relevant context, we performed immunoprecipitations in fly embryonic lysate prepared from the following categories: WT, elav gal4: 5X UAS Comm-myc WT, and elav gal4: 5X UAS Comm-myc WT + 10X UAS Nedd4-HA. We performed immunoprecipitation for myc (Comm), and blotted for endogenous Robo, Myc (Comm), and HA (Nedd4). Corroborating our results in cell culture (Figure 7 A-C), we were able to pull down a three-protein complex consisting of Comm, Nedd4 and Robo in embryonic fly tissue. These new data are presented in a new supplemental figure: Figure S8.

      (3) Investigating additional genetic interactions between Comm and Nedd4.

      A) In our submitted manuscript, we demonstrated that overexpression of Nedd4 enhances Comm-induced downregulation of Robo levels (Figure 7 D-G). To determine whether Nedd4 also increases ectopic crossing, which is a morphological output of Comm activity/Robo downregulation, we analyzed nerve cord phenotypes in embryos from the following categories: WT, embryos expressing WT Comm under the elav gal4, and embryos co-expressing WT Comm and Nedd4 under the elav gal4 driver. We measured nerve cord widths and sorted them into three different “bins” of phenotypic severity, with more severe phenotypes being characterized by thinner nerve cords. We find that the distribution of phenotypes in embryos expressing Comm alone differs significantly from embryos expressing Comm + Nedd4, with the latter shifted toward more severe/thinner phenotypic classes. In addition to examining nerve cord width, we investigated whether Nedd4 can enhance collapse of the nerve cord segments (defined by loss of negative space within the segment) induced by Comm overexpression. We determined percentage of collapsed nerve cord segments and divided these values into three phenotypic classes: no collapse, partial collapse, and total collapse. The distribution of phenotypes in embryos co-expressing Nedd4 and Comm differs significantly from those expressing Comm alone. In the Comm expressing population, we only observe nerve cords with no or partial collapse, but in flies co-expressing Comm and Nedd4 we observe the more severe complete collapse phenotype. These findings suggest that addition of Nedd4 enhances the Comm gain of function phenotype both by further reducing nerve cord width and increasing the occurrence of defects related to ectopic crossing. These new data are presented in a new supplemental figure: Figure S9.

      B) The reviewers also suggested additional genetic interaction experiments between Nedd4 and Comm. It was suggested that we include experiments to look at Nedd4 manipulations in Comm null mutant backgrounds. However, given the complete penetrance and expressivity of the Comm null mutation, in which no axons cross the midline, these experiments would not be informative. As an alternative, we attempted to use the described hypomorphic Comm allele, but here too, the baseline commissural axon guidance defects are too strong to allow meaningful detection of enhanced phenotypes. Finally, we tested whether removing one copy of comm could reveal phenotypes in the nedd4 zygotic mutants, but we did not detect defects. This is perhaps unsurprising given that comm heterozygotes have no detectable midline crossing defects.
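
      The phenotype binning described in (3A) can be illustrated with a minimal sketch. This is not the authors' analysis code: the class cutoffs, the class names, the example widths, and the choice of a Pearson chi-square statistic are all assumptions made for illustration, since the response does not state the thresholds or the statistical test used.

```python
from collections import Counter

# Hypothetical cutoffs (arbitrary units); the response does not give the real ones.
BIN_EDGES = (10.0, 20.0)
CLASSES = ("severe", "intermediate", "mild")


def classify_width(width):
    """Assign a nerve cord width to a class; thinner cords count as more severe."""
    if width < BIN_EDGES[0]:
        return "severe"
    if width < BIN_EDGES[1]:
        return "intermediate"
    return "mild"


def class_counts(widths):
    """Count embryos per phenotypic class, in the fixed order given by CLASSES."""
    counts = Counter(classify_width(w) for w in widths)
    return [counts[c] for c in CLASSES]


def chi_square_statistic(row_a, row_b):
    """Pearson chi-square statistic for a 2 x k table of class counts."""
    total = sum(row_a) + sum(row_b)
    stat = 0.0
    for row in (row_a, row_b):
        row_total = sum(row)
        for j, observed in enumerate(row):
            col_total = row_a[j] + row_b[j]
            expected = row_total * col_total / total
            if expected > 0:  # skip classes that are empty in both genotypes
                stat += (observed - expected) ** 2 / expected
    return stat


# Made-up widths, for illustration only.
comm_only = [22.0, 18.5, 25.1, 17.0, 21.3, 19.8]
comm_plus_nedd4 = [9.2, 14.1, 8.7, 16.0, 7.5, 12.3]

counts_comm = class_counts(comm_only)
counts_both = class_counts(comm_plus_nedd4)
print(counts_comm, counts_both, chi_square_statistic(counts_comm, counts_both))
```

      With counts as small as in this toy example, an exact test would usually be preferred over a chi-square approximation; the sketch only shows how binned distributions from two genotypes can be compared.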

      (4) Text edits.

      We have made a variety of changes to decrease ambiguity in the text and create a more user-friendly experience for the reader. In the text, as opposed to just the figures, we now explicitly state whether we use 5X or 10X UAS constructs for each of our overexpression constructs. We also edited all mentions of the truncated frazzled construct (FraDc) so that they are uniform. We have also edited all mentions of MiMIC so that they are uniform. In addition, we answer a few questions the reviewers posed. First, we clarify that S2R+ cells express endogenous Comm at very low levels. In addition, we clarify how we know that expression levels are similar across the three Comm variants by explaining that the transgenes were incorporated into the fly genome by targeted insertion into the same location on the third chromosome.

      We hope that these changes adequately address reviewer concerns, strengthen our study, and enhance readability of the paper. We appreciate the time you took to evaluate our manuscript and the thoughtful commentary and suggestions that you provided.

    1. eLife Assessment

      The paper presents a new behavioral assay for Drosophila aggression and demonstrates that social experience influences fighting strategies, with group-housed males favoring high-intensity but low-frequency tussling over aggressive lunging observed in isolated males. This paper is valuable for researchers studying Drosophila social behaviors, while the characterization of behavioral and neuroanatomical data is incomplete.

    2. Reviewer #1 (Public review):

      This work addresses an important question in the field of Drosophila aggression and mating: prior social isolation is known to increase aggression in males by increased lunging, which is suppressed by group housing (GH). However, it is also known that single-housed (SH) males, despite their higher attempts to court females, are less successful. Here, Gao et al. developed a modified aggression assay to address this issue by recording aggression in Drosophila males for 2 hours, over a virgin female which is immobilized by burying its head in the food. They found that while SH males frequently lunge in this assay, GH males switch to higher intensity but very low-frequency tussling. Constitutive neuronal silencing and activation experiments implicate cVA-sensing Or67d neurons in promoting high-frequency lunging, similar to earlier studies, whereas Or47b neurons promote low-frequency but higher intensity tussling. Using optogenetic activation, they found that three pairs of pC1 neurons (pC1SS2) increase tussling. While P1a neurons, previously implicated in promoting aggression and courtship, did not increase tussling in optogenetic activation (in the dark), they could promote aggressive tussling in thermogenetic activation carried out in the presence of visible light. It was further suggested, using a further modified aggression assay, that GH males use increased tussling and are able to maintain territorial control, providing them with a mating advantage over SH males, and that this may partially overcome the effect of aging in GH males.

      Strengths:

      Using a series of clever neurogenetic and behavioral approaches, subsets of ORNs and pC1 neurons were implicated in promoting tussling behaviors. The authors devised a new paradigm to assay for territory control which appears better than earlier paradigms that used a food cup (Chen et al, 2002), as this new assay is relatively clutter-free, and can be eventually automated using computer vision approaches. The manuscript is generally well-written, and the claims made are largely supported by the data.

      Weaknesses:

      I have a few concerns regarding some of the evidence presented and claims made as well as a description of the methodology, which needs to be clarified and extended further.

      (1) Typical paradigms for assaying aggression in Drosophila males last for 20-30 minutes in the presence of nutritious food/yeast paste/females or all of these (Chen et al. 2002, Nilsen et al., 2004, Dierick et al. 2007, Dankert et al., 2009, Certel & Kravitz 2012). The paradigm described in Figure 1 A, while important and more amenable for video recording and computational analysis, seems a modification of the assay from Kravitz lab (Chen et al., 2002), which involved using a female over which males fight on a food cup. The modifications include a flat surface with a central food patch and a female with its head buried in the food, (fixed female) and much longer adaptation and recording times respectively (30 minutes, 2 hours), so in that sense, this is not a 'new' paradigm but a modification of an existing paradigm and its description as new should be appropriately toned down. It would also be important to cite these earlier studies appropriately while describing the assay.

      (2) Lunging is described as a 'low intensity' aggression (line 111 and associated text), however, it is considered a mid to high-intensity aggressive behavior, as compared to other lower-intensity behaviors such as wing flicks, chase, and fencing. Lunging therefore is lower in intensity 'relative' to higher intensity tussling but not in absolute terms and it should be mentioned clearly.

      (3) It is often difficult to distinguish faithfully between boxing and tussling, and therefore these behaviors are often grouped together as 'box, tussle', as in the Markov chain analysis of Nilsen et al., 2004 and in a more detailed recent study of male aggression (Simon & Heberlein, 2020). Therefore, the authors can either reconsider the description of the behavior as 'box, tussle' or consider providing a video representation/computational classifier to distinguish between box and tussle behaviors.

      (4) Simon & Heberlein, 2020 showed that increased boxing & tussling precede the formation of a dominance hierarchy in males, and lunges are used subsequently to maintain this dominant status. This study should be cited and discussed appropriately while introducing the paradigm.

      (5) It would be helpful to provide more methodological details about the assay. For instance, a video showing how the males are introduced into the assay chamber would be helpful. Are they simply dropped to the floor when the film is removed after 30 minutes (Figures 1-2)?

      (6) The strain of Canton-S (CS) flies used should be mentioned as different strains of CS can have varying levels of aggression, for instance, CS from Martin Heisenberg lab shows very high levels of aggressive lunges. Are the CS lines used in this study isogenized? Are various genetic lines outcrossed into this CS background? In the methods, it is not clear how the white gene levels were controlled for various aggression experiments as it is known to affect aggression (Hoyer et al. 2008).

      (7) How important is it to use a fixed female for the assay to induce tussling? Do these females remain active throughout the assay period of 2.5 hours? Is it possible to use decapitated virgin females for the assay? How will that affect male behaviors?

      (8) Raster plots in Figure 2 suggest a complete lack of tussling in SH males in the first 60 minutes of the encounter, which is surprising given the longer duration of the assay as compared to earlier studies (Nilsen et al. 2004, Simon & Heberlein, 2020 and others), which are able to pick up tussling in a shorter duration of recording time. Also, the duration for tussling is much longer in this study as compared to shorter tussles shown by earlier studies. Is this due to differences in the paradigm used, strain of flies, or some other factor? While the bar plots in Figure 2D show some tussling in SH males, maybe an analysis of raster plots of various videos can be provided in the main text and included as a supplementary figure to address this.

      (9) Neuronal activation experiments suggesting the involvement of pC1SS2 neurons are quite interesting. Further, P1a neurons were shown to increase tussling under thermogenetic activation in the presence of light (Figure 4, Supplement 1), which is quite important because the role of vision in optogenetic activation experiments, which must be carried out in the dark, is often not mentioned. However, in the discussion (lines 309-310) it is mentioned that pC1SS2 neurons are 'necessary and sufficient' for inducing tussling. Given that P1a neurons were shown to be involved in promoting tussling, this statement should be toned down.

      (10) Are Or47b neurons connected to pC1SS2 or P1a neurons?

      (11) The paradigm for territory control is quite interesting and subsequent mating advantage experiments are an important addition to the eventual outcome of the aggressive strategy deployed by the males as per their prior housing conditions. It would be important to comment on the 'fitness outcome' of these encounters. For instance, is there any fitness advantage of using tussling by GH males as compared to lunging by SH males? The authors may consider analyzing the number of eggs laid and eclosed progenies from these encounters to address this.

    3. Reviewer #2 (Public review):

      Summary:

      Gao et al. investigated how social experience changes aggression strategies, and the biological significance of this change, using Drosophila. Two modes of inter-male aggression in Drosophila are known: lunging, a high-frequency but weak mode, and tussling, a low-frequency but more vigorous mode. Previous studies have mainly focused on lunging. In this paper, the authors developed a new behavioral experiment system for observing tussling behavior and found that tussling is enhanced by group rearing while lunging is suppressed. They then searched for neurons involved in the generation of tussling. Although neurons expressing the olfactory receptors Or67d and Or65a have previously been reported to function in the control of lunging, the authors found that these neurons do not function in the execution of tussling, and that another olfactory receptor, Or47b, is required for tussling, as shown by the inhibition of neuronal activity and the gene knockdown experiments. Further optogenetic experiments identified a small number of central neurons, pC1[SS2], that specifically induce tussling. To further explore the ecological significance of the aggression mode change in group rearing, a new behavioral experiment was performed to examine territorial control and mating competition. Finally, the authors found that differences in social experience (group vs. solitary rearing) are important in these biologically significant competitions. These results add a new perspective to the study of aggressive behavior in Drosophila. Furthermore, this study proposes an interesting general model in which behavioral changes modified by social experience play a role in reproductive success.

      Strengths:

      A behavioral experiment system that allows stable observation of tussling, which could not be easily analyzed due to its low frequency, would be very useful. The experimental setup itself is relatively simple, just the addition of a female to the platform, so it should be applicable to future research. The finding about the relationship between the social experience and the aggression mode change is quite novel. Although the intensity of aggression changes with the social experience was already reported in several papers (Liu et al., 2011, etc), the fact that the behavioral mode itself changes significantly has rarely been addressed and is extremely interesting. The identification of sensory and central neurons required for the tussling makes appropriate use of the genetic tools and the results are clear. A major strength of the neurobiology in this study is the finding that another group of neurons (Or47b-expressing olfactory neurons and pC1[SS2] neurons), distinct from the group of neurons previously thought to be involved in low-intensity aggression (i.e. lunging), function in the tussling behavior. Further investigation of the detailed circuit analysis is expected to elucidate the neural substrate of the conflict between the two aggression modes.

      Weaknesses:

      The experimental systems examining the territory control and the reproductive competition in Figure 5 are novel and have advantages in exploring their biological significance. However, at this stage, the authors' claim is weak since they only show the effects of age and social experience on territorial and mating behaviors, but do not experimentally demonstrate the influence of aggression mode change itself. In the Abstract, the authors state that these findings reveal how social experience shapes fighting strategies to optimize reproductive success. This is the most important perspective of the present study, and it would be necessary to show directly that the change of aggression mode by social experience contributes to reproductive success.

      In addition, a detailed description of the tussling is lacking. For example, the authors state that the tussling is less frequent but more vigorous than lunging, but while experimental data are presented on the frequency, the intensity seems to be subjective. The intensity is certainly clear from the supplementary video, but it would be necessary to evaluate the intensity itself using some index. Another problem is that there is no clear explanation of how to determine the tussling. A detailed method is required for the reproducibility of the experiment.

    4. Reviewer #3 (Public review):

      In this manuscript, Gao et al. presented a series of intriguing data that collectively suggest that tussling, a form of high-intensity fighting among male fruit flies (Drosophila melanogaster) has a unique function and is controlled by a dedicated neural circuit. Based on the results of behavioral assays, they argue that increased tussling among socially experienced males promotes access to resources. They also concluded that tussling is controlled by a class of olfactory sensory neurons and sexually dimorphic central neurons that are distinct from pathways known to control lunges, a common male-type attack behavior.

      A major strength of this work is that it is the first attempt to characterize the behavioral function and neural circuit associated with Drosophila tussling. Many animal species use both low-intensity and high-intensity tactics to resolve conflicts. High-intensity tactics are mostly reserved for escalated fights, which are relatively rare. Because of this, tussling in the flies, like high-intensity fights in other animal species, has not been systematically investigated. Previous studies on fly aggressive behavior have often used socially isolated, relatively young flies within a short observation duration. Their discoveries that 1) older (14-days-old) flies tend to tussle more often than younger (2-days-old) flies, 2) group-reared flies tend to tussle more often than socially isolated flies, and 3) flies tend to tussle at a later stage (mostly ~15 minutes after the onset of fighting) are the result of their creativity in looking outside of conventional experimental settings. These new findings are key to quantitatively characterizing this interesting yet under-studied behavior.

      Precisely because their initial approach was creative, it is regrettable that the authors missed the opportunity to effectively integrate preceding studies in their rationale or conclusions, which sometimes led to premature claims. Also, while each experiment contains an intriguing finding, these are poorly related to each other. This obscures the central conclusion of this work. The perceived weaknesses are discussed in detail below.

      Most importantly, the authors' definition of "tussling" is unclear because they did not explain how they quantified lunges and tussling, even though the central focus of the manuscript is behavior. Supplemental movies S1 and S2 appear to include "tussling" bouts in which 2 flies lunge at each other in rapid succession, and supplemental movie S3 appears to include bouts of "holding", in which one fly holds the opponent's wings and shakes vigorously. These cases raise a concern that their behavior classification is arbitrary. Specifically, lunges and tussling should be objectively distinguished because one of their conclusions is that these two actions are controlled by separate neural circuits. It is impossible to evaluate the credibility of their behavioral data without clearly describing a criterion of each behavior.

      It is also confusing that the authors completely skipped the characterization of the tussling-controlling neurons they claimed to have identified. These neurons (a subset of so-called pC1 neurons labeled by previously described split-GAL4 line pC1SS2) are central to this manuscript, but the only information the authors have provided is its gross morphology in a low-resolution image (Figure 4D, E) and a statement that "only 3 pairs of pC1SS2 neurons whose function is both necessary and sufficient for inducing tussling in males" (lines 310-311). The evidence that supports this claim isn't provided. The expression pattern of pC1SS2 neurons in males has been only briefly described in reference 46. It is possible that these neurons overlap with previously characterized dsx+ and/or fru+ neurons that are important for male aggressions (measured by lunges), such as in Koganezawa et al., Curr. Biol. 2016 and Chiu et al., Cell 2020. This adds to the concern that lunge and tussling are not as clearly separated as the authors claim.

      While their characterizations of tussling behaviors in wild-type males (Figures 1 and 2) are intriguing, the remaining data have little link with each other, making it difficult to understand what their main conclusion is. Figure 3 suggests that one class of olfactory sensory neurons (OSN) that express Or47b is necessary for tussling behavior. While the authors acknowledged that Or47b-expressing OSNs promote male courtship toward females presumably by detecting cuticular compounds, they provided little discussion on how a class of OSN can promote two different types of innate behavior. No evidence of a functional or circuitry relationship between the Or47b pathway and the pC1SS2 neurons was provided. It is unclear how these two components are relevant to each other. Lastly, the rationale of the experiment in Figure 5 and the interpretation of the results is confusing. The authors attributed a higher mating success rate of older, socially experienced males over younger, socially isolated males to their tendency to tussle, but tussling cannot happen when one of the two flies is not engaged. If, for instance, a socially isolated 14-day-old male does not engage in tussling as indicated in Figure 2, how can they tussle with a group-housed 14-day-old male? Because aggressive interactions in Figure 5 were not quantified, it is impossible to conclude that tussling plays a role in copulation advantage among pairs as authors argue (lines 282-288).

      Despite these weaknesses, it is important to acknowledge the authors' courage to initiate an investigation into a less characterized, high-intensity fighting behavior. Tussling requires the simultaneous engagement of two flies. Even if there is confusion over the distinction between lunges and tussling, the authors' conclusion that socially experienced flies and socially isolated flies employ distinct fighting strategies is convincing. Questions that require more rigorous studies are 1) whether such differences are encoded by separate circuits, and 2) whether the different fighting strategies are causally responsible for gaining ethologically relevant resources among socially experienced flies. Enhanced transparency of behavioral data will help readers understand the impact of this study. Lastly, the manuscript often mentions previous works and results without citing relevant references. For readers to grasp the context of this work, it is important to provide information about methods, reagents, and other key resources.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This work addresses an important question in the field of Drosophila aggression and mating: prior social isolation is known to increase aggression in males by increased lunging, which is suppressed by group housing (GH). However, it is also known that single-housed (SH) males, despite their higher attempts to court females, are less successful. Here, Gao et al. developed a modified aggression assay to address this issue by recording aggression in Drosophila males for 2 hours, over a virgin female which is immobilized by burying its head in the food. They found that while SH males frequently lunge in this assay, GH males switch to higher intensity but very low-frequency tussling. Constitutive neuronal silencing and activation experiments implicate cVA-sensing Or67d neurons in promoting high-frequency lunging, similar to earlier studies, whereas Or47b neurons promote low-frequency but higher intensity tussling. Using optogenetic activation, they found that three pairs of pC1 neurons (pC1SS2) increase tussling. While P1a neurons, previously implicated in promoting aggression and courtship, did not increase tussling in optogenetic activation (in the dark), they could promote aggressive tussling in thermogenetic activation carried out in the presence of visible light. It was further suggested, using a further modified aggression assay, that GH males use increased tussling and are able to maintain territorial control, providing them with a mating advantage over SH males, and that this may partially overcome the effect of aging in GH males.

      Strengths:

      Using a series of clever neurogenetic and behavioral approaches, subsets of ORNs and pC1 neurons were implicated in promoting tussling behaviors. The authors devised a new paradigm to assay for territory control which appears better than earlier paradigms that used a food cup (Chen et al, 2002), as this new assay is relatively clutter-free, and can be eventually automated using computer vision approaches. The manuscript is generally well-written, and the claims made are largely supported by the data.

      Thank you for your precise summary of our study and for your positive assessment of its novelty and significance.

      Weaknesses:

      I have a few concerns regarding some of the evidence presented and claims made as well as a description of the methodology, which needs to be clarified and extended further.

      (1) Typical paradigms for assaying aggression in Drosophila males last for 20-30 minutes in the presence of nutritious food/yeast paste/females or all of these (Chen et al. 2002, Nilsen et al., 2004, Dierick et al. 2007, Dankert et al., 2009, Certel & Kravitz 2012). The paradigm described in Figure 1 A, while important and more amenable for video recording and computational analysis, seems a modification of the assay from Kravitz lab (Chen et al., 2002), which involved using a female over which males fight on a food cup. The modifications include a flat surface with a central food patch and a female with its head buried in the food, (fixed female) and much longer adaptation and recording times respectively (30 minutes, 2 hours), so in that sense, this is not a 'new' paradigm but a modification of an existing paradigm and its description as new should be appropriately toned down. It would also be important to cite these earlier studies appropriately while describing the assay.

      We will tone down the description and cite related references.

      (2) Lunging is described as a 'low intensity' aggression (line 111 and associated text), however, it is considered a mid to high-intensity aggressive behavior, as compared to other lower-intensity behaviors such as wing flicks, chase, and fencing. Lunging therefore is lower in intensity 'relative' to higher intensity tussling but not in absolute terms and it should be mentioned clearly.

      We will textually address this issue.

      (3) It is often difficult to distinguish faithfully between boxing and tussling, and therefore these behaviors are often grouped together as 'box, tussle', as in the Markov chain analysis of Nilsen et al., 2004 and in a more detailed recent study of male aggression (Simon & Heberlein, 2020). Therefore, the authors can either reconsider the description of the behavior as 'box, tussle' or consider providing a video representation/computational classifier to distinguish between box and tussle behaviors.

      We will textually address this issue.

      (4) Simon & Heberlein, 2020 showed that increased boxing & tussling precede the formation of a dominance hierarchy in males, and lunges are used subsequently to maintain this dominant status. This study should be cited and discussed appropriately while introducing the paradigm.

      We will cite this paper and discuss this issue.

      (5) It would be helpful to provide more methodological details about the assay. For instance, a video showing how the males are introduced into the assay chamber would be helpful. Are they simply dropped to the floor when the film is removed after 30 minutes (Figures 1-2)?

      We will provide more methodological details.

      (6) The strain of Canton-S (CS) flies used should be mentioned as different strains of CS can have varying levels of aggression, for instance, CS from Martin Heisenberg lab shows very high levels of aggressive lunges. Are the CS lines used in this study isogenized? Are various genetic lines outcrossed into this CS background? In the methods, it is not clear how the white gene levels were controlled for various aggression experiments as it is known to affect aggression (Hoyer et al. 2008).

      We will textually address this issue.

      (7) How important is it to use a fixed female for the assay to induce tussling? Do these females remain active throughout the assay period of 2.5 hours? Is it possible to use decapitated virgin females for the assay? How will that affect male behaviors?

      We will textually address this issue and provide additional videos.

      (8) Raster plots in Figure 2 suggest a complete lack of tussling in SH males in the first 60 minutes of the encounter, which is surprising given the longer duration of the assay as compared to earlier studies (Nielsen et al. 2004, Simon & Heberlein, 2020 and others), which are able to pick up tussling in a shorter duration of recording time. Also, the duration for tussling is much longer in this study as compared to shorter tussles shown by earlier studies. Is this due to differences in the paradigm used, strain of flies, or some other factor? While the bar plots in Figure 2D show some tussling in SH males, maybe an analysis of raster plots of various videos can be provided in the main text and included as a supplementary figure to address this.

      We will textually address the first question and provide more detailed analysis for the second question.

      (9) Neuronal activation experiments suggesting the involvement of pC1SS2 neurons are quite interesting. Further, P1a neurons were shown to increase tussling upon thermogenetic activation in the presence of light (Figure 4, Supplement 1), which is quite important as the role of vision in optogenetic activation experiments, which are required to be carried out in the dark, is often not mentioned. However, in the discussion (lines 309-310) it is mentioned that pC1SS2 neurons are 'necessary and sufficient' for inducing tussling. Given that P1a neurons were shown to be involved in promoting tussling, this statement should be toned down.

      We will tone down this statement.

      (10) Are Or47b neurons connected to pC1SS2 or P1a neurons?

      We conducted pathway analysis in the FlyWire electron microscopy database to investigate the connection between Or47b neurons and pC1 neurons. The results indicate that at least three intermediate neurons are required to establish a connection from Or47b neurons to pC1 neurons. Although the FlyWire database currently only contains neuronal data from female brains, it provides a reference for circuit connectivity in males. Using the currently available upstream and downstream tracing tools (e.g., retro-/trans-Tango), it is not possible to establish a direct connection between the two. Identifying the intermediate neurons involved in this connection is beyond the scope of this study. We will discuss this concern in our revised manuscript.

      (11) The paradigm for territory control is quite interesting and subsequent mating advantage experiments are an important addition to the eventual outcome of the aggressive strategy deployed by the males as per their prior housing conditions. It would be important to comment on the 'fitness outcome' of these encounters. For instance, is there any fitness advantage of using tussling by GH males as compared to lunging by SH males? The authors may consider analyzing the number of eggs laid and eclosed progenies from these encounters to address this.

      We will discuss this concern.

      Reviewer #2 (Public review):

      Summary:

      Gao et al. investigated how social experience changes aggression strategies in Drosophila, and the biological significance of this change. Two modes of inter-male aggression in Drosophila are known: lunging, a high-frequency but weak mode, and tussling, a low-frequency but more vigorous mode. Previous studies have mainly focused on lunging. In this paper, the authors developed a new behavioral experiment system for observing tussling behavior and found that tussling is enhanced by group rearing while lunging is suppressed. They then searched for neurons involved in the generation of tussling. Although olfactory receptors named Or67d and Or65a have previously been reported to function in the control of lunging, the authors found that these neurons do not function in the execution of tussling, and another olfactory receptor, Or47b, is required for tussling, as shown by the inhibition of neuronal activity and the gene knockdown experiments. Further optogenetic experiments identified a small number of central neurons, pC1[SS2], that specifically induce tussling. In order to further explore the ecological significance of the aggression mode change in group rearing, a new behavioral experiment was performed to examine territorial control and mating competition. Finally, the authors found that differences in the social experience (group vs. solitary rearing) are important in these biologically significant competitions. These results add a new perspective to the study of aggressive behavior in Drosophila. Furthermore, this study proposes an interesting general model in which the social experience-modified behavioral changes play a role in reproductive success.

      Strengths:

      A behavioral experiment system that allows stable observation of tussling, which could not be easily analyzed due to its low frequency, would be very useful. The experimental setup itself is relatively simple, just the addition of a female to the platform, so it should be applicable to future research. The finding about the relationship between the social experience and the aggression mode change is quite novel. Although the intensity of aggression changes with the social experience was already reported in several papers (Liu et al., 2011, etc.), the fact that the behavioral mode itself changes significantly has rarely been addressed and is extremely interesting. The identification of sensory and central neurons required for the tussling makes appropriate use of the genetic tools and the results are clear. A major strength of the neurobiology in this study is the finding that another group of neurons (Or47b-expressing olfactory neurons and pC1[SS2] neurons), distinct from the group of neurons previously thought to be involved in low-intensity aggression (i.e. lunging), function in the tussling behavior. Further investigation of the detailed circuit analysis is expected to elucidate the neural substrate of the conflict between the two aggression modes.

      Thank you for the acknowledgment of the novelty and significance of the study, and your suggestions for improving the manuscript.

      Weaknesses:

      The experimental systems examining the territory control and the reproductive competition in Figure 5 are novel and have advantages in exploring their biological significance. However, at this stage, the authors' claim is weak since they only show the effects of age and social experience on territorial and mating behaviors, but do not experimentally demonstrate the influence of aggression mode change itself. In the Abstract, the authors state that these findings reveal how social experience shapes fighting strategies to optimize reproductive success. This is the most important perspective of the present study, and it would be necessary to show directly that the change of aggression mode by social experience contributes to reproductive success.

      We will either tone down this statement or provide additional analysis.

      In addition, a detailed description of the tussling is lacking. For example, the authors state that the tussling is less frequent but more vigorous than lunging, but while experimental data are presented on the frequency, the intensity seems to be subjective. The intensity is certainly clear from the supplementary video, but it would be necessary to evaluate the intensity itself using some index. Another problem is that there is no clear explanation of how to determine the tussling. A detailed method is required for the reproducibility of the experiment.

      We will provide more detailed methods and data analysis regarding tussling behavior.

      Reviewer #3 (Public review):

      In this manuscript, Gao et al. presented a series of intriguing data that collectively suggest that tussling, a form of high-intensity fighting among male fruit flies (Drosophila melanogaster) has a unique function and is controlled by a dedicated neural circuit. Based on the results of behavioral assays, they argue that increased tussling among socially experienced males promotes access to resources. They also concluded that tussling is controlled by a class of olfactory sensory neurons and sexually dimorphic central neurons that are distinct from pathways known to control lunges, a common male-type attack behavior.

      A major strength of this work is that it is the first attempt to characterize the behavioral function and neural circuit associated with Drosophila tussling. Many animal species use both low-intensity and high-intensity tactics to resolve conflicts. High-intensity tactics are mostly reserved for escalated fights, which are relatively rare. Because of this, tussling in the flies, like high-intensity fights in other animal species, has not been systematically investigated. Previous studies on fly aggressive behavior have often used socially isolated, relatively young flies within a short observation duration. Their discovery that 1) older (14-days-old) flies tend to tussle more often than younger (2-days-old) flies, 2) group-reared flies tend to tussle more often than socially isolated flies, and 3) flies tend to tussle at a later stage (mostly ~15 minutes after the onset of fighting), are the result of their creativity to look outside of conventional experimental settings. These new findings are key for quantitatively characterizing this interesting yet under-studied behavior.

      Precisely because their initial approach was creative, it is regrettable that the authors missed the opportunity to effectively integrate preceding studies in their rationale or conclusions, which sometimes led to premature claims. Also, while each experiment contains an intriguing finding, these are poorly related to each other. This obscures the central conclusion of this work. The perceived weaknesses are discussed in detail below.

      Thank you for the precise summary of the key findings and novelty of the study, and your insightful suggestions.

      Most importantly, the authors' definition of "tussling" is unclear because they did not explain how they quantified lunges and tussling, even though the central focus of the manuscript is behavior. Supplemental movies S1 and S2 appear to include "tussling" bouts in which 2 flies lunge at each other in rapid succession, and supplemental movie S3 appears to include bouts of "holding", in which one fly holds the opponent's wings and shakes vigorously. These cases raise a concern that their behavior classification is arbitrary. Specifically, lunges and tussling should be objectively distinguished because one of their conclusions is that these two actions are controlled by separate neural circuits. It is impossible to evaluate the credibility of their behavioral data without clearly describing a criterion of each behavior.

      We will add more details in methods.

      It is also confusing that the authors completely skipped the characterization of the tussling-controlling neurons they claimed to have identified. These neurons (a subset of so-called pC1 neurons labeled by previously described split-GAL4 line pC1SS2) are central to this manuscript, but the only information the authors have provided is its gross morphology in a low-resolution image (Figure 4D, E) and a statement that "only 3 pairs of pC1SS2 neurons whose function is both necessary and sufficient for inducing tussling in males" (lines 310-311). The evidence that supports this claim isn't provided. The expression pattern of pC1SS2 neurons in males has been only briefly described in reference 46. It is possible that these neurons overlap with previously characterized dsx+ and/or fru+ neurons that are important for male aggressions (measured by lunges), such as in Koganezawa et al., Curr. Biol. 2016 and Chiu et al., Cell 2020. This adds to the concern that lunge and tussling are not as clearly separated as the authors claim.

      Reply: we will perform additional morphological and functional experiments on pC1<sup>SS2</sup> neurons, e.g., whether they are fru or dsx positive and comparing them with P1a neurons.

      While their characterizations of tussling behaviors in wild-type males (Figures 1 and 2) are intriguing, the remaining data have little link with each other, making it difficult to understand what their main conclusion is. Figure 3 suggests that one class of olfactory sensory neurons (OSN) that express Or47b is necessary for tussling behavior. While the authors acknowledged that Or47b-expressing OSNs promote male courtship toward females presumably by detecting cuticular compounds, they provided little discussion on how a class of OSN can promote two different types of innate behavior. No evidence of a functional or circuitry relationship between the Or47b pathway and the pC1SS2 neurons was provided. It is unclear how these two components are relevant to each other. Lastly, the rationale of the experiment in Figure 5 and the interpretation of the results is confusing. The authors attributed a higher mating success rate of older, socially experienced males over younger, socially isolated males to their tendency to tussle, but tussling cannot happen when one of the two flies is not engaged. If, for instance, a socially isolated 14-day-old male does not engage in tussling as indicated in Figure 2, how can they tussle with a group-housed 14-day-old male? Because aggressive interactions in Figure 5 were not quantified, it is impossible to conclude that tussling plays a role in copulation advantage among pairs as authors argue (lines 282-288).

      Regarding why Or47b-expressing OSNs regulate two types of innate behaviors, we will add a discussion in the revised manuscript to explore the possible mechanisms underlying this phenomenon.

      Regarding the relationship between Or47b-expressing OSNs and pC1<sup>SS2</sup> neurons, we conducted pathway connection analyses using the FlyWire database. Although the FlyWire database currently only contains neuronal data from female brains, these findings provide a certain degree of reference. The results indicate that at least three intermediate neurons are required to establish the connection between these two neuronal types. We hope the editor and reviewers will agree with us that identifying the intermediate neurons involved in this connection is beyond the scope of this study.

      Regarding the rationale and conclusions from the experiments in Figure 5, we acknowledge the difficulty in quantifying tussling and lunging behaviors in these experiments. In the revised manuscript, we will tone down the statements about the relationship between fighting strategies and reproductive success. Additionally, we will provide further behavioral experiments to support the association between these two factors.

      Despite these weaknesses, it is important to acknowledge the authors' courage to initiate an investigation into a less characterized, high-intensity fighting behavior. Tussling requires the simultaneous engagement of two flies. Even if there is confusion over the distinction between lunges and tussling, the authors' conclusion that socially experienced flies and socially isolated flies employ distinct fighting strategies is convincing. Questions that require more rigorous studies are 1) whether such differences are encoded by separate circuits, and 2) whether the different fighting strategies are causally responsible for gaining ethologically relevant resources among socially experienced flies. Enhanced transparency of behavioral data will help readers understand the impact of this study. Lastly, the manuscript often mentions previous works and results without citing relevant references. For readers to grasp the context of this work, it is important to provide information about methods, reagents, and other key resources.

      We will add more details in methods and cite additional references. We will also perform additional experiments on pC1<sup>SS2</sup> function.

    1. eLife Assessment

      This paper proposes a new source reconstruction method for electroencephalography (EEG) data and claims that it can provide far superior spatial resolution than existing approaches and even superior spatial resolution to fMRI. This primarily stems from abandoning the established quasi-static approximation to Maxwell's equations. If verified, the potential impact of the proposed method is very high indeed, but it is currently impossible to verify because the clarity of presentation and the evidence for the claims in the current version are inadequate.

    2. Reviewer #1 (Public Review):

      The paper proposes a new source reconstruction method for electroencephalography (EEG) data and claims that it can provide far superior spatial resolution than existing approaches and also superior spatial resolution to fMRI. This primarily stems from abandoning the established quasi-static approximation to Maxwell's equations.

      The proposed method brings together some very interesting ideas, and the potential impact is high. However, the work does not provide the evaluations expected when validating a new source reconstruction approach. I cannot judge the success or impact of the approach based on the current set of results. This is very important to rectify, especially given that the work is challenging some long-standing and fundamental assumptions made in the field.

      I also find the description of the methods, and how they link to what is shown in the main results, hard to follow.

      I am insufficiently familiar with the intricacies of Maxwell's equations to assess the validity of the assumptions and the equations being used by WETCOW. The work therefore needs assessing by someone more versed in that area. That said, how do we know that the new terms in Maxwell's equations, i.e. the time-dependent terms that are normally missing from established quasi-static-based approaches, are large enough to need to be considered? Where is the evidence for this?

      I have not come across EFD, and I am not sure many in the EEG field will have. To require the reader to appreciate the contributions of WETCOW only through the lens of the unfamiliar (and far from trivial) approach of EFD is frustrating. In particular, what impact do the assumptions of WETCOW make compared to the assumptions of EFD on the overall performance of SPECTRE?

      The paper needs to provide results showing the improvements obtained when WETCOW or EFD are combined with more established and familiar approaches. For example, EFD can be replaced by a first-order vector autoregressive (VAR) model, i.e. y_t = A y_{t-1} + e_t (where y_t is [num_gridpoints x 1] and A is [num_gridpoints x num_gridpoints] of autoregressive parameters).
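
      A minimal sketch of the first-order VAR model named above, assuming a small hypothetical grid and an arbitrary stable coefficient matrix (neither comes from the manuscript). It simulates y_t = A y_{t-1} + e_t with Gaussian noise and recovers A by ordinary least squares; the function names `simulate_var1` and `fit_var1` are illustrative, not part of any EEG toolbox.

```python
import numpy as np


def simulate_var1(A, n_steps, noise_std=1.0, seed=0):
    """Simulate y_t = A @ y_{t-1} + e_t with i.i.d. Gaussian noise e_t."""
    rng = np.random.default_rng(seed)
    n_gridpoints = A.shape[0]
    y = np.zeros((n_steps, n_gridpoints))
    for t in range(1, n_steps):
        y[t] = A @ y[t - 1] + noise_std * rng.standard_normal(n_gridpoints)
    return y


def fit_var1(y):
    """Estimate A by least squares from the regression y[1:] ~ y[:-1] @ A.T."""
    past, present = y[:-1], y[1:]
    A_transposed, *_ = np.linalg.lstsq(past, present, rcond=None)
    return A_transposed.T


if __name__ == "__main__":
    # Hypothetical 3-gridpoint example; all eigenvalues lie inside the unit circle.
    A_true = np.array([[0.5, 0.1, 0.0],
                       [0.0, 0.4, 0.2],
                       [0.1, 0.0, 0.3]])
    y = simulate_var1(A_true, n_steps=5000)
    print(np.round(fit_var1(y), 2))
```

      With a realistic number of grid points, A has num_gridpoints squared entries, so some form of regularisation or dimensionality reduction would likely be needed; the sketch ignores this.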

      The authors' decision not to include any comparisons with established source reconstruction approaches does not make sense to me. They attempt to justify this by saying that the spatial resolution of LORETA would need to be very low compared to the resolution being used in SPECTRE, to avoid compute problems. But how does this stop them from using a spatial resolution typically used by the field that has no compute problems, and comparing with that? This would be very informative. There are also more computationally efficient methods than LORETA that are very popular, such as beamforming or minimum norm.

      In short, something like the following methods needs to be compared:

      (1) Full SPECTRE (EFD plus WETCOW)
      (2) WETCOW + VAR or standard ("simple regression") techniques
      (3) Beamformer/min norm plus EFD
      (4) Beamformer/min norm plus VAR or standard ("simple regression") techniques

      This would also allow for more illuminating and quantitative comparisons of the real data. For example, a metric of similarity between EEG maps and fMRI can be computed to compare the performance of these methods. At the moment, the fMRI-EEG analysis amounts to just showing fairly similar maps.
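
      One possible form of such a similarity metric, sketched under the assumption that the EEG and fMRI maps have already been resampled onto the same voxel grid. The Pearson spatial correlation and the Dice overlap of thresholded maps are common choices, not ones prescribed by the manuscript; the function names and the 90th-percentile threshold are illustrative.

```python
import numpy as np


def spatial_correlation(map_a, map_b, mask=None):
    """Pearson correlation between two maps defined on the same voxel grid."""
    a = np.asarray(map_a, dtype=float).ravel()
    b = np.asarray(map_b, dtype=float).ravel()
    if mask is not None:
        keep = np.asarray(mask, dtype=bool).ravel()
        a, b = a[keep], b[keep]
    a = a - a.mean()
    b = b - b.mean()
    # Undefined (division by zero) if either map is constant within the mask.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def dice_overlap(map_a, map_b, percentile=90):
    """Dice coefficient of the voxels above a percentile threshold in each map."""
    a = np.asarray(map_a) >= np.percentile(map_a, percentile)
    b = np.asarray(map_b) >= np.percentile(map_b, percentile)
    total = a.sum() + b.sum()
    return float(2 * np.logical_and(a, b).sum() / total) if total else 0.0
```

      Computing either score for each of the four method combinations against the same fMRI map would give one number per method rather than a visual impression.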

      There are no results provided on simulated data. Simulations are needed to provide quantitative comparisons of the different methods, to show face validity, and to demonstrate unequivocally the new information that SPECTRE can _potentially_ provide on real data compared to established methods. The paper ideally needs at least 3 types of simulations, where one thing is changed at a time, e.g.:

      (1) Data simulated using WETCOW plus EFD assumptions
      (2) Data simulated using WETCOW plus e.g. VAR assumptions
      (3) Data simulated using standard lead fields (based on the quasi-static Maxwell solutions) plus e.g. VAR assumptions

      These should be assessed with the multiple methods specified earlier. Crucially the assessment should be quantitative showing the ability to recover the ground truth over multiple realisations of realistic noise. This type of assessment of a new source reconstruction method is the expected standard.

    3. Reviewer #2 (Public Review):

      Summary:

      The manuscript claims to present a novel method for direct imaging of electric field networks from EEG data with higher spatiotemporal resolution than even fMRI. Validation of the EEG reconstructions with EEG/fMRI, EEG, and iEEG datasets is presented. Subsequently, reconstructions from a large EEG dataset of subjects performing a gambling task are presented.

      Strengths:

      If true and convincing, the proposed theoretical framework and reconstruction algorithm can revolutionize the use of EEG source reconstructions.

      Weaknesses:

      There is very little actual information in the paper about either the forward model or the novel method of reconstruction. Only prior work by the authors is cited, with absolutely no benchmark comparisons, making the manuscript difficult to read and interpret in isolation from their prior body of work.

    4. Author response:

      The authors plan to respond in full in due course.

    1. eLife Assessment

      This important study reveals that disrupting fatty acid metabolism in macrophages significantly restricts the growth of Mycobacterium tuberculosis, showing that impaired lipid processing triggers various antimicrobial responses. Whilst the approach is robust, utilizing CRISPR-Cas9 knockout of multiple genes involved in lipid metabolism which yielded some convincing data, there are aspects that require improvement such as the autophagy assay and redox measurements. This work highlights how host lipid metabolism affects the ability of tubercle bacilli to thrive intracellularly, pointing to potential new therapeutic targets.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates the role of macrophage lipid metabolism in the intracellular growth of Mycobacterium tuberculosis. By using a CRISPR-Cas9 gene-editing approach, the authors knocked out key genes involved in fatty acid import, lipid droplet formation, and fatty acid oxidation in macrophages. Their results show that disrupting various stages of fatty acid metabolism significantly impairs the ability of Mtb to replicate inside macrophages. The mechanisms of growth restriction included increased glycolysis, oxidative stress, pro-inflammatory cytokine production, enhanced autophagy, and nutrient limitation. The study demonstrates that targeting fatty acid homeostasis at different stages of the lipid metabolic process could offer new strategies for host-directed therapies against tuberculosis.

      The work is convincing and methodologically strong, combining genetic, metabolic, and transcriptomic analyses to provide deep insights into how host lipid metabolism affects bacterial survival.

      Strengths:

      The study uses a multifaceted approach, including CRISPR-Cas9 gene knockouts, metabolic assays, and dual RNA sequencing, to assess how various stages of macrophage lipid metabolism affect Mtb growth. The use of CRISPR-Cas9 to selectively knock out key genes involved in fatty acid metabolism enables precise investigation of how each step (lipid import, lipid droplet formation, and fatty acid oxidation) affects Mtb survival. The study offers mechanistic insights into how different impairments in lipid metabolism lead to diverse antimicrobial responses, including glycolysis, oxidative stress, and autophagy. This deepens the understanding of macrophage function in immune defense.

      The use of functional assays to validate findings (e.g., metabolic flux analyses, lipid droplet formation assays, and rescue experiments with fatty acid supplementation) strengthens the reliability and applicability of the results.

      By highlighting potential targets for HDT that exploit macrophage lipid metabolism to restrict Mtb growth, the work has significant implications for developing new tuberculosis treatments.

      Weaknesses:

      The experiments were primarily conducted in vitro using CRISPR-modified macrophages. While these provide valuable insights, they may not fully replicate the complexity of the in vivo environment where multiple cell types and factors influence Mtb infection and immune responses.

    3. Reviewer #2 (Public review):

      Summary:

      Host-derived lipids are an important factor during Mtb infection. In this study, using CRISPR knockouts of genes involved in fatty acid uptake and metabolism, the authors claim that a compromised uptake, storage, or metabolism of fatty acid restricts Mtb growth upon infection. Further, the authors claim that the mechanism involves increased glycolysis, autophagy, oxidative stress, pro-inflammatory cytokines, and nutrient limitation. The authors also claim that impaired lipid droplet formation restricts Mtb growth. However, promoting lipid droplet biogenesis does not reverse/promote Mtb growth.

      Strengths:

      The strength of the study is the use of clean HOXB8-derived primary mouse macrophage lines for generating CRISPR knockouts.

      Weaknesses:

      There are many weaknesses in this study; they are grouped into five categories below.

      (1) Evidence and interpretations: The results shown in this study at several places do not support the interpretations made or are internally contradictory or inconsistent. There are several important observations, but none were taken forward for in-depth analysis.
      a) The phenotypes of PLIN2-/-, FATP1-/-, and CPT-/- are comparable in terms of bacterial growth restriction; however, their phenotypes in terms of lipid body formation, IL1B expression, etc., are not consistent. These are interesting observations and suggest additional mechanisms specific to specific target genes; however, clubbing them all as altered fatty acid uptake or catabolism-dependent phenotypes takes away this important point.
      b) Finding the FATP1 transcript in the HOXB8-derived FATP1-/- CRISPR KO line is a bit confusing. There is less than a two-fold decrease in relative transcript abundance in the KO line compared to the WT line, leaving concerns regarding the robustness of other experiments using FATP1-/- cells as well.
      c) No gene shows differential regulation in FATP-/- macrophages, which is very surprising.
      d) ROS measurements should be done using flow cytometry and not by microscopy to nail the actual pattern.

      (2) Experimental design: For a few assays, the experimental design is inappropriate.
      a) For autophagy flux assay, immunoblot of LC3II alone is not sufficient to make any interpretation regarding the state of autophagy. This assay must be done with BafA1 or CQ controls to assess the true state of autophagy.
      b) Similarly, qPCR analyses of autophagy-related gene expression do not reflect anything on the state of autophagy flux.

      (3) Using correlative observations as evidence:
      a) Observations based on RNAseq analyses are presented as functional readouts, which is incorrect.
      b) Claiming that the inability to generate lipid droplets in PLIN2-/- cells led to the upregulation of several pathways in the cells is purely correlative, and the causal relationship does not exist in the data presented.

      (4) Novelty: A few main observations described in this study were previously reported. That includes Mtb growth restriction in PLIN2 and FATP1 deficient cells. Similarly, the impact of Metformin and TMZ on intracellular Mtb growth is well-reported. While that validates these observations in this study, it takes away any novelty from the study.

      (5) Manuscript organisation: It will be very helpful to rearrange figures and supplementary figures.

    4. Reviewer #3 (Public review):

      Summary:

      This study provides significant insights into how host metabolism, specifically lipids, influences the pathogenesis of Mycobacterium tuberculosis (Mtb). It builds on existing knowledge about Mtb's reliance on host lipids and emphasizes the potential of targeting fatty acid metabolism for therapeutic intervention.

      Strengths:

      To generate the data, the authors use CRISPR technology to precisely disrupt the genes involved in lipid import (CD36, FATP1), lipid droplet formation (PLIN2), and fatty acid oxidation (CPT1A, CPT2) in mouse primary macrophages. The Mtb Erdman strain is used to infect the macrophage mutants. The study reveals specific roles of different lipid-related genes. Importantly, results challenge previous assumptions about lipid droplet formation and show that macrophage responses to lipid metabolism impairments are complex and multifaceted. The experiments are well-controlled and the data are convincing.

      Overall, this well-written paper makes a meaningful contribution to the field of tuberculosis research, particularly in the context of host-directed therapies (HDTs). It suggests that manipulating macrophage metabolism could be an effective strategy to limit Mtb growth.

      Weaknesses:

      None noted. The manuscript provides important new knowledge that will lead to novel host-directed therapies to control Mtb infections.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigates the role of macrophage lipid metabolism in the intracellular growth of Mycobacterium tuberculosis. By using a CRISPR-Cas9 gene-editing approach, the authors knocked out key genes involved in fatty acid import, lipid droplet formation, and fatty acid oxidation in macrophages. Their results show that disrupting various stages of fatty acid metabolism significantly impairs the ability of Mtb to replicate inside macrophages. The mechanisms of growth restriction included increased glycolysis, oxidative stress, pro-inflammatory cytokine production, enhanced autophagy, and nutrient limitation. The study demonstrates that targeting fatty acid homeostasis at different stages of the lipid metabolic process could offer new strategies for host-directed therapies against tuberculosis.

      The work is convincing and methodologically strong, combining genetic, metabolic, and transcriptomic analyses to provide deep insights into how host lipid metabolism affects bacterial survival.

      Strengths:

      The study uses a multifaceted approach, including CRISPR-Cas9 gene knockouts, metabolic assays, and dual RNA sequencing, to assess how various stages of macrophage lipid metabolism affect Mtb growth. The use of CRISPR-Cas9 to selectively knock out key genes involved in fatty acid metabolism enables precise investigation of how each step (lipid import, lipid droplet formation, and fatty acid oxidation) affects Mtb survival. The study offers mechanistic insights into how different impairments in lipid metabolism lead to diverse antimicrobial responses, including glycolysis, oxidative stress, and autophagy. This deepens the understanding of macrophage function in immune defense.

      The use of functional assays to validate findings (e.g., metabolic flux analyses, lipid droplet formation assays, and rescue experiments with fatty acid supplementation) strengthens the reliability and applicability of the results.

      By highlighting potential targets for HDT that exploit macrophage lipid metabolism to restrict Mtb growth, the work has significant implications for developing new tuberculosis treatments.

      Weaknesses:

      The experiments were primarily conducted in vitro using CRISPR-modified macrophages. While these provide valuable insights, they may not fully replicate the complexity of the in vivo environment where multiple cell types and factors influence Mtb infection and immune responses.

      We thank the reviewer for pointing this out. We acknowledge that our in vitro system may indeed not fully replicate the complex in vivo environment in light of the heterogeneous responses of macrophages to Mtb infection in whole animal models. We do believe, however, that the Hoxb8 in vitro model provides a powerful genetic tool to interrogate host-Mtb interactions using primary macrophages that represent the bone marrow-derived macrophage lineage. Reviewer #1 also made several helpful suggestions in their recommendations to authors relating to the reorganization of the data in our Figures in both the manuscript and the supplemental data. We will incorporate these suggestions into the revised version of the manuscript upon resubmission.

      Reviewer #2 (Public review):

      Summary:

      Host-derived lipids are an important factor during Mtb infection. In this study, using CRISPR knockouts of genes involved in fatty acid uptake and metabolism, the authors claim that a compromised uptake, storage, or metabolism of fatty acid restricts Mtb growth upon infection. Further, the authors claim that the mechanism involves increased glycolysis, autophagy, oxidative stress, pro-inflammatory cytokines, and nutrient limitation. The authors also claim that impaired lipid droplet formation restricts Mtb growth. However, promoting lipid droplet biogenesis does not reverse/promote Mtb growth.

      Strengths:

      The strength of the study is the use of clean HOXB8-derived primary mouse macrophage lines for generating CRISPR knockouts.

      Weaknesses:

      There are many weaknesses in this study; they are clubbed into five categories below.

      (1) Evidence and interpretations: At several places, the results shown in this study do not support the interpretations made, or are internally contradictory or inconsistent. There are several important observations, but none were taken forward for in-depth analysis.

      a) The phenotypes of PLIN2-/-, FATP1-/-, and CPT-/- are comparable in terms of bacterial growth restriction; however, their phenotypes in terms of lipid body formation, IL1B expression, etc., are not consistent. These are interesting observations and suggest additional mechanisms specific to specific target genes; however, clubbing them all as altered fatty acid uptake or catabolism-dependent phenotypes takes away this important point.

      We thank the reviewer for highlighting this. Our main focus was on assessing the impact of manipulating lipid homeostasis in macrophages and the consequences this has on the intracellular growth of Mtb.  It was never our intention to imply these mutants generated equivalent phenotypes, and we will modify the revised manuscript to reflect this point.  We will stress that interfering with lipid processing at different stages in macrophages results in both shared and divergent anti-microbial conditions against Mtb.

      b) Finding the FATP1 transcript in the HOXB8-derived FATP1-/- CRISPR KO line is a bit confusing. There is less than a two-fold decrease in relative transcript abundance in the KO line compared to the WT line, raising concerns about the robustness of the other experiments that use FATP1-/- cells.

      CRISPR-Cas9 targeting of genes with single sgRNAs, as is the case with our mutants, generates insertions and deletions (INDELs) at the CRISPR cut site. These INDELs do not block mRNA transcription totally, and this is widely reported and accepted in the field. In these cases, RT-PCR or RNA-seq methods are not used to verify CRISPR knockouts as they are not sensitive enough to identify INDELs. We provide knockout efficiencies by ICE analysis in supplemental information file 1 for all the mutants used in the study. We also demonstrate protein depletion by western blot and flow cytometry for all the mutants (Figure 1 - figure supplement 1). Only mutants with greater than 90% protein depletion were used for subsequent characterization.

      c) No gene shows differential regulation in FATP1-/- macrophages, which is very surprising.

      We assume the reviewer is referring to the Mtb transcriptome response in FATP1-/- macrophages, which we agree was unexpected. However, we saw a significant compensatory response in the host cell (at the transcriptional level) in FATP1-/- macrophages, as evidenced by an upregulation of other fatty acid transporters (Figure 5 - figure supplement 1). We postulate that these compensatory responses could, in part, alleviate the stresses the bacteria experience within the cell, and these were discussed in the manuscript.

      d) ROS measurements should be done using flow cytometry and not by microscopy to nail the actual pattern.

      We thank the reviewer for the suggestion. However, confocal imaging is also widely used to measure ROS with similar quantitative power and individual cell resolution (PMID: 32636249, 35737799).

      (2) Experimental design: For a few assays, the experimental design is inappropriate

      a) For autophagy flux assay, immunoblot of LC3II alone is not sufficient to make any interpretation regarding the state of autophagy. This assay must be done with BafA1 or CQ controls to assess the true state of autophagy.

      We would like to point out that monitoring LC3I to LC3II conversion by western blot, confocal imaging of LC3 puncta, and qPCR analysis of autophagy-related genes are all validated assays for monitoring autophagic flux in a wide variety of cells. We refer the reviewer to the latest extensive guidelines on the subject (PMID: 33634751). Furthermore, Bafilomycin A and chloroquine are not specific inhibitors of autophagy and therefore are of limited value as controls. BafA is an inhibitor of the proton-ATPase apparatus as well as impacting autophagy through activity on the Ca-P60A/SERCA pathway. Chloroquine impacts vacuole acidification and autophagosome/lysosome fusion, and slows phagosome maturation. So, while BafA and chloroquine will reduce autophagy, their effects are pleiotropic and their impact on Mtb is unknown.

      b) Similarly, qPCR analyses of autophagy-related gene expression do not reflect anything on the state of autophagy flux.

      See our response above.

      (3) Using correlative observations as evidence:

      a) Observations based on RNAseq analyses are presented as functional readouts, which is incorrect.

      We are not entirely sure where we used our RNA-seq data sets as functional readouts. We used our transcriptome data to provide a preliminary identification of anti-microbial responses in the mutant macrophages infected with Mtb. Where applicable, we followed up and confirmed the more compelling RNA-seq data by metabolic flux analyses, qPCR, ROS measurements, or quantitative imaging.

      b) Claiming that the inability to generate lipid droplets in PLIN2-/- cells led to the upregulation of several pathways in the cells is purely correlative, and the causal relationship does not exist in the data presented.

      Again, it was not our intention to infer causality. Throughout the manuscript, we endeavor to present our data with a specific focus on describing the consequences of interfering with fatty acid import, lipid droplet biogenesis, or fatty acid oxidation on macrophage responses to Mtb. We will revisit the manuscript during revision to remove any sections that imply causality.

      (4) Novelty: A few main observations described in this study were previously reported. That includes Mtb growth restriction in PLIN2 and FATP1 deficient cells. Similarly, the impact of Metformin and TMZ on intracellular Mtb growth is well-reported. While that validates these observations in this study, it takes away any novelty from the study.

      To the best of our knowledge, Mtb growth restrictions in PLIN2 and FATP1 deficient macrophages have not been reported elsewhere. On the contrary, PLIN2 knockout macrophages obtained from PLIN2 deficient mice have been reported to robustly support Mtb replication (PMID: 29370315), quite the opposite of our data. We extensively discuss these discrepancies in the manuscript. We also discuss and cite appropriate references where Mtb growth restriction for similar macrophage mutants has been reported (CD36-/- and CPT2-/-). Our aim was to carry out a systematic myeloid specific genetic interference of fatty acid import, storage and catabolism to assess the effect on Mtb growth at all stages of lipid handling instead of focusing on one target. In the chemical approach, we used TMZ and Metformin deliberately because they had already been reported as being active against intracellular Mtb and we wished to place our data in the context of existing literature. These studies were referenced extensively in the text.

      (5) Manuscript organisation: It will be very helpful to rearrange figures and supplementary figures.

      We will re-organize the figures in the manuscript revision as per the reviewer’s recommendation, and the recommendations of reviewer #1.

      We will address the other concerns raised by reviewer #2 in the recommendations to authors during revision of the manuscript. 

      Reviewer #3 (Public review):

      Summary:

      This study provides significant insights into how host metabolism, specifically lipids, influences the pathogenesis of Mycobacterium tuberculosis (Mtb). It builds on existing knowledge about Mtb's reliance on host lipids and emphasizes the potential of targeting fatty acid metabolism for therapeutic intervention.

      Strengths:

      To generate the data, the authors use CRISPR technology to precisely disrupt the genes involved in lipid import (CD36, FATP1), lipid droplet formation (PLIN2), and fatty acid oxidation (CPT1A, CPT2) in mouse primary macrophages. The Mtb Erdman strain is used to infect the macrophage mutants. The study reveals specific roles of different lipid-related genes. Importantly, results challenge previous assumptions about lipid droplet formation and show that macrophage responses to lipid metabolism impairments are complex and multifaceted. The experiments are well-controlled and the data are convincing.

      Overall, this well-written paper makes a meaningful contribution to the field of tuberculosis research, particularly in the context of host-directed therapies (HDTs). It suggests that manipulating macrophage metabolism could be an effective strategy to limit Mtb growth.

      Weaknesses:

      None noted. The manuscript provides important new knowledge that will lead to novel host-directed therapies to control Mtb infections.

    1. eLife Assessment

      This study uses electrophysiological recordings, causal manipulations of activity, and modeling to investigate how the maintenance of a spatial location in working memory affects the representation of visual information in area V4 of monkeys. The work is important not just for understanding how visual information is encoded, but also for determining precisely how prefrontal inputs to the sensory cortex sculpt the corresponding visual responses during working memory. The data provide solid evidence of direct communication between prefrontal circuits that store spatial information and V4, which, under the current experimental conditions, manifests mainly as changes in temporal activity patterns (beta oscillations and phase coding).

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates what happens to the stimulus-driven responses of V4 neurons when an item is held in working memory. Monkeys are trained to perform memory-guided saccades: they must remember the location of a visual cue and then, after a delay, make an eye movement to the remembered location. In addition, a background stimulus (a grating) is presented that varies in contrast and orientation across trials. This stimulus serves to probe the V4 responses, is present throughout the trial, and is task-irrelevant. Using this design, the authors report memory-driven changes in the LFP power spectrum, changes in synchronization between the V4 spikes and the ongoing LFP, and no significant changes in firing rate.

      Strengths:

      (1) The logic of the experiment is nicely laid out.

      (2) The presentation is clear and concise.

      (3) The analyses are thorough, careful, and yield unambiguous results.

      (4) Together, the recording and inactivation data demonstrate quite convincingly that the signal stored in FEF is communicated to V4 and that, under the current experimental conditions, the impact from FEF manifests as variations in the timing of the stimulus-evoked V4 spikes and not in the intensity of the evoked activity (i.e., firing rate).

      Weaknesses:

      I think there are two limitations of the study that are important for evaluating the potential functional implications of the data. If these were acknowledged and discussed, it would be easier to situate these results in the broader context of the topic, and their importance would be conveyed more fairly and transparently.

      (1) While it may be true that no firing rate modulations were observed in this case, this may have been because the probe stimuli in the task were behaviorally irrelevant; if anything, they might have served as distracters to the monkey's actual task (the MGS). From this perspective, the lack of rate modulation could simply mean that the monkeys were successful in attending the relevant cue and shielding their performance from the potentially distracting effect of the background gratings. Had the visual probes been in some way behaviorally relevant and/or spatially localized (instead of full field), the data might have looked very different. With this in mind, it would be prudent to dial down the tone of the conclusions, which stretch well beyond the current experimental conditions (see recommendations).

      (2) Another point worth discussing is that although the FEF delay-period activity corresponds to a remembered location, it can also be interpreted as an attended location, or as a motor plan for the upcoming eye movement. These are overlapping constructs that are difficult to disentangle, but it would be important to mention them given prior studies of attentional or saccade-related modulation in V4. The firing rate modulations reported in some of those cases provide a stark contrast with the findings here, and I again suspect that the differences may be due at least in part to the differing experimental conditions, rather than a drastically different encoding mode or functional linkage between FEF and V4.

    3. Reviewer #2 (Public review):

      Summary:

      It is generally believed that higher-order areas in the prefrontal cortex guide selection during working memory and attention through signals that selectively recruit neuronal populations in sensory areas that encode the relevant feature. In this work, Parto-Dezfouli and colleagues tested how these prefrontal signals influence activity in visual area V4 using a spatial working memory task. They recorded neuronal activity from visual area V4 and found that information about visual features at the behaviorally relevant part of space during the memory period is carried in a spatially selective manner in the timing of spikes relative to a beta oscillation (phase coding) rather than in the average firing rate (rate code). The authors further tested whether there is a causal link between prefrontal input and the phase encoding of visual information during the memory period. They found that indeed inactivation of the frontal eye fields, a prefrontal area known to send spatial signals to V4, decreased beta oscillatory activity in V4 and information about the visual features. The authors went one step further to develop a neural model that replicated the experimental findings and suggested that changes in the average firing rate of individual neurons might be a result of small changes in the exact beta oscillation frequency within V4. These data provide important new insights into the possible mechanisms through which top-down signals can influence activity in hierarchically lower sensory areas and can therefore have a significant impact on the Systems, Cognitive, and Computational Neuroscience fields.

      Strengths:

      This is a well-written paper with a well-thought-out experimental design. The authors used a smart variation of the memory-guided saccade task to assess how information about the visual features of stimuli is encoded during the memory period. By using a grating of various contrasts and orientations as the background, the authors ensured that bottom-up visual input would drive responses in visual area V4 in the delay period, something that is not commonly done in experimental settings in the same task. Moreover, one of the major strengths of the study is the use of different approaches, including analysis of electrophysiological data with advanced computational methods, manipulation of activity through inactivation of the prefrontal cortex to establish causality of top-down signals on local activity signatures (beta oscillations, spike locking, and information carried), and computational neuronal modeling. This has helped extend an observation into a possible mechanism well supported by the results.

      Weaknesses:

      Although the authors provide support for their conclusions from different approaches, I found that the selection of some of the analyses and statistical assessments made it harder for the reader to follow the comparison between a rate code and a phase code. Specifically, the authors wish to assess whether stimulus information is carried selectively for the relevant position through a firing rate or a phase code. Results for the rate code are shown in Figures 1B-G and for the phase code are shown in Figure 2. Whereas an F-statistic is shown over time in Figure 1F (and Figure S1) no such analysis is shown for LFP power. Similarly, following FEF inactivation there is no data on how that influences V4 firing rates and information carried by firing rates in the two conditions (for positions inside and outside the V4 RF). In the same vein, no data are shown on how the inactivation affects beta phase coding in the OUT condition.

      Moreover, some of the statistical assessments could be carried out differently including all conditions to provide more insight into mechanisms. For example, a two-way ANOVA followed by post hoc tests could be employed to include comparisons across both spatial (IN, OUT) and visual feature conditions (see results in Figures 2D, S4, etc.). Figure 2D suggests that the absence of selectivity in the OUT condition (no significant difference between high and low contrast stimuli) is mainly due to an increase in slope in the OUT condition for the low contrast stimulus compared to that for the same stimulus in the IN condition. If this turns out to be true it would provide important information that the authors should address.

      There are also a few conceptual gaps that leave the reader wondering whether the results and conclusion are general enough. Specifically,

      (1) the authors used microstimulation in the FEF to determine RFs. It is thus possible that the FEF sites that were inactivated were largely more motor-related. Given that beta oscillations and motor preparatory activity have been found to be correlated and motor sites show increased beta oscillatory activity in the delay period, it is possible that the effect of FEF inactivation on V4 beta oscillations is due to inactivation of the main source of beta activity. Had the authors inactivated sites with a preponderance of visual neurons in the FEF would the results be different?

      (2) Somewhat related to this point and given the prominence of low-frequency activity in deeper layers of the visual cortex according to some previous studies, it is not clear where the authors' V4 recordings were located. The authors report that they do have data from linear arrays, so it should be possible to address this.

      (3) The authors suggest that a change in the exact frequency of oscillation underlies the increase in firing rate for different stimulus features. However, the shift in frequency is prominent for contrast but not for orientation, something that raises questions about the general applicability of this observation for different visual features.

      (4) One of the major points of the study is the primacy of the phase code over the rate code during the delay period. Specifically, here it is shown that information about the visual features of a stimulus carried by the rate code is similar for relevant and irrelevant locations during the delay period. This contrasts with what several studies have shown for attention, in which case information carried in firing rates about stimuli in the attended location is enhanced relative to that for stimuli in the unattended location. If we are to understand how top-down signals work in cognitive functions, a comparison of working memory with attention is unavoidable. The possible source of this difference is not clear and is not discussed. The reader is left wondering whether perhaps a different measure or analysis (e.g. a percent explained variance analysis) might reveal differences during the delay period for different visual features across the two spatial conditions.

      The use of the memory-guided saccade task has certain disadvantages in the context of this study. Although delay activity is interpreted as memory activity by the authors, it is in principle possible that it reflects preparation for the upcoming saccade, spatial attention (particularly since there is a stimulus in the RF), etc. This could potentially change the conclusion and perspective.

      For the position outside the V4 RF, there is a decrease in both beta oscillations and the clustering of spikes at a specific phase. It is therefore possible that the decrease in information about the stimuli features is a byproduct of the decrease in beta power and phase locking. Decreased oscillatory activity and phase locking can result in less reliable estimates of phase, which could decrease the mutual information estimates.

      The authors propose that coherent oscillations could be the mechanism through which the prefrontal cortex influences beta activity in V4. I assume they mean coherent oscillations between the prefrontal cortex and V4. Given that they do have simultaneous recordings from the two areas, they could test this hypothesis on their own data; however, they do not provide any results on that.

      The authors make a strong point about the relevance of changes in the oscillation frequency and how this may result in an increase in firing rate although it could also be the reverse - an increase in firing rate leading to an increase in the frequency peak. It is not clear at all how these changes in frequency could come about. A more nuanced discussion based on both experimental and modeling data is necessary to appreciate the source and role (if any) of this observation.

    4. Reviewer #3 (Public review):

      Summary:

      In this report, the authors test the necessity of prefrontal cortex (specifically, FEF) activity in driving changes in oscillatory power, spike rate, and spike timing of extrastriate visual cortex neurons during a visual-spatial working memory (WM) task. The authors recorded LFP and spikes in V4 while macaques remembered a single spatial location over a delay period during which task-irrelevant background gratings were displayed on the screen with varying orientation and contrast. V4 oscillations (in the beta range) scaled with WM maintenance, and the information encoded by spike timing relative to beta band LFP about the task-irrelevant background orientation depended on remembered location. They also compared recorded signals in V4 with and without muscimol inactivation of FEF, demonstrating the importance of FEF input for WM-induced changes in oscillatory amplitude, phase coding, and information encoded about background orientations. Finally, they built a network model that can account for some of these results. Together, these results show that FEF provides meaningful input to the visual cortex that is used to alter neural activity and that these signals can impact information coding of task-irrelevant information during a WM delay.

      Strengths:

      (1) Elegant and robust experiment that allows for clear tests for the necessity of FEF activity in WM-induced changes in V4 activity.

      (2) Comprehensive and broad analyses of interactions between LFP and spike timing provide compelling evidence for FEF-modulated phase coding of task-irrelevant stimuli at the remembered location.

      (3) Convincing modeling efforts.

      Weaknesses:

      (1) 0% contrast background data (standard memory-guided saccade task) are not reported in the manuscript. While these data cannot be used to consider information content of spike rate/time about task-irrelevant background stimuli, this condition is still informative as a 'baseline' (and a more typical example of a WM task).

      (2) Throughout the manuscript, the primary measurements of neural coding pertain to task-irrelevant stimuli (the orientation/contrast of the background, which is unrelated to the animal's task to remember a spatial location). The remembered location impacts the coding of these stimulus variables, but it's unclear how this relates to WM representations themselves.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      This study investigates what happens to the stimulus-driven responses of V4 neurons when an item is held in working memory. Monkeys are trained to perform memory-guided saccades: they must remember the location of a visual cue and then, after a delay, make an eye movement to the remembered location. In addition, a background stimulus (a grating) is presented that varies in contrast and orientation across trials. This stimulus serves to probe the V4 responses, is present throughout the trial, and is task-irrelevant. Using this design, the authors report memory-driven changes in the LFP power spectrum, changes in synchronization between the V4 spikes and the ongoing LFP, and no significant changes in firing rate.

      Strengths:

      (1) The logic of the experiment is nicely laid out.

      (2) The presentation is clear and concise.

      (3) The analyses are thorough, careful, and yield unambiguous results.

      (4) Together, the recording and inactivation data demonstrate quite convincingly that the signal stored in FEF is communicated to V4 and that, under the current experimental conditions, the impact from FEF manifests as variations in the timing of the stimulus-evoked V4 spikes and not in the intensity of the evoked activity (i.e., firing rate).

      Weaknesses:

      I think there are two limitations of the study that are important for evaluating the potential functional implications of the data. If these were acknowledged and discussed, it would be easier to situate these results in the broader context of the topic, and their importance would be conveyed more fairly and transparently.

      (1) While it may be true that no firing rate modulations were observed in this case, this may have been because the probe stimuli in the task were behaviorally irrelevant; if anything, they might have served as distracters to the monkey's actual task (the MGS). From this perspective, the lack of rate modulation could simply mean that the monkeys were successful in attending the relevant cue and shielding their performance from the potentially distracting effect of the background gratings. Had the visual probes been in some way behaviorally relevant and/or spatially localized (instead of full field), the data might have looked very different.

      Any task design involves tradeoffs; if the visual stimulus was behaviorally relevant, then any observed neurophysiological changes would be more confounded by possible attentional effects. We cannot exclude the possibility that a different task or different stimuli would produce different results; we ourselves have reported firing rate enhancements for other types of visual probes during an MGS task (Merrikhi et al. 2017). We have added an acknowledgement of these limitations in the discussion section (lines 311-319). At minimum, our results show a dissociation between the top-down modulation of phase coding, which is enhanced during WM even for these task-irrelevant stimuli, and rate coding. Establishing whether and how this phase coding is related to perception and behavior will be an important direction for future work.

      With this in mind, it would be prudent to dial down the tone of the conclusions, which stretch well beyond the current experimental conditions (see recommendations).

      We have edited the title (removing the word ‘primarily’) and key sentences throughout to tone down the conclusions, generally to state that the importance of a phase code in WM modulations is *possible* given the observed results, rather than certain (see abstract line 27, introduction lines 58-60, results line 215, conclusion lines 294-295).

      (2) Another point worth discussing is that although the FEF delay-period activity corresponds to a remembered location, it can also be interpreted as an attended location, or as a motor plan for the upcoming eye movement. These are overlapping constructs that are difficult to disentangle, but it would be important to mention them given prior studies of attentional or saccade-related modulation in V4. The firing rate modulations reported in some of those cases provide a stark contrast with the findings here, and I again suspect that the differences may be due at least in part to the differing experimental conditions, rather than a drastically different encoding mode or functional linkage between FEF and V4.

      We have added a paragraph to the discussion section addressing links to attention and motor planning (lines 301-322), and specifically acknowledging the inherent difficulties of fully dissociating these effects when interpreting our results (lines 311-319).

      Reviewer #2 (Public review):

      Summary:

      It is generally believed that higher-order areas in the prefrontal cortex guide selection during working memory and attention through signals that selectively recruit neuronal populations in sensory areas that encode the relevant feature. In this work, Parto-Dezfouli and colleagues tested how these prefrontal signals influence activity in visual area V4 using a spatial working memory task. They recorded neuronal activity from visual area V4 and found that information about visual features at the behaviorally relevant part of space during the memory period is carried in a spatially selective manner in the timing of spikes relative to a beta oscillation (phase coding) rather than in the average firing rate (rate code). The authors further tested whether there is a causal link between prefrontal input and the phase encoding of visual information during the memory period. They found that indeed inactivation of the frontal eye fields, a prefrontal area known to send spatial signals to V4, decreased beta oscillatory activity in V4 and information about the visual features. The authors went one step further to develop a neural model that replicated the experimental findings and suggested that changes in the average firing rate of individual neurons might be a result of small changes in the exact beta oscillation frequency within V4. These data provide important new insights into the possible mechanisms through which top-down signals can influence activity in hierarchically lower sensory areas and can therefore have a significant impact on the Systems, Cognitive, and Computational Neuroscience fields.

      Strengths:

      This is a well-written paper with a well-thought-out experimental design. The authors used a smart variation of the memory-guided saccade task to assess how information about the visual features of stimuli is encoded during the memory period. By using a grating of various contrasts and orientations as the background the authors ensured that bottom-up visual input would drive responses in visual area V4 in the delay period, something that is not commonly done in experimental settings in the same task. Moreover, one of the major strengths of the study is the use of different approaches including analysis of electrophysiological data using advanced computational methods of analysis, manipulation of activity through inactivation of the prefrontal cortex to establish causality of top-down signals on local activity signatures (beta oscillations, spike locking and information carried) as well as computational neuronal modeling. This has helped extend an observation into a possible mechanism well supported by the results.

      Weaknesses:

      Although the authors provide support for their conclusions from different approaches, I found that the selection of some of the analyses and statistical assessments made it harder for the reader to follow the comparison between a rate code and a phase code. Specifically, the authors wish to assess whether stimulus information is carried selectively for the relevant position through a firing rate or a phase code. Results for the rate code are shown in Figures 1B-G and for the phase code are shown in Figure 2. Whereas an F-statistic is shown over time in Figure 1F (and Figure S1) no such analysis is shown for LFP power. Similarly, following FEF inactivation there is no data on how that influences V4 firing rates and information carried by firing rates in the two conditions (for positions inside and outside the V4 RF). In the same vein, no data are shown on how the inactivation affects beta phase coding in the OUT condition.

      We plan to incorporate statistical analysis of this point in the revised version.

      Moreover, some of the statistical assessments could be carried out differently including all conditions to provide more insight into mechanisms. For example, a two-way ANOVA followed by post hoc tests could be employed to include comparisons across both spatial (IN, OUT) and visual feature conditions (see results in Figures 2D, S4, etc.). Figure 2D suggests that the absence of selectivity in the OUT condition (no significant difference between high and low contrast stimuli) is mainly due to an increase in slope in the OUT condition for the low contrast stimulus compared to that for the same stimulus in the IN condition. If this turns out to be true it would provide important information that the authors should address.

      We plan to incorporate statistical analysis of this point in the revised version.
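The two-way design the reviewer proposes can be sketched as follows. This is a minimal illustration on synthetic numbers, not the authors' analysis. The cell means, noise level, unit count, and the function name `two_way_anova_balanced` are all assumptions made for the example. It also treats observations as independent; if the same units contribute to both IN and OUT conditions, a repeated-measures model would be more appropriate.

```python
import numpy as np

def two_way_anova_balanced(data):
    """F statistics for a balanced two-way ANOVA.

    data has shape (levels_A, levels_B, n_per_cell); here factor A is
    memory location (IN, OUT) and factor B is contrast (high, low).
    """
    a, b, n = data.shape
    grand = data.mean()
    mean_a = data.mean(axis=(1, 2))   # marginal means of factor A
    mean_b = data.mean(axis=(0, 2))   # marginal means of factor B
    cell = data.mean(axis=2)          # cell means

    ss_a = b * n * np.sum((mean_a - grand) ** 2)
    ss_b = a * n * np.sum((mean_b - grand) ** 2)
    ss_ab = n * np.sum((cell - mean_a[:, None] - mean_b[None, :] + grand) ** 2)
    ss_err = np.sum((data - cell[:, :, None]) ** 2)

    df_a, df_b = a - 1, b - 1
    df_ab, df_err = df_a * df_b, a * b * (n - 1)
    ms_err = ss_err / df_err
    return {
        "F_location": (ss_a / df_a) / ms_err,
        "F_contrast": (ss_b / df_b) / ms_err,
        "F_interaction": (ss_ab / df_ab) / ms_err,
        "df_error": df_err,
    }

# Synthetic example (all values hypothetical, arbitrary units): the slope
# for the low-contrast stimulus is higher in OUT than in IN, which is the
# pattern the reviewer reads from Figure 2D.
rng = np.random.default_rng(0)
cell_means = np.array([[1.0, 0.5],    # IN:  high contrast, low contrast
                       [1.0, 0.9]])   # OUT: high contrast, low contrast
n_units = 100
slopes = cell_means[:, :, None] + rng.normal(0.0, 0.2, size=(2, 2, n_units))

result = two_way_anova_balanced(slopes)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

A large interaction F in such a design would indicate that the contrast effect differs between IN and OUT; post hoc comparisons of individual cells would then show which cell drives the difference.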

      There are also a few conceptual gaps that leave the reader wondering whether the results and conclusion are general enough. Specifically,

      (1) the authors used microstimulation in the FEF to determine RFs. It is thus possible that the FEF sites that were inactivated were largely more motor-related. Given that beta oscillations and motor preparatory activity have been found to be correlated and motor sites show increased beta oscillatory activity in the delay period, it is possible that the effect of FEF inactivation on V4 beta oscillations is due to inactivation of the main source of beta activity. Had the authors inactivated sites with a preponderance of visual neurons in the FEF would the results be different?

      We do not believe this to be likely based on what is known anatomically and functionally about this circuitry. Anatomically, the projections from FEF to V4 arise primarily from the supragranular layers, not layers which contain the highest proportion of motor activity (Barone et al. 2000, Pouget et al. 2009, Markov et al. 2013). Functionally, based on electrical identification of V4-projecting FEF neurons, we know that FEF to V4 projections are predominantly characterized by delay rather than motor activity (Merrikhi et al. 2017). We have now tried to emphasize these points when we introduce the inactivation experiments (lines 180-182).

      Experimentally, the spread of the pharmacological effect with our infusion system is quite large relative to any clustering of visual vs. motor neurons within the FEF, with behavioral consequences of inactivation spreading to cover a substantial portion of the visual hemifield (e.g., Noudoost et al. 2014, Clark et al. 2014), and so our manipulation lacks the spatial resolution to selectively target motor vs. other FEF neurons.

      (2) Somewhat related to this point and given the prominence of low-frequency activity in deeper layers of the visual cortex according to some previous studies, it is not clear where the authors' V4 recordings were located. The authors report that they do have data from linear arrays, so it should be possible to address this.

      Unfortunately our chamber placement for V4 has produced linear array penetration angles which do not reliably allow identification of cortical layers. We are aware of previous results showing layer-specific effects of attention in V4 (e.g., Pettine et al. 2019, Buffalo et al. 2011), and it would indeed be interesting to determine whether our observed WM-driven changes follow similar patterns. We may be able to analyze a subset of the data with current source density analysis to look for layer-specific effects in the future, but are not able to provide any information at this time.

      (3) The authors suggest that a change in the exact frequency of oscillation underlies the increase in firing rate for different stimulus features. However, the shift in frequency is prominent for contrast but not for orientation, something that raises questions about the general applicability of this observation for different visual features.

      We plan to incorporate statistical analysis of this point in the revised version.

      (4) One of the major points of the study is the primacy of the phase code over the rate code during the delay period. Specifically, here it is shown that information about the visual features of a stimulus carried by the rate code is similar for relevant and irrelevant locations during the delay period. This contrasts with what several studies have shown for attention in which case information carried in firing rates about stimuli in the attended location is enhanced relative to that for stimuli in the unattended location. If we are to understand how top-down signals work in cognitive functions it is inevitable to compare working memory with attention. The possible source of this difference is not clear and is not discussed. The reader is left wondering whether perhaps a different measure or analysis (e.g. a percent explained variance analysis) might reveal differences during the delay period for different visual features across the two spatial conditions.

      We have added discussion regarding the relationship of these results to previous findings during attention in the discussion section (lines 301-322).

      The use of the memory-guided saccade task has certain disadvantages in the context of this study. Although delay activity is interpreted as memory activity by the authors, it is in principle possible that it reflects preparation for the upcoming saccade, spatial attention (particularly since there is a stimulus in the RF), etc. This could potentially change the conclusion and perspective.

      We have added a new discussion paragraph addressing the relationship to attention and motor planning (lines 301-322). We have also moderated the language used to describe our conclusions throughout the manuscript in light of this ambiguity.

      For the position outside the V4 RF, there is a decrease in both beta oscillations and the clustering of spikes at a specific phase. It is therefore possible that the decrease in information about the stimuli features is a byproduct of the decrease in beta power and phase locking. Decreased oscillatory activity and phase locking can result in less reliable estimates of phase, which could decrease the mutual information estimates.

      We plan to incorporate statistical analysis of this point in the revised version.
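The reviewer's concern can be illustrated with a small simulation. This is a hedged sketch on synthetic data, not the authors' pipeline. It assumes spike phases follow a von Mises distribution whose mean shifts with the stimulus, and it uses a simple plug-in estimate of mutual information. The function names, the phase shift of pi/2, the concentration values, and the bin count are all hypothetical choices.

```python
import numpy as np

def plugin_mutual_information(stim, phase, n_bins=8):
    """Plug-in mutual information (bits) between stimulus label and binned phase."""
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    phase_bin = np.clip(np.digitize(phase, edges) - 1, 0, n_bins - 1)
    stim_values = np.unique(stim)
    joint = np.zeros((stim_values.size, n_bins))
    for i, s in enumerate(stim_values):
        joint[i] = np.bincount(phase_bin[stim == s], minlength=n_bins)
    joint /= joint.sum()
    p_s = joint.sum(axis=1, keepdims=True)   # stimulus marginal
    p_b = joint.sum(axis=0, keepdims=True)   # phase-bin marginal
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (p_s @ p_b)[nz])))

def simulate_spike_phases(kappa, n_per_stim=5000, seed=0):
    """Two stimuli with the same preferred-phase shift; kappa sets locking strength."""
    rng = np.random.default_rng(seed)
    mean_phases = [0.0, np.pi / 2]           # hypothetical stimulus-dependent phases
    stim = np.repeat([0, 1], n_per_stim)
    phase = np.concatenate(
        [rng.vonmises(mu, kappa, n_per_stim) for mu in mean_phases]
    )
    return stim, phase

# Same phase shift between stimuli, strong versus weak phase locking.
mi_strong = plugin_mutual_information(*simulate_spike_phases(kappa=2.0))
mi_weak = plugin_mutual_information(*simulate_spike_phases(kappa=0.2))
print(f"strong locking: {mi_strong:.3f} bits, weak locking: {mi_weak:.3f} bits")
```

In this toy setting the stimulus-dependent phase shift is identical in both cases, yet the information estimate is lower when locking is weak. Separating a genuine loss of stimulus coding from a loss of phase locking would therefore need a control, for example matching phase-locking strength or beta power across the IN and OUT conditions.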

      The authors propose that coherent oscillations could be the mechanism through which the prefrontal cortex influences beta activity in V4. I assume they mean coherent oscillations between the prefrontal cortex and V4. Given that they do have simultaneous recordings from the two areas they could test this hypothesis on their own data, however, they do not provide any results on that.

      This paper only includes inactivation data. We are working on analyzing the simultaneous recording data for a future publication.

      The authors make a strong point about the relevance of changes in the oscillation frequency and how this may result in an increase in firing rate although it could also be the reverse - an increase in firing rate leading to an increase in the frequency peak. It is not clear at all how these changes in frequency could come about. A more nuanced discussion based on both experimental and modeling data is necessary to appreciate the source and role (if any) of this observation.

      As the reviewer notes, it is difficult to determine whether the frequency changes drive the rate changes, vice versa, or whether both are generated in parallel by a common source. We have adjusted our language to reflect this (lines 277-278). Future modeling work may be able to shed more light on the causal relationships between various neural signatures.

      Reviewer #3 (Public review):

      Summary:

      In this report, the authors test the necessity of prefrontal cortex (specifically, FEF) activity in driving changes in oscillatory power, spike rate, and spike timing of extrastriate visual cortex neurons during a visual-spatial working memory (WM) task. The authors recorded LFP and spikes in V4 while macaques remembered a single spatial location over a delay period during which task-irrelevant background gratings were displayed on the screen with varying orientation and contrast. V4 oscillations (in the beta range) scaled with WM maintenance, and the information encoded by spike timing relative to beta band LFP about the task-irrelevant background orientation depended on remembered location. They also compared recorded signals in V4 with and without muscimol inactivation of FEF, demonstrating the importance of FEF input for WM-induced changes in oscillatory amplitude, phase coding, and information encoded about background orientations. Finally, they built a network model that can account for some of these results. Together, these results show that FEF provides meaningful input to the visual cortex that is used to alter neural activity and that these signals can impact information coding of task-irrelevant information during a WM delay.

      Strengths:

      (1) Elegant and robust experiment that allows for clear tests for the necessity of FEF activity in WM-induced changes in V4 activity.

      (2) Comprehensive and broad analyses of interactions between LFP and spike timing provide compelling evidence for FEF-modulated phase coding of task-irrelevant stimuli at remembered location.

      (3) Convincing modeling efforts.

      Weaknesses:

      (1) 0% contrast background data (standard memory-guided saccade task) are not reported in the manuscript. While these data cannot be used to consider information content of spike rate/time about task-irrelevant background stimuli, this condition is still informative as a 'baseline' (and a more typical example of a WM task).

      We plan to incorporate statistical analysis of this point in the revised version.

      (2) Throughout the manuscript, the primary measurements of neural coding pertain to task-irrelevant stimuli (the orientation/contrast of the background, which is unrelated to the animal's task to remember a spatial location). The remembered location impacts the coding of these stimulus variables, but it's unclear how this relates to WM representations themselves.

      Indeed, here we have focused on how maintaining spatial WM impacts visual processing of incoming sensory information, rather than on how the spatial WM signal itself is represented and maintained. Behaviorally, this impact on visual signals could be related to the effects of the content of WM on perception and reaction times (e.g., Soto et al. 2008, Awh et al. 1998, Teng et al. 2019), but no such link to behavior is shown in our data.

    1. eLife Assessment

      Masroor Ahmad Paddar and colleagues reveal noncanonical roles of ATG5 and membrane ATG8ylation in regulating retromer assembly and function. They identify ATG5's unique non-autophagic role and show that CASM partially contributes to these phenotypes. Although the mechanism by which ATG8ylation regulates the retromer remains unclear, the findings provide important insights with solid supporting evidence.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Masroor Ahmad Paddar and his/her colleagues explore the noncanonical roles of ATG5 and membrane atg8ylation in regulating retromer assembly and function. They begin by examining the interactomes of ATG5 and expand the scope of these effects to include homeostatic responses to membrane stress and damage.

      Strengths:

      This study provides novel insights into the noncanonical function of ATG8ylation in the endosomal cargo sorting process.

      Weaknesses:

      The direct mechanism by which ATG8ylation regulates the retromer remains unsolved.

      Comments on revisions:

      After revision, though the major weakness remains unsolved, other questions have been addressed experimentally or further interpreted.

    3. Reviewer #2 (Public review):

      Summary:

      Padder et al. demonstrate that ATG5 mediates lysosomal repair via the recruitment of the retromer components during LLOMe-induced lysosomal damage and that mAtg8-ylation contributes to retromer-dependent cargo sorting of GLUT1. Although previous studies have suggested that during glucose withdrawal, classical autophagy contributes to retromer-dependent GLUT1 surface trafficking via interactions between LC3A and TBC1D5, the experiments here demonstrate that during basal conditions or lysosomal damage, ATGs that are not involved in mATG8ylation, such as FIP200, are not functionally required for retromer-dependent sorting of GLUT1. Overall, these studies suggest a unique role for ATG5 in the control of retromer function, and that conjugation of ATG8 to single membranes (CASM) is a partial contributor to these phenotypes.

      Strengths:

      (1) Overall, these studies suggest a unique non-autophagic role for ATG5 in the control of retromer function. They also demonstrate that conjugation of ATG8 to single membranes (CASM) is a partial contributor to these phenotypes. Overall, these data point to a new role for ATG5 and CASM-dependent mATG8ylation in lysosomal membrane repair and trafficking.

      (2) Although the studies are overall supportive of the proposed model that the retromer is controlled by CASM-dependent mATG8-ylation, it is noteworthy that previous studies of GLUT1 trafficking during glucose withdrawal (Roy et al. Mol Cell, PMID: 28602638) were predominantly conducted in cells lacking ATG5 or ATG7, which would not be able to discriminate between a CASM-dependent vs. canonical autophagy-dependent pathway in the control of GLUT1 sorting. Is the lack of GLUT1 mis-sorting to lysosomes observed in FIP200 and ATG13KO cells also observed during glucose withdrawal? Notably, deficiencies in glycolysis and glucose-dependent growth have been reported in FIP200 deficient fibroblasts (Wei et al. G&D, PMID: 21764854) so there may be differences in regulation dependent on the stress imposed on a cell.

      Comments on revisions:

      My previous comments have been addressed.

    4. Reviewer #3 (Public review):

      In this manuscript, Padder et al. used APEX2 proximity labeling to find an interaction between ATG5 and the core components of the Retromer complex, VPS26, VPS29, and VPS35. Further studies revealed that ATG5 KO inhibited the trafficking of GLUT1 to the plasma membrane. They also found that other autophagy genes involved in membrane atg8ylation affected GLUT1 sorting. However, knocking out other essential autophagy genes such as ATG13 and FIP200 did not affect GLUT1 sorting. These findings suggest that ATG5 participates in the function of the Retromer in a noncanonical autophagy manner. Overall, the methods and techniques employed by the authors largely support their conclusions. These findings are intriguing and significant, enriching our understanding of the non-autophagic functions of autophagy proteins and the sorting of GLUT1.

      Comments on revisions:

      The concerns I raised have all been addressed.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews

      Reviewer #1 (Public Review): 

      Summary: 

      In this study, Masroor Ahmad Paddar and his/her colleagues explore the noncanonical roles of ATG5 and membrane atg8ylation in regulating retromer assembly and function. They begin by examining the interactomes of ATG5 and expand the scope of these effects to include homeostatic responses to membrane stress and damage. 

      Strengths: 

      This study provides novel insights into the noncanonical function of ATG8ylation in the endosomal cargo sorting process.

      Weaknesses: 

      The direct mechanism by which ATG8ylation regulates the retromer remains unsolved. 

      We agree with the reviewer.  We do however show how at least one aspect of atg8ylation contributes to the proper retromer function, which occurs via lysosomal membrane maintenance and repair. Understanding the more direct effects on retromer will require a separate study. We now emphasize this in the revised manuscript (p. 18) and point out the limitations of the present work (p. 18): “One of the limitations of our study is that beyond effects of membrane atg8ylation on quality of lysosomal membrane and its homeostasis there could be more direct effects of membrane modification with mATG8s that still need to be understood”.

      Reviewer #2 (Public Review): 

      Summary:

      Padder et al. demonstrate that ATG5 mediates lysosomal repair via the recruitment of the retromer components during LLOMe-induced lysosomal damage and that mAtg8-ylation contributes to retromer-dependent cargo sorting of GLUT1. Although previous studies have suggested that during glucose withdrawal, classical autophagy contributes to retromer-dependent GLUT1 surface trafficking via interactions between LC3A and TBC1D5, the experiments here demonstrate that during basal conditions or lysosomal damage, ATGs that are not involved in mATG8ylation, such as FIP200, are not functionally required for retromer-dependent sorting of GLUT1. Overall, these studies suggest a unique role for ATG5 in the control of retromer function, and that conjugation of ATG8 to single membranes (CASM) is a partial contributor to these phenotypes. 

      Strengths: 

      (1) Overall, these studies suggest a unique non-autophagic role for ATG5 in the control of retromer function. They also demonstrate that conjugation of ATG8 to single membranes (CASM) is a partial contributor to these phenotypes. Overall, these data point to a new role for ATG5 and CASM-dependent mATG8ylation in lysosomal membrane repair and trafficking. 

      (2) Although the studies are overall supportive of the proposed model that the retromer is controlled by CASM-dependent mATG8-ylation, it is noteworthy that previous studies of GLUT1 trafficking during glucose withdrawal (Roy et al. Mol Cell, PMID: 28602638) were predominantly conducted in cells lacking ATG5 or ATG7, which would not be able to discriminate between a CASM-dependent vs. canonical autophagy-dependent pathway in the control of GLUT1 sorting. Is the lack of GLUT1 mis-sorting to lysosomes observed in FIP200 and ATG13KO cells also observed during glucose withdrawal? Notably, deficiencies in glycolysis and glucose-dependent growth have been reported in FIP200 deficient fibroblasts (Wei et al. G&D, PMID: 21764854) so there may be differences in regulation dependent on the stress imposed on a cell.

      We thank the reviewer for the overall assessment of the strengths of the study. We have discussed in the manuscript the elegant study by Roy et al., PMID 28602638. To accommodate the reviewer’s comment, we have additionally emphasized in the text that our study is focused on basal conditions and conditions that perturb endolysosomal compartments. We agree with the reviewer that under metabolic stress conditions (such as glucose limitation) more complex pathways may be engaged and have acknowledged that in the discussion. We have now included this in the limitations of the study (p. 18): “Another limitation of our study is that we have focused on basal conditions or conditions causing lysosomal damage, whereas metabolic stress including glucose excess or limitation with its multitude of metabolic effects have not been addressed”.

      Weaknesses: 

      (1) Additional controls are needed to clarify the role of CASM in the control of retromer function. Because the manuscript proposes both CASM-dependent and independent pathways in the ATG5 mediated regulation of the retromer, it is important to provide robust evidence that CASM is required for retromer-dependent GLUT1 sorting to the plasma membrane vs. lysosome. The experiments with monensin in Fig. 7C-E are consistent with but not unequivocally corroborative of a role for CASM. 

      We fully agree with the reviewer. In fact, our data with bafilomycin A1 treatment causing GLUT1 mis-sorting show that it is the perturbation of lysosomes and not CASM per se that leads to mis-sorting of GLUT1 (Fig. 7D,E). Note that it has been shown (PMIDs: 28296541, 25484071 and 37796195) that although bafilomycin A1 deacidifies lysosomes it does not induce but instead inhibits CASM. This is because bafilomycin A1 causes dissociation of V1 and V0 sectors of V-ATPase, unlike other CASM-inducing agents which promote V1-V0 association. Complementing this, our data with ATG2AB DKO and ESCRT VPS37A KO (Fig. 8A-F) indicate that the repair of lysosomes is important to keep the retromer machinery functional (as illustrated in Fig. 8G). This may be one of the effector mechanisms downstream of membrane atg8ylation in general and hence also downstream of CASM. We have revised Fig. 7 title to read “Lysosomal perturbations cause GLUT1 mis-sorting” and have explained these relationships in the text (p. 12-13): “Since bafilomycin A1 does not induce CASM but disturbs luminal pH, we conclude that it is the less acidic luminal pH of the endolysosomal organelles, and not CASM, that is sufficient to interfere with the proper sorting of GLUT1.”

      Based on the results shown with ATG16KO in Fig 4A-D, rescue experiments of these 16KO cells with WT vs. C-terminal WD40 mutant versions of ATG16 will specifically assess the requirement for CASM and potentially provide more rigorous support for the conclusions drawn. 

      We have carried out complementation with ATG16L1 WT and its E230 mutant (devoid of WD40 repeats but still capable of canonical autophagy) and placed these data in Fig. 7 (panels I and J) as recommended by the reviewer. This is now described on p. 13: “To additionally test this notion, we compared ATG16L1 full length (ATG16L1FL) and ATG16L1E230 (Rai et al., PMID 30403914) for complementation of the GLUT1 sorting defect in ATG16L1 KO cells (Fig. 7I,J). ATG16L1E230 lacks the key domain to carry out CASM via binding to V-ATPase (29-33) but retains capacity to carry out atg8ylation. Both ATG16L1FL and ATG16L1E230 complemented mis-sorting of GLUT1 (Fig. 7I,J). Collectively, these data indicate that it is not absence of CASM/VAIL but absence of membrane atg8ylation in general that promotes GLUT1 mis-sorting.”

      (2) Also, the role of TBC1D5 should be further clarified. In Fig S7, are there any changes in the interactions between TBC1D5 and VPS35 in response to LLOMe or other agents utilized to induce CASM? 

      We thank the reviewer for pointing this out. We do have data with VPS35 in co-IPs shown in Fig. S7.  There is no change in the amounts of VPS35 or TBC1D5 in GFP-LC3A co-IPs. We now include in Fig. S7 (new panel D) a graph with quantification in the revised manuscript and emphasize this point (p. 12): “However, under CASM-inducing conditions, no changes were detected (Fig. S7B-D) in interactions between TBC1D5 and LC3A or in levels of VPS35 in LC3A co-IP, a proxy for LC3A-TBC1D5-VPS29/retromer association. This suggests that CASM-inducing treatments and additionally bafilomycin A1 do not affect the status of the TBC1D5-Rab7 system”.        

      Does TBC1D5 loss-of-function modulate the numbers of GLUT1 and Gal3 puncta observed in ATG5 deficient cells in response to LLOMe? 

      We agree that TBC1D5 is an interesting aspect. However, because TBC1D5 does not change its interactions in the experiments in our study, we consider this topic (i.e. whether TBC1D5 phenocopies VPS35 and ATG5 KOs in its effects on Gal3) to be beyond the scope of the present work. We underscore that LLOMe (lysosomal damage) mis-sorts GLUT1 even without any genetic intervention (e.g., in WT cells in the absence of ATG5 KO; Fig. 7). Thus, in our opinion the effects of TBC1D5 inactivation may be a moot point.  

      (3) Finally, the studies here are motivated by experiments in Fig. S1 (as well as other studies from the Deretic and Stallings labs) suggesting unique autophagy-independent functions for ATG5 in myeloid cells and neutrophils in susceptibility to Mycobacterium tuberculosis infection. However, it is curious that no attempt is made to relate the mechanistic data regarding the retromer or GLUT1 receptor mis-sorting back to the infectious models. Do myeloid cells or neutrophils lacking ATG5 have deficiencies in glucose uptake or GLUT1 cell surface levels? 

      The reviewer’s point is well taken. Glucose uptake, its metabolism, and diabetes underlie the resurgence of TB in certain populations and are important factors in a range of other diseases. This was alluded to in our discussion (lines 461-469). However, these are complex topics for future studies. We have now expanded this section of the discussion (p. 18): “In the context of tuberculosis, diabetes, which includes glucose dysregulation, is associated with increased incidence of active disease and adverse outcomes” (Dheda et al., PMID: 26377143; Dooley et al., PMID: 19926034).

      Reviewer #3 (Public Review): 

      In this manuscript, Padder et al. used APEX2 proximity labeling to find an interaction between ATG5 and the core components of the Retromer complex, VPS26, VPS29, and VPS35. Further studies revealed that ATG5 KO inhibited the trafficking of GLUT1 to the plasma membrane. They also found that other autophagy genes involved in membrane atg8ylation affected GLUT1 sorting. However, knocking out other essential autophagy genes such as ATG13 and FIP200 did not affect GLUT1 sorting. These findings suggest that ATG5 participates in the function of the Retromer in a noncanonical autophagy manner. Overall, the methods and techniques employed by the authors largely support their conclusions. These findings are intriguing and significant, enriching our understanding of the non-autophagic functions of autophagy proteins and the sorting of GLUT1.

      Nevertheless, there are several issues that the authors need to address to further clarify their conclusions. 

      (1) The authors confirmed the interaction between Atg5 and the Retromer complex through Co-IP experiments. Is the interaction between Atg5 and the Retromer direct? If it is direct, which Retromer complex protein regulates the interaction with Atg5? Additionally, does ATG5 K130R mutant enhance its interaction with the Retromer? 

      AlphaFold modeling in the initial submission of our study to eLife (absent from the current version) suggested the possibility of a direct interaction between ATG5 and VPS35 with ATG12—ATG5 complex facing outwards, in which case K130R would not matter. However, mutational experiments in putative contact residues did not alter association in co-IPs. So either ATG5 interacts with other retromer subunits or more likely is in a larger protein complex containing retromer. It will take a separate study to dissect associations and find direct interaction partners. 

      (2) To more directly elucidate how ATG5 regulates Retromer function by interacting with the Retromer and participates in the trafficking of GLUT1 to the plasma membrane, the authors should identify which region or crucial amino acid residues of ATG5 regulate its interaction with the Retromer. Additionally, they should test whether mutations in ATG5 that disrupt its interaction with the Retromer affect Retromer function (such as participating in the trafficking of GLUT1 to the plasma membrane) and whether they affect Atg8ylation. They also need to assess whether these mutations influence canonical autophagy and lysosomal sensitivity to damage. 

      Please see the response to point 1.

      Recommendations for the authors.

      Reviewer #1 (Recommendations For The Authors): 

      While most data are solid and convincing, the following questions need to be addressed before publication: 

      Major Concerns: 

      (1) Examining only one cargo (GLUT1) is insufficient to reflect the retromer's function comprehensively. At least two additional cargoes should be analyzed to observe the phenotypes more accurately. 

      We agree that having another retromer cargo (in addition to GLUT1) would be of interest. We point out that our data also show mis-sorting of SNX27 to lysosomes (Fig. 3H, quantifications in Fig. 3I).  SNX27 in turn sorts nearly 80 ion channels, signaling receptors, and other nutrient transporters. Which of the 80 cargos to prioritize and check (the expectation is that all 80 might be missorted given that they need SNX27)?  We have instead tested MPR, a SNX27-independent cargo. We now include data on effects of ATG5 knockout on CI-MPR (Fig. S9A-F). This is described in the text (p. 14; “Effect of ATG5 knockout on MPR sorting

      We tested whether ATG5 affects cation-independent mannose 6-phosphate receptor (CI-MPR). For this, we employed the previously developed methods (Fig. S9A) of monitoring retrograde trafficking of CI-MPR from the plasma membrane to the TGN (70,118-121). In the majority of such studies, CI-MPR antibody is allowed to bind to the extracellular domain of CI-MPR at the plasma membrane and its localization dynamics following endocytosis serves as a proxy for trafficking of CI-MPR. We used ATG5 KOs in HeLa and Huh7 cells and quantified by HCM retrograde trafficking to TGN of antibody-labeled CI-MPR at the cell surface, after being taken up by endocytosis and allowed to undergo intracellular sorting, followed by fixation and staining with TGN46 antibody. There was a minor but statistically significant reduction in CI-MPR overlap with TGN46 in HeLaATG5-KO that was comparable to the reduction in HeLa cells when VPS35 was depleted by CRISPR (HeLaVPS35-KO) (Fig. S9B,C). Morphologically, endocytosed Ab-CI-MPR appeared dispersed in both HeLaATG5-KO and HeLaVPS35-KO cells relative to HeLaWT cells (Fig. S9D). Similar HCM results were obtained with Huh7 cells (WT vs. ATG5KO; Fig. S9E,F). We interpret these data as evidence of indirect action of ATG5 KO on CI-MPR sorting via membrane homeostasis, although we cannot exclude a direct sorting role via retromer. We favor the former interpretation based on the strength of the effect and the controversial nature of retromer engagement in sorting of CI-MPR (57,70,75,98,120).”)

      (2) The evidence from Alphafold predictions is weak. The direct interaction of ATG5 with retromer subunits should be tested. 

      Please see the above response to Reviewer 3.

      In addition, does retromer also interact with ATG16L1 similarly to the phenomenon in VAIL? 

      We fully agree with the reviewer that finding the direct interacting partners between retromer and the membrane atg8ylation machinery is an important direction, as in our opinion it would expand the repertoire of E3 ligases and their adaptors. However, given the complexity and variety of possibilities, we believe that this is a topic for a future study.

      (3) In Line 166, Figures 2C and 2D, the Gal3 phenotype does not seem to be well complemented by VPS35. 

      We have adjusted the text to acknowledge incomplete complementation (p.7). 

      (4) In Figures 3 and 4, the authors show that KO of membrane atg8ylation machineries and ATG8-Hexa KO affects the localization of retromer cargo GLUT1 and SNX27. However, the mechanism by which membrane ATG8ylation affects retromer remains unresolved.

      Additionally, are other retromer subunits' locations also affected? If so, how are they impacted? At least a speculative explanation should be provided.

      Following the reviewer's request, we now state on p. 19 that “one of the limitations of our study is that beyond effects of membrane atg8ylation on quality of lysosomal membrane and its homeostasis there could be more direct effects of membrane modification with mATG8s on retromer that still need to be understood”.

      (5) In Figure 3, endogenous IP results are required to examine the interaction of ATG5 with retromer if suitable retromer antibodies for IP are available. 

      Endogenous IPs are given in Fig. 1. We have modified text on p. 8 to clarify this.

      (6) In Figure 4, ATG8 Hexa KO, and triple KO of LC3s or GABARAPs all increase the localization of GLUT1 on lysosomes. It seems redundant for ATG8 family proteins here.

      Can any individual member of the ATG8 family rescue this phenotype? 

      If the intent of such complementation analysis is to identify a specific mATG8 responsible for the observed effects, this is already pre-empted by the fact that TKOs also have a similar effect as HEXA mutants (i.e. loss of at least two of mATG8s is enough to cause the phenotype). We now discuss this in the text (p. 10): “Thus, at least two mATG8s, each one from two different mATG8 subclasses (LC3s and GABARAPs) or the entire membrane atg8ylation machinery was engaged in and required for proper GLUT-1 sorting”.  

      (7) In Figure 5, knockdown of ATG5 in FIP200 KO cells inhibited GLUT1 sorting from endosomes, leading to its trafficking to lysosomes. However, it is known that very little remnant ATG5 in ATG5 KD cells is enough to support ATG8 lipidation. Therefore, it is essential to repeat this experiment using ATG5/FIP200 double KO or ATG5 KO combined with an autophagy inhibitor. 

      We point out this limitation in the text (p. 11): “….we knocked down ATG5 in FIP200 KO cells (Fig. S5D) and found that GLUT1 puncta and GLUT1+LAMP2+ profiles increased even in the FIP200 KO background with the effects nearing those of VPS35 knockout (Figs. 5D-F and S5C), with the difference between VPS35 KO and ATG5 KD attributable to any residual ATG5 levels in cells subjected to siRNA knockdowns”.

      (8) In Figure 7, the authors show that the induction of CASM inhibited GLUT1 sorting from endosomes. However, ATG5 KO, which abolishes membrane ATG8ylation, also inhibits GLUT1 sorting. This seems paradoxical and requires a reasonable explanation or discussion. 

      We understand the reviewer’s comment. The answer to this paradox is that it is actually the lysosomal damage that causes GLUT1 mis-sorting and not CASM. Membrane atg8ylation, such as CASM and probably other processes given the involvement of both ATG2 and ESCRTs (Fig. 8), counteracts the damage and works in the direction of restoring/maintaining proper retromer-dependent sorting. This is now explained better in the text, and we have revised the title of Fig. 7 to read “Lysosomal damage causes GLUT1 mis-sorting”. Our data with bafilomycin A1 show that it is the perturbation of lysosomes (not CASM per se) that leads to mis-sorting of GLUT1 (Fig. 7D,E), and our data with ATG2AB DKO and ESCRT (VPS37A) KO (Fig. 8A-F) indicate that repair of lysosomes is important to keep the retromer working machinery functional (as illustrated in Fig. 8G), which may be one of the effector mechanisms downstream of membrane atg8ylation in general (and hence also of CASM).

      (9) The immuno-staining results for Figures 7F and 7G are lacking. 

      We now provide the requested images.

      (10) In Figure 8D, the quality of the image for VPS37 KO cells treated with LLOME is not sufficient to show increased colocalization between GLUT1 and LAMP2. 

      We now provide a different example image. We note that these are epifluorescent HCM images.

      Minor Concerns: 

      (1) It would be better to distinguish the function of the membrane ATG8ylation machinery (i.e., ATG5) from the function of membrane ATG8ylation in the description. No ATG8ylation-deficient mutants were used in this study. 

      We have used atg8ylation mutants (e.g. KOs in ATG3, ATG5, ATG7, and ATG16L1). We now emphasize this better in the text (p. 10). 

      (2) In Figure 2D, a green box appears there by accident.

      This has been fixed.

      (3) In Figure 3A, the conjugate for ATG5-ATG12 is absent in the gel for IB: ATG5.

      The ATG5 antibody used in Fig. 3A recognizes primarily the conjugated form of ATG5. This is now clarified in the figure legend. 

      (4) Figure 5G is missing in the manuscript. 

      Fig 5G is now mentioned in the text. Thank you.

      (5) The gRNA sequence information for FIP200 KO is missing in the Methods section. 

      Reference(s) to the already published gRNA sequence are in the manuscript. 

      (6) Suggest moving the last paragraph in Result section to Discussion section. 

      We kept this single-paragraph section in Results as it contains actual data.

      Reviewer #2 (Recommendations For The Authors): 

      (1) It is unclear why the rescue of VPS35KO cells in Fig 1C-D is so modest. 

      Complementation data depend on transfection efficiency and some variability is to be expected.

      Reviewer #3 (Recommendations For The Authors): 

      (1) Figures 2A, 2C, 2E, and 2G lack scale bars. Figure 2D has a small square above the y axis. 

      Relative scale bars are now included. 

      (2) Figures S3B, S3D, and S3F lack scale bars. 

      Relative scale bars are now included.

    1. eLife Assessment

      This important study uses reinforcement learning to study how turbulent odor stimuli should be processed to yield successful navigation. They find that there is an optimal memory length over which an agent should ignore blanks in the odor to discriminate whether the agent is still inside the plume or outside of it, complementing recent studies using RNNs and finite state controllers to identify optimal strategies for navigating a turbulent plume. While the overall strength of evidence is convincing, better justification for using Brownian motion as a recovery strategy and the addition of accompanying code for reproducibility would add to this strength.

    2. Reviewer #1 (Public review):

      Overall I found the approach taken by the authors to be clear and convincing. It is striking that the conclusions are similar to those obtained in a recent study using a different computational approach (finite state controllers), and lend confidence to the conclusions about the existence of an optimal memory duration. There are a few points or questions that could be addressed in greater detail in a revision:

      (1) Discussion of spatial encoding

      The manuscript contrasts the approach taken here (reinforcement learning in a grid world) with strategies that involve a "spatial map" such as infotaxis. The authors note that their algorithm contains "no spatial information." However, I wonder if further degrees of spatial encoding might be delineated to better facilitate comparisons with biological navigation algorithms. For example, the gridworld navigation algorithm seems to have an implicit allocentric representation, since movement can be in one of four allocentric directions (up, down, left, right). I assume this is how the agent learns to move upwind in the absence of an explicit wind direction signal. However, not all biological organisms likely have this allocentric representation. Can the agent learn the strategy without wind direction if it can only go left/right/forward/back/turn (in egocentric coordinates)? In discussing possible algorithms, and the features of this one, it might be helpful to distinguish (1) those that rely only on egocentric computations (run and tumble), (2) those that rely on a single direction cue such as wind direction, (3) those that rely on allocentric representations of direction, and (4) those that rely on a full spatial map of the environment.

      (2) Recovery strategy on losing the plume

      While the approach to encoding odor dynamics seems highly principled and reaches appealingly intuitive conclusions, the approach to modeling the recovery strategy seems to be more ad hoc. Early in the paper, the recovery strategy is defined to be path integration back to the point at which odor was lost, while later in the paper, the authors explore Brownian motion and a learned recovery based on multiple "void" states. Since the learned strategy works best, why not first consider learned strategies, and explore how lack of odor must be encoded or whether there is an optimal division of void states that leads to the best recovery strategies? Also, although the authors state that the learned recovery strategies resemble casting, only minimal data are shown to support this. A deeper statistical analysis of the learned recovery strategies would facilitate comparison to those observed in biology.

      (3) Is there a minimal representation of odor for efficient navigation?

      The authors suggest (line 280) that the number of olfactory states could potentially be reduced to reduce computational cost. This raises the question of whether there is a maximally efficient representation of odors and blanks sufficient for effective navigation. The authors choose to represent odor by 15 states that allow the agent to discriminate different spatial regimes of the stimulus, and later introduce additional void states that allow the agent to learn a recovery strategy. Can the number of states be reduced or does this lead to loss of performance? Does the optimal number of odor and void states depend on the spatial structure of the turbulence as explored in Figure 5?
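      The state representation discussed in this point can be illustrated with a minimal sketch. It assumes, hypothetically, that the 15 odor states arise from 5 intensity bins times 3 intermittency bins computed over a finite memory window, and that an empty window maps to the void state. The function name `olfactory_state`, the detection threshold, and the bin edges are illustrative choices, not values taken from the manuscript.

```python
from collections import deque


def olfactory_state(history, threshold=0.0,
                    intensity_edges=(0.2, 0.4, 0.6, 0.8),
                    intermittency_edges=(0.33, 0.66)):
    """Map a window of recent odor readings to a discrete state index.

    Returns None for the void state (no detection within the memory window).
    All thresholds and bin edges are hypothetical placeholders.
    """
    detections = [c for c in history if c > threshold]
    if not detections:
        return None  # void: nothing detected within memory

    # Average intensity, conditioned on odor being detected
    intensity = sum(detections) / len(detections)
    # Fraction of the window in which odor was detected
    intermittency = len(detections) / len(history)

    # Count how many bin edges each feature exceeds
    i_bin = sum(intensity > e for e in intensity_edges)          # 0..4
    s_bin = sum(intermittency > e for e in intermittency_edges)  # 0..2
    return i_bin * (len(intermittency_edges) + 1) + s_bin


# Usage: a memory of 4 time steps, updated as new readings arrive
memory = deque(maxlen=4)
for reading in [0.0, 0.5, 0.0, 0.9]:
    memory.append(reading)
state = olfactory_state(memory)
```

      Reducing the number of states, as the question above asks, would correspond to passing fewer bin edges; whether performance degrades is exactly what the sketch cannot answer.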

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigate the problem of olfactory search in turbulent environments using artificial agents trained using tabular Q-learning, a simple and interpretable reinforcement learning (RL) algorithm. The agents are trained solely on odor stimuli, without access to spatial information or prior knowledge about the odor plume's shape. This approach makes the emergent control strategy more biologically plausible for animals navigating exclusively using olfactory signals. The learned strategies show parallels to observed animal behaviors, such as upwind surging and crosswind casting. The approach generalizes well to different environments and effectively handles the intermittency of turbulent odors.
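      For readers unfamiliar with the method, the standard tabular Q-learning update can be sketched as follows. This is a generic textbook version, not the authors' implementation: the four grid actions follow the description in the reviews, while the function names, the epsilon-greedy exploration rule, and the values of `alpha` and `gamma` are assumptions made for illustration.

```python
import random

# Four allocentric grid moves, as described in the reviews
ACTIONS = ("up", "down", "left", "right")


def make_q_table(n_states, init=0.0):
    """One row per olfactory state, one column per action."""
    return [[init for _ in ACTIONS] for _ in range(n_states)]


def choose_action(q, state, epsilon, rng=random):
    """Epsilon-greedy choice (an assumed exploration rule)."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    row = q[state]
    return max(range(len(ACTIONS)), key=row.__getitem__)


def q_update(q, state, action, reward, next_state,
             alpha=0.1, gamma=0.99, terminal=False):
    """Standard update: Q(s,a) += alpha * (target - Q(s,a))."""
    if terminal:
        target = reward
    else:
        target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Usage: a single update on a toy two-state table
q = make_q_table(n_states=2)
q_update(q, state=0, action=1, reward=1.0, next_state=1, alpha=0.5, gamma=0.9)
greedy = choose_action(q, state=0, epsilon=0.0)
```

      Because the table is indexed only by olfactory state, the learned policy can be read off row by row, which is the interpretability advantage noted in this review.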

      Strengths:

      (1) The use of numerical simulations to generate realistic turbulent fluid dynamics sets this paper apart from studies that rely on idealized or static plumes.

      (2) A key innovation is the introduction of a small set of interpretable olfactory states based on moving averages of odor intensity and sparsity, coupled with an adaptive temporal memory.

      (3) The paper provides a thorough analysis of different recovery strategies when an agent loses the odor trail, offering insights into the trade-offs between various approaches.

      (4) The authors provide a comprehensive performance analysis of their algorithm across a range of environments and recovery strategies, demonstrating the versatility of the approach.

      (5) Finally, the authors list an interesting set of real-world experiments based on their findings, that might invite interest from experimentalists across multiple species.

      Weaknesses:

      (1) The inclusion of Brownian motion as a recovery strategy seems odd since it doesn't closely match natural animal behavior, where circling (e.g. flies) or zigzagging (ants' "sector search") could have been more realistic.

      (2) Using tabular Q-learning is both a strength and a limitation. It's simple and interpretable, making it easier to analyze the learned strategies, but the discrete action space seems somewhat unnatural. In real-world biological systems, actions (like movement) are continuous rather than discrete. Additionally, the ground-frame actions may not map naturally to how animals navigate odor plumes (e.g. insects often navigate based on their own egocentric frame).

      (3) The lack of accompanying code is a major drawback since nowadays open access to data and code is becoming a standard in computational research. Given that the turbulent fluid simulation is a key element that differentiates this paper, the absence of simulation and analysis code limits the study's reproducibility.

    4. Author response:

      We thank the Editor and Reviewers for their work on our manuscript, and are happy to receive their positive comments, as well as their questions and suggestions. We are currently revising the manuscript and are planning to de-emphasize Brownian recovery, presenting it as a simple yet biologically irrelevant benchmark, and to include comparisons with other biologically inspired strategies suggested by the reviewers. As for sharing the code and data, we completely agree: dataset 1 is already public and we will share the other dataset as well as the code. In a nutshell, we will be addressing the referees’ suggestions as follows:

      (1)   As Referee 1 points out, even if the algorithm does not require a map of space, the agent is still required to tell apart North, East, South and West relative to the wind direction which is implicitly assumed known. We will better clarify the spatial encoding required to implement these strategies.

      (2)   Referee 1 remarks that the learned recovery strategy works best and suggests giving it a more prominent role and characterizing it better. We agree that what is done in the void state is definitely key and more work is needed to understand it. In the revised manuscript, we are planning to further substantiate the statistics of the learned recovery by repeating training several times and comparing several trajectories. Note that this strategy is much more flexible than the others and could potentially mix aspects of recovery with aspects of exploitation: we will present a more in-depth analysis that disentangles these two aspects elsewhere.

      (3)   Referee 1 asks whether an optimal, minimal representation of the olfactory states exists. Q-learning defines the olfactory states prior to training and does not allow systematic optimization of the odor representation for the task. Given the odor features, we can however discretize them into more or fewer olfactory states. We expect that decreasing the number of olfactory states provides less positional information and potentially degrades performance, although loss in performance may be overshadowed by noise or by efficient recovery. We are planning to re-train our model with a smaller number of non-void states and will provide the comparison. The number of void states does not need further testing: we chose 50 void states because it matches the time agents typically remain in the void and indeed achieves very high performance (fewer than 50 void states result in no convergence and more than 50 introduce states that are rarely visited).

      (4)   Both reviewers correctly remark that Brownian motion is not biologically relevant. We will make sure to further clarify that this is a rather simple, but biologically irrelevant, benchmark. We are planning to include results with both circling and zigzagging as biologically inspired recovery strategies.

      (5)   We agree with reviewer 2 that animal locomotion does not look like a series of discrete displacements on a checkerboard. However, to overcome this limitation, one has to first focus on a specific system to define actions in a way that best adheres to a species’ motor controls. Second, these actions are likely continuous, which makes reinforcement learning notoriously more complex. While we agree that more realistic models are definitely needed for a comparison with real systems, this remains outside the scope of the current work.

      (6)   We agree with the referees and editor that it is important to publish the code and data alongside the manuscript. This was already planned and we will make sure to share the links within the revised version of the manuscript.

    1. eLife Assessment

      This study presents valuable findings on the role of a well-studied signal transduction pathway, the Slit/Robo system, in the context of the assembly of the hematopoietic niche in the Drosophila embryo. The evidence supporting the claims of the authors is solid. The work will interest developmental biologists working on molecular mechanisms of tissue morphogenesis.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Nelson et al. is focused on formation of the Drosophila Posterior Signaling Center (PSC) which ultimately acts as a niche to support hematopoietic stem cells of the lymph gland (LG). Using a combination of genetics and live imaging, the authors show that PSC cells migrate as a tight collective and associate with multiple tissues during a trajectory that positions them at the posterior of the LG.

      This is an important study that identifies Slit-Robo signaling as a regulator of PSC morphogenesis, and highlights the complex relationship of interacting cell types - PSC, visceral mesoderm (VM) and cardioblasts (CBs) - in coordinated development of these three tissues during organ development. However, one point requiring clarification is the idea that PSC cells exhibit a collective cell migration; it is not clear that the cells are migrating rather than being pushed to a more dorsal position through dorsal closure and/or other similar large scale embryo movement. This does not detract from the very interesting analysis of PSC morphogenesis as presented.

      Strengths:

      • Using expression of Hid or Grim to ablate associated tissues, they find evidence that the VM and CB of the dorsal vessel affect PSC migration/morphology whereas the alary muscles do not. Slit is expressed by both VM and CBs, and therefore Slit-Robo signaling was investigated as PSCs express Robo.

      • Using a combination of approaches, the authors convincingly demonstrate that Slit expression in the CBs and VM acts to support PSC positioning. A strength is the ability to knockdown slit levels in particular tissue types using the Gal4 system and RNAi.

      • In the analysis of robo mutants, however, the PSC positioning phenotype is weaker in the individual mutants (robo1 and robo2), with only the double mutant (robo1,robo2) exhibiting a phenotype comparable to the slit RNAi. The authors make a reasonable argument that Slit-Robo signaling has an intrinsic effect, likely acting within PSCs, because PSCs show a phenotype even when CBs do not (Fig 4G).

      • New insight into dorsal vessel formation by VM is presented in Fig 4A,B, as loss of the VM can affect dorsal vessel morphogenesis. This result additionally points to the VM as important.

      Weaknesses:

      • The authors are cautioned to temper the result that Slit-Robo signaling is intrinsic to PSC since loss of robo may affect other cell types (besides CBs and PSCs) to indirectly affect PSC migration/morphogenesis. In fact, in the robo2, robo1 mutant, the VM appears to be incorrectly positioned (Fig. 4G).

      • If possible, the authors should use RNAi to knockdown Robo1 and Robo2 levels specifically in the PSCs if a Gal4 is available; might Antp.Gal4 (Fig 1K) be useful? Even if knockdown is achieved in PSCs+CBs, this would be a better/complementary experiment to support the approach outlined in Fig 4D.

      • Movies are hard to interpret, as it seems unclear that the PSCs actively migrate rather than being pushed/moved indirectly due to association with VM and CBs/dorsal vessel.

    3. Reviewer #2 (Public review):

      The paper by Nelson KA, et al. explored the collective migration, coalescence and positioning of the posterior signaling center (PSC) cells in the Drosophila embryo. With live imaging, the authors observed the dynamic progress of PSC migration. Throughout this process, visceral mesoderm (VM), alary muscles (AMs) and cardioblasts (CBs) are in proximity to the PSC. Genetic ablation of these tissues reveals the requirement for VM and CBs, but not AMs, in this process. Genetic manipulations further demonstrated that Slit-Robo signaling was critical during PSC migration and positioning. While the genetic mechanisms of positioning the PSC were explored in much detail, including using live imaging, the functional consequence of mispositioning or (partial) absence of PSC cells has not been addressed, but would much increase the relevance of their findings. A few additional issues need to be addressed as well in this otherwise well-done study.

      Previous major points:

      (1) The only readout in their experiments is the relative correctness of PSC positioning. Importantly, what is the functional consequence if PSC is not properly positioned? This would be particularly important with robo-sli manipulations, where the PSC is present but some cells are misplaced. What is the consequence? Are the LGs affected, like specification of their cell types, structure and function? To address this for at least the robo-slit requirement in the PSC, it may be important to manipulate them directly in the PSC with a split Gal4 system, using Antp and Odd promoters.

      (2) The dense, parallel-aligned fibers in the lower part of Figure 1J seemed to be visceral mesoderm, but further up (dorsally) that may be epidermis. Is it possible that the PSC migrates together with the epidermis? This should be addressed.

      (3) Although the authors described the standards of assessing PSC positioning as "normal" or "abnormal", it is rather subtle at times and variable in the mutant or KD/OE examples. The criteria should be more clearly delineated and analyzed double-blind, also since this is the only readout. Further examples of abnormal positioning in supplementary figures would also help.

      (4) Discussion is very lengthy and should be shortened.

      Comments on revised version:

      Although the authors have responded to my concerns as they deemed suitable, these concerns still stand for the revised version.

    4. Reviewer #3 (Public review):

      Summary:

      This work is a detailed and thorough analysis of the morphogenesis of the posterior signaling center (PSC), a hematopoietic niche in the Drosophila larva. Live imaging is performed from the stage of PSC determination until the appearance of a compact lymph gland and PSC in the stage 16 embryo. This analysis is combined with genetic studies that clarify the involvement of adjacent tissue, including the visceral mesoderm, alary muscle, and cardioblasts/dorsal vessel. Lastly, the Slit/Robo signaling system is clearly implicated in the normal formation of the PSC.

      Strengths:

      The data are clearly presented and well documented, and fully support the conclusions drawn from the different experiments.

      The authors have addressed all of my previous comments, in particular concerning the role of epidermal cell rearrangements during dorsal closure as a possible force acting on the movement of PSC cells. The authors have clarified their definition of "collective migration" as it applies to the movement of PSC. The revised paper will make an important contribution to our understanding of the mechanisms driving morphogenesis.

    5. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      Summary:

      The study by Nelson et al. is focused on formation of the Drosophila Posterior Signaling Center (PSC) which ultimately acts as a niche to support hematopoietic stem cells of the lymph gland (LG). Using a combination of genetics and live imaging, the authors show that PSC cells migrate as a tight collective and associate with multiple tissues during a trajectory that positions them at the posterior of the LG.

      This is an important study that identifies Slit-Robo signaling as a regulator of PSC morphogenesis, and highlights the complex relationship of interacting cell types - PSC, visceral mesoderm (VM) and cardioblasts (CBs) - in coordinated development of these three tissues during organ development. However, one point requiring clarification is the idea that PSC cells exhibit a collective cell migration; it is not clear that the cells are migrating rather than being pushed to a more dorsal position through dorsal closure and/or other similar large scale embryo movement. This does not detract from the very interesting analysis of PSC morphogenesis as presented.

      This Public Review by Reviewer #1 is identical to their original Public Review, thus we are unsure whether Reviewer #1 assessed the revised version of our manuscript, and whether they read our responses to their original Public Review. Below we summarize our original responses to the weaknesses listed for the first version of our manuscript.

      Strengths:

      • Using expression of Hid or Grim to ablate associated tissues, they find evidence that the VM and CB of the dorsal vessel affect PSC migration/morphology whereas the alary muscles do not. Slit is expressed by both VM and CBs, and therefore Slit-Robo signaling was investigated as PSCs express Robo.

      • Using a combination of approaches, the authors convincingly demonstrate that Slit expression in the CBs and VM acts to support PSC positioning. A strength is the ability to knockdown slit levels in particular tissue types using the Gal4 system and RNAi.

      • In the analysis of robo mutants, however, the PSC positioning phenotype is weaker in the individual mutants (robo1 and robo2), with only the double mutant (robo1,robo2) exhibiting a phenotype comparable to the slit RNAi. The authors make a reasonable argument that Slit-Robo signaling has an intrinsic effect, likely acting within PSCs, because PSCs show a phenotype even when CBs do not (Fig 4G).

      • New insight into dorsal vessel formation by VM is presented in Fig 4A,B, as loss of the VM can affect dorsal vessel morphogenesis. This result additionally points to the VM as important.

      Weaknesses:

      • The authors are cautioned to temper the result that Slit-Robo signaling is intrinsic to PSC since loss of robo may affect other cell types (besides CBs and PSCs) to indirectly affect PSC migration/morphogenesis. In fact, in the robo2, robo1 mutant, the VM appears to be incorrectly positioned (Fig. 4G).

      We maintain our conclusion, and we point out that the Reviewer stated, “The authors make a reasonable argument that Slit-Robo signaling has an intrinsic effect, likely acting within PSCs”. We already added a statement to the Discussion reminding the reader of the possibility of secondary defects (“Finally, it is possible that PSC cells do not intrinsically require Robo activation, but rather CB-independent PSC mis-positioning in sli or robo mutants could be a secondary defect caused by compromised Slit-Robo signaling in some other tissue.”).

      • If possible, the authors should use RNAi to knockdown Robo1 and Robo2 levels specifically in the PSCs if a Gal4 is available; might Antp.Gal4 (Fig 1K) be useful? Even if knockdown is achieved in PSCs+CBs, this would be a better/complementary experiment to support the approach outlined in Fig 4D.

      As described in our first response, use of Antp-GAL4 with RNAi would be no better than a whole animal double Robo mutant.

      • Movies are hard to interpret, as it seems unclear that the PSCs actively migrate rather than being pushed/moved indirectly due to association with VM and CBs/dorsal vessel.

      The VM does not directly contact the PSC, so the VM cannot be physically pushing the PSC. In their original review, Reviewer #3 expressed similar concerns (Weaknesses #1 and #2), and upon their review of our revised manuscript they determined that we had addressed these concerns.

      Reviewer #2 (Public review):

      The paper by Nelson KA, et al. explored the collective migration, coalescence and positioning of the posterior signaling center (PSC) cells in the Drosophila embryo. With live imaging, the authors observed the dynamic progress of PSC migration. Throughout this process, visceral mesoderm (VM), alary muscles (AMs) and cardioblasts (CBs) are in proximity to the PSC. Genetic ablation of these tissues reveals the requirement for VM and CBs, but not AMs, in this process. Genetic manipulations further demonstrated that Slit-Robo signaling was critical during PSC migration and positioning. While the genetic mechanisms of positioning the PSC were explored in much detail, including using live imaging, the functional consequence of mispositioning or (partial) absence of PSC cells has not been addressed, but would much increase the relevance of their findings. A few additional issues need to be addressed as well in this otherwise well-done study.

      Previous major points:

      (1) The only readout in their experiments is the relative correctness of PSC positioning. Importantly, what is the functional consequence if PSC is not properly positioned? This would be particularly important with robo-sli manipulations, where the PSC is present but some cells are misplaced. What is the consequence? Are the LGs affected, like specification of their cell types, structure and function? To address this for at least the robo-slit requirement in the PSC, it may be important to manipulate them directly in the PSC with a split Gal4 system, using Antp and Odd promoters.

      We stated in our original response that exploring the functional consequences of PSC mis-positioning was outside the scope of this study. Given that the necessary cis-regulatory modules have not been identified at Antp or Odd, creating a split-GAL4 with ‘Antp and Odd promoters’ cannot be accomplished in a reasonable time frame, as we detailed in our original response.

      (2) The dense, parallel-aligned fibers in the lower part of Figure 1J seemed to be visceral mesoderm, but further up (dorsally) that may be epidermis. Is it possible that the PSC migrates together with the epidermis? This should be addressed.

      This was directly addressed by the additional data included in our revision. When epidermal closure is stalled, the PSC is able to migrate past the stalled leading edge, closer to the midline.

      (3) Although the authors described the standards of assessing PSC positioning as "normal" or "abnormal", it is rather subtle at times and variable in the mutant or KD/OE examples. The criteria should be more clearly delineated and analyzed double-blind, also since this is the only readout. Further examples of abnormal positioning in supplementary figures would also help.

      We addressed this comment in detail in our original response. Briefly, double-blinding was often not possible because the genotype was obvious from the image. The criteria we outline for normal PSC positioning are as comprehensive as possible given the subtlety and variability of mis-positioning phenotypes. Two of the authors independently analyzed the relatively large sets of samples and arrived at the same conclusions.

      (4) The Discussion is very lengthy and should be shortened.

      We shortened the Discussion in the revised version.

      Comments on revised version:

      Although the authors have responded to my concerns as they deemed suitable, these concerns still stand for the revised version.

      Given our responses above and the lack of detail in this comment, we are unsure why the Reviewer is still concerned.

      Reviewer #3 (Public review):

      Summary:

      This work is a detailed and thorough analysis of the morphogenesis of the posterior signaling center (PSC), a hematopoietic niche in the Drosophila larva. Live imaging is performed from the stage of PSC determination until the appearance of a compact lymph gland and PSC in the stage 16 embryo. This analysis is combined with genetic studies that clarify the involvement of adjacent tissue, including the visceral mesoderm, alary muscle, and cardioblasts/dorsal vessel. Lastly, the Slit/Robo signaling system is clearly implicated in the normal formation of the PSC.

      Strengths:

      The data are clearly presented and well documented, and fully support the conclusions drawn from the different experiments.

      The authors have addressed all of my previous comments, in particular concerning the role of epidermal cell rearrangements during dorsal closure as a possible force acting on the movement of PSC cells. The authors have clarified their definition of "collective migration" as it applies to the movement of PSC. The revised paper will make an important contribution to our understanding of the mechanisms driving morphogenesis.

      We are appreciative of the time spent by the Reviewer reading our responses and assessing the revision.

      ---------

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study by Nelson et al. is focused on the formation of the Drosophila Posterior Signaling Center (PSC) which ultimately acts as a niche to support hematopoietic stem cells of the lymph gland (LG). Using a combination of genetics and live imaging, the authors show that PSC cells migrate as a tight collective and associate with multiple tissues during a trajectory that positions them at the posterior of the LG.

      This is an important study that identifies Slit-Robo signaling as a regulator of PSC morphogenesis, and highlights the complex relationship of interacting cell types - PSC, visceral mesoderm (VM), and cardioblasts (CBs) - in the coordinated development of these three tissues during organ development. However, one point requiring clarification is the idea that PSC cells exhibit a collective cell migration; it is not clear that the cells are migrating rather than being pushed to a more dorsal position through dorsal closure and/or other similar large-scale embryo movement. This does not detract from the very interesting analysis of PSC morphogenesis as presented.

      Since each referee asked for clarification concerning collective cell migration, we present a combined response further below, placed after the comments from Reviewer #3.

      Strengths:

      (1) Using the expression of Hid or Grim to ablate associated tissues, they find evidence that the VM and CB of the dorsal vessel affect PSC migration/morphology whereas the alary muscles do not. Slit is expressed by both VM and CBs, and therefore Slit-Robo signaling was investigated as PSCs express Robo.

      (2) Using a combination of approaches, the authors convincingly demonstrate that Slit expression in the CBs and VM acts to support PSC positioning. A strength is the ability to knock down slit levels in particular tissue types using the Gal4 system and RNAi.

      (3) Although, in the analysis of robo mutants, the PSC positioning phenotype is weaker in the individual mutants (robo1 and robo2), with only the double mutant (robo1,robo2) exhibiting a phenotype comparable to the slit RNAi, the authors make a reasonable argument that Slit-Robo signaling has an intrinsic effect, likely acting within PSCs because PSCs show a phenotype even when CBs do not (Figure 4G).

      (4) New insight into dorsal vessel formation by VM is presented in Figure 4A, B, as loss of the VM can affect dorsal vessel morphogenesis. This result additionally points to the VM as important.

      Weaknesses:

      (1) The authors are cautioned to temper the result that Slit-Robo signaling is intrinsic to PSC since the loss of robo may affect other cell types (besides CBs and PSCs) to indirectly affect PSC migration/morphogenesis. In fact, in the robo2, robo1 mutant, the VM appears to be incorrectly positioned (Figure 4G).

      We have reexamined our wording in the relevant Results section and, given that this referee agrees that we, “make a reasonable argument that Slit-Robo signaling has an intrinsic effect, likely acting within PSCs because PSCs show a phenotype even when CBs do not (Figure 4G)”, it was not clear how we might temper our conclusions more. Given that PSC cells express Robo1 and Robo2, and that the Vm does not contact the PSC, our ‘reasonable argument’ appears fair and parsimonious. Since we agree with the referee that a reader should be made as aware as possible of alternatives, we will add a comment to the Discussion, reminding the reader of the possibility of a secondary defect.

      (2) If possible, the authors should use RNAi to knock down Robo1 and Robo2 levels specifically in the PSCs if a Gal4 is available; might Antp.Gal4 (Fig 1K) be useful? Even if knockdown is achieved in PSCs+CBs, this would be a better/complementary experiment to support the approach outlined in Figure 4D.

      While we agree that PSC-specific knockdown of Robo1 and Robo2 simultaneously would be ideal, this is not possible. First, the most-effective UAS-RNAi transgenes (that is, those in a Valium 20 backbone) are both integrated at the same chromosomal position; these cannot be simultaneously crossed with a GAL4 transgenic line to attempt double knock down. Additionally, as with all RNAi approaches that must rely on efficient knockdown over the rapid embryonic period, even having facile access to the above does not ensure the RNAi approach will cause as effective depletion as the genetic null condition that we use. Second, as the referee concedes, there is no embryonic PSC-specific GAL4. The proposed use of Antp-GAL4 would cause knockdown in many tissues (PSC, CB, Vm, epidermis and amnioserosa). This would lead to a reservation similar to that caused by our use of the straight genetic double mutant, as regards potential indirect requirement for Robo function.

      (3) Movies are hard to interpret, as it seems unclear that the PSCs actively migrate rather than being pushed/moved indirectly due to association with VM and CBs/dorsal vessel.

      First, the Vm does not directly contact the PSC, so it cannot be pushing the PSC dorsally. We will re-examine our text to be certain to make this clear. Second, in our analysis of bin mutants, which lack Vm, LGs and PSCs are able to reach the dorsal midline region in the absence of Vm. Finally, please see our response to Reviewer #3, point 2, for why we maintain that PSC cells are “migrating” even though some PSC cells are attached to CBs.

      Reviewer #2 (Public Review):

      The paper by Nelson KA, et al. explored the collective migration, coalescence, and positioning of the posterior signaling center (PSC) cells in the Drosophila embryo. With live imaging, the authors observed the dynamic progress of PSC migration. Throughout this process, visceral mesoderm (VM), alary muscles (AMs), and cardioblasts (CBs) are in proximity to PSC. Genetic ablation of these tissues reveals the requirement for VM and CBs, but not AMs in this process. Genetic manipulations further demonstrated that Slit-Robo signaling was critical during PSC migration and positioning. While the genetic mechanisms of positioning the PSC were explored in much detail, including using live imaging, the functional consequence of mispositioning or (partial) absence of PSC cells has not been addressed, but would much increase the relevance of their findings. A few additional issues need to be addressed as well in this otherwise well-done study.

      Major points:

      (1) The only readout in their experiments is the relative correctness of PSC positioning. Importantly, what is the functional consequence if PSC is not properly positioned? This would be particularly important with robo-sli manipulations, where the PSC is present but some cells are misplaced. What is the consequence? Are the LGs affected, like the specification of their cell types, structure, and function? To address this for at least the robo-slit requirement in the PSC, it may be important to manipulate them directly in the PSC with a split Gal4 system, using Antp and Odd promoters.

      We agree that the functional consequence of PSC mis-positioning is important and a relevant question to eventually address. However, virtually all markers and reagents used to assess the effect of the PSC on progenitor cells and their differentiated descendants are restricted to analyses carried out on the third larval instar - some three days after the experiments reported here. Most of the manipulated conditions in our work are no longer viable at this phase and, thus, addressing the functional consequences of a malformed PSC will require the field to develop new tools. 

      As we noted in the Introduction, the consistency with which the wildtype PSC forms as a coalesced collective at the posterior of the LG strongly suggests importance of its specific positioning and shape, as has now been found for other niches (citations in manuscript). Additionally, in the Discussion we mention the existence of a gap junction-dependent calcium signaling network in the PSC that is important for progenitor maintenance. Without continuity of this network amongst all PSC cells (under conditions of PSC mis-positioning), we strongly anticipate that the balance of progenitors to differentiated hemocytes will be mis-managed, either constitutively, and / or under immune challenge conditions. 

      Finally, to our knowledge, the tools do not exist to build a “split Gal4 system using Antp and Odd promoters”. The expression pattern observed using the genomic Antp-GAL4 line must be driven by endogenous enhancers–none of which have been defined by the field, and thus cannot be used in constructing second order drivers. Similarly, for odd skipped, in the embryo the extant Odd-GAL4 driver expresses only in the epidermis, with no expression in the embryonic LG. Thus, the cis regulatory element controlling Odd expression in the embryonic LG is unknown. In the future, the discovery of an embryonic PSC-specific driver will aid in addressing the specific functional consequences of PSC mis-positioning.

      (2) The dense, parallel-aligned fibers in the lower part of Figure 1J seemed to be visceral mesoderm, but further up (dorsally) that may be epidermis. Is it possible that the PSC migrates together with the epidermis? This should be addressed.

      See response to Reviewer #3.

      (3) Although the authors described the standards of assessing PSC positioning as "normal" or "abnormal", it is rather subtle at times and variable in the mutant or KD/OE examples. The criteria should be more clearly delineated and analyzed double-blind, also since this is the only readout. Further examples of abnormal positioning in supplementary figures would also help.

      We appreciate the Reviewer’s concern and acknowledge that the phenotypes we observed were indeed variable, and, at times subtle. As we show and discuss in the paper, our results revealed that the signaling requirements for proper PSC positioning are complex; this was favorably commented upon by Reviewer #1 (“...highlights the complex relationship of interacting cell types - PSC, visceral mesoderm (VM), and cardioblasts (CBs) - in the coordinated development of these three tissues during organ development.…”). We suspect the phenotypic variability is attributable to any number of biological differences such as heterogeneity of PSC cells and an accompanying difference in the timing of their competence to receive and respond to Slit-Robo signaling, the timing of release of Slit from CBs and Vm, number of cells in a given PSC, which PSC cells in the cluster respond to too little or too much signaling, and/or typical variability between organisms. Furthermore, PSC positioning analyses were conducted by two of the authors, who independently came to the same conclusions. For many of the manipulations double blinding was not possible since the genotype of the embryo was discernible due to the obvious phenotype of the manipulated tissue.

      (4) The Discussion is very lengthy and should be shortened.

      We will re-examine the prose and emphasize more conciseness, while maintaining clarity for the reader.

      Reviewer #3 (Public Review):

      Summary:

      This work is a detailed and thorough analysis of the morphogenesis of the posterior signaling center (PSC), a hematopoietic niche in the Drosophila larva. Live imaging is performed from the stage of PSC determination until the appearance of a compact lymph gland and PSC in the stage 16 embryo. This analysis is combined with genetic studies that clarify the involvement of adjacent tissue, including the visceral mesoderm, alary muscle, and cardioblasts/dorsal vessels. Lastly, the Slit/Robo signaling system is clearly implicated in the normal formation of the PSC.

      Strengths:

      The data are clearly presented, well documented, and fully support the conclusions drawn from the different experiments. The manuscript differs in character from the mainstay of "big data" papers (for example, no sets of single-cell RNAseq data of, for instance, PSC cells with more or less Slit input, are offered), but what it lacks in this regard, it makes up in carefully planned and executed visualizations and genetic manipulations.

      Weaknesses:

      A few suggestions concerning improvement of the way the story is told and contextualized.

      (1) The minute cluster of PSC progenitors (5 or so cells per side) is embedded (as known before and shown nicely in this study) in other "migrating" cell pools, like the cardioblasts, pericardial cells, lymph gland progenitors, alary muscle progenitors. These all appear to move more or less synchronously. What should also be mentioned is another tissue, the dorsal epidermis, which also "moves" (better: stretches?) towards the dorsal midline during dorsal closure. Would it be reasonable to speculate (based on previously published data) that without the force of dorsal closure, operating in the epidermis, at least the lateral>medial component of the "migration" of the PSC (and neighboring tissues) would be missing? If dorsal closure is blocked, do essential components of PSC and lymph gland morphogenesis (except for the coming-together of the left and right halves) still occur? Are there any published data on this?

      Each of the Reviewers is interested in our response to this very relevant question, and, thus, we will address the issue en bloc here. First, we will add a Supplementary Figure showing that LG and CBs are still able to progress medially towards the dorsal midline when dorsal closure stalls. This rules out any major effect for the most prominent “large-scale embryo cell sheet movement” in positioning the PSC. Second, published work by Haack et al. and Balaghi et al. shows that CBs and leading edge epidermal cells are independently migratory, and we will add this context to the manuscript for the reader.

      (2) Along similar lines: the process of PSC formation is characterized as "migration". To be fair: the authors bring up the possibility that some of the phenotypes they observe could be "passive"/secondary: "Thus, it became important to test whether all PSC phenotypes might be 'passive', explained by PSC attachment to a malforming dorsal vessel. Alternatively, the PSC defects could reflect a requirement for Robo activation directly in PSC cells." And the issue is resolved satisfactorily. But more generally, "cell migration" implies active displacement (by cytoskeletal forces) of cells relative to a substrate or to their neighbors (like for example migration of hemocytes). This to me doesn't seem really clearly to happen here for the dorsal mesodermal structures. Couldn't one rather characterize the assembly of PSC, lymph gland, pericardial cells, and dorsal vessel in terms of differential adhesion, on top of a more general adhesion of cells to each other and the epidermis, and then dorsal closure as a driving force for cell displacement? The authors should bring in the published literature to provide a background that does (or does not) justify the term "migration".

      Before addressing this specifically, we remind readers of our response above that states the rationale ruling out large, embryo-scale movements, such as epidermal dorsal closure, in driving PSC positioning. So, how are PSC cells arriving at their reproducible position? This manuscript reports the first live-imaging of the PSC as it comes to be positioned in the embryo. We interpret these movies to suggest strongly that these cells are a ‘collective’ that migrates. Neither the data, nor we, are asserting that each PSC cell is ‘individually’ migrating to its final position. Rather, our data suggest that the PSC migrates as a collective. The most paradigmatic example of directed, collective cell migration is that of Drosophila ovarian border cells. That cell cluster is surrounded at all times by other cells (nurse cells, in that case), and for the collective to traverse through the tissue, the process requires constant remodeling of associations amongst the migrating cells in the collective (the border cells), as well as between cells in the collective and those outside of it (the nurse cells). In fact, the nurse cells are considered the substrate upon which border cells migrate. Note also that in collective border cell migration cells within the collective can switch neighbors, suggesting dynamic changes to cell associations and adhesions.

      In our analysis, the PSC cells exhibit qualities reminiscent of the border cells, and thus we infer that the PSC constitutes a migratory cell collective.  We also show in Figure 1H that PSC cells exhibit cellular extensions, and thus have a very active, intrinsic actin-based cytoskeleton. In fact, in Figure 1I, we point out that PSC cells shift position within the collective, which is not only a direct feature of migration, but also occurs within the border cell collective as that collective migrates. Additionally, the fact that the lateral-most PSC cells shift position in the collective while remaining a part of the collective–and they do this while executing net directional movement–makes a strong argument that the PSC is migratory, as no cell types other than PSCs are contacting the surfaces of those shifting PSC cells. Lastly, the Reviewer’s supposition that, rather than migration, dorsal mesoderm structures form via “differential adhesion, on top of a more general adhesion of cells to each other” is, actually, precisely an inherent aspect of collective cell migration as summarized above for the ovarian border collective.

      In our resubmission we will adjust text citing the existing literature to better put into context the reasoning for why PSC formation based on our data is an example of collective cell migration.

      (3) That brings up the mechanistic centerpiece of this story, the Slit/Robo system. First: I suggest adding more detailed data from the study by Morin-Poulard et al 2016, in the Introduction, since these authors had already implicated Slit-Robo in PSC function and offered a concrete molecular mechanism: "vascular cells produce Slit that activates Robo receptors in the PSC. Robo activation controls proliferation and clustering of PSC cells by regulating Myc, and small GTPase and DE-cadherin activity, respectively". As stated in the Discussion: the mechanism of Slit/Robo action on the PSC in the embryo is likely different, since DE-cadherin is not expressed in the embryonic PSC; however, it may not be THAT different: it could also act on adhesion between PSC cells themselves and their neighbors. What are other adhesion proteins that appear in the late lateral mesodermal structures?

      Could DN-cadherin or Fasciclins be involved?

      We agree with the Reviewer that Slit-Robo signaling likely acts in part on the PSC by affecting PSC cell adhesion to each other and/or to CBs (lines 428-435). As stated in the Discussion, we do not observe Fasciclin III expression in the PSC until late stages when the PSC has already been positioned, suggesting that Fasciclin III is not an active player in PSC formation. Assessing whether the PSC expresses any other of the suite of potential cell adhesion molecules, such as DN-Cadherin or other Fasciclins, and then studying their potential involvement in the Slit-Robo pathway in PSC cells, would be part of a follow-up study.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The authors are encouraged to address several key issues and provide more explicit clarification when interpreting the behavior of the PSC cells as "migration." It is recommended that the authors engage with all reviewers' comments and refine the text based on the feedback they find valuable.

      Reviewer #1 (Recommendations For The Authors):

      Major concerns:

      (1) Is it possible to assay robo1 and/or robo2 RNAi in a tissue-specific manner to further explore an intrinsic role in the PSC? Might the VM indirectly affect PSCs in a CB-independent manner? How does this affect the interpretation of results in Figure 4?

      See also our response to Reviewer #1, Public review weaknesses #2.

      Though we agree with the Reviewer that this is the better experiment to test for an intrinsic role for Robo in the PSC, this experiment is not possible at this time. As we noted in the manuscript, we do not yet have an embryonic PSC-specific GAL4, though we have been putting efforts towards identifying/developing such a tool. The Antp-GAL4 driver we used in this study will drive not only in both PSCs and CBs, but also in Vm, epidermis, and amnioserosa, as well as other tissues. The other available embryonic PSC drivers are not specific to the PSC and will drive expression in CBs and Vm, at minimum. This, combined with the reality that RNAi can be ineffective in embryonic tissues, resulted in our use of whole organism mutants to best address this question. 

      We acknowledge that it is possible the Vm indirectly affects the PSC in a CB-independent manner in the double Robo mutant, and we added a statement to the Discussion reiterating this point. However, because the PSC expresses Robo1 and Robo2, we maintain that the simplest interpretation of the results in Figure 4 is that PSC cells require intrinsic Robo signaling. And, as we state in the manuscript, it is possible that Slit signals directly from Vm to Robo on the PSC.

      (2) As this is the first study to be presenting PSC formation as involving collective cell migration, can the authors provide experimental evidence and rationale for this categorization?

      We have added our rationale to the Results section in the revision.

      See also our response to Reviewer #3, Public review weakness #2.

      (3) The Slit staining presented in Fig 3 W', Z' should be quantified. Furthermore, what is the VM phenotype when Robo1 is overexpressed? Is there a VM-specific phenotype and could this indirect effect cause the PSC to misform/mismigrate?

      We didn’t quantify Slit levels in the Vm-specific Robo overexpression condition because there was a visually striking difference compared to controls (increased intensity and specific localization to Vm membranes), and the manipulation resulted in a PSC phenotype. Thus, the evidence we show appears sufficient to strongly suggest that our genetic manipulation resulted in successful trapping of Slit on the Vm.

      As to a Vm phenotype when Robo1 is overexpressed Vm-specifically: we know Vm is present, but we haven’t performed an in-depth phenotypic analysis. In the manuscript we show that this manipulation at least affects organization of PSC-adjacent CBs, which we go on to show is correlated with mis-positioned PSCs. Thus, the PSC phenotype in this condition is not solely due to a Vm-specific phenotype.

      Minor concerns/suggestions:

      (1) I might have missed it but where are the Movies referenced in the text? Are legends provided for the videos? It is important that this is included in the final version (or more clearly presented if I missed it).

      We thank the Reviewer for pointing this out; we now direct the reader to the movies at appropriate places within the text.

      (2) In Figure 5, it might be helpful to add a third column to A in which the PSCs are pseudo-colored and thus highlighted because it is difficult to discern the white (not pink) PSCs...

      We appreciate the suggestion and now include these panels as Figure 5A’’ in the revision.

      (3) If I am following correctly, the lost PSC cells in Figure 5 don't move. Doesn't this suggest that what is critical is that the PSCs attach to the VM and/or CBs, and not necessarily that they are an actively migrating cell type? They "move" but might be passively carried.

      See also the response to Reviewer #3, Public reviews weaknesses #2.

      The Reviewer is correct that the PSC cells in Fig. 5 don’t move very much, but we interpret this differently from the Reviewer. After detachment of the cells in question they undergo dramatic shape changes, indicating active cytoskeletal remodeling, so the molecular machinery needed for migration appears to remain intact. Thus, we suggest that this observation actually emphasizes our finding that collectivity is needed for the migration. Given the consistency of PSC coalescence/collectivity and the intricate regulation that controls it, we believe it to be an integral part of PSC identity. When PSC cells become detached, they likely lose an aspect of their identity. In various manipulations we’ve noted instances of severely dispersed PSC cells expressing very low levels of identity markers Antp or Odd. Cells in such cases are likely compromised for their function, and this can include, for example, whether they can properly sense cues for migration.

      Reviewer #2 (Recommendations For The Authors):

      Minor points:

      (1) The expression pattern of Antp-Gal4 > myrGFP in the whole embryo should be shown to better demonstrate the overlap with Odd. How does it compare with Antp-Gal4 > CD8::GFP?

      We do not understand the question posed. We are not suggesting that Antp and Odd overlap in all cells, nor even many cells. It has been demonstrated by the field that co-expression among mesodermal cells, in the position where LG cells are specified, is a marker for the PSC. We have not thoroughly investigated all reporter lines for the GAL4 drivers used by the field.

      (2) Does Tincdelta4-Gal4 not at all express in the PSC? This should be verified.

      This question appears to refer to depletion of Slit by RNAi or cell killing driven by tinCΔ4-GAL4. TinCΔ4-GAL4 is expressed in CBs and in precisely 1 embryonic PSC cell. First, Slit isn’t expressed by any PSC cells to our eye, so any PSC mis-positioning observed upon tinCΔ4>Sli RNAi implicates CB involvement in PSC positioning. In designing tests for CB involvement, we were unable to identify any mutant known to lack CBs (or have fewer CBs) that didn’t also affect specification of the LG/PSC. The cell killing approach seemed best. It is possible that, in this scenario, ablation of a single, key PSC cell could affect final positioning of the other PSCs, but we think that less likely than a role for CBs. We also retain our original conclusion because we often find mis-positioned PSC cells adjacent to mis-positioned CBs, including in the panel representing the CB ablation experiment, Figure 2S.

      (3) Line 212: The data provide evidence that Vm is necessary, but clearly not sufficient, as CBs are also necessary.

      We see how this wording was misleading and have adjusted the text accordingly.

      (4) The CBs are not visible in Figure 3B.

      We are unsure what the Reviewer is referring to, as we are certain that the signal between the blue outlines is indeed Slit expression in CBs.

      Reviewer #3 (Recommendations For The Authors):

      One minor mistake (I believe): in line 229 it should say "3C and 3D"

      We have corrected this error.

    1. eLife Assessment

      This study unveils important data describing cell states of olfactory ensheathing cells, and how these cell states may relate to repair after spinal cord injury. The framework used for characterizing these cells is solid, although the study would be strengthened by additional quantification of immunohistochemical data and by complementing expression data with functional outcomes. This work will be of interest to stem cell biologists and spinal cord injury researchers.

    2. Joint Public Review:

      Summary

      This manuscript explores the transcriptomic identities of olfactory ensheathing cells (OECs), glial cells that support life-long axonal growth in olfactory neurons, as they relate to spinal cord injury repair. The authors show that transplantation of cultured, immunopurified rodent OECs at a spinal cord injury site can promote injury-bridging axonal regrowth. They then characterize these OECs using single-cell RNA sequencing, identifying five subtypes and proposing functional roles that include regeneration, wound healing, and cell-cell communication. They identify one progenitor OEC subpopulation and also report several other functionally relevant findings, notably, that OEC marker genes contain mixtures of other glial cell type markers (such as for Schwann cells and astrocytes), and that these cultured OECs produce and secrete Reelin, a regrowth-promoting protein that has been disputed as a gene product of OECs.

      Strengths

      This manuscript offers an extensive, cell-level characterization of OECs, supporting their potential therapeutic value for spinal cord injury and suggesting potential underlying repair mechanisms. The authors use various approaches to validate their findings, providing interesting images that show the overlap between sprouting axons and transplanted OECs, and showing that OEC marker genes identified using single-cell RNA sequencing are present in vivo, in both olfactory bulb tissue and spinal cord after OEC transplantation.

      Challenges

      Despite the breadth of information presented, and although many of the suggestions in the initial review were addressed well, some points related to quantification and discussion of sex differences are not fully addressed in this revision.

      (1) The request for quantification of OEC bridges is not fully addressed. We note that this revision includes the following statement (page 6): "We note, however, that such bridge formation is rare following a severe spinal cord injury in adult mammals." However, the title of the paper states that olfactory ensheathing cells promote neural repair and the abstract states that "OECs transplanted near the injury site modify the inhibitory glial scar and facilitate axon regeneration past the scar border and into the lesion." Statements such as these make it more crucial to include quantification of OEC bridges, because if single images are shown of remarkable, unusual bridges, but only one sentence acknowledges the low frequency of this occurrence, then this information taken together might present the wrong takeaway to readers.

      Including some sort of quantification of bridging, whether it be the number of rats exhibiting bridges, the percentage area of OECs near a lesion site, or some other meaningful analysis, would add rigor and clarity to the manuscript.

      (2) The additional discussion of sex differences in OEC bridging elaborates on the choice to study female rats, citing bladder challenges in male rats, but does not note salient clinical implications of this choice. Men account for ~80% of spinal cord injuries and likely also have worsened urinary tract issues, so it would be important to acknowledge this clinical fact and consider including males in future studies.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Summary

      This manuscript explores the transcriptomic identities of olfactory ensheathing cells (OECs), glial cells that support life-long axonal growth in olfactory neurons, as they relate to spinal cord injury repair. The authors show that transplantation of cultured, immunopurified rodent OECs at a spinal cord injury site can promote injury-bridging axonal regrowth. They then characterize these OECs using single-cell RNA sequencing, identifying five subtypes and proposing functional roles that include regeneration, wound healing, and cell-cell communication. They identify one progenitor OEC subpopulation and also report several other functionally relevant findings, notably, that OEC marker genes contain mixtures of other glial cell type markers (such as for Schwann cells and astrocytes), and that these cultured OECs produce and secrete Reelin, a regrowth-promoting protein that has been disputed as a gene product of OECs.

      This manuscript offers an extensive, cell-level characterization of OECs, supporting their potential therapeutic value for spinal cord injury and suggesting potential underlying repair mechanisms. The authors use various approaches to validate their findings, providing interesting images that show the overlap between sprouting axons and transplanted OECs, and showing that OEC marker genes identified using single-cell RNA sequencing are present in vivo, in both olfactory bulb tissue and spinal cord after OEC transplantation.

      Despite the breadth of information presented, however, further quantification of results and explanation of experimental approaches would be needed to support some of the authors' claims. Additionally, a more thorough discussion is needed to contextualize their findings relative to previous work.

      (1) a. Important quantification is lacking for the data presented. For example, multiple figures include immunohistochemistry or immunocytochemistry data (Figures 1, 5, 6), but they are presented without accompanying measures like fractions of cells labeled or comparisons against controls.

      We would like to clarify that the immunohistochemistry or immunocytochemistry data presented are meant to be qualitative rather than quantitative. The main purpose of the images is to show the presence or absence of markers of OEC subtypes rather than how much is present. That being said, in the revision we now add quantitative estimates of cell fractions for OECs along with other major cell types in Supplemental Table 1 and each OEC subtype marker in Supplemental Table 2. 

      b. As a result, for axons projecting via OEC bridges in Figure 1, it is unclear how common these bridges are in the presence or absence of OECs.

      We note that bridges of axons crossing the lesion core are extremely rare in spinal cord transected rats, as is generally the case following a severe spinal cord injury in adult mammals. Our first example of axon bridging following a complete spinal cord transection followed by OEC transplants was reported in Thornton et al. (2018) and compared to an incomplete transection in a fibroblast-transplanted control in Figure 4 of that paper. That figure also appeared on the cover of Experimental Neurology when the paper was published. Figure 1 in the current paper is from an independent experiment that replicated the previously observed rare bridge formation. We noted this in the revised manuscript.

      Page 6: “We note, however, that such bridge formation is rare following a severe spinal cord injury in adult mammals.”

      c. For Figure 6, it is unclear whether cells having an alternative OEC morphology coincide with progenitor OEC subtype marker genes to a statistically significant degree. (see top paragraph on page 11)

      Franceschini & Barnett (1996) suggested that there were 2 distinct types of OECs that could be distinguished by their different morphology: one type resembling a Schwann cell and the other, an astrocyte. The purpose of Figure 6 is to determine if there is a link between our OEC subtypes based on scRNA-seq and those previously described based on morphology alone (Franceschini and Barnett, 1996). There could be agreement between the large, flat or small, fusiform OEC morphologies and progenitor status, but it is not required that the two classification types overlap significantly. Here we report the percentage of morphology-based cell subtypes that show expression of our OEC subtype markers to estimate the overlap between the two. Our results indicate that the two types of OEC morphologies share a certain degree of overlap, a finding that indicates similarities as well as differences between the two classification methods.

      In our results section we show that ~3/4ths of the Ki67-expressing OEC progenitor cells sampled were astrocyte-like, i.e., flat in shape and weakly Ngfr<sup>p75</sup>-labeled. The remaining ~1/4th of the Ki67-labeled OECs were fusiform in shape and expressed Ngfr<sup>p75</sup> strongly. We feel that this is important to include as it is the only previous report of OB-OEC subtypes. The statistics of these results were in our original manuscript on page 11 and we further revise the text as follows:

      Page 12: “To determine if the proliferative OECs differ in appearance from adult OECs, and whether there is concordance between our OEC subtypes based on gene expression markers and previously described morphology-based OEC subtyping (Franceschini & Barnett, 1996), we analyzed OECs identified with the anti-Ki67 nuclear marker and anti-Ngfr<sup>p75</sup> (Figure 6g-h). Of the Ki67-positive OECs in our cultures, 24% ± 8% were strongly Ngfr<sup>p75</sup>-positive and spindle-shaped, whereas 76% ± 8% were flat and weakly Ngfr<sup>p75</sup>-labeled (n=4 cultures, p = 0.023). Here we show that a large percentage (~3/4ths) of proliferative OECs are characterized by large, flat morphology and weak Ngfr<sup>p75</sup> expression resembling the previously described morphology-based astrocyte-like subtype. Our results indicate the two types of OEC classifications share a certain degree of overlap, indicating similarities but also differences between the two classification methods.”

      d. Similar quantification is missing in other types of data such as Western blot images (Fig. 9) and OEC marker gene data (for which p-values are not reported; Table S2). 

      Response on Western blots: The Western blot signals shown in Figure 9 are from experiments that were designed to be qualitative rather than quantitative, addressing the question, “Can we detect Reelin signals in the different samples or not?” Both Western blots show that Reln<sup>+/+</sup> mouse olfactory bulbs (d) or cortices (e) contain Reelin whereas Reln<sup>-/-</sup> samples do not, and therefore provide positive and negative controls, respectively. The rat olfactory nerve layer (ONL, laminae I-II of olfactory bulb, d lane 1; e lane 3) contains mainly OECs wrapped around the axons of the olfactory sensory neurons that transmit olfactory signals into the olfactory bulb. To address your request for quantification, Dr. Khankan measured the density of the three isoforms of Reelin (400 kD, 300 kD and 180 kD) in Fig. 9e and normalized them against the GAPDH control (37 kD). The graph below shows the normalized band density in arbitrary units on the Y-axis for the first 3 conditions, i.e., Reln<sup>+/+</sup> and Reln<sup>-/-</sup> mouse cerebral cortices and rat Reln<sup>+/+</sup> ONL. Because the conditioned medium was collected from tissue culture medium rather than cells or tissue, the GAPDH control was not present and therefore these data cannot be normalized in a similar analysis.

      Author response image 1.
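
      As an illustration of the normalization described above, the following minimal sketch divides each Reelin band density by the GAPDH density measured in the same lane. The function name, lane labels, and all numeric values are hypothetical placeholders chosen for this example; they are not measurements from the study, and the authors' actual densitometry workflow is not described in the response.

```python
# Minimal sketch of loading-control normalization for Western blot densitometry.
# All densities below are HYPOTHETICAL placeholders, not data from the study.

def normalize_to_loading_control(band_density, control_density):
    """Divide each band density by the loading-control density of the same lane.

    band_density: dict mapping band label -> raw density (arbitrary units)
    control_density: raw density of the loading control (e.g., GAPDH) in that lane
    """
    if control_density is None or control_density <= 0:
        # A lane without a loading-control signal cannot be normalized this way.
        raise ValueError("loading control density must be a positive number")
    return {band: density / control_density for band, density in band_density.items()}


# Hypothetical lanes: (Reelin isoform densities, GAPDH density)
lanes = {
    "Reln+/+ cortex": ({"400 kD": 1200.0, "300 kD": 900.0, "180 kD": 600.0}, 3000.0),
    "Reln-/- cortex": ({"400 kD": 0.0, "300 kD": 0.0, "180 kD": 0.0}, 2500.0),
}

normalized = {
    lane: normalize_to_loading_control(bands, gapdh)
    for lane, (bands, gapdh) in lanes.items()
}
```

      In this sketch a lane with no loading-control signal raises an error rather than returning a value, which mirrors the point made above that the conditioned medium lanes, lacking GAPDH, cannot be normalized in the same way.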

      Response for OEC marker gene data: We now add new full supplementary Table S1 (for major cell types) and Table S2 (for OEC subtypes) to report statistical p values and adjusted p values, as well as additional statistical information including the percentage of cells expressing a subtype marker in a given subtype versus in other subtypes.

      e. The addition of quantitative measures and, where appropriate, statistical comparisons with p-values or other significance measures, would be important for supporting the authors' claims and more rigorously conveying the results.

      As detailed in the above responses, we now add quantifications and statistics to support the claims and enhance the rigor of our analysis.

      (2) a. Some aspects of the experimental design that are relevant to the interpretation of the results are not explained. For example, OECs appear to be collected from only female rats, but the potential implications of this factor are not discussed.

      We added a short explanation in the Discussion and Methods section regarding why spinal cord injury studies are carried out on female rats.

      Page 24, Discussion: “Due to the extensive urinary tract dysfunction in spinal cord transected rats, most studies prefer females as their short urethra facilitates daily manual bladder expression. Our study, therefore, was carried out only on adult female rats, so sex differences and the generalizability of our findings to adult male rats would require further investigation.”

      Page 26, Methods: “Only females were used in order to match the sex of previous SCI studies conducted exclusively on female rats (Dixie, 2019; Khankan et al., 2016; Takeoka et al., 2011; Thornton et al., 2018). Following complete thoracic spinal cord transection, an adult rat is unable to urinate voluntarily and therefore urine must be manually “expressed” twice a day throughout the experiment. Females have a shorter urethra than males, and thus their bladders are easier to empty completely.”

      b. Additionally, it is unclear from the manuscript to what degree immunopurified cells are OECs as opposed to other cell types. The antibody used to retain OECs, nerve growth factor receptor p75 (Ngfr-p75), can also be expressed by non-OEC olfactory bulb cell types including astrocytes [1-3]. The possible inclusion of Ngfr-p75-positive but non-OEC cell types in the OEC culture is not sufficiently addressed.

      (a) Cragnolini, A.B. et al., Glia, (2009), doi: 10.1002/glia.20857.

      (b) Vickland H. et al., Brain Res., (1991), doi: 10.1016/0006-8993(91)91659-O.

      (c) Ung K. et al., Nat Commun., (2021), doi: 10.1038/s41467-021-25444-3.

      Our OECs are dissected primarily from the olfactory nerve layer that is concentrated medially and ventrally around the olfactory bulb together with a small part of the glomerular layer (layer II). OECs are the only glia present in the olfactory nerve layer. Thus, although it is possible that other cell types also express Ngfr-p75 as pointed out by the reviewer and in the references provided, our OEC dissection method severely limits the number of astrocytes that might be included in our cultures. We further provide additional evidence (see updated Figure 2d and the detailed responses to the next question) that our immunopanned OECs using our dissection method consistently express all classic OEC markers but do not consistently express the majority of classic markers for other glial cell types such as astrocytes or oligodendrocytes.

      Such non-OEC cell types are also not distinguished in the analysis of single-cell RNA sequencing data (only microglia, fibroblasts, and OECs are identified; Figure 2). Thus, it is currently unclear whether results related to the OEC subtype may have been impacted by these experimental factors.

      We need to clarify that when determining potential cell types in Figure 2, we compared our cell cluster marker genes against a broad array of cell types including astrocytes, oligodendrocytes and Schwann cells, but the gene overlap was only significant for microglia, fibroblasts, and OECs, which we labeled in new Figure 2d. We added more details in methods and results to clarify how we determined the cell types in Figure 2 (text added below). We did consider all the potential cell types that could have been present in our OEC cultures, including astrocytes. However, astrocyte or oligodendrocyte markers were not significantly enriched in the clusters, but markers for microglia, fibroblasts, and OECs were prominent in the cell clusters.

      In the revised Figure 2d, we now illustrate that the OEC clusters not only express typical OEC markers, but also express a few but not all marker genes from other glial cells. We show the comparative data on markers for astrocytes, oligodendrocytes, and Schwann cells in Figure 2d in parallel with the marker genes for OECs, microglia, and fibroblasts. For each of the other glial cell types, there are some genes which overlap with OECs, and that is the reason why we identified OECs as hybrid glia.

      Page 6, Results: “Based on previously reported cell type marker genes for fibroblasts and major glial cell types including OECs, astrocytes, oligodendrocytes, and microglia, we found elevated expression of OEC marker genes in clusters 2, 3 and 7, microglia marker genes in clusters 4, 6, and 7, and fibroblast marker genes in clusters 0, 1, and 5 (Figure 2d).”

      Page 33, Methods: “Additional marker genes for fibroblasts and multiple glial cell types including astrocytes, oligodendrocytes, and microglia were also used to compare with those of the cell clusters.”

      (3) The introduction, while well written, does not discuss studies showing no significant effect of OEC implantation after spinal cord injury. The discussion also fails to sufficiently acknowledge this variability in the efficacy of OEC implantation. This omission amplifies bias in the text, suggesting that OECs have significant effects that are not fully reflected in the literature. The introduction would need to be expanded to properly address the nuance suggested by the literature regarding the benefits of OECs after spinal cord injury. Additionally, in the discussion, relating the current study to previous work would help clarify how varying observations may relate to experimental or biological factors.

      We appreciate the insightful comment and have now included information about the variability in OEC transplantation in previous studies in both the introduction and discussion sections. We discuss technical differences that lead to variability in the Introduction and how our findings could help interpret the variability in the Discussion.

      Pages 4-5: Text added to the Introduction: “The outcomes of OEC transplantation studies after spinal cord injury vary substantially in the literature due to many technical differences between their experimental designs. The source of OECs has a great impact on the outcome, with OB-OECs showing more promise than peripheral lamina propria-derived OECs, and purified, freshly-prepared OECs being required for optimal OEC survival. Other important variables include the severity of the injury (hemisection to complete spinal cord transection), the age of the spinal cord injured host (early postnatal versus adult), and OEC transplant strategies (delayed or acute transplantation, cell transplants with or without a matrix; Franssen et al., 2007). Franssen et al. (2007) evaluated studies that used only OECs as a transplant, and reported that 41 out of 56 studies showed positive effects, such as OEC stimulation of regeneration, positive interactions with the glial scar and remyelination of axons. More recent systematic reviews and meta-analyses on the effects of OEC transplantation following different spinal cord injury models reported that OECs significantly improved locomotor function (Watzlawick et al., 2016; Nakjavan-Shahraki et al., 2018), but did not improve neuropathic pain (Nakjavan-Shahraki et al., 2018).”

      Pages 24-25: Discussion on OEC source variability: “Extensive differences between OEC preparations contribute to the large variation in results from OEC treatments following spinal cord injury. This scRNA-seq study focused entirely on OB-OECs, and the next step would be to carry out similar studies on the peripheral, lamina-propria-derived OECs to discern the differences between these OEC populations. Such comparative studies using scRNA-seq will help define the underlying mechanisms and help resolve the variability in results from OEC-based therapy. Detailed studies of the composition of different OEC transplant types will contribute to identifying the most reparative cell transplantation treatments.”

      Reviewer #1 (Recommendations For The Authors):

      This is an extremely well-written and impactful series of experiments from a renowned leader in the field. The experimental questions are timely, with similar therapeutic approaches being prepared for clinical trial. The results address a gap that has persisted in the field for several decades and one that has been considered by many scientists long before technology existed to find answers. This highlights the importance of these experiments and the results reported here. With these things in mind, there are only a few minor factors that I have, that should be addressed to strengthen the paper.

      We truly appreciate the positive evaluations from the reviewer!

      Primary concerns

      (1) Quantification of results: The authors report on the data with broad brush strokes, missing the opportunity to quantify results and strengthen the interpretations. For instance, when describing gene expression, what proportion of cells analyzed were expressing these genes? How did this compare with detectable levels of protein? Can the author draw correlations between data sets collected that could offer even more insight into the identities of the cells studied? There is also a missed opportunity to evaluate how transplantation into injured neural tissue might alter gene expression of the phenotypes identified prior to transplantation.

      We appreciate these insightful comments and have added quantitative information and other relevant discussions in the revision. We now add Suppl Tables 1 (for major cell types including OECs, fibroblasts, and microglia) and 2 (for OEC subtypes) to indicate the proportion of cells expressing each marker gene in each given cell cluster/subtype in the column “Percentage of cells expressing the gene in the subtype/cell type” versus the proportion of cells expressing the given marker genes in other cell types in the column “Percentage of cells expressing the gene in the other subtypes/cell types.” In the new supplementary tables, we report statistical p values and adjusted p values after multiple testing correction to indicate statistical significance.

      Regarding the comparison with protein levels, we carried out immunohistochemistry experiments to confirm the proteins corresponding to OEC subtype markers. Our findings show that proteins for the gene markers can be detected, and thereby support our scRNA-seq findings. However, the immunofluorescence only provides a qualitative measure of protein levels in situ, so we cannot perform a correlation analysis. This is something we plan to pursue in a follow-up study with measurable protein levels. We also discuss future directions to examine the genes and proteins in in vivo transplantation studies in the Discussion.
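
      The two table columns and the adjusted p values described above can be illustrated with a minimal sketch. It treats a cell as "expressing" a gene when its count is greater than zero and uses the Benjamini-Hochberg procedure for the adjustment; both choices are assumptions made for illustration, since the response does not state the detection threshold or the correction method used. The function names and the toy inputs are hypothetical.

```python
# Minimal sketch: per-gene "percentage of cells expressing" inside vs. outside a
# cluster, plus Benjamini-Hochberg adjusted p values.
# ASSUMPTIONS: count > 0 defines "expressing"; BH is the correction method.
# Neither is confirmed by the source. Both groups are assumed to be non-empty.

def percent_expressing(counts, in_group):
    """Return (pct in group, pct in all other cells) for one gene.

    counts: per-cell expression values for the gene
    in_group: per-cell booleans, True if the cell belongs to the cluster/subtype
    """
    inside = [c > 0 for c, g in zip(counts, in_group) if g]
    outside = [c > 0 for c, g in zip(counts, in_group) if not g]
    return 100.0 * sum(inside) / len(inside), 100.0 * sum(outside) / len(outside)


def benjamini_hochberg(p_values):
    """Return BH-adjusted p values in the same order as the input."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Toy inputs (hypothetical): 8 cells, the first 4 belong to the subtype of interest.
toy_counts = [0, 2, 5, 1, 0, 0, 3, 0]
toy_mask = [True, True, True, True, False, False, False, False]
pct_in, pct_out = percent_expressing(toy_counts, toy_mask)

toy_p = [0.01, 0.04, 0.03, 0.005]
toy_adj = benjamini_hochberg(toy_p)
```

      A marker that is detected in a high percentage of cells within a subtype and a low percentage elsewhere, with a small adjusted p value, is the pattern such a table is meant to make visible.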

      (2) Discussion and interpretation: Greater depth to interpretation and discussion of data and its impact on future work is needed. For example, on pages 20-21, the authors reflect briefly on why Reelin might be of interest (it could lead to Dab-1 expression), but why is that important? There are several instances like this where it would be useful for the authors to provide a little more insight into the potential impact of these data and interpretations.

      We appreciate these valuable suggestions. We have revised our Results and Discussion sections to offer deeper insight and interpretation of the importance of the data, especially that for Reelin.

      Page 17: Results: “In the canonical Reelin-signaling pathway, Reelin binds to the very-low-density lipoprotein receptor (Vldlr) and apolipoprotein E receptor 2 (ApoER2) and induces Src-mediated tyrosine phosphorylation of the intracellular adaptor protein Disabled-1 (Dab1). Both Reelin and Dab1 are highly expressed in embryos and contribute to correct neuronal positioning.”

      Page 22-23, Discussion: “Reelin is a developmentally expressed protein detected in specific neurons, in addition to OECs and Schwann cells. The canonical Reelin-signaling pathway involves neuronal-secreted Reelin binding to Vldlr and ApoER2 receptors expressed on Dab1-labeled neurons. Following Reelin binding, Dab1 is phosphorylated by Src family kinases which initiates multiple downstream pathways. Very little is known, however, about Reelin secreted by glia. Panteri et al. (2006) reported that Schwann cells express low levels of Reelin in adults, and that it is upregulated following a peripheral nerve crush, as is reported above for many neurotrophic factors. Reelin loss in Schwann cells reduced the diameter of small myelinated axons but did not affect unmyelinated axons (Panteri et al., 2005). In the olfactory system, OECs ensheath the Dab1-labeled, unmyelinated axons of olfactory sensory neurons which are continuously generated and die throughout life. OEC transplantation following spinal cord injury would provide an exogenous source of Reelin that could phosphorylate Dab1-containing neurons or their axons. Dab1 is expressed at high levels in the axons of some projection neurons, such as the corticospinal pathway (Abadesco et al., 2014). Future experiments are needed to explore the function that glial-secreted Reelin may have on axonal regeneration.”

      Minor concerns

      (3) The authors reflect on the spontaneous glial bridge that develops in the repairing spinal cord of Zebrafish, but perhaps even more relevant is that this same phenomenon occurs in mammals as well if the spinal cord is injured during early development (opossum; Lane et al, EJN 2007). This should be considered and the statement that there is little regeneration in the mammalian spinal cord should be clarified.

      We appreciate this insightful comment. We now add discussions of the axonal regeneration and bridging observed following severe spinal cord injury in young developing mouse and opossum spinal cords.

      Page 23: “Adult mammals show little evidence of spontaneous axonal regeneration after a severe spinal cord injury in contrast to transected neonatal rats (Bregman, 1987; Bregman et al., 1993) and young postnatal opossums (Lane et al., 2007). In immature mammals, axons continue to project across or bridge the spinal cord transection site during development. Lower organisms such as fish show even more evidence of regeneration following severe SCI. Mokalled et al. (2016) reported that glial secretion of Ctgfa/Ccn2 was both necessary and sufficient to stimulate a glial bridge for axon regeneration across the zebrafish transection site. Cells in the injury site that express Ctgf include ependymal cells, endothelial cells, and reactive astrocytes (Conrad et al., 2005; Mokalled et al., 2016; Schwab et al., 2001). Here we show that, although rare, Ctgf-positive OECs can contribute to glial bridge formation in adult rats. The most consistent finding among our severe SCI studies combined with OEC transplantation is the extent of remodeling of the injury site and axons growing into the inhibitory lesion site, together with OECs and astrocytes. The formation of a glial bridge across the injury was critical to the spontaneous axon regeneration seen in zebrafish (Mokalled et al., 2016) and likely contributed to the axon regeneration detected in our OEC transplanted, transected rats (Dixie, 2019; Khankan et al., 2016; Takeoka et al., 2011; Thornton et al., 2018).”

      Reviewer #2 (Recommendations For The Authors):

      (1) The manuscript title and abstract must include the species and sex studied.

      The title and abstract have been modified as suggested.

      Page 1: “Olfactory ensheathing cells from adult female rats are hybrid glia that promote neural repair”

      (2) OECs submitted for sequencing were like those about to be transplanted; however, the phenotype of the cells would likely change immediately and shift over time post-implantation. Please briefly address or discuss this point in the Discussion (or Results).

      We have added this important discussion point.

      Pages 23-24: Discussion: “We recognize that this study is a single snapshot of OEC gene expression derived from adult female rats before they are transplanted above and below the spinal cord transection site. We would expect the gene expression of transplanted OECs to change in each new environment, i.e. as they migrate into the injury site, integrate into the glial scar, and wrap around axons. Based on our past studies, OECs survived in an outbred Sprague-Dawley rat model for ~ 4 weeks (Khankan et al., 2016) and in an inbred Fischer 344 model for 5 months (Dixie, 2019). As spinal cord injury transplant procedures are further enhanced and OEC survival improves, these hybrid glial cells should be examined at multiple time points to better evaluate their proregenerative characteristics.”

      (3) Page 12: Use of "monocytes" - the word "monocyte" implies a circulating, undifferentiated innate immune cell. This should not be used interchangeably with macrophage or microglia.

      We agree and now refer to microglia or macrophages depending on the context. We did leave the term monocyte in Table 3 if these cells were found in a top 20 gene reported in the references.

      (4) Page 12: "We now show that these unique monocytes reported between the bundles of olfactory axons surrounded by OECs (Smithson & Kawaja, 2010), are in fact, a distinct subtype of OECs."

      Is it possible to conclude that these cells are a "distinct subtype of OECs?" Perhaps these cells are a hybrid between microglia/macrophages and OECs? This is speculative, so should be worded more carefully - especially in the Results section. Please clarify, dampen conclusions, and/or better justify the wording here.

      We agree and have modified the entire paragraph to dampen and more carefully explain our conclusions. We also added an additional observation that strengthens the relationship between OECs and microglia/macrophages.

      Page 12, Results: Additional observation: “In fact, all top 20 genes in cluster 3 are expressed in microglia, macrophages, and/or monocytes (Suppl. Table 3).”

      Page 13, Results: The statement referenced in your review was deleted and we wrote the following: “Smithson and Kawaja (2010) identified unique microglia/macrophages that immunolabeled with Iba-1 (Aif1) and Annexin A3 (Anxa3) in the olfactory nerve and outer nerve layer of the olfactory bulb. These authors proposed that Iba-1-Anxa3 double-labeled cells were a distinct population of microglia/macrophages that protected the olfactory system against viral invasion into the cranial cavity. Based on our scRNA-seq data we offer an alternative interpretation that at least some of these Iba-1-Anxa3 cells may be a hybrid OEC-microglial cell type. Supporting this interpretation, there are a number of reports that suggest OECs frequently function as phagocytes (e.g., Khankan et al., 2016; Nazareth et al., 2020; Su et al., 2013).”

      (5) Page 13: "Pseudotime trajectory analysis, a widely used approach to predict cell plasticity and lineages based on scRNA-seq data, suggests that there are potential transitions between specific OEC subclusters." This is interesting but is somewhat unclear. Please add one more sentence to aid the reader's understanding regarding how this analysis is performed.

      Thank you for your valuable feedback. We have revised the text for clarity as follows:

      Page 14, Results: “We performed pseudotime trajectory analysis using the Slingshot algorithm to infer cell plasticity and lineage trajectories by ordering cells in pseudotime based on their transcriptional progression reflected in scRNA-seq data. Transcriptional progression refers to the changes in gene expression profiles of cells as they undergo differentiation or transition through different states. The trajectory analysis results suggest that there are potential transitions between specific OEC subclusters.”

      (6) The authors could discuss potential reasons for variability in OEC treatment results after spinal cord injury between studies and labs. How might sequencing results here inform the debate about whether OECs are helpful or not?

      In response to the Public Review, we added discussions about the variability in OEC treatments between studies in both the Introduction and Discussion, and these comments are copied on pages 6-7 of this document. In the Discussion we included a statement about how the current findings may inform the debate on OECs.

      (7) Discussion: please add a discussion of limitations and future directions that addresses the following points:

      a) Please add one sentence on the lack of studying sex differences - only females were studied here.

      b) There is no correlation or modulation of any target genes, so all results here are correlative.

      c) Please add a brief paragraph with future directions for the field, including acknowledgment that the role of OECs in repair after SCI is not fully resolved and that future studies might consider targeting some of the specific pathways described herein.

      d) Which pathways and OEC subpopulations likely best support repair, and how might these be reinforced or better maintained in the SCI environment? If not known, what are the next steps for identifying the most reparative OEC subtype?

      Thank you for the valuable suggestions. We have added these to the discussion as detailed below.

      Pages 23-25, Discussion:

      “Limitations of these OEC scRNA-Seq studies”

      “We recognize that this study is a single snapshot of OEC gene expression derived from adult female rats before they are transplanted above and below the spinal cord transection site. We would expect the gene expression of transplanted OECs to change in each new environment, i.e. as they migrate into the injury site, integrate into the glial scar, and wrap around axons. Based on our past studies, OECs survived in an outbred Sprague-Dawley rat model for ~ 4 weeks (Khankan et al., 2016) and in an inbred Fischer 344 model for 5 months (Dixie, 2019). As spinal cord injury transplant procedures are further enhanced and OEC survival improves, these hybrid glial cells should be examined at multiple time points to better evaluate their proregenerative characteristics.”

      “Due to the extensive urinary tract dysfunction in spinal cord transected rats, most studies are conducted on females as their short urethra facilitates daily manual bladder expression. Our study was carried out only on adult female rats, so sex differences and the generalizability of our findings to adult male rats would require further investigation. We also did not modulate any of the genes or proteins in the identified OEC subtypes to test their causal and functional roles, thus our findings remain correlative in the current study. Future gene/protein modulation studies are necessary to understand the functional roles of the individual OEC subtypes in the context of their reparative functions to determine which pathways and subtypes are more critical and can be enhanced for neural repair. Our current findings build the foundation for these future studies to help resolve the role of OECs in spinal cord injury repair.” 

      “Extensive differences between OEC preparations contribute to the large variation in results from OEC treatments following spinal cord injury. This scRNA-seq study focused entirely on OB-OECs, and the next step would be to carry out similar studies on the peripheral, lamina-propria-derived OECs to discern the differences between the two OEC populations. Such comparative studies using scRNA-seq will help define the underlying mechanisms and resolve the variability in results from OEC-based therapy. Detailed studies of the composition of different OEC transplant types will contribute to identifying the most reparative cell transplantation treatments.”

      (8) Figure 6: What is the major point of this figure and its related immunocytochemistry? Please clarify.

      Franceschini & Barnett (1996) suggested that there were 2 distinct types of OECs that could be distinguished by their different morphology: one type resembling a Schwann cell and the other, an astrocyte. The purpose of Figure 6 is to determine if there is a link between our scRNA-seq-based OEC subtypes and those previously described based on morphology alone (Franceschini and Barnett, 1996). In our results section we show that ~3/4ths of the sampled OECs that were Ki67+ progenitor cells were astrocyte-like, i.e., flat in shape and weakly Ngfr<sup>p75</sup>-labeled. The remainder were Schwann cell-like, fusiform in shape and strongly Ngfr<sup>p75</sup>-labeled. Our results indicate that the two OEC classifications overlap to some degree, showing similarities but also differences between the classification methods.

      (9) Figure 9, caption: "OEC whole cell lysates (WCL; lanes: 4, 6, and 8), and OEC conditioned medium (CM; lanes: 5 and 7)."  This statement is unclear - please clarify the result here.

      We added clarification to the legend for Figure 9d. 

      Page 50: (d) “Western blot confirms the expression of Reelin in rat olfactory nerve layer I and layer II (ONL; lane 1 of western blot). Reln<sup>+/+</sup> and Reln<sup>-/-</sup> mouse olfactory bulbs were used as positive and negative controls, respectively (lanes: 2 and 3). Reelin that was synthesized by cultured OECs was found in whole cell lysates (WCL; lanes: 4, 6, and 8), whereas Reelin that was secreted by cultured OECs into tissue culture medium was measured in the OEC “conditioned medium” (CM; lanes: 5 and 7). GAPDH was the loading control for tissue homogenates (lanes 1-4, 6, 8).”

      (10) Methods: A Cat. No. for all antibodies and key supplies should be included.

      Response: All of the antibody information in the revised version is in Suppl. Table 4. Information for other key supplies is included in the extensive methods section.

      (11) Methods: How was primary antibody specificity validated for less-used antibodies? Background staining can be a major issue after SCI; e.g., with the CTGF antibody used in Figure 5.

      The spinal cord section shown in Figure 5 was compared to sections from the same SCI cohort that had been injected with control cells, i.e. skin fibroblasts. We have used the first two antibodies (anti-Glial fibrillary acidic protein and anti-Green fluorescent protein) for many years so only the CTGF was a “less-used antibody.” Our strategy for working with “less-used” or “newly-purchased” antibodies was as follows.

      First, we studied the literature to find the best antibodies for neuronal tissue. Many of the images in Figure 7 were generated with antibodies purchased just for this study. Our goal was to characterize them on normal adult lamina propria and olfactory bulb tissues rather than in the injured spinal cord where background can be an issue. In the olfactory bulb we examined the olfactory nerve layer where OECs are concentrated and then examined the olfactory epithelium, lamina propria, and the deep layers of the olfactory bulb to find regions without immunolabel. As described above, we tested anti-CTGF antibodies in SCI sections implanted with skin fibroblast controls when conducting experiments for CTGF in sections with OECs. New antibodies were tested at multiple concentrations and we tried different immunocytochemical techniques. CTGF is expressed by several different cell types, but expression is low in most of the areas above and below the injury site. Despite our success with many “newly-purchased” antibodies, there were at least 4 for which we were never able to obtain specific labeling.

      (12) Will the data (especially the sequencing data) be shared publicly?

      The data has been uploaded to and shared via the public data repository GEO. Data availability is stated on the title page of this manuscript.

    1. eLife Assessment

      This important paper provides solid evidence for an alternative conceptualization of the functional role of the place and grid cell network in the medial temporal lobe for memory as opposed to spatial processing or navigation. The theory is extensive, tightly integrating data on various spatial cell types. It accounts for many experimental results and generates strong predictions for future studies that will be of interest to researchers in this field. The impact of the work would be strengthened if future experiments reveal that grid cells do indeed encode specific nonspatial features.

    2. Reviewer #1 (Public review):

      Huber proposes a theory in which the role of the medial temporal lobe (MTL) is memory, and in which the properties of spatial cells in the MTL can be explained through memory function rather than spatial processing or navigation. Instantiating the theory through a computational model, the author shows that many empirical phenomena of spatial cells can be captured, and may be better accounted for by a memory theory. It is an impressive computational account of MTL cells with a lot of theoretical reasoning, and it aims to relate tightly to various spatial cell data.

      In general, the paper is well written, and has been greatly improved after revision for clarity and situating the model in the context of the literature. Below are a few responses to the author's rebuttal.

      (2 & 3) In response to my previous review points 2 and 3, the author has now added "According to this model, hexagonally arranged grid cells should be the exception rather than the rule when considering more naturalistic environments." It is good to know that it captures data that show non-grid-like responses in more complex and realistic environments. However, the model still focuses on explaining the spatial firing aspect of grid cells even though they are not supposed to be spatial. I noted in my previous review, "If it's not encoding a spatial attribute, it doesn't have to have a spatial field. For example, it could fire in the whole arena". The author notes inhibitory drive and habituation. Habituation happens, but then spatial cell responses are supposed (or assumed) to be still strong after many visits to that environment. More generally, I am more convinced that grid-like and spatial coding are a special case - both in navigation and memory. In a way I believe the author agrees, though the work here focuses on capturing spatial properties (which is understandable given the literature). In conclusion, though there may be theoretical disagreements, I find the points the author raises fair.

      (4) The difference between mEC and lEC or PRC for encoding non-spatial vs spatial attributes is still not clear to me - though not crucial for the point of this paper.

      (5) Thank you for providing a video - this makes it extremely clear how learning occurs.

    3. Reviewer #3 (Public review):

      The author presents a novel theory and computational model suggesting that grid cells do not encode space, but rather encode non-spatial attributes. Place cells in turn encode memories of where those specific attributes occurred. The theory accounts for many experimental results and generates useful predictions for future studies. The model's simplicity and potential explanatory power will interest others in the field. There are, however, a few weaknesses outlined below which undermine the theory.

      Main criticisms:

      (1) A crucial assumption of the model is that grid cells express grid-like firing patterns if and only if the content of experience is constant in space. It is difficult to imagine a real world example that satisfies this assumption. Odors and sounds are used as examples. While they are often more spatially diffuse than an object on the ground, odors and sounds have sources that are readily detectable and thus are not constant in space. Animals can easily navigate to a food source or to a vocalizing conspecific. This assumption is especially problematic because it predicts that all grid cells should become silent when their preferred non-spatial attribute (e.g. a specific odor) is missing. I'm not aware of any experimental data showing that grid cells become silent. On the contrary, grid cells are known to remain active across all contexts that have been tested, including across sleep/wake states. Unlike place cells, grid cells have never been shown to turn off. Since grid cells are active in all contexts, their preferred attribute must also be present in all contexts, and therefore they would not convey any information about the specific content of an experience. The author lists many attributes that could in theory be constant in a laboratory setting, but there is no data I'm aware of that shows this is true in practice. As it stands, this crucial assumption of the model remains mere speculation.

      (2) The proposed novelty of this theory is that other models all assume that grid cells encode space. This is not quite true of models based on continuous attractor networks, the discussion of which is essentially absent. More specifically, attractor models focus on the importance of intrinsic dynamics within entorhinal cortex in generating the grid pattern. While this firing pattern is aligned to space during navigation and therefore can be used as a representation of that space, the neural dynamics are preserved even during sleep. Similarly, it is because the grid pattern does not strictly encode physical space that grid-like signals are also observed in relation to other two-dimensional continuous variables.

      (3) The use of border cells or boundary vector cells as the main (or only) source of spatial information in the hippocampus is not well supported by experimental data. Border cells in entorhinal cortex are not active in the center of an environment. Boundary-vector cells can fire farther away from the walls, but are not found in entorhinal cortex. They are located in the subiculum, a major output of the hippocampus. While the entorhinal-hippocampal circuit is a loop, the route from boundary-vector cells to place cells is much less clear than from grid cells. Moreover, both border cells and boundary-vector cells (which are conflated in this paper) comprise a small population of neurons compared to grid cells.

      Minor comments:

      (1) There is substantial theoretical and experimental work supporting the idea that grid cell modules instantiate continuous attractor networks, yet this class of models is largely ignored:

      p. 7 "In contrast, most grid cell models (Bellmund et al., 2016; Bush et al., 2015; Castro & Aguiar, 2014; Hasselmo, 2009; Mhatre et al., 2012; Solstad et al., 2006; Sorscher et al., 2023; Stepanyuk, 2015; Widloski & Fiete, 2014) are domain specific models of spatial navigation"

      The following references should be added:

      McNaughton, B. L., Battaglia, F. P., Jensen, O., Moser, E. I. & Moser, M.-B. Path integration and the neural basis of the 'cognitive map'. Nat. Rev. Neurosci. 7, 663-678 (2006).

      Fuhs, M. C. & Touretzky, D. S. A spin glass model of path integration in rat medial entorhinal cortex. J. Neurosci. 26, 4266-4276 (2006).

      Burak, Y. & Fiete, I. R. Accurate path integration in continuous attractor network models of grid cells. PLoS Comput. Biol. 5, e1000291 (2009).

      Guanella, A., Kiper, D. & Verschure, P. A model of grid cells based on a twisted torus topology. Int. J. Neural Syst. 17, 231-240 (2007).

      Couey, J. J. et al. Recurrent inhibitory circuitry as a mechanism for grid formation. Nat. Neurosci. 16, 318-324 (2013).

      (Note: the Bellmund et al. (2016) citation is likely a mistake and was intended to be Bellmund et al. (2018).)

      (2) The author claims in two places that this model is the first to explain that grid cell population activity lies on a torus. While it may be the first explicit computational account of why grid cell activity is mapped onto a torus, these claims should be moderated for clarity, for example by adding "but see McNaughton et al. (2006) and others".

      Box 1. Results Uniquely Explained by this Memory Model - the population code of grid cells lies on a torus

      p.11 "In addition, this simplifying assumption allows the model to capture the finding that the population of grid cells lies on a torus (Gardner et al., 2022), although I note that the model was developed before this result was known."

      (3) Lateral entorhinal cortex is largely ignored in this memory model. It should be considered that the predominance of spatial representations reported in the literature is due to historical reasons. Namely, the discovery of hippocampal place cells spurred interest in looking upstream for the source of spatial information, which was later abundantly clear in medial entorhinal cortex. Lateral entorhinal cortex is relatively understudied, but is known to encode odors, objects, and time in a way that medial entorhinal cortex does not. It is therefore confusing to assume that these attributes are instead encoded by grid cells.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews

      Reviewer #1 (Public Review): 

      (1) Although the theory is based on memory, it also is based on spatially-selective cells.

      Not all cells in the hippocampus fulfill the criteria of place/HD/border/grid cells, but such cells still play a role in memory. E.g., the Tonegawa and Buzsáki labs' work does not focus on only those cells, and there are certainly a lot of non-pure spatial cells in monkeys (Martinez-Trujillo) and humans (iEEG). Does the author mainly focus on saying that "spatial cells" are memory, but do not account for non-spatial memory cells? This seems to be an incomplete account of memory - which is fine, but the way the model is set up suggests that *all* memory is place (what/where) and non-spatial attributes ("grid") - but cells that don't fulfil these criteria in MTL (Diehl et al., 2017, Neuron; non-grid cells; Schaeffer et al., 2022, ICML; Luo et al., 2024, bioRxiv) certainly contribute to memory, and even navigation. This is also related to the question of whether these cell definitions matter at all (Luo et al., 2024). The authors note "However, this memory conjunction view of the MTL must be reconciled with the rodent electrophysiology finding that most cells in MTL appear to have receptive fields related to some aspect of spatial navigation (Boccara et al., 2010; Grieves & Jeffery, 2017). The paucity of non-spatial cells in MTL could be explained if grid cells have been mischaracterized as spatial." Is the author mainly talking about rodent work?

      There is a new section in the introduction that deals with these issues, titled ‘Why Model the Rodent Navigation Literature with a Memory Model?’ That section reads:

      “Spatial navigation is inherently a memory problem – learning the spatial arrangement of a new enclosure requires memory for the conjunction of what and where. This has long been realized and in the introduction to ‘Hippocampus as a Cognitive Map’, O’Keefe and Nadel (1978) wrote “We shall argue that the hippocampus is the core of a neural memory system providing an objective spatial framework within which the items and events of an organism's experience are located and interrelated” (emphasis added). Furthermore, in the last chapter of their book, they extended cognitive map theory to human memory for non-spatial characteristics. However, in the decades since the development of cognitive map theory, the rodent spatial navigation and human memory literatures have progressed somewhat independently.

      The ideas proposed in this model are an attempt to reunify these literatures by returning to the original claim that spatial navigation is inherently a memory problem. The goal of the current study is to explain the rodent spatial navigation literature using a memory model that has the potential to also explain the human memory literature. In contrast, most grid cell models (Bellmund et al., 2016; Bush et al., 2015; Castro & Aguiar, 2014; Hasselmo, 2009; Mhatre et al., 2012; Solstad et al., 2006; Sorscher et al., 2023; Stepanyuk, 2015; Widloski & Fiete, 2014) are domain specific models of spatial navigation and as such, they do not lend themselves to explanations of human memory. Thus, the reason to prefer this model is parsimony. Rather than needing to develop a theory of memory that is separate from a theory of spatial navigation, it might be possible to address both literatures with a unified account.

      This study does not attempt to falsify other theories of grid cells. Instead, this model reaches a radically different interpretation regarding the function of grid cells; an interpretation that emerges from viewing spatial navigation as a memory problem. All other grid cell models assume that an entorhinal grid cell displaying a spatially arranged grid of firing fields serves the function of spatial coding (i.e., spatial grid cells exist to support a spatial metric). In contrast, the proposed memory model of grid cells assumes that the hexagonal tiling reflects the need to keep memories separate from each other to minimize confusion and confabulation – the grid pattern is the byproduct of pattern separation between memories rather than the basis of a spatial code. 

      It is now understood that grid-like firing fields can occur for non-spatial two-dimensional spaces. For instance, human entorhinal cortex exhibits grid-like responses to video morph trajectories in a two-dimensional bird neck-length versus bird leg-length space (Constantinescu et al., 2016). As a general theory of learning and memory, the proposed memory model of grid cells is easily extended to explain these results (e.g., relabeling the border cell inputs in the model as neck-length and leg-length inputs). However, there are other grid cell models that can explain both spatial grid cells as well as non-spatial grid-like responses (Mok & Love, 2019; Rodríguez-Domínguez & Caplan, 2019; Stachenfeld et al., 2017; Wei et al., 2015). Similar to this memory model of grid cells, these models are also positioned to explain both the rodent spatial navigation and human memory literatures. Nevertheless, there is a key difference between this model and other grid cell models that generalize to non-spatial representations. Specifically, these other models assume that grid cells exhibiting spatial receptive fields serve the function of identifying positions in the environment (i.e., their function is spatial). As such, these models do not explain why most of the input to rodent hippocampus appears to be spatial (Boccara et al., 2010; Diehl et al., 2017; Grieves & Jeffery, 2017). This memory model of grid cells provides an answer to the apparent paucity of nonspatial cell types in rodent MTL by proposing that grid cells with spatial receptive fields have been misclassified as spatial (they are what cells rather than where cells) and that place cells are fundamentally memory cells that conjoin what and where.”

      (2) Related to the last point, how about non-grid multi-field mEC cells? In theory, these also should be the same; but the author only presents perfect-looking grid cells. In empirical work, clearly, this is not the case, and many mEC cells are multi-field non-grid cells (Diehl et al., 2017). Does the model find these cells? Do they play a different role? As noted by the author "Because the non-spatial attributes are constant throughout the two-dimensional surface, this results in an array of discrete memory locations that are approximately hexagonal (as explained in the Model Methods, an "online" memory consolidation process employing pattern separation rapidly turns an approximately hexagonal array into one that is precisely hexagonal)." If they are indeed all precisely hexagonal, does that mean the model doesn't have non-grid spatial cells?

      Grid cells with irregular firing fields are now considered in the discussion with the following paragraphs:

      “According to this model, hexagonally arranged grid cells should be the exception rather than the rule when considering more naturalistic environments. In a more ecologically valid situation, such as with landmarks, varied sounds, food sources, threats, and interactions with conspecifics, there may still be remembered locations where events occurred or remembered properties can be found, but because the non-spatial properties are non-uniform in the environment, the arrangement of memory feedback will be irregular, reflecting the varied nature of the environment. This may explain the finding that even in a situation where there are regular hexagonal grid cells, there are often irregular non-grid cells that have a reliable multi-location firing field, but the arrangement of the firing fields is irregular (Diehl et al., 2017). For instance, even when navigating in an enclosure that has uniform properties as dictated by experimental procedures, there may be other properties that were not well-controlled (e.g., a view of exterior lighting in some locations but not others), and these uncontrolled properties may produce an irregular grid (i.e., because the uncontrolled properties are reliably associated with some locations but not others, hippocampal memory feedback triggers retrieval of those properties in the associated locations).

      In this memory model, there are other situations in which an irregular but reliable multi-location grid may occur, even when everything is well controlled. In the reported simulations, when the hippocampal place cells were based on variation in X/Y (as defined by Border cells), nothing else changed as a function of location, and the model rapidly produced a precise hexagonal arrangement of hippocampal place cell memories. When head direction was included (i.e., real-world variation in X, Y, and head direction), the model still produced a hexagonal arrangement as per face-centered cubic packing of memories, but this precise arrangement was slower to emerge, with place cells continuing to shift their positions until the borders of the enclosure were sufficiently well learned from multiple viewpoints. If there is real-world variation in four or more dimensions, as is likely the case in a more ecologically valid situation, it will be even harder for place cell memories to settle on a precise regular lattice. Furthermore, in the case of four dimensions, mathematicians studying the “sphere packing problem” recently concluded that densest packing is irregular (Campos et al., 2023). This may explain why the multi-field grid cells for freely flying bats have a systematic minimum distance between firing fields, but their arrangement is globally irregular (Ginosar et al., 2021). Assuming that the memories encoded by a bat include not just the three real-world dimensions of variation, but also head direction, the grid will likely be irregular even under optimal conditions of laboratory control.”
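
      As an illustrative aside on the two-dimensional case (this is not the manuscript's simulation code), the geometric reason a hexagonal arrangement suits well-separated memories can be checked in a few lines. The sketch assumes only that memories must stay at least a fixed distance apart; the function names and the unit spacing are hypothetical choices made for the example.

```python
import math


def points_per_unit_area(lattice: str, spacing: float) -> float:
    """Density of lattice points whose nearest neighbours are `spacing` apart."""
    if lattice == "square":
        # Each point owns a square cell of area spacing**2.
        return 1.0 / spacing**2
    if lattice == "hexagonal":
        # Each point owns a rhombic cell of area (sqrt(3) / 2) * spacing**2.
        return 2.0 / (math.sqrt(3.0) * spacing**2)
    raise ValueError(f"unknown lattice: {lattice}")


def hexagonal_neighbours(spacing: float) -> list[tuple[float, float]]:
    """The six nearest neighbours of the origin in a hexagonal lattice."""
    return [
        (spacing * math.cos(k * math.pi / 3), spacing * math.sin(k * math.pi / 3))
        for k in range(6)
    ]


spacing = 1.0  # hypothetical minimum separation between memories
ratio = points_per_unit_area("hexagonal", spacing) / points_per_unit_area("square", spacing)
print(f"hexagonal / square density: {ratio:.3f}")  # prints 1.155
```

      For the same minimum separation, the hexagonal arrangement fits about 15% more points per unit area than a square one, and each point has six equidistant neighbours. This only illustrates the packing geometry; it says nothing about how the model's consolidation process arrives at the lattice.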

      (3) Theoretical reasons for why the model is put together this way, and why grid cells must be coding a non-spatial attribute: Is this account more data-driven (fits the data so formulated this way), or is it theoretical - there is a reason why place, border, grid cells are formulated to be like this. For example, is it an efficient way to code these variables? It can be both, like how the BVC model makes theoretical sense that you can use boundaries to determine a specific location (and so place cell), but also works (creates realistic place cells). 

      The motivation for this model is now articulated in the new section, quoted above, titled ‘Why Model the Rodent Navigation Literature with a Memory Model?’ Regarding the assumption that border cells provide a spatial metric, this assumption is made for the same reasons as in the BVC model. Regarding this, the text said: “These assumptions regarding border cells are based on the boundary vector cell (BVC) model of Barry et al. (2006). As in the BVC model, combinations of border cells encode where each memory occurred in the real-world X/Y plane.” A new sentence is added to model methods, stating: “This assumption is made because border cells provide an efficient representation of Euclidean space (e.g., if the animal knows how far it is from different walls of the enclosure, this already available information can be used to calculate location).”
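
      The geometric point in the quoted sentence can be made concrete with a minimal sketch. It assumes a rectangular enclosure with axis-aligned walls and noise-free distance readings; the function name `locate` and its parameters are hypothetical, and the code is neither the BVC model nor the manuscript's border cell implementation.

```python
def locate(d_west: float, d_east: float, d_south: float, d_north: float,
           width: float, height: float) -> tuple[float, float]:
    """Recover (x, y) in a rectangular enclosure from distances to its four walls.

    Opposite walls give two redundant estimates of each coordinate,
    which are averaged here.
    """
    x = 0.5 * (d_west + (width - d_east))
    y = 0.5 * (d_south + (height - d_north))
    return x, y


# An animal 0.3 m from the west wall and 0.2 m from the south wall of a 1 m x 1 m box.
x, y = locate(d_west=0.3, d_east=0.7, d_south=0.2, d_north=0.8, width=1.0, height=1.0)
print(f"x = {x:.2f}, y = {y:.2f}")  # prints x = 0.30, y = 0.20
```

      Distances to two perpendicular walls already fix the location; the opposite walls add redundancy that could, in principle, average out noisy readings.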

      But in this case, the purpose of grid cell coding a non-spatial attribute, and having some kind of system where it doesn't fire at all locations seems a little arbitrary. If it's not encoding a spatial attribute, it doesn't have to have a spatial field. For example, it could fire in the whole arena - which some cells do (and don't pass the criteria of spatial cells as they are not spatially "selective" to another location, related to above).  

      Some cells have a constant high firing rate, but they are the exception rather than the rule. More typically, cells habituate in the presence of ongoing excitatory drive and by doing so become sensitive to fluctuations in excitatory drive. Habituation is advantageous both in terms of metabolic cost and in terms of function (i.e., sensitivity to change). This is now explained in the following paragraph:

      “In theory, a cell representing a non-spatial attribute found at all locations of an enclosure (aka, a grid cell in the context of this model), could fire constantly within the enclosure. However, in practice, cells habituate and rapidly reduce their firing rate by an order of magnitude when their preferred stimulus is presented without cessation (Abbott et al., 1997; Tsodyks & Markram, 1997). After habituation, the firing rate of the cell fluctuates with minor variation in the strength of the excitatory drive. In other words, habituation allows the cell to become sensitive to changes in the excitatory drive (Huber & O’Reilly, 2003). Thus, if there is stronger top-down memory feedback in some locations as compared to others, the cell will fire at a higher rate in those remembered locations rather than in all locations even though the attribute is found at all locations. In brief, when faced with constant excitatory drive, the cell accommodates, and becomes sensitive to change in the magnitude of the excitatory drive. In the model simulation, this dynamic adaptation is captured by supposing that cells fire 5% of the time on average across the simulation, regardless of their excitatory inputs.”
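
      One way to read the final sentence of the quoted paragraph is as a threshold set relative to each cell's own input history. The sketch below follows that reading, which is an assumption; the manuscript's actual implementation may differ. The function name `fire_top_fraction` and the toy drive values are hypothetical.

```python
def fire_top_fraction(drive: list[float], fraction: float = 0.05) -> list[bool]:
    """Mark the time steps whose drive is in the top `fraction` for this cell."""
    n_active = max(1, round(fraction * len(drive)))
    # The threshold is relative to the cell's own drive, not an absolute level.
    ranked = sorted(range(len(drive)), key=lambda t: drive[t], reverse=True)
    active = set(ranked[:n_active])
    return [t in active for t in range(len(drive))]


# Two hypothetical cells over 100 time steps: the same small bump of extra
# memory feedback at steps 40-44, on top of very different constant drive.
bump = [0.2 if 40 <= t < 45 else 0.0 for t in range(100)]
strong_drive = [10.0 + b for b in bump]
weak_drive = [1.0 + b for b in bump]

strong_spikes = fire_top_fraction(strong_drive)
weak_spikes = fire_top_fraction(weak_drive)
print(sum(strong_spikes), sum(weak_spikes))  # prints 5 5
```

      Both cells fire on 5% of the time steps, and only where their drive is raised relative to baseline, so the size of the constant drive drops out. That is the sense in which an adapted cell reports changes in its input and not the input level itself.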

      (4) Why are grid cells given such a large role for encoding non-spatial attributes? If anything, shouldn't it be lateral EC or perirhinal cortex? Of course, they both could, but there is less reason to think this, at least for rodent mEC.  

      This is a good point and the following paragraph has been added to the introduction to explain that lateral EC is likely part of the explanation. But even when including lateral EC, it still appears that most of the input to hippocampus is spatial.

      “One possible answer to the apparent lack of non-spatial cells in MTL is to highlight the role of the lateral entorhinal cortex (LEC) as the source of non-spatial what information for memory encoding (Deshmukh & Knierim, 2011). LEC can be contrasted with mEC, which appears to only provide where information (Boccara et al., 2010a; Diehl et al., 2017). Although it is generally true that LEC is involved in non-spatial processing, there is evidence that LEC provides some forms of spatial information (Knierim et al., 2014). The kind of non-spatial information provided by LEC appears to be in relation to objects (Connor & Knierim, 2017; Wilson et al., 2013). However, in a typical rodent spatial navigation study there are no objects within the enclosure. Thus, although the distinction between mEC and LEC is likely part of the explanation, it is still the case that rodent entorhinal input to hippocampus appears to heavily favor spatial information.”

      (5) Clarification: why do place cells and grid cells differ in terms of stability in the model? Place cells are not stable initially but grid cells come out immediately. They seem directly connected so a bit unclear why; especially if place cell feedback leads to grid cell fields. There is an explanation in the text - based on grid cells coding the on-average memories, but these should be based on place cell inputs as well. So how is it that place fields are unstable then grid fields do not move at all? I wonder if a set of images or videos (gifs) showing the differences in spatial learning would be nice and clarify this point.  

      In this revision, I provide a new video focused on learning of place cell memories that include head direction. This second video is in relation to the results reported in Figure 9. The short answer is that the grid fields for the non-spatial cell are based on the average across several view-dependent memories (i.e., across several place cells that have head direction sensitivity) and the average is reliable even if the place cells are unstable. The text of this explanation now reads:

      “Why was the grid immediately apparent for the non-spatial attribute cell whereas the grid took considerable prior experience for the head direction cells? The answer relates to memory consolidation and the shifting nature of the hippocampal place cells. Head direction cells only produced a reliable grid once the hippocampal place cells (aka, memory cells) assumed stable locations. During the first few sessions, the hippocampal place cells were shifting their positions owing to pattern separation and consolidation. But once the place cells stabilized, they provided reliable top-down memory feedback to the head direction cells in some places but not others, thus producing a reliable grid arrangement to the firing maps of the head direction cells. In other words, for the head direction cells, the grid only appeared once the place cells stabilized. This slow stabilization of place fields is a known property (Bostock et al., 1991; Frank et al., 2004).

      In the simulation, the place cells did not stabilize until a sufficient number of place cells were created (Figure 9C). Specifically, these additional memories were located immediately outside the enclosure, around all borders (Figure 9D). These “outside the box” memories served to constrain the interior place cells, locking them in position despite ongoing consolidation. This dynamic can be seen in a movie showing a representative simulation. The movie shows the positions of the head direction sensitive place cells during initial learning, and then during additional sessions of prior experience as the movie speeds up (see link in Figure 9 caption).

      Why did the non-spatial grid cell (k) produce a grid immediately, before the place cells stabilized? As discussed in relation to Figure 8, the non-spatial grid cell is the projection through the 3D volume of real-world coordinates that includes X, Y, and head direction. Each grid field of a non-spatial grid cell reflects feedback from several place cells that each have a different head direction sensitivity (see for instance the allocentric pairs of memories illustrated in Figure 8C and 8D). Thus, each grid field is the average across several memories that entail different viewpoints and this averaging across memories provides stability even if the individual memories are not yet stable. This average of unstable memories produces a blurry sort of grid pattern without any prior experience.

      A final piece of the puzzle relies on the same mechanism that caused the grid pattern to align with the borders as reported in the results of Figures 6 and 7. Specifically, there are some “sticky” locations with ongoing consolidation because the connection weights are bounded. Because weights cannot go below their minimum or above their maximum, it is slightly more difficult for consolidation to push or pull connection weights over the peak value or under the minimum value of the tuning curve. Thus, the place cells tend to linger in locations that correspond to the peak or trough of a border cell. There are multiple peak and trough locations but for the parameter values in this simulation, the grid pattern seen in Figure 9C shows the set of peak/trough locations that satisfy the desired spacing between memories. Thus, the average across memories shows a reliable grid field at these locations even though the memories are unstable.”
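      The averaging argument in the quoted passage can be illustrated with a toy calculation. The sketch below is not the author's simulation: it assumes that each place cell feeding a grid field drifts independently from session to session with Gaussian jitter, and all names and parameter values (`N_SESSIONS`, `N_MEMORIES`, `JITTER_SD`) are hypothetical. Under those assumptions, the mean position across several memories varies less across sessions than any single memory does.

```python
import random
import statistics

random.seed(1)

N_SESSIONS = 200   # hypothetical number of recording sessions
N_MEMORIES = 6     # hypothetical number of place cells feeding one grid field
JITTER_SD = 1.0    # hypothetical session-to-session drift of each place cell

single = []    # position of one individual memory in each session
averaged = []  # mean position across all memories in each session

for _ in range(N_SESSIONS):
    # Assumption: each memory drifts independently around the same true center (0.0)
    positions = [random.gauss(0.0, JITTER_SD) for _ in range(N_MEMORIES)]
    single.append(positions[0])
    averaged.append(statistics.mean(positions))

sd_single = statistics.stdev(single)
sd_averaged = statistics.stdev(averaged)
print(f"single memory SD: {sd_single:.2f}, averaged field SD: {sd_averaged:.2f}")
```

      If the drift of different memories were correlated rather than independent, the benefit of averaging would be smaller than this sketch suggests.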

      (6) Other predictions. Clearly, the model makes many interesting (and quite specific!) predictions. But does it make some known simple predictions? 

      • More place cells at rewarded (or more visited) locations. Some empirical researchers seem to think this is not as obvious as it seems (e.g., Duvelle et al., 2019, JoN; Nyberg et al., 2021, Neuron Review).

      • Grid cell field moves toward reward (Butler et al., 2019; Boccara et al., 2019).

      • Grid cells deform in a trapezoid (Krupic et al., 2015) and change in environments like mazes (Derdikman et al., 2014).

      Thank you for these suggestions and I have added the following paragraph to the discussion:

      “In terms of the animal’s internal state, all locations in the enclosure may be viewed as equally aversive and unrewarding, which is a memorable characteristic of the enclosure. Reward, or lack thereof, is arguably one of the most important nonspatial characteristics and application of this model to reward might explain the existence of goal-related activity in place cells (Hok et al., 2007; although see Duvelle et al., 2019), reflecting the need to remember rewarding locations for goal directed behavior. Furthermore, if place cell memories for a rewarding location activate entorhinal grid cells, this may explain the finding that grid cells remap in an enclosure with a rewarded location such that firing fields are attracted to that location (Boccara et al., 2019; Butler et al., 2019). Studies that introduce reward into the enclosure are an important first step in terms of examining what happens to grid cells when the animal is placed in a more varied environment.”

      Regarding the changes in shape of the environment, this was discussed in the section of the paper that reads “As seen in Figure 12, because all but one of the place cells was exterior when the simulated animal was constrained to a narrow passage, the hippocampal place cell memories were no longer arranged in a hexagonal grid. This disruption of the grid array for narrow passages might explain the finding that the grid pattern (of grid cells) is disrupted in the thin corner of a trapezoid (Krupic et al., 2015) and disrupted when a previously open enclosure is converted to a hairpin maze by insertion of additional walls within the enclosure (Derdikman et al., 2009).” This particular section of the paper now appears in the Appendix and Figure 12 is now Appendix Figure 2.

      Reviewer #2 (Public Review): 

      The manuscript describes a new framework for thinking about the place and grid cell system in the hippocampus and entorhinal cortex in which these cells are fundamentally involved in supporting non-spatial information coding. If this framework were shown to be correct, it could have high impact because it would suggest a completely new way of thinking about the mammalian memory system in which this system is non-spatial. Although this idea is intriguing and thought-provoking, a very significant caveat is that the paper does not provide evidence that specifically supports its framework and rules out the alternate interpretations. Thus, although the work provides interesting new ideas, it leaves the reader with more questions than answers because it does not rule out any earlier ideas. 

      Basically, the strongest claim in the paper, that grid cells are inherently non-spatial, cannot be specifically evaluated versus existing frameworks on the basis of the evidence that is shown here. If, for example, the author had provided behavioral experiments showing that human memory encoding/retrieval performance shifts in relation to the predictions of the model following changes in the environment, it would have been potentially exciting because it could potentially support the author's reconceptualization of this system. But in its current form, the paper merely shows that a new type of model is capable of explaining the existing findings. There is not adequate data or results to show that the new model is a significantly better fit to the data compared to earlier models, which limits the impact of the work. In fact, there are some key data points in which the earlier models seem to better fit the data.  

      Overall, I would be more convinced that the findings from the paper are impactful if the author showed specific animal memory behavioral results that were only supported by their memory model but not by a purely spatial model. Perhaps the author could run new experiments to show that there are specific patterns of human or animal behavior that are only explained by their memory model and not by earlier models. But in its current form, I cannot rule out the existing frameworks and I believe some of the claims in this regard are overstated. 

      As previously detailed in Box 1 and as explained in the text in several places, the model provides an explanation of several findings that remain unexplained by other theories (see “Results Uniquely Explained by the Memory Model”). But more generally this is a good point, and the initial draft failed to fully articulate why a researcher might choose this model to guide future empirical investigations. A new section in the introduction, titled ‘Why Model the Rodent Navigation Literature with a Memory Model?’, deals with these issues. That section reads:

      “Spatial navigation is inherently a memory problem – learning the spatial arrangement of a new enclosure requires memory for the conjunction of what and where. This has long been realized and in the introduction to ‘Hippocampus as a Cognitive Map’, O’Keefe and Nadel (1978) wrote “We shall argue that the hippocampus is the core of a neural memory system providing an objective spatial framework within which the items and events of an organism's experience are located and interrelated” (emphasis added). Furthermore, in the last chapter of their book, they extended cognitive map theory to human memory for non-spatial characteristics. However, in the decades since the development of cognitive map theory, the rodent spatial navigation and human memory literatures have progressed somewhat independently.

      The ideas proposed in this model are an attempt to reunify these literatures by returning to the original claim that spatial navigation is inherently a memory problem. The goal of the current study is to explain the rodent spatial navigation literature using a memory model that has the potential to also explain the human memory literature. In contrast, most grid cell models (Bellmund et al., 2016; Bush et al., 2015; Castro & Aguiar, 2014; Hasselmo, 2009; Mhatre et al., 2012; Solstad et al., 2006; Sorscher et al., 2023; Stepanyuk, 2015; Widloski & Fiete, 2014) are domain specific models of spatial navigation and as such, they do not lend themselves to explanations of human memory. Thus, the reason to prefer this model is parsimony. Rather than needing to develop a theory of memory that is separate from a theory of spatial navigation, it might be possible to address both literatures with a unified account.

      This study does not attempt to falsify other theories of grid cells. Instead, this model reaches a radically different interpretation regarding the function of grid cells; an interpretation that emerges from viewing spatial navigation as a memory problem. All other grid cell models assume that an entorhinal grid cell displaying a spatially arranged grid of firing fields serves the function of spatial coding (i.e., spatial grid cells exist to support a spatial metric). In contrast, the proposed memory model of grid cells assumes that the hexagonal tiling reflects the need to keep memories separate from each other to minimize confusion and confabulation – the grid pattern is the byproduct of pattern separation between memories rather than the basis of a spatial code. 

      It is now understood that grid-like firing fields can occur for non-spatial two-dimensional spaces. For instance, human entorhinal cortex exhibits grid-like responses to video morph trajectories in a two-dimensional bird neck-length versus bird leg-length space (Constantinescu et al., 2016). As a general theory of learning and memory, the proposed memory model of grid cells is easily extended to explain these results (e.g., relabeling the border cell inputs in the model as neck-length and leg-length inputs). However, there are other grid cell models that can explain both spatial grid cells as well as non-spatial grid-like responses (Mok & Love, 2019; Rodríguez-Domínguez & Caplan, 2019; Stachenfeld et al., 2017; Wei et al., 2015). Similar to this memory model of grid cells, these models are also positioned to explain both the rodent spatial navigation and human memory literatures. Nevertheless, there is a key difference between this model and other grid cell models that generalize to non-spatial representations. Specifically, these other models assume that grid cells exhibiting spatial receptive fields serve the function of identifying positions in the environment (i.e., their function is spatial). As such, these models do not explain why most of the input to rodent hippocampus appears to be spatial (Boccara et al., 2010; Diehl et al., 2017; Grieves & Jeffery, 2017). This memory model of grid cells provides an answer to the apparent paucity of nonspatial cell types in rodent MTL by proposing that grid cells with spatial receptive fields have been misclassified as spatial (they are what cells rather than where cells) and that place cells are fundamentally memory cells that conjoin what and where.”

      - The paper does not fully take into account all the findings regarding grid cells, some of which very clearly show spatial processing in this system. For example, findings on grid-by-direction cells (e.g., Sargolini et al. 2006) would seem to suggest that the entorhinal grid system is very specifically spatial and related to path integration. Why would grid-by-direction cells be present and intertwined with grid cells in the author's memory-related reconceptualization? It seems to me that the existence of grid-by-direction cells is strong evidence that at least part of this network is specifically spatial.

      Grid-by-direction cells (head direction conjunctive grid cells in the model) were a key part of the reported results. These grid cells naturally arise in the model as the animal forms memories (aka, hippocampal place cells) that conjoin location (as defined by border cells), head direction at the time of memory formation, and one or more non-spatial properties found at that location. In this revision, I have attempted to better explain how including head direction in hippocampal memories naturally gives rise to these cell types. The introduction to the head direction module simulations now reads:

      “According to this memory model of spatial navigation, place cells are the conjunction of location, as defined by border cells, and one or more properties that are remembered to exist at that location. Such memories could, for instance, allow an animal to remember the location of a food cache (Payne et al., 2021). The next set of simulations investigates behavior of the model when one of the to-be-remembered properties is head direction at the time when the memory was formed (e.g., the direction of a pathway leading to a food cache). Indicating that head direction is an important part of place cell representations, early work on place cells in mazes found strong sensitivity to head direction, such that the place field is found in one direction of travel but not the other (McNaughton et al., 1983; Muller et al., 1994). Place cells can exhibit a less extreme version of head direction sensitivity in open field recordings (Rubin et al., 2014), but the nature of the sensitivity is more complicated, depending on location of the animal relative to the place field center (Jercog et al., 2019).

      It is possible that some place cell memories do not receive head direction input, as was the case for the simulations reported in Figures 6/7 – in those simulations, place cells were entirely insensitive to head direction, owing to a lack of input from head direction cells. However, removal of head direction input to hippocampus affects place cell responses (Calton et al., 2003) and grid cell responses (Winter et al., 2015), suggesting that head direction is a key component of the circuit. Furthermore, if place cells represent episodic memories, it seems natural that they should include head direction (i.e., viewpoint at the time of memory formation).

      In the simulations reported next, head direction is simply another property that is conjoined in a hippocampal place cell memory. In this case, a head direction cell should become a head direction conjunctive grid cell (i.e., a grid cell, but only when the animal is heading in a particular direction), owing to memory feedback from the hexagonal array of hippocampal place cell memories. When including head direction, the real-world dimensions of variation are across three dimensions (X, Y, and head direction) rather than two, and consolidation will cause the place cells to arrange in a three-dimensional volume. The simulation reported below demonstrates that this situation provides a “grid module”.”

      - I am also concerned that the paper does not do enough to address findings regarding how the elliptical shape of grid fields shifts when boundaries of an environment compress in one direction or change shape/angles (Lever et al., & Krupic et al). Those studies show compression in grid fields based on boundary position, and I don't see how the authors' model would explain these findings.  

      This finding was covered in the original submission: “For instance, perhaps one egocentric/allocentric pair of mEC grid modules is based on head direction (viewpoint) in remembered positions relative to the enclosure borders whereas a different egocentric/allocentric pair is based on head direction in remembered positions relative to landmarks exterior to the enclosure. This might explain why a deformation of the enclosure (moving in one of the walls to form a rectangle rather than a square) caused some of the grid modules but not others to undergo a deformation of the grid pattern in response to the deformation of the enclosure wall (see also Barry et al., 2007). More specifically, if there is one set of non-orthogonal dimensions for enclosure borders and the movement of one wall is too modest to cause global remapping, this would deform the grid modules based on the enclosure border cells. At the same time, if other grid modules are based on exterior properties (e.g., perhaps border cells in relation to the experimental room rather than the enclosure), then those grid modules would be unperturbed by moving the enclosure wall.”

      I apologize for being unclear in describing how the model might explain this result. The paragraph has been rewritten and now reads:

      “Consider the possibility that one mEC grid module is based on head direction (viewpoint) in remembered positions relative to the enclosure borders (e.g., learning the properties of the enclosure, such as the metal surface) while a different grid module is based on head direction in remembered positions relative to landmarks exterior to the enclosure (e.g., learning the properties of the experimental room, such as the sound of electronics that the animal is subject to at all locations). This might explain why a deformation of the enclosure (moving one of the walls to form a rectangle rather than a square) caused some of the grid modules but not others to undergo a deformation of the grid pattern in response to the deformation of the enclosure wall (see also Barry et al., 2007). More specifically, suppose that the movement of one wall is modest and after moving the wall, the animal views the enclosure as being the same enclosure, albeit slightly modified (e.g., when a home is partially renovated, it is still considered the same home). In this case, the set of non-orthogonal dimensions associated with enclosure borders would still be associated with the now-changed borders and any memories in reference to this border-determined space would adjust their positions accordingly in real-world coordinates (i.e., the place cells would subtly shift their positions owing to this deformation of the borders, producing a corresponding deformation of the grid). At the same time, there may be other sets of memories that are in relation to dimensions exterior to the enclosure. Because these exterior properties are unchanged, any place cells and grid cells associated with the exterior-oriented memories would be unchanged by moving the enclosure wall.”

      - Are findings regarding speed modulation of grid cells problematic for the paper's memory results? 

      - A further issue is that the paper does not seem to adequately address developmental findings related to the timecourses of the emergence of different cell types. In their simulation, researchers demonstrate the immediate emergence of grid fields in a novel environment, while noting that the stabilization of place cell positions takes time. However, these simulation findings contradict previous empirical developmental studies (Langston et al., 2010). Those studies showed that head direction cells show the earliest development of spatial response, followed by the appearance of place cells at a similar developmental stage. In contrast, grid cells emerge later in this developmental sequence. The gradual improvement in spatial stability in firing patterns likely plays a crucial role in the developmental trajectory of grid cells. Contrary to the model simulation, grid cells emerge later than place cells and head direction cells, yet they also hold significance in spatial mapping. 

      - The model simulations suggest that certain grid patterns are acquired more gradually than others. For instance, egocentric grid cells require the stabilization of place cell memories amidst ongoing consolidation, while allocentric grid cells tend to reflect average place field positions. However, these findings seemingly conflict with empirical studies, particularly those on the conjunctive representation of distance and direction in the earliest grid cells. Previous studies found no significant differences, relative to adults, in grid cells and grid cells with directional correlates across these age groups (Wills et al., 2012). This indicates that the combined representation of distance and direction in single mEC cells is present from the earliest ages at which grid cells emerge.

      These are good points and they have been addressed in a new section of the introduction titled ‘The Scope of the Proposed Model’. That section reads:

      “The reported simulations explain why most mEC cell types in the rodent literature appear to be spatial (Boccara et al., 2010; Diehl et al., 2017; Grieves & Jeffery, 2017). Assuming that rodents can form non-spatial memories, rodent hippocampus must receive non-spatial input from entorhinal cortex. These simulations suggest that characterization of the rodent mEC as primarily spatial might be incorrect if most grid cells (except perhaps head direction conjunctive grid cells) have been mischaracterized as spatial. Other literatures with other species find non-spatial representations in MTL (Gulli et al., 2020; Quiroga et al., 2005; Wixted et al., 2014) and non-spatial hippocampal memory encoding has been found in rodents (Liu et al., 2012; McEchron & Disterhoft, 1999). The proposed memory model is compatible with these results – the ideas contained in this model could be applied to nonspatial memory representations. However, surveys of cell types in rodent entorhinal cortex seem to indicate that most cells are spatial (Boccara et al., 2010; Diehl et al., 2017; Grieves & Jeffery, 2017). How can the rodent hippocampus encode nonspatial memories if most of its input is spatial? The goal of the reported simulations is to explain the apparent paucity of non-spatial cells in rodent entorhinal cortex by proposing that grid cells have been misclassified as spatial (see also Luo et al., 2024).

      Given the simplicity of the proposed model, there are important findings that the model cannot address -- it is not that the model makes the wrong predictions but rather that it makes no predictions. The role of running speed (Kraus et al., 2015) is one such variable for which the model makes no predictions. Similarly, because the model is a rate-coded model rather than a model of oscillating spiking neurons, it makes no predictions regarding theta oscillations (Buzsáki & Moser, 2013). The model is an account of learning and memory for an adult animal, and it makes no predictions regarding the developmental (Langston et al., 2010; Muessig et al., 2015; Wills et al., 2012) or evolutionary (Rodríguez et al., 2002) time course of different cell types. This model contains several purely spatial representations such as border cells, head direction cells, and head direction conjunctive grid cells and it may be that these purely spatial cell types emerged first, followed by the evolution and/or development of non-spatial cell types. However, this does not invalidate the model. Instead, this is a model for an adult animal that has both episodic memory capabilities and spatial navigation capabilities, irrespective of the order in which these capabilities emerged.

      This model has the potential to explain context effects in memory (Godden & Baddeley, 1975; Gulli et al., 2020; Howard et al., 2005). According to this model, different grid cells represent different non-spatial characteristics and place cells represent the combination of these “context” factors and location. In the simulation, just one grid cell is simulated but the same results would emerge when simulating hundreds of different non-spatial inputs provided that all of the simulated non-spatial inputs exist throughout the recording session. However, there is evidence that hippocampus can explicitly represent the passage of time (Eichenbaum, 2014), and time is assuredly an important factor in defining episodic memory (Bright et al., 2020). Thus, although the current model addresses unique combinations of what and where, it is left to future work to incorporate representations of when in the memory model.”

      Reviewer #3 (Public Review): 

      A crucial assumption of the model is that the content of experience must be constant in space. It's difficult to imagine a real-world example that satisfies this assumption. Odors and sounds are used as examples. While they are often more spatially diffuse than objects on the ground, odors and sounds have sources that are readily detectable. Animals can easily navigate to a food source or to a vocalizing conspecific. This assumption is especially problematic because it predicts that all grid cells should become silent when their preferred non-spatial attribute (e.g. a specific odor) is missing. I'm not aware of any experimental data showing that grid cells become silent. On the contrary, grid cells are known to remain active across all contexts that have been tested, including across sleep/wake states. Unlike place cells, grid cells do not seem to turn off. Since grid cells are active in all contexts, their preferred attribute must also be present in all contexts, and therefore they would not convey any information about the specific content of an experience.

      These are good points and in this revision I have attempted to explain that there is a great deal of contextual similarity across all recording sessions. One paragraph in the discussion now reads:

      “In a typical rodent spatial navigation study, the non-spatial attributes are well-controlled, existing at all locations regardless of the enclosure used during testing (hence, a grid cell in one enclosure will be a grid cell in a different enclosure). Because labs adopt standard procedures, the surfaces, odors (e.g., from cleaning), external lighting, time of day, human handler, electronic apparatus, hunger/thirst state, etc. might be the same for all recording sessions. Additionally, the animal is not allowed to interact with other animals during recording and this isolation may be an unusual and highly salient property of all recording sessions. Notably, the animal is always attached to wires during recording. The internal state of the animal (fear, aloneness, the noise of electronics, etc.) is likely similar across all recording situations and attributes of this internal state are likely represented in the hippocampus and entorhinal input to hippocampus. According to this model, hippocampal place cells are “marking” all locations in the enclosure as places where these things tend to happen.”

      The proposed novelty of this theory is that other models all assume that grid cells encode space. This isn't quite true of models based on continuous attractor networks, the discussion of which is notably absent. More specifically, these models focus on the importance of intrinsic dynamics within the entorhinal cortex in generating the grid pattern. While this firing pattern is aligned to space during navigation and therefore can be used as a representation of that space, the neural dynamics are preserved even during sleep. Similarly, it is because the grid pattern does not strictly encode physical space that grid-like signals are also observed in relation to other two-dimensional continuous variables.

      These models were briefly discussed in the general discussion section and in this revision they are further discussed in the introduction in a new section, titled ‘Why Model the Rodent Navigation Literature with a Memory Model?’ That section reads:

      “Spatial navigation is inherently a memory problem – learning the spatial arrangement of a new enclosure requires memory for the conjunction of what and where. This has long been realized and in the introduction to ‘Hippocampus as a Cognitive Map’, O’Keefe and Nadel (1978) wrote “We shall argue that the hippocampus is the core of a neural memory system providing an objective spatial framework within which the items and events of an organism's experience are located and interrelated” (emphasis added). Furthermore, in the last chapter of their book, they extended cognitive map theory to human memory for non-spatial characteristics. However, in the decades since the development of cognitive map theory, the rodent spatial navigation and human memory literatures have progressed somewhat independently.

      The ideas proposed in this model are an attempt to reunify these literatures by returning to the original claim that spatial navigation is inherently a memory problem. The goal of the current study is to explain the rodent spatial navigation literature using a memory model that has the potential to also explain the human memory literature. In contrast, most grid cell models (Bellmund et al., 2016; Bush et al., 2015; Castro & Aguiar, 2014; Hasselmo, 2009; Mhatre et al., 2012; Solstad et al., 2006; Sorscher et al., 2023; Stepanyuk, 2015; Widloski & Fiete, 2014) are domain specific models of spatial navigation and as such, they do not lend themselves to explanations of human memory. Thus, the reason to prefer this model is parsimony. Rather than needing to develop a theory of memory that is separate from a theory of spatial navigation, it might be possible to address both literatures with a unified account.

      This study does not attempt to falsify other theories of grid cells. Instead, this model reaches a radically different interpretation regarding the function of grid cells; an interpretation that emerges from viewing spatial navigation as a memory problem. All other grid cell models assume that an entorhinal grid cell displaying a spatially arranged grid of firing fields serves the function of spatial coding (i.e., spatial grid cells exist to support a spatial metric). In contrast, the proposed memory model of grid cells assumes that the hexagonal tiling reflects the need to keep memories separate from each other to minimize confusion and confabulation – the grid pattern is the byproduct of pattern separation between memories rather than the basis of a spatial code. 

      It is now understood that grid-like firing fields can occur for non-spatial two-dimensional spaces. For instance, human entorhinal cortex exhibits grid-like responses to video morph trajectories in a two-dimensional bird neck-length versus bird leg-length space (Constantinescu et al., 2016). As a general theory of learning and memory, the proposed memory model of grid cells is easily extended to explain these results (e.g., relabeling the border cell inputs in the model as neck-length and leg-length inputs). However, there are other grid cell models that can explain both spatial grid cells as well as non-spatial grid-like responses (Mok & Love, 2019; Rodríguez-Domínguez & Caplan, 2019; Stachenfeld et al., 2017; Wei et al., 2015). Similar to this memory model of grid cells, these models are also positioned to explain both the rodent spatial navigation and human memory literatures. Nevertheless, there is a key difference between this model and other grid cell models that generalize to non-spatial representations. Specifically, these other models assume that grid cells exhibiting spatial receptive fields serve the function of identifying positions in the environment (i.e., their function is spatial). As such, these models do not explain why most of the input to rodent hippocampus appears to be spatial (Boccara et al., 2010; Diehl et al., 2017; Grieves & Jeffery, 2017). This memory model of grid cells provides an answer to the apparent paucity of nonspatial cell types in rodent MTL by proposing that grid cells with spatial receptive fields have been misclassified as spatial (they are what cells rather than where cells) and that place cells are fundamentally memory cells that conjoin what and where.”

      The use of border cells or boundary vector cells as the main (or only) source of spatial information in the hippocampus is not well supported by experimental data. Border cells in the entorhinal cortex are not active in the center of an environment. Boundary-vector cells can fire farther away from the walls but are not found in the entorhinal cortex. They are located in the subiculum, a major output of the hippocampus. While the entorhinal-hippocampal circuit is a loop, the route from boundary-vector cells to place cells is much less clear than from grid cells. Moreover, both border cells and boundary-vector cells (which are conflated in this paper) comprise a small population of neurons compared to grid cells.

      AUTHOR RESPONSE: The model can be built without assuming between-border cells (early simulations with the model did not make this assumption). Regarding this issue, the text reads “Unlike the BVC model, the boundary cell representation is sparsely populated using a basis set of three cells for each of the three dimensions (i.e., 9 cells in total), such that for each of the three non-orthogonal orientations, one cell captures one border, another the opposite border, and the third cell captures positions between the opposing borders (Solstad et al., 2008). However, this is not a core assumption, and it is possible to configure the model with border cell configurations that contain two opponent border cells per dimension, without needing to assume that any cells prefer positions between the borders (with the current parameters, the model predicts there will be two border cells for each between-border cell). Similarly, it is possible to configure the model with more than 3 cells for each dimension (i.e., multiple cells representing positions between the borders).” The Solstad paper found a few cells that responded in positions between borders, but perhaps not as many as the 1 out of 3 cells that this particular model simulation predicts. If the paucity of between-border cells is a crucial data point, the model can be reconfigured with opponent-border cells without any between-border cells. The reason that 3 border cells were used rather than 2 opponent border cells was simplicity. Because 3 head direction cells were used to capture the face-centered cubic packing of memories, the simulation also used 3 border cells per dimension to allow a common linear sum metric when conjoining dimensions to form memories. If the border dimensions used 2 cells while head direction used 3 cells, a dimensional weighting scheme would be needed to allow this mixing of “apples and oranges” in terms of distances in the 3D space that includes head direction.
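      To make the 9-cell basis set concrete, here is a minimal sketch of one way such a representation could be laid out. It is an illustration under stated assumptions, not the model's actual implementation: the 0°/60°/120° axis orientations, the triangular tuning curves, the square enclosure, and the function name `border_cell_activations` are all hypothetical choices that the response does not specify.

```python
import math

ORIENTATIONS_DEG = (0.0, 60.0, 120.0)  # assumed non-orthogonal axes (hypothetical values)
PREFERRED = (0.0, 0.5, 1.0)            # one border, between the borders, opposite border


def border_cell_activations(x, y, extent=1.0):
    """Return 9 activations (3 axes x 3 cells) for a position in a square enclosure."""
    activations = []
    for deg in ORIENTATIONS_DEG:
        theta = math.radians(deg)
        ux, uy = math.cos(theta), math.sin(theta)
        # Range of projections of the enclosure corners onto this axis
        corners = [cx * ux + cy * uy for cx in (0.0, extent) for cy in (0.0, extent)]
        lo, hi = min(corners), max(corners)
        # Position along this axis, rescaled so the two extreme borders sit at 0 and 1
        p = ((x * ux + y * uy) - lo) / (hi - lo)
        for pref in PREFERRED:
            # Assumed triangular tuning: 1 at the preferred position, 0 beyond distance 0.5
            activations.append(max(0.0, 1.0 - 2.0 * abs(p - pref)))
    return activations


cells = border_cell_activations(0.3, 0.7)
print(len(cells), [round(a, 2) for a in cells])
```

      With this particular tuning shape, the three cells on each axis always sum to 1, so every axis contributes equally to the total activation. Dropping the middle entry of `PREFERRED` gives the two-opponent-cell configuration mentioned in the response.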

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Specific questions/clarifications:  

      (1) Assumption of population-based vs single unit link to biological cells: At the start, the author assumes that each unit here can be associated with a population: "the simulated activation values can be thought of as proportional to the average firing rate of an ensemble of neurons with similar inputs and outputs (O'Reilly & Munakata, 2000)." But is a 'grid cell' found here a single cell or an average of many cells? Does this mean the model assumes many cells that have different fields that are averaged, which become a grid-like unit in the model? But in biology, these are single cells? Or does it mean a grid response is an average of the place cell inputs? 

      I apologize for being unclear about this. The grid cells in the model are equivalent to real single cells except that the simulation uses a rate-coded cell rather than a spiking cell. The averaging that was mentioned in the paper is across identically behaving spiking cells rather than across cells with different grid field arrangements. To better explain this, I have added the following text:

      “For instance, consider a set of several thousand spiking grid cells that are identical in terms of their firing fields. At any moment, some of these identically-behaving cells will produce an action potential while others do not (i.e., the cells are not perfectly synchronized), but a snapshot of their behavior can be extracted by calculating average firing rate across the ensemble. The simulated cells in the model represent this average firing rate of identically-behaving ensembles of spiking neurons.” 

      This is a mathematical short-cut to avoid simulating many spiking neurons. Because this model was compared to real spike rate maps, this real-valued average firing rate is down-sampled to produce spikes by finding the locations that produced the top 5% of real-valued activation values across the simulation.
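
      The down-sampling step described above can be illustrated with a minimal sketch. This is not the author's code: the function name, the random activations, and the sample count are hypothetical, and the only detail taken from the response is the top-5% criterion.

```python
import random

def top_fraction_spikes(rates, fraction=0.05):
    """Mark the samples whose activation is in the top `fraction` of all values."""
    n_spikes = max(1, round(len(rates) * fraction))
    # Threshold is the smallest activation among the n_spikes largest values.
    threshold = sorted(rates, reverse=True)[n_spikes - 1]
    return [r >= threshold for r in rates]

random.seed(0)
# Hypothetical real-valued activations, one per visited location sample.
rates = [random.random() for _ in range(10_000)]
spikes = top_fraction_spikes(rates)
spike_fraction = sum(spikes) / len(spikes)
```

      In this sketch, each `True` entry in `spikes` would be plotted as a spike at the corresponding visited location, giving a spike map comparable in form to recorded data.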

      (2) It is not clear to me why they are circular border cells/basis sets.  

      In the initial submission, there was a brief paragraph describing this assumption. In this revision, that paragraph has been expanded and modified for greater clarity. It now reads:

      “Because head direction is necessarily a circular dimension, it was assumed that all dimensions are circular (a circular dimension is approximately linear for nearby locations). This assumption of circular dimensions was made to keep the model relatively simple, making it easier to combine dimensions and allowing application of the same processes for all dimensions. For instance, the model requires a weight normalization process to ensure that the pattern of weights for each dimension corresponds to a possible input value along that dimension. However, the normalization for a linear dimension is necessarily different than for a circular dimension. Because the neural tuning functions were assumed to be sine waves, normalization requires that the sum of squared weights add up to a constant value. For a linear dimension, this sum of squares rule only applies to the subset of cells that are relevant to a particular value along the dimension whereas for a circular dimension, this sum of squares rule is over the entire set of cells that represent the dimension (i.e., weight normalization is easier to implement with circular dimensions). Although all dimensions were assumed to be circular for reasons of mathematical convenience and parsimony, circular dimensions may relate to the finding that human observers have difficulty re-orienting themselves in a room depending on the degree of rotational symmetry of the room (Kelly et al., 2008). In addition, this simplifying assumption allows the model to capture the finding that the population of grid cells lies on a torus (Gardner et al., 2022), although I note that the model was developed before this result was known.”
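
      The sum-of-squares property mentioned in the quoted text can be checked with a small sketch. It assumes unit-amplitude cosine tuning and three cells with evenly spaced preferred positions on the circle; these specifics are illustrative assumptions rather than the model's actual parameters.

```python
import math

N_CELLS = 3  # cells per circular dimension, as in the reported simulations
# Assumed: preferred positions evenly spaced around the circle.
PREFERRED = [2 * math.pi * k / N_CELLS for k in range(N_CELLS)]

def population_code(theta):
    """Cosine-tuned responses of the basis set to position `theta` (radians)."""
    return [math.cos(theta - phi) for phi in PREFERRED]

def sum_of_squares(theta):
    """Sum of squared responses across the whole basis set at `theta`."""
    return sum(r * r for r in population_code(theta))

# Evaluate the sum of squares at 100 positions around the circle.
sums = [sum_of_squares(2 * math.pi * i / 100) for i in range(100)]
```

      Under these assumptions every entry of `sums` equals N_CELLS / 2, so a single normalization constant applies across the entire set of cells representing the dimension.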

      (3) Why is it 3 components? I realise that the number doesn't matter too much, but I believe more is better, so is it just for simplicity? 

      In this revision, additional text has been added to explain this assumption: “To keep the model simple, the same number of cells was assumed for all dimensions and all dimensions were assumed to be circular (head direction is necessarily circular and because one dimension needed to be circular, all dimensions were assumed to be circular). Three cells per dimension was chosen because this provides a sparse population code of each dimension, with few border cells responding between borders, while allowing three separate phases of grid cells within a grid cell module (in the model, a grid cell module arises from combination of a third dimension, such as head direction, with the real-world X/Y dimensions defined by border cells).”

      As a reminder, the text explaining the sparse coding of border cells reads: “However, this is not a core assumption, and it is possible to configure the model with border cell configurations that contain two opponent border cells per dimension, without needing to assume that any cells prefer positions between the borders (with the current parameters, the model predicts there will be two border cells for each between-border cell). Similarly, it is possible to configure the model with more than 3 cells for each dimension (i.e., multiple cells representing positions between the borders).”

      The model can work with just two opponent cells or with more than three cells per basis set. In different simulations, I have explored these possibilities. Three was chosen because it is a convenient way to highlight the face-centered cubic packing of memories that tends to occur (FCP produces 3 alternating layers of hexagonally arranged firing fields). Thus, each of the three head direction cells captures a different layer of the FCP arrangement. A more realistic simulation might combine 6 different head direction cells tiling the head direction dimension with opponent border cells (just 2 cells for each border dimension). Such a combination would produce responses at borders, but no responses between borders and, at the same time, the head direction cells would still reveal the FCP arrangement. However, it is not easy to find the right parameters for such a mix-and-match simulation in which different dimensions have different numbers of tuning functions (e.g., some dimensions having 2 cells while others have 3 or 6 and some dimensions being linear while others are circular). When all of the dimensions are of the same type, the simple sum that arises from multiplying the input by the weight values gives rise to Euclidean distance (see Figure 3B). With a mix-and-match model of different dimension-types, it should be possible to adjust the sum to nevertheless produce a monotonic function with Euclidean distance, although I leave this to future work. To keep things simple, I assumed that all dimensions are of the same type (circular, with 3 cells per dimension).
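
      For a single circular dimension, the "simple sum" of inputs multiplied by weights can be sketched as follows. The sketch assumes unit-amplitude cosine tuning, three evenly spaced cells, and weights equal to the input pattern at the stored position; all names and values are hypothetical and not taken from the manuscript.

```python
import math

# Assumed: three cells per dimension with evenly spaced preferred positions.
PREFERRED = [0.0, 2 * math.pi / 3, 4 * math.pi / 3]

def code(theta):
    """Cosine-tuned input pattern at position `theta` (radians)."""
    return [math.cos(theta - phi) for phi in PREFERRED]

def match(theta, stored_theta):
    """Weighted sum of inputs, with weights equal to the stored input pattern."""
    return sum(x * w for x, w in zip(code(theta), code(stored_theta)))

stored = 1.0  # hypothetical stored position
offsets = [0.0, 0.5, 1.0, 2.0, math.pi]
strengths = [match(stored + d, stored) for d in offsets]
```

      Under these assumptions the weighted sum equals 1.5 * cos(offset), so it depends only on the circular distance from the stored position and falls monotonically as that distance grows from 0 to half a period.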

      (4) Confusion due to the border cells/box was unclear to me. "If the period of the circular border cells was the same as the width of the box, then a memory pushed outside the box on one side would appear on the opposite side of the box, in which case the partial grid field on one side should match up with its remainder on the other side. This would entail complete confusion between opposite sides of the box, and the representation of the box would be a torus (donut-shaped) rather than a flat two-dimensional surface. To reduce confusion ..." Is this confusion of the model? Of the animal?  

      This would be confusion of the animal (e.g., a memory field overlapping with one border would also appear at the opposite border in the corresponding location). At one point in model development, I made the assumption that one side of the box wraps to the other side, and I asked Trygve Solstad to run some analyses of real data to see if cells actually wrap around in this manner. He did not find any evidence of this, and so I decided to include outside-the-box representational area which, as it turned out, allowed the model to capture other behaviors as detailed in the paper.

      This section of the paper now reads:

      “The cosine tuning curves of the simulated border cells represent distance from the border on both sides of the border (i.e., firing rate increases as the animal approaches the border from either the inside or the outside of the enclosure). Experimental procedures do not allow the animal to experience locations immediately outside the enclosure, but these locations remain an important part of the hypothetic representation, particularly when considering the modification of memories through consolidation (i.e., a memory created inside the enclosure might be moved to a location outside the enclosure). This symmetry about the border cell’s preferred location is needed to maintain an unbiased representation, with a constant sum of squares for the border cell inputs (see methods section). Rather than using linear dimensions, all dimensions were assumed to be circular to keep the model relatively simple. This assumption was made because head direction is necessarily a circular dimension and by having all dimensions be circular, it is easy to combine dimensions in a consistent manner to produce multidimensional hippocampal place cell memories. Thus, the border cells define a torus (or more accurately a three-torus) of possible locations. This provides a hypothetical space of locations that could be represented.

      In light of the assumption to represent border cells with a circular dimension, when a memory is pushed outside the East wall of the enclosure, it would necessarily be moved to the West wall of the enclosure if the period of the circular dimension was equal to the width of the enclosure. If this were true, then the partial grid field on one side of the enclosure would match up with its remainder on the other side. Such a situation would cause the animal to become completely confused regarding opposite sides of the enclosure (a location on the West wall would be indistinguishable from the corresponding location on the East wall). To reduce confusion between opposite sides of the enclosure, the width of the enclosure in which the animal navigated (Figure 5) was assumed to be half as wide as the full period of the border cells. In other words, although the space of possible representations was a three-torus, it was assumed that the real-world two-dimensional enclosure encompassed a section of the torus (e.g., a square piece of tape stuck onto the surface of a donut). The torus is better thought of as a “playing field” in which different sizes and shapes of enclosure can be represented (i.e., different sizes and shapes of tape placed on the donut). Furthermore, this assumption provides representational space that is outside the box without such locations wrapping around to the opposite side of the box.”
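
      The effect of the border-cell period on East/West confusion can be illustrated with a short sketch. It assumes cosine-tuned border cells with three evenly spaced preferred phases along one dimension; the enclosure width and all names are hypothetical.

```python
import math

# Assumed: three cosine-tuned border cells with evenly spaced preferred phases.
PREFERRED = [0.0, 2 * math.pi / 3, 4 * math.pi / 3]

def border_code(x, period):
    """Border-cell responses at position x for a circular dimension of given period."""
    phase = 2 * math.pi * x / period
    return [math.cos(phase - phi) for phi in PREFERRED]

def code_distance(a, b):
    """Euclidean distance between two response patterns."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

WIDTH = 1.0  # enclosure width in arbitrary units (hypothetical)
west, east = 0.0, WIDTH

# Period equal to the width: the West and East walls receive identical codes.
same_period = code_distance(border_code(west, WIDTH), border_code(east, WIDTH))
# Period twice the width: the walls are half a cycle apart and clearly distinct.
double_period = code_distance(border_code(west, 2 * WIDTH), border_code(east, 2 * WIDTH))
```

      In this sketch `same_period` is zero up to rounding error, which corresponds to the complete confusion described in the quoted text, whereas `double_period` is as large as two response patterns on this dimension can be.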

      (5) Figure 3 - This result seems to be related to whether you use Euclidean or city-block distance. If you use Euclidean distances in two dimensions wouldn't this work out fine?  

      Euclidean distance was the metric used in the analysis of the two-dimensional simulation, but this did not work out. To make this clear, I have changed the label on the x-axes to read “Euclidean distance” for both the two- and three-dimensional simulations. The two-dimensional simulation produced city block behavior rather than Euclidean behavior because memory retrieval is the sum of the two dimensions, as is standard in neural networks, rather than the Euclidean distance formula, which would require that memory retrieval be the square root of the sum of squares of the two dimensions. One way to address this problem with the two-dimensional simulation would be to use a specific Euclidean-mimicking activation function rather than a simple sum of dimensions. The very first model I developed used such an activation function as applied to opponent border cells with just two dimensions (so 4 cells in total – left/right and top/down). This produced Euclidean behavior, but the activation function was implausible and did not generalize to simulations that also included head direction. In contrast, with three non-orthogonal dimensions, the simple sum of dimensions is approximately Euclidean.
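
      One way to see why a simple sum over two orthogonal dimensions departs from Euclidean behavior, while a sum over three non-orthogonal dimensions comes closer, is sketched below. The sketch assumes cosine tuning along each dimension and, for the three-dimension case, axes oriented 60 degrees apart in the plane; these are illustrative assumptions, not the simulation reported in Figure 3.

```python
import math

def summed_response(r, angle, axis_angles):
    """Sum of cosine-tuned dimensions at displacement (r, angle) from a memory."""
    return sum(math.cos(r * math.cos(angle - a)) for a in axis_angles)

def angular_spread(r, axis_angles, n=360):
    """Range of the summed response over all directions at a fixed Euclidean distance r."""
    values = [summed_response(r, 2 * math.pi * i / n, axis_angles) for i in range(n)]
    return max(values) - min(values)

ORTHOGONAL_2D = [0.0, math.pi / 2]                        # two orthogonal axes
NON_ORTHOGONAL_3D = [0.0, math.pi / 3, 2 * math.pi / 3]   # assumed 60 degrees apart

r = 1.5  # displacement in radians of phase (hypothetical value)
spread_2 = angular_spread(r, ORTHOGONAL_2D)
spread_3 = angular_spread(r, NON_ORTHOGONAL_3D)
```

      A purely Euclidean measure would give a spread of zero, because the response would depend only on `r`. Under these assumptions `spread_3` is more than an order of magnitude smaller than `spread_2`, meaning the three-axis sum is far less sensitive to the direction of displacement.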

      (6) Final sentence of the Discussion: "However, unlike the present model, these models still assume that entorhinal grid cells represent space rather than a non-spatial attribute." I am not sure if the authors of the cited papers will agree with this. They consider the spatial cases, but most argue they can treat non-spatial features as well. What the author might mean is that they assume non-spatial features are in some metric space that, in a way, is spatial. However, I am not sure if the author would argue that non-spatial features cannot be encoded metrically (e.g., Euclidean distance based on the similarity of odours). 

      In this section, when referring to “entorhinal grid cells” I was specifically referring to traditional grid cells in a rodent spatial navigation experiment. I did not mean to imply that these other theories cannot explain nonspatial grid fields, such as in the two-dimensional bird space grid cells found with humans. The way in which the proposed memory model and these other models differ is in terms of what they assume regarding the function of grid cells that exhibit spatial grid fields. In this revision, I have changed this text to read:

      “These models can capture some of the grid cell results presented in the current simulations, including extension to non-spatial grid-like responses (e.g., grid fields that cover a two-dimensional neck/leg length bird space). Furthermore, these models may be able to explain memory phenomena similar to the model proposed in this study. However, unlike the proposed model, these models assume that the function of entorhinal grid cells that exhibit spatial X/Y grid fields during navigation is to represent space. In contrast, the memory model proposed in this study assumes that the function of spatial X/Y grid cells is to represent a non-spatial attribute; the only reason they exhibit a spatial X/Y grid is because memories of that non-spatial attribute are arranged in a hexagonal grid owing to the uncluttered/unvarying nature of the enclosure. Thus, these models do not explain why most of the input to rodent hippocampus appears to be spatial (Boccara et al., 2010b; Diehl et al., 2017; Grieves & Jeffery, 2017) whereas the proposed model can explain this situation as reflecting the misclassification of grid cells with a spatial arrangement as providing spatial input to hippocampus.”

      (7) It would be interesting to see videos/gifs of the model learning, and an idea of how many steps of trials it takes (is it capturing real-time rodent cell firing whilst foraging, or is it more abstracted, taking more trials). 

      The short answer is “yes”, the model is capturing real-time rodent cell firing while foraging. This is particularly true when simulating place cell memories in the absence of head direction information, as was shown in a video provided in the initial submission in relation to Figure 4. In this revision, I have provided a second video of learning when simulating place cell memories that include head direction. This second video is in relation to the results reported in Figure 9. This shows that even when learning a three-dimensional real-world space (X, Y, and head direction), the model rapidly produces an on-average hexagonal arrangement of place cell memories owing to the slight tendency of the place cell memories to linger in some locations as compared to others during consolidation. More specifically, they are more likely to linger in the locations that are the intersections of the peaks and/or troughs of the border cells and it is this tendency that supports the immediate appearance of grid cells. However, because the place cell memories are still shifting, head direction conjunctive grid cells are slower to emerge (the head direction conjunctive grid cells require stabilization of the place cells). The video then speeds up the learning process to show how place cells eventually stabilize after sufficient learning of the borders of the enclosure from different head/view directions.

      (8) One question is whether all the results have to be presented in the main text. It was difficult to see which key predictions fit the data and do so better than a spatial/navigation account. 

      Thank you for this suggestion. To make the paper more readable and easier for different readers with different interests to choose different aspects of the results to read, the second half of the results have been put in an appendix. More specifically, the second half of the results concerned place cells rather than grid cells. Thus, in this revision, the main text concerns grid cell results and the appendix concerns place cell results.

      Reviewer #3 (Recommendations For The Authors):  

      The title could usefully be shortened to focus on the main argument that observed firing patterns could be consistent with mapping memories instead of space. It's a stretch to argue that memory is the primary role when no such data is presented (i.e., there is no comparison of competing models). 

      This is a good point (I do not present evidence that conclusively indicates the function of MTL). The original title was chosen to make clear how this account is a radical departure from other accounts of grid cells. The revised title highlights that: 1) a memory model can also explain rodent single cell recording data during navigation; and 2) grid cells may be non-spatial. The revised title is: “A Memory Model of Rodent Spatial Navigation: Place Cells are Memories Arranged in a Grid and Grid Cells are Non-spatial”

      When arguing that the main role of the hippocampus is memory, I strongly suggest engaging with the work of people like Howard Eichenbaum who spent the better part of their career arguing the same (e.g. DOI:10.1152/jn.00005.2017.)  

      Thank you for pointing out this important oversight. Early in the introduction, I now write: “The proposal that hippocampus represents the multimodal conjunctions that define an episode is not new (Marr et al., 1991; Sutherland & Rudy, 1989) and neither is the proposal that hippocampal memory supports spatial/navigation ability (Eichenbaum, 2017). This view of the hippocampus is consistent with “feature in place” results (O’Keefe & Krupic, 2021) in which hippocampal cells respond to the conjunction of a non-spatial attribute affixed to a specific location, rather than responding more generically to any instance of a non-spatial attribute. In other words, the what/where conjunction is unique. Furthermore, the uniqueness of the what/where conjunction may be the fundamental building block of spatial memory and navigation. In reviewing the hippocampal literature, Howard Eichenbaum (2017) concludes that ‘the hippocampal system is not dedicated to spatial cognition and navigation, but organizes experiences in memory, for which spatial mapping and navigation are both a metaphor for and a prominent application of relational memory organization.’”

      With a focus on episodic memory, there should be a mention of the temporal component of memory. While it may rightfully be beyond the scope of this model, it's confusing to omit time completely from the discussion. 

      This issue and several others are now addressed in a new section in the introduction titled ‘The Scope of the Proposed Model’. That section reads:

      “The reported simulations explain why most mEC cell types in the rodent literature appear to be spatial (Boccara et al., 2010; Diehl et al., 2017; Grieves & Jeffery, 2017). Assuming that rodents can form non-spatial memories, rodent hippocampus must receive non-spatial input from entorhinal cortex. These simulations suggest that characterization of the rodent mEC cortex as primarily spatial might be incorrect if most grid cells (except perhaps head direction conjunctive grid cells) have been mischaracterized as spatial. Other literatures with other species find non-spatial representations in MTL (Gulli et al., 2020; Quiroga et al., 2005; Wixted et al., 2014) and non-spatial hippocampal memory encoding has been found in rodents (Liu et al., 2012; McEchron & Disterhoft, 1999). The proposed memory model is compatible with these results – the ideas contained in this model could be applied to nonspatial memory representations. However, surveys of cell types in rodent entorhinal cortex seem to indicate that most cells are spatial (Boccara et al., 2010; Diehl et al., 2017; Grieves & Jeffery, 2017). How can the rodent hippocampus encode nonspatial memories if most of its input is spatial? The goal of the reported simulations is to explain the apparent paucity of non-spatial cells in rodent entorhinal cortex by proposing that grid cells have been misclassified as spatial (see also Luo et al., 2024).

      Given the simplicity of the proposed model, there are important findings that the model cannot address -- it is not that the model makes the wrong predictions but rather that it makes no predictions. The role of running speed (Kraus et al., 2015) is one such variable for which the model makes no predictions. Similarly, because the model is a rate-coded model rather than a model of oscillating spiking neurons, it makes no predictions regarding theta oscillations (Buzsáki & Moser, 2013). The model is an account of learning and memory for an adult animal, and it makes no predictions regarding the developmental (Langston et al., 2010; Muessig et al., 2015; Wills et al., 2012) or evolutionary (Rodríguez et al., 2002) time course of different cell types. This model contains several purely spatial representations such as border cells, head direction cells, and head direction conjunctive grid cells and it may be that these purely spatial cell types emerged first, followed by the evolution and/or development of non-spatial cell types. However, this does not invalidate the model. Instead, this is a model for an adult animal that has both episodic memory capabilities and spatial navigation capabilities, irrespective of the order in which these capabilities emerged.

      This model has the potential to explain context effects in memory (Godden & Baddeley, 1975; Gulli et al., 2020; Howard et al., 2005). According to this model, different grid cells represent different non-spatial characteristics and place cells represent the combination of these “context” factors and location. In the simulation, just one grid cell is simulated but the same results would emerge when simulating hundreds of different non-spatial inputs provided that all of the simulated non-spatial inputs exist throughout the recording session. However, there is evidence that hippocampus can explicitly represent the passage of time (Eichenbaum, 2014), and time is assuredly an important factor in defining episodic memory (Bright et al., 2020). Thus, although the current model addresses unique combinations of what and where, it is left to future work to incorporate representations of when in the memory model.”

      I recommend explaining the motivation of the theory in more detail in the introduction. It reads as "what if it's like this?" It would be helpful to instead highlight the limitations of current theories and argue why this theory is either a better fit for the data or is logically simpler. 

      This issue and several others are now addressed in the new section in the introduction titled ‘Why Model the Rodent Navigation Literature with a Memory Model?’, which I quoted above in response to the public reviews.

      It's worth considering shortening the results section to include only those that most convincingly support the main claim. The manuscript is quite long and appears to lack focus at times. 

      Thank you for this suggestion. To make the paper more readable and easier for different readers with different interests to choose different aspects of the results to read, the second half of the results have been put in an appendix. More specifically, the second half of the results concerned place cells rather than grid cells. Thus, in this revision, the main text concerns grid cell results and the appendix concerns place cell results.

      The discussion of path dependence on the formation of the grid pattern is important but only briefly discussed. It may be useful to add simulations testing whether different paths (not random walks) produce distorted grid patterns. 

      The short answer is that the path doesn’t affect things in general. The consolidation rule ensures equally spaced memories even if, for instance, one side of the enclosure is explored much more than the other side. As just one example, I have run simulations with a radial arm maze, and even though the animal is constrained to run only on the maze arms, the memories still arrange hexagonally as they are pushed outside the arms. Rather than adding additional simulations to the study, I now briefly describe this in the model methods:

      “Of note, the ability of the model to produce grid cell responses does not depend on this decision to simulate an animal taking a random walk – the same results emerge if the animal is more systematic in its path. All that matters for producing grid cell responses is that the animal visits all locations and that the animal takes on different head directions for the same location in the case of simulations that also include head direction as an input to hippocampal place cells.”

      I struggle to understand in Figure 3 why retrieval strength ought to scale monotonically with Euclidean distance, and why that justifies a more complex model (three non-orthogonal dimensions). 

      The introduction to this section now reads: “Animals can plan novel straight line paths to reach a known position and evidence suggests they do so by learning Euclidean representations of space (Cheng & Gallistel, 2014; Normand & Boesch, 2009; Wilkie, 1989). Thus, it was assumed that hippocampal place cells represent positions in Euclidean space (as opposed to non-Euclidean space, such as occurs with a city-block metric).”

      p.17 "although the representational space is a torus (or more specifically a three-torus), it is assumed that the real-world two-dimensional surface is only a section of the torus (e.g., a square piece of tape stuck onto the surface of a donut)." I fail to understand how the real-world surface is only a part of the torus. In the existing theoretical and experimental work on toroidal topology of grid cell activity, the torus represents a very small fraction of the real world, and repeating activity on the toroidal manifold is a crucial feature of how it maps 2D space in a regular manner. Why then here do you want the torus to be larger than the real world? 

      This section has been rewritten to better explain these assumptions. The relevant paragraphs now read:

      “The cosine tuning curves of the simulated border cells represent distance from the border on both sides of the border (i.e., firing rate increases as the animal approaches the border from either the inside or the outside of the enclosure). Experimental procedures do not allow the animal to experience locations immediately outside the enclosure, but these locations remain an important part of the hypothetic representation, particularly when considering the modification of memories through consolidation (i.e., a memory created inside the enclosure might be moved to a location outside the enclosure). This symmetry about the border cell’s preferred location is needed to maintain an unbiased representation, with a constant sum of squares for the border cell inputs (see methods section). Rather than using linear dimensions, all dimensions were assumed to be circular to keep the model relatively simple. This assumption was made because head direction is necessarily a circular dimension and by having all dimensions be circular, it is easy to combine dimensions in a consistent manner to produce multidimensional hippocampal place cell memories. Thus, the border cells define a torus (or more accurately a three-torus) of possible locations. This provides a hypothetical space of locations that could be represented.

      In light of the assumption to represent border cells with a circular dimension, when a memory is pushed outside the East wall of the enclosure, it would necessarily be moved to the West wall of the enclosure if the period of the circular dimension was equal to the width of the enclosure. If this were true, then the partial grid field on one side of the enclosure would match up with its remainder on the other side. Such a situation would cause the animal to become completely confused regarding opposite sides of the enclosure (a location on the West wall would be indistinguishable from the corresponding location on the East wall). To reduce confusion between opposite sides of the enclosure, the width of the enclosure in which the animal navigated (Figure 5) was assumed to be half as wide as the full period of the border cells. In other words, although the space of possible representations was a three-torus, it was assumed that the real-world two-dimensional enclosure encompassed a section of the torus (e.g., a square piece of tape stuck onto the surface of a donut). The torus is better thought of as a “playing field” in which different sizes and shapes of enclosure can be represented (i.e., different sizes and shapes of tape placed on the donut). Furthermore, this assumption provides representational space that is outside the box without such locations wrapping around to the opposite side of the box.”

      p.28 "More specifically, egocentric grid cells (e.g., head direction conjunctive grid cells) require stabilization of the place cell memories in the face of ongoing consolidation whereas allocentric grid cells reflect on-average place field positions." and p.32 "if place cells represent episodic memories, it seems natural that they should include head direction (an egocentric viewpoint)." But the head direction signal is not egocentric, it is allocentric. I'm unsure whether this is a typo or a potentially more serious conceptual misunderstanding. 

      Any reference to egocentric has been removed in this revision. In the initial submission, when I used egocentric, I was referring to memories that depended on the head direction of the animal at the time of memory formation. I was using “egocentric” in relation to whether the memory was related to the animal’s personal bodily experience at the time of memory formation. But I concede that this is confusing since the ego/allo distinction is typically used to differentiate angular directions that are relative to the person (left/right) versus earth (East/West). Instead, throughout the manuscript I now refer to these as view-dependent memories since head direction would entail having a different view of the environment at the time of memory formation. I still refer to the stacking of multiple view-dependent memories on the same X/Y location as being the development of an allocentric representation however, since this can be thought of as one way to learn a cognitive map of the enclosure that is view independent.

      p.37 "But if the border cells had changed their alignment with the new enclosure (e.g., if the E border dimension aligned with the North-South borders), then the place cells would have appeared to undergo global remapping as their positions rotated by 90 degrees and the grid pattern would have also rotated." But this would not be interpreted as global remapping by standard analyses of place and grid cell responses. A coherent rotation of firing patterns is not interpreted as remapping. 

      This sentence now reads: “But if the border cells had changed their alignment with the new enclosure (e.g., if the E border dimension aligned with the North-South borders), then the place cells would remain in their same positions relative to the now-rotated borders (i.e., no remapping relative to the enclosure) and the corresponding grid cells would also retain their same alignment relative to the enclosure.”

      p.37 "this is more accurately described as partial remapping (nearly all place fields were unaffected)." If nearly all place fields were unaffected, this should be interpreted as a stable map. Partial remapping is a mix of stability, rate remapping, and global remapping within a population of place cells. 

      This sentence has been removed.

      p.40 "The dependence of grid cell responses on memory may help explain why grid cells have been found for bats crawling on a two-dimensional surface (Yartsev et al., 2011), but three-dimensional grid cells have never been observed for flying bats." This is not true. Ginosar et al. (2021) observed 3D grid cells in flying bats.  

      Thank you for highlighting this issue. In the initial submission I was using “grid cell” to mean a cell that produced a precise hexagonal grid, which is not the case for the 3D grid cells in bats. In this revision, I now discuss grid cells that produce irregular grid fields, writing:

      “According to this model, hexagonally arranged grid cells should be the exception rather than the rule when considering more naturalistic environments. In a more ecologically valid situation, such as with landmarks, varied sounds, food sources, threats, and interactions with conspecifics, there may still be remembered locations where events occurred or remembered properties can be found, but because the non-spatial properties are non-uniform in the environment, the arrangement of memory feedback will be irregular, reflecting the varied nature of the environment. This may explain the finding that even in a situation where there are regular hexagonal grid cells, there are often irregular non-grid cells that have a reliable multi-location firing field, but the arrangement of the firing fields is irregular (Diehl et al., 2017). For instance, even when navigating in an enclosure that has uniform properties as dictated by experimental procedures, there may be other properties that were not well-controlled (e.g., a view of exterior lighting in some locations but not others), and these uncontrolled properties may produce an irregular grid (i.e., because the uncontrolled properties are reliably associated with some locations but not others, hippocampal memory feedback triggers retrieval of those properties in the associated locations).

      In this memory model, there are other situations in which an irregular but reliable multi-location grid may occur, even when everything is well controlled. In the reported simulations, when the hippocampal place cells were based on variation in X/Y (as defined by Border cells), nothing else changed as a function of location, and the model rapidly produced a precise hexagonal arrangement of hippocampal place cell memories. When head direction was included (i.e., real-world variation in X, Y, and head direction), the model still produced a hexagonal arrangement as per face-centered cubic packing of memories, but this precise arrangement was slower to emerge, with place cells continuing to shift their positions until the borders of the enclosure were sufficiently well learned from multiple viewpoints. If there is real-world variation in four or more dimensions, as is likely the case in a more ecologically valid situation, it will be even harder for place cell memories to settle on a precise regular lattice. Furthermore, in the case of four dimensions, mathematicians studying the “sphere packing problem” recently concluded that densest packing is irregular (Campos et al., 2023). This may explain why the multifield grid cells for freely flying bats have a systematic minimum distance between firing fields, but their arrangement is globally irregular (Ginosar et al., 2021). Assuming that the memories encoded by a bat include not just the three real-world dimensions of variation, but also head direction, the grid will likely be irregular even under optimal conditions of laboratory control.”

      Multiple typos are found on page 25, end of paragraph 3: "More specifically, if there is one set of non-orthogonal dimensions for enclosure borders and the movement of one wall is too modest as to cause avoid global remapping, this would deform the grid modules based the enclosure border cells."

      As detailed above in the response to the public reviews, this paragraph has been rewritten.

    1. eLife Assessment

      The authors studied the development of mesentery borders in the rice coral Montipora, a new experimental system, to complement existing data from the sea anemone Nematostella. They make a solid case that in Montipora, there is a sequence of Hox-Gbx genes whose staggered expression in the unsegmented larva is suggestive of their role in subdividing the gastric cavity into repeated units bordered by mesenteries, as in the sea anemone Nematostella. Pharmacological experiments also point to the involvement of the BMP pathway in this process, but additional experiments validating this are necessary. This is a valuable contribution to the field of cnidarian evolution, suggesting that BMP- and "Hox-Gbx code"-dependent patterning of the directive axis was ancestral for Anthozoa.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript of He et al. compares the roles of Hox/Gbx genes between the well-established anthozoan model, the burrowing sea anemone Nematostella, and the new scleractinian model Montipora. The authors show staggered expression of Anthox6a.1, Anthox8, and Gbx in the Montipora larva and argue that their BMP-dependent expression is responsible for the segmentation of the endomesoderm, just as they have previously demonstrated in Nematostella (despite some differences in the timing, formation of extra mesenteries, etc.). The authors posit that Hox/Gbx-dependent segmentation of the endomesoderm represents an ancestral anthozoan trait. The study addresses a remarkably interesting question, but it has several important shortcomings, which the authors should try to rectify.

      Strengths:

      The authors introduce a new scleractinian model Montipora and present interesting data on the composition of its compact Hox cluster, its embryonic and larval development, metamorphosis, and segmentation. They also show staggered expression of Gbx, Anthox6a.1, and Anthox8, which is suggestive of their involvement in the partitioning of the gastrodermis of the polyp.

      Weaknesses:

      He et al. claim that Gbx and Hox genes are responsible for the segmentation of the directive axis in Montipora based on expression patterns of these genes before the onset of segmentation. In the absence of functional analyses, this claim (although likely correct) is not supported. Moreover, the authors do not show that staggered Gbx and Hox gene expression correlates with the position of the segment boundaries.

      The authors use two inhibitors of BMP signaling and show that segmentation is lost in the treated animals. However, they do not provide controls, which would show that the effect of the treatment is specific to the loss of BMP function. Moreover, their transcriptomic analyses suggest that the whole BMP signaling system in Montipora is wired completely differently than in Nematostella, but they do not acknowledge and discuss this striking difference. If true, this is a very interesting result, but it requires thorough validation.

    3. Reviewer #2 (Public review):

      Building on their detailed dissection of the role of Hox-Gbx genes in endomesodermal segmentation in Nematostella, He and colleagues attempt to understand the evolutionary conservation of this process in anthozoans. In a move that should be congratulated, the authors perform this work in the coral M. capitata, a species that is not well established in the lab. The authors show convincing expression data using both RNAseq and in-situ hybridization and discover the conserved expression of Hox-Gbx genes preceding the segmentation of the endomesoderm. The authors further attempt to understand whether BMP signalling is playing a role in this process and present data that certainly point to this being the case.

      Strength:

      The overall quality of the data is very high and the authors show very convincing expression data for the Hox-Gbx genes as well as putting forward a well-thought-out hypothesis for segment evolution.

      Weakness:

      There are a number of weaknesses in the paper which I believe can be easily addressed:

      (1) The authors in many cases claim to have provided functional evidence for the role of Hox-Gbx genes in M. capitata. This is not, however, the case, and although the expression data along with their previous work in Nematostella make their claims very likely, I still believe it is necessary to set a higher bar for claiming to understand function. In the abstract, for example, they claim: "These findings demonstrate the existence of a functionally conserved Hox-Gbx module....", something which is not substantiated by the data presented. At the end of the introduction, they say they "systematically interrogate the molecular functions of Hox-Gbx genes" (line 75), which again is not what is presented in the manuscript. Finally, on lines 289-291 the authors state: "Taken together, our findings strongly suggest that the heterochronic deployment of a conserved Hox-Gbx module contributes to the divergent adult body plans observed between Edwardsiidae and other anthozoans." I would remove "strongly" given the absence of functional data. There are also other examples where functional understanding is implied and I would suggest the authors tone this down throughout the manuscript.

      (2) On Line 185, the authors state "To determine the function of the Hox-Gbx network in M.capitata segmentation..." when introducing their BMP experiments. I would reword this since they are looking at BMP signalling and do not look directly at Hox-Gbx function.

      (3) Although the BMP inhibitor experiments are very interesting I think there is a lack of basic understanding of BMP signalling in this system. Where are the BMP components expressed and how would this match with the hypothesis derived from the data? The authors present some expression patterns in Figure S3 but do not discuss them. In addition, the authors do not show pSMAD staining etc, and do not validate that the inhibitors have an effect on this. I entirely understand the difficulties in doing such experiments in a system like this and would not suggest the authors should now do them but an acknowledgment of this in the discussion would be very welcome.

      (4) In both lines 88 and 294 the authors talk about the mechanism of gastrulation. It is not clear to me how they infer this from the figure. If the authors could include some more high-resolution images that show this it would be very helpful and interesting.

      (5) On line 169/170 the authors state that two Anthox6 paralogs, McAnthox6 and McAnthox6.1, were specifically expressed at the time of settlement. This is not what I see in the images. I see that McAnthox6 is expressed at 14 hpf more strongly than at the later time point. The authors should clarify this point.

      (6) On lines 259-261 the authors state "How temporally and spatially coordinated gene expression can be achieved in this scenario remains an interesting and open question." This seems like a strange statement to include given that they have shown that there is no spatial and temporal collinearity in cnidarians. Surely it is not an open question to ask how it would work if there is none. I would simply remove this.

      (7) The authors should cite the sources of information contained in Fig. S2 including how orthology was assigned.

    4. Reviewer #3 (Public review):

      Summary:

      The authors analyze the expression of a series of genes from the Hox/Gbx family of transcription factors in the settling larva of the rice coral Montipora capitata. The first achievement of the work is developing a protocol for artificial induction of settlement in this species. In the synchronized settlers, the authors were able to follow the sequence of the subdivision of the body cavity to form individual cavities separated by mesenteries. This process has been previously studied in the starlet sea anemone, Nematostella vectensis, and this same group showed that there is a spatio-temporal sequence of expression of genes from the Hox/Gbx group, reminiscent of the sequence of Hox genes in bilaterians. The authors now repeat this analysis with orthologous genes in Montipora, and demonstrate a similar pattern. Finally, they manipulate the BMP pathway and demonstrate that in the absence of BMP signaling, the subdivision of the gastric cavity is abrogated.

      Strengths:

      The authors have developed a new experimental system for embryological work on cnidarians, where only a handful of systems are available. They identified orthologs of a number of homeobox genes and tested their expression. There is a detailed description of the sequence of the formation of the mesenteries, which differs from that of Nematostella, raising interesting questions about the evolution of mesentery number and the homology of mesenteries.

      Weaknesses:

      The in situ hybridization experiments describing the expression of the Hox/Gbx genes are not as clean and sharp as could be hoped for. This is evidently a limitation of the system. The discussion of the evolution of mesentery number does not really give new insights into the question (although just raising the discussion is interesting in its own right).

    1. eLife Assessment

      This manuscript develops a theoretical model of osmotic pressure adaptation in microbes by osmolyte production and wall synthesis. The prediction of a rapid increase in growth rate on osmotic shock is experimentally validated using fission yeast. By using phenomenological rules rather than detailed molecular mechanisms, the model can potentially apply to a wide range of microbes, providing important insights that would be of interest to the wider community studying the regulation of cell size and mechanics. However, because the core assumptions of the model have not been tested across a range of microbial organisms, the evidence for the universality of the model remains incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      A theoretical model for the microbial osmoresponse is proposed. The model assumes simple phenomenological rules: (i) the free water volume in the cell changes in response to osmotic imbalance, as set by pressure balance; (ii) osmoregulation, in which proteome partitioning shifts with osmotic pressure and thereby changes the production of osmolyte-producing proteins; (iii) cell-wall synthesis regulation, in which a change in turgor pressure alters the cell-wall synthesis efficiency so that turgor returns to its target value; (iv) intracellular crowding, in which biochemical reactions slow down as crowding increases and stop when the protein density (protein mass divided by free water volume) reaches a critical value. The parameter values were taken from the literature or obtained by fitting to experimental data. The authors compare the model behavior with various microorganisms (E. coli, B. subtilis, S. cerevisiae, S. pombe) and successfully reproduce the overall trends (steady-state behavior for many of them, dynamics for S. pombe). In addition, the model predicts non-trivial behavior such as fast cell growth just after a hypoosmotic shock, which is consistent with experimental observation. The authors further make experimentally testable predictions regarding mutant behavior and transient dynamics.
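
      The crowding rule (iv) can be illustrated with a minimal sketch. The manuscript's equation 8 is not reproduced in this review, so the linear form below, the function name `crowding_factor`, and all parameter names are assumptions chosen only to show the qualitative behavior described above: a rate multiplier that falls as protein density rises and reaches zero at the critical density.

```python
def crowding_factor(protein_mass, free_water_volume, critical_density):
    """Hypothetical rate multiplier in [0, 1] for intracellular crowding.

    Assumption: a linear decrease with protein density. The manuscript's
    actual functional form (its eq. 8) may differ.
    """
    if free_water_volume <= 0:
        # No free water left: treat all reactions as stopped.
        return 0.0
    # Protein density as defined in the summary: mass / free water volume.
    density = protein_mass / free_water_volume
    # Clamp at zero so reactions stop at or beyond the critical density.
    return max(0.0, 1.0 - density / critical_density)


# Illustrative numbers only (arbitrary units, not taken from the paper).
before_shock = crowding_factor(protein_mass=1.0, free_water_volume=4.0, critical_density=1.0)
# Water loss shrinks the free water volume while protein mass stays fixed.
after_shock = crowding_factor(protein_mass=1.0, free_water_volume=2.0, critical_density=1.0)
print(before_shock, after_shock)
```

      Under this assumed form, losing free water at fixed protein mass raises the protein density and lowers the multiplier, which is the coupling between crowding and global reaction rates that the weakness section below singles out.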

      Strength:

      The theory assumes simple mechanistic dependence between core variables without going into specific molecular mechanisms of regulations. The simplicity allows the theory to apply to different organisms by adjusting the time scales with parameters, and the model successfully explains broad classes of observed behaviours. Mathematically, the model provides analytical expressions of the parameter dependences and an understanding of the dynamics through the phase space without being buried in the detail. This theory can serve as a base to discuss the universality and diversity of microbial osmoresponse.

      Weakness:

      The core part of this model is that everything is coupled with growth physiology, and, as far as I understand, the assumption (iv) (eq. 8) that imposes the global reaction rate dependence on crowding plays a crucial role. I would think this is a strong and interesting assumption. However, the abstract or discussion does not discuss the importance of this assumption. In addition, the paper does not discuss gene regulation explicitly, and some comparison with a molecular mechanism-oriented model may be beneficial to highlight the pros and cons of the current approach.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Ye et al. have developed a theoretical model of osmotic pressure adaptation by osmolyte production and wall synthesis.

      Strengths:

      They validate their model predictions of a rapid increase in growth rate on osmotic shock experimentally using fission yeast. The study has several interesting insights which are of interest to the wider community of cell size and mechanics.

      Weaknesses:

      Multiple aspects of this manuscript require addressing, in terms of clarity and consistency with previous literature. The specifics are listed as major and minor comments.

      Major comments:

      (1) The motivation for the work is weak and needs more clarity.

      (2) The link between sections is very frequently missing. The authors directly address the problem that they are trying to solve without any motivation in the results section.

      (3) The parameters used in the models (symbols) need to be explained better to make the paper more readable.

      (4) Throughout the paper, the authors keep switching between organisms that they are modelling. There needs to be some consistency in this aspect where they mention what organism they are trying to model, since some assumptions that they make may not be valid for both yeast as well as bacteria.

      (5) The extent of universality of osmoregulation, i.e., its limitations, is not well highlighted.

      (6) Lines 198-200: It is not clear in the text what organisms the authors are writing about here. "Experiments suggested that the turgor pressure induce cell-wall synthesis, e.g., through mechanosensors on cell membrane [45, 46], by increasing the pore size of the peptidoglycan network [5], and by accelerating the moving velocity of the cell-wall synthesis machinery [31]". This, however, is untrue for bacteria, as shown by reference 22 (E. Rojas, J. A. Theriot, and K. C. Huang, Response of Escherichia coli growth rate to osmotic shock, Proceedings of the National Academy of Sciences 111, 7807 (2014)).

      (7) The time scale of reactions to hyperosmotic shocks does not agree with previous literature (reference 22). Therefore, defining which organism you are looking at is important. Hence the statement "Because the timescale of the osmoresponse process, which is around hours (Figure 3B), is much longer than the timescale of the supergrowth phase, which is about 20 minutes, the turgor pressure at the growth rate peak can be well approximated by its immediate value after the shock." from line 447 does not seem to make sense. The authors need to address this.

    1. eLife Assessment

      This potentially important study describes the progressive transformation of olfactory information across five different brain regions in the olfactory pathway. While the dataset could be of broad interest to olfactory researchers, the analysis is incomplete and would benefit from a reconsideration of the data sampling window, a more uniform analysis framework, and greater clarity of presentation.

    2. Reviewer #1 (Public review):

      In this important study, the authors characterized the transformation of neural representations of olfactory stimuli from the primary sensory cortex to multisensory regions in the medial temporal lobe and investigated how they were affected by non-associative learning. The authors used high-density silicon probe recordings from five different cortical regions while familiar vs. novel odors were presented to a head-restrained mouse. This is a timely study because unlike other sensory systems (e.g., vision), the progressive transformation of olfactory information is still poorly understood. The authors report that both odor identity and experience are encoded by all of these five cortical areas but nonetheless some themes emerge. Single neuron tuning of odor identity is broad in the sensory cortices but becomes narrowly tuned in hippocampal regions. Furthermore, while experience affects neuronal response magnitudes in early sensory cortices, it changes the proportion of active neurons in hippocampal regions. Thus, this study is an important step forward in the ongoing quest to understand how olfactory information is progressively transformed along the olfactory pathway.

      The study is well-executed. The direct comparison of neuronal representations from five different brain regions is impressive. Conclusions are based on single-neuron as well as population-level decoding analyses. Among all the reported results, one stands out for being remarkably robust. The authors show that the anterior olfactory nucleus (AON), which receives direct input from the olfactory bulb output neurons, was far superior at decoding odor identity as well as novelty compared to all the other brain regions. This is perhaps surprising because the other primary sensory region - the piriform cortex - has been thought to be the canonical site for representing odor identity. A vast majority of studies have focused on aPCx, but direct comparisons between odor coding in the AON and aPCx are rare. The experimental design of this current study allowed the authors to do so and the AON was found to convincingly outperform aPCx. Although this result goes against the canonical model, it is consistent with a few recent studies including one that predicted this outcome based on anatomical and functional comparisons between the AON-projecting tufted cells vs. the aPCx-projecting mitral cells in the olfactory bulb (Chae, Banerjee et al., 2022). Future experiments are needed to probe the circuit mechanisms that generate this important difference between the two primary olfactory cortices as well as their potential causal roles in odor identification.

      The authors were also interested in how familiarity vs. novelty affects neuronal representation across all these brain regions. One weakness of this study is that neuronal responses were not measured during the process of habituation. Neuronal responses were measured after four days of daily exposure to a few odors (familiar) and then some other novel odors were introduced. This creates a confound because the novel vs. familiar stimuli are different odorants and that itself can lead to drastic differences in evoked neural responses. Although the authors try to rule out this confound by doing a clever decoding and Euclidean distance analysis, an alternate, more straightforward strategy would have been to measure neuronal activity for each odorant during the process of habituation.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript investigates how olfactory representations are transformed along the cortico-hippocampal pathway in mice during a non-associative learning paradigm involving novel and familiar odors. By recording single-unit activity in several key brain regions (AON, aPCx, LEC, CA1, and SUB), the authors aim to elucidate how stimulus identity and experience are encoded and how these representations change across the pathway.

      The study addresses an important question in sensory neuroscience regarding the interplay between sensory processing and signaling novelty/familiarity. It provides insights into how the brain processes and retains sensory experiences, suggesting that the earlier stations in the olfactory pathway, the AON and aPCx, play a central role in detecting novelty and encoding odor, while areas deeper into the pathway (LEC, CA1, and SUB) are sparser and encode odor identity but not novelty/familiarity. However, there are several concerns related to methodology, data interpretation, and the strength of the conclusions drawn.

      Strengths:

      The authors combine the use of modern tools to obtain high-density recordings from large populations of neurons at different stages of the olfactory system (although mostly one region at a time) with elegant data analyses to study an important and interesting question.

      Weaknesses:

      (1) The first and biggest problem I have with this paper is that it is very confusing, and the results seem to be all over the place. In some parts, it seems like the AON and aPCx are more sensitive to novelty; in others, it seems the other way around. I find their metrics confusing and unconvincing. For example, the example cells in Figure 1C show an AON neuron with a very low spontaneous firing rate and a CA1 neuron with a much higher firing rate, but the opposite is true in Figure 2A. So, what are we to make of Figure 2C, which shows the difference in firing rates between novel vs. familiar odors measured as a difference in spikes/sec? This seems nearly meaningless. The authors could have used a difference in Z-scored responses to normalize different baseline activity levels. (This is just one example of a problem with the methodology.)

      (2) There are a lot of high-level data analyses (e.g., decoding, analyzing decoding errors, calculating mutual information, calculating distances in state space, etc.) but very little neural data (except for Figure 2C, and see my comment above about how this is flawed). So, if responses to novel vs. familiar odors are different in the AON and aPCx, how are they different? Why is decoding accuracy better for novel odors in CA1 but better for familiar odors in SUB (Figure 3A)? The authors identify a small subset of neurons that have unusually high weights in the SVM analyses that contribute to decoding novelty, but they don't tell us which neurons these are and how they are responding differently to novel vs. familiar odors.

      (3) The authors call AON and aPCx "primary sensory cortices" and LEC, CA1, and Sub "multisensory areas". This is a straw man argument. For example, we now know that PCx encodes multimodal signals (Poo et al. 2021, Federman et al., 2024; Kehl et al., 2024), and LEC receives direct OB inputs, which has traditionally been the criterion for being considered a "primary olfactory cortical area". So, this terminology is outdated and wrong, and although it suits the authors' needs here in drawing distinctions, it is simplistic and not helpful moving forward.

      (4) Why not simply report z-scored firing rates for all neurons as a function of trial number? (e.g., Jacobson & Friedrich, 2018). Figure 2C is not sufficient. For example, in the Discussion, they say, "novel stimuli caused larger increases in firing rates than familiar stimuli" (L. 270), but what does this mean? Odors typically increase the firing in some neurons and suppress firing in others. Where does the delta come from? Is this because novel odors more strongly activate neurons that increase their firing or because familiar odors more strongly suppress neurons?

      (5) Lines 122-124 - If cells in AON and aPCx responded the same way to novel and familiar odors, then we would say that they only encode for odor and not at all for experience. So, I don't understand why the authors say these areas code for a "mixed representation of chemical identity and experience." "On the other hand," if LEC, CA1, and SUB are odor selective and only encode novel odors, then these areas, not AON and aPCx, are the ones jointly encoding chemical identity and experience. Also, I do not understand why, here, they say that AON and PCx respond to both while LEC, CA1, and SUB were selective for novel stimuli, but the authors then go on to argue that novelty is encoded in the AON and PCx, but not in the LEC, CA1, and SUB.

      (6) Lines 132-140 - As presented in the text and the figure, this section is poorly written and confusing. Their use of the word "shuffled" is a major source of this confusion, because this typically is the control that produces outcomes at the chance level. More importantly, they did the wrong analysis here. The better and, I think, the only way to do this analysis correctly is to train on some of the odors and test on an untrained odor (i.e., what Bernardi et al., 2021 called "cross-condition generalization performance"; CCGP).
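
      Two of the analyses suggested in the list above, z-scoring responses against each neuron's own baseline (points 1 and 4) and cross-condition generalization performance (point 6), can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' pipeline or the exact procedure of Bernardi et al.: the function names, the nearest-centroid classifier, and the made-up two-neuron data are all hypothetical.

```python
from statistics import mean, stdev


def zscore_response(baseline_rates, evoked_rate):
    """Z-score one evoked rate against the same neuron's baseline trials."""
    mu = mean(baseline_rates)
    sd = stdev(baseline_rates)  # sample standard deviation
    if sd == 0:
        # Assumption: a constant baseline carries no scale, so return 0.
        return 0.0
    return (evoked_rate - mu) / sd


def centroid(vectors):
    """Mean population vector across trials."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two population vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ccgp(trials_by_odor, label_by_odor, train_odors, test_odors):
    """Train a novel-vs-familiar nearest-centroid classifier on some odors
    and return its accuracy on trials from held-out odors."""
    by_label = {}
    for odor in train_odors:
        by_label.setdefault(label_by_odor[odor], []).extend(trials_by_odor[odor])
    centroids = {label: centroid(vs) for label, vs in by_label.items()}
    correct = total = 0
    for odor in test_odors:
        for v in trials_by_odor[odor]:
            predicted = min(centroids, key=lambda lab: sq_dist(v, centroids[lab]))
            correct += predicted == label_by_odor[odor]
            total += 1
    return correct / total


# Made-up z-scored responses of two neurons (two trials per odor).
# Neuron 0 tracks novelty, neuron 1 tracks odor identity.
trials = {
    "A": [[2.0, 1.0], [2.2, 0.9]],
    "B": [[1.8, -1.0], [2.1, -1.1]],
    "C": [[0.1, 1.0], [-0.1, 1.1]],
    "D": [[0.0, -1.0], [0.2, -0.9]],
}
labels = {"A": "novel", "B": "novel", "C": "familiar", "D": "familiar"}
print(ccgp(trials, labels, train_odors=["A", "C"], test_odors=["B", "D"]))
```

      Because the classifier is tested on odorants it never saw during training, above-chance accuracy cannot come from odor identity alone, which is the property that makes this style of analysis a cleaner test of experience coding.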

    4. Reviewer #3 (Public review):

      In this manuscript, the authors investigate how odor-evoked neural activity is modulated by experience within the olfactory-hippocampal network. The authors perform extracellular recordings in the anterior olfactory nucleus (AON), the anterior piriform (aPCx) and lateral entorhinal cortex (LEC), the hippocampus (CA1), and the subiculum (SUB), in naïve mice and in mice repeatedly exposed to the same odorants. They determine the response properties of individual neurons and use population decoding analyses to assess the effect of experience on odor information coding across these regions.

      The authors' findings show that odor identity is represented in all recorded areas, but that the response magnitude and selectivity of neurons are differentially modulated by experience across the olfactory-hippocampal pathway.

      Overall, this work represents a valuable multi-region data set of odor-evoked neural activity. However, limitations in the interpretability of odor experience of the behavioral paradigm, and limitations in experimental design and analysis, restrict the conclusions that can be drawn from this study.

    1. eLife Assessment

      In this useful study, the authors use published scRNA-seq data to highlight the importance of mast cells (MCs) in TB granulomas, reporting a comparative assessment of chymase- and tryptase-expressing MCs in the lungs of tuberculosis-infected individuals and non-human primates, with MC-deficient mice showing reduced lung bacterial burden and pathology during infection. Whilst the findings are helpful, the evidence to support conclusions is inconsistent across models and thus incomplete. Specifically, the data supporting a role for MCs in coordinating cytokine responses to modulate pathology, susceptibility to tuberculosis, and dissemination during infection are weak.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Gupta et al. investigates the role of mast cells (MCs) in tuberculosis (TB) by examining their accumulation in the lungs of M. tuberculosis-infected individuals, non-human primates, and mice. The authors suggest that MCs expressing chymase and tryptase contribute to the pathology of TB and influence bacterial burden, with MC-deficient mice showing reduced lung bacterial load and pathology.

      Strengths:

      (1) The study addresses an important and novel topic, exploring the potential role of mast cells in TB pathology.

      (2) It incorporates data from multiple models, including human, non-human primates, and mice, providing a broad perspective on MC involvement in TB.

      (3) The finding that MC-deficient mice exhibit reduced lung bacterial burden is an interesting and potentially significant observation.

      Weaknesses:

      (1) The evidence is inconsistent across models, leading to divergent conclusions that weaken the overall impact of the study.

      (2) Key claims, such as MC-mediated cytokine responses and conversion of MC subtypes in granulomas, are not well-supported by the data presented.

      (3) Several figures are either contradictory or lack clarity, and important discrepancies, such as the differences between mouse and human data, are not adequately discussed.

      (4) Certain data and conclusions require further clarification or supporting evidence to be fully convincing.

    3. Reviewer #2 (Public review):

      Summary:

      The submitted manuscript aims to characterize the role of mast cells in TB granuloma. The manuscript reports heterogeneity in mast cell populations present within the granulomas of tuberculosis patients. With the help of previously published scRNAseq data, the authors identify transcriptional signatures associated with distinct subpopulations.

      Strengths:

      (1) The authors have carried out a sufficient literature review to establish the background and significance of their study.

      (2) The manuscript utilizes a mast cell-deficient mouse model, which demonstrates improved lung pathology during Mtb infection, suggesting mast cells as a potential novel target for developing host-directed therapies (HDT) against tuberculosis.

      Weaknesses:

      (1) The manuscript requires significant improvement, particularly in the clarity of the experimental design, as well as in the interpretation and discussion of the results. Enhanced focus on these areas will provide better coherence and understanding for the readers.

      (2) Throughout the manuscript, the authors have mislabelled the legends for WT B6 mice and mast cell-deficient mice. As a result, the discussion and claims made in relation to the data do not align with the corresponding graphs (Figure 1B, 3, 4, and S2). This discrepancy undermines the accuracy of the conclusions drawn from the results.

      (3) The results discussed in the paper do not add a significant novel aspect to the field of tuberculosis, as the majority of the results discussed in Figure 1-2 are already known and are a re-validation of previous literature.

      (4) The claims made in the manuscript are only partially supported by the presented data. Additional extensive experiments are necessary to strengthen the findings and enhance the overall scientific contribution of the work.

    4. Author Response:

      Reviewer #1 (Public Review):

      Summary:

      The study by Gupta et al. investigates the role of mast cells (MCs) in tuberculosis (TB) by examining their accumulation in the lungs of M. tuberculosis-infected individuals, non-human primates, and mice. The authors suggest that MCs expressing chymase and tryptase contribute to the pathology of TB and influence bacterial burden, with MC-deficient mice showing reduced lung bacterial load and pathology.

      Strengths:

      (1) The study addresses an important and novel topic, exploring the potential role of mast cells in TB pathology.

      (2) It incorporates data from multiple models, including human, non-human primates, and mice, providing a broad perspective on MC involvement in TB.

      (3) The finding that MC-deficient mice exhibit reduced lung bacterial burden is an interesting and potentially significant observation.

      Weaknesses:

      (1) The evidence is inconsistent across models, leading to divergent conclusions that weaken the overall impact of the study.

      The strength of the study is the use of multiple models including mouse, non-human primate as well as human samples. The conclusions have now been refined to reflect the complexity of the disease and the use of multiple models.

      (2) Key claims, such as MC-mediated cytokine responses and conversion of MC subtypes in granulomas, are not well-supported by the data presented.

      To address the reviewer’s comments, we will carry out further experimentation to strengthen the link between MC subtypes and cytokine responses.

      (3) Several figures are either contradictory or lack clarity, and important discrepancies, such as the differences between mouse and human data, are not adequately discussed.

      We will further clarify the figures and streamline the discussions between the different models used in the study.

      (4) Certain data and conclusions require further clarification or supporting evidence to be fully convincing.

      We will either provide clarification or supporting evidence for some of the key conclusions in the paper.

      Reviewer #2 (Public review):

      Summary:

      The submitted manuscript aims to characterize the role of mast cells in TB granuloma. The manuscript reports heterogeneity in mast cell populations present within the granulomas of tuberculosis patients. With the help of previously published scRNAseq data, the authors identify transcriptional signatures associated with distinct subpopulations.

      Strengths:

      (1) The authors have carried out a sufficient literature review to establish the background and significance of their study.

      (2) The manuscript utilizes a mast cell-deficient mouse model, which demonstrates improved lung pathology during Mtb infection, suggesting mast cells as a potential novel target for developing host-directed therapies (HDT) against tuberculosis.

      Weaknesses:

      (1) The manuscript requires significant improvement, particularly in the clarity of the experimental design, as well as in the interpretation and discussion of the results. Enhanced focus on these areas will provide better coherence and understanding for the readers.

      The strength of the study is the use of multiple models including mouse, non-human primate as well as human samples. The conclusions have now been refined to reflect the complexity of the disease and the use of multiple models.

      (2) Throughout the manuscript, the authors have mislabelled the legends for WT B6 mice and mast cell-deficient mice. As a result, the discussion and claims made in relation to the data do not align with the corresponding graphs (Figure 1B, 3, 4, and S2). This discrepancy undermines the accuracy of the conclusions drawn from the results.

      We apologize for the discrepancy, which will be corrected in the revised manuscript.

      (3) The results discussed in the paper do not add a significant novel aspect to the field of tuberculosis, as the majority of the results discussed in Figure 1-2 are already known and are a re-validation of previous literature.

      This is the first study which has used mouse, NHP and human TB samples from Mtb infection to characterize and validate the role of MC in TB. We believe the current study provides significant novel insights into the role of MC in TB.

      (4) The claims made in the manuscript are only partially supported by the presented data. Additional extensive experiments are necessary to strengthen the findings and enhance the overall scientific contribution of the work.

      We will either provide clarification or supporting evidence for some of the key conclusions in the paper.

    1. eLife Assessment

      This interesting study explores whether tumor cells can manipulate their Hydra hosts, and includes important findings on the consequences for the fitness of the host Hydra. The evidence supporting these findings is convincing. The work will be of broad interest to many fields, including developmental biology, evolutionary biology and tumor biology.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Boutry et al. examined a cnidarian Hydra model system in which spontaneous tumors arise in laboratory settings and lineages featuring vertically transmitted neoplastic cells (via host budding) have been sustained for over 15 years. They observed that hydras harboring long-term transmissible tumors exhibit an unexpected increase in tentacle count. In addition, the presence of extra tentacles, which enhances the host's foraging efficiency, correlated with an elevated budding rate, thereby promoting vertical transmission of the tumor. This study provides evidence that tumors, akin to parasitic entities, can also exert control over their hosts.

      Strengths:

      The manuscript is well-written, and the phenotype is intriguing.

    3. Reviewer #2 (Public review):

      Background and Summary: 

      This study addresses the intriguing question of whether and how tumours can develop in the freshwater polyp hydra and how they influence the fitness of the animals. Hydra is notable for its significant morphogenetic plasticity and nearly unlimited capacity for regeneration. While its growth through asexual reproduction (budding) and the associated processes of pattern formation have been extensively studied at the cellular level, the occurrence of tumours was only recently described in two strains of Hydra oligactis (Domazet-Lošo et al, 2014). Here, tumour-like tissue bulges formed within the ectodermal epithelial layer and contained increased numbers of interstitial cell-like cells which exhibited female germline markers, but none specific for somatic derivatives of interstitial stem cells (e.g., nematocytes, neurons or glandular cells). It seems likely that the cellular basis of these malformations is a misregulation of oogenesis. In wild-type polyps, interstitial-cell-related germline precursors give rise to oocytes and nurse cells, which are subsequently phagocytosed by the growing egg cell. By comparison, in the mutant strains, this uptake is disturbed, but the homeostasis between germline cells and epithelial cells must remain functional enabling further growth pattern formation in hydra. Determining whether this differentiation arrest constitutes a neoplasm also remains a challenge. 

      Clonal lines of both strains have been maintained in the laboratory for years and have also been used by Boutry and colleagues. They published two further papers on the ecological and evolutionary aspects of hydra tumour formation (Boutry et al 2022, 2023), which is also the focus of this manuscript. In their paper, the authors demonstrate an increase in the number of tentacles when "tumour tissue" was transplanted to intact gastric tissue of wildtype and mutant strains. While the impact on tentacle formation is relatively modest, it indicates a potential influence on the cross-talk between epithelial and interstitial cells in growth control (proportion regulation). The presented data are of interest, although the underlying molecular processes remain to be demonstrated. The authors offer a different interpretation. They conclude that this growth pattern (increased number of tentacles) is correlated with "reducing the burden on the host by (over-) compensating for the reproductive costs of tumours" and claim that "transmissible tumours in hydra have evolved strategies to manipulate the phenotype of their host".

      Strengths:

      The question of whether and how tumours can develop in simple systems, here the freshwater polyp hydra, is of general interest. The authors describe transplantation experiments by using mutant strains that indicate an influence of tumour-like malformation on pattern formation. The experiments also suggest an interaction between epithelial cells and germline cells during oogenesis, interfering with the homeostatic growth control between the cell lineages.

      Weaknesses:

      Although it is stimulating to consider a fresh perspective from other disciplines (here, ecological and evolutionary aspects), it appears that this interpretation of the data (reducing the burden on the host by (over-) compensating for the reproductive costs of tumours) is somewhat beyond what can be reasonably inferred from the evidence presented. It is essential, particularly in the context of evolutionary biology, to conduct further analysis of the underlying cell biology of these intriguing mutant hydra strains. Such cellular analysis is a relatively straightforward approach that could provide a mechanistic understanding of the phenomenon described by the authors.