  1. Sep 2024
    1. Reviewer #3 (Public review):

      Summary:

      Regulated chloroplast breakdown allows plants to modulate these energy-producing organelles, for example during leaf aging, or during changing light conditions. This manuscript investigates how chloroplasts are broken down during light-limiting conditions.

      The authors present very nice time-lapse imaging of multiple proteins as buds form on the surface of chloroplasts and pinch away, then associate with the vacuole. They use mutant analysis and autophagy markers to demonstrate that this process requires the ATG machinery, but not dynamin-related proteins that are required for chloroplast division. The manuscript concludes with a discussion of an internally-consistent model that summarizes the results.

      Strengths:

      The main strength of the manuscript is the high-quality microscopy data. The authors use multiple markers and high-resolution time-lapse imaging to track chloroplast dynamics under light-limiting conditions.

      Weaknesses:

      The main weakness of the manuscript is the limited quantitative data. While it can be challenging to quantify dynamic intracellular events, quantification of these processes is important to appreciate the significance of these findings.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: 

      The authors demonstrated that carbon depletion triggers the autophagy-dependent formation of Rubisco-containing bodies (RCBs), which contain chloroplast stroma material but exclude thylakoids. The authors show that RCBs bud directly from the main body of chloroplasts rather than from stromules and that their formation is not dependent on the chloroplast fission factor DRP5. The authors also observed a transient engulfment of the RCBs by the tonoplast during delivery to the vacuolar lumen.

      Strengths: 

      The authors demonstrate that autophagy-related protein 8 (ATG8) co-localizes to the chloroplast demarking the place for RCB budding. The authors provide good-quality time-lapse images and co-localization of the markers corroborating previous observations that RCBs contain only stroma material and do not include thylakoid. The text is very well written and easy to follow. 

      Weaknesses: 

      A significant portion of the results presented in the study comes across as a corroboration of the previous findings made under different stress conditions: autophagy-dependent formation of RCBs was reported by Ishida et al. in 2009. Furthermore, some included results are not of particular relevance to the study's aim. For example, the importance of the role of SA in the formation of stromules, which do not serve as an origin for the RCBs, is unclear. Similarly, the significance of the transient engulfment of RCBs by the tonoplast remained elusive. Although it is indeed a curious observation, previously reported for peroxisomes, its presentation should include an adequate discussion, perhaps suggesting the mechanism involved. Finally, some conclusions are not fully supported by the data: the suggested timing of events aligns poorly between and even within experiments, mostly due to high variation and a low number of replicates. Most importantly, the discussion does not place the findings of this study into the context of current knowledge on chlorophagy, does not propose the significance of piecemeal vs complete organelle sequestration into the vacuole under the conditions used, and does not dwell on the early localization of ATG8 to the future budding place on the chloroplast.

      We performed additional experiments with biological replicates that involved quantification. The results of these experiments validate the findings of this study. We also revised the Discussion section, which now includes a discussion of the interplay between piecemeal-type and entire-organelle-type chloroplast autophagy and the relevance of autophagy adaptor and receptor proteins to the localization of ATG8 on the chloroplast surface. Accordingly, the first subheading section in the Discussion became too long. Therefore, we divided it into two subheading sections. We believe that the revisions successfully address the weaknesses pointed out by the reviewer and enhance the importance of the current study. Below is a detailed description of the improvements made to our manuscript in response to the reviewer comments.

      Reviewer #1 (Recommendations For The Authors): 

      It would be great if the authors kindly used numbered lines to facilitate the review process. 

      We have added line numbers to the text of the revised version of the manuscript.  

      The authors use the words "budding", "protrusion" and "stromule formation" interchangeably in some parts of the text. For the sake of clarity, it would be best to be consistent in the terminology and possibly elaborate on the exact differences between these structure types and the criteria by which they were identified. 

      We have checked all of the text and improved the consistency of the terminology. An important finding of this study is that chloroplasts form budding structures at the site associated with ATG8. These structures then divide to become a type of autophagic cargo termed a Rubisco-containing body. We therefore mainly use the terms “bud” and “budding” throughout the text. In the experiments shown in Figure 5, we considered the possibility that chloroplast protrusions accumulate in leaves of atg mutants and do not divide because the mutants cannot create autophagosomes. Therefore, the word “protrusion” was used to describe the results shown in Figure 5 in which the proportion of chloroplasts forming protrusions was scored. In the revised text, the word “protrusion” is only used in descriptions of Figure 5. Previous reports define stromules as thin, tubular, extended structures (less than 1 µm in diameter) of the plastid stroma (Hanson and Sattarzadeh, 2011; Brunkard et al., 2015). In the revised text, the word “stromules” is used to describe the structures defined in these previous reports. We have added definitions of each term to the Introduction, Methods and Results sections where appropriate (lines 57–58, 160–162, 247–249, 313–316, 655–658, 668–670).

      Pages 3-4: the authors observed budding of the chloroplasts within a few minutes - it would be helpful to specify that time was probably counted from the first observation of budding, not from the start of the dark treatment, and also specify the exact treatment duration for each of the experiments. 

      The time scales in the figures do not represent the time from the start of the dark treatment. Instead, they describe the duration from the start of the time-lapse videos that were used to generate the still images. Therefore, the indicated time scales are almost the same as the duration from the start of the observations of each target structure (chloroplast buds or GFP-ATG8a-labeled structures). As described in the Methods section, leaves were incubated in darkness for 5 to 24 h to induce sugar starvation. Such sugar-starved leaves were subjected to live-cell monitoring for the target structures. Since Arabidopsis leaves accumulate starch as a stored sugar source (Smith and Stitt, 2007; Usadel et al., 2008), dark treatment lasting several minutes is not sufficient for the starch to be consumed and sugar starvation to be induced. To avoid confusion, we have added definitions of the time scales to the legends of figures containing the results of time-lapse imaging. We have also specified the durations of dark treatments used to obtain the respective results in the legends.

      Figure 6: the time scale for complete autophagosome formation is in the range of 100-120 sec, how do these results align with the results shown in Figures 3B and C, where complete autophagosomes are suggested to be released into the vacuole after 73.8 sec. Furthermore, another structure is suggested to be formed within 50 sec. Such experiments possibly require a large number of replicates to estimate representative timing. 

      As mentioned in the previous response, the time scales in still frames represent the duration from the start of the corresponding video. Leaves incubated in darkness for 5 to 24 h were subjected to live-cell imaging. When we identified the target structures, e.g., GFP-ATG8a-labeled structures on the surfaces of chloroplasts (Figure 6) or chloroplast budding structures (Figure 3), we began to track these structures. Therefore, the time scales in the figures do not align to a common time axis. We revised the descriptions of Figure 3 and Figure 6 in the Results section to clearly explain that the time points in each experiment merely indicate the time of one observation.

      The authors might want to consider using arrows to indicate structures of interest in all movies and figures.

      We have added arrows to indicate the structures of interest in the starting frames of all videos. We hesitate to add arrows to highlight RCBs accumulating in the vacuole (Figure 1-figure supplement 1, Figure 5 and Figure 8) and stromules (Figure 7) because many arrows would be required, which would obscure large portions of the images. We believe that the images without arrows clearly represent the appearance of RCBs or stromules and that their quantification (Figure 1-figure supplement 1C, Figure 5B, Figure 5-figure supplement 1B, Figure 7B, 7D, 7F, and Figure 8B) well supports the results.   

      Figure 7 Supplement 1: do the authors detect complete chloroplasts in the vacuole of atg7 and sid2/atg7? 

      We did not observe the vacuolar transport of whole chloroplasts in atg7 or atg7 sid2 plants under our experimental conditions. The figure below (Figure 1 for Response to reviewers) shows images of mesophyll cells from a leaf (third rosette leaf of a 20-d-old plant) of atg7 accumulating chloroplast stroma–targeted GFP (CT-GFP); this is from the previous version of Figure 7–figure supplement 1. Indeed, some GFP bodies exhibiting strong stromal GFP (CT-GFP) signals appeared in the central area of the cell (arrowheads in A). However, such bodies were chloroplasts in epidermal cells. The 3D images (B) and cross-section image (x to z axis) of the region highlighted by the blue dotted line (C) indicate that such GFP bodies are the edges of chloroplasts that localize on the abaxial side of the observed region. Because CT-GFP expression was driven by the 35S promoter, strong GFP signals appeared in chloroplasts in epidermal cells in addition to chloroplasts in mesophyll cells. Previous studies using the same transgenic lines also showed that chloroplasts in epidermal cells exhibit strong GFP signals (Kohler et al., 1997; Caplan et al., 2015; Lee et al., 2023). RBCS-mRFP or GFP driven by the RBCS2B promoter do not label the chloroplasts in epidermal cells (new Figure 7-figure supplement 1). Additionally, because the borders between the mesophyll cell layer and the epidermal cell layer are not even, chloroplasts in epidermal cells are sometimes visible during observations of mesophyll cells. Such detection occurs more frequently during the acquisition of z-stack images. This point was more precisely demonstrated in our previous study with the aid of Calcofluor white staining of cell walls (Nakamura et al., 2018). Please see Supplemental Figure S3 in our previous report. To avoid any misunderstanding, we replaced the image of the leaf from atg7 in the revised figure, which is now Figure 7-figure supplement 2, with an image of another region to more precisely visualize mesophyll cells in this plant line.

      Author response image 1.

      Mesophyll cells in a leaf of atg7 accumulating stromal CT-GFP, reconstructed from the data shown in the previous version of Figure 7–figure supplement 1. (A) Individual channel images (CT-GFP and chlorophyll) from the merged orthogonal projection image shown in the previous version of Figure 7–figure supplement 1. The right panel shows the enhanced chlorophyll signal to clearly visualize the chloroplasts in epidermal cells. Green, CT-GFP; magenta, chlorophyll fluorescence. Scale bar, 20 µm. (B) 3D structure of the merged image shown in (A). (C) Images of the cross section indicated by the blue dotted line (a to b) in B. Arrowheads indicate the edges of chloroplasts in epidermal cells.

      Figure 8: it would be interesting to hear the authors' opinion on why they observed a significant increase in RCBs number in the drp5b mutant background

      We have added a discussion of this issue to the revised manuscript (lines 445–459). We now have two hypotheses to explain this issue. One hypothesis is that the impaired chloroplast division due to the drp5b mutation reduces energy availability and thus activates chloroplast autophagy. The other hypothesis is that the drp5b mutation impairs the type of chlorophagy that degrades whole chloroplasts, and thus piecemeal-type chloroplast autophagy via Rubisco-containing bodies is activated. However, we do not have any experimental evidence supporting either hypothesis.

      Reviewer #2 (Public Review): 

      This manuscript proposed a new link between the formation of chloroplast budding vesicles (Rubisco-containing bodies [RCBs]) and the development of chloroplast-associated autophagosomes. The authors' previous work demonstrated two types of autophagy pathways involved in chloroplast degradation, including piecemeal degradation of partial chloroplast and whole chloroplast degradation. However, the mechanisms underlying piecemeal degradation are largely unknown, particularly regarding the initiation and release of the budding structures. Here, the authors investigated the progression of piecemeal-type chloroplast trafficking by visualizing it with a high-resolution time-lapse microscope. They provide evidence that autophagosome formation is required for the initiation of chloroplast budding, and that stromule formation is not correlated with this process. In addition, the authors also demonstrated that the release of chloroplast-associated autophagosome is independent of a chloroplast division factor, DRP5b. 

      Overall, the findings are interesting, and in general, the experiments are very well executed. Although the mechanism of how Rubisco-containing bodies are processed is still unclear, this study suggests that a novel chloroplast division machinery exists to facilitate chloroplast autophagy, which will be valuable to investigate in the future. 

      Reviewer #2 (Recommendations For The Authors): 

      Below are some specific comments. 

      (1) In Supplement Figure 1B, there is no chloroplast stromule in RBCS-mRFP x atg7-2 plants under dark treatment with ConA, but in Figure 7A, there are stromules in CT-GFP x atg7-2 plants. How to explain such a discrepancy? Did the authors check the chloroplast morphology of RBCS-mRFP x atg7-2 plants in different developmental stages? Will it behave the same as CT-GFP x atg7-2 under the same condition as in Figure 7A?

      As described in the text, the ages and conditions of the leaves shown in Figure 1–figure supplement 1 and Figure 7 are different. In Figure 1–figure supplement 1, second rosette leaves from 21-d-old plants were incubated in the dark with concanamycin A for 1 d. In Figure 7E and 7F, we explored the condition under which mesophyll chloroplasts in atg leaves actively form stromules to assess how a deficiency in autophagy is related to stromule formation. We found that late senescing leaves (third rosette leaves from 36-d-old plants) of atg5 and atg7 plants accumulated many stromules without additional treatment (Figure 7). It is not surprising that the chloroplast morphologies shown in Figures 1 and 7 are different because the leaf ages and conditions are largely different.

      However, we agree that the differences in chloroplast stroma–targeted GFP and RBCS-mRFP might influence the visualization of stromules. For instance, fluorescent protein–labeled RBCS proteins are incorporated into the Rubisco holoenzyme, comprising eight RBCS and eight RBCL proteins (Ishida et al., 2008; Ono et al., 2013). Such a large protein complex might not accumulate in stromules. Therefore, we examined the chloroplast morphology in late senescing leaves (third rosette leaves from 36-d-old plants) from WT, atg5, and atg7 plants harboring ProRBCS:RBCS-mRFP, as you suggested. Mesophyll chloroplasts formed many stromules in atg5 and atg7 leaves but not in WT leaves (Figure 7–figure supplement 1). These results indicate that RBCS-mRFP can be used to visualize stromules and that the differences in chloroplast morphology between Figure 1-figure supplement 1 and Figure 7 cannot be attributed to the different marker proteins used. A previous study also indicated that Rubisco is present in plastid stromules (Kwok and Hanson, 2004).

      (2) In Figure 2, the author showed that the outer envelope marker Toc64 was colocalized with chloroplast buds. How about proteins in the inner envelope membrane of chloroplasts? 

      We generated Arabidopsis plants expressing red fluorescent protein–tagged K+ EFFLUX ANTIPORTER 1 (KEA1), a chloroplast inner envelope membrane protein (Kunz et al., 2014; Boelter et al., 2020). We found that the chloroplast buds visualized by RBCS-GFP were also marked by KEA1-mRFP (Figure 2–figure supplement 1B). We observed the transport of such buds (Figure 2–figure supplement 2). These results strengthen our claim that autophagy degrades chloroplast stroma and envelope components as a type of specific cargo termed a Rubisco-containing body. This additional experiment is described in lines 181–187.

      (3) In Figure 3, how many RCBs were tracked for the trafficking analysis to raise the conclusion that the vesicle was released into the vacuole around 73.8s? 

      We apologize for our confusing explanation in the previous version of the manuscript. The time point “73.8 s” merely indicates the time of one observation, as shown in Figure 3. This time does not represent the common timing of vacuolar release of a Rubisco-containing body. As we explained in the response to the comments from reviewer 1, we subjected leaves that were incubated in the dark for several hours to live-cell imaging assays to observe chloroplast morphology in sugar-starved leaves. The time scales of each still frame represent the time from the start of the corresponding video. Therefore, the time points in the respective figures do not align to a common time axis, and the number “73.8 s” is not important. We attempted to emphasize that the type of movement of Rubisco-containing bodies changes during their tracking shown in Figure 3. Based on this finding, we hypothesized that the Rubisco-containing bodies are released into the vacuolar lumen when they initiate random movement. Therefore, we expected that the interaction between the Rubisco-containing bodies and the vacuolar membrane could be captured, and we therefore turned our attention to the dynamics of the vacuolar membrane in subsequent experiments. Accordingly, our observations of the vacuolar membrane allowed us to visualize the release of the Rubisco-containing body into the vacuole (Figure 4). We rephrased these sentences (lines 212–219) to avoid confusion and to explain this idea accurately. We also performed tracking experiments of Rubisco-containing bodies to strengthen the finding that the type of movement of the bodies changes during tracking (Figure 3-figure supplement 1, Videos 8 and 9).

      (4) I do believe the conclusion that vacuolar membranes incorporate RCBs into the vacuole in Figure 4. However, it will be more convincing if images of higher quality are provided. 

      We tried to acquire images that more clearly show the morphology of the vacuolar membrane during the incorporation of the Rubisco-containing body. We obtained the images in Figure 4A using a standard type of confocal microscope, the LSM 800 (Carl Zeiss), and obtained the images in Figure 4B using the Airyscan Fast acquisition mode, a hyper-resolution microscope mode, in the LSM 880 system (Carl Zeiss). We performed additional experiments with another type of confocal microscope, the SP8 (Leica; Figure 4-figure supplement 1A to 1C, Videos 12–14). The quality of the images from these experiments was as high as possible under the experimental conditions (equipment and plant materials). In general, increasing the image resolution during time-lapse imaging with a confocal microscope requires reducing the time resolution. However, the transport of a Rubisco-containing body occurs relatively quickly: its engulfment by the vacuolar membrane takes just a few seconds (Figure 4, Figure 4-figure supplement 1). We therefore could not reduce the time resolution further to better capture the morphology of the vacuolar membrane.

      (5) In Figure 7G, the authors concluded that SA and ROS might be the cause of the extensive formation of stromules. How about the H2O2 level in NahG and atg5 NahG plants? Compared with sid2, NahG appeared to completely inhibit stromule formation in atg5. Will this be related to ROS levels?

      We measured the hydrogen peroxide (H2O2) contents in NahG atg5 plants and atg5 single mutant plants and found that their leaves accumulate more H2O2 than those of wild-type or NahG plants (Figure 7-figure supplement 3). Since we have only maintained fresh seeds of NahG atg5 plants harboring the 35S promoter–driven chloroplast stroma–targeted GFP (Pro35S:CT-GFP) construct, we first confirmed that CT-GFP accumulation does not affect the measurement of H2O2 content. H2O2 levels were similar between wild-type leaves and CT-GFP-expressing leaves. A comparison among Pro35S:CT-GFP expressing lines in the wild-type, atg5, NahG, and NahG atg5 backgrounds revealed enhanced accumulation of H2O2 in the atg5 and NahG atg5 genotypes compared with the wild-type and NahG genotypes. This finding is consistent with the results of histological staining of H2O2 using 3,3′-diaminobenzidine (DAB) in a previous study (Yoshimoto et al., 2009).

      It is unclear why NahG expression inhibited stromule formation more strongly than the sid2 mutation in the atg5 mutant background, as you pointed out (Figure 7A–D). NahG catabolizes salicylic acid (SA), whereas sid2 mutants are knockout mutants of ISOCHORISMATE SYNTHASE1 (ICS1), a gene required for SA biosynthesis. Plants have two metabolic routes for SA biosynthesis: the isochorismate synthase (ICS) pathway and the phenylalanine ammonia-lyase (PAL) pathway. Furthermore, Arabidopsis plants contain two ICS homologs: ICS1 and ICS2. Previous studies have revealed that ICS1 (SID2) is the main player for SA biosynthesis in response to pathogen infection (Delaney et al., 1994). Another study revealed drastically lower SA contents in the leaves of both sid2 single mutants and NahG-expressing plants compared with those of wild-type plants (Abreu and Munné-Bosch, 2009). Therefore, it is clear that the sid2 single mutation sufficiently inhibits SA accumulation in Arabidopsis leaves. However, low levels of SA biosynthesis through ICS1-independent routes might influence stromule formation in leaves of sid2 atg5 and sid2 atg7. Because a previous study demonstrated that the sid2 single mutation sufficiently suppresses the SA hyperaccumulation–related phenotypes of atg plants (Yoshimoto et al., 2009), we believe that the use of the sid2 mutation was adequate to assess the effects of SA on stromule formation that actively occurs in the atg plants examined in this study.

      (6) In Supplement Figure 7, I have noticed that there are still some CT-GFP signals (green dots) in the vacuoles of the atg7 mutant, are they RCBs? If so, how can this phenomenon be explained? 

      As we explained in the response to the comment from Reviewer 1, CT-GFP-labeled bodies are chloroplasts in the epidermal cell layer. Please see our response to Reviewer 1’s comment about Figure 7 and the associated figure (Figure 1 for Response to reviewers). The CT-GFP-labeled dots (arrowheads) are the edges of chloroplasts and localize on the abaxial side of the observed region. The dots have faint chlorophyll signals. This phenomenon is much clearer in the image with enhanced brightness (right panel in A). Since the bodies are merely the edges of epidermal chloroplasts, their chlorophyll signals are faint. Therefore, these bodies are not Rubisco-containing bodies but are instead simply the edges of chloroplasts in the epidermal cell layer.

      (7) On page 24, the second paragraph, lines 12-14, the authors claim that no receptors similar to those involved in mitophagy that bind to LC3 (ATG8) have been established in chloroplasts. Actually, it has been reported that a homologue of a mitophagy receptor, NBR1, acts as an autophagy receptor to regulate chloroplast protein degradation (Lee et al., 2023, eLife; Wan et al., 2023, EMBO Journal). Although I do think NBR1 is not involved in RCBs based on these reports, these findings should be discussed here.

      Thank you for this good suggestion. We have added a discussion about this important point to the Discussion section, along with the relevant citations (lines 482–502).

      (8) In the figure legends, the details of the experiments should be provided, such as leaf stages (Figure 1, Figure 5...) and the number of chloroplasts analyzed (Figure 7...). This can help the readers to follow.

      Thank you for highlighting this. We have checked all of the figure legends and added descriptions of the leaf stages and experimental conditions.  

      Reviewer #3 (Public Review):

      Summary: 

      Regulated chloroplast breakdown allows plants to modulate these energy-producing organelles, for example during leaf aging, or during changing light conditions. This manuscript investigates how chloroplasts are broken down during light-limiting conditions. 

      The authors present very nice time-lapse imaging of multiple proteins as buds form on the surface of chloroplasts and pinch away, then associate with the vacuole. They use mutant analysis and autophagy markers to demonstrate that this process requires the ATG machinery, but not dynamin-related proteins that are required for chloroplast division. The manuscript concludes with a discussion of an internally-consistent model that summarizes the results. 

      Strengths: 

      The main strength of the manuscript is the high-quality microscopy data. The authors use multiple markers and high-resolution time-lapse imaging to track chloroplast dynamics under light-limiting conditions. 

      Weaknesses: 

      The main weakness of the manuscript is the lack of quantitative data. Quantification of multiple events is required to support the authors' claims, for example, claims about which parts of the plastid bud, about the dynamics of the events, about the colocalization between ATG8 and the plastid stroma buds, and the dynamics of this association. Without understanding how often these events occur and how frequently events follow the manner observed by the authors (in the 1 or 2 examples presented in each figure) it is difficult to appreciate the significance of these findings. 

      We have performed several additional experiments, including the quantification of multiple chloroplast buds or GFP-ATG8-labeled structures from individual plants. The results strengthen our claims and thus improve the significance of the current study. Please see the responses below for details.

      Reviewer #3 (Recommendations For The Authors):

      Overall, the live-cell imaging in this paper is high quality and rigorously conducted. However, without quantification of these events, it is difficult to judge whether this is an occasional contributor to plastid breakdown, or the primary mechanism for this process. 

      - For Figure 1, the authors could estimate the importance of this mechanism for chloroplast breakdown by calculating the volume change in chloroplasts over time during light-limiting conditions, then comparing this to the volume of the puncta that bud off of plastids and the frequency of these events. That is, what percentage of chloroplast volume loss can be accounted for by puncta that bud from chloroplasts? Are there likely other mechanisms contributing to chloroplast breakdown, or is this the primary mechanism? 

      We measured the volumes of chloroplast stroma when the leaves from wild-type (WT) and atg7 plants accumulating RBCS-mRFP were subjected to extended darkness for 1 d (Figure 1-figure supplement 2). The volume of the chloroplast stroma in dark-treated leaves of WT plants was 70% that in leaves before treatment, whereas the volume of the chloroplast stroma in dark-treated atg7 leaves was 86% that in leaves before treatment. The transport of Rubisco-containing bodies into the vacuole did not occur in atg7 leaves (Figure 1-figure supplement 1). These results suggest that the release of chloroplast buds as Rubisco-containing bodies contributes to the decrease in chloroplast stroma volume during dark treatment. These results also suggest that autophagy-independent systems contribute to the decrease in chloroplast volume. It is difficult to monitor the volume or frequency of budding off of puncta from chloroplasts during dark treatment because the budding and transport of the puncta occur relatively quickly and are completed within minutes, and the puncta frequently move away from the plane of focus. Additionally, continuous monitoring of chloroplast morphology over the dark treatment period requires the long-term exposure of leaves to repeated laser excitation, and such treatment might cause unexpected stress. We believe that the evaluation of chloroplast stroma volume after 1 d of dark treatment is important for estimating the contribution of the mechanism described in this study. This additional experiment is described in lines 163–174.
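
      The arithmetic implied by the two percentages above can be sketched as follows. This is only a back-of-the-envelope illustration, not an analysis from the manuscript: it assumes that the volume loss measured in atg7 equals the autophagy-independent loss in WT, which the response does not claim. The variable names are hypothetical.

```python
# Stroma volume remaining after 1 d of darkness, as percentages of the
# pre-treatment volume (values quoted in the response above).
wt_remaining = 70     # wild type
atg7_remaining = 86   # autophagy-deficient atg7

wt_loss = 100 - wt_remaining        # total loss in wild type
atg7_loss = 100 - atg7_remaining    # loss that persists without autophagy

# ASSUMPTION (not from the manuscript): the autophagy-independent loss in
# wild type is the same as the loss measured in atg7.
autophagy_dependent_loss = wt_loss - atg7_loss
autophagy_share = autophagy_dependent_loss / wt_loss

print(f"WT loss: {wt_loss}%, atg7 loss: {atg7_loss}%")
print(f"Autophagy-dependent loss: {autophagy_dependent_loss} percentage points")
print(f"Share of WT loss attributable to autophagy: {autophagy_share:.0%}")
```

      Under that simplifying assumption, roughly half of the stroma volume lost in WT leaves would be autophagy-dependent, which is in line with the statement that both autophagy and autophagy-independent systems contribute.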

      - The claim that structures budding from the plastid "specifically contains stroma material...without any chlorophyll signal" (p. 6 and Figure 2) should be supported by quantitative analysis of many such buds in multiple cells from multiple independent plants. 

      We performed additional experiments (Figure 2-figure supplement 1) to measure the fluorescence intensity ratios of the stroma marker RBCS-GFP and chlorophyll between chloroplast budding structures and their neighboring chloroplasts in Arabidopsis plants expressing the stromal marker RBCS-GFP along with TOC64-mRFP (a chloroplast outer envelope membrane protein), KEA1-mRFP (a chloroplast inner envelope membrane protein), or ATPC1-tagRFP (a thylakoid membrane protein). The results indicated that chloroplast buds contain chloroplast stroma without chlorophyll signals. The descriptions of this experiment are in lines 175–199. In these experiments, we observed 30 to 33 chloroplast buds from eight individual plants.  
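
      A minimal sketch of how a bud-to-chloroplast intensity ratio of this kind could be computed is given below. The function name, the background subtraction step, and all pixel values are hypothetical placeholders chosen for illustration. They are not measurements or methods taken from the manuscript.

```python
from statistics import mean

def intensity_ratio(bud_pixels, chloroplast_pixels, background=0.0):
    """Return mean(bud) / mean(neighboring chloroplast) after subtracting
    a constant background from both regions."""
    bud = mean(bud_pixels) - background
    chloroplast = mean(chloroplast_pixels) - background
    if chloroplast <= 0:
        raise ValueError("neighboring chloroplast signal must exceed background")
    return bud / chloroplast

# Placeholder pixel intensities (NOT real data) for one bud and its
# neighboring chloroplast in two channels.
stroma_ratio = intensity_ratio([90, 110, 100], [95, 105, 100], background=10)
chlorophyll_ratio = intensity_ratio([12, 10, 11], [200, 210, 190], background=10)

print(f"stroma marker ratio: {stroma_ratio:.2f}")
print(f"chlorophyll ratio:   {chlorophyll_ratio:.3f}")
```

      In this toy case, a stroma ratio near 1 together with a chlorophyll ratio near 0 is the pattern expected for a bud that carries stroma but no thylakoid membrane.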

      - Claims about the dynamics of these events in Figures 2 & 3 should be supported by quantitative analysis of many buds in multiple cells from multiple independent plants and appropriate summary statistics (e.g. mean, standard deviation), and claims about the coordination of events should be supported by statistical comparison of these measurements between different markers. 

      As mentioned in the response to the above comments, quantification of fluorescent intensities (Figure 2-figure supplement 1) revealed that the chloroplast budding structures produced TOC64-mRFP and KEA1-mRFP signals without ATPC1-tagRFP signal. These results support the claim that chloroplast buds contain chloroplast stroma and envelope components without thylakoid membranes. 

      It is not easy to quantify the dynamics of chloroplast buds since the puncta sometimes move away from the plane of focus. We therefore added data from individual time-lapse observations showing that the type of movement exhibited by the puncta changes during tracking (Figure 3-figure supplement 1A and 1B, Videos 8 and 9) to strengthen the notion that such a phenomenon was observed repeatedly. 

      - Data in Figure 4 should be supported by quantification of the proportion of plastid-derived puncta that end up inside the vacuole (compared to those that do not) in multiple cells from multiple independent plants. 

      Although we performed additional observations of the destinations of chloroplast-derived puncta, we encountered some difficulty in correctly calculating the proportion of plastid-derived puncta that ended up inside the vacuole. This problem is similar to the difficulty in tracking Rubisco-containing bodies mentioned in the response to the previous comments. During timelapse imaging, puncta sometimes move from the plane of focus toward the deeper side (abaxial side) or near side (adaxial side), causing us to lose track of a number of puncta. Therefore, we could not determine the destinations of all puncta to calculate the proportion of puncta that ended up in the vacuolar lumen.

      Alternatively, we added the results of three experiments (Figure 4-figure supplement 1, Videos 12–14) examining how the vacuolar membrane engulfs the chloroplast-derived puncta to incorporate them inside the vacuole. The data support the notion that such a phenomenon occurs repeatedly in sugar-starved leaves. All results were obtained from individual plants. 

      - Data in Figure 6 should also be supported by quantitative analysis of many buds in multiple cells from multiple independent plants, to determine whether ATG8 associates with all RBCS-containing buds, and vice versa. 

      To address this issue, we performed additional experiments on plants expressing GFP-ATG8a and RBCS-mRFP (Figure 6-figure supplements 3 and 4). First, we observed 58 chloroplast buds from eight individual plants and evaluated the proportion of GFP-ATG8a-labeled chloroplast buds. We determined that 64% of chloroplast buds were at least autophagy-associated structures (Figure 6-figure supplement 3A–3C). This result also suggests that chloroplasts can form autophagy-independent budding structures, which might be associated with stromule-related structures or the autophagy-independent vesiculation machinery. We also evaluated the number of GFP-ATG8a-labeled chloroplast buds (Figure 6-figure supplement 3D and 3E). The formation of such structures increased in response to dark treatment (Figure 6-figure supplement 3D), but they did not appear in atg7 plants exposed to the dark (Figure 6-figure supplement 3E). These results support the notion that the formation of chloroplast buds to be released as Rubisco-containing bodies requires the core ATG machinery. 

      Furthermore, we observed 157 GFP-ATG8a-labeled structures from thirteen individual plants and evaluated the proportion of chloroplast-associated isolation membranes (Figure 6-figure supplement 4). We also classified the chloroplast-associated, GFP-ATG8a-labeled structures into two categories: the chloroplast surface type (Figure 7-figure supplement 4A) and the chloroplast bud type (Figure 7-figure supplement 4B). This experiment suggested that 43% of the isolation membranes labeled by GFP-ATG8a were involved in chloroplast degradation during an early phase of sugar starvation (extended darkness for 5 to 9 h from the end of night) in mesophyll cells. We believe that these results indicate that autophagy contributes substantially to chloroplast degradation via the morphological changes observed in this study. The descriptions of these experiments are in lines 284–300 in the Results section and in lines 426–444 in the Discussion section. 

      - Which parts of the plastid bud (Fig 2), about the dynamics of the events (Fig 3), about the colocalization between ATG8 and the plastid stroma buds, and the dynamics of this association (Fig 6). 

      We performed multiple quantitative studies to address the issues listed above. We believe that these additional experiments strengthened our findings.

      - I suggest that the authors avoid using the term "vesicles" to describe the plastid-derived puncta, since it doesn't seem like coat proteins are required for their formation. I suggest "puncta" or similar terms. 

      We replaced the term “vesicles” with “puncta” or other suitable terms, as suggested.

      References for response to reviewers

      Abreu ME, Munné-Bosch S (2009) Salicylic acid deficiency in NahG transgenic lines and sid2 mutants increases seed yield in the annual plant Arabidopsis thaliana. J Exp Bot 60: 1261-1271.

      Boelter B, Mitterreiter MJ, Schwenkert S, Finkemeier I, Kunz HH (2020) The topology of plastid inner envelope potassium cation efflux antiporter KEA1 provides new insights into its regulatory features. Photosynth Res 145: 43-54.

      Brunkard JO, Runkel AM, Zambryski PC (2015) Chloroplasts extend stromules independently and in response to internal redox signals. Proc Natl Acad Sci U S A 112: 10044-10049.

      Caplan JL, Kumar AS, Park E, Padmanabhan MS, Hoban K, Modla S, Czymmek K, Dinesh-Kumar SP (2015) Chloroplast stromules function during innate immunity. Dev Cell 34: 45-57.

      Delaney TP, Uknes S, Vernooij B, Friedrich L, Weymann K, Negrotto D, Gaffney T, Gutrella M, Kessmann H, Ward E, Ryals J (1994) A Central Role of Salicylic-Acid in Plant-Disease Resistance. Science 266: 1247-1250.

      Hanson MR, Sattarzadeh A (2011) Stromules: Recent Insights into a Long Neglected Feature of Plastid Morphology and Function. Plant Physiol 155: 1486-1492.

      Ishida H, Yoshimoto K, Izumi M, Reisen D, Yano Y, Makino A, Ohsumi Y, Hanson MR, Mae T (2008) Mobilization of rubisco and stroma-localized fluorescent proteins of chloroplasts to the vacuole by an ATG gene-dependent autophagic process. Plant Physiol 148: 142-155.

      Kohler RH, Cao J, Zipfel WR, Webb WW, Hanson MR (1997) Exchange of protein molecules through connections between higher plant plastids. Science 276: 2039-2042.

      Kunz HH, Gierth M, Herdean A, Satoh-Cruz M, Kramer DM, Spetea C, Schroeder JI (2014) Plastidial transporters KEA1, -2, and -3 are essential for chloroplast osmoregulation, integrity, and pH regulation in Arabidopsis. Proc Natl Acad Sci U S A 111: 7480-7485.

      Lee HN, Chacko JV, Solis AG, Chen KE, Barros JA, Signorelli S, Millar AH, Vierstra RD, Eliceiri KW, Otegui MS, Benitez-Alfonso Y (2023) The autophagy receptor NBR1 directs the clearance of photodamaged chloroplasts. Elife 12: e86030.

      Ono Y, Wada S, Izumi M, Makino A, Ishida H (2013) Evidence for contribution of autophagy to rubisco degradation during leaf senescence in Arabidopsis thaliana. Plant Cell Environ 36: 1147-1159.

      Smith AM, Stitt M (2007) Coordination of carbon supply and plant growth. Plant Cell Environ 30: 1126-1149.

      Usadel B, Blasing OE, Gibon Y, Retzlaff K, Hoehne M, Gunther M, Stitt M (2008) Global transcript levels respond to small changes of the carbon status during progressive exhaustion of carbohydrates in Arabidopsis rosettes. Plant Physiol 146: 1834-1861.

      Yoshimoto K, Jikumaru Y, Kamiya Y, Kusano M, Consonni C, Panstruga R, Ohsumi Y, Shirasu K (2009) Autophagy negatively regulates cell death by controlling NPR1-dependent salicylic acid signaling during senescence and the innate immune response in Arabidopsis. Plant Cell 21: 2914-2927.

    1. eLife assessment

      This study provides valuable insights into the role of actin dynamics in regulating the transition of fusion models during homotypic fusion between late endosomes. The evidence supporting the authors' claims is convincing. However, while the observations are significant, the study could benefit from further exploration of the mechanistic details and physiological relevance.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript employs yolk sac visceral endoderm cells as a novel model for studying endosomal fusion, observing two distinct fusion behaviors: quick homotypic fusion between late endosomes, and slower heterotypic fusion between late endosomes and lysosomes. The mathematical modeling suggests that vesicle size critically influences the mode of fusion. Further investigations reveal that actin filaments are dynamically associated with late endosomal membranes, and are oriented in the x-y plane and along the apical-basal axis. Actin and Arp2/3 were shown to appear at the rear end of the endosomes relative to the direction of movement, suggesting that actin polymerization may provide force for the movement of endosomes. Additionally, the authors found that actin dynamics regulate homotypic and heterotypic fusion events in different manners. The authors also provide evidence suggesting that Cofilin-dependent actin dynamics are involved in late endosome fusion.

      Strengths:

      The unique feature of this study is that the authors use yolk sac visceral endoderm cells to study endosomal fusion. Yolk sac visceral endoderm cells have huge endocytic vesicles, endosomes and lysosomes, offering an excellent system to explore endosomal fusion dynamics and the assembly of cellular factors on membranes. The manuscript provides a valuable and convincing observation of the modes of endosomal fusion and roles of actin dynamics in this process, and the conclusions of the study are justified by the data.

      Weaknesses:

      While the study offers compelling observations, it falls short in delivering clear mechanistic insights. Key questions remain unaddressed, such as the functional significance of actin filaments that extend apically in positioning late endosomes, the ways in which actin dynamics influence fusion events, and the functional implications of the slower bridge fusion process.

    3. Reviewer #3 (Public review):

      Summary:

      The authors found two endosomal fusion modes by live cell imaging of endosomes in yolk sac visceral endoderm cells of 8.5-day-old embryonic mice and described the fusion modes by mathematical models and simulations. They also showed that actin polymerization is involved in the regulation of one of the fusion modes.

      Strengths:

      The strength of this study is that the authors' claims are well supported by beautiful live cell images and theoretical models. By using specialized cells, yolk sac visceral endoderm cells, the live images of endosomal fusion, localization of actin-related molecules, and validation data from multiple inhibitor experiments are clear.

      Weaknesses:

      Although it would be outside the scope of this study, there is no experimental verification of whether the mechanism of endosome fusion claimed by the authors occurs in general cells, so the article is limited to showing a phenomenon specific to yolk sac visceral endoderm cells. The methods used were very basic and solid. Most of the image analysis was performed manually, but the results were statistically tested.

      Summary:

      Seiichi Koike et al. studied two fusion models, explosive fusion, and bridge fusion, utilizing yolk sac visceral endoderm cells. They elucidated these two fusion models in vivo by employing mathematical modeling and incorporating fluctuations derived from actin dynamics as a key regulator for rapid homotypic fusion between late endosomes.

      Strengths:

      This study uncovered the role of actin dynamics in regulating the transition of fusion models in homotypic fusion between late endosomes and introduced a method for observing the fusion of single vesicles with two different targets.

      Weaknesses:

      The physiological significance of different fusion models is lacking.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      While the manuscript provides an interesting observation of the modes of endosomal fusion and roles of actin dynamics in this process and the conclusions of the study are justified by the data, there are concerns regarding the lack of important descriptions or quantification in some of the analyses and additional analyses are needed to strengthen this study. The major issues are outlined below:

      (1) The authors indicate that Zone 1 is within approximately 1 μm of the apical surface. What are the distances of Zone 2 and Zone 3 from this surface? It would be better if the authors could provide an explanation or hypothesis that explains the early endosomes, late endosomes, and lysosomes are not intermixed but separated along the z-axis.

      Thank you for pointing out this important issue. Following the comments, we have added an explanation about the depth of early endosomes, late endosomes, and lysosomes to the text (lines 123-124, 127-128, and 130-131). We have also created a new figure showing their positions in VE cells (Figure 1–figure supplement 1B).

      Because endosomes go deeper and mature with repeated fusion and enlargement after endocytosis, early endosomes, late endosomes, and lysosomes are aligned along the z-axis, though the separation is not complete. In confocal microscopic observation, endolysosomal vesicles in VE cells are largely separated into different layers because they are huge and occupy a large space, and as a result, do not exist with much overlap. We have added the explanation to the text (lines 121-122).

      (2) The authors compared the size distribution of the late endosomes that underwent fusion with that of the total late endosomes in the observed area 5 min after labeling (Figure 2C). A similar quantification analysis should also be analyzed 15 min after labeling (Figure 3G).

      Thank you for the appropriate request. We have added the data showing the size distribution of the late endosomes that underwent fusion at 15 min after labeling, to Figure 3G.

      (3) While 3D reconstructions of actin filament patterns under normal conditions are presented (Figures 4 E-F), comparable analyses using cells treated with Cytochalasin D, Jasplakinolide, or S3 peptide needs to be performed.

      As requested by the referee, we have performed additional experiments to show 3D reconstructions of actin filaments on late endosomes after pretreatment with cytochalasin D, jasplakinolide, and S3 peptide. We show the data in new figures: Figure 7–figure supplement 1A, Figure 7–figure supplement 2, and Figure 9–figure supplement 1.

      (4) The authors should provide a clear description of how they quantified the fusion frequency. Why does the fusion frequency appear very low? Why do Cytochalasin D and jasplakinolide show different effects on heterotypic fusion?

      Thank you for pointing out this important issue. We have added the description of how the fusion frequency was quantified to the Materials and Methods (lines 643-645). Briefly, we counted the number of membrane fusion events and the number of late endosomes in the 400-s time-lapse images, and then calculated how many times a single late endosome underwent fusion per minute. The apparent fusion frequency is low because it is expressed in terms of frequency per vesicle per minute.
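
      As a rough illustration of the calculation described above, the following sketch computes a per-vesicle, per-minute fusion frequency. The function name and the example counts are hypothetical and are not taken from the manuscript; only the 400-s observation window comes from the text. How an event involving two late endosomes is counted is not specified in the response, so the sketch simply divides the event count by the number of late endosomes.

```python
def fusion_frequency_per_min(n_fusion_events, n_late_endosomes, duration_s=400.0):
    """Fusion events per late endosome per minute (illustrative sketch only)."""
    if n_late_endosomes <= 0 or duration_s <= 0:
        raise ValueError("need a positive endosome count and duration")
    duration_min = duration_s / 60.0
    # Normalize first by the number of vesicles, then by observation time.
    return n_fusion_events / n_late_endosomes / duration_min


# Hypothetical counts: 6 fusion events among 45 late endosomes in one 400-s movie.
example_frequency = fusion_frequency_per_min(6, 45)
print(f"{example_frequency:.3f} fusions per endosome per minute")
```

      With counts of this order the value is small because it is normalized twice, by vesicle number and by time, which is consistent with the explanation that the apparent frequency is low.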

      As for the different effects of cytochalasin D and jasplakinolide on heterotypic fusion, we already discussed this in the manuscript (lines 537-558). In short, actin filaments extending in the apical-to-basal direction are relatively static and late endosomes receive sliding forces along the apical-basal axis by means of myosins (e.g., myosin V and myosin II) in heterotypic fusion. Thus, depolymerization of actin filaments by cytochalasin D treatment reduces heterotypic fusion, and conversely stabilization of actin filaments by jasplakinolide increases heterotypic fusion.

      (5) The authors need to analyze the distribution of actin filaments during homotypic fusion post-Cytochalasin D treatment.

      As requested by the referee, we have performed additional experiments to show the distribution of actin filaments during homotypic fusion of late endosomes after pretreatment with cytochalasin D. We show the data in a new figure: Figure 7–figure supplement 3.

      (6) Clarification is needed on whether overexpressing YFP-Cofilin led to the deterioration of cell functions.

      Thank you for the comments. As the reviewer pointed out, overexpression of cofilin can change cellular functions and actin architectures in cells (Aizawa et al., 1997; Popow-Wozniak et al., Histochem. Cell Biol., 2012, (138) 725-36). Although we did not observe apparent morphological changes of VE cells after YFP-cofilin expression, we cannot exclude the possibility that YFP-cofilin overexpression affected the distribution of actin filaments. Therefore, we have described this possibility in the text (lines 425-426).

      (7) Although the authors report that the S3 peptide does not affect heterotypic fusion, a reduction in average heterotypic fusion frequency post-treatment was detected (Figure 9G). The authors need to perform a statistical analysis of the quantification performed in Figure 9G.

      We apologize for this misleading graph representation. Because S3 peptide treatment did not change the fusion frequency significantly, we simply did not mark statistical significance in the previous graph. To clarify this point, we have added the label “n.s.” (not significant) to Figure 9G.

      (8) The authors need to provide the potential functional significance of apically extended actin filaments in positioning late endosomes in the discussion.

      We observed 3 different types of actin filaments in the apical region of VE cells (Figure 5). First, the actin mesh in zone 1, which does not interact directly with late endosomes, may function as a barrier preventing enlarged late endosomes from flowing backward from zone 2 to zone 1. Second, actin filaments extending from the apical to the basal direction on the surface of late endosomes are necessary for the movement of late endosomes toward lysosomes in a myosin-dependent manner. Third, the radial branched filaments on the surface of late endosomes temporarily polymerize in an Arp2/3-dependent manner and regulate the lateral movement of late endosomes. This actin organization coordinately regulates the position of late endosomes. We have added this explanation to the Discussion (lines 483-491).

      Reviewer #2 (Recommendations For The Authors):

      (1) What is the effect or physiological significance of the transition in fusion models?

      In material transport in cells, explosive fusion that completes membrane fusion quickly is more efficient and physiologically advantageous than slow bridge fusion. On the other hand, larger vesicle size is more effective in membrane trafficking than smaller size because large vesicles can transport a large amount of cargo molecules. However, as our mathematical modeling predicts, an increase in vesicle size leads to bridge fusion and decreases the transportation rate. Actin forces can resolve these conflicting effects because they convert the fusion mode from bridge to explosive in late endosomes in VE cells.

      (2) I am confused about how to study heterotypic fusion between late endosomes and lysosomes using only transferrin labeling.

      We are sorry for any confusion this may have caused. Indeed, at first, we discovered that late endosomes shrank and disappeared after labeling of endocytic vesicles with transferrin only (Figure 3A). However, subsequently, we speculated that this disappearance was the result of heterotypic fusion with lysosomes, and to prove this possibility, we developed a double-labeling method in which late endosomes and lysosomes were labeled with 2 different colors (Figure 3B). In short, VE cells were incubated with dextran rhodamine for 20 min and subsequently pulse-labeled with Alexa Fluor 488-labeled transferrin for 5 min: when VE cells were observed, dextran rhodamine was already transported to lysosomes, whereas Alexa Fluor 488-labeled transferrin was still present in late endosomes, enabling the two vesicles to be observed separately.

      Reviewer #3 (Recommendations For The Authors):

      (1) It is concerning that there are several points that are not fully explained regarding microscopic image analysis.

      (a) How were zones 1, 2, and 3 defined and how were the zones determined at each observation? Did the authors determine the zones subjectively based on the approximate size of the vesicles and the passage of time, or statistically by measuring endosomes from images of whole cells? The authors should describe this and also provide the approximate z-directional thickness of each of zones 1, 2, and 3.

      Thank you for pointing out this important issue, which is also raised by Reviewer #1. We initially analyzed the distribution and size of early endosomes, late endosomes, and lysosomes in VE cells by use of vesicle-specific markers (Figure 1B). Thereafter, at each observation, we determined the zones based on the characteristic size of the vesicles and time after labeling of endocytic vesicles. Especially, images of zone 2 and zone 3 were taken by focusing on the z-axis where late endosomes and lysosomes occupied the largest area in the optical slice images, respectively (lines 636-639). As for the z-directional thickness of each zone, we have added a description to the text (lines 123-124, 127-128, and 130-131) and also created a new figure showing their positions in VE cells (Figure 1–figure supplement 1A).

      (b) Regarding "vesicle size" measured from confocal microscopy images: Does "vesicle size" mean surface area or maximum cross-sectional area? In any case, the authors should describe how and what area of the vesicles was measured from the images. The mathematical model used the surface area of the vesicle as a parameter. Better to be consistent.

      Thank you for the important questions. As the reviewer pointed out, the cross-sectional area of endosomes varies depending on the focal plane. To ensure uniformity of the focal plane across different images, we took the images by focusing on the z-axis where late endosomes (zone 2) or lysosomes (zone 3) occupied the largest area in the image. In the focal plane, we measured the size of all intact, unfragmented endosomes. We have now added this explanation to the Method section (lines 636-639).

      (c) The authors showed several time-lapse imaging data without a description of what "0 s" is the starting time of. For example, "0 s" in Figures 2A, B, 3A, and B, may have different meanings. Other data should be carefully examined and described.

      We apologize for the inadequate description. As the reviewer pointed out, each panel has a different meaning of "0 s." Therefore, we have added an explanation of the meaning of “0 s” to the relevant figure legends (Figure 2A, B; Figure 3A, B; Figure 6A, F; Figure 7A, E, F; Figure 8A, Figure 6–figure supplement 1C, Figure 7–figure supplement 1B, Figure 7–figure supplement 3, Figure 7–figure supplement 4).

      (d) The meaning of "fusion time" in Figures 2D and 3F is unclear. Although it was speculated that the authors estimated it from the change in shape of the vesicles, how it was measured should be described.

      We apologize for the inadequate description. To indicate more clearly, we have added an explanation of the "fusion time" to the legend of Figures 2D and 3F (lines 898-899 and line 923, respectively).

      (2) The structure of the paragraph starting on line 158 is inappropriate. The authors state in line 159 that "this disappearance appeared to result from fusion of late endosomes with the underlying lysosomes". However, no hetero-fusion was observed here, only the disappearance of vesicles. The authors should mention that hetero-fusion occurred only after analysis of Figure 3CD.

      This reviewer thinks it is natural to state in this order: first, the disappearance of transferrin-positive vesicles was observed (Figure 3A). Such vesicles became dextran-positive as the transferrin signal began to disappear (Figures 3B, C, D). Thus, this is thought to indicate that hetero-fusion has occurred.

      We agree with the reviewer's comment and have rewritten the text following the reviewer's suggestion (lines 163-165, 176-180).

      (3) The mathematical model estimated that the vesicle size of 0.22-1.0 μm² is the size to switch the fusion mode. Since this is close to the size of endosomes in general cells, the authors may be able to discuss the generality of the fusion mode theory. It is up to the author to respond to this suggestion or not.

      Thank you for the comments. As our mathematical model depends on the assumption that the osmotic pressure is constant, late endosomes in VE cells, exhibiting a swollen morphology, may have higher osmotic pressure compared with endosomes in other cells and if so, the predicted vesicle size when the fusion mode switches may differ. Thus, we have decided not to mention the relationship between the vesicle size and fusion mode switching.

      (4) In Line 302 the authors mentioned "These results indicated that actin spots on the surface of late endosomes were dynamically regulated, especially in the apical area." However, the t-halves of 11.5s and 18.9s are only slightly different and of the same order, so it would be too much to say that dynamic regulation of actin occurs specifically in the apical region from a difference of this magnitude. The authors should weaken their arguments. It would be good to do a statistical test for significance between the FRAP data.

      Thank you for pointing out this important issue. To highlight the significant difference in the FRAP assay, we have added a new panel showing the statistical analysis of the half-time of recovery of each region of VE cells (Figure 6E). These data indicate a significant difference in the half-time of recovery (t1/2) between actin spots in the apical and basal regions of zone 2. However, following the reviewer’s comment, we have weakened the description of the FRAP assay results (lines 310-312).

      (5) The discussion section is rather redundant. It could be shortened to be more concise instead of repeating the results.

      Thank you for the comments. We have shortened the Discussion section.

      Minor comments

      In Figure 2C, the statistical test method was not described in the legend.

      Thank you for the comments. We have added the data of the statistical test to the figure legend of Figure 2C (lines 895-896).

      Figure 3G does not look like a normal distribution, so the t-test is inappropriate.

      Thank you for the comments. We have changed the statistical analysis method and used the Mann-Whitney U test. For the same reason, we have changed the analysis method shown in Figure 2C.
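
      For readers unfamiliar with the rank-based test mentioned above, the following is a minimal sketch of a Mann-Whitney U test written with the Python standard library only. It is not the authors' analysis: the function name and the example size values are hypothetical, and the p-value uses the large-sample normal approximation without tie or continuity correction, which is only a rough guide for small samples.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value via normal approximation)."""
    n1, n2 = len(x), len(y)
    # U counts the (x, y) pairs in which the x value is larger; ties count 0.5.
    u1 = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mean_u) / sd_u
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u1, p_value


# Hypothetical vesicle sizes (arbitrary units) for two groups of endosomes.
fused = [1.2, 1.5, 1.9, 2.4, 3.1]
all_vesicles = [0.4, 0.6, 0.7, 0.9, 1.3]
u_stat, p_val = mann_whitney_u(fused, all_vesicles)
print(u_stat, p_val)
```

      Because the test compares ranks rather than means, it does not assume normally distributed sizes, which is the reviewer's concern with the t-test for skewed distributions.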

      Is Figure 5D the image of zone 1 because it is close to the apical plane? If so, are the IgG-positive structures early endosomes rather than late endosomes? This seems inconsistent with the data in Figure 1.

      Thank you for the comments. The round vesicles observed in this panel are the late endosomes in zone 2. Because most of the internalized fluorescence marker has moved to the late endosomes in zone 2 at this time point (5 min after chasing), early endosomes are not labeled in this image. We have added a dotted line to the x-z axis image (the second top panel) to indicate the depth of the x-y axis image (top panel) in Figure 5D.

      Figure 6B appears to have little or no fluorescence recovery. Is this a typical example? It is also unclear if this is an apical or basal example.

      Thank you for the comments. This image is a typical example. We focused on the dot structures on the surface of late endosomes rather than the fluorescence intensity over the entire photobleached area. To prevent misunderstanding, we have added arrowheads to highlight the actin dot structures that we were analyzing. The FRAP data shown in Figure 6B were obtained at the apical region of zone 2. We have also added this information to the figure legend.

    1. eLife assessment

      This is an important behavioral, pharmacological intervention study of the effects of the catecholamine reuptake inhibitor methylphenidate (MPH) on value-based decision-making using a combination of aversive and appetitive Pavlovian to Instrumental Transfer (PIT) in a human cohort (n=100). The design used drug dosing after learning, allowing the convincing interpretation of catecholamines being involved in the decision process, an effect dependent on baseline working memory capacity. The results also challenge the view that catecholamines operate by modulating behavioural invigoration alone.

    2. Reviewer #1 (Public review):

      Summary:

      The authors use methylphenidate (MPH) administration after learning a Pavlovian to instrumental transfer (PIT) task to parse decision-making from instrumental influences. While the main effects were null, individual differences in working memory ability moderated the tendency of MPH to boost cognitive control in order to override PIT-biased instrumental learning. Importantly, this working memory moderator had symmetrical effects in appetitive and aversive conditions, and these patterns replicated within each valence condition across different values of gain/loss (Fig S1c), suggesting a reliable effect that is generalized across instances of Pavlovian influence.

      Strengths:

      The idea of using pharmacological challenge after learning but prior to transfer is a novel technique that highlights the influence of catecholamines on the expression of learning under Pavlovian bias, and importantly it dissociated this decision feature from the learning of stimulus-outcome or action-outcome pairings.

      Weaknesses:

      While the report is largely straightforward and clearly written, some aspects may be edited to improve the clarity for other readers.

      1) Theoretical clarity. The authors seem to hedge their bets when it comes to placing these findings within a broader theoretical framework.

      2) Analytic clarity: what's c^2?

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Geurts et al. investigated the effects of the catecholamine reuptake inhibitor methylphenidate (MPH) on value-based decision-making using a combination of aversive and appetitive Pavlovian to Instrumental Transfer (PIT) in a human cohort. Using an elegant behavioural design, they showed valence- and action-specific effects of Pavlovian cues on instrumental responses. Initial analyses show no effect of MPH on these processes. However, the authors performed a more in-depth analysis and demonstrated that MPH actually modulates PIT in an action-specific manner depending on individual working memory capacities. The authors interpret this as an effect on cognitive control of Pavlovian biasing of actions and decision-making rather than an invigoration of motivational biases.

      Strengths:

      A major strength of this study is its experimental design. The elegant combination of appetitive and aversive Pavlovian learning with approach/avoidance instrumental actions allows precise investigation of how value-based decision-making is modulated differently depending on the context and environmental stimuli. Importantly, MPH is only administered after Pavlovian and instrumental learning, restricting the effect to PIT performance only. Finally, the use of a placebo-controlled crossover design allows within-subject comparisons between PIT effects under placebo and MPH and the investigation of the relationships between working memory abilities, PIT and MPH effects.

      Weaknesses:

      As the authors stated in their discussion, this study is purely correlational and their conclusions could be strengthened by the addition of interesting (but time- and resource-consuming) neuroimaging work.

      The originality of this work compared to their previous published work using the same cohort could also be clarified at different stages of the article, as I initially wondered what was really novel. This point is much clearer in the discussion section.

      A point which, in my opinion, really requires clarification is when the working memory performance presented in Figure 2B has been determined. Was it under placebo (as I would guess) or under MPH? If it is the former, it would be also interesting to look at how MPH modulates working memory based on initial abilities.

      A final point is that it could be interesting to also discuss these results, not only regarding dopamine signalling, but also including potential effect of MPH on noradrenaline in frontal regions, considering the known role of this system in modulating behavioural flexibility.

    4. Reviewer #3 (Public review):

      The manuscript by Geurts and colleagues studies the effects of methylphenidate on Pavlovian to instrumental transfer in humans and demonstrates that the effects of the drug depend on the baseline working memory capacity of the participants. The experiment used a well established cognitive task that allows to measure the effects of Pavlovian cues predicting monetary wins and losses on instrumental responding in two different contexts, namely approach and withdraw. By administering the drug after participants went through the instrumental and Pavlovian learning phases of the experiment, the authors limited the effects of the drug to the transfer phase in extinction. This allowed the authors to make inference about the invigorating effects of the cues independently from any learning bias. Moreover, the authors employed a within subject design to study the effect of the drug on 100 participants, which also allows to detect continuous between-subject relationships with covariates such as working memory capacity.

      The study replicates previous findings using this task, namely that appetitive cues promote active responding, and aversive cues promote passive responding in an approach instrumental context, whereas the effect of the cues reverses in a withdraw instrumental context. The results of the methylphenidate manipulation show that the drug decreases the effects of the Pavlovian cues on instrumental responding in participants with low working memory capacity but increases the Pavlovian effects in participants with high working memory capacity. Importantly, in the latter group, methylphenidate increases the invigorating effect of appetitive Pavlovian cues on active approach and aversive Pavlovian cues on active withdrawal as well as the inhibitory effects of aversive Pavlovian cues on active approach and appetitive Pavlovian cues on active withdrawal. These results cannot be explained if catecholamines are just involved in Pavlovian biases by modulating behavioral invigoration driven by the anticipation of reward and punishment in the striatum, as this account can't account for the reversal of the effects of a valence cue on vigor depending on the instrumental context.

      In general, I find the methods of this study very robust and the results very convincing and important. However, I have some concerns:

      I am not convinced that the inclusion of impulsivity scores in the logistic mixed model to analyze the effects of methylphenidate on PIT is warranted. The authors do not show whether inclusion of this covariate is justified in terms of BIC. Moreover, they include this covariate but do not report the effects. Finally, it is possible that impulsivity is correlated with working memory capacity. In that case, multicollinearity may impact the estimation of the coefficient estimates and may inflate the p-values for the correlated covariates. Are the reported results robust when this factor is not included?

      The authors state that working memory capacity is an established proxy for dopamine synthesis capacity and cite some studies supporting this view. However, the authors omit a recent reference by van den Bosch et al that provides evidence for the absence of links between striatal dopamine synthesis capacity and working memory capacity. The lack of a robust link between working memory capacity and dopamine synthesis capacity in the striatum strengthens the alternative explanations of the results suggested in the discussion.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors use methylphenidate (MPH) administration after learning a Pavlovian to instrumental transfer (PIT) task to parse decision-making from instrumental influences. While the main effects were null, individual differences in working memory ability moderated the tendency of MPH to boost cognitive control in order to override PIT-biased instrumental learning. Importantly, this working memory moderator had symmetrical effects in appetitive and aversive conditions, and these patterns replicated within each valence condition across different values of gain/loss (Fig S1c), suggesting a reliable effect that is generalized across instances of Pavlovian influence.

      Strengths:

      The idea of using pharmacological challenge after learning but prior to transfer is a novel technique that highlights the influence of catecholamines on the expression of learning under Pavlovian bias, and importantly it dissociated this decision feature from the learning of stimulus-outcome or action-outcome pairings.

      We thank the reviewer for highlighting the timing of the pharmacological intervention as a strength for this study and for the suggested improvements for clarification.

      Weaknesses:

      While the report is largely straightforward and clearly written, some aspects may be edited to improve the clarity for other readers.

      (1) Theoretical clarity. The authors seem to hedge their bets when it comes to placing these findings within a broader theoretical framework.

      Our findings ask for a revision of theories on how catecholamines are involved in the instantiation of Pavlovian biases in decision making. The reviewer rightly notices that we offer three neuroanatomical routes through which methylphenidate might have acted to elicit these effects, each of which would modify current theory to incorporate our findings. Briefly, these routes are (i) modulation by catecholamines of a striatal ‘origin’ of Pavlovian biases, (ii) catecholaminergic modulation of Pavlovian biases through top-down control, primarily relying on prefrontal processes, and (iii) a combination of the two, where catecholamines regulate the balance between these frontal and striatal processes. It is important to note, however, that the current study does not provide evidence that can disentangle these hypotheses, which therefore raise questions for future research.

      Given the systemic nature of the pharmacological manipulation, we cannot dissociate between these three accounts. We believe that discussing these possible explanations enriches our Discussion and strengthens our recommendation in the ultimate paragraph to use pharmacological neuroimaging studies to arbitrate between these options. In the revision, we will make this line of reasoning clearer.

      (2) Analytic clarity: what's c^2?

      The C^2 appears to be a PDF conversion error: all chi-squares (Χ2) were converted to C2. This will be corrected in our revision.

      Reviewer #2 (Public review):

      Summary:

      In this study, Geurts et al. investigated the effects of the catecholamine reuptake inhibitor methylphenidate (MPH) on value-based decision-making using a combination of aversive and appetitive Pavlovian to Instrumental Transfer (PIT) in a human cohort. Using an elegant behavioural design, they showed valence- and action-specific effects of Pavlovian cues on instrumental responses. Initial analyses show no effect of MPH on these processes. However, the authors performed a more in-depth analysis and demonstrated that MPH actually modulates PIT in an action-specific manner depending on individual working memory capacities. The authors interpret this as an effect on cognitive control of Pavlovian biasing of actions and decision-making rather than an invigoration of motivational biases.

      Strengths:

      A major strength of this study is its experimental design. The elegant combination of appetitive and aversive Pavlovian learning with approach/avoidance instrumental actions allows precise investigation of the different modulation of value-based decision making depending on the context and environmental stimuli. Importantly, MPH is only administered after Pavlovian and instrumental learning, restricting the effect to PIT performance only. Finally, the use of a placebo-controlled crossover design allows within-subject comparisons between PIT effects under placebo and MPH and the investigation of the relationships between working memory abilities, PIT and MPH effects.

      We thank the reviewer for highlighting the experimental design as a strength for this study and the suggested improvements for clarification.

      Weaknesses:

      As authors stated in their discussion, this study is purely correlational and their conclusions could be strengthened by the addition of interesting (but time- and resource-consuming) neuroimaging work.

      We employ a pharmacological intervention within a randomized placebo controlled cross-over design, which allows for causal inferences with respect to the placebo-controlled intervention. Thus, the reported interactions of interest include correlations, but these are causally dependent on our intervention.

      Perhaps the reviewer refers to the implications of our findings for hypotheses regarding the neural implementation of Pavlovian bias generation. Based on our data, we are indeed not able to arbitrate between frontal and striatal accounts, due to the systemic nature of the pharmacological intervention. As we discuss, we agree with the reviewer that neuroimaging (in combination with, for example, brain stimulation) would be a valuable next step to identify the neural correlates of these pharmacological intervention effects and to dissociate between frontal and striatal drivers of the effects. In our planned revisions, we will try to clarify this point, as per our reply to reviewer 1.

      The originality of this work compared to their previous published work using the same cohort could also be clarified at different stages of the article, as I initially wondered what was really novel. This point is much clearer in the discussion section.

      As recommended, in our planned revisions, we will bring forward the statements that clarify the originality of the current experiment.

      A point which, in my opinion, really requires clarification is when the working memory performance presented in Figure 2B has been determined. Was it under placebo (as I would guess) or under MPH? If it is the former, it would be also interesting to look at how MPH modulates working memory based on initial abilities.

      We will also clarify that working memory span was assessed for all participants on Day 2 prior to the start of instrumental training (as illustrated in figure 1A). Importantly, this was done prior to ingestion of the drug or placebo (which subjects received after Pavlovian training, which followed the instrumental training). This design also precludes an assessment of the effects of MPH on working memory capacity.

      A final point is that it could be interesting to also discuss these results, not only regarding dopamine signalling, but also including potential effect of MPH on noradrenaline in frontal regions, considering the known role of this system in modulating behavioural flexibility.

      We indeed focus our Discussion more on dopamine than on noradrenaline. Our revision will follow up on the suggestion of the reviewer to include discussion about the effects of MPH on noradrenaline and behavioural flexibility (and the recommendation, in future studies, to use a multi-drug design, incorporating, for example, a session with the drug atomoxetine, which modulates cortical catecholamines, but not striatal dopamine).

      Reviewer #3 (Public review):

      The manuscript by Geurts and colleagues studies the effects of methylphenidate on Pavlovian to instrumental transfer in humans and demonstrates that the effects of the drug depend on the baseline working memory capacity of the participants. The experiment used a well established cognitive task that allows to measure the effects of Pavlovian cues predicting monetary wins and losses on instrumental responding in two different contexts, namely approach and withdraw. By administering the drug after participants went through the instrumental and Pavlovian learning phases of the experiment, the authors limited the effects of the drug to the transfer phase in extinction. This allowed the authors to make inference about the invigorating effects of the cues independently from any learning bias. Moreover, the authors employed a within subject design to study the effect of the drug on 100 participants, which also allows to detect continuous between-subject relationships with covariates such as working memory capacity.

      The study replicates previous findings using this task, namely that appetitive cues promote active responding, and aversive cues promote passive responding in an approach instrumental context, whereas the effect of the cues reverses in a withdraw instrumental context. The results of the methylphenidate manipulation show that the drug decreases the effects of the Pavlovian cues on instrumental responding in participants with low working memory capacity but increases the Pavlovian effects in participants with high working memory capacity. Importantly, in the latter group, methylphenidate increases the invigorating effect of appetitive Pavlovian cues on active approach and aversive Pavlovian cues on active withdrawal as well as the inhibitory effects of aversive Pavlovian cues on active approach and appetitive Pavlovian cues on active withdrawal. These results cannot be explained if catecholamines are just involved in Pavlovian biases by modulating behavioral invigoration driven by the anticipation of reward and punishment in the striatum, as this account can't account for the reversal of the effects of a valence cue on vigor depending on the instrumental context.

      In general, I find the methods of this study very robust and the results very convincing and important. However, I have some concerns:

      We thank the Reviewer for highlighting the robustness of the methods and the importance of the results. We are glad to briefly address the concerns here and will incorporate these in our planned revision of the manuscript.

      I am not convinced that the inclusion of impulsivity scores in the logistic mixed model to analyze the effects of methylphenidate on PIT is warranted. The authors do not show whether inclusion of this covariate is justified in terms of BIC. Moreover, they include this covariate but do not report the effects. Finally, it is possible that impulsivity is correlated with working memory capacity. In that case, multicollinearity may impact the estimation of the coefficient estimates and may inflate the p-values for the correlated covariates. Are the reported results robust when this factor is not included?

      With regard to the inclusion of impulsivity, we would first like to mention that this inclusion in our analyses was planned a priori and therefore consistently implemented in the other reports resulting from the overarching study (Froböse et al., 2018; Cook et al., 2019; Rostami Kandroodi et al., 2021), especially the study with regard to which the current report is an eLife Research Advance (Swart et al., 2017). Moreover, we preregistered both working memory span and impulsivity as potential factors (under secondary measures) that could mediate the effects of catecholamines (see https://onderzoekmetmensen.nl/nl/trial/26989). The inclusion of working memory span was based on evidence from PET imaging studies demonstrating a link with dopamine synthesis capacity (Cools et al., 2008; Landau et al., 2009), whereas the inclusion of trait impulsivity was based on evidence from other PET imaging studies showing a link with dopamine (auto)receptor availability (Buckholtz et al., 2010; Kim et al., 2014; Lee et al., 2009; Reeves et al., 2012). Although there was no significant improvement in BIC for the model with impulsivity compared with the model without impulsivity, we feel that we should follow our a priori established analyses.

      We can confirm that impulsivity and working memory were not correlated in this sample (r98 = -0.16, p = 0.88), which rules out multicollinearity.
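      To make the multicollinearity argument concrete, the following minimal sketch converts a correlation between two covariates into a variance inflation factor (VIF). It assumes that only these two covariates are at issue and ignores the interaction terms of the full mixed model; the function name `vif_two_predictors` is hypothetical and not taken from the authors' analysis code.

```python
def vif_two_predictors(r: float) -> float:
    """Variance inflation factor when a model has exactly two correlated predictors.

    With two predictors, the R^2 from regressing one on the other equals r^2,
    so VIF = 1 / (1 - r^2). A VIF near 1 means the coefficient variance is
    barely inflated by the correlation between the predictors.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)


if __name__ == "__main__":
    # Correlation reported in the response above (r = -0.16).
    r = -0.16
    print(f"shared variance r^2 = {r * r:.4f}")
    print(f"VIF = {vif_two_predictors(r):.3f}")
```

      Under these assumptions, a correlation of -0.16 corresponds to a VIF of about 1.03, far below the values of 5 to 10 that are commonly used as rules of thumb for problematic collinearity.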

      Most importantly, results are robust to excluding impulsivity scores as evidenced by a significant four-way interaction from the omnibus GLMM without impulsivity (Action Context x Valence x Drug x WM span: Χ2 = 9.5, p = 0.002). We will report these findings in the revised manuscript.

      The authors state that working memory capacity is an established proxy for dopamine synthesis capacity and cite some studies supporting this view. However, the authors omit a recent reference by van den Bosch et al that provides evidence for the absence of links between striatal dopamine synthesis capacity and working memory capacity. The lack of a robust link between working memory capacity and dopamine synthesis capacity in the striatum strengthens the alternative explanations of the results suggested in the discussion.

      We agree with the Reviewer that the lack of a robust link between working memory capacity and dopamine synthesis capacity in the striatum, as measured with [18F]-FDOPA PET imaging, lends support to the proposed hypothesis incorporating a broader perspective on Pavlovian bias generation than the dopaminergic direct/indirect pathway account (although it is possible that the association will hold in a larger sample when synthesis capacity is measured with [18F]-FMT PET imaging, which is sensitive to a different component of the metabolic pathway). We will indeed incorporate in our planned revision the findings from our group reported in van den Bosch et al. (2022).

    1. eLife assessment

      This important study offers a powerful empirical test of a highly influential hypothesis in population genetics. It incorporates a large number of animal genomes spanning a broad phylogenetic spectrum and treats them in a rigorous unified pipeline, providing the convincing negative result that effective population size scales neither with the content of transposable elements nor with overall genome size. These observations demonstrate that there is still no simple, global hypothesis that can explain the observed variation in transposable element content and genome size in animals.

    2. Reviewer #1 (Public Review):

      Summary:

      One enduring mystery involving the evolution of genomes is the remarkable variation they exhibit with respect to size. Much of that variation is due to differences in the number of transposable elements, which often (but not always) correlates with the overall quantity of DNA. Amplification of TEs is nearly always either selectively neutral or negative with respect to host fitness. Given that larger effective population sizes are more efficient at removing these mutations, it has been hypothesized that TE content, and thus overall genome size, may be a function of effective population size. The authors of this manuscript test this hypothesis by using a uniform approach to analysis of several hundred animal genomes, using the ratio of synonymous to nonsynonymous mutations in coding sequence as a measure of the overall strength of purifying selection, which serves as a proxy for effective population size over time. The data convincingly demonstrates that it is unlikely that effective population size has a strong effect on TE content and, by extension, overall genome size (except for birds).

      Strengths:

      Although this ground has been covered before in many other papers, the strength of this analysis is that it is comprehensive and treats all the genomes with the same pipeline, making comparisons more convincing. Although this is a negative result, it is important because it is relatively comprehensive and indicates that there will be no simple, global hypothesis that can explain the observed variation.

      Weaknesses:

      In several places, I think the authors slip between assertions of correlation and assertions of cause-effect relationships not established in the results. In other places, the arguments end up feeling circular, based, I think, on those inferred causal relationships. It was also puzzling why plants (which show vast differences in DNA content) were ignored altogether.

    3. Reviewer #2 (Public Review):

      Summary:

      The Mutational Hazard Hypothesis (MHH) is a very influential hypothesis in explaining the origins of genomic and other complexity that seem to entail the fixation of costly elements. Despite its influence, very few tests of the hypothesis have been offered, and most of these come with important caveats. This lack of empirical tests largely reflects the challenges of estimating crucial parameters.

      The authors test the central contention of the MHH, namely that genome size follows effective population size (Ne). They marshal a lot of genomic and comparative data, test the viability of their surrogates for Ne and genome size, and use correct methods (phylogenetically corrected correlation) to test the hypothesis. Strikingly, they not only find that Ne is not THE major determinant of genome size, as is argued by MHH, but that there is not even a marginally significant effect. This is remarkable, making this an important paper.

      Strengths:

      The hypothesis tested is of great importance.

      The negative finding is of great importance for reevaluating the predictive power of the tested hypothesis.

      The test is straightforward and clear.

      The analysis is a technical tour-de-force, convincingly circumventing a number of challenges of mounting a true test of the hypothesis.

      Weaknesses:

      I note no particular weaknesses, but I believe the paper could be further strengthened in three major ways.

      (1) The authors should note that the hypothesis that they are testing is larger than the MHH. The MHH hypothesis says that

      (i) low-Ne species have more junk in their genomes and

      (ii) this is because junk tends to be costly because of increased mutation rate to nulls, relative to competing non/less-junky alleles.

      The current results reject not just the compound (i+ii) MHH hypothesis, but in fact any hypothesis that relies on i. This is notably a (much) more important rejection. Indeed, whereas MHH relies on particular constructions of increased mutation rates of varying plausibility, the more general hypothesis i includes any imaginable or proposed cost to the extra sequence (replication costs, background transcription, costs of transposition, ectopic expression of neighboring genes, recombination between homologous elements, misaligning during meiosis, reduced organismal function from nuclear expansion, the list goes on and on). For those who find the MHH dubious on its merits, focusing this paper on the MHH reduces its impact - the larger hypothesis that the small costs of extra sequence dictate the fates of different organisms' genomes is, in my opinion, a much more important and plausible hypothesis, and thus the current rejection is more important than the authors let on.

      (2) In addition to the authors' careful logical and mathematical description of their work, they should take more time to show the intuition that arises from their data. In particular, just by looking at Figure 1b one can see what is wrong with the non-phylogenetically-corrected correlations that MHH's supporters use. That figure shows that mammals, many of which have small Ne, have large genomes regardless of their Ne, which suggests that the coincidence of large genomes and frequently small Ne in this lineage is just that, a coincidence, not a causal relationship. Similarly, insects by and large have large Ne regardless of their genome size, again suggesting that the coincidence in this lineage of generally large Ne and smaller genomes is not causal. Given that these two lineages are abundant on earth in addition to being overrepresented among available genomes (and were even more overrepresented when the foundational MHH papers collected available genomes), it begins to emerge how one can easily end up with a spurious non-phylogenetically corrected correlation: grab a few insects, grab a few mammals, and you get a correlation. Notably, the same holds for lineages not included here but that are highly represented in our databases (and all the more so 20 years ago): yeasts related to S. cerevisiae (generally small genomes and large median Ne despite variation) and angiosperms (generally large genomes (compared to most eukaryotes) and small median Ne despite variation). Pointing out these clear patterns will help non-specialists to understand why the current analysis is not merely a they-said-them-said case, but offers an explanation for why the current authors' conclusions differ from those of the MHH's supporters and moreover explains what is wrong with the MHH's supporters' arguments.

      (3) A third way in which the paper is more important than the authors let on is in the striking degree of the failure of MHH here. MHH does not merely claim that Ne is one contributor to genome size among many; it claims that Ne is THE major contributor, which is a much, much stronger claim. That no evidence exists in the current data for even the small claim is a remarkable failure of the actual MHH hypothesis: the possibility is quite remote that Ne is THE major contributor but that one cannot even find a marginally significant correlation in a huge correlation analysis deriving from a lot of challenging bioinformatic work. Thus this is an extremely strong rejection of the MHH. The MHH is extremely influential and yet very challenging to test clearly. Frankly, the authors would be doing the field a disservice if they did not more strongly state the degree of importance of this finding.
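      The intuition in point (2), that pooling a few species from two distinct lineages can yield a strong correlation even when none exists within either lineage, can be illustrated with a small sketch. All numbers below are synthetic and chosen only for illustration; they are not taken from the manuscript, and the within-group comparison is a simplified stand-in for a proper phylogenetic correction such as independent contrasts.

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Synthetic "mammal-like" clade: small Ne proxy, large genomes, no internal trend.
clade_a_ne = [1.0, 2.0, 3.0, 4.0]
clade_a_genome = [10.0, 11.0, 11.0, 10.0]

# Synthetic "insect-like" clade: large Ne proxy, small genomes, no internal trend.
clade_b_ne = [11.0, 12.0, 13.0, 14.0]
clade_b_genome = [2.0, 3.0, 3.0, 2.0]

r_within_a = pearson_r(clade_a_ne, clade_a_genome)
r_within_b = pearson_r(clade_b_ne, clade_b_genome)
r_pooled = pearson_r(clade_a_ne + clade_b_ne, clade_a_genome + clade_b_genome)

if __name__ == "__main__":
    print(f"within clade A: r = {r_within_a:.2f}")
    print(f"within clade B: r = {r_within_b:.2f}")
    print(f"pooled, ignoring clade membership: r = {r_pooled:.2f}")
```

      In this toy data set the correlation is zero within each clade but strongly negative (about -0.97) once the clades are pooled, because the pooled signal reflects only the difference between the two clade means.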

    4. Reviewer #3 (Public Review):

      The Mutational Hazard Hypothesis (MHH) suggests that lineages with smaller effective population sizes should accumulate slightly deleterious transposable elements leading to larger genome sizes. Marino and colleagues tested the MHH using a set of 807 vertebrate, mollusc, and insect species. The authors mined repeats de novo and estimated dN/dS for each genome. Then, they used dN/dS and life history traits as reliable proxies for effective population size and tested for correlations between these proxies and repeat content while accounting for phylogenetic nonindependence. The results suggest that overall, lineages with lower effective population sizes do not exhibit increases in repeat content or genome size. This contrasts with expectations from the MHH. The authors speculate that changes in genome size may be driven by lineage-specific host-TE conflicts rather than effective population size.

      The general conclusions of this paper are supported by a powerful dataset of phylogenetically diverse species. The use of C-values rather than assembly size for many species (when available) helps mitigate the challenges associated with the underrepresentation of repetitive regions in short-read-based genome assemblies. As expected, genome size and repeat content are highly correlated across species. Nonetheless, the authors report divergent relationships between genome size and dN/dS and TE content and dN/dS in multiple clades: Insecta, Actinopteri, Aves, and Mammalia. These discrepancies are interesting but could reflect biases associated with the authors' methodology for repeat detection and quantification rather than the true biology.

      The authors used dnaPipeTE for repeat quantification. Although dnaPipeTE is a useful tool for estimating TE content when genome assemblies are not available, it exhibits several biases. One of these is that dnaPipeTE seems to consistently underestimate satellite content (compared to RepeatMasker on assembled genomes; see Goubert et al. 2015). Satellites comprise a significant portion of many animal genomes and are likely significant contributors to differences in genome size. This should have a stronger effect on results in species where satellites comprise a larger proportion of the genome relative to other repeats (e.g. Drosophila virilis, >40% of the genome (Flynn et al. 2020); Triatoma infestans, 25% of the genome (Pita et al. 2017) and many others). For example, the authors report that only 0.46% of the Triatoma infestans genome is "other repeats" (which include simple repeats and satellites). This contrasts with previous reports of ≥25% satellite content in Triatoma infestans (Pita et al. 2017). Similarly, this study's results for "other" repeat content appear to be consistently lower for Drosophila species relative to previous reports (e.g. de Lima & Ruiz-Ruano 2022). The most extreme case of this is for Drosophila albomicans where the authors report 0.06% "other" repeat content when previous reports have suggested that 18%->38% of the genome is composed of satellites (de Lima & Ruiz-Ruano 2022). It is conceivable that occasional drastic underestimates or overestimates for repeat content in some species could have a large effect on Coevol results, but a minimal effect on more general trends (e.g. the overall relationship between repeat content and genome size).

      Another bias of dnaPipeTE is that it does not detect ancient TEs as well as more recently active TEs (Goubert et al. 2015). Thus, the repeat content used for PIC and Coevol analyses here is inherently biased toward more recently inserted TEs. This bias could significantly impact the inference of long-term evolutionary trends.

    5. Author response:

      Reviewer #1:

      Summary:

      One enduring mystery involving the evolution of genomes is the remarkable variation they exhibit with respect to size. Much of that variation is due to differences in the number of transposable elements, which often (but not always) correlates with the overall quantity of DNA. Amplification of TEs is nearly always either selectively neutral or negative with respect to host fitness. Given that larger effective population sizes are more efficient at removing these mutations, it has been hypothesized that TE content, and thus overall genome size, may be a function of effective population size. The authors of this manuscript test this hypothesis by using a uniform approach to analysis of several hundred animal genomes, using the ratio of synonymous to nonsynonymous mutations in coding sequence as a measure of the overall strength of purifying selection, which serves as a proxy for effective population size over time. The data convincingly demonstrates that it is unlikely that effective population size has a strong effect on TE content and, by extension, overall genome size (except for birds).

      Strengths:

      Although this ground has been covered before in many other papers, the strength of this analysis is that it is comprehensive and treats all the genomes with the same pipeline, making comparisons more convincing. Although this is a negative result, it is important because it is relatively comprehensive and indicates that there will be no simple, global hypothesis that can explain the observed variation.

      Weaknesses:

      In several places, I think the authors slip between assertions of correlation and assertions of cause-effect relationships not established in the results. 

      Several times in the text we use the expression “effect of dN/dS on…”, which might indeed suggest a causal relationship. The phrasing refers to dN/dS being used in the regression as an independent variable that may predict the variation of the dependent variables, genome size and TE content. We are going to rephrase these expressions so that correlation is not mistaken for causation.

      In other places, the arguments end up feeling circular, based, I think, on those inferred causal relationships. It was also puzzling why plants (which show vast differences in DNA content) were ignored altogether.

      The analysis focuses on metazoans for two reasons: one practical and one fundamental. The practical reason is computational. Our analysis included TE annotation, phylogenetic estimation and dN/dS estimation, which would have been very difficult with the hundreds, if not thousands, of plant genomes available. If we had included plants, it would have been natural to include fungi as well, to have a complete set of multicellular eukaryotic genomes, adding to the computational burden. The second, fundamental reason is that plants show important genome size differences due to more frequent whole genome duplications (polyploidization) than animals do. It is therefore possible that the effect of selection on genome size differs between these two groups, which would have led us to treat them separately, decreasing the interest of this comparison. For these reasons we chose to focus on animals, which still provide very wide ranges of genome size and population size that are well suited to testing the impact of drift.

      Reviewer #2:

      Summary:

      The Mutational Hazard Hypothesis (MHH) is a very influential hypothesis in explaining the origins of genomic and other complexity that seem to entail the fixation of costly elements. Despite its influence, very few tests of the hypothesis have been offered, and most of these come with important caveats. This lack of empirical tests largely reflects the challenges of estimating crucial parameters.

      The authors test the central contention of the MHH, namely that genome size follows effective population size (Ne). They marshal a lot of genomic and comparative data, test the viability of their surrogates for Ne and genome size, and use correct methods (phylogenetically corrected correlation) to test the hypothesis. Strikingly, they not only find that Ne is not THE major determinant of genome size, as is argued by MHH, but that there is not even a marginally significant effect. This is remarkable, making this an important paper.

      Strengths:

      The hypothesis tested is of great importance.

      The negative finding is of great importance for reevaluating the predictive power of the tested hypothesis.

      The test is straightforward and clear.

      The analysis is a technical tour-de-force, convincingly circumventing a number of challenges of mounting a true test of the hypothesis.

      Weaknesses:

      I note no particular weaknesses, but I believe the paper could be further strengthened in three major ways.

      (1) The authors should note that the hypothesis that they are testing is larger than the MHH. The MHH hypothesis says that

      (i) low-Ne species have more junk in their genomes and

      (ii) this is because junk tends to be costly because of increased mutation rate to nulls, relative to competing non/less-junky alleles.

      The current results reject not just the compound (i+ii) MHH hypothesis, but in fact any hypothesis that relies on i. This is notably a (much) more important rejection. Indeed, whereas MHH relies on particular constructions of increased mutation rates of varying plausibility, the more general hypothesis i includes any imaginable or proposed cost to the extra sequence (replication costs, background transcription, costs of transposition, ectopic expression of neighboring genes, recombination between homologous elements, misaligning during meiosis, reduced organismal function from nuclear expansion, the list goes on and on). For those who find the MHH dubious on its merits, focusing this paper on the MHH reduces its impact - the larger hypothesis that the small costs of extra sequence dictate the fates of different organisms' genomes is, in my opinion, a much more important and plausible hypothesis, and thus the current rejection is more important than the authors let on.

      The MHH is arguably the most structured and influential theoretical framework proposed to date based on the null assumption (i), therefore setting the paper up with the MHH is somehow inevitable. Because of this, in the manuscript, we mostly discuss the peculiarities of TE biology that can drive the genome away from the MHH expectations, focusing on the mutational aspect. We however agree that the hazard posed by extra DNA is not limited to the gain of function via the mutation process, but can be linked to many other molecular processes as mentioned above. In a revised manuscript, we will make the concept of hazard more comprehensive and further stress that this applies not only to TEs but any nearly-neutral mutation affecting non-coding DNA.

      (2) In addition to the authors' careful logical and mathematical description of their work, they should take more time to show the intuition that arises from their data. In particular, just by looking at Figure 1b one can see what is wrong with the non-phylogenetically-corrected correlations that MHH's supporters use. That figure shows that mammals, many of which have small Ne, have large genomes regardless of their Ne, which suggests that the coincidence of large genomes and frequently small Ne in this lineage is just that, a coincidence, not a causal relationship. Similarly, insects by and large have large Ne regardless of their genome size, again suggesting that the coincidence in this lineage of generally large Ne and smaller genomes is not causal. Given that these two lineages are abundant on earth in addition to being overrepresented among available genomes (and were even more overrepresented when the foundational MHH papers collected available genomes), it begins to emerge how one can easily end up with a spurious non-phylogenetically corrected correlation: grab a few insects, grab a few mammals, and you get a correlation. Notably, the same holds for lineages not included here but that are highly represented in our databases (and all the more so 20 years ago): yeasts related to S. cerevisiae (generally small genomes and large median Ne despite variation) and angiosperms (generally large genomes (compared to most eukaryotes) and small median Ne despite variation). Pointing out these clear patterns will help non-specialists to understand why the current analysis is not merely a they-said-them-said case, but offers an explanation for why the current authors' conclusions differ from the MHH's supporters and moreover explains what is wrong with the MHH's supporters' arguments.

      We agree that comparing the dispersion of the points in the non-phylogenetically corrected correlation with the results of the phylogenetic contrasts intuitively emphasizes the importance of accounting for species relatedness. Just looking at the clade colors in Figure 2 makes it immediately apparent that a simple regression hides phylogenetic structure. We will stress this in the discussion to make the point clear.
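      The intuition discussed above can be illustrated with a toy calculation. The sketch below is not the phylogenetic independent contrasts used in the study; it is a simpler stand-in that compares a pooled Pearson correlation with within-clade correlations. All values, clade labels, and the `pearson_r` helper are hypothetical and chosen only to show how two clades with different averages can produce a strong pooled correlation even when no relationship exists inside either clade.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical values: a drift proxy (e.g. dN/dS) and a genome size (Gb).
# Clade A has a high proxy and large genomes; clade B has the opposite.
clade_a_proxy = [0.20, 0.22, 0.24, 0.26]
clade_a_size = [3.0, 2.6, 3.2, 2.8]
clade_b_proxy = [0.05, 0.07, 0.09, 0.11]
clade_b_size = [0.5, 0.1, 0.7, 0.3]

# Pooling the clades ignores relatedness and mixes between-clade differences
# with within-clade variation.
pooled_r = pearson_r(clade_a_proxy + clade_b_proxy, clade_a_size + clade_b_size)

# Within each clade, the proxy carries no information about genome size.
within_a_r = pearson_r(clade_a_proxy, clade_a_size)
within_b_r = pearson_r(clade_b_proxy, clade_b_size)

print(f"pooled r = {pooled_r:.2f}")
print(f"within clade A r = {within_a_r:.2f}, within clade B r = {within_b_r:.2f}")
```

      In this constructed example the pooled correlation is driven entirely by the difference between the two clade averages, which is the pattern a phylogenetically informed method is designed to discount.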

      (3) A third way in which the paper is more important than the authors let on is in the striking degree of the failure of MHH here. MHH does not merely claim that Ne is one contributor to genome size among many; it claims that Ne is THE major contributor, which is a much, much stronger claim. That no evidence exists in the current data for even the small claim is a remarkable failure of the actual MHH hypothesis: the possibility is quite remote that Ne is THE major contributor but that one cannot even find a marginally significant correlation in a huge correlation analysis deriving from a lot of challenging bioinformatic work. Thus this is an extremely strong rejection of the MHH. The MHH is extremely influential and yet very challenging to test clearly. Frankly, the authors would be doing the field a disservice if they did not more strongly state the degree of importance of this finding.

      We respectfully disagree with the reviewer that there is currently no evidence for an effect of Ne on genome size evolution. While it is accurate that our large dataset allows us to reject the universality of Ne as the major contributor to genome size variation, this does not exclude the possibility of such an effect in certain contexts. Notably, several studies support the idea that Ne determines genome size variation and shapes nearly-neutral TE dynamics under certain circumstances, e.g. with particularly strongly contrasted Ne and moderate divergence times (Lefébure et al. 2017; Mérel et al. 2024; Tollis and Boissinot 2013; Ruggiero et al. 2017). The strength of such works is that they analyze the short-term dynamics of TEs in response to Ne within groups of species/populations, where the cost posed by extra DNA is likely to be similar. Indeed, the MHH predicts that genome size varies according to the combination of drift and mutation under the nearly-neutral theory of molecular evolution. Our work demonstrates that this is not true universally but does not exclude that it could hold locally. Moreover, defense mechanisms against TE proliferation are often complex molecular machineries that might or might not evolve under different constraints among clades. We have detailed these points in the discussion.

      Reviewer #3:

      Summary

      The Mutational Hazard Hypothesis (MHH) suggests that lineages with smaller effective population sizes should accumulate slightly deleterious transposable elements leading to larger genome sizes. Marino and colleagues tested the MHH using a set of 807 vertebrate, mollusc, and insect species. The authors mined repeats de novo and estimated dN/dS for each genome. Then, they used dN/dS and life history traits as reliable proxies for effective population size and tested for correlations between these proxies and repeat content while accounting for phylogenetic nonindependence. The results suggest that overall, lineages with lower effective population sizes do not exhibit increases in repeat content or genome size. This contrasts with expectations from the MHH. The authors speculate that changes in genome size may be driven by lineage-specific host-TE conflicts rather than effective population size.

      Strengths

      The general conclusions of this paper are supported by a powerful dataset of phylogenetically diverse species. The use of C-values rather than assembly size for many species (when available) helps mitigate the challenges associated with the underrepresentation of repetitive regions in short-read-based genome assemblies. As expected, genome size and repeat content are highly correlated across species. Nonetheless, the authors report divergent relationships between genome size and dN/dS and TE content and dN/dS in multiple clades: Insecta, Actinopteri, Aves, and Mammalia. These discrepancies are interesting but could reflect biases associated with the authors' methodology for repeat detection and quantification rather than the true biology.

      Weaknesses

      The authors used dnaPipeTE for repeat quantification. Although dnaPipeTE is a useful tool for estimating TE content when genome assemblies are not available, it exhibits several biases. One of these is that dnaPipeTE seems to consistently underestimate satellite content (compared to RepeatMasker on assembled genomes; see Goubert et al. 2015). Satellites comprise a significant portion of many animal genomes and are likely significant contributors to differences in genome size. This should have a stronger effect on results in species where satellites comprise a larger proportion of the genome relative to other repeats (e.g. Drosophila virilis, >40% of the genome (Flynn et al. 2020); Triatoma infestans, 25% of the genome (Pita et al. 2017) and many others). For example, the authors report that only 0.46% of the Triatoma infestans genome is "other repeats" (which include simple repeats and satellites). This contrasts with previous reports of ≥25% satellite content in Triatoma infestans (Pita et al. 2017). Similarly, this study's results for "other" repeat content appear to be consistently lower for Drosophila species relative to previous reports (e.g. de Lima & Ruiz-Ruano 2022). The most extreme case of this is for Drosophila albomicans where the authors report 0.06% "other" repeat content when previous reports have suggested that 18% to >38% of the genome is composed of satellites (de Lima & Ruiz-Ruano 2022). It is conceivable that occasional drastic underestimates or overestimates for repeat content in some species could have a large effect on coevol results, but a minimal effect on more general trends (e.g. the overall relationship between repeat content and genome size).

      There are indeed some discrepancies between our estimates of low complexity repeats and those from the literature, which are due to the approach used. Hence, occasional underestimates or overestimates of repeat content are possible. As noted, the contribution of “Other” repeats to the overall repeat content is generally very low in our estimates, which points to an underestimation bias. We thank the reviewer for providing this interesting review. We will emphasize this limitation in the discussion of our revised manuscript.

      Not being able to correctly estimate the quantity of satellites might pose a problem for quantifying the total content of junk DNA. However, the overall repeat content mostly composed of TEs correlates very well with genome size, both in the overall dataset and within clades (with the notable exception of birds) so we are confident that this limitation is not the explanation of our negative results. Moreover, while satellite information might be missing, this is not problematic to test our a priori hypothesis since we focus our attention on TEs, whose proliferation mechanism is very different from that of tandem repeats.

      Finally, divergence from the consensus can be estimated only for TEs. Therefore, recently active elements do not include simple and tandem repeats: yet the results based on recent TE content are very similar to those based on the overall repeat content.

      Another bias of dnaPipeTE is that it does not detect ancient TEs as well as more recently active TEs (Goubert et al. 2015). Thus, the repeat content used for PIC and coevolve analyses here is inherently biased toward more recently inserted TEs. This bias could significantly impact the inference of long-term evolutionary trends.

      Indeed, dnaPipeTE is not good at detecting old TE copies due to the read-based approach, which biases the outcome towards new elements. We agree that TE content is underestimated, especially in those genomes that tend to accumulate TEs rather than getting rid of them. However, the sum of old TEs and recent TEs is extremely well correlated with genome size (Pearson’s correlation: r = 0.87, p-value < 2.2e-16; PIC: slope = 0.22, adj-R2 = 0.42, p-value < 2.2e-16). Our main result therefore does not rely on an accurate estimation of old TEs. In contrast, we hypothesized that recent TEs could be interesting if selection acted on TE insertion and dynamics rather than on non-coding DNA. Our results demonstrate that this is not the case. It should be noted that, in spite of its limits for old TEs, dnaPipeTE is especially well suited to this specific analysis, as it is not biased by very repetitive new TE families that are problematic to assemble. We will clearly emphasize the limitations of dnaPipeTE and discuss the consequences for our results in the discussion of the revised manuscript.

      Finally, in a preliminary analysis on the dipteran species, we show that the TE content estimated with dnaPipeTE is generally similar to that estimated from the assembly with EarlGrey (Baril et al. 2024) across a good range of genome sizes, going from drosophilid-like to mosquito-like (Pearson’s correlation: r = 0.88, p-value = 3.22e-10; see also the corrected Supplementary Figure S2 below). While the TE content of these species is probably dominated by recent to moderately recent TEs, Aedes albopictus is an outlier for its genome size, and the estimates from the two methods are largely consistent. However, the computation time required to estimate TE content using EarlGrey was significantly longer, with a ~300% increase, making it a very costly option (a similar issue applies to other assembly-based annotation pipelines). Given the rationale presented above, we decided to use dnaPipeTE instead of EarlGrey.

    1. eLife assessment

      This valuable study investigates the immune system's role in pre-eclampsia. The authors map the immune cell landscape of the human placenta and find an increase in macrophages and Th17 cells in patients with pre-eclampsia. Following mouse studies, the authors suggest that the IGF1-IGF1R pathway might play a role in how macrophages influence T cells, potentially driving the pathology of pre-eclampsia. There is solid evidence in this study that will be of interest to immunologists and developmental biologists, however, some of the conclusions require additional detail and/or more appropriate statistical tests.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors utilized human placental samples together with multiple mouse models to explore the mechanisms whereby inflammatory macrophages and T cells are linked to preeclampsia (PE). The authors first undertook CyTOF of placental samples from women with normal pregnancies, PE, gestational diabetes mellitus (GDM), and GDM with superimposed PE (GDM+PE). The authors report an increase of memory-like Th17 cells, memory-like CD8+ T cells, and pro-inflammatory macrophages in PE cases, but not GDM or GDM+PE, together with diminished γδT cells, anti-inflammatory macrophages, and granulocyte myeloid-derived suppressor cells (gMDSC). The authors then undertook several experiments using scRNA-seq, bulk RNA-seq, and flow cytometry in a RUPP model to first show that the transfer of pro-inflammatory macrophages from RUPP mice into normal pregnant mice with depleted macrophages resulted in increased embryo resorption and diminished fetal weight and size. Moreover, pro-inflammatory macrophages induced memory-like Th17 cells in mice. Similarly, injection of T-cells from RUPP mice resulted in increased embryo resorption and diminished fetal weight and size. Such mice that received RUPP-derived T cells displayed similarly worsened outcomes in their second pregnancy in the absence of any additional T cell transfer. The authors identified the IGF1-IGF1R ligand-receptor pair as a factor involved in the macrophage-mediated induction of memory-like Th17 cells, as confirmed by experiments using an IGF1R inhibitor. Finally, the authors transferred IGF1R inhibitor-treated T cells to a pregnant mouse that was administered LPS and depleted of T cells and observed improved outcomes compared to mice that received non-treated T cells. The authors conclude that their study identifies a PE-specific immune cell network regulated by pro-inflammatory macrophages and T cells.

      Strengths:

      Utilization of both human placental samples and multiple mouse models to explore the mechanisms linking inflammatory macrophages and T cells to preeclampsia (PE).

      Incorporation of advanced techniques such as CyTOF, scRNA-seq, bulk RNA-seq, and flow cytometry.

      Identification of specific immune cell populations and their roles in PE, including the IGF1-IGF1R ligand-receptor pair in macrophage-mediated Th17 cell differentiation.

      Demonstration of the adverse effects of pro-inflammatory macrophages and T cells on pregnancy outcomes through transfer experiments.

      Weaknesses:

      Inconsistent use of uterine and placental cells, which are distinct tissues with different macrophage populations, potentially confounding results.

      Missing observational data for the initial experiment transferring RUPP-derived macrophages to normal pregnant mice.

      Unclear mechanisms of anti-macrophage compounds and their effects on placental/fetal macrophages.

      Difficulty in distinguishing donor cells from recipient cells in murine single-cell data complicates interpretation.

      Limitation of using the LPS model in the final experiments, as it more closely resembles systemic inflammation seen in endotoxemia rather than the specific pathology of PE.

    3. Reviewer #2 (Public review):

      Summary:

      Fei, Lu, Shi, et al. present a thorough evaluation of the immune cell landscape in pre-eclamptic human placentas by single-cell multi-omics methodologies compared to normal control placentas. Based on their findings of elevated frequencies of inflammatory macrophages and memory-like Th17 cells, they employ adoptive cell transfer mouse models to interrogate the coordination and function of these cell types in pre-eclampsia immunopathology. They demonstrate the putative role of the IGF1-IGF1R axis as the key pathway by which inflammatory macrophages in the placenta skew CD4+ T cells towards an inflammatory IL-17A-secreting phenotype that may drive tissue damage, vascular dysfunction, and elevated blood pressure in pre-eclampsia, leaving researchers with potential translational opportunities to pursue this pathway in this indication.

      They present a major advance to the field in their profiling of human placental immune cells from pre-eclampsia patients where most extant single-cell atlases focus on term versus preterm placenta, or largely examine trophoblast biology with a much rarer subset of immune cells. While the authors present vast amounts of data at both the protein and RNA transcript level, we, the reviewers, feel this manuscript is still in need of much more clarity in its main messaging, and more discretion in including only key data that supports this main message most effectively.

      Strengths:

      (1) This study combines human and mouse analyses and allows for some amount of mechanistic insight into the role of pro-inflammatory and anti-inflammatory macrophages in the pathogenesis of pre-eclampsia (PE), and their interaction with Th17 cells.

      (2) Importantly, they do this using matched cohorts across normal pregnancy and common PE comorbidities like gestational diabetes (GDM).

      (3) The authors have developed clear translational opportunities from these "big data" studies by moving to pursue potential IGF1-based interventions.

      Weaknesses:

      (1) Clearly the authors generated vast amounts of multi-omic data using CyTOF and single-cell RNA-seq (scRNA-seq), but their central message becomes muddled very quickly. The reader has to do a lot of work to follow the authors' multiple lines of inquiry rather than smoothly following along with their unified rationale. The title tells fairly little about the substance of the study. The manuscript is very challenging to follow. The paper would benefit from substantial reorganization and editing for grammatical and spelling errors. For example, RUPP is introduced in Figure 4 but is not defined or explained in the text until Figure 6. (The figure comparing pro- and anti-inflammatory macrophages does not add much to the manuscript as this is an expected finding).

      (2) The methods lack critical detail about how human placenta samples were processed. The maternal-fetal interface is a highly heterogeneous tissue environment and care must be taken to ensure proper focus on maternal or fetal cells of origin. Lacking this detail in the present manuscript, there are many unanswered questions about the nature of the immune cells analyzed. It is impossible to figure out which part of the placental unit is analyzed for the human or mouse data. Is this the decidua, the placental villi, or the fetal membranes? This is of key importance to the central findings of the manuscript as the immune makeup of these compartments is very different. Or is this analyzed as the entirety of the placenta, which would be a mix of these compartments and significantly less exciting?

      (3) Similarly, the methods lack any detail about the analysis of the CyTOF and scRNA-seq data; much more detail needs to be added here. How were these clustered, what was the QC for the scRNA-seq data, etc.? The two small paragraphs lack any detail.

      (4) There is also insufficient detail presented about the quantities or proportions of various cell populations. For example, gdT cells represent very small proportions of the CyTOF plots shown in Figures 1B, 1C, & 1E, yet in Figures 2I, 2K, & 2K there are many gdT cells shown in subcluster analysis without a description of how many cells are actually represented, and where they came from. How were biological replicates normalized for fair statistical comparison between groups?

      (5) The figures themselves are very tricky to follow. The clusters are numbered rather than identified by what the authors think they are, and the numbers are so small that they are challenging to read. The paper would be significantly improved if the clusters were clearly labeled and identified. All the heatmaps and the abundance of clusters should be in separate supplementary figures.

      (6) The authors should take additional care when constructing figures that their biological replicates (and all replicates) are accurately represented. Figure 2H-2K shows N=10 data points for the normal pregnant (NP) samples when clearly their Table 1 and text denote they only studied N=9 normal subjects.

      (7) There is little to no evaluation of regulatory T cells (Tregs) which are well known to undergird maternal tolerance of the fetus, and which are well known to have overlapping developmental trajectory with RORgt+ Th17 cells. We recommend the authors evaluate whether the loss of Treg function, quantity, or quality leaves CD4+ effector T cells more unrestrained in their effect on PE phenotypes. References should include, accordingly: PMCID: PMC6448013 / DOI: 10.3389/fimmu.2019.00478; PMC4700932 / DOI: 10.1126/science.aaa9420.

      (8) In discussing gMDSCs in Figure 3, the authors have missed key opportunities to evaluate bona fide Neutrophils. We recommend they conduct FACS or CyTOF staining including CD66b if they have additional tissues or cells available. Please refer to this helpful review article that highlights key points of distinguishing human MDSC from neutrophils: https://doi.org/10.1038/s41577-024-01062-0. This will both help the evaluation of potentially regulatory myeloid cells that may suppress effector T cells as well as aid in understanding at the end of the study if IL-17 produced by CD4+ Th17 cells might recruit neutrophils to the placenta and cause ROS immunopathology and fetal resorption.

      (9) Depletion of macrophages using several different methodologies (PLX3397, or clodronate liposomes) should be accompanied by supplementary data showing the efficiency of depletion, especially within tissue compartments of interest (uterine horns, placenta). The clodronate piece is not at all discussed in the main text. Both should be addressed in much more detail.

      (10) There are many heatmaps and tSNE / UMAP plots with unhelpful labels and no statistical tests applied. Many of these plots (e.g. Figure 7) could be moved to supplemental figures or pared down and combined with existing main figures to help the authors streamline and unify their message.

      (11) There are claims that this study fills a gap that "only one report has provided an overall analysis of immune cells in the human placental villi in the presence and absence of spontaneous labor at term by scRNA-seq (Miller 2022)" (lines 362-364), yet this study itself does not exhaustively study all immune cell subsets...that's a monumental task, even with the two multi-omic methods used in this paper. There are several other datasets that have performed similar analyses and should be referenced.

      (12) Inappropriate statistical tests are used in many of the analyses. Figures 1-2 use the Shapiro-Wilk test, which is a test of "goodness of fit", to compare unpaired groups. A Kruskal-Wallis test or another nonparametric test is much more appropriate. In other instances, there is no mention of statistical tests (Figures 6-7) at all. Appropriate tests should be added throughout.
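      To make the distinction in comment (12) concrete, here is a minimal sketch of the Kruskal-Wallis H statistic, which compares several unpaired groups by their pooled ranks. It is written in plain Python for illustration, omits the tie correction and the p-value (obtained by comparing H with a chi-square distribution with k - 1 degrees of freedom for k groups), and uses made-up cell-frequency values; the function names and group labels are assumptions, not taken from the manuscript. A statistics package would normally be used in practice.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (without tie correction) for a list of samples."""
    pooled = list(chain.from_iterable(groups))
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted_rank_sums = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_rank_sums += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * weighted_rank_sums - 3 * (n_total + 1)


# Hypothetical frequencies (%) of one cell cluster in four unpaired groups.
example_groups = {
    "NP": [2.1, 1.8, 2.5, 2.0],
    "PE": [4.2, 3.9, 5.1, 4.6],
    "GDM": [2.3, 1.9, 2.7, 2.2],
    "GDM+PE": [2.6, 2.4, 3.0, 2.8],
}
h_statistic = kruskal_wallis_h(list(example_groups.values()))
print(f"H = {h_statistic:.2f} with {len(example_groups) - 1} degrees of freedom")
```

      A normality test such as Shapiro-Wilk answers a different question (whether one sample is compatible with a normal distribution), so it can inform the choice between parametric and rank-based tests but cannot itself compare groups.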

    4. Author response:

      Reviewer #1:

      Strengths:

      Utilization of both human placental samples and multiple mouse models to explore the mechanisms linking inflammatory macrophages and T cells to preeclampsia (PE).

      Incorporation of advanced techniques such as CyTOF, scRNA-seq, bulk RNA-seq, and flow cytometry.

      Identification of specific immune cell populations and their roles in PE, including the IGF1-IGF1R ligand-receptor pair in macrophage-mediated Th17 cell differentiation.

      Demonstration of the adverse effects of pro-inflammatory macrophages and T cells on pregnancy outcomes through transfer experiments.

      Weaknesses:

      Comment 1. Inconsistent use of uterine and placental cells, which are distinct tissues with different macrophage populations, potentially confounding results.

      Response 1: We thank the reviewer for this comment. We have performed an animal experiment with green fluorescent protein (GFP) pregnant mice, which was not shown in this manuscript. Wild-type (WT) female mice were mated either with transgenic male mice genetically modified to express GFP or with WT male mice, in order to generate GFP-expressing pups (GFP-pups) or their genetically unmodified counterparts (WT-pups), respectively. Mice were euthanized on day 18.5 of gestation, and the uteri of the pregnant females and the placentas of the offspring were analyzed using flow cytometry. The majority of macrophages in the uterus and placenta are of maternal origin, as defined by the absence of GFP. In contrast, fetal-derived macrophages, distinguished by their expression of GFP, represent only a small fraction of the total macrophage population. We will add the GFP pregnant mice-related data in Figure 4-figure supplement 1 to explain the different macrophage populations in the uterine and placental cells.

      Comment 2. Missing observational data for the initial experiment transferring RUPP-derived macrophages to normal pregnant mice.

      Response 2: We thank the reviewer for this comment. In our experiments, PLX3397 or Clodronate Liposomes were used to deplete the macrophages of pregnant mice, and we then injected RUPP-derived pro-inflammatory macrophages and anti-inflammatory macrophages back into the PLX3397- or Clodronate Liposomes-treated pregnant mice. We found that RUPP-derived F480+CD206- pro-inflammatory macrophages induced immune imbalance at the maternal-fetal interface and PE-like symptoms (Figure 4E-4H and Figure 4-figure supplement 1 A-C).

      Comment 3. Unclear mechanisms of anti-macrophage compounds and their effects on placental/fetal macrophages.

      Response 3: We thank the reviewer for this comment. PLX3397 is an inhibitor of CSF1R, which is needed for macrophage development (Nature. 2023, PMID: 36890231; Cell Mol Immunol. 2022, PMID: 36220994); we have stated this on lines 189-191. However, PLX3397 is a small molecule compound that has the potential to cross the placental barrier and affect fetal macrophages. We will discuss the impact of this factor on the experiment in the discussion section.

      Comment 4. Difficulty in distinguishing donor cells from recipient cells in murine single-cell data complicates interpretation.

      Response 4: We thank the reviewer for the comments. Upon analysis, we observed a notable elevation in the frequency of total macrophages within the CD45+ cell population. We subsequently performed macrophage clustering and uncovered a marked increase in the frequency of Cluster 0, implying a potential correlation between Cluster 0 and donor-derived cells. RNA sequencing revealed that the F480+CD206- pro-inflammatory donor macrophages exhibited a Folr2+Ccl7+Ccl8+C1qa+C1qb+C1qc+ phenotype, which is consistent with the phenotype of Cluster 0 in macrophages observed in single-cell RNA sequencing (Figure 4D and Figure 5E). Therefore, we believe that the donor cells correspond to Cluster 0 in macrophages.

      Comment 5. Limitation of using the LPS model in the final experiments, as it more closely resembles systemic inflammation seen in endotoxemia rather than the specific pathology of PE.

      Response 5: We thank the reviewer for the comments. First, the other animal experiments in this manuscript used the Reduction in Uterine Perfusion Pressure (RUPP) mouse model to simulate the pathology of PE. However, the RUPP model requires ligation of the uterine arteries in pregnant mice on day 12.5 of gestation, which hinders T cells returning from the tail vein from reaching the maternal-fetal interface. In addition, this experiment aims to show that CD4+ T cells differentiate into memory-like Th17 cells through IGF-1R signalling and thereby affect pregnancy; to do so, we cleared CD4+ T cells in vivo with an anti-CD4 antibody and then injected IGF-1R inhibitor-treated CD4+ T cells. We also showed that injection of RUPP-derived memory-like CD4+ T cells into pregnant rats induces PE-like symptoms (Figure 6). In summary, the application of the LPS model in Figure 8 does not affect the conclusions.

      Reviewer #2:

      Strengths:

      (1) This study combines human and mouse analyses and allows for some amount of mechanistic insight into the role of pro-inflammatory and anti-inflammatory macrophages in the pathogenesis of pre-eclampsia (PE), and their interaction with Th17 cells.

      (2) Importantly, they do this using matched cohorts across normal pregnancy and common PE comorbidities like gestational diabetes (GDM).

      (3) The authors have developed clear translational opportunities from these "big data" studies by moving to pursue potential IGF1-based interventions.

      Weaknesses:

      Comment 1. Clearly the authors generated vast amounts of multi-omic data using CyTOF and single-cell RNA-seq (scRNA-seq), but their central message becomes muddled very quickly. The reader has to do a lot of work to follow the authors' multiple lines of inquiry rather than smoothly following along with their unified rationale. The title description tells fairly little about the substance of the study. The manuscript is very challenging to follow. The paper would benefit from substantial reorganizations and editing for grammatical and spelling errors. For example, RUPP is introduced in Figure 4 but in the text not defined or even talked about what it is until Figure 6. (The figure comparing pro- and anti-inflammatory macrophages does not add much to the manuscript as this is an expected finding).

      Response 1: We thank the reviewer for the comments. Following the reviewer's suggestion, we will make the necessary revisions. First, we will modify the title of the article to be more specific. Second, we will introduce the RUPP mouse model when interpreting Figure 4. Third, we plan to simplify or consolidate the images from Figure 5 to Figure 7 to make them easier to follow. Finally, we will diligently correct the grammatical and spelling errors in the article. As for the figure comparing pro- and anti-inflammatory macrophages, the Editor requested a more comprehensive description of the macrophage phenotype during the initial submission. As a result, we profiled the transcriptomes of both uterine-derived pro-inflammatory and anti-inflammatory macrophages and conducted a detailed analysis of macrophages in the single-cell data.

      Comment 2. The methods lack critical detail about how human placenta samples were processed. The maternal-fetal interface is a highly heterogeneous tissue environment and care must be taken to ensure proper focus on maternal or fetal cells of origin. Lacking this detail in the present manuscript, there are many unanswered questions about the nature of the immune cells analyzed. It is impossible to figure out which part of the placental unit is analyzed for the human or mouse data. Is this the decidua, the placental villi, or the fetal membranes? This is of key importance to the central findings of the manuscript as the immune makeup of these compartments is very different. Or is this analyzed as the entirety of the placenta, which would be a mix of these compartments and significantly less exciting?

      Response 2: We thank the reviewer for the comments. Placental villi, rather than fetal membranes or decidua, were used for CyTOF in this study. Details of how the human placenta samples were processed will be added to the Materials and Methods section.

      Comment 3. Similarly, methods lack any detail about the analysis of the CyTOF and scRNAseq data, much more detail needs to be added here. How were these clustered, what was the QC for scRNAseq data, etc? The two small paragraphs lack any detail.

      Response 3: We thank the reviewer for the comments. Details of the analysis of the CyTOF and scRNA-seq data will be added to the Materials and Methods section.

      Comment 4. There is also insufficient detail presented about the quantities or proportions of various cell populations. For example, gdT cells represent very small proportions of the CyTOF plots shown in Figures 1B, 1C, & 1E, yet in Figures 2I, 2K, & 2K there are many gdT cells shown in subcluster analysis without a description of how many cells are actually represented, and where they came from. How were biological replicates normalized for fair statistical comparison between groups?

      Response 4: We thank the reviewer for the comments. In Figure 1, CD45+ immune cells were clustered into 10 subpopulations, which included gdT cells. Figure 2 displays the further clustering analysis of CD4+ T, CD8+ T, and gdT cells, with gdT cells being further subdivided into 22 clusters (Figure 2-figure supplement 1C). The number of biological replicates (samples) is consistent with Figure 1.

      Comment 5. The figures themselves are very tricky to follow. The clusters are numbered rather than identified by what the authors think they are, the numbers are so small, that they are challenging to read. The paper would be significantly improved if the clusters were clearly labeled and identified. All the heatmaps and the abundance of clusters should be in separate supplementary figures.

      Response 5: We thank the reviewer for the comments. The t-SNE distributions of the 15 clusters of CD4+ T cells, 18 clusters of CD8+ T cells, and 22 clusters of gdT cells are shown separately in Figure 2A, F, and I. The heatmaps displaying the expression levels of markers in these clusters of CD4+ T cells, CD8+ T cells, and gdT cells are presented in Figure 2-figure supplement 1A, B, and C, respectively. The t-SNE distributions of the 29 clusters of CD11b+ cells are shown in Figure 3A, and the heatmap displaying the expression levels of markers in these clusters is presented in Figure 3B. As for scRNA-seq, the heatmap and UMAP distributions of the 15 clusters of macrophages are shown separately in Figure 5C and 5D. The UMAP distributions and heatmap of the 12 clusters of T/NK cells are shown in Figure 6A and 6B. The UMAP distributions and heatmap of the 9 clusters of T/NK cells are shown in Figure 7A and 7B.

      Comment 6. The authors should take additional care when constructing figures that their biological replicates (and all replicates) are accurately represented. Figure 2H-2K shows N=10 data points for the normal pregnant (NP) samples when clearly their Table 1 and test denote they only studied N=9 normal subjects.

      Response 6: We thank the reviewer for the careful checking. During our verification, we found that one sample in the NP group had pregnancy complications other than PE and GDM. The data in Figure 2H-2K were not updated in a timely manner. We will promptly update and reanalyze these data.

      Comment 7. There is little to no evaluation of regulatory T cells (Tregs) which are well known to undergird maternal tolerance of the fetus, and which are well known to have overlapping developmental trajectory with RORgt+ Th17 cells. We recommend the authors evaluate whether the loss of Treg function, quantity, or quality leaves CD4+ effector T cells more unrestrained in their effect on PE phenotypes. References should include, accordingly: PMCID: PMC6448013 / DOI: 10.3389/fimmu.2019.00478; PMC4700932 / DOI: 10.1126/science.aaa9420.

      Response 7: We thank the reviewer for the comments. We have performed a Treg-related animal experiment, which was not shown in this manuscript. We will add the Treg-related data to Figure 6. The injection of CD4+ T cells derived from RUPP mice, characterized by a reduced frequency of Tregs, could induce PE-like symptoms in pregnant mice. Additionally, we will add the necessary discussion about Tregs.

      Comment 8. In discussing gMDSCs in Figure 3, the authors have missed key opportunities to evaluate bona fide Neutrophils. We recommend they conduct FACS or CyTOF staining including CD66b if they have additional tissues or cells available. Please refer to this helpful review article that highlights key points of distinguishing human MDSC from neutrophils: https://doi.org/10.1038/s41577-024-01062-0. This will both help the evaluation of potentially regulatory myeloid cells that may suppress effector T cells as well as aid in understanding at the end of the study if IL-17 produced by CD4+ Th17 cells might recruit neutrophils to the placenta and cause ROS immunopathology and fetal resorption.

      Response 8: We thank the reviewer for the comments. Although we do not have additional tissues or cells available to conduct FACS or CyTOF staining, including for CD66b, we plan to use CD15 and CD66b antibodies for immunofluorescence staining of placental tissue. Suppressing effector T cells is a signature feature of MDSCs, and T cells may also influence the functions of MDSCs. We will refer to this review and discuss these points in the Discussion section of the article.

      Comment 9. Depletion of macrophages using several different methodologies (PLX3397, or clodronate liposomes) should be accompanied by supplementary data showing the efficiency of depletion, especially within tissue compartments of interest (uterine horns, placenta). The clodronate piece is not at all discussed in the main text. Both should be addressed in much more detail.

      Response 9: We thank the reviewer for the comments. We already have additional data on the efficiency of macrophage depletion with PLX3397 and clodronate liposomes, which were not presented in this manuscript, and we will add them to the manuscript. The clodronate experiment is mentioned in the main text (lines 197-201), but only briefly, because the results we obtained using clodronate were similar to those using PLX3397.

      Comment 10. There are many heatmaps and tSNE / UMAP plots with unhelpful labels and no statistical tests applied. Many of these plots (e.g. Figure 7) could be moved to supplemental figures or pared down and combined with existing main figures to help the authors streamline and unify their message.

      Response 10: We thank the reviewer for the comments. We plan to simplify or consolidate the images from Figure 5 to Figure 7 to make them easier to follow.

      Comment 11. There are claims that this study fills a gap that "only one report has provided an overall analysis of immune cells in the human placental villi in the presence and absence of spontaneous labor at term by scRNA-seq (Miller 2022)" (lines 362-364), yet this study itself does not exhaustively study all immune cell subsets...that's a monumental task, even with the two multi-omic methods used in this paper. There are several other datasets that have performed similar analyses and should be referenced.

      Response 11: We thank the reviewer for the comments. We will search the literature further and reference additional studies that have conducted similar analyses.

      Comment 12. Inappropriate statistical tests are used in many of the analyses. Figures 1-2 use the Shapiro-Wilk test, which is a test of "goodness of fit", to compare unpaired groups. A Kruskal-Wallis or other nonparametric t-test is much more appropriate. In other instances, there is no mention of statistical tests (Figures 6-7) at all. Appropriate tests should be added throughout.

      Response 12: We thank the reviewer for the comments. As stated in the Statistical Analysis section (lines 601-604), the Kruskal-Wallis test was used to compare the results of experiments with multiple groups. Comparisons between the two groups in Figures 6-7 were conducted using Student's t-test. The aforementioned statistical methods will be included in the figure legends.
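      To make the distinction raised in Comment 12 concrete, the following is a minimal sketch of how a Kruskal-Wallis H statistic is computed for three or more unpaired groups. It is for illustration only and is not the authors' analysis pipeline; the function names `average_ranks` and `kruskal_wallis_h` and the toy numbers are assumptions introduced here. The Shapiro-Wilk test, by contrast, assesses whether a single sample is consistent with a normal distribution and does not compare groups.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Return (H, degrees of freedom) with the standard tie correction."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over groups.
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over tied value counts t.
    tie_sizes = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction, len(groups) - 1


# Hypothetical cell-frequency values for three groups (not data from the study).
h_stat, dof = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(h_stat, dof)
```

      Under the null hypothesis, H is compared against a chi-square distribution with the returned degrees of freedom; in practice a statistics package would normally be used for the p-value and for post hoc pairwise comparisons.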

    1. eLife assessment

      This important work advances our understanding of how mechanical forces transmitted by blood flow contribute to cardiac development by identifying id2b as a flow-responsive factor that is required for valve development and calcium-mediated cardiac contractility and its downstream mechanism of action. However, the evidence supporting the conclusions is incomplete and would benefit from more rigorous approaches. With additional support of the main conclusions, the work will be of interest to those working on developmental biology, heart development, and congenital heart disease.

    2. Reviewer #1 (Public review):

      Summary:

      Chen et al. identified the role of endocardial id2b expression in cardiac contraction and valve formation through pharmaceutical, genetic, electrophysiology, calcium imaging, and echocardiography analyses. CRISPR/Cas9 generated id2b mutants demonstrated defective AV valve formation, excitation-contraction coupling, reduced endocardial cell proliferation in AV valve, retrograde blood flow, and lethal effects.

      Strengths:

      Their methods, data and analyses broadly support their claims.

      Weaknesses:

      The molecular mechanism is somewhat preliminary.

    3. Reviewer #2 (Public review):

      Summary:

      Biomechanical forces, such as blood flow, are crucial for organ formation, including heart development. This study by Shuo Chen et al. aims to understand how cardiac cells respond to these forces. They used zebrafish as a model organism due to its unique strengths, such as the ability to survive without heartbeats, and conducted transcriptomic analysis on hearts with impaired contractility. They thereby identified id2b as a gene regulated by blood flow and is crucial for proper heart development, in particular, for the regulation of myocardial contractility and valve formation. Using both in situ hybridization and transgenic fish they showed that id2b is specifically expressed in the endocardium, and its expression is affected by both pharmacological and genetic perturbations of contraction. They further generated a null mutant of id2b to show that loss of id2b results in heart malformation and early lethality in zebrafish. Atrioventricular (AV) and excitation-contraction coupling were also impaired in id2b mutants. Mechanistically, they demonstrate that Id2b interacts with the transcription factor Tcf3b to restrict its activity. When id2b is deleted, the repressor activity of Tcf3b is enhanced, leading to suppression of the expression of nrg1 (neuregulin 1), a key factor for heart development. Importantly, injecting tcf3b morpholino into id2b-/- embryos partially restores the reduced heart rate. Moreover, treatment of zebrafish embryos with the Erbb2 inhibitor AG1478 results in decreased heart rate, in line with a model in which Id2b modulates heart development via the Nrg1/Erbb2 axis. The research identifies id2b as a biomechanical signaling-sensitive gene in endocardial cells that mediates communication between the endocardium and myocardium, which is essential for heart morphogenesis and function.

      Strengths:

      The study provides novel insights into the molecular mechanisms by which biomechanical forces influence heart development and highlights the importance of id2b in this process.

      Weaknesses:

      The claims are in general well supported by experimental evidence, but the following aspects may benefit from further investigation:

      (1) In Figure 1C, the heatmap demonstrates the up-regulated and down-regulated genes upon tricaine-induced cardiac arrest. Aside from the down-regulation of id2b expression, it was also evident that id2a expression was up-regulated. As a predicted paralog of id2b, it would be interesting to see whether the up-regulation of id2a in response to tricaine treatment was a compensatory response to the down-regulation of id2b expression.

      (2) The study mentioned that id2b is tightly regulated by the flow-sensitive primary cilia-klf2 signaling axis; however aside from showing the reduced expression of id2b in klf2a and klf2b mutants, there was no further evidence to solidify the functional link between id2b and klf2. It would therefore be ideal, in the present study, to demonstrate how Klf2, which is a transcriptional regulator, transduces biomechanical stimuli to Id2b.

      (3) The authors showed the physical interaction between ectopically expressed FLAG-Id2b and HA-Tcf3b in HEK293T cells. Although the constructs being expressed are of zebrafish origin, it would be nice to show in vivo that the two proteins interact.

    4. Reviewer #3 (Public review):

      Summary:

      How mechanical forces transmitted by blood flow contribute to normal cardiac development remains incompletely understood. Using the unique advantages of the zebrafish model system, Chen et al make the fundamental discovery that endocardial expression of id2b is induced by blood flow and required for normal atrioventricular canal (AVC) valve development and myocardial contractility by regulating calcium dynamics. Mechanistically, the authors suggest that Id2b binds to Tcf3b in endocardial cells, which relieves Tcf3b-mediated transcriptional repression of Neuregulin 1 (NRG1). Nrg1 then induces expression of the L-type calcium channel component LRRC1. This study significantly advances our understanding of flow-mediated valve formation and myocardial function.

      Strengths:

      Strengths of the study are the significance of the question being addressed, use of the zebrafish model, and data quality (mostly very nice imaging). The text is also well-written and easy to understand.

      Weaknesses:

      Weaknesses include a lack of rigor for key experimental approaches, which led to skepticism surrounding the main findings. Specific issues were the use of morpholinos instead of genetic mutants for the bmp ligands, cilia gene ift88, and tcf3b, lack of an explicit model surrounding BMP versus blood flow induced endocardial id2b expression, use of bar graphs without dots, the artificial nature of assessing the physical interaction of Tcf3b and Id2b in transfected HEK293 cells, and artificial nature of examining the function of the tcf3b binding sites upstream of nrg1.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Chen et al. identified the role of endocardial id2b expression in cardiac contraction and valve formation through pharmaceutical, genetic, electrophysiology, calcium imaging, and echocardiography analyses. CRISPR/Cas9 generated id2b mutants demonstrated defective AV valve formation, excitation-contraction coupling, reduced endocardial cell proliferation in AV valve, retrograde blood flow, and lethal effects.

      Strengths:

      Their methods, data and analyses broadly support their claims.

      Weaknesses:

      The molecular mechanism is somewhat preliminary.

      We thank the reviewer for the constructive comments. To further elucidate the molecular mechanisms underlying the observed phenotypes, we will conduct the following experiments: (1) perform qRT-PCR to analyze the expression of id2a in hearts isolated from tricaine-treated embryos and in id2b-deleted embryos; (2) use RNAscope to detect the expression of id2b in developing embryos; (3) validate the interaction between Id2b and Tcf3b in vivo; and (4) conduct CUT&Tag experiments in developing zebrafish embryos to further validate the Tcf3b binding sites upstream of nrg1.

      Reviewer #2 (Public review):

      Summary:

      Biomechanical forces, such as blood flow, are crucial for organ formation, including heart development. This study by Shuo Chen et al. aims to understand how cardiac cells respond to these forces. They used zebrafish as a model organism due to its unique strengths, such as the ability to survive without heartbeats, and conducted transcriptomic analysis on hearts with impaired contractility. They thereby identified id2b as a gene regulated by blood flow and is crucial for proper heart development, in particular, for the regulation of myocardial contractility and valve formation. Using both in situ hybridization and transgenic fish they showed that id2b is specifically expressed in the endocardium, and its expression is affected by both pharmacological and genetic perturbations of contraction. They further generated a null mutant of id2b to show that loss of id2b results in heart malformation and early lethality in zebrafish. Atrioventricular (AV) and excitation-contraction coupling were also impaired in id2b mutants. Mechanistically, they demonstrate that Id2b interacts with the transcription factor Tcf3b to restrict its activity. When id2b is deleted, the repressor activity of Tcf3b is enhanced, leading to suppression of the expression of nrg1 (neuregulin 1), a key factor for heart development. Importantly, injecting tcf3b morpholino into id2b-/- embryos partially restores the reduced heart rate. Moreover, treatment of zebrafish embryos with the Erbb2 inhibitor AG1478 results in decreased heart rate, in line with a model in which Id2b modulates heart development via the Nrg1/Erbb2 axis. The research identifies id2b as a biomechanical signaling-sensitive gene in endocardial cells that mediates communication between the endocardium and myocardium, which is essential for heart morphogenesis and function.

      Strengths:

      The study provides novel insights into the molecular mechanisms by which biomechanical forces influence heart development and highlights the importance of id2b in this process.

      Weaknesses:

      The claims are in general well supported by experimental evidence, but the following aspects may benefit from further investigation:

      (1) In Figure 1C, the heatmap demonstrates the up-regulated and down-regulated genes upon tricaine-induced cardiac arrest. Aside from the down-regulation of id2b expression, it was also evident that id2a expression was up-regulated. As a predicted paralog of id2b, it would be interesting to see whether the up-regulation of id2a in response to tricaine treatment was a compensatory response to the down-regulation of id2b expression.

      As suggested by the reviewer, we will perform qRT-PCR to analyze the expression of id2a in hearts isolated from tricaine-treated embryos, as well as in id2b-deleted embryos.

      (2) The study mentioned that id2b is tightly regulated by the flow-sensitive primary cilia-klf2 signaling axis; however aside from showing the reduced expression of id2b in klf2a and klf2b mutants, there was no further evidence to solidify the functional link between id2b and klf2. It would therefore be ideal, in the present study, to demonstrate how Klf2, which is a transcriptional regulator, transduces biomechanical stimuli to Id2b.

      We have examined the expression levels of id2b in both klf2a and klf2b mutants. The whole-mount in situ results clearly demonstrate a decrease in id2b signal in both mutants. As noted by the reviewer, klf2 is a transcriptional regulator, suggesting that the regulation of id2b may occur at the transcriptional level. However, dissecting the molecular mechanisms underlying the crosstalk between klf2 and id2b is beyond the scope of the present study.

      (3) The authors showed the physical interaction between ectopically expressed FLAG-Id2b and HA-Tcf3b in HEK293T cells. Although the constructs being expressed are of zebrafish origin, it would be nice to show in vivo that the two proteins interact.

      We agree with the reviewer and will perform additional experiments to validate the interaction between Id2b and Tcf3b in vivo. Due to the lack of antibodies targeting these proteins, we will overexpress Flag-id2b and HA-Tcf3b in zebrafish embryos and conduct a co-IP analysis.

      Reviewer #3 (Public review):

      Summary:

      How mechanical forces transmitted by blood flow contribute to normal cardiac development remains incompletely understood. Using the unique advantages of the zebrafish model system, Chen et al make the fundamental discovery that endocardial expression of id2b is induced by blood flow and required for normal atrioventricular canal (AVC) valve development and myocardial contractility by regulating calcium dynamics. Mechanistically, the authors suggest that Id2b binds to Tcf3b in endocardial cells, which relieves Tcf3b-mediated transcriptional repression of Neuregulin 1 (NRG1). Nrg1 then induces expression of the L-type calcium channel component LRRC1. This study significantly advances our understanding of flow-mediated valve formation and myocardial function.

      Strengths:

      Strengths of the study are the significance of the question being addressed, use of the zebrafish model, and data quality (mostly very nice imaging). The text is also well-written and easy to understand.

      Weaknesses:

      Weaknesses include a lack of rigor for key experimental approaches, which led to skepticism surrounding the main findings. Specific issues were the use of morpholinos instead of genetic mutants for the bmp ligands, cilia gene ift88, and tcf3b, lack of an explicit model surrounding BMP versus blood flow induced endocardial id2b expression, use of bar graphs without dots, the artificial nature of assessing the physical interaction of Tcf3b and Id2b in transfected HEK293 cells, and artificial nature of examining the function of the tcf3b binding sites upstream of nrg1.

      We thank the reviewer for the constructive assessments. Our specific responses are as follows:

      (1) As all the morpholinos used in this study, including those targeting bmp ligands, the cilia gene ift88, and tcf3b, have been published and validated using genetic mutants in previous studies, we believe these loss-of-function analyses are sufficient to delineate their role in regulating id2b expression or function.

      (2) To assess the role of BMP versus blood flow in regulating endocardial id2b expression, we plan to perform live imaging in the id2b:GFP knockin line prior to the initiation of the heartbeat, with or without BMP inhibitors.

      (3) We will revise the data presentation and use bar graphs with individual data points.

      (4) We plan to perform an additional co-IP experiment in zebrafish embryos to assess the interaction between Tcf3b and Id2b.

      (5) To further validate the tcf3b binding sites upstream of nrg1, we will conduct CUT&Tag experiments in developing zebrafish embryos.

    1. eLife assessment

      This valuable work analyzes how specialized cells in the auditory system, known as octopus cells, can detect coincidences in their inputs at the submillisecond time scale. While previous work indicated that these cells receive no inhibitory inputs, the present study unambiguously demonstrates that these cells receive inhibitory glycinergic inputs. The physiologic impact of these inputs needs to be studied further. It remains incomplete at present but could be improved by addressing caveats related to similar sizes of excitatory postsynaptic potentials and spikes in the octopus neurons.

    1. eLife assessment

      This study presents a valuable finding on the role of dopamine receptor D2R in dopaminergic neurons DAN-c1 and mushroom body neurons (Y201-GAL4 pattern) on aversive and appetitive conditioning. The evidence supporting the claims of the authors is solid and promotes the investigation using fly larvae, which have interesting advantages in the time required for obtaining experimental animals and the use of optogenetics. The work will be of interest to researchers studying neuronal control of behaviour and learning and memory in general.

    2. Reviewer #2 (Public Review):

      Summary:

      The study wanted to functionally identify individual DANs that mediate larval olfactory learning. Then search for DAN-specific driver strains that mark single dopaminergic neurons, which subsequently can be used to target genetic manipulations of those neurons. 56 GAL4 drivers identifying dopaminergic neurons were found (Table 1) and three of them drive the expression of GFP to a single dopaminergic neuron in the third-instar larval brain hemisphere. The DAN driver R76F02-AD;R55C10-DBD appears to drive the expression to a dopaminergic neuron innervating the lower peduncle (LP), which would be DAN-c1.

      Split-GFP reconstitution across synaptic partners (GRASP) technique was used to investigate the "direct" synaptic connections from DANs to the mushroom body. Potential synaptic contact between DAN-c1 and MB neurons (at the lower peduncle) were detected.

      Then single odor associative learning was performed and thermogenetic tools were used (Shi-ts1 and TrpA1). When trained at 34°C, the complete inactivation of dopamine release from DAN-c1 with Shibirets1 impaired aversive learning (Figure 2h), while Shibirets1 did not affect learning when trained at room temperature (22°C). When paired with a gustatory stimulus (QUI or SUC), activation of DAN-c1 during training impairs both aversive and appetitive learning (Figure 2k).

      They examined the expression pattern of D2R in fly brains and were found in dopaminergic neurons and the mushroom body (Figure 3). To inspect whether the pattern of GFP signals indeed reflected the expression of D2R, three D2R enhancer driver strains (R72C04, R72C08, and R72D03-GAL4) were crossed with the GFP-tagged D2R strain.

      D2R knockdown (UAS-RNAi) in dopaminergic neurons driven by TH-GAL4 impaired larval aversive learning. Using a microRNA strain (UAS-D2R-miR), a similar deficit was observed. Crossing the GFP-tagged D2R strain with a DAN-c1-mCherry strain demonstrated the expression of D2R in DAN-c1 (Figure 4a). Knockdown of D2R in DAN-c1 impaired aversive learning with the odorant pentyl acetate, while appetitive learning was unaffected (Figure 4e). Sensory and motor functions appear not affected by D2R suppression.

      To exclude possible chronic effects of D2R knockdown during development, optogenetics was applied at distinct stages of the learning protocol. ChR2 was expressed in DAN-c1, and blue light was applied at distinct stages of the learning protocol. Optogenetic activation of DAN-c1 during training impaired aversive learning, not appetitive learning (Figure 5b-d).

      Knockdown of D2Rs in MB neurons by D2R-miR impaired both appetitive and aversive learning (Figure 6a). Activation of MBNs during training impairs both larval aversive and appetitive learning.

      Finally, based on the data the authors propose a model where the effective learning requires a balanced level of activity between D1R and D2R (Figure 7).

      Strengths:<br /> The work is well written, clear, and concise. They use well documented strategies to examine GAL4 drivers with expression in a single DAN, behavioral performance in larvae with distinct genetic tools including those to do thermo and optogenetics in behaving flies. Altogether, the study was able to expand our understanding of the role of D2R in DAN-c1 and MB neurons in the larva brain.

      Weaknesses:<br /> It is not completely clear how the system of DAN-c1, MB neurons, and behavioral performance works. We can be quite sure that DAN-c1;Shits1 reduced dopamine release and impaired aversive memory (Figure 2h). Similarly, DAN-c1;ChR2 increased dopamine release and also impaired aversive memory (Figure 5b). However, it is not clear what is happening with DAN-c1;TrpA1 (Figure 2k). In this case the thermo-induction appears to impair the behavioral performance in all three conditions (QUI, DW and SUC), and the behavior is quite distinct from that seen with the increase and decrease of dopamine tone (Figure 2h and 5b).

      The study successfully examined the role of D2R in DAN-c1 and MB neurons in olfactory conditioning. The conclusions are well supported by the data, with the exception of the claim that dopamine release from DAN-c1 is sufficient for aversive learning in the absence of unconditional stimulus (Figure 2K). Alternatively, the authors need to provide a better explanation of this point.<br /> The study provides insight into the role of D2R in associative learning expanding our understanding and might be a reference similar to previous key findings (Qi and Lee, 2014, https://doi.org/10.3390/biology3040831).

    3. Reviewer #3 (Public Review):

      It is a strength of the paper that it analyses the function of dopamine neurons (DANs) at the level of single, identified neurons, and uses tools to address specific dopamine receptors (DopRs), exploiting the unique experimental possibilities available in larval Drosophila as a model system. Indeed, the result of their screening for transgenic drivers covering single or small groups of DANs and their histological characterization provides the community with a very valuable resource. In particular the transgenic driver to cover the DANc1 neuron might turn out useful. However, I wonder in which fraction of the preparations an expression pattern as in Figure 1f/ S1c is observed, and how many preparations the authors have analyzed. Also, given the function of DANs throughout the body, in addition to the expression pattern in the mushroom body region (Figure 1f) and in the central nervous system (Figure S1c) maybe attempts can be made to assess expression from this driver throughout the larval body (same for Dop2R distribution).

      A first major weakness is that the main conclusion of the paper, which pertains to associative memory (last sentence of the abstract, and throughout the manuscript), is not justified by their evidence. Why so? Consider the paradigm in Figure 2g, and the data in Figure 2h (22 degrees, the control condition), where the assay and the experimental rationale used throughout the manuscript are introduced. Different groups of larvae are exposed, for 30min, to an odour paired with either i) quinine solution (red bar), ii) distilled water (yellow bar), or iii) sucrose solution (blue bar); in all cases this is followed by a choice test for the odour on one side and a distilled-water blank on the other side of a testing Petri dish. The authors observe that odour preference is low after odour-quinine pairing, intermediate after odour-water pairing and high after odour-sucrose pairing. The differences in odour preference relative to the odour-water case are interpreted as reflecting odour-quinine aversive associations and odour-sucrose appetitive associations, respectively. However, these differences could just as well reflect non-associative effects of the 30-min quinine or sucrose exposure per se (for a classical discussion of such types of issues see Rescorla 1988, Annu Rev Neurosci, or regarding Drosophila Tully 1988, Behav Genetics, or with some reference to the original paper by Honjo & Furukubo-Tokunaga 2005, J Neurosci that the authors reference, also Gerber & Stocker 2007, Chem Sens).<br /> As it stands, therefore, the current 3-group type of comparison does not allow conclusions about associative learning.

      A second major weakness is apparent when considering the sketch in Figure 2g and the equation defining the response index (R.I.) (line 480). The point is that the larvae that are located in the middle zone are not included in the denominator. This can inflate scores and is not appropriate. That is, suppose from a group of 30 animals (line 471) only 1 chooses the odour side and 29, bedazzled after 30-min quinine or sucrose exposure or otherwise confused by a given opto- or thermogenetic treatment, stay in the middle zone... an R.I. of 1.0 would result.

      Unless experimentally demonstrated, claims that the thermogenetic effector shibire/ts reduces dopamine release from DANs are questionable. This is because firstly, there might be shibire/ts-insensitive ways of dopamine release, and secondly because shibire/ts may affect co-transmitter release from DANs.<br /> To implicate a role of dopamine in DANs, previous work used e.g. RNAi against the dopamine-synthesizing TH enzyme (Rohwedder et al, cited).

      It is not clear whether the genetic controls when using the Gal4/ UAS system are the homozygous, parental strains (XY-Gal4/ XY-Gal4 and UAS-effector/ UAS-effector), or as is standard in the field the heterozygous driver (XY-Gal4/ wildtype) and effector controls (UAS-effector/ wildtype) (in some cases effector controls appear to be missing, e.g. Figure 4d, Figure S4e, Figure S5c).

      As recently suggested by Yamada et al 2024, bioRxiv, high cAMP can lead to synaptic depression (sic). That would call into question the interpretation of low-Dop2R leading to high-cAMP, leading to high-dopamine release, and thus the authors' interpretation of the matching effects of low-Dop2R and driving DANs.

    4. Author response:

      Reviewer #1 (Public Review):

      Weakness #1: The authors claim to have identified drivers that label single DANs in Figure 1, but their confocal images in Figure S1 suggest that many of those drivers label additional neurons in the larval brain. It is also not clear why only some of the 57 drivers are displayed in Figure S1.

      As introduced in the Results section, we screened 57 driver strains selected from previous studies: each had been reported either to identify a single dopaminergic neuron (DAN), or a single pair, in larvae, or to identify only several DANs in the adult brain, suggesting that it might identify a single DAN in larvae. In Figure 1, TH-GAL4 was used to cover all neurons in the DL1 cluster, while R58E02 and R30G08 are well-known drivers for pPAM. The fly strains in Figure 1h, k, l, and m were reported as single-DAN strains in larvae (4), while the strains in Figure 1e, f, and g were reported to identify only several DANs in adult brains (5,6). We examined these strains, and only some of them labeled single DANs in 3rd instar larval brains (Figure 1f, g, h, l and m). Among them, only the strains in Figure 1f and h labeled a single DAN in the brain hemisphere without labeling non-DANs. The other strains labeled non-DANs in addition to single DANs (Figure 1g, l and m). Taking the ventral nerve cord (VNC) into consideration, the strain in Figure 1h also labeled neurons in the VNC (Figure S1e), while the strain in Figure 1f did not (Figure S1c).

      In summary, the strain in Figure 1f (R76F02AD;R55C10DBD, labeling DAN-c1) is a strain from our screen that labels only a single DAN in 3rd instar larval brains. We still describe the others (Figure 1g, h, l, and m) as strains labeling single DANs, but they also label one to several non-DANs. In Figure 1, we mainly showed the strains labeling single DANs. The labeling patterns of the other screened driver strains are summarized in Table 1. Since brain images of all the remaining 47 strains are available, we will state in Figure S1 that additional brain images can be provided upon request.

      Weakness #2: Critically, R76F02-AD; R55C10-DBD labels more than one neuron per hemisphere in Figure S1c, and the authors cite Xie et al. (2018) to note that this driver labels two DANs in adult brains. Therefore, the authors cannot argue that the experiments throughout their paper using this driver exclusively target DAN-c1.

      Figure S1c shows a single DA neuron in each brain hemisphere. Additional GFP (+) signals were often observed, but they did not come from cell bodies of DANs because they were not stained by a TH antibody. These additional GFP (+) signals were mainly neurites, including axonal terminals, but could also be false-positive signals or weakly stained non-neuronal cell bodies. This conclusion was based on analysis of a total of 22 larval brains. We will add this to the text or the Figure S1 caption. An enlarged inset of the GFP (+) signals will also be added to Figure S1c.

      Weakness #3: Missing from the screen of 57 drivers is the driver MB320C, which typically labels only PPL1-γ1pedc in the adult and should label DAN-c1 in the larva. If MB320C labels DAN-c1 exclusively in the larva, then the authors should repeat their key experiments with MB320C to provide more evidence for DAN-c1 involvement specifically.

      We thank the reviewer for the suggestion. MB320C mainly labels PPL1-γ1pedc in the adult brain, with one or two other weakly labeled cells. It will be interesting to investigate the pattern of this driver in 3rd instar larval brains. If it only covers DAN-c1, we can try to knock down D2R in this strain to check whether it reproduces our results. This will be an interesting fly strain to test, but we believe that it is not necessary for our current manuscript, as the DAN-c1 driver is very specific (for details, refer to our response to Reviewer #3). However, this line will be very useful for future experiments.

      Weakness #4: The authors claim that the SS02160 driver used by Eschbach et al. (2020) labels other neurons in addition to DAN-c1. Could the authors use confocal imaging to show how many other neurons SS02160 labels? Given that both Eschbach et al. and Weber et al. (2023) found no evidence that DAN-c1 plays a role in larval aversive learning, it would be informative to see how SS02160 expression compares with the driver the authors use to label DAN-c1.

      We did not have our own images showing DANs in brains of the SS02160 driver cross line. However, Extended Data Figure 1 in Eschbach et al. (2020) shows four strongly labeled neurons in each brain hemisphere (9), indicating that this driver does not label only one neuron, DAN-c1.

      Weakness #5: The claim that DAN-c1 is both necessary and sufficient in larval aversive learning should be reworded. Such a claim would logically exclude any other neuron or even the training stimuli from being involved in aversive learning (see Yoshihara and Yoshihara (2018) for a detailed discussion of the logic), which is presumably not what the authors intended because they describe the possible roles of other DANs during aversive learning in the discussion.

      We agree that the words ‘necessary’ and ‘sufficient’ are too exclusive for other neurons. As mentioned in the Discussion part, we do think other dopaminergic neurons may also be involved in larval aversive learning. We are going to re-phrase these words by replacing them with more logically appropriate words, such as ‘important’, ‘essential’, or ‘mediating’.

      Weakness #6: Moreover, if DAN-c1 artificial activation conveyed an aversive teaching signal irrespective of the gustatory stimulus, then it should not impair aversive learning after quinine training (Figure 2k). While the authors interpret Figure 2k (and Figure 5) to indicate that artificial activation causes excessive DAN-c1 dopamine release, an alternative explanation is that artificial activation compromises aversive learning by overriding DAN-c1 activity that could be evoked by quinine.

      This is a great point! Yes, we cannot rule out the possibility that artificial activation compromises aversive learning by overriding DAN-c1 activity that could be evoked by quinine. The experimental results with TRPA1 could be caused by depletion of dopamine, or DA inactivation due to prolonged depolarization or adaptation. However, we still think that our hypothesis on the over-excitation of DAN-c1 is more consistent with our experimental results and other published data. Our justification is as follows:

      (1) Associative learning occurs only when the CS and US are paired. In wild type larvae, a specific odor (conditioned stimulus, CS, such as pentyl acetate) depolarizes a subset of Kenyon cells in the mushroom body, while a gustatory unconditioned stimulus (US, quinine) induces dopamine release from DAN-c1 to the lower peduncle (LP) compartment in the mushroom body (Figure 7a). Only when the CS and US are paired do the calcium influx caused by the CS and the Gαs activated by dopamine binding to D1R turn on a mushroom body-specific version of adenylyl cyclase, rutabaga, which is the coincidence detector in associative learning (Figure 7d).

      (2) Rutabaga transforms ATP into cAMP, activating the PKA signaling pathway and modifying the synaptic strength from mushroom body neurons (MBN, also called Kenyon cells) to the mushroom body output neurons (MBON, Figure 7d). This change in synaptic strength will lead to learned responses when the same odor appears again.

      (3) In our work, we found that D2R is expressed in DAN-c1, and that knockdown of D2R in DAN-c1 impairs larval aversive learning. As D2R reduces cAMP levels and neuronal excitability (3), we hypothesized that knockdown of D2R in DAN-c1 would remove the inhibition by the D2R auto-receptor and lead to more dopamine (DA) release when the US (quinine) was delivered, compared to wild type larvae. The elevated DA release, along with the calcium influx caused by the CS, increases the cAMP level in MBNs, which leads to the learning deficit (over-excitation, Figure 7b). Mutant larvae with excessive cAMP, dunce, showed aversive learning deficiency, supporting our hypothesis (2).

      (4) Our TRPA1 results can be explained by this over-excitation hypothesis. When DAN-c1 is activated (34 °C) in the distilled water group, the artificial activation mimics the gustatory activation by quinine, and the larvae showed aversive learning responses towards the odor (Figure 2k DW group). When DAN-c1 is activated (34 °C) in the sucrose group, the artificial activation again mimics the gustatory activation by quinine, so the larvae showed a learning response combining both appetitive and aversive learning (Figure 2k SUC group).

      (5) When DAN-c1 is activated (34 °C) in the quinine group, the artificial activation together with the gustatory activation by quinine leads to elevated DA release from DAN-c1. During training, this elevated DA caused over-excitation of MBNs, leading to failure of aversive learning (Figure 2k QUI group), a phenotype similar to that of larvae with D2R knockdown in DAN-c1.

      (6) Similarly, optogenetic activation of DAN-c1 during aversive training, leads to elevated DA release from DAN-c1 (both gustatory activation of quinine and artificial activation). This would also cause over-excitation of MBN, and lead to failure of aversive learning. Artificial activation in other stages (resting or testing) won’t cause elevated DA release during training, so the aversive learning was not affected (Figure 5b).

      (7) However, when optogenetic activation was applied during training, we did not observe aversive learning responses in the distilled water group, or a reduction in the sucrose group (Figure 5c, Figure 5d). Our explanation is that the optogenetic stimulus we applied is too strong, DAN-c1 has already released elevated DA in both groups. So, the aversive learning in these groups has already been impaired, they just showed the corresponding learning responses to distilled water or sucrose.

      (8) We also applied this over-excitation to activate MBNs. As MBN takes over both appetitive and aversive learnings, over-excitation of MBNs led to deficit in both types of learning, which follows our hypothesis (Figure 6).

      In summary, we hypothesized that DAN-c1 restricts DA release via activation of D2R, which is important for larval aversive learning. D2R knockdown or artificial activation of DAN-c1 during training would induce elevated DA release, leading to over-excitation of MBNs and failure of aversive learning.

      Weakness #7: The authors should not necessarily expect that D2R enhancer driver strains would reflect D2R endogenous expression, since it is known that TH-GAL4 does not label p(PAM) dopaminergic neurons.

      Just like the example of TH-GAL4, it is possible that the D2R driver strains may partially reflect the expression pattern of endogenous D2R in larval brains. When we crossed the D2R driver strains with the GFP-tagged D2R strain, however, we observed co-localization in DM1 and DL2b dopaminergic neurons, as well as in mushroom body neurons (Figure S3 c to h). In addition, D2R knockdown with D2R-miR directly supported that the GFP-tagged D2R strain reflected the expression pattern of endogenous D2R (Figure 4b to d, signals were reduced in DM1). In summary, we think the D2R driver strains supported the expression pattern we observed from the GFP-tagged D2R strain, especially in DM1 DANs.

      Weakness #8: Their observations of GFP-tagged D2R expression could be strengthened with an anti-D2R antibody such as that used by Lam et al., (1999) or Love et al., (2023).

      Love et al. (2023) used the antibody from Draper et al. (10). We have tried the same antibody, but we were not able to observe clear signals after staining. It may not be specific for neurons in the fly larval brain, or our staining protocol may not suit this antibody.

      Unfortunately, we were not able to find Lam (1999) paper.

      Weakness #9: Finally, the authors could consider the possibility other DANs may also mediate aversive learning via D2R. Knockdown of D2R in DAN-g1 appears to cause a defect in aversive quinine learning compared with its genetic control (Figure S4e). It is unclear why the same genetic control has unexpectedly poor aversive quinine learning after training with propionic acid (Figure S5a). The authors could comment on why RNAi knockdown of D2R in DAN-g1 does not similarly impair aversive quinine learning (Figure S5b).

      We also think that other DANs may be involved in aversive learning. We re-analyzed the learning assay data, and D2R knockdown in DAN-g1 with miR appears to partially affect aversive learning when larvae are trained with pentyl acetate (Figure S4e). We are going to build single statistic panels for DAN-g1 and DAN-d1. However, neither larvae with D2R knockdown in DAN-g1 using miR trained with propionic acid (Figure S5a), nor larvae with D2R knockdown in DAN-g1 using RNAi trained with pentyl acetate (Figure S5b), showed an aversive learning deficit. We will add paragraphs about this in both the Results and Discussion sections.

      Reviewer #2 (Public Review):

      Weakness #1: It is not completely clear how the system of DAN-c1, MB neurons, and behavioral performance works. We can be quite sure that DAN-c1;Shits1 reduced dopamine release and impaired aversive memory (Figure 2h). Similarly, DAN-c1;ChR2 increased dopamine release and also impaired aversive memory (Figure 5b). However, it is not clear what is happening with DAN-c1;TrpA1 (Figure 2k). In this case the thermo-induction appears to impair the behavioral performance in all three conditions (QUI, DW and SUC), and the behavior is quite distinct from that seen with the increase and decrease of dopamine tone (Figure 2h and 5b).

      The study successfully examined the role of D2R in DAN-c1 and MB neurons in olfactory conditioning. The conclusions are well supported by the data, with the exception of the claim that dopamine release from DAN-c1 is sufficient for aversive learning in the absence of unconditional stimulus (Figure 2K). Alternatively, the authors need to provide a better explanation of this point.

      Please refer to our response to Weakness #6 of Public Reviewer #1.

      Reviewer #3 (Public Review):

      Weakness #1: It is a strength of the paper that it analyses the function of dopamine neurons (DANs) at the level of single, identified neurons, and uses tools to address specific dopamine receptors (DopRs), exploiting the unique experimental possibilities available in larval Drosophila as a model system. Indeed, the result of their screening for transgenic drivers covering single or small groups of DANs and their histological characterization provides the community with a very valuable resource. In particular the transgenic driver to cover the DANc1 neuron might turn out useful. However, I wonder in which fraction of the preparations an expression pattern as in Figure 1f/ S1c is observed, and how many preparations the authors have analyzed. Also, given the function of DANs throughout the body, in addition to the expression pattern in the mushroom body region (Figure 1f) and in the central nervous system (Figure S1c) maybe attempts can be made to assess expression from this driver throughout the larval body (same for Dop2R distribution).

      We thank the reviewer for the positive comments and the suggestions. For the strain R76F02AD; R55C10DBD, we examined 22 third instar larval brains expressing GFP or Syt-GFP and Den-mCherry, and all of them clearly labeled DAN-c1. Half of them labeled only DAN-c1; the rest had 1 to 5 weakly labeled somata without neurites. Only rarely did 1 or 2 strongly labeled cells appear. These non-DAN-c1 neurons are seldom dopaminergic neurons. In the VNC, 8 out of 12 preparations did not label cells, and 3 had 2-4 strongly labeled cells. These data support that R76F02AD;R55C10DBD exclusively labeled DAN-c1 in 3rd instar larval brains.

      For the question about the pattern of R76F02AD; R55C10DBD and the expression pattern of D2R in larval body, it is an interesting question. However, our main focus was on the central nervous system and the learning behaviors in fruit fly larvae, we may investigate this question in the future.

      Weakness #2: A first major weakness is that the main conclusion of the paper, which pertains to associative memory (last sentence of the abstract, and throughout the manuscript), is not justified by their evidence. Why so? Consider the paradigm in Figure 2g, and the data in Figure 2h (22 degrees, the control condition), where the assay and the experimental rationale used throughout the manuscript are introduced. Different groups of larvae are exposed, for 30min, to an odour paired with either i) quinine solution (red bar), ii) distilled water (yellow bar), or iii) sucrose solution (blue bar); in all cases this is followed by a choice test for the odour on one side and a distilled-water blank on the other side of a testing Petri dish. The authors observe that odour preference is low after odour-quinine pairing, intermediate after odour-water pairing and high after odour-sucrose pairing. The differences in odour preference relative to the odour-water case are interpreted as reflecting odour-quinine aversive associations and odour-sucrose appetitive associations, respectively. However, these differences could just as well reflect non-associative effects of the 30-min quinine or sucrose exposure per se (for a classical discussion of such types of issues see Rescorla 1988, Annu Rev Neurosci, or regarding Drosophila Tully 1988, Behav Genetics, or with some reference to the original paper by Honjo & Furukubo-Tokunaga 2005, J Neurosci that the authors reference, also Gerber & Stocker 2007, Chem Sens).<br /> As it stands, therefore, the current 3-group type of comparison does not allow conclusions about associative learning.

      We adopted this single-odor larval learning paradigm from Honjo’s papers (1,2). In these works, Honjo et al. first designed and performed this single-odor paradigm for larval olfactory associative learning. To address the reviewer’s question about the potential non-associative effects of the 30-min quinine or sucrose exposure, we would like to defend the paradigm primarily based on results from Honjo et al. (2005 and 2009). They applied the odorant to the larvae after training, and only larvae that had received paired training with both the odor and the unconditioned stimulus (quinine or sucrose) showed learning responses. Larvae exposed for 30 min to only the odorant or only the unconditioned stimulus did not respond differently to the odor compared to the naïve group (1,2). To validate that this paradigm induces associative learning responses, they also tested the paradigm from three aspects:

      (1) The odor responses are associative. Honjo et al. showed only when the odorant paired with unconditioned stimulus would induce corresponding attraction or repulsion of larvae to the odor. Neither odorant alone, unconditioned stimulus alone, nor temporal dissociation of odorant and unconditioned stimulus would induce learning responses.

      (2) The odor responses are odor specific. When applied a second odorant that was not used for training, larvae only showed learning responses to the unconditioned stimulus paired odor. This result ruled out the explanation of a general olfactory suppression and indicates larvae can discriminate and specifically alter the responses to the odor paired with unconditioned stimulus. Although the two-odor reciprocal training is not used, these results can show the association of unconditioned stimulus and the corresponding paired odor.

      (3) Well known learning deficit mutants did not show learned responses in this learning paradigm. Honjo et al. tested mutants (e.g., rut and dnc) showing learning deficits in the adult stage with two odor reciprocal learning paradigm. These mutant larvae also failed to show learning responses tested with the single odor larval learning paradigm.

      (4) In our study, we used two distinct odorants (pentyl acetate and propionic acid), as well as two D2R knockdown strains (UAS-miR and UAS-RNAi for D2R). We obtained similar results for larvae with D2R knockdown in DAN-c1. In addition, our naïve olfactory, naïve gustatory, and locomotion data ruled out the possibility that the responses were caused by impaired sensory or motor functions. Comparison with the control group (odor paired with distilled water) ruled out potential effects of habituation, if any existed. All these results support that this single-odor learning paradigm is reliable for assessing the learning abilities of Drosophila larvae, and that the failure of the R.I. to decrease when larvae with D2R knockdown in DAN-c1 were trained with quinine paired with the odorant is caused by a deficit in aversive learning ability. We will add a paragraph to address this in the Discussion.

      Weakness #3: A second major weakness is apparent when considering the sketch in Figure 2g and the equation defining the response index (R.I.) (line 480). The point is that the larvae that are located in the middle zone are not included in the denominator. This can inflate scores and is not appropriate. That is, suppose from a group of 30 animals (line 471) only 1 chooses the odor side and 29, bedazzled after 30-min quinine or sucrose exposure or otherwise confused by a given opto- or thermogenetic treatment, stay in the middle zone... an R.I. of 1.0 would result.

      It is a good question. We gave the larvae 5 min during the testing stage to wander in the testing plate. Under most conditions, more than half of the larvae (>50%) explore, and the rest may stay in the middle zone (and are not counted). We used 25-50 larvae in each learning assay, so around 10-30 larvae end up in the two semicircular areas. Indeed, based on our raw data, an R.I. of 1 seldom appears. Most of the R.I.s fall into a region from -0.2 to 0.8. We admit that the equation for calculating the R.I. is not linear, so it changes more steeply as it approaches -1 and 1. However, as most of the values fall into the region from -0.2 to 0.8, we think ‘border effects’ can be neglected if enough larvae enter the calculation (10-30).
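
      The arithmetic behind this exchange can be sketched as follows. This is only an illustration under a stated assumption: the manuscript's equation (line 480) is not reproduced here, so the index is assumed to be (odor side − blank side) / (odor side + blank side), with middle-zone larvae left out of the denominator as the reviewer describes. The function names and the `count_middle` option are hypothetical and not taken from the paper.

```python
def response_index(n_odor, n_blank, n_middle=0, count_middle=False):
    """Preference index from larval counts.

    Assumed form (not taken from the manuscript):
        (n_odor - n_blank) / (n_odor + n_blank)
    With count_middle=True, middle-zone larvae are added to the
    denominator, which is the alternative the reviewer implies.
    """
    denominator = n_odor + n_blank
    if count_middle:
        denominator += n_middle
    if denominator == 0:
        raise ValueError("no larvae to score")
    return (n_odor - n_blank) / denominator


def step_per_larva(n_scored):
    """Change in the index when one scored larva switches sides.

    Switching sides changes the numerator by 2 while the denominator
    (n_scored larvae in the two semicircular areas) stays fixed.
    """
    return 2 / n_scored


if __name__ == "__main__":
    # Reviewer's example: 30 larvae, 1 on the odor side, 29 in the middle.
    print(response_index(1, 0, n_middle=29))                     # 1.0
    print(response_index(1, 0, n_middle=29, count_middle=True))  # 1/30

    # Granularity of the index for 10 versus 30 scored larvae.
    for n_scored in (10, 30):
        print(n_scored, step_per_larva(n_scored))
```

      Under this assumed form, the reviewer's extreme case gives 1.0 when the middle zone is ignored and 1/30 when it is counted. With only 10 scored larvae, a single larva switching sides moves the index by 0.2, so the lower end of the 10-30 range quoted above is noticeably coarser than the upper end.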

      Weakness #4: Unless experimentally demonstrated, claims that the thermogenetic effector shibire/ts reduces dopamine release from DANs are questionable. This is because firstly, there might be shibire/ts-insensitive ways of dopamine release, and secondly because shibire/ts may affect co-transmitter release from DANs.

      The Shibirets1 gene encodes a thermosensitive mutant of dynamin; expressing this mutant version in target neurons blocks neurotransmitter release at ambient temperatures higher than 30 °C, as it represses vesicle recycling (1). It is a widely used tool to examine whether the target neuron is involved in a specific physiological function. We cannot rule out that Shibirets1-insensitive ways of dopamine release exist. However, blocking dopamine release from DAN-c1 with Shibirets1 already changed the learning responses (Figure 2h). This result indicates that dopamine release from DAN-c1 during training is important for larval aversive learning, which supports our hypothesis.

      For the second question about the potential co-transmitter release, we think it is a great question. Recently Yamazaki et al. reported that co-neurotransmitters in the dopaminergic system modulate adult olfactory memories in Drosophila (11), and we cannot rule out roles for co-released neurotransmitters/neuropeptides in larval learning. Ideally, observing the real-time changes in dopamine release from DAN-c1 in wild type and TH knockdown larvae would answer this question. However, live imaging of dopamine release from one dopaminergic neuron is not practical for us at this time. On the other hand, the roles of dopamine receptors in olfactory associative learning support that dopamine is important for Drosophila learning. The D1 receptor, dDA1, has been shown to be involved in both adult and larval appetitive and aversive learning (12,13). In our work, D2R in the mushroom body showed important roles in both larval appetitive and aversive learning (Figure 6a). All this evidence reveals the importance of dopamine in Drosophila olfactory associative learning. In addition, too much remains unknown about the co-released neurotransmitters/neuropeptides, as well as their potentially complex ‘interaction/crosstalk’ relations. We believe that investigation of co-released neurotransmitters/neuropeptides is beyond the scope of this study at this time.

      Weakness #5: It is not clear whether the genetic controls when using the Gal4/ UAS system are the homozygous, parental strains (XY-Gal4/ XY-Gal4 and UAS-effector/ UAS-effector), or as is standard in the field the heterozygous driver (XY-Gal4/ wildtype) and effector controls (UAS-effector/ wildtype) (in some cases effector controls appear to be missing, e.g. Figure 4d, Figure S4e, Figure S5c).

      Almost all controls we used were homozygous parental strains. They did not show abnormal behaviors in either learnings or naïve sensory or locomotion assays. The only exception is the control for DAN-c1, the larvae from homozygous R76F02AD; R55C10DBD strain showed much reduced locomotion speed (Figure S6). To prevent this reduced locomotion speed affecting the learning ability, we used heterozygous R76F02AD; R55C10DBD/wildtype as control, which showed normal learning, naïve sensory and locomotion abilities (Figure 4e to i).

      For Figure 4d, it is a column graph to quantify the efficiency of D2R knockdown with miR. Because we need to induce and quantify the knockdown effect in specific DANs (DM1), only TH-GAL4 can be used as the control group, rather than UAS-D2R-miR.

      For the missing control groups in Figure S4e and S5c, we have shown them in other Figures (Figure 4e). We will re-organize the figures to make them easier to understand.

      Weakness #6: As recently suggested by Yamada et al 2024, bioRxiv, high cAMP can lead to synaptic depression (sic). That would call into question the interpretation of low-Dop2R leading to high-cAMP, leading to high-dopamine release, and thus the authors' interpretation of the matching effects of low-Dop2R and driving DANs.

      We will read through this paper and try to incorporate it as a possible explanation for the learning mechanisms. As we introduced in the Discussion section, the learning mechanism is quite complex, involving both non-linear neuronal circuits and multiple signaling pathways that respond to complex environmental learning contexts. We will try to develop a hypothesis that best reconciles our results with published data.

      References

      (1) Honjo, K. & Furukubo-Tokunaga, K. Induction of cAMP response element-binding protein-dependent medium-term memory by appetitive gustatory reinforcement in Drosophila larvae. J Neurosci 25, 7905-7913 (2005). https://doi.org/10.1523/JNEUROSCI.2135-05.2005

      (2) Honjo, K. & Furukubo-Tokunaga, K. Distinctive neuronal networks and biochemical pathways for appetitive and aversive memory in Drosophila larvae. J Neurosci 29, 852-862 (2009). https://doi.org/10.1523/JNEUROSCI.1315-08.2009

      (3) Neve, K. A., Seamans, J. K. & Trantham-Davidson, H. Dopamine receptor signaling. J Recept Signal Transduct Res 24, 165-205 (2004). https://doi.org/10.1081/rrs-200029981

      (4) Saumweber, T. et al. Functional architecture of reward learning in mushroom body extrinsic neurons of larval Drosophila. Nat Commun 9, 1104 (2018). https://doi.org/10.1038/s41467-018-03130-1

      (5) Aso, Y. & Rubin, G. M. Dopaminergic neurons write and update memories with cell-type-specific rules. Elife 5 (2016). https://doi.org/10.7554/eLife.16135

      (6) Xie, T. et al. A Genetic Toolkit for Dissecting Dopamine Circuit Function in Drosophila. Cell Rep 23, 652-665 (2018). https://doi.org/10.1016/j.celrep.2018.03.068

      (7) Hartenstein, V., Cruz, L., Lovick, J. K. & Guo, M. Developmental analysis of the dopamine-containing neurons of the Drosophila brain. J Comp Neurol 525, 363-379 (2017). https://doi.org/10.1002/cne.24069

      (8) Aso, Y. et al. The neuronal architecture of the mushroom body provides a logic for associative learning. Elife 3, e04577 (2014). https://doi.org/10.7554/eLife.04577

      (9) Eschbach, C. et al. Recurrent architecture for adaptive regulation of learning in the insect brain. Nat Neurosci 23, 544-555 (2020). https://doi.org/10.1038/s41593-020-0607-9

      (10) Draper, I., Kurshan, P. T., McBride, E., Jackson, F. R. & Kopin, A. S. Locomotor activity is regulated by D2-like receptors in Drosophila: an anatomic and functional analysis. Dev Neurobiol 67, 378-393 (2007). https://doi.org/10.1002/dneu.20355

      (11) Yamazaki, D., Maeyama, Y. & Tabata, T. Combinatory Actions of Co-transmitters in Dopaminergic Systems Modulate Drosophila Olfactory Memories. J Neurosci 43, 8294-8305 (2023). https://doi.org/10.1523/jneurosci.2152-22.2023

      (12) Selcho, M., Pauls, D., Han, K. A., Stocker, R. F. & Thum, A. S. The role of dopamine in Drosophila larval classical olfactory conditioning. PLoS One 4, e5897 (2009). https://doi.org/10.1371/journal.pone.0005897

      (13) Kim, Y. C., Lee, H. G. & Han, K. A. D1 dopamine receptor dDA1 is required in the mushroom body neurons for aversive and appetitive learning in Drosophila. J Neurosci 27, 7640-7647 (2007). https://doi.org/10.1523/JNEUROSCI.1167-07.2007

    1. eLife assessment

      This important study investigates the relationship between transcription factor condensate formation, transcription, and 3D gene clustering of the MET regulon in the model organism S. cerevisiae. The authors provide solid experimental evidence that transcription factor condensates enhance transcription of MET-regulated genes, but evidence for the role of Met4 IDRs and Met4-containing condensates in mediating target gene clustering in the MET regulon is not as strong. This paper will be of interest to molecular biologists working on chromatin and transcription, although its impact would be strengthened by further investigation.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, James Lee, Lu Bai, and colleagues use a multifaceted approach to investigate the relationship between transcription factor condensate formation, transcription, and 3D gene clustering of the MET regulon in the model organism S. cerevisiae. This study represents a second clear example of inducible transcriptional condensates in budding yeast, as most evidence for transcriptional condensates arises from studies of mammalian systems. In addition, this study links the genomic location of transcriptional condensates to the potency of transcription of a reporter gene regulated by the master transcription factor contained in the condensate. The strength of evidence supporting these two conclusions is strong. Less strong is evidence supporting the claim that Met4-containing condensates mediate the clustering of genes in the MET regulon.

      Strengths:

      The manuscript is for the most part clearly written, with the overriding model and specific hypothesis being tested clearly explained. Figure legends are particularly well written. An additional strength of the manuscript is that most of the main conclusions are supported by the data. This includes the propensity of Met4 and Met32 to form puncta-like structures under inducing conditions, formation of Met32-containing LLPS-like droplets in vitro (within which Met4 can colocalize), colocalization of Met4-GFP with Met4-target genes under inducing conditions, enhanced transcription of a Met3pr-GFP reporter when targeted within 1.5 - 5 kb of select Met4 target genes, and most impressively, evidence that several MET genes appear to reposition under transcriptionally inducing conditions. The latter is based on a recently reported novel in vivo methylation assay, MTAC, developed by the Bai lab.

      Comments on Revision:

      The authors have adequately addressed most of my concerns. However, the most salient issue - that the work fails to show convincing evidence that nuclear condensates per se drive MET gene clustering - remains. Since the genetic approach led to ambiguous results, another way to link MET gene clustering to TF condensate formation is to perturb the condensates with 1,6-hexanediol. If 1,6-HD treatment dissolves condensates and concomitant MET clustering (while the impact of 2,5-HD is much less) then the conclusion is more solid. Absent such evidence, the authors are left with a correlation, and they should consider toning down the title and abstract (and conclusions stated elsewhere). For example, a more accurate title might be "Transcription Factor Condensates Correlate with MET Gene Clustering and Mediate Enhancement in Gene Expression".

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript combines live yeast cell imaging and other genomic approaches to study how transcription factor (TF) condensates might help organize and enhance the transcription of the target genes in the methionine starvation response pathway. The authors show that the TFs in this response can form phase separated condensates through their intrinsically disordered regions (IDRs), and mediate the spatial clustering of the related endogenous genes as well as reporter inserted near the endogenous target loci.

      Strengths:

      This work uses rigorous experimental approaches, including imaging of endogenously labeled TFs, determining expression and clustering of endogenous target genes and reporter integrated near the endogenous target loci. The importance of TFs is shown by rapid degradation. Single cell data are combined with genomic sequencing-based assays. Control loci engineered in the same way are usually included. Some of these controls are very helpful in showing the pathway-specific effect of the TF condensates in enhancing transcription.

      Weaknesses:

      The main weakness of this work is that the role of IDR and phase separation in mediating the target gene clustering is unclear. TF IDRs may have many functions including mediating phase separation and binding to other transcriptional molecules (not limited to proteins). The authors did not get clear results on gene clustering upon IDR deletion. IDR deletion may affect binding of other molecules (not the general transcription machinery) that are specifically important for target gene transcription. If the self-association of the IDR is the main driving force of the clustering and target gene transcription enhancement, replacing this IDR with totally unrelated IDRs that have been shown to mediate phase separation in non-transcription systems would preserve the gene clustering and transcription enhancement effects. However, this type of replacement experiment is challenging for endogenous locus.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors probe the connections between clustering of the Met4/32 transcription factors (TFs), clustering of their regulatory targets, and transcriptional regulation. While there is an increasing number of studies on TF clustering in vitro and in vivo, there is an important need to probe whether clustering plays a functional role in gene expression. Another important question is whether TF clustering leads to the clustering of relevant gene targets in vivo. Here the authors provide several lines of evidence to make a compelling case that Met4/32 and their target genes cluster and that this leads to an increase in transcription of these genes in the induced state. First, they found that, in the induced state, Met4/32 forms co-localized puncta in vivo. This is supported by in vitro studies showing that these TFs can form condensates, with Met32 being the driver of these condensates. They found that two target genes, MET6 and MET13, have a higher probability of being co-localized with Met4 puncta compared with non-target loci. Using a targeted DNA methylation assay, they found that MET13 and MET6 show Met4-dependent long-range interactions with other Met4-regulated loci, consistent with the clustering of at least some target genes under induced conditions. Finally, by inserting a Met4-regulated reporter gene at variable distances from MET6, they provide evidence that insertion near this gene is a modest hotspot for activity.

      Comments on revised version:

      In this revised manuscript, the authors have achieved a good balance between revising the text/figures, and explaining why some lines of experiments proposed by reviewers are either not practical or beyond the scope of this work. I think that the revised study is an important contribution to understanding the function of transcription factors, TF condensates, and gene localization in a stress-responsive system.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, James Lee, Lu Bai, and colleagues use a multifaceted approach to investigate the relationship between transcription factor condensate formation, transcription, and 3D gene clustering of the MET regulon in the model organism S. cerevisiae. This study represents a second clear example of inducible transcriptional condensates in budding yeast, as most evidence for transcriptional condensates arises from studies of mammalian systems. In addition, this study links the genomic location of transcriptional condensates to the potency of transcription of a reporter gene regulated by the master transcription factor contained in the condensate. The strength of evidence supporting these two conclusions is strong. Less strong is evidence supporting the claim that Met4-containing condensates mediate the clustering of genes in the MET regulon.

      Strengths:

      The manuscript is for the most part clearly written, with the overriding model and specific hypothesis being tested clearly explained. Figure legends are particularly well written. An additional strength of the manuscript is that most of the main conclusions are supported by the data. This includes the propensity of Met4 and Met32 to form puncta-like structures under inducing conditions, formation of Met32-containing LLPS-like droplets in vitro (within which Met4 can colocalize), colocalization of Met4-GFP with Met4-target genes under inducing conditions, enhanced transcription of a Met3pr-GFP reporter when targeted within 1.5 - 5 kb of select Met4 target genes, and most impressively, evidence that several MET genes appear to reposition under transcriptionally inducing conditions. The latter is based on a recently reported novel in vivo methylation assay, MTAC, developed by the Bai lab.

      Weaknesses:

      My principal concern is that the authors fail to show convincing evidence for a key conclusion, highlighted in the title, that nuclear condensates per se drive MET gene clustering. Figure 4E demonstrates that Met4 molecules, not condensates per se, are necessary for fostering distant cis and trans interactions between MET6 and three other Met4 targets under -met inducing conditions. In addition, the paper would be strengthened by discussing a recent study conducted in yeast that comes to many of the same conclusions reported here, including the role of inducible TF condensates in driving 3D genome reorganization (Chowdhary et al, Mol. Cell 2022).

      Following the reviewer’s advice, we carried out MTAC with the VP near MET6 in WT Met4 and ΔIDR2.3 strains (results shown below). The conclusions are somewhat ambiguous. For long-distance interactions with MUP1, YKG9, STR3, and MET13, we indeed observe decreased MTAC signals close to background levels in the ΔIDR2.3 strain, which aligns with the model suggesting that Met4 condensation promotes clustering among Met4 targeted genes. However, we also noticed significant decreases in the local MTAC signals (HIS3 and MET6). It is possible that the changes in Met4 condensates alter the chromosomal folding near MET6, thereby affecting the local MTAC signals. Alternatively, LacI-M.CviPI (the methyltransferase) could be induced to a lesser extent in the ΔIDR2.3 strain, leading to a genome-wide decrease in MTAC signals. Due to this ambiguity, we decided not to include the following plot in the main figure.

      Author response image 1.

      We discussed Hsf1 and added the suggested reference on page 13.

      Other concerns:

      (1) A central premise of the study is that the inducible formation of condensates underpins the induction of MET gene transcription and MET gene clustering. Yet, Figure 1 suggests (and the authors acknowledge) that puncta-like Met4-containing structures pre-exist in the nuclei of non-induced cells. Thus, the transcription and gene reorganization observed is due to a relatively modest increase in condensate-like structures. Are we dealing with two different types of Met4 condensates? (For example, different combinations of Met4 with its partners; Mediator- or Pol II-lacking vs. Mediator- or Pol II-containing; etc.?) At the very least, a comment to this effect is necessary.

      Although Met4 can form smaller puncta in the +met condition (Figure 1A), it cannot be recruited to its target genes due to the absence of its sequence-specific binding partners, Met31 and Met32 (these two factors are actively degraded in the +met condition). Consistently, in the +met condition, Met4 shows extremely low genome-wide ChIP signals (Figure 3C). Therefore, these Met4 puncta in +met do not organize the 3D genome or have gene regulatory functions. This discussion is added on page 12.

      (2) Using an in vitro assay, the authors demonstrate that Met4 colocalizes with Met32 LLPS droplets (Figure 2F). Is the same true in vivo - that is, is Met32 required for Met4 condensation? This could be readily tested using auxin-induced degradation of Met32. Along similar lines, the claim that Met32 is required for MET gene clustering (line 250) requires auxin-induced degradation of this protein.

      As the reviewer pointed out above, cells in the +met condition also show small Met4 puncta. In this condition, Met32 is essentially undetectable (Met31 level is even lower and remains undetectable even in the -met conditions). Therefore, Met4 does not strictly require the presence of Met32 in vivo (may require other factors or modifications). Met4 does not have DNA-binding activity, and therefore it cannot target and organize chromosomes on its own. Although we did not do the Met32 degradation experiment, we measured the 3D genome conformation in +met and showed that there are no detectable interactions among Met4 target genes.

      (3) The authors use a single time point during -met induction (2 h) to evaluate TF clustering, transcription (mRNA abundance), and 3D restructuring. It would be informative to perform a kinetic analysis since such an analysis could reveal whether TF clustering precedes transcriptional induction or MET gene repositioning. Do the latter two phenomena occur concurrently or does one precede the other?

      We appreciate the reviewer’s insightful question. It is indeed intriguing to consider whether TF clustering precedes transcriptional induction and MET gene clustering. However, as mentioned on page 12 of our manuscript, this experiment poses significant challenges. The low intensities of the Met4 and Met32 signals necessitate high excitation for imaging, which also makes them prone to photo-bleaching. Consequently, we have been unable to measure the dynamics of Met4 and Met32 puncta in vivo, let alone co-image them with DNA/RNA. Undertaking this experiment will require considerable effort, which we plan to pursue in the future.

      (4) Based on the MTAC assay, MET13 does not appear to engage in trans interactions with other Met4 targets, whereas MET6 does (Figures 4C and 4E). Does this difference stem from the greater occupancy of Met4 at MET6 vs. MET13, greater association of another Met co-factor with the chromatin of MET6 vs. MET13, or something else?

      We were also surprised by this result, given that MET13 emerged as one of the strongest transcriptional hotspots in our previous screen. It also exhibits one of the highest Met4 ChIP signals and is closely associated with the nuclear pore complex. Our earlier findings indicate that DNA dynamics near the VP significantly influence the MTAC signal; specifically, a VP with constrained motion is less effective at methylating interacting sites (Li et al., 2024). Therefore, it is plausible that MET13 is associated with a large Met4 condensate, which constrains the motion of nearby chromatin and diminishes MTAC efficiency.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript combines live yeast cell imaging and other genomic approaches to study how transcription factor (TF) condensates might help organize and enhance the transcription of the target genes in the methionine starvation response pathway. The authors show that the TFs in this response can form phase-separated condensates through their intrinsically disordered regions (IDRs), and mediate the spatial clustering of the related endogenous genes as well as reporter inserted near the endogenous target loci.

      Strengths:

      This work uses rigorous experimental approaches, such as imaging of endogenously labeled TFs, determining expression and clustering of endogenous target genes, and reporter integration near the endogenous target loci. The importance of TFs is shown by rapid degradation. Single-cell data are combined with genomic sequencing-based assays. Control loci engineered in the same way are usually included. Some of these controls are very helpful in showing the pathway-specific effect of the TF condensates in enhancing transcription.

      Weaknesses:

      Perhaps the biggest weakness of this work is that the role of IDR and phase separation in mediating the target gene clustering is unclear. This is an important question. TF IDRs may have many functions including mediating phase separation and binding to other transcriptional molecules (not limited to proteins and may even include RNAs). The effect of IDR deletion on reduced Fano number in cells could come from reduced binding with other molecules. This should be tested on phase separation of the purified protein after IDR deletion. Also, the authors have not shown IDR deletion affects the clustering of the target genes, so IDR deletion may affect the binding of other molecules (not the general transcription machinery) that are specifically important for target gene transcription. If the self-association of the IDR is the main driving force of the clustering and target gene transcription enhancement, can one replace this IDR with totally unrelated IDRs that have been shown to mediate phase separation in non-transcription systems and still see the gene clustering and transcription enhancement effects? This work has all the setup to test this hypothesis.

      We thank the reviewer for raising this point, and we tried more in vitro and in vivo experiments with Met4 IDR deletions. See the answer to Reviewer 1 for the in vivo 3D mapping experiment.

      We purified Met4-ΔIDR2 with an MBP tag, but its low yield made labeling and conducting thorough experiments challenging. At concentrations above ~10 μM, the protein tends to aggregate, while at lower concentrations, it remains diffuse in solution and does not form condensates. When we mixed purified Met4-ΔIDR2 with Met32, we observed reduced partitioning inside Met32 condensates compared to the full-length Met4. As the reviewer noted, this diminished interaction may contribute to the decreased puncta formation observed in vivo. This result is added to the manuscript on page 11 and supplementary figure 5.

      The Met4 protein was tagged with MBP but Met32 was not. The MBP tag is well known to enhance protein solubility and prevent phase separation. This made their in vitro phase behavior hard to compare and led the authors to think that maybe Met32 is the scaffold in the co-condensates. If MBP was necessary to increase yield and solubility during expression and purification, it should be cleaved (a protease cleavage site should be engineered) to allow phase separation in vitro.

      Following the reviewer’s advice, we purified Met4-TEV-MBP so that the MBP can be cleaved off. Unfortunately, concentrated Met4-TEV-MBP needs to be stored at high salt (400 mM) to be soluble. When exchanged into a buffer suitable for TEV cleavage (≤200 mM NaCl), nearly all soluble protein aggregates. Attempts to digest the protein in storage buffer result in observable aggregation before significant cleavage (see below).

      Author response image 2.

      Are ATG36 and LDS2 also supposed to be induced by -met? This should be explained clearly. The signals are high at -met.

      Genomic loci ATG36 and LDS2 were chosen as controls because they are not bound by Met TFs (ChIP-seq tracks) and their expression is not induced by -met (RNA-seq data). This information is added to the manuscript on page 9. When the MET3pr-GFP reporter is inserted into these loci, GFP is induced by -met (because it is driven by the MET3 promoter), but the induction level is lower than that of the same reporter inserted into transcriptional hotspots such as MET13 and MET6 (Figure 6E; also see Du et al., PLoS Genetics, 2017).

      ChIP-seq data:

      Author response image 3.

      RNA-seq counts:

      Author response table 1.

      Figure 6B, the Met4-GFP seems to form condensates at all three loci without a very obvious difference, though 6C shows a difference. 6C is from only one picture each. The authors should probably quantify the signals from a large number of randomly selected pictures (cells) and do statistics.

      If we understand this comment correctly, the reviewer is referring to the fact that all three loci in Figure 6B appear to show a peak in GFP intensity. This pattern emerges because these images are averaged over many cells (the number of cells analyzed in 6B has been added to the figure legends). GFP intensities near the center will always be higher because peripheral pixels are more likely to fall outside the nuclear boundaries, where Met4 signals are absent (same as in Figure 3F). Importantly, the MET6 locus shows higher intensity near the center than PUT1 and ATG36, indicating its co-localization with Met4 condensates.

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors probe the connections between clustering of the Met4/32 transcription factors (TFs), clustering of their regulatory targets, and transcriptional regulation. While there is an increasing number of studies on TF clustering in vitro and in vivo, there is an important need to probe whether clustering plays a functional role in gene expression. Another important question is whether TF clustering leads to the clustering of relevant gene targets in vivo. Here the authors provide several lines of evidence to make a compelling case that Met4/32 and their target genes cluster and that this leads to an increase in transcription of these genes in the induced state. First, they found that, in the induced state, Met4/32 forms co-localized puncta in vivo. This is supported by in vitro studies showing that these TFs can form condensates, with Met32 being the driver of these condensates. They found that two target genes, MET6 and MET13, have a higher probability of being co-localized with Met4 puncta compared with non-target loci. Using a targeted DNA methylation assay, they found that MET13 and MET6 show Met4-dependent long-range interactions with other Met4-regulated loci, consistent with the clustering of at least some target genes under induced conditions. Finally, by inserting a Met4-regulated reporter gene at variable distances from MET6, they provide evidence that insertion near this gene is a modest hotspot for activity.

      Weaknesses:

      (1) Please provide more information on the assay for puncta formation (Figure 1). It's unclear to me from the description provided how this assay was able to quantitate the number of puncta in cells.

      Due to the variation in puncta size and intensity (as illustrated in Figure 1A), counting the number of puncta would be highly subjective with arbitrary cutoffs. Therefore, we chose to calculate the CV and Fano values instead, which are unbiased measures. Proteins that form puncta will exhibit greater pixel-to-pixel variations in GFP intensity, resulting in higher CV and Fano values.
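
      The two measures named above can be sketched in a few lines. This is only an illustration, not the authors' analysis code: it assumes the textbook definitions (CV = standard deviation / mean, Fano = variance / mean), uses the population variance (the response does not say which estimator was used), and the function name `cv_and_fano` and the toy pixel values are hypothetical.

```python
from statistics import mean, pvariance


def cv_and_fano(pixels):
    """Return (CV, Fano) for a flat sequence of pixel intensities.

    Assumed definitions (not stated in the source):
      CV   = standard deviation / mean
      Fano = variance / mean
    Population variance is used; a sample variance would differ slightly.
    """
    mu = mean(pixels)
    if mu == 0:
        raise ValueError("mean intensity is zero; CV and Fano are undefined")
    var = pvariance(pixels, mu)
    return var ** 0.5 / mu, var / mu


# Two hypothetical nuclei with the same mean intensity (10 per pixel).
diffuse = [10] * 9                        # signal spread evenly
punctate = [4, 4, 4, 4, 58, 4, 4, 4, 4]   # signal concentrated in one pixel

print(cv_and_fano(diffuse))
print(cv_and_fano(punctate))
```

      Both toy nuclei have the same total signal, so only the pixel-to-pixel variation separates them: the evenly spread signal gives zero for both measures, while the concentrated signal gives clearly positive values, without any threshold for what counts as a punctum.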

      (2) How does the number of puncta in cells correspond with the number of Met-regulated genes? What are the implications of this calculation?

      As previously mentioned, defining the exact number of Met4 puncta is challenging. The number of puncta does not necessarily have one-to-one correspondence to the number of Met4 target genes. Some puncta may not be associated with chromosomes, while others may interact with multiple genes.

      (3) A control for chromosomal insertion of the Met-regulated reporter was a GAL4 promoter derivative reporter. However, this control promoter seems 5-10 fold more active than the Met-regulated promoter (Figure 6). It's possible that the high activity from the control promoter overcomes some other limiting step such that chromosomal location isn't important. It would be ideal if the authors used a promoter with comparable activity to the Met-reporter as a control.

      We agree with the reviewer that it would be better to use another promoter with comparable activity. Indeed, this was our rationale for selecting the attenuated GAL1 promoter over the WT version; however, it still exhibited substantially higher activity than MET3pr. Unfortunately, we do not have a promoter from a different pathway that is calibrated to match the activity level of MET3pr. Nonetheless, MET17pr has much higher activity (~3-fold) than MET3pr, and for both promoters we observed a similar degree of stimulation at the hotspot relative to the control locus (1.5- to 2-fold increase in GFP expression) (Figure 6E & F). This suggests that the observed effects are more likely to depend on the activation pathway and TF identity than on promoter strength.

      (4) It seems like transcription from a very large number of genes is altered in the Met4 IDR mutant (Figure 7F). Why is this and could this variability affect the conclusions from this experiment?

      We agree with the reviewer that the ΔIDR2.3 truncation affects the expression of 2711 genes (P-adj < 0.05; 1339 up, 1372 down). We suspect that this is due to the decreased expression of Met4 target genes, leading to altered levels of methionine and other sulfur-containing metabolites. Such changes would have a global impact on gene expression. Importantly, although similar numbers of genes are up- and down-regulated in the ΔIDR2.3 strain, almost all Met4 targets showed decreased expression (Figure 7F). This supports the model in which Met4 condensates lead to increased expression of their target genes.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      (1) The introduction contains multiple miscitations. Rather than gene clustering, most of the studies and reviews cited (e.g., lines 35-39) report interactions between genomic loci (E-E, E-P, and P-P). There are other claims not supported by the papers cited. Moreover, the authors lump together original research papers and reviews within a given group without distinguishing which is which.

      We thank the reviewer for pointing this out. We reorganized the references in the introduction.

      (2) One option to address the concern regarding the lack of evidence that nuclear condensates per se drive MET gene clustering is to test the impact of Met4 ΔIDR2.3 on MTAC signals.

      We carried out the suggested experiment. See answer above (Reviewer #1, Question #1).

      (3) Authors claim that there are significant differences between values depicted in Figures 1B and 3G. Statistical tests are necessary to show this.

      Significance values were calculated in comparison to free GFP using a two-tailed Student’s t-test in Figures 1B, 1C, and 3G. The corresponding figure legends are updated.

      (4) How are the data in Figures 3F, G, and 6B, C generated? This is unclear from the information provided in the Figure legends and Materials and Methods.

      For each cell, we projected the highest mCherry and GFP intensity at each pixel across all z positions onto a 2D plane (maximum intensity projection, MIP). The MIP images were aligned with the mCherry dot at the center and averaged over all cells. To calculate the GFP intensities as in Figures 3G and 6C, a single line was drawn across the center and the GFP profile was analyzed in ImageJ. We now describe this in the corresponding figure legends, and the Materials and Methods are also updated.
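
      The projection, alignment, and averaging steps described above can be sketched as follows. This is a rough illustration under stated assumptions rather than the authors' pipeline (which used ImageJ for the line profile): images are plain nested lists, the mCherry dot is taken to be the brightest pixel of the mCherry MIP, cells whose window would leave the image are skipped, and all function names and the `half` window parameter are hypothetical.

```python
def max_intensity_projection(stack):
    """Collapse a z-stack (list of 2D row lists) into one 2D image by
    keeping the brightest value at each (row, col) position."""
    return [[max(vals) for vals in zip(*rows)] for rows in zip(*stack)]


def brightest_pixel(image):
    """Return (row, col) of the maximum value; assumed here to mark the mCherry dot."""
    coords = ((r, c) for r, row in enumerate(image) for c in range(len(row)))
    return max(coords, key=lambda rc: image[rc[0]][rc[1]])


def crop_centered(image, center, half):
    """Cut a (2*half+1)-pixel square around center; None if it leaves the image."""
    r0, c0 = center
    if (r0 - half < 0 or c0 - half < 0
            or r0 + half >= len(image) or c0 + half >= len(image[0])):
        return None
    return [row[c0 - half:c0 + half + 1] for row in image[r0 - half:r0 + half + 1]]


def average_aligned_gfp(cells, half):
    """cells: list of (mcherry_stack, gfp_stack) pairs, one pair per cell.
    Returns the GFP MIP window averaged over cells, centered on each mCherry dot."""
    windows = []
    for mcherry_stack, gfp_stack in cells:
        center = brightest_pixel(max_intensity_projection(mcherry_stack))
        window = crop_centered(max_intensity_projection(gfp_stack), center, half)
        if window is not None:
            windows.append(window)
    if not windows:
        raise ValueError("no cell had a complete window around its mCherry dot")
    size = 2 * half + 1
    return [[sum(w[r][c] for w in windows) / len(windows) for c in range(size)]
            for r in range(size)]


def center_line_profile(image):
    """GFP values along the horizontal line through the window center."""
    return image[len(image) // 2]
```

      Calling `center_line_profile(average_aligned_gfp(cells, half))` yields one averaged GFP profile through the labeled locus, which is the kind of curve compared between loci in the response.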

      (5) Typos/ unclear writing: lines 24, 58, 79, 82, 84, 96, 117, 121, 131, 142, 147, 161 (terminus, not "terminal"), 250, 325, 349, 761 (was, not "are"). For several of these: "condense" is not "condensate"; for many others: inappropriate use of "the". Supplementary Figure 1 legend: not "a single nuclei" instead "a single nucleus".

      We thank the reviewer for pointing this out. We tried our best to correct grammatical errors.

      (6) Define GAL1Spr (Figure 6F).

      The GAL1S promoter is an attenuated GAL1 promoter that lacks two of the four Gal4 binding sites. The original paper is now cited in the manuscript on page 10.

      (7) Figure 7B, C: there appears to be an inconsistency between the image and bar graph value for ΔIDR3.

      The Fano values calculated in 7C are averaged over a population of cells (we added the cell numbers to the legend), while the image in 7B is an example of an individual nucleus. There is some cell-to-cell variability in how Met4 appears. To be more representative, we chose a different image for ΔIDR3.
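      To make the distinction between a single-nucleus image and a population-averaged Fano value concrete, the sketch below assumes the Fano value is the standard variance-to-mean ratio of pixel intensities within one nucleus, averaged over nuclei. That definition is an assumption here, since this response does not spell out the calculation, and both function names are hypothetical.

```python
from statistics import mean, pvariance


def fano_factor(pixel_intensities):
    """Variance-to-mean ratio of the pixel intensities of one nucleus."""
    avg = mean(pixel_intensities)
    if avg == 0:
        raise ValueError("Fano factor is undefined for zero mean intensity")
    return pvariance(pixel_intensities) / avg


def population_fano(nuclei):
    """Average the per-nucleus Fano factors over a population of cells."""
    return mean([fano_factor(pixels) for pixels in nuclei])
```

      Under this definition a uniformly stained nucleus scores 0 and a punctate one scores high, so a single example image can sit well away from the population mean shown in a bar graph.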

      (8) Supplementary Tables: use descriptive titles for file names.

      This is corrected.

      Reviewer #2 (Recommendations For The Authors):

      Minor:

      Figure 4F is not cited in the text, and the color legend seems wrong for targeted and control.

      Figure 4F is now cited in the text. The labels were corrected.

    1. eLife assessment

      In this important study, the findings have theoretical and practical implications beyond a single subfield; the work supports the role of breast carcinoma amplified sequence 2 (Bcas2) in positively regulating primitive wave hematopoiesis through amplification of beta-catenin-dependent (canonical) Wnt signaling. The study is convincing, using appropriate and validated methodology in line with the current state-of-the-art; there is a first-rate analysis of a strong phenotype with highly supportive mechanistic data. The findings shed light on the controversial question of whether, when, and how canonical Wnt signaling may be involved in hematopoietic development. The work will be of interest to hematologists but also to developmental biologists.

    2. Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Ning et al. reported that Bcas2 plays an indispensable role in zebrafish primitive hematopoiesis by sequestering β-catenin in the nucleus. The authors showed that loss of Bcas2 caused primitive hematopoietic defects in zebrafish. They found that Bcas2 deficiency promoted β-catenin nuclear export in a CRM1-dependent manner in vivo and in vitro. They further validated that BCAS2 directly interacted with β-catenin in the nucleus and enhanced β-catenin accumulation through its CC domains. They provide a novel insight into Bcas2, which is critical for zebrafish primitive hematopoiesis by regulating nuclear β-catenin stabilization rather than through its canonical pre-mRNA splicing functions. Overall, the study is impressive and well-performed, although there are also some issues to address.

      Strengths:

      The study unveils a novel function of Bcas2, which is critical for zebrafish primitive hematopoiesis by sequestering β-catenin. The authors validated the results in vivo and in vitro. Most of the figures are clear and convincing. This study nicely complements the function of Bcas2 in primitive hematopoiesis.

      Weaknesses:

      A portion of the figures were over-exposed.

    3. Reviewer #2 (Public Review):

      Summary:

      Ning and colleagues present studies supporting a role for breast carcinoma amplified sequence 2 (Bcas2) in positively regulating primitive wave hematopoiesis through amplification of beta-catenin-dependent (canonical) Wnt signaling. The authors present compelling evidence that zebrafish bcas2 is expressed at the right time and place to be involved in primitive hematopoiesis, that there are primitive hematopoietic defects in hetero- and homozygous mutant and knockdown embryos, that Bcas2 mechanistically positively regulates canonical Wnt signaling, and that Bcas2 is required for nuclear retention of B-cat through physical interaction involving armadillo repeats 9-12 of B-cat and the coiled-coil domains of Bcas2. Overall, the data and writing are clean, clear, and compelling. This study is a first-rate analysis of a strong phenotype with highly supportive mechanistic data. The findings shed light on the controversial question of whether, when, and how canonical Wnt signaling may be involved in hematopoietic development. We detail some minor concerns and questions below, which if answered, we believe would strengthen the overall story and resolve some puzzling features of the phenotype. Notwithstanding these minor concerns, we believe this is an exceptionally well-executed and interesting manuscript.

      Strengths:

      (1) The study features clear and compelling phenotypes and results.

      (2) The manuscript narrative exposition and writing are clear and compelling.

      (3) The authors have attended to important technical nuances sometimes overlooked, for example, focusing on different pools of cytosolic or nuclear b-catenin.

      (4) The study sheds light on a controversial subject: regulation of hematopoietic development by canonical Wnt signaling and presents clear evidence of a role.

      (5) The authors present evidence of phylogenetic conservation of the pathway.

      Weaknesses:

      (1) The authors present compelling data that Bcas2 regulates nuclear retention of B-cat through physical association involving binding between the Bcas2 CC domains and B-cat arm repeats 9-12. Transcriptional activation of Wnt target genes by B-cat requires physical association between B-cat and Tcf/Lef family DNA binding factors involving key interactions in Arm repeats 2-9 (Graham et al., Cell 2000). Mutually exclusive binding by B-cat regulatory factors, such as ICAT, that prevent Tcf binding is a documented mechanism (e.g. Graham et al., Mol Cell 2002). It would appear, based on the arm repeat usage by Bcas2 (repeats 9-12), that Bcas2 and Tcf binding might not be mutually exclusive, which would support their model that Bcas2 physical association with B-cat to retain it in the nucleus would be compatible with co-activation of genes by allowing association with Tcf. It might be nice to attempt a three-way co-IP of these factors showing that B-cat can still bind Tcf in the presence of Bcas2, or at least speculate on the plausibility of the three-way interaction.

      (2) A major way that canonical Wnt signaling regulates hematopoietic development is through regulation of the LPM hematopoietic competence territories by activating expression of cdx1a, cdx4, and their downstream targets hoxb5a and hoxa9a (Davidson et al., Nature 2003; Davidson et al., Dev Biol 2006; Pilon et al., Dev Biol 2006; Wang et al., PNAS 2008). Could the authors assess (in situ) the expression of cdx1a, cdx4, hoxb5a, and hoxa9a in the bcas2 mutants?

      (3) The authors show compellingly that even heterozygous loss of bcas2 has strong Wnt-inhibitory effects. If Bcas2 is required for canonical Wnt signaling and bcas2 is expressed ubiquitously from the 1-cell stage through at least the beginning of gastrulation, why do bcas2 KO embryos not have morphological axis specification defects consistent with loss of early Wnt signaling, like loss of head (early), or brain anteriorization (later)? Could the authors provide some comments on this puzzle? Or if they do see any canonical Wnt signaling patterning defects in het- or homozygous embryos, could they describe and/or present them?

    4. Reviewer #3 (Public Review):

      Summary:

      This manuscript utilized zebrafish bcas2 mutants to study the role of bcas2 in primitive hematopoiesis and further confirms that it has a similar function in mice. Moreover, they showed that bcas2 regulates the transition of hematopoietic differentiation from angioblasts via activating Wnt signaling. By performing a series of biochemical experiments, they also showed that bcas2 accomplishes this by sequestering b-catenin within the nucleus, rather than through its known function in pre-mRNA splicing.

      Strengths:

      The work is well-performed, and the manuscript is well-written.

      Weaknesses:

      Several issues need to be clarified.

      (1) Is wnt signaling also required during hematopoietic differentiation from angioblasts? Can the authors test angioblast and endothelial markers in embryos with wnt inhibition? Also, can the authors add export inhibitor LMB to the mouse mutants to test if sequestering of b-catenin by bcas2 is conserved during primitive hematopoiesis in mice?

      (2) Bcas2 is required for primitive myelopoiesis in ALM. Does bcas2 play a similar function in primitive myelopoiesis, or is bcas2/b-catenin interaction more important for hematopoietic differentiation in PLM?

      (3) Is it possible that the CC1-2 fragment sequesters b-catenin? The different phenotypes between this manuscript and the previous article (Yu, 2019) may be due to different mutations in bcas2. Is it possible that the bcas2 mutation in Yu's article produces a complete CC1-2 fragment, which might sequester b-catenin?

      (4) Can the author clarify what embryos the arrows point to in SI Figure 2D? In SI Figure 6B and B', can the author clarify how the nucleus and cytoplasm are bleached? In B, the nucleus also appears to be bleached.

    5. Author Response:

      Thank you very much for your consideration and assessment. We really appreciate the generous comments from the reviewers on our manuscript entitled “BCAS2 promotes primitive hematopoiesis by sequestering β-catenin within the nucleus”. The comments are very helpful for the improvement of our work. We would like to provide the following provisional revision plan to address the public reviews:

      1. To clarify if Bcas2 also promotes primitive myelopoiesis by enhancing nuclear accumulation of β-catenin, bcas2 morpholino will be injected into the Tg(coro1a:EGFP) zebrafish embryos at 1-cell stage, and subsequently the β-catenin distribution in the myeloid cells will be examined. Tg(coro1a:EGFP) is commonly used to track both macrophages and neutrophils.

      2. According to the reviewers’ comments, we will quantify the fluorescence intensity in the cell nucleus and cytoplasm in Figure 3H. Meanwhile, we will adjust the exposure of Figure 5C and Figure 7E, or replace the figures with high-resolution ones.

      3. Previous studies have reported that β-catenin can bind directly to CRM1 through its central armadillo (ARM) repeats region. The β-catenin region containing ARM repeats 10 and the C terminus are essential for its nuclear export (Koike M, et al., The Journal of Biological Chemistry, 2004). In our research, BCAS2 has been demonstrated to bind to ARM repeats 9-12 of β-catenin. Therefore, it is highly likely that Bcas2 competes with CRM1 for binding to the nuclear export signal peptide on β-catenin. To further test this possibility, we will transfect HEK293T cells with constructs expressing full-length or truncated forms of β-catenin, and then examine their nuclear distribution.

      4. To validate if BCAS2 affects CRM1-dependent nuclear export of other classical factors, we plan to knock down or overexpress BCAS2 in HeLa cells, and detect the distribution of ATG1 and CDC37L, which have been identified as CRM1 cargoes.

      5. Considering that the ARM repeats bound by Bcas2 (repeats 9-12) and Tcf (repeats 2-9) might not be mutually exclusive, it is indeed appealing to investigate whether β-catenin can simultaneously interact with Tcf and Bcas2. We will follow the reviewer’s suggestion to perform a three-way co-immunoprecipitation assay. Plasmids encoding these three proteins will be co-transfected into cells. Cell lysates will be immunoprecipitated using an antibody specific to the bait protein (e.g., β-catenin), and eluted proteins will be analyzed using antibodies specific to the other two proteins.

      6. To test whether canonical Wnt signaling regulates hematopoietic development by activating expression of cdx1a, cdx4, and their downstream targets hoxb5a and hoxa9a, as previously reported, we intend to examine the expression of cdx4 and hoxa9a in bcas2+/- embryos at 10 ss by performing in situ hybridization.

      7. To further validate whether Wnt signaling is required during endothelial differentiation from angioblasts, wild-type embryos will be treated with the Wnt inhibitor CCT036477, and the expression of the hemangioblast markers npas4l, scl, and gata2 and the endothelial marker fli1 will be analyzed using in situ hybridization.

      8. In order to clarify whether coiled-coil (CC) domain 1-2 of Bcas2 is sufficient to interact with β-catenin and restore the primitive hematopoietic defect, we will overexpress CC1-2 in Tg(gata1:GFP) embryos injected with bcas2 morpholino, and then investigate the distribution of β-catenin, as well as gata1 expression at 10 ss in these embryos.

    1. eLife assessment

      The authors present 16 new well-preserved specimens from the early Cambrian Chengjiang biota. These specimens potentially represent a new taxon which could be useful in sorting out the problematic topology of artiopodan arthropods - a topic of interest to specialists in Cambrian arthropods. The authors provide solid anatomical and phylogenetic evidence in support of a new interpretation of the homology of dorsal sutures in trilobites and their relatives.

    2. Reviewer #1 (Public Review):

      Summary:

      Du et al. report 16 new well-preserved specimens of artiopodan arthropods from the Chengjiang biota, which demonstrate both the dorsal and ventral anatomies of a potential new taxon of artiopodans that are closely related to trilobites. The authors assigned their specimens to Acanthomeridion serratum, and proposed A. anacanthus as a junior subjective synonym of Acanthomeridion serratum. Critically, the presence of ventral plates (interpreted as cephalic librigenae), together with the phylogenetic results, leads the authors to conclude that the cephalic sutures originated multiple times within the Artiopoda.

      Strengths:

      The new specimens are of high quality and informative. The morphology of the dorsal exoskeleton, except for the supposed free cheek, was well illustrated and described in detail, which provides a wealth of information for taxonomic and phylogenetic analyses.

    3. Reviewer #3 (Public Review):

      Summary:

      Well-illustrated new material is documented for Acanthomeridion, a formerly incompletely known Cambrian arthropod. The formerly known facial sutures are proposed to be associated with ventral plates that the authors homologise with the free cheeks of trilobites (although also testing alternative homologies). An update of a published phylogenetic dataset permits reconsideration of whether dorsal ecdysial sutures have a single or multiple origins in trilobites and their relatives.

      Strengths:

      Documentation of an ontogenetic series makes a sound case that the proposed diagnostic characters of a second species of Acanthomeridion are variation within a single species. New microtomographic data shed light on appendage morphology that was not formerly known. The new data on ventral plates and their association with the ecdysial sutures are valuable in underpinning homologies with trilobites.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Du et al. report 16 new well-preserved specimens of artiopodan arthropods from the Chengjiang biota, which demonstrate both the dorsal and ventral anatomies of a potential new taxon of artiopodans that are closely related to trilobites. The authors assigned their specimens to Acanthomeridion serratum, and proposed A. anacanthus as a junior subjective synonym of Acanthomeridion serratum. Critically, the presence of ventral plates (interpreted as cephalic librigenae), together with the phylogenetic results, leads the authors to conclude that the cephalic sutures originated multiple times within the Artiopoda.

      Strengths:

      The new specimens are of high quality and informative. The morphology of the dorsal exoskeleton, except for the supposed free cheek, was well illustrated and described in detail, which provides a wealth of information for taxonomic and phylogenetic analyses.

      Weaknesses:

      The weaknesses of this work are obvious in a number of aspects. Technically, the ventral morphology is less well revealed and is poorly illustrated. Additional diagrams are necessary to show the trunk appendages and suture lines. Taxonomically, I am not convinced by the authors' placement. The specimens are markedly different from either Acanthomeridion serratum Hou et al. 1989 or A. anacanthus Hou et al. 2017. The ontogenetic description is extremely weak and morphological continuity is not established. Geometric and morphometric analyses might help to resolve the taxonomic and ontogenetic uncertainties. I am confused by the authors' description of the free cheek (librigena) and ventral plate. Are they the same object? How do they connect with other parts of the cephalic shield, e.g. the hypostome and fixigena? Critically, the homology of the cephalic slits (eye slits, eye notch, dorsal suture, facial suture) is not extensively discussed, either morphologically or functionally. Finally, the authors claimed that the phylogenetic results support two separate origins rather than a deep origin. However, the results in Figure 4 can be explained by a deep homology of the cephalic suture at the molecular level and multiple co-options within the Artiopoda.

      Comments on the revised version:

      I have seen the extensive revision of the manuscript. The main point, "Multiple origins of dorsal ecdysial sutures in artiopodans", is now partially supported by the results presented by the authors. I am still unsatisfied with the descriptions and interpretations of critical features newly revealed by the authors. The following points might be useful for the authors in making further revisions.

      (1) The antennae were well illustrated in a couple of specimens, but they were described in a single short sentence.

      Some more details of the changing article shape and overall length of the antennae have been added to the description.

      (2) There are also imprecise descriptions of features.

      Measurements, dimensions and multiple figures are provided for many features in the text and the supplement includes more figures. In total, 11 figures are provided with details (photographs or measurements) of the material.

      (3) Ontogeny of the cephalon was not described.

      A sentence has been added to the description to note the changing width:length of the cephalon during ontogeny, with a reference to Figure 6.

      (4) The critical head element is the so-called "ventral plate". How this element connects with the cephalic shield is not adequately revealed. The authors claimed that the suture is along the cephalic margin. However, the lateral margin of the cephalon is not rounded but exhibits two notches (e.g. Fig. 3C). This gives an indication that the supposed ventral plates have a dorsal extension to fit the notches. Alternatively, the "ventral plate" can be interpreted as a small free cheek with a large ventral extension, providing evidence for the librigenal hypothesis.

      As noted in the diagnosis for the genus, these notches are interpreted to accommodate the eye stalks. The homology of the ventral plates is discussed at length in the manuscript, and is the focus of the three sets of phylogenetic analyses performed.

      Reviewer #3 (Public Review):

      Summary:

      Well-illustrated new material is documented for Acanthomeridion, a formerly incompletely known Cambrian arthropod. The formerly known facial sutures are proposed to be associated with ventral plates that the authors homologise with the free cheeks of trilobites (although also testing alternative homologies). An update of a published phylogenetic dataset permits reconsideration of whether dorsal ecdysial sutures have a single or multiple origins in trilobites and their relatives.

      Strengths:

      Documentation of an ontogenetic series makes a sound case that the proposed diagnostic characters of a second species of Acanthomeridion are variation within a single species. New microtomographic data shed light on appendage morphology that was not formerly known. The new data on ventral plates and their association with the ecdysial sutures are valuable in underpinning homologies with trilobites.

      I think the revision does a satisfactory job of reconciling the data and analyses with the conclusions drawn from them. Referee 1's valid concerns about whether a synonymy of Acanthomeridion anacanthus is justified have been addressed by the addition of a length/width scatterplot in Figure 6. Referee 2's doubts about homology between the librigenae of trilobites and ventral plates of Acanthomeridion have been taken on board by re-running the phylogenetic analyses with a coding for possible homology between the ventral plates and the doublure of olenelloid trilobites. The authors sensibly added more trilobite terminals to the matrix (including Olenellus) and did analyses with and without constraints for olenelloids being a grade at the base of Trilobita. My concerns about counting how many times dorsal sutures evolved on a consensus tree have been addressed (the authors now play it safe and say "multiple" rather than attempting to count them on a bushy topology). The treespace visualisation (Figure 9) is a really good addition to the revised paper.

      Weaknesses:

      The question of how many times dorsal ecdysial sutures evolved in Artiopoda was addressed by Hou et al (2017), who first documented the facial sutures of Acanthomeridion and optimised them onto a phylogeny to infer multiple origins, as well as in a paper led by the lead author in Cladistics in 2019. Du et al. (2019) presented a phylogeny based on an earlier version of the current dataset wherein they discussed how many times sutures evolved or were lost based on their presence in Zhiwenia/Protosutura, Acanthomeridion and Trilobita. The answer here is slightly different (because some topologies unite Acanthomeridion and trilobites). This paper is not a game-changer because these questions have been asked several times over the past seven years, but there are solid, worthy advances made here.

      I'd like to see some of the most significant figures from the Supplementary Information included in the main paper so they will be maximally accessed. The "stick-like" exopods are not best illustrated in the main paper; their best imagery is in Figure S1. Why not move that figure (or at least its non-redundant panels) as well as the reconstruction (Figure S7) to the main paper? The latter summarises the authors' interpretation that a large axe-shaped hypostome appears to be contiguous with ventral plates.

      We have moved these figures from the supplementary information to the main text, and renumbered figures accordingly. Fig S1 has now been split – panels a and b are in the main text (new Fig. 4), with the remainder staying as Fig S1. Fig S7 is now Fig. 8 in the main text.

      The specimens show evidence for three pairs of post-antennal cephalic appendages, but it's a bit hard to picture how they functioned if there's no room between the hypostome and ventral plates. Also, a comment is required on the reconstruction involving all cephalic appendages originating against/under the hypostome rather than the first pair being paroral near the posterior end of the hypostome and the rest being post-hypostomal as in trilobites.

      A short comment has been added to the caption.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have seen the extensive revision of the manuscript. The main point, "Multiple origins of dorsal ecdysial sutures in artiopodans", is now partially supported by the results presented by the authors. I am still unsatisfied with the descriptions and interpretations of critical features newly revealed by the authors. The following points might be useful for the authors in making further revisions.

      (1) The antennae were well illustrated in a couple of specimens, but they were described in a single short sentence.

      (2) There are also imprecise descriptions of features (see my annotations in submitted ms).

      (3) Ontogeny of the cephalon was not described.

      (4) The critical head element is the so-called "ventral plate". How this element connects with the cephalic shield is not adequately revealed. The authors claimed that the suture is along the cephalic margin. However, the lateral margin of the cephalon is not rounded but exhibits two notches (e.g. Fig. 3C). This gives an indication that the supposed ventral plates have a dorsal extension to fit the notches. Alternatively, the "ventral plate" can be interpreted as a small free cheek with a large ventral extension, providing evidence for the librigenal hypothesis.

      Reviewer #3 (Recommendations For The Authors):

      The references swap back and forth between journal titles being abbreviated or written out in full. Please standardise this to journal format rather than alternating between two different styles.

      Line 145: Perez-Peris et al. (2021) should be cited as the source for the Anacheirurus appendages.

      Added, thank you.

      Line 310: The El Albani et al (2024) paper on ellipsocephaloid appendages should be noted in connection with an A+4 (rather than A+3) head in trilobites.

      Added.

      Minor or trivial corrections:

      Line 51: move the three citations to follow "arthropods" rather than following "artiopodans", as none of these papers are specifically about Artiopoda.

      Changed, thank you.

      Caption to Figure 1 and line 100: Acanthomeridion appears in Figure 1 and in the text with no context. Please weave it into the text appropriately.

      Line 136: The data were...

      Corrected

      Line 164: upper case for Morphobank.

      Corrected

      Line 183: spelling of "Village" (not "Vallige").

      Corrected

      Line 197: I suggest using "articles" rather than "podomeres" for the antenna (as you did in line 232).

      Changed, thank you.

      Line 269: "gnathobasal spine" (rather than "spin").

      Changed, thank you.

      Line 272: "Exopods" is used here but elsewhere "exopodites" is used.

      Exopodites is now used throughout

      Line 359: "can been seen" is awkward and, as evolutionary patterns are inferred rather than "seen", could be reworded as "... loss of the eye slit has been inferred...".

      Reworded as suggested

      Line 422 and 423: As two referees asked in the first round of review, delete "iconic" and "symbolic".

      Deleted as suggested

      Line 467: "librigena-like".

      Corrected

    1. eLife assessment

      This important computational study provides new insights into how neural dynamics may lead to time-evolving behavioral errors as observed in certain working-memory tasks. By combining ideas from efficient coding and attractor neural networks, the authors construct a two-module network model to capture the sensory-memory interactions and the distributed nature of working memory representations. They provide convincing evidence that their two-module network, but none of the alternative circuit structures they considered, can account for the error patterns reported in orientation-estimation tasks with delays.

    2. Reviewer #1 (Public Review):

      Summary:

      Working memory is imperfect: memories accrue error over time and are biased towards certain identities. For example, previous work has shown that memory for orientation is more accurate near the cardinal directions (i.e., variance in responses is smaller for horizontal and vertical stimuli) while being biased towards diagonal orientations (i.e., there is a repulsive bias away from horizontal and vertical stimuli). The magnitude of errors and biases increases the longer an item is held in working memory and when more items are held in working memory (i.e., working memory load is higher). Previous work has argued that biases and errors could be explained by increased perceptual acuity at cardinal directions. However, these models are constrained to sensory perception and do not explain how biases and errors increase over time in memory. The current manuscript builds on this work to show how a two-layer neural network could integrate errors and biases over a memory delay. In brief, the model includes a 'sensory' layer with heterogeneous connections that lead to the repulsive bias and decreased error at the cardinal directions. This layer is then reciprocally connected with a classic ring attractor layer. Through their reciprocal interactions, the biases in the sensory layer are constantly integrated into the representation in memory. In this way, the model captures the distribution of biases and errors for different orientations that has been seen in behavior and their increasing magnitude with time. The authors compare the two-layer network to a simpler one-network model, showing that the one-network model is harder to tune and shows an attractive bias for memories that have lower error (which is incompatible with empirical results).

      Strengths:

      The manuscript provides a nice review of the dynamics of items in working memory, showing how errors and biases differ across stimulus space. The two-layer neural network model is able to capture the behavioral effects as well as relate to neurophysiological observations that memory representations are distributed across sensory cortex and prefrontal cortex.

      The authors use multiple approaches to understand how the network produces the observed results. For example, analyzing the dynamics of memories in the low-dimensional representational space of the networks provides the reader with an intuition for the observed effects.

      As a point of comparison with the two-layer network, the authors construct a heterogeneous one-layer network (analogous to a single memory network with embedded biases). They argue that such a network is incapable of capturing the observed behavioral effects but could potentially explain biases and noise levels in other sensory domains where attractive biases have lower errors (e.g., color).

      The authors show how changes in the strength of Hebbian learning of excitatory and inhibitory synapses can change network behavior. This argues for relatively stronger learning in inhibitory synapses, an interesting prediction.

      The manuscript is well-written. In particular, the figures are well done and nicely schematize the model and the results.

      Weaknesses:

      Despite its strengths, the manuscript does have some weaknesses. These weaknesses are adequately discussed in the manuscript and motivate future research.

      One weakness is that the model is not directly fit to behavioral data, but rather compared to a schematic of behavioral data. As noted above, the model provides insight into the general phenomenon of biases in working memory. However, because the models are not fit directly to data, they may miss some aspects of the data.

      In addition, directly fitting the models to behavioral data could allow for a broader exploration of parameter space for both the one-layer and two-layer models (and their alternatives). Such an approach would provide stronger support for the paper's claims (such as "...these evolving errors...require network interaction between two distinct modules."). That being said, the manuscript does explore several alternative models and also acknowledges the limitation of not directly fitting behavior, due to difficulties in fitting complex neural network models to data.

      One important behavioral observation is that both diffusive noise and biases increase with the number of items in working memory. The current model does not capture these effects and it isn't clear how the model architecture could be extended to capture these effects. That being said, the authors note this limitation in the Discussion and present it as a future direction.

      Overall:

      Overall, the manuscript was successful in building a model that captured the biases and noise observed in working memory. This work complements previous studies that have viewed these effects through the lens of optimal coding, extending these models to explain the effects of time in memory. In addition, the two-layer network architecture extends previous work with similar architectures, adding further support to the distributed nature of working memory representations.

    3. Reviewer #2 (Public Review):

      In this manuscript, Yang et al. present a modeling framework to understand the pattern of response biases and variance observed in delayed-response orientation estimation tasks. They combine a series of modeling approaches to show that coupled sensory-memory networks are in a better position than single-area models to support experimentally observed delay-dependent response bias and variance in cardinal compared to oblique orientations. These errors can emerge from a population-code approach that implements efficient coding and Bayesian inference principles and is coupled to a memory module that introduces random maintenance errors. A biological implementation of such operation is found when coupling two neural network modules, a sensory module with connectivity inhomogeneities that reflect environment priors, and a memory module with strong homogeneous connectivity that sustains continuous ring attractor function. Comparison with single-network solutions that combine both connectivity inhomogeneities and memory attractors shows that two-area models can more easily reproduce the patterns of errors observed experimentally.

      Strengths:

      The model provides an integration of two modeling approaches to the computational bases of behavioral biases: one based on Bayesian and efficient coding principles, and one based on attractor dynamics. These two perspectives are not usually integrated consistently in existing studies, which this manuscript beautifully achieves. This is a conceptual advancement, especially because it brings together the perceptual and memory components of common laboratory tasks.

      The proposed two-area model provides a biologically plausible implementation of efficient coding and Bayesian inference principles, which interact seamlessly with a memory buffer to produce a complex pattern of delay-dependent response errors. No previous model had achieved this.

      Weaknesses:

      The correspondence between the various computational models is not clearly shown. It is hard to see because network function is illustrated with different representations for different models. In particular, the Bayesian model of Figure 2 is illustrated with population responses for different stimuli and delays, while the attractor models of Figures 3 and 4 are illustrated with neuronal tuning curves but not population activity.

      The proposed model has stronger feedback than feedforward connections between the sensory and memory modules (J_f = 0.1 and J_b = 0.25). This is not the common assumption when thinking about hierarchical processing in the brain. The manuscript argues that error patterns remain similar as long as the product of J_f and J_b is constant, so it is unclear why the authors preferred this network example as opposed to one with J_b = 0.1 and J_f = 0.25.

    4. Reviewer #3 (Public Review):

      Summary:

      The present study proposes a neural circuit model consisting of coupled sensory and memory networks to explain the circuit mechanism of the cardinal effect in orientation perception which is characterized by the bias towards the oblique orientation and the largest variance at the oblique orientation.

      Strengths:

      The authors have performed numerical simulations and a preliminary analysis of the neural circuit model to show that it successfully reproduces the cardinal effect, and the paper is well written overall. As far as I know, most of the studies on the cardinal effect are at the level of statistical models, and the current study provides one possibility of how neural circuit models reproduce such an effect.

      Weaknesses:

      There are no major weaknesses or flaws in the present study, although I suggest the authors conduct further analysis to deepen our understanding of the circuit mechanism of the cardinal effect.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #3:

      I appreciate the revisions made by the author which address all of my concerns.

      Nevertheless, I have some new questions when I read the paper again. These questions are not necessarily criticisms of the paper, which may reflect the gap in my understanding. Meanwhile, it also reflects the writing might be improved further.

      - Fig. 1:

      I understand that a critical assumption for generating the required result is that the oblique orientation has lower "energy" than the cardinal orientation (Fig. 1G). Meanwhile, I always have a concept that typically the energy is defined as the negative of log probability. If we take the log probability plotted in Fig. 1A, that will generate an energy landscape that is upside down compared with current Fig. 1G. How should I understand this discrepancy?

      As the reviewer pointed out, a higher prior distribution near cardinal orientations causes cardinal attraction in typical Bayesian models, which can correspond to lower energy around these orientations. Additionally, in the context of learning natural statistics, Hebbian plasticity in excitatory connections strengthens recurrent connections and drives attraction toward more prevalent stimuli within neural circuits.

      However, as demonstrated by Wei and Stocker (2015), a Bayesian inference model can also produce cardinal repulsion when optimizing encoding efficiency. In our network, this efficient encoding is achieved through heterogeneous lateral connections and inhibitory Hebbian plasticity in the sensory module, resulting in lower energy near oblique orientations. Thus, the shape of the prior distribution does not have a direct one-to-one correspondence with the bias pattern or the dynamic energy landscape.

      - Fig. 3 and its corresponding text.

      I understand and agree with Fig. 3B&C that neurons near cardinal orientations are sharper and denser. But why is the stimulus representation around cardinal orientations sparser than around the oblique orientations? Doesn't having more neurons around the cardinal orientations imply a less sparse representation?

      Indeed, with sharper tuning curves, having more neurons can result in a sparser representation. Consider an extreme case where each orientation, discretized by 1°, is represented by only one active neuron with a tuning width of 1°. While this would require more neurons to represent overall stimuli compared to cases with wider tuning curves, each stimulus would be represented by fewer neurons, aligning with the traditional concept of sparse coding.

      However, in Fig. 3 and corresponding text, we did not measure the sparseness of active neurons for each orientation. Instead, we used the term ‘sparser representation’ to describe the increased distance between representations of different stimuli near the cardinal orientations. Although this increased distance can be consistent with the traditional concept of sparse coding, to avoid any confusion, we have revised the term ‘sparser representation’ to ‘more dispersed representation’ in the 3rd paragraph in pg. 5 and the 3rd paragraph in pg. 6.

    1. eLife assessment

      The study presents a potentially valuable approach by combining two measurements (pHLA binding and pHLA-TCR binding) to improve predictions of which mutations in colorectal cancer are likely to be presented to and recognised by the immune system. While this approach is promising, the evidence supporting the primary claim remains somewhat incomplete. The experimental validation of the computational predictions with actual immune responses is still limited, despite the increase in sample size from 4 to 8 in this revision.

    2. Reviewer #2 (Public Review):

      Summary:

      This paper introduces a novel approach for improving personalized cancer immunotherapy by integrating TCR profiling with traditional pHLA binding predictions, addressing the need for more precise neoantigen prediction in CRC patients. By analyzing TCR repertoires from tumor-infiltrating lymphocytes and applying machine learning algorithms, the authors developed a predictive model that outperforms conventional methods in specificity and sensitivity. The validation of the model through ELISpot assays confirmed its potential in identifying more effective neoantigens, highlighting the significance of combining TCR and pHLA data for advancing personalized immunotherapy strategies.

      Strengths:

      (1) Comprehensive Patient Data Collection: The study meticulously collected and analyzed clinical data from 27 CRC patients, ensuring a robust foundation for research findings. The detailed documentation of patient demographics, cancer stages, and pathology information enhances the study's credibility and potential applicability to broader patient populations.

      (2) The use of machine learning classifiers (RF, LR, XGB) and the combination of pHLA and pHLA-TCR binding predictions significantly enhance the model's accuracy in identifying immunogenic neoantigens, as evidenced by the high AUC values and improved sensitivity, NPV, and PPV.

      (3) The use of experimental validation through ELISpot assays adds a practical dimension to the study, confirming the computational predictions with actual immune responses. The calculation of ranking coverage scores and the comparative analysis between the combined model and the conventional NetMHCpan method demonstrate the superior performance of the combined approach in accurately ranking immunogenic neoantigens.

      (4) The use of experimental validation through ELISpot assays adds a practical dimension to the study, confirming the computational predictions with actual immune responses.

      Weakness:

      The authors have made comprehensive revisions to the original version of the article, and this version has now addressed my concerns.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper reports a number of somewhat disparate findings on a set of colorectal tumours and infiltrating T cells. The main finding is a combined machine-learning tool which combines two previous state-of-the-art tools, MHC prediction, and T-cell binding prediction to predict immunogenicity. This is then applied to a small set of neoantigens and there is a small-scale validation of the prediction at the end.

      Strengths:

      The prediction of immunogenic neoepitopes is an important and unresolved question.

      Weaknesses:

      The paper contains a lot of extraneous material not relevant to the main claim. Conversely, it lacks important detail on the major claim.

      (1) The analysis of T cell repertoire in Figure 2 seems irrelevant to the rest of the paper. As far as I could ascertain, this data is not used further.

      We appreciate the reviewer for their valuable feedback. We concur with the reviewer's observation that the analysis of the TCR repertoire in Figure 2 should be moved to the supplementary section. We have moved Figures 2B to 2F to Supplementary Figure 2.

      However, the analysis of TCR profiles is still presented in Figure 2, as it plays a pivotal role in the process of neoantigen selection. This is because the TCR profiles of eight (out of 28) patients were used for neoantigen prediction. We have added the following sentences to the results section to explain the importance of TCR profiling: “Furthermore, characterizing T cell receptors (TCRs) can complement efforts to predict immunogenicity.” (Results, Lines 311-312, Page 11)

      (2) The key claim of the paper rests on the performance of the ML algorithm combining NETMHC and pmtNET. In turn, this depends on the selection of peptides for training. I am unclear about how the negative peptides were selected. Are they peptides from the same databases as immunogenic peptides but randomised for MHC? It seems as though there will be a lot of overlap between the peptides used for testing the combined algorithm, and the peptides used for training MHCNet and pmtMHC. If this is so, and depending on the choice of negative peptides, it is surely expected that the tools perform better on immunogenic than on non-immunogenic peptides in Figure 3. I don't fully understand panel G, but there seems very little difference between the TCR ranking and the combined. Why does including the TCR ranking have such a deleterious effect on sensitivity?

      We thank the reviewer for their valuable feedback. We take the reviewer's 'MHCNet' to mean NetMHCpan and 'pmtMHC' to mean pMTnet. First, the negative peptides, which have been excluded from PRIME (1), were not randomized with MHC (HLA-I) but were randomized with TCR only. Secondly, the positive peptides for our combined algorithm were drawn from several databases such as 10X Genomics, McPAS, VDJdb, IEDB, and TBAdb, whereas NetMHCpan uses peptides from the IEDB database and pMTNet is trained on a dataset entirely different from ours. Therefore, there is little overlap between our training data and the training datasets of NetMHCpan and pMTNet. Thus, the better performance of our tool is not due to overlapping training datasets with these tools or to the selection of negative peptides.

      To enhance the clarity of the dataset construction, we have added Supplementary Figure 1, which demonstrates the workflow of peptide collection and the random splitting of data to generate the discovery and validation datasets. Additionally, we have revised the following sentence: "To objectively train and evaluate the model, we separated the dataset mentioned above into two subsets: a discovery dataset (70%) and a validation dataset (30%). These subsets are mutually exclusive and do not overlap.” (Methods, lines 221-223, page 8).
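      The 70%/30% partition quoted above can be illustrated with a minimal sketch. It assumes a plain random shuffle over records; the function name, the seed, and the placeholder record labels are hypothetical and are not taken from the authors' repository.

```python
import random


def split_discovery_validation(records, discovery_fraction=0.7, seed=0):
    """Shuffle once, then cut: the two subsets cannot share a record."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * discovery_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder identifiers standing in for pHLA-TCR complexes (hypothetical).
records = [f"complex_{i}" for i in range(100)]
discovery, validation = split_discovery_validation(records)
```

      Because both subsets are slices of one shuffled list, they are mutually exclusive by construction as long as the input records are themselves unique.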

      Initially, the "combine" label in Figure 3G was confusing and potentially misleading when compared to our subsequent approach using a combined machine learning model. In Figure 3G, the "combine" approach simply aggregates the pHLA and pHLA-TCR criteria, whereas our combined machine learning model employs a more sophisticated algorithm to integrate these criteria effectively. The combined analysis in Figure 3G utilizes a basic "AND" algorithm between pHLA and pHLA-TCR criteria, aiming for high sensitivity in HLA binding and high specificity. However, this approach demonstrated lower efficacy in practice, underscoring the necessity for a more refined integration method through machine learning. This was the key point we intended to convey with Figure 3G. To address this issue, we have revised Figure 3G to replace "combined" with "HLA percentile & TCR ranking" to clarify its purpose and minimize confusion.
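      The behaviour of a basic "AND" rule can be sketched on toy data. The HLA percentile cutoff of 2% is the one quoted elsewhere in this response; the TCR ranking cutoff, the assumption that a lower TCR rank means stronger predicted binding, and all peptide values below are hypothetical and chosen only for illustration.

```python
HLA_CUTOFF = 2.0  # percentile rank < 2%, as quoted in the manuscript text
TCR_CUTOFF = 2.0  # hypothetical cutoff, not taken from the paper


def sensitivity_specificity(labels, calls):
    """Return (sensitivity, specificity) for boolean labels and calls."""
    tp = sum(1 for y, c in zip(labels, calls) if y and c)
    fn = sum(1 for y, c in zip(labels, calls) if y and not c)
    tn = sum(1 for y, c in zip(labels, calls) if not y and not c)
    fp = sum(1 for y, c in zip(labels, calls) if not y and c)
    return tp / (tp + fn), tn / (tn + fp)


# Toy peptides: (HLA percentile, TCR ranking, immunogenic?) -- made-up values.
peptides = [
    (0.5, 0.5, True), (1.0, 5.0, True), (3.0, 1.0, True), (0.8, 1.5, True),
    (1.5, 4.0, False), (6.0, 1.0, False), (9.0, 8.0, False), (1.2, 1.8, False),
]
labels = [p[2] for p in peptides]

hla_calls = [hla < HLA_CUTOFF for hla, _, _ in peptides]
tcr_calls = [tcr < TCR_CUTOFF for _, tcr, _ in peptides]
and_calls = [h and t for h, t in zip(hla_calls, tcr_calls)]

hla_only = sensitivity_specificity(labels, hla_calls)
tcr_only = sensitivity_specificity(labels, tcr_calls)
combined = sensitivity_specificity(labels, and_calls)
```

      A peptide passes the AND rule only if it passes both criteria, so the rule can never be more sensitive than either criterion alone and can never be less specific. This is one way to read the reduced sensitivity the reviewer noted in Figure 3G.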

      (3) The key validation of the model is Figure 5. In 4 patients, the authors report that 6 out of 21 neo-antigen peptides give interferon responses > 2 fold above background. Using NETMHC alone (I presume the tool was used to rank peptides according to binding to the respective HLAs in each individual, but this is not clear), identified 2; using the combined tool identified 4. I don't think this is significant by any measure. I don't understand the score shown in panel E but I don't think it alters the underlying statistic.

      Acknowledging the limitations of our study's sample size, we proceeded to further validate our findings with four additional patients to acquire more data. The final results revealed that our combined model identified seven peptides eliciting interferon responses greater than a two-fold increase, compared to only three peptides identified by NetMHCpan (Figure 5).

      In conclusion, the paper demonstrates that combining MHCNET and pmtMHC results in a modest increase in the ability to discriminate 'immunogenic' from 'non-immunogenic' peptide; however, the strength of this claim is difficult to evaluate without more knowledge about the negative peptides. The experimental validation of this approach in the context of CRC is not convincing.

      Reviewer #2 (Public Review):

      Summary:

      This paper introduces a novel approach for improving personalized cancer immunotherapy by integrating TCR profiling with traditional pHLA binding predictions, addressing the need for more precise neoantigen prediction in CRC patients. By analyzing TCR repertoires from tumor-infiltrating lymphocytes and applying machine learning algorithms, the authors developed a predictive model that outperforms conventional methods in specificity and sensitivity. The validation of the model through ELISpot assays confirmed its potential in identifying more effective neoantigens, highlighting the significance of combining TCR and pHLA data for advancing personalized immunotherapy strategies.

      Strengths:

      (1) Comprehensive Patient Data Collection: The study meticulously collected and analyzed clinical data from 27 CRC patients, ensuring a robust foundation for research findings. The detailed documentation of patient demographics, cancer stages, and pathology information enhances the study's credibility and potential applicability to broader patient populations.

      (2) The use of machine learning classifiers (RF, LR, XGB) and the combination of pHLA and pHLA-TCR binding predictions significantly enhance the model's accuracy in identifying immunogenic neoantigens, as evidenced by the high AUC values and improved sensitivity, NPV, and PPV.

      (3) The use of experimental validation through ELISpot assays adds a practical dimension to the study, confirming the computational predictions with actual immune responses. The calculation of ranking coverage scores and the comparative analysis between the combined model and the conventional NetMHCpan method demonstrate the superior performance of the combined approach in accurately ranking immunogenic neoantigens.

      (4) The use of experimental validation through ELISpot assays adds a practical dimension to the study, confirming the computational predictions with actual immune responses.

      Weaknesses:

      (1) While multiple advanced tools and algorithms are used, the study could benefit from a more detailed explanation of the rationale behind algorithm choice and parameter settings, ensuring reproducibility and transparency.

      We thank the reviewer for their comment. We have revised the explanation regarding the rationale behind algorithm choice and parameter settings as follows: “We examined three machine learning algorithms - Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGB) - for each feature type (pHLA binding, pHLA-TCR binding), as well as for combined features. Feature selection was tested using a k-fold cross-validation approach on the discovery dataset with 'k' set to 10-fold. This process splits the discovery dataset into 10 equal-sized folds, iteratively using 9 folds for training and 1 fold for validation. Model performance was evaluated using the ‘roc_auc’ (Receiver Operating Characteristic Area Under the Curve) metric, which measures the model's ability to distinguish between positive and negative peptides. The average of these scores provides a robust estimate of the model's performance and generalizability. The model with the highest ‘roc_auc’ average score, XGB, was chosen for all features.” (Method, lines 225-234, page 8).
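      The selection procedure described in this response (10-fold cross-validation scored by ROC AUC, with the best average score winning) can be sketched using only the Python standard library. This is an illustration under stated assumptions, not the authors' pipeline: the centroid-difference scorer is a toy stand-in for LR, RF, and XGB, the folds are stratified by label (an assumption), and the data are synthetic.

```python
import random
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=10, seed=0):
    """Deal shuffled positives and negatives round-robin into k disjoint folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def fit_centroid(X, y):
    """Toy stand-in model: project onto (positive centroid - negative centroid)."""
    dims = range(len(X[0]))
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    w = [mean(x[d] for x in pos) - mean(x[d] for x in neg) for d in dims]
    return lambda x: sum(wd * xd for wd, xd in zip(w, x))


def mean_cv_auc(X, y, k=10, seed=0):
    """Train on k-1 folds, score the held-out fold, and average the k AUCs."""
    aucs = []
    for held_out in stratified_folds(y, k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit_centroid([X[i] for i in train], [y[i] for i in train])
        aucs.append(roc_auc([y[i] for i in held_out],
                            [model(X[i]) for i in held_out]))
    return mean(aucs)


# Synthetic data: column 0 mimics a pHLA feature, column 1 a pHLA-TCR feature.
rng = random.Random(42)
y = [1] * 100 + [0] * 100
X = [[rng.gauss(label, 1.0), rng.gauss(label, 1.0)] for label in y]

feature_sets = {"pHLA": [0], "pHLA-TCR": [1], "combined": [0, 1]}
results = {
    name: mean_cv_auc([[x[c] for c in cols] for x in X], y)
    for name, cols in feature_sets.items()
}
best = max(results, key=results.get)  # feature set with the highest mean AUC
```

      Averaging over held-out folds is what makes the comparison between candidates fair: every candidate is scored only on peptides it did not see during fitting.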

      (2) While pHLA-TCR binding displayed higher specificity, its lower sensitivity compared to pHLA binding suggests a trade-off between the two measures. Optimizing the balance between sensitivity and specificity could be crucial for the practical application of these predictions in clinical settings.

      We appreciate the reviewer's suggestion. Due to the limited availability of patient blood samples and time constraints for validation, we have chosen to prioritize high specificity and positive predictive value to enhance the selection of neoantigens.

      (3) The experimental validation was performed on a limited number of patients (four), which might affect the generalizability of the findings. Increasing the number of patients for validation could provide a more comprehensive assessment of the model's performance.

      This has been addressed earlier. Here, we restate it as follows: Acknowledging the limitations of our study's sample size, we proceeded to further validate our findings with four additional patients to acquire more data. The final results revealed that our combined model identified seven peptides eliciting interferon responses greater than a two-fold increase, compared to only three peptides identified by NetMHCpan (Figure 5).

      Reviewer #3 (Public Review):

      Summary:

      This study presents a new approach of combining two measurements (pHLA binding and pHLA-TCR binding) in order to refine predictions of which patient mutations are likely presented to and recognized by the immune system. Improving such predictions would play an important role in making personalized anti-cancer vaccinations more effective.

      Strengths:

      The study combines data from pre-existing tools pVACseq and pMTNet and applies them to a CRC patient population, which the authors show may improve the chance of identifying immunogenic, cancer-derived neoepitopes. Making the datasets collected publicly available would expand beyond the current datasets that typically describe caucasian patients.

      Weaknesses:

      It is unclear whether the NetMHCpan and pMTNet tools used by the authors are entirely independent, as they appear to have been trained on overlapping datasets, which may explain their similar scores. The pHLA-TCR score seems to be driving the effects, but this is not discussed in detail.

      The HLA percentile from NetMHCpan and the TCR ranking from pMTNet are independent. NetMHCpan predicts the interaction between peptides and MHC class I, while pMTNet predicts the TCR binding specificity of class I MHCs and peptides. Additionally, we partitioned the dataset mentioned above into two subsets: a discovery dataset (70%) and a validation dataset (30%), ensuring no overlap between the training and testing datasets.

      To enhance the clarity of the dataset construction, we have added Supplementary Figure 1, which demonstrates the workflow of peptide collection and the random splitting of data to generate the discovery and validation datasets. Additionally, we have revised the following sentence: "To objectively train and evaluate the model, we separated the dataset mentioned above into two subsets: a discovery dataset (70%) and a validation dataset (30%). These subsets are mutually exclusive and do not overlap.” (Methods, lines 221-223, page 8).

      Due to sample constraints, the authors were only able to do a limited amount of experimental validation to support their model; this raises questions as to how generalizable the presented results are. It would be desirable to use statistical thresholds to justify cutoffs in ELISPOT data.

      We chose a cutoff of 2 for ELISPOT, following the recommendation of the study by Moodie et al. (2). The study provides standardized cutoffs for defining positive responses in ELISPOT assays. It presents revised criteria based on a comprehensive analysis of data from multiple studies, aiming to improve the precision and consistency of immune response measurements across various applications.
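      A fold-over-background rule with a cutoff of 2 can be written as a short sketch. It covers only the simple ratio test; it does not reproduce the full criteria of the cited study. The strict inequality, the background floor that avoids division by zero, and the spot counts in the examples are assumptions made for illustration.

```python
def fold_over_background(peptide_spots, background_spots, floor=1.0):
    """Ratio of peptide-stimulated spots to background spots.

    The floor (hypothetical) keeps the ratio finite when the background is 0.
    """
    return peptide_spots / max(background_spots, floor)


def is_positive(peptide_spots, background_spots, cutoff=2.0):
    """Call a well positive when the response exceeds cutoff-fold background."""
    return fold_over_background(peptide_spots, background_spots) > cutoff


# Made-up spot counts per well: (peptide, background).
examples = {"strong": (45, 10), "weak": (18, 10), "borderline": (20, 10)}
calls = {name: is_positive(p, b) for name, (p, b) in examples.items()}
```

      Whether a response of exactly two-fold counts as positive is a convention; the sketch treats it as negative, which matches the "> 2 fold" wording used by Reviewer #1.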

      Some of the TCR repertoire metrics presented in Figure 2 are incorrectly described as independent variables and do not meaningfully contribute to the paper. The TCR repertoires may have benefitted from deeper sequencing coverage, as many TCRs appear to be supported only by a single read.

      We appreciate the reviewer’s feedback. We have moved Figures 2B through 2F to Supplementary Figure 2. We agree with the reviewer that deeper sequencing coverage could potentially benefit the repertoires. However, based on our current sequencing depth, we have observed that many of our samples (14 out of 28) have reached sufficient saturation, as indicated by Figure 2C. The TCR clones selected in our studies are unique molecular identifier (UMI)-collapsed reads, each representing at least three raw reads sharing the same UMI. This approach ensures that the data is robust despite the variability. It is important to note that Tumor-Infiltrating Lymphocytes (TILs) differ across samples, resulting in non-uniform sequencing coverage among them.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) Please open source the raw and processed data, code, and software output (NetMHCpan, pMTnet), which are important to verify the results.

      NetMHCpan and pMTNet are publicly available software tools (3, 4). In our GitHub repository, we have included links to the GitHub repositories for NetMHCpan and pMTNet (https://github.com/QuynhPham1220/Combined-model).

      (2) Comparison with more state-of-the-art neoantigen prediction models could provide a more comprehensive view of the combined model's performance relative to the current field.

      To further evaluate our model, we gathered additional public data and assessed its effectiveness in comparison to other models. We utilized immunogenic peptides from databases such as NEPdb (5), NeoPeptide (6), dbPepneo (7), Tantigen (8), and TSNAdb (9), ensuring there was no overlap with the datasets used for training and validation. For non-immunogenic peptides, we used data from 10X Genomics Chromium Single Cell Immune Profiling (10-13). The findings indicate that the combined model from pMTNet and NetMHCpan outperforms the NetTCR tool (14). To address the reviewer's inquiry, we have incorporated these results in Supplementary Table 6.

      (3) While the combined model shows a positive overall rank coverage score, indicating improved ranking accuracy, the scores are relatively low. Further refinement of the model or the inclusion of additional predictive features might enhance the ranking accuracy.

      We appreciate the reviewer’s suggestion. The RankCoverageScore provides an objective evaluation of the rank results derived from the final peptide list generated by the two tools. The combined model achieved a higher RankCoverageScore than pMTNet, indicating its superior ability to identify immunogenic peptides compared to existing in silico tools. In order to provide a more comprehensive assessment, we included an additional four validated samples to recalculate the rank coverage score. The results demonstrate a notable difference between NetMHCpan and the Combined model (-0.37 and 0.04, respectively). We have incorporated these findings into Supplementary Figure 6 to address the reviewer's question. Additionally, we have modified Figure 5E to present a simplified demonstration of the superior performance of the combined model compared to NetMHCpan.

      (4) Collect more public data and fine-tune the model. Then you will get a SOTA model for neoantigen selection. I strongly recommend you write Python scripts and open source.

      We thank the reviewer for their feedback. We have made the raw and processed data, as well as the model, available on GitHub. Additionally, we have gathered more public data and conducted evaluations to assess its efficiency compared to other methods. You can find the repository here: https://github.com/QuynhPham1220/Combined-model.

      Reviewer #3 (Recommendations For The Authors):

      The Methods section seems good, though HLA calling is more accurate using arcasHLA than OptiType. This would be difficult to correct as OptiType is integrated into pVACtools.

      We chose OptiType for its exceptional accuracy, surpassing 99%, in identifying HLA-I alleles from RNA-Seq data. This decision was informed by a recent extensive benchmarking study that evaluated its performance against "gold-standard" HLA genotyping data, as described in the study by Li et al. (15). Furthermore, we tested both tools using the same RNA-Seq data from FFPE samples. The allele calling accuracy of OptiType was found to be superior to that of arcasHLA. To address the reviewer's question, we have included these results in Supplementary Table 2, along with the reference to this decision (Method, line 200, page 07).

      I am not sufficiently expert in machine learning to assess this part of the methods.

      TCR beta repertoire analysis of biopsy is highly variable; though my expertise lies largely in sequencing using the 10X genomics platform, typically one sees multiple RNAs per cell. Seeing the majority of TCRs supported by only a single read suggests either problems with RNA capture (particularly in this case where the recovered RNA was split to allow both RNAseq and targeted TCR seq) or that the TCR library was not sequenced deeply enough. I'd like to have seen rarefaction plots of TCR repertoire diversity vs the number of reads to ensure that sufficiently deep sequencing was performed.

      We appreciate the suggestions provided by the reviewer. We agree that deeper sequencing coverage could potentially benefit the repertoires. However, based on our current sequencing depth, we have observed that many of our samples (14 out of 28) have reached sufficient saturation, as indicated by Figure 2C. In addition, the TCR clones selected in our studies are unique molecular identifier (UMI)-collapsed reads, each representing at least three raw reads sharing the same UMI. This approach ensures that the data is robust despite variability. It is important to note that Tumor-Infiltrating Lymphocytes (TILs) differ across samples, resulting in non-uniform sequencing coverage among them. We have already added the rarefaction plots of TCR repertoire diversity versus the number of reads in Figure 2C. These have been added to the main text (lines 329-335).
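      A rarefaction curve of the kind discussed here can be sketched as follows. The approach is a generic one (subsample reads at increasing depth and count distinct clonotypes), not the authors' exact procedure; the clone labels, read counts, and depths are hypothetical.

```python
import random


def rarefaction_curve(clone_counts, depths, seed=0):
    """Count distinct clonotypes seen in the first d reads of one random order.

    Using prefixes of a single shuffled read list keeps the curve
    non-decreasing, because deeper subsamples contain the shallower ones.
    """
    reads = [clone for clone, n in clone_counts.items() for _ in range(n)]
    random.Random(seed).shuffle(reads)
    return [(d, len(set(reads[:d]))) for d in depths if d <= len(reads)]


# Made-up repertoire: clonotype label -> number of supporting reads.
clone_counts = {"clone_A": 60, "clone_B": 25, "clone_C": 10,
                "clone_D": 4, "clone_E": 1}
curve = rarefaction_curve(clone_counts, depths=[10, 25, 50, 100])
```

      A curve that flattens before the full read depth suggests that additional sequencing would recover few new clonotypes, which is the sense of "saturation" used in the response above.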

      In order to support the authors' conclusions that MSI-H tumors have fewer TCR clonotypes than MSS tumors (Figure S2a) I would have liked to see Figure 2a annotated so that it was easy to distinguish which patient was in which group, as well as the rarefaction plots suggested above, to be sure that the difference represented a real difference between samples and not technical variance (which might occur due to only 4 samples being in the MSI-H group).

      We thank the reviewer for their recommendation. Indeed, it's worth noting that the number of MSI-H tumors is fewer than the MSS groups, which is consistent with the distribution observed in colorectal cancer, typically around 15%. This distribution pattern aligns with findings from several previous studies, as highlighted in these studies (16, 17). To provide further clarification on this point, we have included rarefaction plots illustrating TCR repertoire diversity versus the number of reads in Supplementary Figure 3 (line 339). Additionally, MSI-H and MSS samples have been appropriately labeled for clarity.

      The authors write: "in accordance with prior investigations, we identified an inverse relationship between TCR clonality and the Shannon index (Supplementary Figure S1)" >> The Shannon index is a measure of TCR clonality, not an independent variable. The authors may have meant TCR repertoire richness (the absolute number of TCRs), and the Shannon index (a measure of how many unique TCRs are present in the index).

      We thank the reviewer for their comment regarding the correlation between the number of TCRs and the Shannon index. We have revised the figure to illustrate the relationship between the number of TCRs and the Shannon index, and we have relocated it to Figure 2B.
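      The reviewer's point that clonality and the Shannon index are not independent can be made concrete with a short sketch. It assumes one common definition of clonality, 1 minus the Shannon entropy normalized by the log of the richness; the source does not state which definition the authors used, and the clone counts are hypothetical.

```python
import math


def richness(clone_counts):
    """Number of distinct clonotypes with at least one read."""
    return sum(1 for c in clone_counts if c > 0)


def shannon_index(clone_counts):
    """Shannon entropy (natural log) of the clonotype frequency distribution."""
    total = sum(clone_counts)
    return -sum((c / total) * math.log(c / total) for c in clone_counts if c > 0)


def clonality(clone_counts):
    """1 - normalized Shannon entropy (assumed definition); 1.0 if monoclonal."""
    n = richness(clone_counts)
    if n <= 1:
        return 1.0
    return 1.0 - shannon_index(clone_counts) / math.log(n)


even = [25, 25, 25, 25]   # four equally abundant clones
skewed = [97, 1, 1, 1]    # same richness, one dominant clone
```

      Under this definition clonality is a decreasing function of the Shannon index at fixed richness, so an inverse relationship between the two follows from the arithmetic rather than from the biology. Richness, by contrast, is identical for the two example repertoires.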

      The authors continue: "As anticipated, we identified only 58 distinct V (Figure 2C) and 13 distinct J segments (Figure 2D), that collectively generated 184,396 clones across the 27 tumor tissue samples, underscoring the conservation of these segments (Figure 2C & D)" >> it is not clear to me what point the authors are making: it is well known that TCR V and J genes are largely shared between Caucasian populations (https://pubmed.ncbi.nlm.nih.gov/10810226/), and though IMGT lists additional forms of these genes, many are quite rare and are typically not included in the reference sequences used by repertoire analysis software. I would clarify the language in this section to avoid the impression that patient repertoires are only using a restricted set of J genes.

      We thank the reviewer for their feedback. We have revised the sentence as follows: "As anticipated, we identified 59 distinct V segments (Supplementary Figure 2C) and 13 distinct J segments (Supplementary Figure 2D), collectively sharing 185,627 clones across the 28 tumor tissue samples. This underscores the conservation of these segments (Supplementary Figure 2C & D)" (Result, lines 354-356, page 12).

      As a result I would suggest moving Figure 2, with the exception of 2A, into the supplementals. I would have been more interested in a plot showing the distribution of TCRs by frequency, i.e. what proportion of clones are hyperexpanded, moderately expanded, etc. This would be a better measure of the likely immune responses.

      We thank the reviewer for their comment. With the exception of Figure 2A, we have relocated Figures 2B through 2F to Supplementary Figure 2.
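      The frequency-based summary the reviewer asks for could be computed along the following lines. This is a minimal sketch, not the authors' analysis: the class labels and frequency cutoffs are hypothetical placeholders chosen for illustration, and the counts are toy values. It reports the share of the repertoire (by reads) occupied by clones in each expansion class.

```python
# Hypothetical expansion classes: (label, upper bound on clone frequency).
# These cutoffs are illustrative assumptions, not values from the manuscript.
EXPANSION_BINS = [
    ("rare", 1e-5),
    ("small", 1e-4),
    ("medium", 1e-3),
    ("large", 1e-2),
    ("hyperexpanded", 1.0),
]


def expansion_profile(clone_counts):
    """Return the fraction of total reads falling into each expansion class."""
    total = sum(clone_counts)
    if total <= 0:
        raise ValueError("repertoire must contain at least one read")
    profile = {label: 0.0 for label, _ in EXPANSION_BINS}
    for count in clone_counts:
        freq = count / total
        # Assign the clone to the first class whose upper bound it does not exceed.
        for label, upper in EXPANSION_BINS:
            if freq <= upper:
                profile[label] += freq
                break
    return profile


# Toy repertoire: two dominant clones, one large clone, one medium clone.
toy_counts = [9000, 900, 95, 5]
print(expansion_profile(toy_counts))
```

      Plotting these fractions per sample as stacked bars would show at a glance whether a tumor repertoire is dominated by a few expanded clones.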

      The authors write "To accomplish this, we gathered HLA and TCRβ sequences from established datasets containing immunogenic and non-immunogenic peptides (Supplementary Table 3)" >> The authors mean to refer to Table S4.

      We appreciate the reviewer's feedback. Here is the revised sentence: "To accomplish this, we gathered HLA and TCRβ sequences from established datasets containing immunogenic and non-immunogenic pHLA-TCR complexes (Supplementary Table 5)" (lines 368-370).

      The authors write "As anticipated, our analysis revealed a significantly higher prevalence of peptides with robust HLA binding (percentile rank < 2%) among immunogenic peptides in contrast to their non-immunogenic counterparts (Figure 3A & B, p < 0.00001)" >> this is not surprising, as tools such as NetMHCpan are trained on databases of immunogenic peptides, and thus it is likely that these aren't independent measures (in https://academic.oup.com/nar/article/48/W1/W449/5837056 the authors state that "The training data have been vastly extended by accumulating MHC BA and EL data from the public domain. In particular, EL data were extended to include MA data"). In the pMTNet paper it is stated that pMTNet encoded pMHC information using "the exact data that were used to train the netMHCpan model" >> While I am not sufficiently expert to review details on machine learning training models, it would seem that the pHLA scores from NetMHCpan and pMTNet may not be independent, which would explain the concordance in scores that the authors describe in Figures 3B and 3D. I would invite the authors to comment on this.

      The HLA percentiles from NetMHCpan and TCR rankings from pMTNet are independent. NetMHCpan predicts the interaction between peptides and MHC class I, while pMTNet predicts the TCR binding specificity of class I MHCs and peptides. NetMHCpan is trained to predict peptide-MHC class I interactions by integrating binding affinity and MS eluted ligand data, using a second output neuron in the NNAlign approach. This setup produces scores for both binding affinity and ligand elution. In contrast, pMTNet predicts TCR binding specificity of class I pMHCs through three steps:

      (1) Training a numeric embedding of pMHCs (class I only) to numerically represent protein sequences of antigens and MHCs.

      (2) Training an embedding of TCR sequences using stacked auto-encoders to numerically encode TCR sequence text strings.

      (3) Creating a deep neural network combining these two embeddings to integrate knowledge from TCRs, antigenic peptide sequences, and MHC alleles. Fine-tuning is employed to finalize the prediction model for TCR-pMHC pairing.

      Therefore, pHLA scores from NetMHCpan and pMTNet are independent. Furthermore, Figures 3B and 3D do not show concordance in scores, as there was no equivalence in the percentage of immunogenic and non-immunogenic peptides in the two groups (≥2 HLA percentile and ≥2 TCR percentile).

      Many of the authors of this paper were also authors of the epiTCR paper, would this not have been a better choice of tool for assessing pHLA-TCR binding than pMTNet?

      When we started this project, EpiTCR had not been completed. Therefore, we chose pMTNet, which had demonstrated good performance and high accuracy at that time. The validated performance of EpiTCR is an ongoing project that will implement immunogenic assays (ELISpot and single-cell sequencing) to assess the prediction and ranking of neoantigens. This study is also mentioned in the discussion: "Moreover, to improve the accuracy and effectiveness of the machine learning model in predicting and ranking neoantigens, we have developed an in-house tool called EpiTCR. This tool will utilize immunogenic assays, such as ELISpot and single-cell sequencing, for validation." (lines 532-535).

      In Figure 3G it would appear that the pHLA-TCR score is driving the interaction, could the authors comment on this?

      The authors sincerely appreciate the reviewer for their valuable feedback. Initially, the "combine" label in Figure 3G was confusing and potentially misleading when compared to our subsequent approach using a combined machine learning model. In Figure 3G, the "combine" approach simply aggregates the pHLA and pHLA-TCR criteria, whereas our combined machine learning model employs a more sophisticated algorithm to integrate these criteria effectively.

      The combined analysis in Figure 3G utilizes a basic "AND" algorithm between pHLA and pHLA-TCR criteria, aiming for high sensitivity in HLA binding and high specificity. However, this approach demonstrated lower efficacy in practice, underscoring the necessity for a more refined integration method through machine learning. This was the key point we intended to convey with Figure 3G. To address this issue, we have revised Figure 3G to replace "combined" with "HLA percentile & TCR ranking" to clarify its purpose and minimize confusion.
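      The basic "AND" rule described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the function name, the peptide labels, and the scores are hypothetical, and the cutoff of 2 for both the HLA percentile and the TCR ranking is inferred from the thresholds mentioned elsewhere in this response.

```python
def passes_and_rule(hla_percentile, tcr_rank, hla_cutoff=2.0, tcr_cutoff=2.0):
    """Keep a peptide only if BOTH criteria are met (lower values mean stronger binding).

    The cutoffs are assumptions taken from the '< 2%' threshold quoted in the review.
    """
    return hla_percentile < hla_cutoff and tcr_rank < tcr_cutoff


# Hypothetical candidates: (name, HLA percentile rank, TCR ranking).
candidates = [
    ("pep_A", 0.5, 1.0),   # strong on both criteria
    ("pep_B", 0.5, 40.0),  # strong HLA binder, weak TCR ranking
    ("pep_C", 10.0, 1.0),  # weak HLA binder, strong TCR ranking
]

selected = [name for name, hla, tcr in candidates if passes_and_rule(hla, tcr)]
print(selected)
```

      Because a peptide must clear both cutoffs, the rule discards any candidate that is strong on only one criterion. That hard intersection is what a learned model replaces with a weighted combination of the two scores.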

      In Figure 4A I would invite the authors to comment on how they chose the sample sizes they did for the discovery and validation datasets: the numbers seem rather random. I would question whether a training dataset in which 20% of the peptides are immunogenic accurately represents the case in patients, where I believe immunogenic peptides are less frequent (as in Figure 5).

      We aimed to maximize the number of experimentally validated immunogenic peptides, including those from viruses, with only a small percentage from tumors available for training. This limitation is inherent in the field. However, our ultimate objective is to develop a tool capable of accurately predicting peptide immunogenicity irrespective of their source. Therefore, the current percentage of immunogenic peptides may not accurately reflect real-world patient cases, but this is not crucial to our development goals.

      For Figure 5C I would invite the authors to consider adding a statistical test to justify the cutoff at 2fold enrichments.

      Thank you for your feedback. Instead of conducting a statistical test, we have implemented standardized cutoffs as defined in the cited study (2). This research introduces refined criteria for identifying positive responses in ELISPOT assays through a comprehensive analysis of data from multiple studies. These criteria aim to improve the accuracy and consistency of immune response measurements across various applications. The reference to this study has been properly incorporated into the manuscript (Method, line 281, page 10).

      Minor points:

      "paired white blood cells" >> use "paired Peripheral Blood Mononuclear Cells".

      We appreciate the reviewer for the feedback. We agree with the reviewer's observation. The sentence has been revised as follows: "Initially, DNA sequencing of tumor tissues and paired Peripheral Blood Mononuclear Cells identifies cancer-associated genomic mutations. RNA sequencing then determines the patient's HLA-I allele profile and the gene expression levels of mutated genes." (Introduction, lines 55-58, page 2).

      "while RNA sequencing determines the patient's HLA-I allele profile and gene expression levels of mutated genes." >> RNA sequencing covers both the mutant and reference form of the gene, allowing assessment of variant allele frequency.

      "the current approach's impact on patient outcomes remains limited due to the scarcity of effective immunogenic neoantigens identified for each patient" >> Some clearer language here would have been preferred as different tumor types have different mutational loads

      We thank the reviewer for their valuable feedback. We agree with the reviewer's observation. The passage has been revised accordingly: “The current approach's impact on patient outcomes remains limited due to the scarcity of mutations in cancer patients that lead to effective immunogenic neoantigens.” (Introduction, lines 62-64, page 3).

      References

      (1) J. Schmidt et al., Prediction of neo-epitope immunogenicity reveals TCR recognition determinants and provides insight into immunoediting. Cell Rep Med 2, 100194 (2021).

      (2) Z. Moodie et al., Response definition criteria for ELISPOT assays revisited. Cancer Immunol Immunother 59, 1489-1501 (2010).

      (3) V. Jurtz et al., NetMHCpan-4.0: Improved Peptide-MHC Class I Interaction Predictions Integrating Eluted Ligand and Peptide Binding Affinity Data. J Immunol 199, 3360-3368 (2017).

      (4) T. Lu et al., Deep learning-based prediction of the T cell receptor-antigen binding specificity. Nat Mach Intell 3, 864-875 (2021).

      (5) J. Xia et al., NEPdb: A Database of T-Cell Experimentally-Validated Neoantigens and Pan-Cancer Predicted Neoepitopes for Cancer Immunotherapy. Front Immunol 12, 644637 (2021).

      (6) W. J. Zhou et al., NeoPeptide: an immunoinformatic database of T-cell-defined neoantigens. Database (Oxford) 2019 (2019).

      (7) X. Tan et al., dbPepNeo: a manually curated database for human tumor neoantigen peptides. Database (Oxford) 2020 (2020).

      (8) G. Zhang, L. Chitkushev, L. R. Olsen, D. B. Keskin, V. Brusic, TANTIGEN 2.0: a knowledge base of tumor T cell antigens and epitopes. BMC Bioinformatics 22, 40 (2021).

      (9) J. Wu et al., TSNAdb: A Database for Tumor-specific Neoantigens from Immunogenomics Data Analysis. Genomics Proteomics Bioinformatics 16, 276-282 (2018).

      (10) https://www.10xgenomics.com/resources/datasets/cd-8-plus-t-cells-of-healthy-donor-1-1-standard-3-0-2.

      (11) https://www.10xgenomics.com/resources/datasets/cd-8-plus-t-cells-of-healthy-donor-2-1-standard-3-0-2.

      (12) https://www.10xgenomics.com/resources/datasets/cd-8-plus-t-cells-of-healthy-donor-3-1-standard-3-0-2.

      (13) https://www.10xgenomics.com/resources/datasets/cd-8-plus-t-cells-of-healthy-donor-4-1-standard-3-0-2.

      (14) A. Montemurro et al., NetTCR-2.0 enables accurate prediction of TCR-peptide binding by using paired TCRalpha and beta sequence data. Commun Biol 4, 1060 (2021).

      (15) G. Li et al., Splicing neoantigen discovery with SNAF reveals shared targets for cancer immunotherapy. Sci Transl Med 16, eade2886 (2024).

      (16) Z. Gatalica, S. Vranic, J. Xiu, J. Swensen, S. Reddy, High microsatellite instability (MSI-H) colorectal carcinoma: a brief review of predictive biomarkers in the era of personalized medicine. Fam Cancer 15, 405-412 (2016).

      (17) N. Mulet-Margalef et al., Challenges and Therapeutic Opportunities in the dMMR/MSI-H Colorectal Cancer Landscape. Cancers (Basel) 15 (2023).

    1. eLife assessment

      This important manuscript demonstrates that UGGT1 is involved in preventing the premature degradation of endoplasmic reticulum (ER) glycoproteins through the re-glucosylation of their N-linked glycans following release from the calnexin/calreticulin lectins. The authors include a wealth of convincing data in support of their findings, although extending these findings to other types of substrates, such as secreted proteins, could further demonstrate the global importance of this mechanism for protein trafficking through the secretory pathway. This work will be of interest to scientists interested in ER protein quality control, proteostasis, and protein trafficking.

    2. Reviewer #1 (Public review):

      Summary:

      Using UGGT1-KO cells and a number of different ERAD substrates, the authors show that UGGTs are involved in preventing the premature degradation of misfolded glycoproteins. They propose a concept by which the fate of glycoproteins can be determined by a tug-of-war between UGGTs and EDEMs.

      Strengths:

      The authors provided a wealth of data to indicate that UGGT1 competes with EDEMs, which promote glycoprotein degradation.

      Weaknesses:

      NA

    3. Reviewer #2 (Public review):

      In this study, Ninagawa et al. shed light on UGGT's role in ER quality control of glycoproteins. By utilizing UGGT1/UGGT2 DKO cells, they demonstrate that several model misfolded glycoproteins undergo early degradation. One such substrate is ATF6alpha, whose premature degradation hampers the cell's ability to mount an ER stress response.

      This study convincingly demonstrates that many unstable misfolded glycoproteins undergo accelerated degradation without UGGTs. Also, this study provides evidence of a "tug of war" model involving UGGTs (pulling glycoproteins to being refolded) and EDEMs (pulling glycoproteins to ERAD).

      The study explores the physiological role of UGGT, particularly examining the impact of ATF6α in UGGT knockout cells' stress response. The authors further investigate the physiological consequences of accelerated ATF6α degradation, convincingly demonstrating that cells are sensitive to ER stress in the absence of UGGTs and unable to mount an adequate ER stress response.

      These findings offer significant new insights into the ERAD field, highlighting UGGT1 as a crucial component in maintaining ER protein homeostasis. This represents a major advancement in our understanding of the field.

    4. Reviewer #3 (Public review):

      This valuable manuscript demonstrates the long-held prediction that the glycosyltransferase UGGT slows degradation of endoplasmic reticulum (ER)-associated degradation substrates through a mechanism involving re-glucosylation of asparagine-linked glycans following release from the calnexin/calreticulin lectins. The evidence supporting this conclusion is solid using genetically-deficient cell models and well established biochemical methods to monitor the degradation of trafficking-incompetent ER-associated degradation substrates, although this could be improved by better defining of the importance of UGGT in the secretion of trafficking competent substrates. This work will be of specific interest to those interested in mechanistic aspects of ER protein quality control and protein secretion.

      The authors have attempted to address my comments from the previous round of review, although some issues still remain. For example, the authors indicate that it is difficult to assess how UGGT1 influences degradation of secretion competent proteins, but this is not the case. This can be easily followed using metabolic labeling experiments, where you would get both the population of protein secreted and degraded under different conditions. Thus, I still feel that addressing the impact of UGGT1 depletion on the ER quality control for secretion competent protein remains an important point that could be better addressed in this work.

      Further, in the previous submission, the authors showed that UGGT2 depletion produces a reduction in ATF6 activation similar to that observed for UGGT1 depletion, although UGGT2 depletion does not reduce ATF6 protein levels as UGGT1 depletion does. In the revised manuscript, they largely remove the UGGT2 data and only highlight the UGGT1 depletion data. While they are somewhat careful in their discussion, the implication is that UGGT1 regulates ATF6 activity by controlling its stability. The fact that UGGT2 has a similar effect on activity, but not stability, indicates that these enzymes may have other roles not directly linked to ATF6 stability. It is important to include the UGGT2 data and explicitly highlight this point in the discussion. It's fine to state that figuring out this other function is outside the scope of this work, but removing it does not seem appropriate.

      As I mentioned in my previous review, I think that this work is interesting and addresses an important gap in experimental evidence supporting a previously asserted dogma in the field. I do think that the authors would be better suited for highlighting the limitations of the study, as discussed above. Ultimately, though, this is an important addition to the literature.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Using UGGT-KO cells and a number of different ERAD substrates, the authors show that UGGTs are involved in preventing the premature degradation of misfolded glycoproteins. They propose a concept by which the fate of glycoproteins can be determined by a tug-of-war between UGGTs and EDEMs.

      Strengths:

      The authors provided a wealth of data to indicate that UGGT1 competes with EDEMs, which promote glycoprotein degradation.

      Weaknesses:

      Less clear, though, is the involvement of UGGT2 in the process. Also, to this reviewer, some data do not necessarily support the conclusion.

      Major criticisms:

      (1) One of the biggest problems I had on reading through this manuscript is that, while the authors appeared to generate UGGTs-KO cells from HCT116 and HeLa cells, it was not clearly indicated which cell line was used for each experiment. I assume that it was HCT116 cells in most cases, but I did not see that it was clearly mentioned. As the expression level of UGGT2 relative to UGGT1 is quite different between the two cell lines, it would be critical to know which cells were used for each experiment.

      Thank you for this comment. We have clarified this point, especially in the figure legends.

      (2) While most of the authors' conclusions are sound, some claims, to this reviewer, were not fully supported by the data. In particular, I cannot help being puzzled by the authors' claim about the involvement of UGGT2 in the ERAD process. In most cases, KO of UGGT2 does not seem to affect the stability of ERAD substrates (e.g. Fig. 1C, 2A, 3D). Where the authors suggest that UGGT2 is also involved in ERAD, the evidence is far from convincing (e.g. Fig. 2D/E). Especially because it has now been suggested that the main role of UGGT2 may be distinct from that of UGGT1, playing a role in lipid quality control (Hung, et al., PNAS 2022), it is imperative to provide convincing evidence if the authors want to claim the involvement of UGGT2 in a protein quality control system. In fact, it was not clear at all whether even UGGT1 is involved in the process in Fig. 2D/E, as the difference, if any, is so subtle. How can the authors be sure that this is significant enough? While the authors claim that the difference is statistically significant (n=3), this may turn out to be an experimental artifact. At the very least, I would urge the authors to try rescue experiments with UGGT1 or 2, to clarify that the defect in UGGT-DKO cells can be reversed. It may also be interesting to see whether the subtle difference the authors observed is indeed N-glycan-dependent by testing a non-glycosylated version of the protein (just like the NHK-QQQ mutants in Fig. 2C).

      We appreciate this comment. According to this comment, we reevaluated the importance of UGGT2 for ER-protein quality control. As this reviewer mentioned, KO of UGGT2 does not affect the stability of ATF6a, NHK, rRI332-Flag or EMC1-△PQQ-Flag (Fig. 1E, 2A, and 3DE). Furthermore, we tested whether overexpression of UGGT2 reverses the phenotype of UGGT-DKO regarding the degradation rate of NHK, and we found that it did not affect the degradation rate of NHK, whereas overexpression of UGGT1 restored the degradation rate to that in WT cells.

      Author response image 1.

      Collectively, these facts suggest that the role of UGGT2 in ER protein quality control is rather limited in HCT116 cells. Therefore, we have decided not to mention UGGT2 in the title, and weakened the overall claim that UGGT2 contributes to ER protein quality control. Tissues with high expression of UGGT2 or cultured cells other than HCT116 would be appropriate for revealing the detailed function of UGGT2.

      To this reviewer, it is still possible that the involvement of UGGT1 (or 2, if any) could be totally substrate-dependent, and that the substrates used in Fig. 2D or E happen not to be dependent on the action of UGGTs. To the reviewer, even without the data of Fig. 2D and E the authors provide enough evidence to demonstrate the involvement of UGGT1 in preventing premature degradation of glycoprotein ERAD substrates. I am just afraid that the authors may have overinterpreted the data, as if the UGGTs are involved in stabilization of all glycoproteins destined for ERAD.

      Based on the point this reviewer mentioned, we decided to delete previous Fig. 2D and 2E. There may be more or less efficacy of UGGT1 for preventing early degradation of substrates.

      (3) I am a bit puzzled by the DNJ treatment experiments. First, I do not see the detailed conditions of the DNJ treatment (concentration? time?). Then, I was a bit surprised to see that so few G3M9 glycans were formed, and that about the same amount of G2M9 was also formed (Figure 1 Figure supplement 4B-D), despite the fact that glucose trimming of newly synthesized glycoproteins is expected to be completely impaired (unless the authors used a DNJ concentration which does not completely impair the trimming of the first Glc). Even considering the involvement of Golgi endo-alpha-mannosidase, a similar amount of G3M9 and G2M9 may suggest that the experimental conditions used for this experiment (i.e. concentration of DNJ, duration of treatment, etc.) are not properly optimized.

      We think that our experimental conditions for DNJ treatment are appropriate to evaluate the effect of DNJ. Based on other papers (Ali and Field, 2000; Karlsson et al., 1993; Lomako et al., 2010; Pearse et al., 2010; Tannous et al., 2015), 0.5 mM DNJ is appropriate. In our previously reported experiment, 16 h of treatment with the mannosidase inhibitor kifunensine prior to cell collection was sufficient for N-glycan composition analysis (Ninagawa et al., 2014), and we treated cells for a similar time in Figure 1-Figure Supplement 4 and 5 (and Figure 1-Figure Supplement 6). We could see a clear effect of DNJ in inhibiting degradation of ATF6a with 2 hours of pretreatment (Fig. 1G). Furthermore, our results are very reasonable and consistent with previous findings that DNJ increased GM9 the most (Cheatham et al., 2023; Gross et al., 1983; Gross et al., 1986; Romero et al., 1985). In addition to DNJ, we used CST for further experiments in new figures (Fig. 1H and Figure 1-Figure supplement 6). DNJ and CST are both glucosidase inhibitors; DNJ is a stronger inhibitor of glucosidase II, while CST is a stronger inhibitor of glucosidase I (Asano, 2000; Saunier et al., 1982; Szumilo et al., 1987; Zeng et al., 1997). An increase in G3M9 and G2M9 was detected using CST (Figure 1-Figure Supplement 6). Like DNJ, CST also inhibited ATF6a degradation in UGGT-DKO cells (Fig. 1H). These findings show that our experimental conditions using glucosidase inhibitors are appropriate and strongly support our model (Fig. 5). Differences between the effects of DNJ and CST are now described in our manuscript, pages 8 to 10.

      Reviewer #2 (Public Review):

      In this study, Ninagawa et al., shed light on UGGT's role in ER quality control of glycoproteins. By utilizing UGGT1/UGGT2 DKO cells, they demonstrate that several model misfolded glycoproteins undergo early degradation. One such substrate is ATF6alpha where its premature degradation hampers the cell's ability to mount an ER stress response.

      While this study convincingly demonstrates early degradation of misfolded glycoproteins in the absence of UGGTs, my major concern is the need for additional experiments to support the "tug of war" model involving UGGTs and EDEMs in influencing the substrate's fate - whether misfolded glycoproteins are pulled into the folding or degradation route. Specifically, it would be valuable to investigate how overexpression of UGGTs and EDEMs in WT cells affects the choice between folding and degradation for misfolded glycoproteins. Considering previous studies indicating that monoglucosylation influences glycoprotein solubility and stability, an essential question is: what is the nature of glycoproteins in UGGTKO/EDEMKO and potentially UGGT/EDEM overexpression cells? Understanding whether these substrates become more soluble/stable when GM9 versus mannose-only translation modification accumulates would provide valuable insights.

      In the new figure 2DE, we conducted overexpression experiments of structure formation factors UGGT1 and/or CNX, and degradation factors EDEMs. While overexpression of structure formation factors (Fig. 2DE) and KO of degradation factors (Ninagawa et al., 2015; Ninagawa et al., 2014) increased stability of substrates, KO of UGGT1 (Fig. 1E, 2A and 3DF) and overexpression of degradation factors (Fig. 2DE) (Hirao et al., 2006; Hosokawa et al., 2001; Mast et al., 2005; Olivari et al., 2005) accelerated degradation of substrates. A comparison of the properties of N-glycan with the normal type and the type without glucoses was already reported (Tannous et al., 2015). The rate of degradation of substrate was unchanged, but efficiency of secretion of substrates was affected.

      The study delves into the physiological role of UGGT, but is limited in scope, focusing solely on the effect of ATF6alpha in UGGT KO cells' stress response. It is crucial for the authors to investigate the broader impact of UGGT KO, including the assessment of basal ER proteotoxicity levels, examination of the general efflux of glycoproteins from ER, and the exploration of the physiological consequences due to UGGT KO. This broader perspective would be valuable for the wider audience. Additionally, the marked increase in ATF4 activity in UGGTKO requires discussion, which the authors currently omit.

      We evaluated the sensitivity of WT and UGGT1-KO cells to ER stress (Figure 4G). KO of UGGT1 increased the sensitivity to ER stress inducer Tg, indicating the importance of UGGT1 for resisting ER stress.

      We have added the following description to the manuscript about ATF4 activity in UGGT1-KO cells: “In addition to this, UGGT1 is necessary for proper functioning of ER resident proteins such as ATF6a (Fig. 4B-F). It is highly possible that ATF6a undergoes structural maintenance by UGGT1, which could be necessary to avoid degradation and maintain proper function, because ATF6a with a more rigid structure tended to remain in UGGT1-KO cells (Fig. 4C). Responses of ERSE and UPRE to ER stress, which require ATF6a, were decreased in UGGT1-KO cells (Fig. 4DE). In contrast, ATF4 reporter activity was increased in UGGT1-KO cells (Fig. 4F), while the basal level of ATF4 in UGGT1-KO cells was comparable with that in WT (Figure 1-Figure supplement 2B). The ATF4 pathway might partially compensate for the function of the ERSE and UPRE pathways in UGGT1-KO cells in acute ER stress.” This is now described on Page 17 in our manuscript.

      The discussion section is brief and could benefit from being a separate section. It is advisable for the authors to explore and suggest other model systems or disease contexts to test UGGT's role in the future. This expansion would help the broader scientific community appreciate the potential applications and implications of this work beyond its current scope.

      Thank you for making this point. The DISCUSSION part has now been separated in our manuscript. We added some points about other model organisms and diseases to the DISCUSSION as follows: “Our work focusing on the function of mammalian UGGT1 greatly advances the understanding of how ER homeostasis is maintained in higher animals. Considering that Saccharomyces cerevisiae does not have a functional orthologue of UGGT1 (Ninagawa et al., 2020a) and that KO of UGGT1 causes embryonic lethality in mice (Molinari et al., 2005), it would be interesting to know at what point the function of UGGT1 became evolutionarily necessary for life. Related to its importance in animals, it would also be of interest to know what kind of diseases UGGT1 is associated with. Recently, it has been reported that UGGT1 is involved in ER retention of Trop-2 mutant proteins, which are encoded by a causative gene of gelatinous drop-like corneal dystrophy (Tax et al., 2024). Not only this, but since the ER is known to be involved in over 60 diseases (Guerriero and Brodsky, 2012), we must investigate how UGGT1 and other ER molecules are involved in diseases.”

      Reviewer #3 (Public Review):

      This manuscript focuses on defining the importance of UGGT1/2 in the process of protein degradation within the ER. The authors prepared HCT116 cells lacking UGGT1, UGGT2, or both UGGT1/UGGT2 (DKO) and then monitored the degradation of specific ERAD substrates. Initially, they focused on the ER stress sensor ATF6 and showed that loss of UGGT1 increased the degradation of this protein. ATF6 was stabilized by deletion of ERAD-specific factors (e.g., SEL1L, EDEM) or treatment with mannosidase inhibitors such as kifunensine, indicating that the degradation is mediated through a process involving increased mannose trimming of the ATF6 N-glycan. This increased degradation of ATF6 impaired the function of this ER stress sensor, as expected, reducing the activation of downstream reporters of ER stress-induced ATF6 activation. The authors extended this analysis to monitor the degradation of other well-established ERAD substrates including A1AT-NHK and CD3d, demonstrating similar increases in the degradation of destabilized, misfolding protein substrates in cells deficient in UGGT. Importantly, they did experiments to suggest that re-overexpression of wild-type, but not catalytically deficient, UGGT rescues the increased degradation observed in UGGT1 knockout cells. Further, they demonstrated the dependence of this sensitivity to UGGT depletion on N-glycans using ERAD substrates that lack any glycans. Ultimately, these results suggest a model whereby depletion of UGGT (especially UGGT1, which is the most expressed in these cells) increases degradation of ERAD substrates through a mechanism involving impaired re-glucosylation and subsequent re-entry into the calnexin/calreticulin folding pathway.

      I must say that I was under the impression that the main conclusions of this paper (i.e., UGGT1 functions to slow the degradation of ERAD substrates by allowing re-entry into the lectin folding pathway) were well-established in the literature. However, I was not able to find papers explicitly demonstrating this point. Because of this, I do think that this manuscript is valuable, as it supports a previously assumed assertion of the role of UGGT in ER quality control. However, there are a number of issues in the manuscript that should be addressed.

      Notably, the focus on well-established, trafficking-deficient ERAD substrates, while a traditional approach to studying these types of processes, limits our understanding of global ER quality control of proteins that are trafficked to downstream secretory environments where proteins can be degraded through multiple mechanisms. For example, in Figure 1-Figure Supplement 2, UGGT1/2 knockout does not seem to increase the degradation of secretion-competent proteins such as A1AT or EPO, instead appearing to stabilize these proteins against degradation. They do show reductions in secretion, but it isn't clear exactly how UGGT loss is impacting ER Quality Control of these more relevant types of ER-targeted secretory proteins.

      We appreciate your comment. It is certainly difficult to assess in detail how UGGT1 functions against secretion-competent proteins, but we think that the folding state of these proteins is improved, which avoids their degradation and increases their secretion. In Figure 1-Figure supplement 2E, there is a clear decrease in secretion of EPO in UGGT1-KO cells, suggesting that UGGT1 also inhibits degradation of such substrates. Note that, as shown in Fig. 3A-C, once a protein forms a solid structure, it is rarely degraded in the ER.

      Lastly, I don't understand the link between UGGT, ATF6 degradation, and ATF6 activation. I understand that the idea is that increased ATF6 degradation afforded by UGGT depletion will impair activation of this ER stress sensor, but if that is the case, how does UGGT2 depletion, which only minimally impacts ATF6 degradation (Fig. 1), impact activation to levels similar to the UGGT1 knockout (Fig 4)? This suggests UGGT1/2 may serve different functions beyond just regulating the degradation of this ER stress sensor. Also, the authors should quantify the impaired ATF6 processing shown in Fig 4B-D across multiple replicates.

      In light of this valuable comment, we reevaluated our manuscript. As this reviewer mentioned, the involvement of UGGT2 in the activation of ATF6a cannot be explained only by the folding state of ATF6a. Thus, the question of whether UGGT2 is effective in activating ATF6 is outside the scope of this paper. The main focus of this paper is the contribution of UGGT1 to the ER protein quality control mechanism.

      Ultimately, I do think the data support a role for UGGT (especially UGGT1) in regulating the degradation of ERAD substrates, which provides experimental support for a role long-predicted in the field. However, there are a number of ways this manuscript could be strengthened to further support this role, some of which can be done with data they have in hand (e.g., the stats) or additional new experiments.

      In this revision period, to further elucidate the function of UGGT, we did several additional experiments (new figures Fig. 1H, 2DE, 4G, and Figure 1-Figure Supplement 6). We hope that these will bring our paper up to the level you have requested.

      Reviewer #1 (Recommendations For The Authors):

      Minor points:

      (1) Abbreviations: GlcNAc, N-acetylglucosamines -> why plural?

      Corrected.

      (2) Abstract: to this reviewer, it may not be so common to cite references in the abstract.

      We submitted this manuscript to eLife as a “Research Advance”. The eLife instructions for “Research Advances” include the following description: “A reference to the original eLife article should be included in the abstract, e.g. in the format “Previously we showed that XXXX (author, year). Here we show that YYYY.”” We follow this.

      (3) Introduction: "as the site of biosynthesis of approximately one-third of all proteins." Probably this statement needs a citation?

      We added the reference there. You can also confirm this in “The Human Protein Atlas” website. https://www.proteinatlas.org/humanproteome/tissue/secretome

      (4) Figure 1F - the authors claimed that maturation of HA was delayed also in UGGT2 cells, but it was not at all clear to me. Rescue experiments with UGGT2 would be desired.

      We agree with this reviewer, but there was a statistically significant difference at 80 min in the UGGT2-KO strain. Previously, it was reported that the HA maturation rate was not affected by UGGT2 (Hung et al., 2022). We think that the difference is not large. A rescue experiment with UGGT2 on the degradation of NHK was conducted, and is shown in this response to referees.

      (5) Figure 4A, here also the authors claim that UGGT2 is "slightly" involved in folding of ATF6alpha(P) but it is far from convincing to this reviewer.

      Now we also think that involvement of UGGT2 in ER protein quality control should be examined in the future.

      (6) Page 11, line 7 from the bottom: "peak of activation was shifted from 1 hour to 4 hours after the treatment of Tg in UGGT-KO cells". I found this statement a bit awkward; how can the authors be sure that "the peak" is 4 hours when the longest timing tested is 4 hours (i.e. peak may be even later)?

      Corrected. We deleted the description.

      (7) Page 11, line 4 "a more rigid structure that averts degradation" Can the authors speculate what this "rigid" structure actually means? The reviewer has to wonder what kind of change can occur to this protein with or without UGGT1. Binding proteins? The difference in susceptibility against trypsin appears very subtle anyway (Figure 4 Figure Supplement 1).

      Let us add our thoughts here: Poorly structured ATF6a is immediately routed for degradation in UGGT1-KO cells. As a result, ATF6a with a stable or rigid structure remains in the UGGT1-KO strain. ATF6a in a metastable state tends to be degraded without the assistance of UGGT1.

      (8) Figure 1 Figure supplement 2; based on the information provided, I calculate the relative ratio of UGGT2/UGGT1 in HCT116, which is 4.5%, and in HeLa 26%. Am I missing something? Also, the values should be given to 2 significant figures at most, not 3 (i.e. 30%, not 29.8%).

      Corrected. Thank you for this comment.

      Reviewer #2 (Recommendations For The Authors):

      (1) The effect in Fig. 2B with UGGT1-D1358A add-back is minimal. Testing the inactive and active add-back on other substrates, such as ATF6alpha, which undergoes a more rapid degradation, would provide a more comprehensive assessment.

      To examine the effect of full-length UGGT1 and its inactive mutant in UGGT1-KO and UGGT2-KO cells on the rate of degradation of endogenous ATF6a, we screened more than 300 colonies stably expressing full-length Myc-UGGT1/2, UGGT1/2-Flag, and UGGT1/2 (no tag), and point mutants of them. However, no cell lines expressing nearly as much or more UGGT1/2 than the endogenous proteins were obtained. The expression level of UGGT1 seemed to be tightly regulated. A low-expressing stable cell line could not recover the phenotype of ATF6a degradation.

      We also tried to measure the degradation rate of exogenously expressed ATF6a. But overexpressed ATF6a is partially transported to the Golgi and cleaved by proteases, which makes it difficult to evaluate only the effect of degradation.

      (2) In reference to this statement on pg. 11:

      "This can be explained by the rigid structure of ATF6(P) lacking structural flexibility to respond to ER stress because the remaining ATF6(P) in UGGT1-KO cells tends to have a more rigid structure that averts degradation, which is supported by its slightly weaker sensitivity to trypsin (Figure 4-figure supplement 1A). "

      The rationale for testing ATF6(P) rigidity via trypsin digestion needs clarification. The authors should provide more background, especially if it relates to previous studies demonstrating UGGT's influence on substrate solubility. If trypsin digestion is indeed addressing this, it should be applied consistently to all tested misfolded glycoproteins, ensuring a comprehensive approach.

      We now provide more background with three references about trypsin digestion. Trypsin digestion allows us to evaluate the structure of proteins originating from the same gene, but it can sometimes be difficult to comparatively evaluate the structure of proteins originating from different genes. For example, antitrypsin is resistant to trypsin by its nature, which does not necessarily mean that antitrypsin forms a more stable structure than other proteins. NHK, a truncated version of antitrypsin, is still resistant to trypsin compared with other substrates.

      (3) Many of the figures described in the manuscript weren't referred to a specific panel. For example, pg. 12 "Fig. 1E and Fig.5," the exact panel for Fig. 5 wasn't referenced.

      Thank you for this comment. Corrected.

      (4) For experiments measuring the composition of glycoproteins in different KO lines, it is necessary to do the experiment more than once for conducting statistical analysis and comparisons. Moreover, the authors did not include raw composition data for these experiments. Statistical analysis should also be done for Fig. 4E-F.

      Our N-glycan composition data (Figure 1-Figure supplement 5 and 6C) are consistent with our previous papers (George et al., 2021; George et al., 2020; Ninagawa et al., 2015; Ninagawa et al., 2014). We did the experiment twice in the previous study; please refer to it regarding statistical analysis (George et al., 2020). We have added the raw N-glycan composition data (Figure 1-Figure supplement 4 and 6B). Statistical analysis is now included in Fig. 4D-F.

      Ali, B.R., and M.C. Field. 2000. Glycopeptide export from mammalian microsomes is independent of calcium and is distinct from oligosaccharide export. Glycobiology. 10:383-391.

      Asano, N. 2000. Glycosidase-Inhibiting Glycomimetic Alkaloids. Biological Activities and Therapeutic Perspectives. Journal of Synthetic Organic Chemistry, Japan. 58:666-675.

      Cheatham, A.M., N.R. Sharma, and P. Satpute-Krishnan. 2023. Competition for calnexin binding regulates secretion and turnover of misfolded GPI-anchored proteins. J Cell Biol. 222.

      George, G., S. Ninagawa, H. Yagi, J.I. Furukawa, N. Hashii, A. Ishii-Watabe, Y. Deng, K. Matsushita, T. Ishikawa, Y.P. Mamahit, Y. Maki, Y. Kajihara, K. Kato, T. Okada, and K. Mori. 2021. Purified EDEM3 or EDEM1 alone produces determinant oligosaccharide structures from M8B in mammalian glycoprotein ERAD. Elife. 10.

      George, G., S. Ninagawa, H. Yagi, T. Saito, T. Ishikawa, T. Sakuma, T. Yamamoto, K. Imami, Y. Ishihama, K. Kato, T. Okada, and K. Mori. 2020. EDEM2 stably disulfide-bonded to TXNDC11 catalyzes the first mannose trimming step in mammalian glycoprotein ERAD. Elife. 9:e53455.

      Gross, V., T. Andus, T.A. Tran-Thi, R.T. Schwarz, K. Decker, and P.C. Heinrich. 1983. 1-deoxynojirimycin impairs oligosaccharide processing of alpha 1-proteinase inhibitor and inhibits its secretion in primary cultures of rat hepatocytes. Journal of Biological Chemistry. 258:12203-12209.

      Gross, V., T.A. Tran-Thi, R.T. Schwarz, A.D. Elbein, K. Decker, and P.C. Heinrich. 1986. Different effects of the glucosidase inhibitors 1-deoxynojirimycin, N-methyl-1-deoxynojirimycin and castanospermine on the glycosylation of rat alpha 1-proteinase inhibitor and alpha 1-acid glycoprotein. Biochem J. 236:853-860.

      Hirao, K., Y. Natsuka, T. Tamura, I. Wada, D. Morito, S. Natsuka, P. Romero, B. Sleno, L.O. Tremblay, A. Herscovics, K. Nagata, and N. Hosokawa. 2006. EDEM3, a soluble EDEM homolog, enhances glycoprotein endoplasmic reticulum-associated degradation and mannose trimming. J Biol Chem. 281:9650-9658.

      Hosokawa, N., I. Wada, K. Hasegawa, T. Yorihuzi, L.O. Tremblay, A. Herscovics, and K. Nagata. 2001. A novel ER alpha-mannosidase-like protein accelerates ER-associated degradation. EMBO reports. 2:415-422.

      Hung, H.H., Y. Nagatsuka, T. Solda, V.K. Kodali, K. Iwabuchi, H. Kamiguchi, K. Kano, I. Matsuo, K. Ikeda, R.J. Kaufman, M. Molinari, P. Greimel, and Y. Hirabayashi. 2022. Selective involvement of UGGT variant: UGGT2 in protecting mouse embryonic fibroblasts from saturated lipid-induced ER stress. Proc Natl Acad Sci U S A. 119:e2214957119.

      Karlsson, G.B., T.D. Butters, R.A. Dwek, and F.M. Platt. 1993. Effects of the imino sugar N-butyldeoxynojirimycin on the N-glycosylation of recombinant gp120. Journal of Biological Chemistry. 268:570-576.

      Lomako, J., W.M. Lomako, C.A. Carothers Carraway, and K.L. Carraway. 2010. Regulation of the membrane mucin Muc4 in corneal epithelial cells by proteosomal degradation and TGF-beta. Journal of cellular physiology. 223:209-214.

      Mast, S.W., K. Diekman, K. Karaveg, A. Davis, R.N. Sifers, and K.W. Moremen. 2005. Human EDEM2, a novel homolog of family 47 glycosidases, is involved in ER-associated degradation of glycoproteins. Glycobiology. 15:421-436.

      Ninagawa, S., T. Okada, Y. Sumitomo, S. Horimoto, T. Sugimoto, T. Ishikawa, S. Takeda, T. Yamamoto, T. Suzuki, Y. Kamiya, K. Kato, and K. Mori. 2015. Forcible destruction of severely misfolded mammalian glycoproteins by the non-glycoprotein ERAD pathway. J Cell Biol. 211:775-784.

      Ninagawa, S., T. Okada, Y. Sumitomo, Y. Kamiya, K. Kato, S. Horimoto, T. Ishikawa, S. Takeda, T. Sakuma, T. Yamamoto, and K. Mori. 2014. EDEM2 initiates mammalian glycoprotein ERAD by catalyzing the first mannose trimming step. J Cell Biol. 206:347-356.

      Olivari, S., C. Galli, H. Alanen, L. Ruddock, and M. Molinari. 2005. A novel stress-induced EDEM variant regulating endoplasmic reticulum-associated glycoprotein degradation. J Biol Chem. 280:2424-2428.

      Pearse, B.R., T. Tamura, J.C. Sunryd, G.A. Grabowski, R.J. Kaufman, and D.N. Hebert. 2010. The role of UDP-Glc:glycoprotein glucosyltransferase 1 in the maturation of an obligate substrate prosaposin. J Cell Biol. 189:829-841.

      Romero, P.A., B. Saunier, and A. Herscovics. 1985. Comparison between 1-deoxynojirimycin and N-methyl-1-deoxynojirimycin as inhibitors of oligosaccharide processing in intestinal epithelial cells. Biochem J. 226:733-740.

      Saunier, B., R.D. Kilker, J.S. Tkacz, A. Quaroni, and A. Herscovics. 1982. Inhibition of N-linked complex oligosaccharide formation by 1-deoxynojirimycin, an inhibitor of processing glucosidases. Journal of Biological Chemistry. 257:14155-14161.

      Szumilo, T., G.P. Kaushal, and A.D. Elbein. 1987. Purification and properties of the glycoprotein processing N-acetylglucosaminyltransferase II from plants. Biochemistry. 26:5498-5505.

      Tannous, A., N. Patel, T. Tamura, and D.N. Hebert. 2015. Reglucosylation by UDP-glucose:glycoprotein glucosyltransferase 1 delays glycoprotein secretion but not degradation. Molecular biology of the cell. 26:390-405.

      Zeng, Y., Y.T. Pan, N. Asano, R.J. Nash, and A.D. Elbein. 1997. Homonojirimycin and N-methyl-homonojirimycin inhibit N-linked oligosaccharide processing. Glycobiology. 7:297-304.

    1. eLife assessment

      This fundamental study reports the most comprehensive neurotransmitter atlas of any organism to date, using fluorescent knock-in reporter lines. The work is comprehensive, rigorous, and compelling. The tool will be used by a broad audience of scientists interested in neuronal cell type differentiation and function, and could be a seminal reference in the field.

    2. Reviewer #1 (Public review):

      Summary:

      Wang and colleagues conducted a study to determine the neurotransmitter identity of all neurons in C. elegans hermaphrodites and males. They used CRISPR technology to introduce fluorescent gene expression reporters into the genomic loci of NT pathway genes. This approach is expected to better reflect in vivo gene expression compared to other methods like promoter- or fosmid-based transgenes, or available scRNA datasets. The study presents several noteworthy findings, including sexual dimorphisms, patterns of NT co-transmission, neuronal classes that likely use NTs without direct synthesis, and potential identification of unconventional NTs (e.g. betaine releasing neurons). The data is well-described and critically discussed, including a comparison with alternative methods. Although many of the observations and proposals have been previously discussed by the Hobert lab, the current study is particularly valuable due to its comprehensiveness. This NT atlas is the most complete and comprehensive of any nervous system that I am aware of, making it an extremely important tool for the community.

      Strengths:

      Very compelling study presenting the most comprehensive neurotransmitter (NT) map of any model so far, using state-of-the-art tools and validations. The work is very important not only as a resource but also for our understanding that the NT function of neurons is best understood by taking into consideration the full set of genes implicated in NT metabolism and transport.

      Weaknesses:

      None, all have been addressed.

    3. Reviewer #2 (Public review):

      Summary:

      Together with the known anatomical connectivity, molecular atlases pave the way toward functional maps of the nervous system of C. elegans. Along with the analysis of previous scRNA sequencing and reporter strains, new expression patterns are generated for hermaphrodites and males based on CRISPR-knocked-in GFP reporter strains and the use of the color-coded NeuroPAL strain to accurately identify neurons. Beyond a map of the known neurotransmitters (GABA, Acetylcholine, Glutamate, dopamine, serotonin, tyramine, octopamine), the atlas also identifies neurons likely using betaine and suggests sets of neurons employing new unknown monoaminergic transmission, or using exclusively peptidergic neurotransmission.

      Strengths:

      The use of CRISPR reporter alleles and of the NeuroPAL strain to assign neurotransmitter usage to each neuron is much more rigorous than previous analysis and reveals intriguing differences between scRNA seq, fosmid reporter and CRISPR knock-in approaches. The differences between approaches are discussed.

      Weaknesses:

      All have been addressed.

    4. Reviewer #3 (Public review):

      Summary:

      In this paper, Wang et al. provide the most comprehensive description and comparison of the expression of the different genes required to synthesize, transport and recycle the most common neurotransmitters (Glutamate, Acetylcholine, GABA, Serotonin, Dopamine, Octopamine and Tyramine) used by hermaphrodite and male C. elegans. This paper will be a seminal reference in the field. Building and contrasting observations from previous studies using fosmid, multicopy reporters and single cell sequencing, they now describe CRISPR/Cas-9-engineered reporter strains that, in combination with the multicolor pan-neuronal labeling of all C. elegans neurons (NeuroPAL), allow rigorous elucidation of neurotransmitter expression patterns. These novel reporters also illuminate previously unappreciated aspects of neurotransmitter biology in C. elegans, including sexual dimorphism of expression patterns, co-transmission and the elucidation of cell-specific pathways that might represent new forms of neurotransmission.

      Strengths:

      The authors set out to establish neurotransmitter identities in C. elegans males and hermaphrodites via varying techniques, including integration of previous studies, examination of expression patterns and generation of endogenous CRISPR-labeled alleles. Their study is comprehensive, detailed and rigorous, and achieves its aims. It is an excellent reference for the field, particularly those interested in biosynthetic pathways of neurotransmission and their distribution in vivo, in neuronal and non-neuronal cells.

      Weaknesses:

      No weaknesses noted. The authors do a great job linking their characterizations with other studies and techniques, lending credence to their findings. As the authors note, there are sexually dimorphic differences across animals, and varying expression patterns of enzymes. While it is unlikely there will be huge differences in the reported patterns across individual animals, it is possible that these expression patterns could vary developmentally, or based on physiological or environmental conditions.

    5. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers and editor for their helpful comments and suggestions. In response, we have revised the manuscript in two main ways:

      (1) To address the comments about rearranging figures and tables, we added a new Figure 3 that summarizes neurotransmitter assignments across all neuron classes. Our rationale for this change is detailed below.

      (2) To address the comment on clarifying neurotransmitter synthesis versus uptake, we analyzed two additional reporter alleles that tag the monoamine uptake transporters for 5-HT and potentially tyramine. These results are now presented in a new Figure 8 and corresponding sections in the manuscript. Related tables have been updated to include this expression data. Two more authors have been added due to their contributions to these experiments.

      For more detailed changes, please see our responses to the specific reviewers' comments as well as the revised manuscript.

      Public Reviews:

      Reviewer #1 (Public Review): 

      Wang and colleagues conducted a study to determine the neurotransmitter identity of all neurons in C. elegans hermaphrodites and males. They used CRISPR technology to introduce fluorescent gene expression reporters into the genomic loci of NT pathway genes. This approach is expected to better reflect in vivo gene expression compared to other methods like promoter- or fosmid-based transgenes, or available scRNA datasets. The study presents several noteworthy findings, including sexual dimorphisms, patterns of NT co-transmission, neuronal classes that likely use NTs without direct synthesis, and potential identification of unconventional NTs (e.g. betaine releasing neurons). The data is well-described and critically discussed, including a comparison with alternative methods. Although many of the observations and proposals have been previously discussed by the Hobert lab, the current study is particularly valuable due to its comprehensiveness. This NT atlas is the most complete and comprehensive of any nervous system that I am aware of, making it an extremely useful tool for the community. 

      Reviewer #2 (Public Review):

      Summary: 

      Together with the known anatomical connectivity of C. elegans, a neurotransmitter atlas paves the way toward a functional connectivity map. This study refines the expression patterns of key genes for neurotransmission by analyzing the expression patterns from CRISPR-knocked-in GFP reporter strains using the color-coded NeuroPAL strain to identify neurons. Along with data from previous scRNA sequencing and other reporter strains, examining these expression patterns enhances our understanding of neurotransmitter identity for each neuron in hermaphrodites and the male nervous system. Beyond the known neurotransmitters (GABA, Acetylcholine, Glutamate, dopamine, serotonin, tyramine, octopamine), the atlas also identifies neurons likely using betaine and suggests sets of neurons employing new unknown monoaminergic transmission, or using exclusively peptidergic transmission. 

      Strengths: 

      The use of CRISPR reporter alleles and of the NeuroPAL strain to assign neurotransmitter usage to each neuron is much more rigorous than previous analysis and reveals intriguing differences between scRNA seq, fosmid reporter, and CRISPR knock-in approaches. Among other mechanisms, these differences between approaches could be attributed to 3'UTR regulatory mechanisms for scRNA vs. knockin or titration of rate-limited negative regulatory mechanisms for fosmid vs. knockin. It would be interesting to discuss this and highlight the occurrences of these potential phenomena for future studies.  

      We recognize that readers of this study may be interested in understanding the differences between the three approaches. Therefore, in the Introduction, we addressed the potential risk of overexpression artifacts associated with multicopy transgenes, such as fosmid-based reporters, which can affect rate-limiting negative regulatory mechanisms. Additionally, in the Discussion, we included a section titled 'Comparing approaches and caveats of expression pattern analysis' to further explore these comparative methods and their associated nuances.

      Weaknesses: 

      For GABAergic transmission, one shortcoming arises from the lack of an improved expression pattern from a knockin reporter strain for the GABA recapture symporter snf-11. In its absence, it is difficult to make a final conclusion on GABA recapture vs GABA clearance for all neurons expressing the vesicular GABA transporter (unc-47+) but not expressing the GAD/UNC-25 gene, e.g. SIA or R2A neurons. At a minimum, a comparison of the scRNA seq predictions versus the snf-11 fosmid reporter strain expression pattern would help to better judge the proposed role of each neuron in GABA clearance or recycling. 

      The snf-11 fosmid-based reporter data shows very good overlap with scRNA seq predictions (now included in Supp. Table S1). 

      But there are two much stronger reasons why we did not seek to further analyze the expression of the snf-11 GABA uptake transporter:

      (1) Due to available anti-GABA staining data, we do know which neurons have the potential to take up GABA (via SNF-11).

      (2) Focusing on SNF-11 function rather than expression, we can ask which neurons lose anti-GABA staining in snf-11 mutants.

      Both of these types of analyses have been done in an earlier study from our lab (Gendrel et al., 2016, PMID 27740909), which, among other things, investigated GABA uptake mechanisms via SNF-11. Apart from analyzing the expression of a fosmid-based snf-11 reporter, we immunostained worms for GABA in both snf-11 mutant and wild type backgrounds (results summarized in Tables 1 and 2 of Gendrel et al.). Of the neurons that typically stain for GABA (Table 1, Gendrel et al.), two neuron classes (ALA and AVF) lost the staining in snf-11 mutants, suggesting that these neurons likely take up GABA via SNF-11. Importantly, one of the neurons the reviewer mentioned, R2A, stains for GABA in both wild type and snf-11 mutants, indicating that it likely does not take up GABA via SNF-11. The other neuron mentioned, SIA, does not stain for GABA in wild type (Table 2, Gendrel et al.), and hence is not a GABA uptake neuron. In cases like SIA and other neurons, where a neuron does not express unc-25 but does express unc-47 reporters (either fosmid or CRISPR reporter alleles), we speculate that UNC-47 transports another neurotransmitter.

      Considering the complexities of different tagging approaches, like T2A-GFP and SL2-GFP cassettes, in capturing post-translational and 3'UTR regulation is important. The current formulation is simplistic. e.g. after SL2 trans-splicing the GFP RNA lacks the 5' regulatory elements, T2A-GFP self-cleavage has its own issues, and the his-44-GFP reporter protein does certainly have a different post-translational life than vesicular transporters or cytoplasmic enzymes. 

      Yes, agreed, these points are mentioned in the Introduction and discussed in "Comparing approaches and caveats of expression pattern analysis" in the Discussion.

      Do all splicing variants of neurotransmitter-related genes translate into functional proteins? The possibility that some neurons express a non-functional splice variant, leading to his-74-GFP reporter expression without functional neurotransmitter-related protein production is not addressed. 

      We thank the reviewer for bringing up this really interesting point, which we had not considered. First and foremost, with the exception of unc-25 (discussed in the next point), for all other genes that produce multiple splice forms, we made sure to append our tag (at the 5’ or 3’ end) such that the expression of all splice forms is captured. The reviewer raises the interesting point that in an alternative splicing scenario, some of the cells that express the primary transcript may “switch” to an inactive form. While we cannot exclude this possibility, we have confirmed by sequence analysis in WormBase that in five of the six cases where there is alternative splicing, the alternatively spliced exon lies outside the conserved, functionally relevant (enzymatic or structural) domain. In one case, unc-25, a shorter isoform is produced that does cut into the functionally relevant domain; however, since all cells expressing the unc-25 reporter allele also stain positive for GABA, this may not be an issue. 

      Also, one tagged splice variant of unc-25 is expected to fail to produce a GFP reporter, can this cause trouble? 

      Yes, there is indeed a third splice variant of unc-25 with an alternative C-terminus. To address potential expression of this isoform, we CRISPR-engineered another reporter, unc-25(ot1536[unc-25b.1::t2a::gfp::h2b]), in which the inserted t2a::gfp::h2b sequences are fused to the C-terminus of the alternative splice form, but we did not observe any expression of this reporter. This result is now included in the manuscript.

      Reviewer #3 (Public Review): 

      Summary: 

      In this paper, Wang et al. provide the most comprehensive description and comparison of the expression of the different genes required to synthesize, transport, and recycle the most common neurotransmitters (Glutamate, Acetylcholine, GABA, Serotonin, Dopamine, Octopamine, and Tyramine) used by hermaphrodite and male C. elegans. This paper will be a seminal reference in the field. Building and contrasting observations from previous studies using fosmid, multicopy reporters, and single-cell sequencing, they now describe CRISPR/Cas-9-engineered reporter strains that, in combination with the multicolor pan-neuronal labeling of all C. elegans neurons (NeuroPAL), allow rigorous elucidation of neurotransmitter expression patterns. These novel reporters also illuminate previously unappreciated aspects of neurotransmitter biology in C. elegans, including sexual dimorphism of expression patterns, cotransmission, and the elucidation of cell-specific pathways that might represent new forms of neurotransmission. 

      Strengths: 

      The authors set out to establish neurotransmitter identities in C. elegans males and hermaphrodites via varying techniques, including integration of previous studies, examination of expression patterns, and generation of endogenous CRISPR-labeled alleles. Their study is comprehensive, detailed, and rigorous, and achieves the aims. It is an excellent reference for the field, particularly those interested in biosynthetic pathways of neurotransmission and their distribution in vivo, in neuronal and non-neuronal cells. 

      Weaknesses: 

      No weaknesses were noted. The authors do a great job linking their characterizations with other studies and techniques, giving credence to their findings. As the authors note, there are sexually dimorphic differences across animals and varying expression patterns of enzymes. While it is unlikely there will be huge differences in the reported patterns across individual animals, it is possible that these expression patterns could vary developmentally, or based on physiological or environmental conditions. It is unclear from the study how many animals were imaged for each condition, and if the authors noted changes across individuals during development (could be further acknowledged in the discussion?)  

      We have updated the Methods section to specify the number of animals used for imaging. We agree with the reviewer that documenting the developmental dynamics of neurotransmitter expression would be interesting. However, except for one gene (tph-1, Fig. S2), we did not analyze the expression during different developmental stages for most genes in this study. Following the reviewer's suggestion, we have included this as a potential future direction in "Conclusions" at the end of the revised manuscript.

      Recommendations for the authors:

      After the consultation session, a common suggestion from the reviewers is to bring the tables more upfront, perhaps even in the form of legible main Figures and in alphabetical order of neurons; since we believe that the study will be in the long-term often used for these data; while the Figures with fluorescent expression patterns could be moved to the supplemental information. 

      We appreciate the reviewers' and editor's acknowledgment of the tables' possibly frequent usage by the field. We have considered carefully how to order the data presentation. We prefer to keep most of the fluorescent figures in the main text because they convey important subtleties that we want the reader to be aware of.

      To address the suggestions to bring key data more upfront, we have added an entirely new figure (Figure 3) before the ensuing data figures that summarized expression patterns of the fluorescent reporters. This new figure (A) summarizes the neurotransmitter use for all neuron classes and (B) illustrates this information within worm schematics, showing the position of neurons in the whole worm. This figure serves as a good overview of neurotransmitter assignments but also specifically refers to the more extensive data and supplementary tables with detailed notes. We believe this solution effectively balances the need for comprehensive information and ease of reference.

      Reviewer #1 (Recommendations for The Authors):

      Suggestions: 

      (1) The study contains up to 10 Figures with gene expression patterns; however, I believe the community will use this paper mostly in the future for its summarizing tables. I wonder if it would be more useful to edit the tables and move them to the main figures while most fluorescent reporter images could be moved to the supplementary part. 

      Yes, as mentioned above, we added a new summary table and schematic up front. We prefer to keep the primary data in the main figures. Please see above (Public Review & Response).

      (2) In the section titled 'Neurotransmitter Synthesis versus Uptake', the authors' wording could be more careful. The data rather suggest functions for individual neuronal classes, such as clearance neurons or signaling neurons. However, these functions remain hypotheses until further detailed studies are conducted to test them.

      These are fair points. We have made several improvements: 

      (1) In the referenced section, we added a sentence at the end of the paragraph on betaine to suggest the importance of future functional studies.

      (2) We analyzed reporter allele expression for two additional genes: the known uptake transporter for 5-HT (mod-5, reporter allele vlc47) and the predicted uptake transporter for tyramine (oct-1, reporter allele syb8870). The results from these experiments are presented in the new Figure 8 and discussed in Results and Discussion correspondingly. We also collaborated with Curtis Loer, who conducted anti-5-HT staining in wild type and mod-5 mutant animals (results shown in Figure 12). These experiments have enhanced our understanding of 5-HT uptake mechanisms and potential tyramine uptake mechanisms.

      (3) At the end of the Conclusions, we emphasized the need for future detailed studies to test the functions of neurotransmitter synthesis and uptake.

      (3) Page 21; add to the discussion: neurons could use mainly electrical synapses for communication. Especially for RMG neurons, this might be the case (in addition to neuropeptide communication). 

      “Main usage” is a difficult term to use. If there were neurons that are clearly devoid of any form of synaptic vesicle (small or DCV; note that RMG has plenty of DCVs), but show robust and reproducible electrical synapses, we would agree that such neurons could primarily be a “coupling” neuron. But this call is very hard to make for any C. elegans neuron (RMG included) and hence we prefer to not add further to an already quite long Discussion section.

      (4) Page 23: I believe that multi-copy promoter-based transgenes (despite array suppression mechanisms) could be potentially more sensitive than single-copy insertion of fluorescent reporters. In our lab, we observed this a couple of times. This could be discussed. 

      We discuss this in "Comparing approaches and caveats of expression pattern analysis" in the Discussion.

      We have also added a third possibility (i.e. technical issues related to neuron-ID) in the revised manuscript.   

      Reviewer #2 (Recommendations For The Authors): 

      Comment during consultation session: As for my feedback on the lack of an SNF-11 reporter strain, exercising more caution in their conclusions would suffice for me. Other comments are simple edits/discussion.  

      Please see above.  

      Several neurotransmitter symporters exist in the C. elegans genome, does any express specifically in the "orphan" UNC-47+ neurons? 

      Yes, good point, we considered this possibility, but of the >10 SLC6-family of neurotransmitter reporters, only the classic, de-orphanized ones that we discuss here in the paper show robust scRNA signals (as discussed in the paper) and none of those give clues about the orphan unc-47(+) neurons.

      Based on UNC-47+ expression, the article suggests a "Novel inhibitory neurotransmitter". Why would any new neurotransmitter using UNC-47 necessarily be inhibitory? The presence of one potential glycine-gated anion channel and one GPCR in the C. elegans genome seems poor evidence to suggest a sign of glycine or β-alanine transmission.

      Yes, agreed, it does not need to be inhibitory. Fixed in Results and Discussion. 

      To help readers, the expression of the knocked-in GFP in neurons should not be reported as binary in Table S1, which leads to a feeling of strong discrepancy between scRNA-seq and CRISPR GFP, which is not the case.

      There might be some misunderstanding regarding the coloring in this table. To clarify, the green-filled Excel cells denote the expression of reporters utilized in prior studies, rather than the CRISPR reporter alleles. Expression of the CRISPR alleles is instead indicated on the left side of the neuron names, marked as "CRISPR+" in green font. For signifying absence of expression, we used "no CRISPR" in red font in the first submission. We have now changed it into "CRISPR-" for greater clarity.

      The variable expression of reporter GFP between individuals for the same neuron is intriguing. It is unclear if this is observed only for dim neurons or can be more of an ON/OFF expression. 

      Variability only occurs for dim expression. We have now clarified this point in Discussion, "Comparing approaches and caveats of expression pattern analysis".

      The multiple occurrences of co-transmission, especially in male neurons, are interesting. It will be interesting in the future to establish whether the neurotransmitters are synaptically segregated or coreleased. As the section on sexual dimorphism of neurotransmitter usage does not discuss novel information coming from this study, it is not very necessary. 

      Agreed. We added this perspective to the Discussion, "Co-transmission of multiple neurotransmitters".  

      In the abstract, dopamine is missing from the main known transmitters.

      Fixed. Thanks for spotting this.

      Reviewer #3 (Recommendations For The Authors): 

      Great article. Minor suggestions to strengthen presentation: 

      Figure 1B is hard to interpret. There could be more intuitive ways of representing the data and the methodologies that support a given expression pattern. Neurons should also be reordered by alphabetical order rather than expression levels to facilitate finding them.  

      We considered alternative ways of presenting this data, but, regrettably, did not come up with a better approach. To clarify, the primary focus of Fig. 1B is to compare expression of previously reported reporters and scRNA data, which was quite literally the initial impetus for our analysis, i.e. we noted strong scRNA signals that had not previously been supported by transgenic reporter data. For a comprehensive version of the table that includes more details on the expression of CRISPR reporter alleles, please refer to Table S1, which we referenced in the figure legend.   

      GFP-only channel images in Figures 3, 4, 5, and 9 sometimes show dim signals that the authors are highlighting as new findings. We recommend using the inverted grayscale version of that channel since the contrast of dim signals is more noticeable to the human eye rather than when the image is colorized. 

      Good point, we implemented these suggestions in the figures the reviewer mentioned, now re-numbered Figures 4, 5, 6, and 12. For Figure 6 (tph-1, bas-1, and cat-1 expression in hermaphrodites), we used a new cat-1 head image to reflect the newly identified ASI and AVL expression that wasn’t readily visible in the original projection used in the earlier version of this manuscript. We also added grayscale images in Figure 13 to reflect dim tbh-1 expression in IL2 neurons more clearly.

      A plan to integrate this new information into WormAtlas would be welcome. The C. elegans community is characterized by the open sharing of information on platforms that are user-friendly and accessible. Ideally, the new information would not just 'erase' what was observed before but would describe the new observations and let the community reach their own conclusions, since there is no perfect method and even these CRISPR/Cas9 reporter strains are only a proxy for gene expression that is subject to post-transcriptional regulation, since they depend on T2A and SL2 sequences.

      We completely agree with the reviewer’s suggestion. We will coordinate with WormAtlas on integrating this new information. 

      In the case of neurons that were removed from using a specific neurotransmitter, like PVQ, what do the authors conclude overall? If it does not use glutamate, are there any new hypotheses as to what it could be using?

      Since all neurons express multiple neuropeptides, we hypothesize neurons such as PVQ may be primarily peptidergic. This is included in Discussion, "Neurons devoid of canonical neurotransmitter pathway genes may define neuropeptide-only neurons".  

      In Table S5, the I4 neuron is listed as variable for eat-4 expression, but in Table S1 it says that there was no CRISPR expression detected. Which one is correct?

      Thanks for spotting this. Table S5 is correct, we saw very dim and variable expression of the eat-4 reporter allele in I4. Table S1 is fixed now.

      Additional discussion points that might be important for the community: 

      CRISPR strains used here should be deposited in the CGC.

      Yes, all strains generated in this study have already been deposited to CGC. 

      It would be great to have an additional discussion point on how the neural clusters in CeNGEN were defined based on the fosmid reporter expression; using as the defining factor one that was itself already defined by that expression might make the results confusing.

      Neural cluster definition in CeNGEN did not rely on isolated data points but on the combination of many expression reagents, each with its own shortcomings, but in combination providing reliable identification. Since one feedback we have gotten from many readers of our manuscript is that it is already very long as is, we prefer not to dilute the discussion further.

      It would be important to discuss the rate of neurotransmitter genes that have variable expression patterns. Are any of those genes used in NeuroPAL to define specific neuronal classes? This is important to describe as NeuroPAL labeling is being used to define neuronal identity. 

      All the reporters used in NeuroPAL are promoter-based, very robust and do not include the full loci of genes, so they are not directly comparable with the CRISPR reporter alleles in this study. However, we recognize that some expression pattern variability could be confusing. We have discussed this more in the section "Comparing approaches and caveats of expression pattern analysis" in the Discussion.

    1. eLife assessment

      The study presents compelling evidence that the melanocortin system originating in the arcuate nucleus of the hypothalamus plays a crucial role in puberty onset, representing a significant advance in our understanding of reproductive biology. The work, which represents a fundamental advance, employs innovative approaches and benefits from the combined expertise of two respected laboratories, enhancing the robustness of the findings. Given the potential impact on human health and the strength of the evidence presented, this work will likely influence the field substantially and may inform future clinical applications.

    2. Reviewer #1 (Public Review):

      Summary:

      The authors investigate the role of the melanocortin system in puberty onset. They conclude that proopiomelanocortin (POMC) neurons within the arcuate nucleus of the hypothalamus provide important but differing input to kisspeptin neurons in the arcuate or rostral hypothalamus.

      Strengths:

      • innovative and novel
      • technically sound
      • well-designed
      • thorough

      Weaknesses:

      There were no major weaknesses identified.

    3. Reviewer #2 (Public Review):

      Summary:

      This interesting manuscript describes a study investigating the role of MC4R (melanocortin 4 receptor) signalling on kisspeptin (Kiss1) neurons. The initial question is a good one. Infertility in human MC4R mutations has typically been ascribed to the consequent obesity and impaired metabolic regulation. Whether MC4R directly regulates the hypothalamic-pituitary-gonadal (HPG) axis has not been thoroughly examined. Here, the researchers assembled an elegant combination of loss and gain of function in vivo experiments, specifically targeting MC4R expression in Kiss1 neurons. This is an excellent experimental design and one that should provide compelling evidence for whether there is a direct role for melanocortin signalling in arcuate Kiss1 neurons to support normal reproductive function. There were definite effects on reproductive function (irregular estrous cycle, reduced magnitude of LH surge induced by exogenous estradiol). Still, the magnitude of these responses and the overall effect on fertility were relatively minor, as mice lacking MC4R in Kiss1 neurons remained fertile despite these irregularities. The second part of the manuscript describes a series of electrophysiological studies evaluating the pharmacological effects of melanocortin signalling in Kiss1 neurons in ex vivo brain slices. These studies characterised interesting differential actions of melanocortins in two different Kiss1 neuronal populations. The study provides some novel insights into how direct actions of melanocortin signalling via the MC4R in Kiss1 neurons contribute to the metabolic regulation of the reproductive system. Importantly, however, it is clear that other mechanisms are also at play.

      Strengths:

      The loss and gain of function experiments provide a conceptually simple but hugely informative experimental design, which is the key strength of the current paper - especially the knock-in study that showed improved reproductive function even in the presence of ongoing obesity. This is a very convincing result that documents that reproductive deficits in MC4R knockout animals (and humans with deleterious MC4R gene variants) can be ascribed to impaired signalling in the hypothalamic Kiss1 neurons and not necessarily simply caused as a consequence of obesity. Validation experiments for these studies are needed, given their great prominence in the manuscript, because these are critical to interpretation.

      Weaknesses:

      (1) Given the fact that mice lacking MC4R in Kiss1 neurons remained fertile despite some reproductive irregularities, the overall tone and some of the conclusions of the manuscript (e.g., from the abstract: "... Mc4r expressed in Kiss1 neurons is required for fertility in females") were overstated. Perhaps this can be described as a contributing pathway, but other mechanisms must also be involved in conveying metabolic information to the reproductive system.

      (2) The mechanistic studies evaluating melanocortin signalling in Kiss1 neurons were all completed in ovariectomised animals (with and without exogenous hormones) that do not experience cyclical hormone changes. Such cyclical changes are fundamental to how these neurons function in vivo and may dynamically alter the way they respond to neuropeptides. Therefore, eliminating this variable makes interpretation difficult.

      (3) Use of the POMC-Cre to target optogenetic inputs to Kiss1 neurons might have targeted a wider population of cells than intended.

    4. Reviewer #3 (Public Review):

      In this manuscript, Talbi R et al. generated transgenic mice to assess the reproductive function of MC4R in Kiss1 neurons in vivo and used electrophysiology to test how MC4R activation regulated Kiss1 neuronal firing in ARH and AVPV/PeN. This timely study is highly significant in the field of neuroendocrinology research for the following reasons.

      (1) The authors' findings are significant in the field of reproductive research. Despite the known presence of MC4R signaling in Kiss1 neurons, the exact mechanisms of how MC4R signaling regulates different Kiss1 neuronal populations in the context of sex hormone fluctuations are not completely understood. The authors reported that knocking out Mc4r from Kiss1 neurons replicates the reproductive impairment of MC4RKO mice, and Mc4r expression in Kiss1 neurons in the MC4R null background partially restored the reproductive impairment. MC4R activation excites Kiss1 ARH neurons and inhibits Kiss1 AVPV/PeN neurons (except for elevated estradiol).

      (2) Reproductive dysfunction is one of the comorbidities of obesity. MC4R loss-of-function mutations cause an obesity phenotype and impaired reproduction. However, it's hard to determine the causality. The authors carefully measured the body weight of the different mouse models (Figure 1C, Figure 2A, Figure 3B). For example, the Kiss1-MC4RKO females showed no body weight difference at the age of puberty onset. This clearly demonstrated that MC4R signaling acts directly on reproduction and that the reproductive impairment is not a consequence of excessive adiposity.

      (3) Gene expression findings in the "KNDy" system are in line with the reproduction phenotype.

      (4) The electrophysiology results reported in this manuscript are innovative and provide more details of MC4R activation and Kiss1 neuronal activation.

      Overall, the authors have presented sufficient background in a clear and logically organized structure, clearly stated the key question to be addressed, used the appropriate methodology, produced significant and innovative main findings, and made a justified conclusion.

    5. Author response:

      We are grateful to the reviewers and the editorial team for their feedback and thorough revisions of our paper. We also appreciate their acknowledgement that this study represents a significant advancement in the field of reproductive neuroendocrinology and offers insights on the contribution of obesity vs melanocortin signaling in women’s fertility. In the revised version, we will provide a more detailed clarification of the data and methodology and adhere to the reviewers’ suggestions.

      Please find below our answers to specific concerns in the public review:

      Given the fact that mice lacking MC4R in Kiss1 neurons remained fertile despite some reproductive irregularities, the overall tone and some of the conclusions of the manuscript (e.g., from the abstract: "... Mc4r expressed in Kiss1 neurons is required for fertility in females") were overstated. Perhaps this can be described as a contributing pathway, but other mechanisms must also be involved in conveying metabolic information to the reproductive system.

      We will tone down these statements throughout the manuscript to indicate that MC4R in Kiss1 neurons plays a role in the metabolic control of fertility (rather than “…is required for fertility”).

      The mechanistic studies evaluating melanocortin signalling in Kiss1 neurons were all completed in ovariectomised animals (with and without exogenous hormones) that do not experience cyclical hormone changes. Such cyclical changes are fundamental to how these neurons function in vivo and may dynamically alter the way they respond to neuropeptides. Therefore, eliminating this variable makes interpretation difficult.

      Mice lack true follicular and luteal phases and therefore it is impossible to separate estrogen-mediated changes from progesterone-mediated changes (e.g., in a proestrous female). Therefore, we use an ovariectomized female model in which we can generate a LH surge with an E2-replacement regimen [1]. This model enables us to focus on estrogen effects, exclude progesterone effects, and minimize variability. Inclusion of cycling females would make interpretation much more difficult.

      (1) Bosch et al., 2013 Mol & Cell Endo; https://doi.org/10.1016/j.mce.2012.12.021

      Use of the POMC-Cre to target optogenetic inputs to Kiss1 neurons might have targeted a wider population of cells than intended.

      POMC is transiently expressed during embryonic development in a portion of cells fated to be Kiss1 or NPY/AgRP neurons [1-2]. Therefore, this is a valid concern when crossing with a floxed mouse. However, use of AAVs in adult animals avoids this issue and leads to specific expression in POMC neurons [3]. This POMC-Cre mouse has been used extensively with AAVs to drive specific expression in POMC neurons by other laboratories [4-7]. Therefore, we are confident that our optogenetic studies have narrowly targeted POMC inputs.

      (1) Padilla et al., 2010 Nat Med; https://doi.org/10.1038/nm.2126

      (2) Lam et al., 2017 Mol Metab; https://doi.org/10.1016/j.molmet.2017.02.007

      (3) Stincic et al., 2018 eNeuro; https://doi.org/10.1523/eneuro.0103-18.2018

      (4) Fenselau et al., 2017 Nat Neuro; https://doi.org/10.1038/nn.4442

      (5) Rau & Hentges, 2019 J Neuro; https://doi.org/10.1523/jneurosci.3193-18.2019

      (6) Fortin et al., 2021 Nutrients; https://doi.org/10.3390/nu13051642

      (7) Villa et al., 2024 J Neuro; https://doi.org/10.1523/jneurosci.0222-24.2024

    1. eLife assessment

      This work introduces a Python package, Avian Vocalization Network (AVN), that provides several key analysis pipelines for segmentation, annotation, and visualization of zebra finch song. AVN can be used to predict the stage of song development, quantify acoustic similarity, and detect abnormalities associated with deprived auditory feedback or social isolation. The methods are solid and are likely to provide a useful tool for scientists aiming to automate the analysis of large datasets of zebra finch vocalizations.

    2. Reviewer #1 (Public Review):

      Summary:

      This paper applies methods for segmentation, annotation, and visualization of acoustic analysis to zebra finch song. The paper shows that these methods can be used to predict the stage of song development and to quantify acoustic similarity. The methods are solid and are likely to provide a useful tool for scientists aiming to label large datasets of zebra finch vocalizations. The paper has two main parts: 1) establishing a pipeline/ package for analyzing zebra finch birdsong and 2) a method for measuring song imitation.

      Strengths:

      It is useful to see existing methods for syllable segmentation compared to new datasets.

      It is useful, but not surprising, that these methods can be used to predict developmental stage, which is strongly associated with syllable temporal structure.

      It is useful to confirm that these methods can identify abnormalities in deafened and isolated songs.

      Weaknesses:

      For the first part, the implementation seems to be a wrapper on existing techniques. For instance, the first section talks about syllable segmentation; they made a comparison between WhisperSeg (Gu et al, 2024), TweetyNet (Cohen et al, 2022), and amplitude thresholding. They found that WhisperSeg performed the best, and they included it in the pipeline. They then used WhisperSeg to analyze syllable duration distributions and rhythm of birds of different ages and confirmed past findings on this developmental process (e.g. Aronov et al, 2011). Next, based on the segmentation, they assign labels by performing UMAP and HDBSCAN on the spectrogram (nothing new; that's what people have been doing). Then, based on the labels, they claimed they developed a 'new' visualization - syntax raster (line 180). That was done by Sainburg et al., 2020 in Figure 12E and also in Cohen et al, 2020 - so the claim to have developed 'a new song syntax visualization' is confusing. The rest of the paper is about analyzing the finch data based on AVN features (which are essentially acoustic features already in the classic literature).

      The second part may be something new, but there are opportunities to improve the benchmarking. It is about the pupil-tutor imitation analysis. They introduce a convolutional neural network that takes triplets as input. Each triplet is essentially 3 images stacked together as (anchor, positive, negative): the anchor is a reference spectrogram from, say, finch A; the positive is a different spectrogram with the same label as the anchor from finch A; and the negative is a spectrogram not related to A or with a different syllable label from A. The network is then trained to produce a low-dimensional embedding by ensuring that the embedding distance between anchor and positive is less than that between anchor and negative by a certain margin. Based on the embedding, they then use the earth mover's distance to quantify the similarity in the syllable distribution among finches. They then compared the performance of their approach with that of Sound Analysis Pro (SAP) and a variant of SAP. A more natural comparison, which they didn't include, is with the VAE approach by Goffinet et al. In that paper (https://doi.org/10.7554/eLife.67855, Fig 7), they also attempted to perform an analysis on the tutor pupil song.
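
      For readers unfamiliar with triplet training, the margin criterion described above can be sketched as follows. This is only an illustration, not the AVN implementation: the Euclidean distance, the margin of 1.0, the function names, and the toy 2-D embeddings are all assumptions made here.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (assumed form).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is; otherwise it grows linearly.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings, chosen only to make the arithmetic obvious.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]  # distance 1 from the anchor
negative = [3.0, 4.0]  # distance 5 from the anchor

# Constraint satisfied: 1 - 5 + 1 is negative, so the loss is clipped to 0.
satisfied = triplet_margin_loss(anchor, positive, negative)

# Roles swapped, so the constraint is violated: 5 - 1 + 1 = 5.
violated = triplet_margin_loss(anchor, negative, positive)
```

      In a real pipeline the three vectors would be network outputs for three spectrograms and the loss would be averaged over many triplets; the sketch only shows what the margin condition penalizes.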

    3. Reviewer #2 (Public Review):

      Summary:

      In this work, the authors present a new Python software package, Avian Vocalization Network (AVN), aimed at facilitating the analysis of birdsong, especially the song of the zebra finch, the most common songbird model in neuroscience. The package handles some of the most common (and some more advanced) song analyses, including segmentation, syllable classification, featurization of song, calculation of tutor-pupil similarity, and age prediction, with a view toward making the entire process friendlier to experimentalists working in the field.

      For many years, Sound Analysis Pro has served as a standard in the songbird field, the first package to extensively automate songbird analysis and facilitate the computation of acoustic features that have helped define the field. More recently, the increasing popularity of Python as a language, along with the emergence of new machine learning methods, has resulted in a number of new software tools, including the vocalpy ecosystem for audio processing, TweetyNet (for segmentation), t-SNE and UMAP (for visualization), and autoencoder-based approaches for embedding.

      Strengths:

      The AVN package overlaps several of these earlier efforts, albeit with a focus on more traditional featurization that many experimentalists may find more interpretable than deep learning-based approaches. Among the strengths of the paper are its clarity in explaining the several analyses it facilitates, along with high-quality experiments across multiple public datasets collected from different research groups. As a software package, it is open source, installable via the pip Python package manager, and features high-quality documentation, as well as tutorials. For experimentalists who wish to replicate any of the analyses from the paper, the package is likely to be a useful time saver.

      Weaknesses:

      I think the potential limitations of the work are predominantly on the software end, with one or two quibbles about the methods.

      First, the software: it's important to note that the package is trying to do many things, of which it is likely to do several well and few comprehensively. Rather than a package that presents a number of new analyses or a new analysis framework, it is more a codification of recipes, some of which are reimplementations of existing work (SAP features), some of which are essentially wrappers around other work (interfacing with WhisperSeg segmentations), and some of which are new (similarity scoring). All of this has value, but in my estimation, it has less value as part of a standalone package and potentially much more as part of an ecosystem like vocalpy that is undergoing continuous development and has long-term support. While the code is well-documented, including web-based documentation for both the core package and the GUI, the latter is available only on Windows, which might limit the scope of adoption.

      That is to say, whether AVN is adopted by the field in the medium term will have much more to do with the quality of its maintenance and responsiveness to users than any particular feature, but I believe that many of the analysis recipes that the authors have carefully worked out may find their way into other code and workflows.

      Second, two notes about new analysis approaches:

      (1) The authors propose a new means of measuring tutor-pupil similarity based on first learning a latent space of syllables via a self-supervised learning (SSL) scheme and then using the earth mover's distance (EMD) to calculate transport costs between the distributions of tutors' and pupils' syllables. While to my knowledge this exact method has not previously been proposed in birdsong, I suspect it is unlikely to differ substantially from the approach of autoencoding followed by MMD used in the Goffinet et al. paper. That is, SSL, like the autoencoder, is a latent space learning approach, and EMD, like MMD, is an integral probability metric that measures discrepancies between two distributions. (Indeed, the two are very closely related: https://stats.stackexchange.com/questions/400180/earth-movers-distance-and-maximum-mean-discrepency.) Without further experiments, it is hard to tell whether these two approaches differ meaningfully. Likewise, while the authors have trained on a large corpus of syllables to define their latent space in a way that generalizes to new birds, it is unclear why such an approach would not work with other latent space learning methods.

      (2) The authors propose a new method for maturity scoring by training a model (a generalized additive model) to predict the age of the bird based on a selected subset of acoustic features. This is distinct from the "predicted age" approach of Brudner, Pearson, and Mooney, which predicts based on a latent representation rather than specific features, and the GAM nicely segregates the contribution of each. As such, this approach may be preferred by many users who appreciate its interpretability.
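
      To make the comparison in point (1) concrete, both discrepancy measures can be sketched on toy data. This is a simplified illustration and not code from AVN or from Goffinet et al.: it works on 1-D samples of equal size rather than multi-dimensional syllable embeddings, and the Gaussian kernel, its bandwidth, and all names and values are assumptions made here.

```python
import math


def emd_1d(xs, ys):
    """Earth mover's distance between two equal-sized 1-D samples.

    With uniform weights and equal sample sizes, the optimal transport
    plan in one dimension simply pairs the sorted values.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def mmd2_rbf(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD with a Gaussian (RBF) kernel."""

    def kernel(a, b):
        return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))

    def mean_kernel(us, vs):
        return sum(kernel(u, v) for u in us for v in vs) / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Hypothetical 1-D "syllable embeddings" for a tutor and two pupils.
tutor = [0.0, 1.0, 2.0]
pupil_close = [0.5, 1.5, 2.5]
pupil_far = [5.0, 6.0, 7.0]

emd_close = emd_1d(tutor, pupil_close)
emd_far = emd_1d(tutor, pupil_far)
mmd_close = mmd2_rbf(tutor, pupil_close)
mmd_far = mmd2_rbf(tutor, pupil_far)
```

      On this toy example both measures rank the closer pupil as more similar to the tutor, which is the sense in which the two metrics are related; whether they agree on real song data is the empirical question raised above.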

      In summary, my view is that this is a nice paper detailing a well-executed piece of software whose future impact will be determined by the degree of support and maintenance it receives from others over the near and medium term.

    4. Reviewer #3 (Public Review):

      Summary:

      The authors invent song and syllable discrimination tasks that they use to train deep networks. They then use these networks as a basis for routine song analysis and song evaluation tasks. For the analysis, they consider both data from their own colony and data from another colony that the network has not seen during training. They validate the analysis scores of the network against expert human annotators, achieving a correlation of 80-90%.

      Strengths:

      (1) Robust Validation and Generalizability: The authors demonstrate a good performance of the AVN across various datasets, including individuals exhibiting deviant behavior. This extensive validation underscores the system's usefulness and broad applicability to zebra finch song analysis, establishing it as a potentially valuable tool for researchers in the field.

      (2) Comprehensive and Standardized Feature Analysis: AVN integrates a comprehensive set of interpretable features commonly used in the study of bird songs. By standardizing the feature extraction method, the AVN facilitates comparative research, allowing for consistent interpretation and comparison of vocal behavior across studies.

      (3) Automation and Ease of Use. By being fully automated, the method is straightforward to apply and should introduce barely an adoption threshold to other labs.

      (4) Human experts were recruited to perform extensive annotations (of vocal segments and of song similarity scores). These annotations released as public datasets are potentially very valuable.

      Weaknesses:

      (1) Poorly motivated tasks. The approach is poorly motivated and many assumptions come across as arbitrary. For example, the authors implicitly assume that the task of birdsong comparison is best achieved by a system that optimally discriminates between typical, deaf, and isolated songs. Similarly, the authors assume that song development is best tracked using a system that optimally estimates the age of a bird given its song. My issue is that these are fake tasks since clearly, researchers will know whether a bird is an isolated or a deaf bird, and they will also know the age of a bird, so no machine learning is needed to solve these tasks. Yet, the authors imagine that solving these placeholder tasks will somehow help with measuring important aspects of vocal behavior. Along similar lines, authors assume that a good measure of similarity is one that optimally performs repeated syllable detection (i.e. to discriminate same syllable pairs from different pairs). The authors need to explain why they think these placeholder tasks are good and why no better task can be defined that more closely captures what researchers want to measure. Note: the standard tasks for self-supervised learning are next word or masked word prediction, why are these not used here?

      (2) The machine learning methodology lacks rigor. The aims of the machine learning pipeline are extremely vague and keep changing like a moving target. Mainly, the deep networks are trained on some tasks but then authors evaluate their performance on different, disconnected tasks. For example, they train both the birdsong comparison method (L263+) and the song similarity method (L318+) on classification tasks. However, they evaluate the former method (LDA) on classification accuracy, but the latter (8-dim embeddings) using a contrast index. In machine learning, usually, a useful task is first defined, then the system is trained on it and then tested on a held-out dataset. If the sensitivity index is important, why does it not serve as a cost function for training? Also, usually, in solid machine learning work, diverse methods are compared against each other to identify their relative strengths. The paper contains almost none of this, e.g. authors examined only one clustering method (HDBSCAN).

      (3) Performance issues. The authors want to 'simplify large-scale behavioral analysis' but it seems they want to do that at a high cost. (Gu et al 2023) achieved syllable scores above 0.99 for adults, which is much larger than the average score of 0.88 achieved here (L121). Similarly, the syllable scores in (Cohen et al 2022) are above 94% (their error rates are below 6%, albeit in Bengalese finches, not zebra finches), which is also better than here. Why is the performance of AVN so low? The low scores of AVN argue in favor of some human labeling and training on each bird.

      (4) Texas bias. It is true that comparability across datasets is enhanced when everyone uses the same code. However, the authors' proposal essentially is to replace the bias between labs with a bias towards birds in Texas. The comparison with Rockefeller birds is nice, but it amounts to merely N=1. If birds in Japanese or European labs have evolved different song repertoires, the AVN might not capture the associated song features in these labs well.

      (5) The paper lacks an analysis of the balance between labor requirement, generalizability, and optimal performance. For tasks such as segmentation and labeling, fine-tuning for each new dataset could potentially enhance the model's accuracy and performance without compromising comparability. E.g. How many hours does it take to annotate hundred song motifs? How much would the performance of AVN increase if the network were to be retrained on these? The paper should be written in more neutral terms, letting researchers reach their own conclusions about how much manual labor they want to put into their data.

      (6) Full automation may not be everyone's wish. For example, given the highly stereotyped zebra finch songs, it is conceivable that some syllables are consistently mis-segmented or misclassified. Researchers may want to be able to correct such errors, which essentially amounts to fine-tuning AVN. Conceivably, researchers may want to retrain a network like the AVN on their own birds, to obtain a more fine-grained discriminative method.

      (7) The analysis is restricted to song syllables and fails to include calls. No rationale is given for the omission of calls. Also, it is not clear how the analysis deals with repeated syllables in a motif, whether they are treated as two syllable types or one.

      (8) It seems not all human annotations have been released and the instruction sets given to experts (how to segment syllables and score songs) are not disclosed. It may well be that the differences in performance between (Gu et al 2023) and (Cohen et al 2022) are due to differences in segmentation tasks, which is why these tasks given to experts need to be clearly spelled out. Also, the downloadable files contain merely labels but no identifier of the expert. The data should be released in such a way that lets other labs adopt their labeling method and cross-check their own labeling accuracy.

      (9) The failure modes are not described. What segmentation errors did they encounter, and what syllable classification errors? It is important to describe the errors to be expected when using the method.

      (10) Usage of Different Dimensionality Reduction Methods: The pipeline uses two different dimensionality reduction techniques for labeling and similarity comparison - both based on the understanding of the distribution of data in lower-dimensional spaces. However, the reasons for choosing different methods for different tasks are not articulated, nor is there a comparison of their efficacy.

      (11) Reproducibility: are the measurements reproducible? Systems like UMAP always find a new embedding given some fixed input, so the output tends to fluctuate.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      This paper applies methods for segmentation, annotation, and visualization of acoustic analysis to zebra finch song. The paper shows that these methods can be used to predict the stage of song development and to quantify acoustic similarity. The methods are solid and are likely to provide a useful tool for scientists aiming to label large datasets of zebra finch vocalizations. The paper has two main parts: 1) establishing a pipeline/ package for analyzing zebra finch birdsong and 2) a method for measuring song imitation. 

      Strengths: 

      It is useful to see existing methods for syllable segmentation compared to new datasets. 

      It is useful, but not surprising, that these methods can be used to predict developmental stage, which is strongly associated with syllable temporal structure. 

      It is useful to confirm that these methods can identify abnormalities in deafened and isolated songs. 

      Weaknesses: 

      For the first part, the implementation seems to be a wrapper on existing techniques. For instance, the first section talks about syllable segmentation; they made a comparison between WhisperSeg (Gu et al, 2024), TweetyNet (Cohen et al, 2022), and amplitude thresholding. They found that WhisperSeg performed the best, and they included it in the pipeline. They then used WhisperSeg to analyze syllable duration distributions and rhythm of birds of different ages and confirmed past findings on this developmental process (e.g. Aronov et al, 2011). Next, based on the segmentation, they assign labels by performing UMAP and HDBSCAN on the spectrogram (nothing new; that's what people have been doing). Then, based on the labels, they claimed they developed a 'new' visualization - syntax raster (line 180). That was done by Sainburg et al. 2020 in Figure 12E and also in Cohen et al, 2020 - so the claim to have developed 'a new song syntax visualization' is confusing. The rest of the paper is about analyzing the finch data based on AVN features (which are essentially acoustic features already in the classic literature).

      First, we would like to thank this reviewer for their kind comments and feedback on this manuscript. It is true that many of the components of this song analysis pipeline are not entirely novel in isolation. Our real contribution here is bringing them together in a way that allows other researchers to seamlessly apply automated syllable segmentation, clustering, and downstream analyses to their data. That said, our approach to training TweetyNet for syllable segmentation is novel. We trained TweetyNet to recognize vocalizations vs. silence across multiple birds, such that it can generalize to new individual birds, whereas TweetyNet had previously been used only to annotate song syllables from birds included in its training set. Our validation of TweetyNet and WhisperSeg in combination with UMAP and HDBSCAN clustering is also novel, providing valuable information about how these systems interact, and how reliable the fully automatically generated labels are for downstream analysis.

      Our syntax raster visualization does resemble Figure 12E in Sainburg et al. 2020; however, it differs in a few important ways, which we believe warrant its consideration as a novel visualization method. First, Sainburg et al. represent the labels across bouts in real time; their position along the x axis reflects the time at which each syllable is produced relative to the start of the bout. By contrast, our visualization considers only the index of syllables within a bout (i.e., first syllable vs. second syllable, etc.) without consideration of the true durations of each syllable or the silent gaps between them. This makes it much easier to detect syntax patterns across bouts, as the added variability of syllable timing is removed. Considering only the sequence of syllables rather than their timing also allows us to more easily align bouts according to the first syllable of a motif, further emphasizing the presence or absence of repeating syllable sequences without interference from the more variable introductory notes at the start of a motif. Finally, instead of plotting all bouts in the order in which they were produced, our visualization orders bouts such that bouts with the same sequence of syllables will be plotted together, which again serves to emphasize the most common syllable sequences that the bird produces. These additional processing steps mean that our syntax raster plot has much starker contrast between birds with stereotyped syntax and birds with more variable syntax, as compared to the more minimally processed visualization in Sainburg et al. 2020. There don’t appear to be any similar visualizations in Cohen et al. 2020.
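
      The three processing steps described above (index-based rows, alignment to the first motif syllable, and grouping of identical sequences) can be sketched in a few lines. This is a minimal illustration and not AVN's implementation: the function name `syntax_raster`, the `motif_start` argument, and the toy bouts are hypothetical, and alignment is simplified here to dropping any labels that precede the first motif syllable.

```python
from collections import Counter


def syntax_raster(bouts, motif_start):
    """Return bouts as rows of syllable labels, aligned and grouped.

    bouts: list of label sequences, one per bout (syllable timing is ignored).
    motif_start: label assumed to mark the first syllable of the motif.
    """
    aligned = []
    for bout in bouts:
        labels = list(bout)
        # Simplified alignment: start each row at the first motif syllable,
        # which discards the more variable introductory notes.
        if motif_start in labels:
            labels = labels[labels.index(motif_start):]
        aligned.append(tuple(labels))
    # Plot identical sequences together, most common sequence first.
    counts = Counter(aligned)
    return sorted(aligned, key=lambda row: (-counts[row], row))


# Toy bouts: "i" stands for an introductory note, "a".."d" for motif syllables.
bouts = ["iiabcd", "iabcd", "iiabd", "abcd", "iabd"]
for row in syntax_raster(bouts, motif_start="a"):
    print(" ".join(row))
```

      Each printed row corresponds to one row of the raster; in a real plot each label would be drawn as a colored cell, so that stereotyped birds produce large uniform blocks.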

      The second part may be something new, but there are opportunities to improve the benchmarking. It is about the pupil-tutor imitation analysis. They introduce a convolutional neural network that takes triplets as an input. Each triplet is essentially 3 images stacked together such that you have (anchor, positive, negative): the anchor is a reference spectrogram from, say, finch A; the positive is a different spectrogram with the same label as the anchor from finch A; and the negative is a spectrogram not related to A or with a different syllable label from A. The network is then trained to produce a low-dimensional embedding by ensuring the embedding distance between anchor and positive is less than that between anchor and negative by a certain margin. Based on the embedding, they then made use of the earth mover's distance to quantify the similarity in the syllable distribution among finches. They then compared the performance of their approach with that of Sound Analysis Pro (SAP) and a variant of SAP. A more natural comparison, which they didn't include, is with the VAE approach by Goffinet et al. In this paper (https://doi.org/10.7554/eLife.67855, Fig 7), they also attempted to perform an analysis on the tutor pupil song.

      We thank the reviewer for this suggestion, and plan to include a comparison of the triplet loss embedding space to the VAE space for song similarity comparisons in the revised manuscript.
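
      To make the earth mover's distance step summarized in the review above concrete, the sketch below computes an exact EMD between two small, equal-sized sets of embedded syllables with uniform weights, in which case the optimal transport plan is a one-to-one pairing. It is an illustration under stated assumptions and not the AVN code: the function name `emd_equal_sets` and the 2-dimensional toy points are hypothetical, and the brute-force search over pairings is only feasible for a handful of points. Realistic syllable sets would need a proper optimal transport solver.

```python
import math
from itertools import permutations


def emd_equal_sets(xs, ys):
    """Exact EMD between two equal-sized point sets with uniform weights.

    With equal sizes and uniform weights, the cheapest transport plan is a
    one-to-one pairing, so we take the minimum mean distance over pairings.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal-sized point sets")
    n = len(xs)
    best_total = min(
        sum(math.dist(xs[i], ys[j]) for i, j in enumerate(perm))
        for perm in permutations(range(n))
    )
    return best_total / n


# Hypothetical embedded syllables (2-D for readability).
tutor = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
good_pupil = [(0.1, 0.0), (1.0, 0.1), (0.0, 0.9)]
poor_pupil = [(2.0, 2.0), (3.0, 2.0), (2.0, 3.0)]

print(emd_equal_sets(tutor, good_pupil))  # small: distributions overlap
print(emd_equal_sets(tutor, poor_pupil))  # large: distributions are far apart
```

      A lower value means less "work" is needed to move one syllable distribution onto the other, which is the sense in which it serves as a dissimilarity score between a tutor and a pupil.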

      Reviewer #2 (Public Review):

      Summary: 

      In this work, the authors present a new Python software package, Avian Vocalization Network (AVN) aimed at facilitating the analysis of birdsong, especially the song of the zebra finch, the most common songbird model in neuroscience. The package handles some of the most common (and some more advanced) song analyses, including segmentation, syllable classification, featurization of song, calculation of tutor-pupil similarity, and age prediction, with a view toward making the entire process friendlier to experimentalists working in the field. 

      For many years, Sound Analysis Pro has served as a standard in the songbird field, the first package to extensively automate songbird analysis and facilitate the computation of acoustic features that have helped define the field. More recently, the increasing popularity of Python as a language, along with the emergence of new machine learning methods, has resulted in a number of new software tools, including the vocalpy ecosystem for audio processing, TweetyNet (for segmentation), t-SNE and UMAP (for visualization), and autoencoder-based approaches for embedding. 

      Strengths: 

      The AVN package overlaps several of these earlier efforts, albeit with a focus on more traditional featurization that many experimentalists may find more interpretable than deep learning-based approaches. Among the strengths of the paper are its clarity in explaining the several analyses it facilitates, along with high-quality experiments across multiple public datasets collected from different research groups. As a software package, it is open source, installable via the pip Python package manager, and features high-quality documentation, as well as tutorials. For experimentalists who wish to replicate any of the analyses from the paper, the package is likely to be a useful time saver. 

      Weaknesses: 

      I think the potential limitations of the work are predominantly on the software end, with one or two quibbles about the methods. 

      First, the software: it's important to note that the package is trying to do many things, of which it is likely to do several well and few comprehensively. Rather than a package that presents a number of new analyses or a new analysis framework, it is more a codification of recipes, some of which are reimplementations of existing work (SAP features), some of which are essentially wrappers around other work (interfacing with WhisperSeg segmentations), and some of which are new (similarity scoring). All of this has value, but in my estimation, it has less value as part of a standalone package and potentially much more as part of an ecosystem like vocalpy that is undergoing continuous development and has long-term support. 

      We appreciate this reviewer’s comments and concerns about the structure of the AVN package and its long-term maintenance. We have considered incorporating AVN into the VocalPy ecosystem but have chosen not to for a few key reasons. (1) AVN was designed with ease of use for experimenters with limited coding experience top of mind. VocalPy provides excellent resources for researchers with some familiarity with object-oriented programming to manage and analyze their datasets; however, we believe it may be challenging for users without such experience to adopt VocalPy quickly. AVN’s ‘recipe’ approach, as you put it, is very easily accessible to new users, and allows users with intermediate coding experience to easily navigate the source code to gain a deeper understanding of the methodology. AVN also consistently outputs processed data in familiar formats (tables in .csv files which can be opened in Excel), in an effort to make it more accessible to new users, something which would be challenging to reconcile with VocalPy’s emphasis on their `dataset` classes. (2) AVN and VocalPy differ in their underlying goals and philosophies when it comes to flexibility vs. standardization of analysis pipelines. VocalPy is designed to facilitate mixing-and-matching of different spectrogram generation, segmentation, annotation etc. approaches, so that researchers can design and implement their own custom analysis pipelines. This flexibility is useful in many cases. For instance, it could allow researchers who have very different noise filtering and annotation needs, like those working with field recordings versus acoustic chamber recordings, to analyze their data using this platform. However, when it comes to comparisons across zebra finch research labs, this flexibility comes at the expense of direct comparison and integration of song features across research groups. This is the context in which AVN is most useful. It presents a single approach to song segmentation, labeling, and featurization that has been shown to generalize well across research groups, and which allows direct comparisons of the resulting features. AVN’s single, extensively validated, standard pipeline approach is fundamentally incompatible with VocalPy’s emphasis on flexibility. We are excited to see how VocalPy continues to evolve in the future and recognize the value that both AVN and VocalPy bring to the songbird research community, each with their own distinct strengths, weaknesses, and ideal use cases.

      While the code is well-documented, including web-based documentation for both the core package and the GUI, the latter is available only on Windows, which might limit the scope of adoption. 

      We thank the reviewer for their kind words about AVN’s documentation. We recognize that the GUI’s exclusive availability on Windows is a limitation, and we would be happy to collaborate with other researchers and developers in the future to build a Mac-compatible version, should the demand present itself. That said, the Python package works on all operating systems, so non-Windows users can still use AVN that way.

      That is to say, whether AVN is adopted by the field in the medium term will have much more to do with the quality of its maintenance and responsiveness to users than any particular feature, but I believe that many of the analysis recipes that the authors have carefully worked out may find their way into other code and workflows. 

      Second, two notes about new analysis approaches: 

      (1) The authors propose a new means of measuring tutor-pupil similarity based on first learning a latent space of syllables via a self-supervised learning (SSL) scheme and then using the earth mover's distance (EMD) to calculate transport costs between the distributions of tutors' and pupils' syllables. While to my knowledge this exact method has not previously been proposed in birdsong, I suspect it is unlikely to differ substantially from the approach of autoencoding followed by MMD used in the Goffinet et al. paper. That is, SSL, like the autoencoder, is a latent space learning approach, and EMD, like MMD, is an integral probability metric that measures discrepancies between two distributions. (Indeed, the two are very closely related: https://stats.stackexchange.com/questions/400180/earth-movers-distance-and-maximum-mean-discrepency.) Without further experiments, it is hard to tell whether these two approaches differ meaningfully. Likewise, while the authors have trained on a large corpus of syllables to define their latent space in a way that generalizes to new birds, it is unclear why such an approach would not work with other latent space learning methods.

      We recognize the similarities between these approaches. In the revised manuscript, we plan to include a comparison of triplet loss embeddings scored with MMD, and of VAE embeddings scored with both MMD and EMD. Thank you for this suggestion.

      (2) The authors propose a new method for maturity scoring by training a model (a generalized additive model) to predict the age of the bird based on a selected subset of acoustic features. This is distinct from the "predicted age" approach of Brudner, Pearson, and Mooney, which predicts based on a latent representation rather than specific features, and the GAM nicely segregates the contribution of each. As such, this approach may be preferred by many users who appreciate its interpretability. 

      In summary, my view is that this is a nice paper detailing a well-executed piece of software whose future impact will be determined by the degree of support and maintenance it receives from others over the near and medium term. 

      Reviewer #3 (Public Review):

      Summary: 

      The authors invent song and syllable discrimination tasks they use to train deep networks. These networks they then use as a basis for routine song analysis and song evaluation tasks. For the analysis, they consider both data from their own colony and from another colony the network has not seen during training. They validate the analysis scores of the network against expert human annotators, achieving a correlation of 80-90%. 

      Strengths: 

      (1) Robust Validation and Generalizability: The authors demonstrate a good performance of the AVN across various datasets, including individuals exhibiting deviant behavior. This extensive validation underscores the system's usefulness and broad applicability to zebra finch song analysis, establishing it as a potentially valuable tool for researchers in the field. 

      (2) Comprehensive and Standardized Feature Analysis: AVN integrates a comprehensive set of interpretable features commonly used in the study of bird songs. By standardizing the feature extraction method, the AVN facilitates comparative research, allowing for consistent interpretation and comparison of vocal behavior across studies. 

      (3) Automation and Ease of Use. By being fully automated, the method is straightforward to apply and should introduce barely an adoption threshold to other labs. 

      (4) Human experts were recruited to perform extensive annotations (of vocal segments and of song similarity scores). These annotations released as public datasets are potentially very valuable. 

      Weaknesses: 

      (1) Poorly motivated tasks. The approach is poorly motivated and many assumptions come across as arbitrary. For example, the authors implicitly assume that the task of birdsong comparison is best achieved by a system that optimally discriminates between typical, deaf, and isolated songs. Similarly, the authors assume that song development is best tracked using a system that optimally estimates the age of a bird given its song. My issue is that these are fake tasks since clearly, researchers will know whether a bird is an isolated or a deaf bird, and they will also know the age of a bird, so no machine learning is needed to solve these tasks. Yet, the authors imagine that solving these placeholder tasks will somehow help with measuring important aspects of vocal behavior. 

      We appreciate this reviewer’s concerns and apologize for not providing a sufficiently clear rationale for the inclusion of our phenotype classifier and age regression models in the original manuscript. These tasks are not intended to be taken as the ultimate culmination of the AVN pipeline. Rather, we consider the carefully engineered set of 55 interpretable features to be AVN’s final output, and these analyses serve merely as examples of how that feature set can be applied. That said, each of these models does have valid experimental use cases that we believe are important and would like to bring to the attention of the reviewer.

      For one, we showed that the LDA model, which discriminates between typical, deaf, and isolate birds’ songs, not only allows us to evaluate which features are most important for discriminating between these groups, but also allows comparison of the FoxP1 knock-down (FP1 KD) birds to each of these phenotypes. Based on previous work (Garcia-Oscos et al. 2021), we hypothesized that FP1 KD in these birds specifically impaired tutor song memory formation while sparing a bird’s ability to refine their own vocalizations through auditory feedback. Thus, we would expect their songs to resemble those of isolate birds, who lack a tutor song memory, but not to resemble those of deaf birds, who lack both a tutor song memory and auditory feedback of their own vocalizations to guide learning. The LDA model allowed us to make this comparison quantitatively for the first time and confirm our hypothesis that FP1 KD birds’ songs are indeed most like isolates’. In the future, as more research groups publish their birds’ AVN feature sets, we hope to be able to make even more fine-grained comparisons between different groups of birds, either using LDA or other similar interpretable classifiers.

      The age prediction model also has valid real-world use cases. For instance, one might imagine an experimental manipulation that is hypothesized to accelerate or slow song maturation in juvenile birds. This age prediction model could be applied to the AVN feature sets of birds having undergone such a manipulation to determine whether their predicted ages systematically lead or lag their true biological ages, and which song features are most responsible for this difference. We didn’t have access to data for any such birds for inclusion in this paper, but we hope that others in the future will be able to take inspiration from our methodology and use this or a similar age regression model with AVN features in their research. We will revise the original manuscript to make this clearer. 

      Along similar lines, authors assume that a good measure of similarity is one that optimally performs repeated syllable detection (i.e. to discriminate same syllable pairs from different pairs). The authors need to explain why they think these placeholder tasks are good and why no better task can be defined that more closely captures what researchers want to measure. Note: the standard tasks for self-supervised learning are next word or masked word prediction, why are these not used here? 

      There appears to be some misunderstanding regarding our similarity scoring embedding model and our rationale for using it. We will explain it in more depth here and provide some additional explanation in the manuscript. First, we are not training a model to discriminate between same and different syllable pairs. The triplet loss network is trained to embed syllables in an 8-dimensional space such that syllables with the same label are closer together than syllables with different labels. The loss function is related to the relative distance between embeddings of syllables with the same or different labels, not the classification of syllables as same or different. This approach was chosen because it has repeatedly been shown to be a useful data compression step (Schroff et al. 2015, Thakur et al. 2019) before further downstream tasks are applied to its output, particularly in contexts where there is little data per class (syllable label). For example, Schroff et al. 2015 trained a deep convolutional neural network with triplet loss to embed images of human faces from the same individual closer together than images of different individuals in a 128-dimensional space. They then used this model to compute 128-dimensional representations of additional face images, not included in training, which were used for individual facial recognition (this is a same vs. different category classifier), and facial clustering, achieving better performance than the previous state of the art. The triplet loss function results in a model that can generate useful embeddings of previously unseen categories, like new individuals’ faces, or new zebra finches’ syllables, which can then be used in downstream analyses. This meaningful, lower-dimensional space allows comparisons of distributions of syllables across birds, as in Brainard and Mets 2008, and Goffinet et al. 2021.
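
      The distinction drawn above, a loss on relative distances and not on same/different classification, is easiest to see in the loss itself. The following is a minimal sketch of a standard triplet loss on already-computed embeddings and not the network or training code used in the paper: the function names, the margin of 1.0, the use of unsquared Euclidean distance, and the 2-dimensional toy vectors (in place of 8-dimensional embeddings) are all assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on relative distances.

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin`; no same/different label is ever predicted.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)


anchor = [0.0, 0.0]    # embedding of a syllable with label "a"
positive = [0.0, 1.0]  # another rendition with the same label
far_negative = [3.0, 4.0]   # different label, already well separated
near_negative = [0.0, 1.5]  # different label, still too close

print(triplet_loss(anchor, positive, far_negative))   # no penalty
print(triplet_loss(anchor, positive, near_negative))  # positive penalty
```

      Only the second triplet contributes a gradient during training, which is why the loss value by itself says little about performance on a downstream comparison of syllable distributions.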

      Next word and masked word prediction are indeed common self-supervised learning tasks for models working with text data, or other data with meaningful sequential organization. That is not the case for our zebra finch syllables, where every bird’s syllable sequence depends only on its tutor’s sequence, and there is no evidence for strong universal syllable sequencing rules (James et al. 2020). Rather, our embedding model is an example of a computer vision task, as it deals with sets of two-dimensional images (spectrograms), not sequences of categorical variables (like text). It is also not, strictly speaking, a self-supervised learning task, as it does require syllable labels to generate the triplets. A common self-supervised approach for dimensionality reduction in a computer vision task such as this one would be to train an autoencoder to compress images to a lower-dimensional space, then faithfully reconstruct them from the compressed representation. This has been done using a variational autoencoder trained on zebra finch syllables in Goffinet et al. 2021. In keeping with the suggestions from reviewers #1 and #2, we plan to include a comparison of our triplet loss model with the Goffinet et al. VAE approach in the revised manuscript.

      (2) The machine learning methodology lacks rigor. The aims of the machine learning pipeline are extremely vague and keep changing like a moving target. Mainly, the deep networks are trained on some tasks but then authors evaluate their performance on different, disconnected tasks. For example, they train both the birdsong comparison method (L263+) and the song similarity method (L318+) on classification tasks. However, they evaluate the former method (LDA) on classification accuracy, but the latter (8-dim embeddings) using a contrast index. In machine learning, usually, a useful task is first defined, then the system is trained on it and then tested on a held-out dataset. If the sensitivity index is important, why does it not serve as a cost function for training?

      Again, there appears to be some misunderstanding of our similarity scoring methodology. Our similarity scoring model is not trained on a classification task, but rather on an embedding task. It learns to embed spectrograms of syllables in an 8-dimensional space such that syllables with the same label are closer together than syllables with different labels. We could report the loss values for this embedding task on our training and validation datasets, but these wouldn’t have any clear relevance to the downstream task of syllable distribution comparison where we are using the model’s embeddings. We report the contrast index as this has direct relevance to the actual application of the model and allows comparisons to other similarity scoring methods, something that the triplet loss values wouldn’t allow.

      The triplet loss method was chosen because it has been shown to yield useful low-dimensional representations of data, even in cases where there is limited labeled training data (Thakur et al. 2019). While we have one of the largest manually annotated datasets of zebra finch songs, it is still quite small by industry deep learning standards, which is why we chose a method that would perform well given the size of our dataset. Training a model on a contrast index directly would be extremely computationally intensive and require many more pairs of birds with known relationships than we currently have access to. It could be an interesting approach to take in the future, but one that would be unlikely to perform well with a dataset size typical of songbird research.

      Also, usually, in solid machine learning work, diverse methods are compared against each other to identify their relative strengths. The paper contains almost none of this, e.g. authors examined only one clustering method (HDBSCAN). 

      We did compare multiple methods for syllable segmentation (WhisperSeg, TweetyNet, and amplitude thresholding) as this hadn’t been done previously. We chose not to perform an extensive comparison of different clustering methods as Sainburg et al. 2020 already did so and we felt no need to duplicate this effort. We encourage this reviewer to refer to Sainburg et al.’s excellent work for comparisons of multiple clustering methods applied to zebra finch song syllables.

      (3) Performance issues. The authors want to 'simplify large-scale behavioral analysis' but it seems they want to do that at a high cost. (Gu et al 2023) achieved syllable scores above 0.99 for adults, which is much larger than the average score of 0.88 achieved here (L121). Similarly, the syllable scores in (Cohen et al 2022) are above 94% (their error rates are below 6%, albeit in Bengalese finches, not zebra finches), which is also better than here. Why is the performance of AVN so low? The low scores of AVN argue in favor of some human labeling and training on each bird. 

      Firstly, the syllable error rate scores reported in Cohen et al. 2022 are calculated very differently than the F1 scores we report here and are based on a model trained with data from the same bird as was used in testing, unlike our more general segmentation approach where the model was tested on different birds than were used in training. Thus, the scores reported in Cohen et al. and the F1 scores that we report cannot be compared.

      The discrepancy between the F1seg scores reported in Gu et al. 2023 and the segmentation F1 scores that we report are likely due to differences in the underlying datasets. Our UTSW recordings tend to have higher levels of both stationary and nonstationary background noise, which make segmentation more challenging. The recordings from Rockefeller were less contaminated by background noise, and they resulted in slightly higher F1 scores. That said, we believe that the primary factor accounting for this difference in scores with Gu et al. 2023 is the granularity of our ‘ground truth’ syllable segments. In our case, if there was ever any ambiguity as to whether vocal elements should be segmented into two short syllables with a very short gap between them or merged into a single longer syllable, we chose to split them. WhisperSeg had a strong tendency to merge the vocal elements in ambiguous cases such as these. This results in a higher rate of false negative syllable onset detections, reflected in the low recall scores achieved by WhisperSeg (see supplemental figure 2b), but still very high precision scores (supplemental figure 2a). While WhisperSeg did frequently merge these syllables in a way that differed from our ground truth segmentation, it did so consistently, meaning it had little impact on downstream measures of syntax entropy (Fig 3c) or syllable duration entropy (supplemental figure 7a). It is for that reason that, despite a lower F1 score, we still consider AVN’s automatically generated annotations to be sufficiently accurate for downstream analyses. 
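
      The effect described above, where merging two short syllables lowers recall while leaving precision untouched, can be sketched with a toy onset-matching calculation. The 10 ms tolerance, the greedy matching rule, and the onset times are all hypothetical choices made for illustration; they are not the scoring procedure used in the manuscript.

```python
def onset_scores(true_onsets, pred_onsets, tolerance=0.01):
    """Precision, recall and F1 for syllable onset detection.

    A predicted onset counts as a true positive if it lies within
    `tolerance` seconds of a not-yet-matched ground-truth onset.
    """
    unmatched = sorted(pred_onsets)
    true_positives = 0
    for t in sorted(true_onsets):
        hit = next((p for p in unmatched if abs(p - t) <= tolerance), None)
        if hit is not None:
            unmatched.remove(hit)
            true_positives += 1
    false_positives = len(unmatched)
    false_negatives = len(true_onsets) - true_positives
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical onsets in seconds. The annotator split the vocal element
# starting at 0.25 s into two syllables (0.25 and 0.30); the automated
# segmentation merged them, so the onset at 0.30 is missed.
true_onsets = [0.10, 0.25, 0.30, 0.50]
pred_onsets = [0.10, 0.25, 0.50]

precision, recall, f1 = onset_scores(true_onsets, pred_onsets)
print(precision, recall, f1)
```

      In this toy case every predicted onset is correct, so precision stays at 1.0, while the single merged syllable reduces recall to 0.75 and pulls the F1 score down with it.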

      Should researchers require a higher degree of accuracy and precision with their annotations (for example, to detect very subtle changes in song before and after an acute manipulation) and be willing to dedicate the time and resources to manually labeling a subset of recordings from each of their birds, we suggest they turn toward one of the existing tools for supervised song annotation, such as TweetyNet.  

      (4) Texas bias. It is true that comparability across datasets is enhanced when everyone uses the same code. However, the authors' proposal essentially is to replace the bias between labs with a bias towards birds in Texas. The comparison with Rockefeller birds is nice, but it amounts to merely N=1. If birds in Japanese or European labs have evolved different song repertoires, the AVN might not capture the associated song features in these labs well. 

      We appreciate the reviewer’s concern about a bias toward birds from the UTSW colony. However, this paper shows that despite training (for the similarity scoring) and hyperparameter fitting (for the HDBSCAN clustering) on the UTSW birds, AVN performs as well if not better on birds from Rockefeller than from UTSW. To our knowledge, there are no publicly available datasets of annotated zebra finch songs from labs in Europe or in Asia but we would be happy to validate AVN on such datasets, should they become available. Furthermore, there is no evidence to suggest that there is dramatic drift in zebra finch vocal repertoire between continents which would necessitate such additional validation. While we didn’t have manual annotations for this dataset (which would allow validation of our segmentation and labeling methods), we did apply AVN to recordings shared with us by the Wada lab in Japan, where visual inspection of the resulting annotations suggested comparable accuracy to the UTSW and Rockefeller datasets.

      (5) The paper lacks an analysis of the balance between labor requirement, generalizability, and optimal performance. For tasks such as segmentation and labeling, fine-tuning for each new dataset could potentially enhance the model's accuracy and performance without compromising comparability. E.g. How many hours does it take to annotate a hundred song motifs? How much would the performance of AVN increase if the network were to be retrained on these? The paper should be written in more neutral terms, letting researchers reach their own conclusions about how much manual labor they want to put into their data.

      With standardization and ease of use in mind, we designed AVN specifically to perform fully automated syllable annotation and downstream feature calculations. We believe that we have demonstrated in this manuscript that our fully automated approach is sufficiently reliable for downstream analyses across multiple zebra finch colonies. That said, if researchers require an even higher degree of annotation precision and accuracy, they can turn toward one of the existing methods for supervised song annotation, such as TweetyNet. Incorporating human annotations for each bird processed by AVN is likely to improve its performance, but this would require significant changes to AVN’s methodology and is outside the scope of our current efforts.  

      (6) Full automation may not be everyone's wish. For example, given the highly stereotyped zebra finch songs, it is conceivable that some syllables are consistently mis-segmented or misclassified. Researchers may want to be able to correct such errors, which essentially amounts to fine-tuning AVN. Conceivably, researchers may want to retrain a network like the AVN on their own birds, to obtain a more fine-grained discriminative method. 

      Other methods exist for supervised or human-in-the-loop annotation of zebra finch songs, such as TweetyNet and DAN (Alam et al. 2023). We invite researchers who require a higher degree of accuracy than AVN can provide to explore these alternative approaches for song annotation. Incorporating human annotations for each individual bird being analyzed using AVN was never the goal of our pipeline, would require significant changes to AVN’s design, and is outside the scope of this manuscript.  

      (7) The analysis is restricted to song syllables and fails to include calls. No rationale is given for the omission of calls. Also, it is not clear how the analysis deals with repeated syllables in a motif, whether they are treated as two syllable types or one.

      It is true that we don’t currently have any dedicated features to describe calls. This could be a useful addition to AVN in the future. 

      What a human expert inspecting a spectrogram would typically call ‘repeated syllables’ in a bout are almost always assigned the same syllable label by the UMAP+HDBSCAN clustering. The syntax analysis module includes features examining the rate of syllable repetitions across syllable types. See https://avn.readthedocs.io/en/latest/syntax_analysis_demo.html#SyllableRepetitions

      (8) It seems not all human annotations have been released and the instruction sets given to experts (how to segment syllables and score songs) are not disclosed. It may well be that the differences in performance between (Gu et al 2023) and (Cohen et al 2022) are due to differences in segmentation tasks, which is why these tasks given to experts need to be clearly spelled out. Also, the downloadable files contain merely labels but no identifier of the expert. The data should be released in such a way that lets other labs adopt their labeling method and cross-check their own labeling accuracy. 

      All human annotations used in this manuscript have indeed been released as part of the accompanying dataset. Syllable annotations are not provided for all pupils and tutors used to validate the similarity scoring, as annotations are not necessary for similarity comparisons. We will expand our description of our annotation guidelines in the methods section of the revised manuscript. All the annotations were generated by one of two annotators. The second annotator always consulted with the first annotator in cases of ambiguous syllable segmentation or labeling, to ensure that they had consistent annotation styles. Unfortunately, we haven’t retained records about which birds were annotated by which of the two annotators, so we cannot share this information along with the dataset. The data is currently available in a format that should allow other research groups to use our annotations either to train their own annotation systems or check the performance of their existing systems on our annotations.  

      (9) The failure modes are not described. What segmentation errors did they encounter, and what syllable classification errors? It is important to describe the errors to be expected when using the method. 

      As we discussed in our response to this reviewer’s point (3), WhisperSeg has a tendency to merge syllables when the gap between them is very short, which explains its lower recall score compared to its precision on our dataset (supplementary figure 2). In rare cases, WhisperSeg also fails to recognize syllables entirely, again impacting its recall score. TweetyNet hardly ever completely ignores syllables, but it does tend to occasionally merge syllables together or over-segment them. Whereas WhisperSeg does this very consistently for the same syllable types within the same bird, TweetyNet merges or splits syllables more inconsistently. This inconsistent merging and splitting has a larger effect on syllable labeling, as manifested in the lower clustering v-measure scores we obtain with TweetyNet compared to WhisperSeg segmentations. TweetyNet also has much lower precision than WhisperSeg, largely because TweetyNet often recognizes background noises (like wing flaps or hopping) as syllables whereas WhisperSeg hardly ever segments nonvocal sounds.

      Many errors in syllable labeling stem from differences in syllable segmentation. For example, if two syllables with labels ‘a’ and ‘b’ in the manual annotation are sometimes segmented as two syllables, but sometimes merged into a single syllable, the clustering is likely to find 3 different syllable types; one corresponding to ‘a’, one corresponding to ‘b’ and one corresponding to ‘ab’ merged. Because of how we align syllables across segmentation schemes for the v-measure calculation, this will look like syllable ‘b’ always has a consistent cluster label, but syllable ‘a’ can carry two different cluster labels, depending on the segmentation. In certain cases, even in the absence of segmentation errors, a group of syllables bearing the same manual annotation label may be split into 2 or 3 clusters (it is extremely rare for a single manual annotation group to be split into more than 3 clusters). In these cases, it is difficult to conclusively say whether the clustering represents an error, or if it actually captured some meaningful systematic difference between syllables that was missed by the annotator. Finally, sometimes rare syllable types with their own distinct labels in the manual annotation are merged into a single cluster. Most labeling errors can be explained by this kind of merging or splitting of groups relative to the manual annotation, not to occasional mis-classifications of one manual label type as another. 
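
      To make the splitting case above concrete, the following sketch computes homogeneity, completeness, and their harmonic mean (the v-measure) for a toy labeling in which one manual syllable type is split across two clusters. The label sequences are hypothetical, and the sketch omits the alignment of syllables across segmentation schemes that the authors describe.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) for two aligned label sequences."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(manual, cluster):
    """Return (homogeneity, completeness, v-measure)."""
    h_manual, h_cluster = entropy(manual), entropy(cluster)
    homogeneity = 1.0 if h_manual == 0 else 1.0 - conditional_entropy(manual, cluster) / h_manual
    completeness = 1.0 if h_cluster == 0 else 1.0 - conditional_entropy(cluster, manual) / h_cluster
    total = homogeneity + completeness
    v = 2 * homogeneity * completeness / total if total else 0.0
    return homogeneity, completeness, v


# Hypothetical labels: manual type 'a' is split into clusters 0 and 1,
# while manual type 'b' maps cleanly onto cluster 2.
manual = ["a", "a", "a", "a", "b", "b", "b", "b"]
cluster = [0, 0, 1, 1, 2, 2, 2, 2]

homogeneity, completeness, v = v_measure(manual, cluster)
print(homogeneity, completeness, v)
```

      Each cluster here contains a single manual label, so homogeneity is 1.0, but splitting 'a' lowers completeness to 2/3 and the v-measure to 0.8, even though no syllable was confused with a different manual type.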

      For examples of these types of errors, we encourage this reviewer and readers to refer to the example confusion matrices in figure 2f and supplemental figure 4b&e. We will also expand our discussion of these different types of errors in the revised manuscript. 

      (10) Usage of Different Dimensionality Reduction Methods: The pipeline uses two different dimensionality reduction techniques for labeling and similarity comparison - both based on the understanding of the distribution of data in lower-dimensional spaces. However, the reasons for choosing different methods for different tasks are not articulated, nor is there a comparison of their efficacy. 

      We apologize for not making this distinction sufficiently clear in the manuscript and will add additional explanation to the main text to make the reasoning more apparent. We chose to use UMAP for syllable labeling because it is a common embedding methodology to precede hierarchical clustering and has been shown to result in reliable syllable labels for birdsong in the past (Sainburg et al. 2020). However, it is not appropriate for similarity scoring, because comparing EMD scores between birds requires that all the birds’ syllable distributions exist within the same shared embedding space. This can be achieved by using the same triplet loss-trained neural network model to embed syllables from all birds. This cannot be achieved with UMAP because all birds whose scores are being compared would need to be embedded in the same UMAP space, as distances between points cannot be compared across UMAPs. In practice, this would mean that every time a new tutor-pupil pair needs to be scored, their syllables would need to be added to a matrix with all previously compared birds’ syllables, a new UMAP would need to be computed, and new EMD scores between all bird pairs would need to be calculated using their new UMAP embeddings. This is very computationally expensive and quickly becomes unfeasible without dedicated high power computing infrastructure. It also means that similarity scores couldn’t be compared across papers without recomputing everything each time, whereas EMD scores obtained with triplet loss embeddings can be compared, provided they use the same trained model (which we provide as part of AVN) to embed their syllables in a common latent space.  

      (11) Reproducibility: are the measurements reproducible? Systems like UMAP always find a new embedding given some fixed input, so the output tends to fluctuate. 

      There is indeed a stochastic element to UMAP embeddings which will result in different embeddings and therefore different syllable labels across repeated runs with the same input. Anecdotally, we observed that v-measure scores were quite consistent within birds across repeated runs of the UMAP, but we will add an additional supplementary figure to the revised manuscript showing this.

    1. eLife assessment

      Zhou et al. introduce cascading neural activations, known as 'replay', into a context-maintenance and retrieval model (CMR) that has been previously used to capture a range of memory phenomena. The proposed 'CMR-replay' model outperforms its CMR predecessor in a compelling way, and thus, the work makes important strides towards understanding the empirical memory literature as well as some of the cognitive functions of replay. Notable limitations include the scope of the model with respect to established aspects of memory consolidation, such as the stages and physiology of sleep, and the lack of integration with highly relevant associative and deep learning theories.

    2. Reviewer #1 (Public Review):

      Summary:

      Zhou and colleagues developed a computational model of replay that heavily builds on cognitive models of memory in context (e.g., the context-maintenance and retrieval model), which have been successfully used to explain memory phenomena in the past. Their model produces results that mirror previous empirical findings in rodents and offers a new computational framework for thinking about replay.

      Strengths:

      The model is compelling and seems to explain a number of findings from the rodent literature. It is commendable that the authors implement commonly used algorithms from wakefulness to model sleep/rest, thereby linking wake and sleep phenomena in a parsimonious way. Additionally, the manuscript's comprehensive perspective on replay, bridging humans and non-human animals, enhanced its theoretical contribution.

      Weaknesses:

      This reviewer is not a computational neuroscientist by training, so some comments may stem from misunderstandings. I hope the authors would see those instances as opportunities to clarify their findings for broader audiences.

      (1) The model predicts that temporally close items will be co-reactivated, yet evidence from humans suggests that temporal context doesn't guide sleep benefits (instead, semantic connections seem to be of more importance; Liu and Ranganath 2021, Schechtman et al 2023). Could these findings be reconciled with the model or is this a limitation of the current framework?

      (2) During replay, the model is set so that the next reactivated item is sampled without replacement (i.e., the model cannot get "stuck" on a single item). I'm not sure what the biological backing behind this is and why the brain can't reactivate the same item consistently. Furthermore, I'm afraid that such a rule may artificially generate sequential reactivation of items regardless of wake training. Could the authors explain this better or show that this isn't the case?

      (3) If I understand correctly, there are two ways in which novelty (i.e., less exposure) is accounted for in the model. The first and more talked about is the suppression mechanism (lines 639-646). The second is a change in learning rates (lines 593-595). It's unclear to me why both procedures are needed, how they differ, and whether these are two different mechanisms that the model implements. Also, since the authors controlled the extent to which each item was experienced during wakefulness, it's not entirely clear to me which of the simulations manipulated novelty on an individual item level, as described in lines 593-595 (if any).

      As to the first mechanism - experience-based suppression - I find it challenging to think of a biological mechanism that would achieve this and is selectively activated immediately before sleep (somehow anticipating its onset). In fact, the prominent synaptic homeostasis hypothesis suggests that such suppression, at least on a synaptic level, is exactly what sleep itself does (i.e., prune or weaken synapses that were enhanced due to learning during the day). This begs the question of whether certain sleep stages (or ultradian cycles) may be involved in pruning, whereas others leverage its results for reactivation (e.g., a sequential hypothesis; Rasch & Born, 2013). That could be a compelling synthesis of this literature. Regardless of whether the authors agree, I believe that this point is a major caveat to the current model. It is addressed in the discussion, but perhaps it would be beneficial to explicitly state to what extent the results rely on the assumption of a pre-sleep suppression mechanism.

      (4) As the manuscript mentions, the only difference between sleep and wake in the model is the initial conditions (a0). This is an obvious simplification, especially given the last author's recent models discussing the very different roles of REM vs NREM. Could the authors suggest how different sleep stages may relate to the model or how it could be developed to interact with other successful models such as the ones the last author has developed (e.g., C-HORSE)? Finally, I wonder how the model would explain findings (including the authors') showing a preference for reactivation of weaker memories. The literature seems to suggest that it isn't just a matter of novelty or exposure, but encoding strength. Can the model explain this? Or would it require additional assumptions or some mechanism for selective endogenous reactivation during sleep and rest?

      (5) Lines 186-200 - Perhaps I'm misunderstanding, but wouldn't it be trivial that an external cue at the end-item of Figure 7a would result in backward replay, simply because there is no potential for forward replay for sequences starting at the last item (there simply aren't any subsequent items)? The opposite is true, of course, for the first-item replay, which can't go backward. More generally, my understanding of the literature on forward vs backward replay is that neither is linked to the rodent's location. Both commonly happen at a resting station that is further away from the track. It seems as though the model's result may not hold if replay occurs away from the track (i.e. if a0 would be equal for both pre- and post-run).

      (6) The manuscript describes a study by Bendor & Wilson (2012) and tightly mimics their results. However, notably, that study did not find triggered replay immediately following sound presentation, but rather a general bias toward reactivation of the cued sequence over longer stretches of time. In other words, it seems that the model's results don't fully mirror the empirical results. One idea that came to mind is that perhaps it is the R/L context - not the first R/L item - that is cued in this study. This is in line with other TMR studies showing what may be seen as contextual reactivation. If the authors think that such a simulation may better mirror the empirical results, I encourage them to try. If not, however, this limitation should be discussed.

      (7) There is some discussion about replay's benefit to memory. One point of interest could be whether this benefit changes between wake and sleep. Relatedly, it would be interesting to see whether the proportion of forward replay, backward replay, or both correlated with memory benefits. I encourage the authors to extend the section on the function of replay and explore these questions.

      (8) Replay has been mostly studied in rodents, with few exceptions, whereas CMR and similar models have mostly been used in humans. Although replay is considered a good model of episodic memory, it is still limited due to limited findings of sequential replay in humans and its reliance on very structured and inherently autocorrelated items (i.e., place fields). I'm wondering if the authors could speak to the implications of those limitations on the generalizability of their model. Relatedly, I wonder if the model could or does lead to generalization to some extent in a way that would align with the complementary learning systems framework.

    3. Reviewer #2 (Public Review):

      This manuscript proposes a model of replay that focuses on the relation between an item and its context, without considering the value of the item. The model simulates awake learning, awake replay, and sleep replay, and demonstrates parallels between memory phenomena driven by encoding strength, replay of sequence learning, and activation of nearest neighbor to infer causality. There is some discussion of the importance of suppression/inhibition to reduce activation of only dominant memories to be replayed, potentially boosting memories that are weakly encoded. Very nice replications of several key replay findings including the effect of reward and remote replay, demonstrating the equally salient cue of context for offline memory consolidation.

      I have no suggestions for the main body of the study, including methods and simulations, as the work is comprehensive, transparent, and well-described. However, I would like to understand how the CMR-replay model fits with the current understanding of the importance of excitation vs inhibition, remembering vs forgetting, activation vs deactivation, strengthening vs elimination of synapses, and even NREM vs REM as Schapiro has modeled. There seems to be a strong association with the efforts of the model to instantiate a memory as well as how that reinstantiation changes across time. But that is not all there is to consolidation. The specific roles of different brain states and how they might change replay are also an important consideration.

      Do the authors suggest that these replay systems are more universal to offline processes beyond episodic memory? What about procedural memories and working memory?

      Though this is not a biophysical model per se, can the authors speak to the neuromodulatory milieus that give rise to the different types of replay?

    4. Reviewer #3 (Public Review):

      In this manuscript, Zhou et al. present a computational model of memory replay. Their model (CMR-replay) draws from temporal context models of human memory (e.g., TCM, CMR) and claims replay may be another instance of a context-guided memory process. During awake learning, CMR-replay (like its predecessors) encodes items alongside a drifting mental context that maintains a recency-weighted history of recently encoded contexts/items. In this way, the presently encoded item becomes associated with other recently learned items via their shared context representation - giving rise to typical effects in recall such as primacy, recency, and contiguity. Unlike its predecessors, CMR-replay has built-in replay periods. These replay periods are designed to approximate sleep or wakeful quiescence, in which an item is spontaneously reactivated, causing a subsequent cascade of item-context reactivations that further update the model's item-context associations.
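
      The "drifting mental context" summarized above can be sketched with the context update commonly used in temporal context models, in which the new context is a weighted sum of the old context and the input evoked by the current item, rescaled to unit length. The one-hot item inputs, the drift rate of 0.5, and the list length are assumptions chosen for illustration; the review does not give the equations, and this is not the CMR-replay implementation.

```python
import math


def drift_context(context, item_input, beta):
    """One context update: c_new = rho * c_old + beta * c_in,
    with rho chosen so that c_new keeps unit length."""
    dot = sum(c * x for c, x in zip(context, item_input))
    rho = math.sqrt(1.0 + beta ** 2 * (dot ** 2 - 1.0)) - beta * dot
    return [rho * c + beta * x for c, x in zip(context, item_input)]


def one_hot(index, size):
    """Unit vector with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


n_items = 4  # hypothetical list length
beta = 0.5   # hypothetical drift rate

# Dimension 0 stands for the pre-list context; dimensions 1..4 for the items.
context = one_hot(0, n_items + 1)
for item in range(1, n_items + 1):
    context = drift_context(context, one_hot(item, n_items + 1), beta)

norm = math.sqrt(sum(c * c for c in context))
print(context, norm)
```

      After the four updates, the item dimensions of the context vector grow with recency, so the most recently encoded item contributes most. This is the recency-weighted history through which neighboring items come to share context.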

      Using this model of replay, Zhou et al. were able to reproduce a variety of empirical findings in the replay literature: e.g., greater forward replay at the beginning of a track and more backward replay at the end; more replay for rewarded events; the occurrence of remote replay; reduced replay for repeated items, etc. Furthermore, the model diverges considerably (in implementation and predictions) from other prominent models of replay that, instead, emphasize replay as a way of predicting value from a reinforcement learning framing (i.e., EVB, expected value backup).

      Overall, I found the manuscript clear and easy to follow, despite not being a computational modeller myself. (Which is pretty commendable, I'd say). The model also was effective at capturing several important empirical results from the replay literature while relying on a concise set of mechanisms - which will have implications for subsequent theory-building in the field.

      With respect to weaknesses, additional details for some of the methods and results would help the readers better evaluate the data presented here (e.g., explicitly defining how the various 'proportion of replay' DVs were calculated).

      For example, for many of the simulations, the y-axis scale differs from the empirical data despite using comparable units, like the proportion of replay events (e.g., Figures 1B and C). Presumably, this was done to emphasize the similarity between the empirical and model data. But, as a reader, I often found myself doing the mental manipulation myself anyway to better evaluate how the model compared to the empirical data. Please consider using comparable y-axis ranges across empirical and simulated data wherever possible.

      In a similar vein to the above point, while the DVs in the simulations/empirical data made intuitive sense, I wasn't always sure precisely how they were calculated. Consider the "proportion of replay" in Figure 1A. In the Methods (perhaps under Task Simulations), it should specify exactly how this proportion was calculated (e.g., proportions of all replay events, both forwards and backwards, combining across all simulations from Pre- and Post-run rest periods). In many of the examples, the proportions seem to possibly sum to 1 (e.g., Figure 1A), but in other cases, this doesn't seem to be true (e.g., Figure 3A). More clarity here is critical to help readers evaluate these data. Furthermore, sometimes the labels themselves are not the most informative. For example, in Figure 1A, the y-axis is "Proportion of replay" and in 1C it is the "Proportion of events". I presumed those were the same thing - the proportion of replay events - but it would be best if the axis labels were consistent across figures in this manuscript when they reflect the same DV.

    5. Reviewer #4 (Public Review):

      Summary:

      With their 'CMR-replay' model, Zhou et al. demonstrate that the use of spontaneous neural cascades in a context-maintenance and retrieval (CMR) model significantly expands the range of captured memory phenomena.

      Strengths:

      The proposed model compellingly outperforms its CMR predecessor and, thus, makes important strides towards understanding the empirical memory literature, as well as highlighting a cognitive function of replay.

      Weaknesses:

      Competing accounts of replay are acknowledged but there are no formal comparisons and only CMR-replay predictions are visualized. Indeed, other than the CMR model, only one alternative account is given serious consideration: A variant of the 'Dyna-replay' architecture, originally developed in the machine learning literature (Sutton, 1990; Moore & Atkeson, 1993) and modified by Mattar et al (2018) such that previously experienced event-sequences get replayed based on their relevance to future gain. Mattar et al acknowledged that a realistic Dyna-replay mechanism would require a learned representation of transitions between perceptual and motor events, i.e., a 'cognitive map'. While Zhou et al. note that the CMR-replay model might provide such a complementary mechanism, they emphasize that their account captures replay characteristics that Dyna-replay does not (though it is unclear to what extent the reverse is also true).

      Another important consideration, however, is how CMR-replay compares to alternative mechanistic accounts of cognitive maps. For example, Recurrent Neural Networks are adept at detecting spatial and temporal dependencies in sequential input; these networks are being increasingly used to capture psychological and neuroscientific data (e.g., Zhang et al, 2020; Spoerer et al, 2020), including hippocampal replay specifically (Haga & Fukai, 2018). Another relevant framework is provided by Associative Learning Theory, in which bidirectional associations between static and transient stimulus elements are commonly used to explain contextual and cue-based phenomena, including associative retrieval of absent events (McLaren et al, 1989; Harris, 2006; Kokkola et al, 2019). Without proper integration with these modeling approaches, it is difficult to gauge the innovation and significance of CMR-replay, particularly since the model is applied post hoc to the relatively narrow domain of rodent maze navigation.

    1. eLife assessment

      This study presents valuable findings with practical and theoretical implications for drug discovery, particularly in the context of repurposing CIP for the treatment of Babesia spp. The evidence is convincing overall, as the data and analyses support the main claims. However, a few assertions are only partially substantiated. If the authors can strengthen these areas with additional evidence, the paper could attract greater interest from scientists in drug discovery, computational biology, and microbiology.

    2. Reviewer #1 (Public review):

      In this manuscript, the authors address an important issue in Babesia research by repurposing Cipargamin (CIP) as a potential therapeutic against selected Babesia spp. In this study, CIP demonstrated potent in vitro inhibition of B. bovis and B. gibsoni, with IC50 values of 20.2 ± 1.4 nM and 69.4 ± 2.2 nM, respectively, and showed in vivo efficacy against Babesia spp. in a mouse model. The authors identified two key resistance mutations in the BgATP4 gene (BgATP4L921I and BgATP4L921V) and explored their implications through phenotypic characterization of the parasite using cell biological experiments, complemented by in silico analysis. Overall, the findings are promising and could significantly advance Babesia treatment strategies.

      Strengths:

      In this manuscript, the authors effectively repurpose Cipargamin (CIP) as a potential treatment for Babesia spp. They provide compelling in vitro and in vivo data showing strong efficacy. Key resistance mutations in the BgATP4 gene are identified and analyzed through both phenotypic and in silico methods, offering valuable insights for advancing treatment strategies.

      Weaknesses:

      The manuscript explores important aspects of drug repurposing and rational drug design using Cipargamin (CIP) against Babesia. However, several weaknesses should be addressed. The study lacks novelty as similar research on Cipargamin has been conducted, and the experimental design could be improved. The rationale for choosing CIP over other ATP4-targeting compounds is not well-explained. Validation of mutations relies heavily on in silico predictions without sufficient experimental support. The Ion Transport Assay has limitations and would benefit from additional assays like Radiolabeled Ion Flux and Electrophysiological Assays. Also, the study lacks appropriate control drugs and detailed functional characterization. Further clarity on mutation percentages, additional safety testing, and exploration of cross-resistance would strengthen the findings.

      (1) It is commendable to explore drug repurposing, drug deprescribing, drug repositioning, and rational drug design, especially using established ATP4 inhibitors that are well-studied in Plasmodium and other protozoan parasites. While the study provides some interesting findings, it appears to lack novelty, as similar investigations of Cipargamin on other protozoan parasites have been conducted. The study does not introduce new concepts, and the experimental design could benefit from refinement to strengthen the results. Additionally, the rationale for choosing CIP over other MMV compounds targeting ATP4 is not clearly articulated. Clarifying the specific advantages CIP may offer against Babesia would be beneficial. Finally, the validation of the identified mutations might be strengthened by additional experimental support, as reliance on in silico predictions alone may not fully address the functional impact, particularly given the potential ambiguity of the mutations (BgATP4 L to V and I).

      (2) Conducting an Ion Transport Assay is useful but has limitations. Non-specific binding or transport by other cellular components can lead to inaccurate results, causing false positives or negatives and making data interpretation difficult. Indirect measurements, like changes in fluorescence or electrical potential, can introduce artifacts. To improve accuracy, consider additional assays such as:
      a. Radiolabeled Ion Flux Assay: tracks the movement of Na+ using radiolabeled ions, providing direct evidence of ion transport.
      b. Electrophysiological Assay: measures ionic currents in real-time with patch-clamp techniques, offering detailed information about ATP4 activity.

      (3) In-silico predictions can provide plausible outcomes, but it is essential to evaluate how the recombinant purified protein and ligand interact and function at physiological levels. This aspect is currently missing and should be included. For example, incorporating immunoprecipitation and ATPase activity assays with both wild-type and mutant proteins, as well as detailed kinetic studies with Cipargamin, would be recommended to validate the findings of the study.

      (4) The study lacks specific suitable control drugs tested both in vitro and in vivo. For accurate drug assessment, especially when evaluating drugs based on a specific phenotype, such as enlarged parasites, it is important to use ATP4 gene-specific inhibitors. Including similar classes of drugs, such as Aminopyrazoles, Dihydroisoquinolines, Pyrazoleamides, Pantothenamides, Imidazolopiperazines (e.g., GNF179), and Bicyclic Azetidine Compounds, would provide more comprehensive validation.

      (5) Functional characterization of CIP through microscopic examination and quantification for assessing parasite size enlargement is not entirely reliable. A Flow Cytometry-Based Assay is recommended instead (along with suitable control antiparasitic drugs). To effectively monitor Cipargamin's action, conducting time-course experiments with 6-hour intervals is advisable rather than relying solely on endpoint measurements. Additionally, for accurate assessment of parasite morphology, obtaining representative qualitative images of treated versus untreated samples using Scanning Electron Microscopy (SEM) or Transmission Electron Microscopy (TEM) is recommended for precise measurements.

      (6) A notable contradiction observed is that mutant cells displayed reduced efficacy and affinity but more pronounced phenotypic effects. The BgATP4L921I mutation shows a 2x lower susceptibility (IC50 of 887.9 ± 61.97 nM) and a predicted binding affinity of -6.26 kcal/mol with CIP. However, the phenotype exhibits significantly lower Na+ concentration in BgATP4L921I (P = 0.0087) (Figure 3E).

      (7) The manuscript does not clarify the percentage of mutations, and the number of sequence iterations performed on the ATP4 gene. It is also unclear whether clonal selection was carried out on the resistant population. If mutations are not present in 100% of the resistant parasites, please indicate the ratio of wild-type to mutant parasites and represent this information in the figure, along with the chromatograms.

      (8) While the compound's toxicity data is well-established, it is advisable to include additional testing in epithelial cells and liver-specific cell lines (e.g., HeLa, HCT, HepG2) if feasible for the authors. This would provide a more comprehensive assessment of the compound's safety profile.

      (9) In the in vivo efficacy study, recrudescent parasites emerged after 8 days of treatment. Did these parasites harbor the same mutation in the ATP4 gene? The authors did not investigate this aspect, which is crucial for understanding the basis of recrudescence.

      (10) The authors should explain their choice of Balb/c mice for evaluating CIP efficacy, as these mice clear the infection and may not fully represent the compound's effectiveness. Investigating CIP efficacy in SCID mice would be valuable, as they provide a more reliable model and eliminate the influence of the immune system. The rationale for not using SCID mice should be clarified.

      (11) Do the in vitro-resistant parasites show any potential for cross-resistance with commonly used antiparasitic drugs? Have the authors considered this possibility, and what are their expectations regarding cross-resistance?

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors have tried to repurpose cipargamin (CIP), a known drug against Plasmodium and Toxoplasma, against Babesia. They proved the efficacy of CIP on Babesia in the nanomolar range. In silico analyses revealed the drug resistance mechanism through a single amino acid mutation at amino acid position 921 on the ATP4 gene of Babesia. Overall, the conclusions drawn by the authors are well justified by their data. I believe this study opens up a novel therapeutic strategy against babesiosis.

      Strengths:

      The authors have carried out a comprehensive study. All the experiments performed were carried out methodically and logically.

      Weaknesses:

      The introduction section needs to be more informative. The authors are investigating the binding of CIP to the ATP4 gene, but they did not give any information about the gene or how the ATP4 inhibitors work in general.

      The resolution of the figures is not good and the font size is too small to read properly.

      I also have several minor concerns which have been addressed in the "Recommendations for the authors" section.

    4. Reviewer #3 (Public review):

      Summary:

      The authors aim to establish that cipargamin can be used for the treatment of infection caused by Babesia organisms.

      Strengths:

      The study provides strong evidence that cipargamin is effective against various Babesia species. In vitro, growth assays were used to establish that cipargamin is effective against Babesia bovis and Babesia gibsoni. Infection of mice with Babesia microti demonstrated that cipargamin is as effective as the combination of atovaquone plus azithromycin. Cipargamin protected mice from lethal infection with Babesia rodhaini. Mutations that confer resistance to cipargamin were identified in the gene encoding ATP4, a P-type Na ATPase that was found in other apicomplexan parasites, thereby validating ATP4 as the target of cipargamin.

      Weaknesses:

      Cipargamin was tested in vivo at a single dose administered daily for 7 days. Despite the prospect of using cipargamin for the treatment of human babesiosis, there was no attempt to identify the lowest dose of cipargamin that protects mice from Babesia microti infection. Exposure to cipargamin can induce resistance, indicating that cipargamin should not be used alone but in combination with other drugs. There was no attempt at testing cipargamin in combination with other drugs, particularly atovaquone, in the mouse model of Babesia microti infection. Given the difficulty in treating immunocompromised patients infected with Babesia microti, it would have been informative to test cipargamin in a mouse model of severe immunosuppression (SCID or rag-deficient mice).

    1. eLife assessment

      This important study provides solid mechanistic and modeling data suggesting that the polar localization of MinCD in Bacillus subtilis is largely due to differences in diffusion rates between monomeric and dimeric MinD. This finding is exciting as it negates the necessity for a third localization determinant in this system, as has been previously proposed. The work is generally strong but is incomplete without some additional quantitative analysis, as well as clarification of the underlying assumptions and details used for the modeling experiments.

    2. Reviewer #1 (Public review):

      The authors used fluorescence microscopy, image analysis, and mathematical modeling to study the effects of membrane affinity and diffusion rates of MinD monomer and dimer states on MinD gradient formation in B. subtilis. To test these effects, the authors experimentally examined MinD mutants that lock the protein in specific states, including Apo monomer (K16A), ATP-bound monomer (G12V), and ATP-bound dimer (D40A, hydrolysis defective), and compared to wild-type MinD. Overall, the experimental results support the conclusion that reversible membrane binding of MinD is critical for the formation of the MinD gradient, but that the binding affinities between monomers and dimers are similar.

      The modeling part is a new attempt to use the Monte Carlo method to test the conditions for the formation of the MinD gradient in B. subtilis. The modeling results provide good support for the observations and find that the MinD gradient is sensitive to different diffusion rates between monomers and dimers. This simulation is based on several assumptions and predictions, which raises new questions that need to be addressed experimentally in the future. However, the current story is sufficient without testing these assumptions or predictions.

    3. Reviewer #2 (Public review):

      Summary:

      Bohorquez et al. investigate the molecular determinants of intracellular gradient formation in the B. subtilis Min system. To this end, they generate B. subtilis strains that express MinD mutants that are locked in the monomeric or dimeric states, and also MinD mutants with amphipathic helices of varying membrane affinity. They then assess the mutants' ability to bind to the membrane and form gradients using fluorescence microscopy in different genetic backgrounds. They find that, unlike in the E. coli Min system, the monomeric form of MinD is already capable of membrane binding. They also show that MinJ is not required for MinD membrane binding and only interacts with the dimeric form of MinD. Using kinetic Monte Carlo simulations, the authors then test different models for gradient formation, and find that a MinD gradient along the cell axis is only formed when the polarly localized protein MinJ stimulates dimerization of MinD, and when the diffusion rate of monomeric and dimeric MinD differs. They also show that differences in the membrane affinity of MinD monomers and dimers are not required for gradient formation.

      Strengths:

      The paper offers a comprehensive collection of the subcellular localization and gradient formation of various MinD mutants in different genetic backgrounds. In particular, the comparison of the localization of these mutants in a delta MinC and MinJ background offers valuable additional insights. For example, they find that only dimeric MinD can interact with MinJ. They also provide evidence that MinD locked in a dimer state may co-polymerize with MinC, resulting in a speckled appearance.

      The authors introduce and verify a useful measure of membrane affinity in vivo.

      The modulation of the membrane affinity by using distinct amphipathic helices highlights the robustness of the B. subtilis MinD system, which can form gradients even when the membrane affinity of MinD is increased or decreased.

      Weaknesses:

      The main claim of the paper, that differences in the membrane affinity between MinD monomers and dimers are not required for gradient formation, does not seem to be supported by the data. The only measure of membrane affinity presented is extracted from the transverse fluorescence intensity profile of cells expressing the mGFP-tagged MinD mutants. The authors measure the valley-to-peak ratio of the profile, which is lower than 1 for proteins binding to the membrane and higher than 1 for cytosolic proteins. To verify this measure of membrane affinity, they use a membrane dye and a soluble GFP, which results in values of ~0.75 and ~1.25, respectively. They then show that all MinD mutants have a value - roughly in the range of 0.8-0.9 - and they use this to claim that there are no differences in membrane affinity between monomeric and dimeric versions.
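      The valley-to-peak measure described above can be illustrated with a minimal sketch. This is not the authors' analysis code: the function name, the fractional membrane position (membrane_frac), the width of the central window (center_frac), and the synthetic profiles are all hypothetical choices made for illustration. The sketch divides the mid-cell intensity by the mean intensity at the two assumed membrane positions, so a membrane-bound signal gives a value below 1 and a cytosolic signal a value above 1.

```python
from statistics import mean


def valley_to_peak_ratio(profile, membrane_frac=0.2, center_frac=0.2):
    """Ratio of mid-cell intensity to intensity at the assumed membrane positions.

    profile: fluorescence intensities sampled across the cell width.
    membrane_frac: assumed fractional position of each membrane from the
        profile ends (hypothetical parameter, not taken from the paper).
    center_frac: assumed fraction of the profile averaged around mid-cell.
    """
    n = len(profile)
    if n < 5:
        raise ValueError("profile needs at least 5 samples")
    left = round(membrane_frac * (n - 1))   # index of the left membrane
    right = (n - 1) - left                  # mirrored index of the right membrane
    half = max(1, round(center_frac * n / 2))
    mid = n // 2
    center = mean(profile[mid - half: mid + half + 1])
    edges = mean([profile[left], profile[right]])
    return center / edges


# Synthetic example profiles (illustrative values only, not measured data).
membrane_like = [1, 5, 10, 7, 5, 5, 5, 7, 10, 5, 1]    # two edge peaks, central valley
cytosolic_like = [1, 4, 7, 9, 10, 10, 10, 9, 7, 4, 1]  # single central plateau

print(valley_to_peak_ratio(membrane_like))
print(valley_to_peak_ratio(cytosolic_like))
```

      Because the ratio collapses the whole profile into a single number, two proteins with different membrane affinities can yield similar values whenever both are predominantly membrane-bound, which is the sensitivity concern raised in this review.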

      While this way to measure membrane affinity is useful to distinguish between binders and non-binders, it is unclear how sensitive this assay is, and whether it can resolve more subtle differences in membrane affinity, beyond the classification into binders and non-binders. A dimer with two amphipathic helices should have a higher membrane affinity than a monomer with only one such copy. Thus, the data does not seem to support the claim that "the different monomeric mutants have the same membrane affinity as the wildtype MinD". The data only supports the claim that B. subtilis MinD monomers already have a measurable membrane affinity, which is indeed a difference from the E. coli Min system.

      While their data does show that a stark difference between monomer and dimer membrane affinity may not be required for gradient formation in the B. subtilis case, it is also not prevented if the monomer is unable to bind to the membrane. They show this by replacing the native MinD amphipathic helix with the weak amphipathic helix NS4AB-AH. According to their membrane affinity assay, NS4AB-AH does not bind to the membrane as a monomer (Figure 4D), but when this helix is fused to MinD, MinD is still capable of forming a gradient (albeit a weaker one). Since the authors make a direct comparison to the E. coli MinDE systems, they could have used the E. coli MinD MTS instead of or in addition to the NS4AB-AH amphipathic helix. The reviewer suspects that a fusion of the E. coli MinD MTS to B. subtilis MinD may also support gradient formation.

      The paper contains insufficient data to support the many claims about cell filamentation and minicell formation. In many cases, statements like "did not result in cell filamentation" or "restored cell division" are only supported by a single fluorescence image instead of a quantitative analysis of cell length distribution and minicell frequency, as the one reported for a subset of the data in Figure 5.

      The paper would also benefit from a quantitative measure of gradient formation of the distinct MinD mutants, instead of relying on individual fluorescent intensity profiles.

      The authors compare their experimental results with the oscillating E. coli MinDE system and use it to define some of the rules of their Monte Carlo simulation. However, the description of the E. coli Min system is sometimes misleading or based on outdated findings.

      The Monte Carlo simulation of the gradient formation in B. subtilis could benefit from a more comprehensive approach:

      (1) While most of the initial rules underlying the simulation are well justified, the authors do not implement or test two key conditions:

      (a) Cooperative membrane binding, which is a key component of mathematical models for the oscillating E. coli Min system. This cooperative membrane binding has recently been attributed to MinD or MinCD oligomerization on the membrane and has been experimentally observed in various instances; in fact, the authors themselves show data supporting the formation of MinCD copolymers.

      (b) Local stimulation of the ATPase activity of MinD, which triggers the dimer-to-monomer transition; E. coli MinD ATP hydrolysis is stimulated by the membrane and by MinE, so B. subtilis MinD may also be stimulated by the membrane and/or other components like MinJ.

      Instead, the authors claim that (a) would only increase differences in diffusion between the monomer and different oligomeric species, and that a 2-fold increase in dimerization on the membrane could not induce gradient formation in their simulation, in the absence of MinJ stimulating gradient formation. However, a 2-fold increase in dimerization is likely way too low to explain any cooperative membrane binding observed for the E. coli Min system. Regarding (b), they also claim that implementing stimulation of ATP hydrolysis on the membrane (dimer-to-monomer transition) would not change the outcome, but no simulation result for this condition is actually shown.

      (2) To generate any gradient formation, the authors claim that they would need to implement stimulation of dimer formation by MinJ, but they themselves acknowledge the lack of any experimental evidence for this assertion. They then test all other conditions (e.g., differences in membrane affinity, diffusion, etc.) in addition to the requirement that MinJ stimulates dimer formation. It is unclear whether the authors tested all other conditions independently of the "MinJ induces dimerization" condition, and whether either of those alone or in combination could also lead to gradient formation. This would be an important test to establish the validity of their claims.
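      The kind of condition-by-condition test requested here can be sketched with a toy lattice Monte Carlo model. This is an illustrative simplification and not the authors' simulation: the cell is a one-dimensional lattice, each particle simply switches between a "monomer" and a "dimer" state rather than pairing with a partner, dimerization is assumed to occur only at the pole sites, and every parameter value and function name is hypothetical. Varying one parameter at a time (here, the dimer hop probability) shows how individual conditions could be tested in isolation.

```python
import random


def simulate_positions(dimer_hop_prob, n_sites=40, n_particles=200,
                       n_steps=2000, pole_width=2, p_dimerize=0.5,
                       p_monomerize=0.002, monomer_hop_prob=1.0, seed=0):
    """Toy 1D lattice simulation; returns the final position of each particle.

    All parameters are hypothetical. Particles switch state individually,
    which is a simplification of true pairwise dimerization.
    """
    rng = random.Random(seed)
    pos = [rng.randrange(n_sites) for _ in range(n_particles)]
    is_dimer = [False] * n_particles
    for _ in range(n_steps):
        for i in range(n_particles):
            at_pole = pos[i] < pole_width or pos[i] >= n_sites - pole_width
            if is_dimer[i]:
                # Dimer-to-monomer transition can happen anywhere.
                if rng.random() < p_monomerize:
                    is_dimer[i] = False
            elif at_pole and rng.random() < p_dimerize:
                # Assumed pole-localized stimulation of dimerization.
                is_dimer[i] = True
            hop_prob = dimer_hop_prob if is_dimer[i] else monomer_hop_prob
            if rng.random() < hop_prob:
                new = pos[i] + rng.choice((-1, 1))
                if 0 <= new < n_sites:  # reflecting cell ends
                    pos[i] = new
    return pos


def fraction_near_poles(positions, n_sites=40, window=5):
    """Fraction of particles within `window` sites of either cell end."""
    near = sum(1 for p in positions if p < window or p >= n_sites - window)
    return near / len(positions)


# Condition 1: dimers diffuse more slowly than monomers.
slow = fraction_near_poles(simulate_positions(dimer_hop_prob=0.05))
# Condition 2: dimers diffuse as fast as monomers.
fast = fraction_near_poles(simulate_positions(dimer_hop_prob=1.0))
print(slow, fast)
```

      In this toy model a uniform distribution corresponds to a polar fraction of 0.25 (10 of 40 sites). Running each condition with and without the pole-localized dimerization step would be the analogue of testing the conditions independently of the "MinJ induces dimerization" assumption.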

    4. Reviewer #3 (Public review):

      This important study by Bohorquez et al examines the determinants necessary for concentrating the spatial modulator of cell division, MinD, at the future site of division and the cell poles. Proper localization of MinD is necessary to bring the division inhibitor, MinC, in proximity to the cell membrane and cell poles where it prevents aberrant assembly of the division machinery. In contrast to E. coli, in which MinD oscillates from pole to pole courtesy of a third protein MinE, how MinD localization is achieved in B. subtilis - which does not encode a MinE analog - has remained largely a mystery. The authors present compelling data indicating that MinD dimerization is dispensable for membrane localization but required for concentration at the cell poles. Dimerization is also important for interactions between MinD and MinC, leading to the formation of large protein complexes. Computational modeling, specifically a Monte Carlo simulation, supports a model in which differences in diffusion rates between MinD monomers and dimers lead to the concentration of MinD at cell poles. Once there, interaction with MinC increases the size of the complex, further reinforcing diffusion differences. Notably, interactions with MinJ, which has previously been implicated in MinCD localization, are dispensable for concentrating MinD at cell poles, although MinJ may help stabilize the MinCD complex at those locations.

    1. eLife assessment

      This work provides solid evidence that Transforming Growth Factor β Activated Kinase 1 (TAK1) regulates esophageal squamous cell carcinoma (ESCC) tumor proliferation and metastasis. The findings are valuable to the field of molecular tumor biology in general and to the understanding of ESCC tumor invasiveness and metastatic potential.

    2. Reviewer #1 (Public Review):

      Summary:

      In previously published work, the authors found that Transforming Growth Factor β Activated Kinase 1 (TAK1) may regulate esophageal squamous cell carcinoma (ESCC) tumor cell proliferation via the RAS/MEK/ERK axis. They explore the mechanisms for TAK1 as a possible tumor suppressor, demonstrating phospholipase C epsilon 1 as an effector of tumor cell migration, invasion and metastatic potential.

      Strengths:

      The authors show in vitro that TAK1 overexpression reduces tumor cell migration and invasion while TAK1 knockdown promotes a mesenchymal phenotype (epithelial-mesenchymal transition) and enhances migration and invasion. To explore possible mechanisms of action, the authors focused on phospholipase C epsilon 1 (PLCE1) as a potential effector, having identified this protein in co-immunoprecipitation experiments. Further, they demonstrate that TAK1-mediated phosphorylation of PLCE1 is inhibitory. Each of the observations is supported by different experimental strategies, e.g. use of different approaches for knockdown (pharmacologic, RNA inhibition, CRISPR/Cas). Xenograft experiments showed that suppression/loss of TAK1 is associated with more frequent metastases and conversely that PLCE1 is associated positively with xenograft metastases. A considerable amount of experimental data is presented for review, including supplemental data, that show that TAK1 regulation may be important in ESCC development.

      Weaknesses:

      As noted by the authors, immunoprecipitation (IP) experiments identified a number (24) of proteins as potential targets for the TAK1 ser/thr kinase. Prior work (cited as Shi et al, 2021) focused on a different phosphorylation target for TAK1, Ras association domain family 9 (RASSF9), but a more comprehensive discussion of the co-IP experiments would help place this work in better context.

    3. Reviewer #2 (Public Review):

      Summary:

      In this study, Ju Q et al performed both in vitro and in vivo experiments to test the effect of TAK1 on cancer metastasis. They demonstrated that TAK1 is capable of directly phosphorylating PLCE1 and this modification represses its enzyme activity, leading to suppression of PIP2 hydrolysis and subsequently signal transduction in the PKC/GSK-3β/β-Catenin axis.

      Strengths:

      The quality of data is good, and the presentation is well organized in a logical way.

      Weaknesses:

      The study is missing a key link connecting the effect of TAK1 on cancer metastasis to its phosphorylation of PLCE1.

    4. Reviewer #3 (Public Review):

      Summary:

      The research by Qianqian Ju et al. found that the knockdown of TAK1 promoted ESCC migration and invasion, whereas overexpression of TAK1 resulted in the opposite outcome. These in vitro findings could be recapitulated in a xenograft metastasis mouse model.

      Mechanistically, TAK1 phosphorylates PLCE1 at S1060 in cells, decreasing PLCE1 enzyme activity and repressing PIP2 hydrolysis. As a result, DAG and inositol trisphosphate (IP3) are reduced, thereby suppressing signal transduction through PKC/GSK-3β/β-Catenin. Consequently, cancer metastasis-related genes were impeded by TAK1.

      Overall, this study offers some intriguing observations, providing a potential druggable target for developing agents to treat ESCC.

      The strengths of this research are:

      (1) The research uses different experimental approaches to address one question. The experiments are largely convincing and appear to be well executed.

      (2) The phenotypes were observed from different angles: in the mouse model, at the cellular level, and at the molecular level.

      (3) The molecular mechanism was narrowed down to a single amino acid modification on PLCE1.

      The weaknesses of this research are:

      Most of the experiments were done under protein overexpression conditions, with the protein level increased several hundred-fold in the cell, producing an artificial environment that can sometimes generate false positive results.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      We would like to see the reviewers' critiques be addressed satisfactorily.

      Reviewer #1 (Recommendations For The Authors):

      While the manuscript reads fairly well, there are a number of minor grammatical edits that would improve the reading of this paper.

      To improve readability, we sent our manuscript out for language polishing using Wiley Editing Services. The changes are marked in red.

      The opening paragraph, while seeking to establish clinical relevance, likely can be removed or tailored.

      We agree with this concern; the first paragraph has been tailored in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Although the authors provided a substantial amount of data to support the conclusion, there are several important issues to be added to strengthen the study, as highlighted below:

      Figure 2: In this figure, the authors provided evidence that TAK1 phosphorylates PLCE1 at serine 1060. To make the data more convincing, the authors need to perform an in vitro kinase assay to confirm this result. Ideally, the in vitro kinase assay also includes a mutant form of PLCE1-S1060A as a control.

      We thank the referee for this constructive comment. Because we cannot perform experiments with radioactive compounds in our institute, the TAK1-induced phosphorylation of PLCE1 at serine 1060 could not be confirmed by a routine in vitro kinase assay, which uses 32P. Instead, we pulled down TAK1 and PLCE1 and incubated the two proteins in a kinase assay buffer. The resulting samples were analyzed by western blot. Our data showed that TAK1 phosphorylates PLCE1 at serine 1060, as evidenced by a strong band for p-PLCE1 S1060 when TAK1 was incubated with PLCE1. For the sample containing TAK1 and PLCE1 S1060A, the band density for p-PLCE1 S1060 was largely decreased. Ideally, there should be no band for p-PLCE1 S1060 when TAK1 is incubated with PLCE1 S1060A. However, we still detected p-PLCE1 S1060 in this reaction, although at a lower level than with wild-type PLCE1. This is likely due to the presence of endogenous wild-type PLCE1 in the TAK1 pull-down samples. These data are presented as Figure S6C in the revised manuscript.

      Figure 4: In this part of the study, the author claimed that TAK1 inhibits PLCE1 enzyme activity. However, they fall short of evidence that this inhibitory effect of TAK1 on PLCE1 enzyme activity is mediated via phosphorylation at S1060.

      We thank the referee for this critical comment. We did measure the effect of TAK1 on the activity of mutant PLCE1, which is presented in Figure 4B. The data showed that TAK1 has no inhibitory effect on PLCE1 S1060A enzyme activity. In contrast, TAK1 repressed wild-type PLCE1 activity (Figure 4A). These data indicate that the inhibitory effect of TAK1 on PLCE1 enzyme activity is mediated, at least in part, via phosphorylation at S1060.

      Figures 6 and 7: Here the authors used an ESCC metastasis model in nude mice to establish the role of TAK1 and PLCE1, respectively. However, the effects of TAK1 and PLCE1 are studied separately, and there is no link to show that TAK1 inhibits metastasis via activation of PLCE1. Ideally, the authors should use transgenic mice expressing mutant PLCE1-S1060A to support the conclusion.

      We agree that transgenic mice expressing mutant PLCE1-S1060A would further strengthen our conclusions. However, due to limited time and resources, we cannot generate such mice. We thank the referee for this insightful and critical comment.

      Reviewer #3 (Recommendations For The Authors):

      (1) Have the authors ever checked the phosphorylation status of endogenous PLCE1 S1060p in the TAK1 overexpression alone ECA-109 cell line? Does it increase? Similarly, in siMap3k7 ECA-109 cells, does endogenous PLCE1 S1060p reduce?

      Thank the referee for these critical comments. During the revision, we examined whether TAK1 overexpression or knockdown affects endogenous p-PLCE1 S1060 in ECA-109 cells. Our data showed that TAK1 overexpression induced an increase in p-PLCE1 S1060, whereas TAK1 knockdown resulted in a decrease in p-PLCE1 S1060. These data were presented in Figure S6A, B.

      (2) The authors show that using TAK1 inhibitors cannot completely abolish all the phosphorylation of PLCE1 S1060 in cells and mice. Does it mean some other potential kinases also target PLCE1 S1060?

      We thank the referee for this insightful comment. As mentioned by the referee, TAK1 inhibitors cannot completely abolish all the phosphorylation of PLCE1 S1060 in cells and mice. Therefore, it is likely that some other kinases also target PLCE1 S1060. We added this point to the Discussion in the revised manuscript.

      (3) PLCE1 S1060A completely abolishes the migration and invasion regulation function of TAK1 (Figure S10), indicating that PLCE1 S1060 is a unique downstream target of TAK1 in migration and invasion regulation in the ECA-109 cell line. As a MAP3K, TAK1 was documented to regulate migration and invasion through multiple signal transduction pathways such as IKK, JNK, p38 MAPK, etc. Have the authors ever tried to test the effect of overexpression/knockdown of TAK1 on a few of these pathways in the ECA-109 cell line?

      Thank the referee for these constructive comments. During the revision, we analyzed the effects of TAK1 on IKK, JNK, p38 MAPK, and ERK. Our data showed that TAK1 positively regulates these signal transduction pathways. For example, TAK1 overexpression increased p-IKK, p-JNK, p-P38 MAPK, and p-ERK in ECA-109 cells, whereas TAK1 knockdown decreased these protein levels. Although these pathways are affected by TAK1, with respect to cell migration and invasion, PLCE1 is likely a unique substrate of TAK1 in migration and invasion regulation in ECA-109 cells. We added these contents in the Results section in revised manuscript, and these data were presented in Figure S12A-D.

      (4) Does TAK1 phosphorylate only the S1060 site on the PLCE1 protein?

      We thank the referee for this insightful comment. So far, we have only found that TAK1 phosphorylates the S1060 site on PLCE1; we cannot exclude the possibility that TAK1 also phosphorylates other residues on the protein.

      (5) Is there any PLCE1 S1060 point mutation existing in ESCC patients? Does it influence the prognosis of ESCC patients?

      We thank the referee for this critical and constructive comment; addressing it would further strengthen the significance of the current study. However, we currently lack sufficient patient tumor samples to address this very important issue.

      (6) What is the effect of the TAK1 inhibitor on mouse body weight?

      We thank the referee for this critical comment. Since body weight is an important parameter, we measured mouse body weight throughout the experiments. The results showed that body weight gain was not affected by the TAK1 inhibitor Takinib. These data are included in the revised manuscript as Figure S20A.

      (7) For the control groups of the mouse xenograft tumor model in Figures 6 vs 7, why does the number of metastases behave so differently?

      In Figure 6, the control mice received ECA-109 cells via tail vein injection and were then treated with vehicle (saline). The control mice in Figure 7 also received ECA-109 cells via tail vein injection, but these cells had been transduced with control lentivirus. Because of these differences, the two control groups have different numbers of metastases.

    1. eLife assessment

      This valuable study characterized a new set of small molecules targeting the ELF3-MED23 interaction, with one of the reported compounds representing a promising novel therapeutic strategy. The evidence supporting the conclusions is convincing. This article will be of interest to medical and cell biologists working on cancer and, particularly, on HER2-overexpressing cancers.

    2. Reviewer #1 (Public review):

      Summary:

      Soo-Yeon Hwang et al. synthesized and characterized a new set of Chalcone- and Pyrazoline-derived molecules targeting the interaction between ELF3, a transcription factor, and MED23, a coactivator for HER2 transcription. The authors employed biochemical analysis, cell-based assays, and an in vivo xenograft model to demonstrate that the lead compound, Compound 10, inhibits HER2 transcription and protein expression, subsequently inducing anticancer activity in gastric cancer models, particularly in trastuzumab-resistant cell lines. The obtained data is robust and supports the potential anticancer efficacy of Compound 10 for HER2+ gastric cancer.

      Strengths:

      The current manuscript proposes an alternative strategy for targeting HER2-overexpressing cancers by reducing HER2 transcription levels. The study presents compelling evidence that the lead compound, Compound 10, disrupts the binding of ELF3 to MED23, thereby inhibiting HER2 transcription. Notably, cell-based assays and xenograft models demonstrated the compound's significant antitumor activity in gastric cancer models.

    3. Reviewer #2 (Public review):

      Summary:

      The findings highlight the importance of targeting the ELF3-MED23 protein-protein interaction (PPI) as a potential therapeutic strategy for HER2-overexpressing cancers, notably gastric cancers, as an alternative to trastuzumab. The evidence, including the strong potency of compound 10 in inhibiting ELF3-MED23 PPI, its capacity to lower HER2 levels, induce apoptosis, and impede proliferation both in laboratory settings and animal models, indicates that compound 10 holds promise as a novel therapeutic option, even for cases resistant to trastuzumab treatment.

      Strengths:

      The experiments conducted are robust and diverse enough to address the hypothesis posed.

    4. Reviewer #3 (Public review):

      Summary:

      The authors synthesized a compound that inhibits the ELF3-MED23 interaction, leading to inhibition of HER2 expression in gastric cancer.

      Strengths:

      Sufficient evidence demonstrates the potency of compound 10 in inhibiting the ELF3-MED23 interaction.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife assessment:

      The manuscript establishes a sophisticated mouse model for acute retinal artery occlusion (RAO) by combining unilateral pterygopalatine ophthalmic artery occlusion (UPOAO) with a silicone wire embolus and carotid artery ligation, generating ischemia-reperfusion injury upon removal of the embolus. This clinically relevant model is useful for studying the cellular and molecular mechanisms of RAO. The data overall are solid, presenting a novel tool for screening pathogenic genes and promoting further therapeutic research in RAO.

      Thank you for your thorough evaluation. We are pleased that you find our mouse model for acute retinal artery occlusion to be sophisticated and clinically relevant. Your recognition of the model’s utility in studying the cellular and molecular mechanisms of RAO, as well as its potential for advancing therapeutic research, is highly encouraging and underscores the significance of our work. We are grateful for your supportive feedback.

      Public Reviews:

      Reviewer #1:

      Summary:

      Wang, Y. et al. used a silicone wire embolus to definitively and acutely clot the pterygopalatine ophthalmic artery in addition to carotid artery ligation to completely block blood supply to the mouse inner retina, which mimics clinical acute retinal artery occlusion. A detailed characterization of this mouse model determined the time course of inner retina degeneration and associated functional deficits, which closely mimic those of human patients. Whole retina transcriptome profiling and comparison revealed distinct features associated with ischemia, reperfusion, and different model mechanisms. Interestingly and importantly, this team found a sequential event including reperfusion-induced leukocyte infiltration from blood vessels, residual microglial activation, and neuroinflammation that may lead to neuronal cell death.

      Strengths:

      Clear demonstration of the surgery procedure with informative illustrations, images, and superb surgical videos.

      Two time points of ischemia and reperfusion were studied with convincing histological and in vivo data to demonstrate the time course of various changes in retinal neuronal cell survival, ERG functions, and inner/outer retina thickness.

      The transcriptome comparison among different retinal artery occlusion models provides informative evidence to differentiate these models.

      The potential applications of the in vivo retinal ischemia-reperfusion model and relevant readouts demonstrated by this study will certainly inspire further investigation of the dynamic morphological and functional changes of retinal neurons and glial cell responses during disease progression and before and after treatments.

      We sincerely appreciate your detailed and positive feedback. These evaluations are invaluable in highlighting the significance and impact of our work. Thank you for your thoughtful and supportive review.

      Weaknesses:

      The revised manuscript has been significantly improved in clarity and readability. It has addressed all my questions convincingly.

      Thank you for your positive feedback. We are pleased to hear that the revisions have significantly improved the manuscript's clarity and readability, and that we have convincingly addressed all your questions. Your encouraging words are of great importance to us.

      Reviewer #2 (Public Review):

      Summary:

      The authors of this manuscript aim to develop a novel animal model to accurately simulate the retinal ischemic process in retinal artery occlusion (RAO). A unilateral pterygopalatine ophthalmic artery occlusion (UPOAO) mouse model was established using silicone wire embolization combined with carotid artery ligation. This manuscript provided data to show the changes of major classes of retinal neural cells and visual dysfunction following various durations of ischemia (30 minutes and 60 minutes) and reperfusion (3 days and 7 days) after UPOAO. Additionally, transcriptomics was utilized to investigate the transcriptional changes and elucidate changes in the pathophysiological process in the UPOAO model post-ischemia and reperfusion. Furthermore, the authors compared transcriptomic differences between the UPOAO model and other retinal ischemic-reperfusion models, including HIOP and UCCAO, and revealed unique pathological processes.

      Strengths:

      The UPOAO model represents a novel approach for studying retinal artery occlusion. The study is very comprehensive.

      Thank you for your positive feedback. We are delighted that you find the UPOAO model to be a novel and comprehensive approach to studying retinal artery occlusion. Your recognition of the depth and significance of our study is highly valuable and encourages us in our ongoing research.

      Weaknesses:

      Originally, some statements were incorrect and confusing. However, the authors have made clarifications in the revised manuscript to avoid confusion.

      We sincerely appreciate your meticulous review of the manuscript. We have thoroughly addressed the inaccuracies identified in the revised version. Additionally, we have polished the article to ensure improved readability. We apologize for any confusion caused by these inaccuracies. We genuinely appreciate your careful attention to detail, and your patience and meticulous suggestions have significantly improved the clarity and readability of our manuscript.


      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1:

      The revised manuscript has been significantly improved in clarity and readability. It has addressed all my questions convincingly.

      Thank you for your positive feedback. We are pleased to hear that the revisions have significantly improved the manuscript's clarity and readability, and that we have convincingly addressed all your questions. Your encouraging words are of great importance to us.

      Reviewer #2:

      The authors have revised the manuscript and/or provided answers to the majority of prior comments, which have helped to strengthen the work. However, addressing the following concerns is still necessary to further improve the manuscript.

      Thank you for acknowledging our revisions and the improvements made to the manuscript. We appreciate your continued feedback and will address the remaining concerns to further enhance the quality of our work.

      The quantification method of RGCs is described in detail in the response letter, but this detailed methodology was not included in the revised manuscript to clarify the quantification process.

      Thank you for your helpful recommendations. We have added the detailed methodology to the revised manuscript to clarify the quantification process (lines 180-188).

      The graphs in Fig. 3D b-wave and Fig. 3E-b wave are duplicated.

      We apologize for the error in our figures. We have corrected the mistake by replacing the duplicated image in Fig. 3E-b wave with the correct one (line 880). Your careful observation has been very helpful in improving our manuscript. Thank you for bringing this to our attention.

      The quantifications of the thickness of retinal layers in HE-stained sections in Figure 4 (IPL) and Response Figure 2 are incorrect. For the mouse retina, the thickness of the IPL is approximately 50 µm.

      Thank you for your meticulous review of the manuscript. We have rectified the inaccuracies in the quantification of retinal layer thickness in HE-stained sections in Figure 4, addressing the initial issue with the scale bar.

      We consulted with a microscope engineer and used a microscope microscale to calibrate the scale of the fluorescence microscope (BX63; Olympus, Tokyo, Japan) at the suggestion of the engineer.

      We re-quantified the thickness of all layers in the HE-stained retinal sections (line 902). The inner retina thickness in Figure 4 has been adjusted using the new scale bar, and the thickness of the outer retinal layers is now displayed in Author response image 1. However, the IPL thickness of the sham eye in the UPOAO model still does not match the commonly reported thickness of 50 µm. We therefore reviewed the literature from our laboratory, focusing on C57BL/6 mice from the same source, and found that the inner retina thickness (GCC+INL) in the HE-stained sections of the sham eye in the UPOAO model (around 80 µm) is consistent with previous findings (see Author response image 2) reported by Kaibao Ji and published in Experimental Eye Research in 2021 [1].

      We captured and analyzed the average retinal thickness of each layer over a long range of 200-1100 μm from the optic nerve head (see Author response image 3, highlighted by the green line). The field region has been corrected in the revised manuscript (line 232). Considering the significant variation in retinal thickness from the optic nerve to the periphery, we consulted literature on multi-point measurements of HE-stained retinas. The average thickness of the GCC layer in the control group was approximately 57 µm at 600 µm from the optic nerve head and about 48 µm at 1200 µm from the optic nerve head in the literature [2] (see Author response image 4). The GCC layer thickness of the sham eye in the UPOAO model is around 50 µm, in alignment with existing literature. In future studies, we will pay more attention to the issue of thickness averaging.

      We appreciate your thorough review and valuable feedback, which has enabled us to correct errors and enhance the accuracy of our research.

      Author response image 1.

      Thickness of OPL, ONL, IS/OS+RPE in HE staining. n=3; ns: no significance (p>0.05).

      Author response image 2.

      Cited from Ji, K., et al., Resveratrol attenuates retinal ganglion cell loss in a mouse model of retinal ischemia reperfusion injury via multiple pathways. Experimental Eye Research, 2021. 209: p. 108683.

      Author response image 3.

      Schematic diagram illustrating the selection of regions. The figure was captured using a fluorescence microscope (BX63; Olympus, Tokyo, Japan) under a 4X objective. Scale bar=500 µm.

      Author response image 4.

      Cited from Feng, L., et al., Ripa-56 protects retinal ganglion cells in glutamate-induced retinal excitotoxic model of glaucoma. Sci Rep, 2024. 14(1): p. 3834.

      There are some typos in the summary table. For example: 'Amplitudes of a-wave (0.3, 2.0, and 10.0 cd.s/m²)' should be 'Amplitudes of a-wave (0.3, 3.0, and 10.0 cd.s/m²)'; and 'IINL thickness' in HE should be 'INL thickness'.

      Thank you for pointing out the typos in the summary table (line 1073). We have corrected 'Amplitudes of a-wave (0.3, 2.0, and 10.0 cd.s/m²)' to 'Amplitudes of a-wave (0.3, 3.0, and 10.0 cd.s/m²)' and 'IINL thickness' to 'INL thickness'. Your attention to detail is greatly appreciated and has been very helpful in improving our manuscript.

      References

      (1) Ji, K., et al., Resveratrol attenuates retinal ganglion cell loss in a mouse model of retinal ischemia reperfusion injury via multiple pathways. Experimental Eye Research, 2021. 209: p. 108683.

      (2) Feng, L., et al., Ripa-56 protects retinal ganglion cells in glutamate-induced retinal excitotoxic model of glaucoma. Sci Rep, 2024. 14(1): p. 3834.

    2. eLife assessment

      The manuscript establishes a sophisticated mouse model for acute retinal artery occlusion (RAO) by combining unilateral pterygopalatine ophthalmic artery occlusion (UPOAO) with a silicone wire embolus and carotid artery ligation, generating ischemia-reperfusion injury upon removal of the embolus. This clinically relevant model is useful for studying the cellular and molecular mechanisms of RAO. The data overall are solid, presenting a novel tool for screening pathogenic genes and promoting further therapeutic research in RAO.

    3. Reviewer #1 (Public Review):

      Summary:

      Wang, Y. et al. used a silicone wire embolus to definitively and acutely clot the pterygopalatine ophthalmic artery in addition to carotid artery ligation to completely block blood supply to the mouse inner retina, which mimics clinical acute retinal artery occlusion. A detailed characterization of this mouse model determined the time course of inner retina degeneration and associated functional deficits, which closely mimic those of human patients. Whole retina transcriptome profiling and comparison revealed distinct features associated with ischemia, reperfusion, and different model mechanisms. Interestingly and importantly, this team found a sequential event including reperfusion-induced leukocyte infiltration from blood vessels, residual microglial activation, and neuroinflammation that may lead to neuronal cell death.

      Strengths:

      Clear demonstration of the surgery procedure with informative illustrations, images, and superb surgical videos.

      Two time points of ischemia and reperfusion were studied with convincing histological and in vivo data to demonstrate the time course of various changes in retinal neuronal cell survival, ERG functions, and inner/outer retina thickness.

      The transcriptome comparison among different retinal artery occlusion models provides informative evidence to differentiate these models.

      The potential applications of the in vivo retinal ischemia-reperfusion model and relevant readouts demonstrated by this study will certainly inspire further investigation of the dynamic morphological and functional changes of retinal neurons and glial cell responses during disease progression and before and after treatments.

      Weaknesses:

      The revised manuscript has been significantly improved in clarity and readability. It has addressed all my questions convincingly.

    4. Reviewer #2 (Public Review):

      Summary:

      The authors of this manuscript aim to develop a novel animal model to accurately simulate the retinal ischemic process in retinal artery occlusion (RAO). A unilateral pterygopalatine ophthalmic artery occlusion (UPOAO) mouse model was established using silicone wire embolization combined with carotid artery ligation. This manuscript provided data to show the changes of major classes of retinal neural cells and visual dysfunction following various durations of ischemia (30 minutes and 60 minutes) and reperfusion (3 days and 7 days) after UPOAO. Additionally, transcriptomics was utilized to investigate the transcriptional changes and elucidate changes in the pathophysiological process in the UPOAO model post-ischemia and reperfusion. Furthermore, the authors compared transcriptomic differences between the UPOAO model and other retinal ischemic-reperfusion models, including HIOP and UCCAO, and revealed unique pathological processes.

      Strengths:

      The UPOAO model represents a novel approach for studying retinal artery occlusion. The study is very comprehensive.

      Weaknesses:

      Originally, some statements were incorrect and confusing. However, the authors have made clarifications in the revised manuscript to avoid confusion.

    1. eLife assessment

      This study presents an important computational tool for the quantification of the cellular composition of human tissues profiled with ATAC-seq. The methodology and its application results on breast cancer tumor tissues are convincing. It advances existing methods by utilizing a comprehensive reference profile for major cancer-relevant cell types, compatible with a widely-used cell type deconvolution tool.

    2. Reviewer #1 (Public Review):

      Summary:

      Building upon their famous tool for the deconvolution of human transcriptomics data (EPIC), Gabriel et al. implemented a new methodology for the quantification of the cellular composition of samples profiled with Assay for Transposase-Accessible Chromatin sequencing (ATAC-seq). To build a signature for ATAC-seq deconvolution, they first created a compendium of ATAC-seq data and derived chromatin accessibility marker peaks and reference profiles for 12 cell types, encompassing immune cells, endothelial cells, and fibroblasts. Then, they coupled this novel signature with the EPIC deconvolution framework based on constrained least-square regression to derive a dedicated tool called EPIC-ATAC. The method was then assessed using real and pseudo-bulk ATAC-seq data from human peripheral blood mononuclear cells (PBMC) and, finally, applied to ATAC-seq data from breast cancer tumors to show it accurately quantifies their immune contexture.
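
      To make the "constrained least-square regression" step concrete, here is a minimal sketch; it is not the EPIC-ATAC implementation. It assumes the constraints are non-negative proportions that sum to at most 1 (the remainder standing for cells without a reference profile), and it solves the problem with plain projected gradient descent. The reference matrix, the function names, and all toy numbers are hypothetical.

```python
def project_capped_simplex(v):
    """Euclidean projection onto {x : x_i >= 0, sum(x) <= 1}."""
    clipped = [max(x, 0.0) for x in v]
    if sum(clipped) <= 1.0:
        return clipped
    # Sum exceeds 1: project onto the simplex {x >= 0, sum(x) = 1} instead,
    # using the standard sort-and-threshold rule.
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(sorted(v, reverse=True), start=1):
        cumulative += u_k
        candidate = (cumulative - 1.0) / k
        if u_k - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def deconvolve(reference, bulk, n_iter=2000):
    """Estimate cell-type proportions from a bulk profile.

    reference[p][c]: accessibility of marker peak p in cell type c (assumed input).
    bulk[p]: accessibility of marker peak p in the bulk sample.
    Minimizes 0.5 * ||reference @ props - bulk||^2 under the constraints above.
    """
    n_peaks, n_types = len(reference), len(reference[0])
    # The squared Frobenius norm bounds the gradient's Lipschitz constant,
    # so 1 / norm is a safe (if conservative) step size.
    step = 1.0 / sum(a * a for row in reference for a in row)
    props = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [
            sum(row[c] * props[c] for c in range(n_types)) - b
            for row, b in zip(reference, bulk)
        ]
        grad = [
            sum(reference[p][c] * residual[p] for p in range(n_peaks))
            for c in range(n_types)
        ]
        props = project_capped_simplex([x - step * g for x, g in zip(props, grad)])
    return props


# Toy example: 6 marker peaks, 3 cell types, 10% of cells uncharacterized.
reference = [
    [1.0, 0.1, 0.0],
    [0.9, 0.0, 0.1],
    [0.1, 1.0, 0.0],
    [0.0, 0.8, 0.1],
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 0.9],
]
true_props = [0.5, 0.3, 0.1]
bulk = [sum(a * p for a, p in zip(row, true_props)) for row in reference]
estimated = deconvolve(reference, bulk)
print([round(p, 3) for p in estimated])
```

      In this noise-free toy case the marker peaks are close to cell-type specific, so the estimate should recover the simulated proportions; real data would add noise, normalization, and peak weighting that this sketch leaves out.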

      Strengths:

      Overall, the work is of very high quality. The proposed tool is timely; its implementation, characterization, and validation are based on rigorous methodologies and result in robust estimates. The newly generated validation data and the code are publicly available and well-documented. Therefore, I believe this work and the associated resources will greatly benefit the scientific community.

      Weaknesses:

      In the benchmarking analysis, EPIC-ATAC was compared also to deconvolution methods that were originally developed for transcriptomics and not for ATAC-seq data. However, the authors described in detail the specific settings used to analyze this different data modality as robustly as possible, and they discussed possible limitations and ideas for future improvement.

    3. Reviewer #2 (Public Review):

      Summary:

      The manuscript expands the current bulk sequencing data deconvolution toolkit to include ATAC-seq. The EPIC-ATAC tool successfully predicts accurate proportions of immune cells in bulk tumour samples and EPIC-ATAC seems to perform well in benchmarking analyses. The authors achieve their aim of developing a new bulk ATAC-seq deconvolution tool.

      Strengths:

      The manuscript describes simple and understandable experiments to demonstrate the accuracy of EPIC-ATAC. They have also been incredibly thorough with their reference dataset collections and have been robust in their benchmarking endeavours and measured EPIC-ATAC against multiple datasets and tools. This tool will be valuable to the community it serves.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I praise the authors for their impressive work; all my major concerns have been addressed. I believe the revised article is much stronger and will surely raise the interest of a broad readership.

      I list in the following a few minor points that the authors might want to consider when finalizing the work:

      - It might be helpful for the reader to know if EPIC-ATAC can also be used on tissues different from tumors and PBMC/blood, and how (i.e. which reference should they use). 

      We thank the reviewer for this comment. In the discussion, we have clarified this point as follows:

      “Although not tested in this work, the TME marker peaks and profiles could be used on normal tissues where immune cells are expected to be present. In cases where specific cell types are expected in a sample but are not part of our list of reference profiles (e.g., neuronal cells in brain tumors or tissues other than human PBMCs or tumor samples), custom marker peaks and reference profiles can be provided to EPIC-ATAC to perform cell-type deconvolution. To this end, users should select markers that are cell-type specific, which could be identified using pairwise differential analysis performed on ATAC-Seq data from sorted cells from the populations of interest, following the approach developed in this work (Figure 1, see Code availability).”

      - In Fig 2 the numbers are hard to read as they are too close or overlapping.

      We have updated Figure 2 to avoid the overlap between the numbers.

      - In Fig 5 I see some squares around the sub-panels, but it might be due to the PDF compression. 

      We do not see these squares in Figure 5 but have seen such squares in Figure 1. We have checked that none of the PDF files uploaded to the eLife submission system contain these squares.

      - In the Introduction, some "deconvolution concepts" are introduced (e.g. Line 63-65), but not explained/illustrated. It might be helpful to refer to a "didactic" review. 

      We have added two references to these sentences in the introduction:

      “As described in more details elsewhere (Avila Cobos et al., 2018; Sturm et al., 2019), many of these tools model bulk data as a mixture of reference profiles either coming from purified cell populations or inferred from single-cell genomic data for each cell type.”

    1. eLife assessment

      This important study uses state-of-the-art, multi-region two-photon calcium imaging to characterize the statistics of functional connectivity between visual cortical neurons. While the evidence supporting the conclusions is solid, alternative interpretations of the results cannot be ruled out due to the limitations of calcium imaging, the use of noise correlations as a measure of functional connectivity and putative confounds of behavioural state modulations.

    2. Reviewer #1 (Public review):

      Summary:

      Using multi-region two-photon calcium imaging, the manuscript meticulously explores the structure of noise correlations (NCs) across mouse visual cortex and uses this information to make inferences about the organization of communication channels between primary visual cortex (V1) and higher visual areas (HVAs). Using visual responses to grating stimuli, the manuscript identifies 6 tuning groups of visual cortex neurons, and finds that NCs are highest among neurons belonging to the same tuning group whether or not they are found in the same cortical area. The NCs depend on the similarity of tuning of the neurons (their signal correlations) but are preserved across different stimulus sets - noise correlations recorded using drifting gratings are highly correlated with those measured using naturalistic videos. Based on these findings, the manuscript concludes that populations of neurons with high NCs constitute discrete communication channels that convey visual signals within and across cortical areas.
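
      For readers less familiar with the two quantities contrasted above, the following minimal sketch shows one common way to compute them; it is not the manuscript's analysis pipeline. It assumes responses are stored per stimulus and per trial, and that noise correlations are computed on residuals pooled across stimuli after subtracting each stimulus mean (other variants z-score or average per stimulus). All names and toy values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def signal_correlation(resp_a, resp_b):
    """Correlation of trial-averaged responses across stimuli (tuning similarity)."""
    mean_a = [sum(trials) / len(trials) for trials in resp_a]
    mean_b = [sum(trials) / len(trials) for trials in resp_b]
    return pearson(mean_a, mean_b)


def noise_correlation(resp_a, resp_b):
    """Correlation of trial-to-trial residuals after removing each stimulus mean."""
    resid_a, resid_b = [], []
    for trials_a, trials_b in zip(resp_a, resp_b):
        mean_a = sum(trials_a) / len(trials_a)
        mean_b = sum(trials_b) / len(trials_b)
        resid_a.extend(r - mean_a for r in trials_a)
        resid_b.extend(r - mean_b for r in trials_b)
    return pearson(resid_a, resid_b)


# Toy data: resp[s][t] is the response on trial t of stimulus s.
neuron_a = [[1.0, 3.0], [5.0, 7.0], [2.0, 4.0]]
neuron_b = [[4.0, 2.0], [12.0, 10.0], [7.0, 5.0]]
print(signal_correlation(neuron_a, neuron_b), noise_correlation(neuron_a, neuron_b))
```

      The toy pair is built so that the two neurons share tuning (high signal correlation) while their trial-to-trial fluctuations are opposite, which illustrates that the two measures are computed from different parts of the response and need not agree.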

      Strengths:

      Experiments and analyses are conducted to a high standard and the robustness of noise correlation measurements is carefully validated. To control for potential influences of behaviour-related top-down modulation of noise correlations, the manuscript uses measurements of pupil dynamics as a proxy for behavioural state and shows that this top-down modulation cannot explain the stability of noise correlations across stimuli.

      Weaknesses:

      The interpretation of noise correlation measurements as a proxy for network connectivity is fraught with challenges. While the data clearly indicate the existence of distributed functional ensembles, the notion of communication channels implies the existence of direct anatomical connections between them, which noise correlations cannot measure.

      The traditional view of noise correlations is that they reflect direct connectivity or shared inputs between neurons. While it is valid in a broad sense, noise correlations may reflect shared top-down input as well as local or feedforward connectivity. This is particularly important since mouse cortical neurons are strongly modulated by spontaneous behavior (e.g. Stringer et al, Science, 2019). Therefore, noise correlation between a pair of neurons may reflect whether they are similarly modulated by behavioral state and overt spontaneous behaviors. Consequently, noise correlation alone cannot determine whether neurons belong to discrete communication channels.

    3. Reviewer #2 (Public review):

      Summary:

      This groundbreaking study characterizes the structure of activity correlations over millimeter scale in the mouse cortex with the goal of identifying visual channels, specialized conduits of visual information that show preferential connectivity. Examining the statistical structure of visual activity of L2/3 neurons, the study finds pairs of neurons located near each other or across distances of hundreds of micrometers with significantly correlated activity in response to visual stimuli. These highly correlated pairs have closely related visual tuning sharing orientation and/or spatial and/or temporal preference as would be expected from dedicated visual channels with specific connectivity.

      Strengths:

      The study presents best-in-class mesoscopic-scale 2-photon recordings from neuronal populations in pairs of visual areas (V1-LM, V1-PM, V1-AL, V1-LI). The study employs diverse visual stimuli that capture some of the specialization and heterogeneity of neuronal tuning in mouse visual areas. The rigorous data quantification takes into consideration functional cell groups as well as other variables that influence trial-to-trial correlations (similarity of tuning, neuronal distance, receptive field overlap, behavioral state). The paper demonstrates the robustness of the activity clustering analysis and of the activity correlation measurements. The paper shows convincingly that the correlation structure observed with grating stimuli is present in the responses to naturalistic stimuli. A simple simulation is provided that suggests that recurrent connectivity is required for the stimulus invariance of the results. The paper is well written and conceptually clear. The figures are beautiful and clear. The arguments are well laid out and the claims appear in large part supported by the data and analysis results (but see weaknesses).

      Weaknesses:

      An inherent limitation of the approach is that it cannot reveal which anatomical connectivity patterns are responsible for observed network structure. A methodological issue that does not seem completely addressed is whether the calcium imaging measurements with their limited sensitivity amplify the apparent dependence of noise correlations on the similarity of tuning. Although the paper shows that noise correlation measurements are robust to changes in firing rates / missing spikes, the effects of receptive field tuning dissimilarity are not addressed directly. The calcium responses of mouse visual cortical neurons are sharply tuned. Neurons with dissimilar receptive fields may show too little overlap in their estimated firing rates to infer noise correlations, which could lead to underestimation of correlations across groups of dissimilar neurons.

    4. Reviewer #3 (Public review):

      Summary:

      Yu et al. harness the capabilities of mesoscopic 2P imaging to record simultaneously from populations of neurons in several visual cortical areas and measure their correlated variability. They first divide neurons into 65 classes depending on their tuning to moving gratings. They found that pairs of neurons of the same tuning class show higher noise correlations (NCs) both within and across cortical areas. Based on these observations and a model, they conclude that visual information is broadcast across areas through multiple, discrete channels with little mixing across them.

      NCs can reflect indirect or direct connectivity, or shared afferents between pairs of neurons, potentially providing insight on network organization. While NCs have been comprehensively studied in neuron pairs of the same area, the structure of these correlations across areas is much less known. Thus, the manuscript presents novel insights on the correlation structure of visual responses across multiple areas.

      Strengths:

      The measurements of shared variability across multiple areas are novel. The results are mostly well presented and many thorough controls for some metrics are included.

      Weaknesses:

      I have concerns that the observed large intra class/group NCs might not reflect connectivity but shared behaviorally driven multiplicative gain modulations of sensory evoked responses. In this case, the NC structure might not be due to the presence of discrete, multiple channels broadcasting visual information as concluded. I also find that the claim of multiple discrete broadcasting channels needs more support before discarding the alternative hypothesis that a continuum of tuning similarity explains the large NCs observed in groups of neurons.

      Specifically:

      Major concerns:

      (1) Multiplicative gain modulation underlying correlated noise between similarly tuned neurons

      (1a) The conclusion that visual information is broadcast in discrete channels across visual areas relies on interpreting NCs as reflecting direct or indirect connectivity between pairs, or common inputs. However, a large fraction of the activity in the mouse visual system is known to reflect spontaneous and instructed movements, including locomotion and face movements, among others. Running activity and face movements are among the largest contributors to visual cortex activity and exert a multiplicative gain on sensory-evoked responses (Niell et al, Stringer et al, among others). Thus, trial-by-trial fluctuations of behavioral state would result in gain modulations that, due to their multiplicative nature, would result in more shared variability in cotuned neurons, as multiplication affects neurons that are responding to the stimulus more than those that are not responding (see Lin et al, Neuron 2015 for a similar point).

      In the new version of the manuscript, behavioral modulations are explicitly considered in Figure S8. New analyses show that most of the variance of the neuronal responses is driven by the stimulus, rather than by behavioral variables. However, the new analyses still do not address whether the shared noise correlation in cotuned neurons is also independent of behavioral modulations.

      As behavioral modulations are not considered, this confound affects the conclusions, including the conclusion that activity is communicated unmixed across areas (results in Figure 4), as it would result in larger NCs the more similar the tuning of the neurons is, independently of any connectivity feature. It seems that this alternative hypothesis can explain the results without the need for discrete broadcasting channels or any particular network architecture and should be addressed to support the main claims.
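
      The gain-modulation argument in point (1) can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the authors' model or the reviewer's analysis: the tuning values, the Gaussian gain distribution, the Poisson spiking, and the function name `simulate_gain_nc` are all hypothetical. A single gain value per trial multiplies every neuron's stimulus-driven rate, and noise correlations are computed per stimulus and then averaged.

```python
import numpy as np

def simulate_gain_nc(n_trials=5000, gain_sd=0.3, seed=0):
    """Average noise-correlation matrix under a shared multiplicative gain."""
    rng = np.random.default_rng(seed)
    # Hypothetical mean spike counts (rows: neurons, columns: stimuli).
    # Neurons 0 and 1 are co-tuned; neuron 2 prefers the other stimulus.
    tuning = np.array([[10.0, 0.5],
                       [9.0, 0.5],
                       [0.5, 10.0]])
    n_neurons, n_stim = tuning.shape
    nc = np.zeros((n_neurons, n_neurons))
    for s in range(n_stim):
        # One gain value per trial, shared by all neurons (clipped at zero).
        gain = np.clip(1.0 + gain_sd * rng.standard_normal(n_trials), 0.0, None)
        rates = gain[:, None] * tuning[:, s][None, :]
        counts = rng.poisson(rates)
        # Noise correlation: trial-to-trial correlation at a fixed stimulus.
        nc += np.corrcoef(counts, rowvar=False)
    return nc / n_stim

nc = simulate_gain_nc()
print("co-tuned pair NC:", round(nc[0, 1], 3))
print("differently tuned pair NC:", round(nc[0, 2], 3))
```

      In this toy setting the co-tuned pair ends up with the larger average noise correlation even though no connectivity is modeled, which is the confound the comment describes.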

      (2) Discrete vs continuous communication channels

      (2a) One of the authors' main claims is that the mouse cortical network consists of discrete communication channels, as stated in the title of the paper. This discreteness is based on an unbiased clustering approach on the tuning of neurons, followed by a manual grouping into six categories with relation to the stimulus space. I believe there are several problems with this claim. First, this clustering approach is inherently trying to group neurons and discretise neural populations. To make the claim that there are 'discrete communication channels' the null hypothesis should be a continuous model. An explicit test in favor of a discrete model is lacking, i.e. are the results better explained using discrete groups vs. when considering only tuning similarity? Second, the fact that 65 classes are recovered (out of 72 conditions) and that manual clustering is necessary to arrive at the six categories is far from convincing that we need to think about categorically different subsets of neurons. That we should think of discrete communication channels is especially surprising in this context as the relevant stimulus parameter axes seem inherently continuous: spatial and temporal frequency. It is hard to motivate the biological need for a discretely organized cortical network to process these continuous input spaces.

      Finally, as stated in point 1, the larger NCs observed within groups than across groups might be due to the multiplicative gain of state modulations, due to the larger tuning similarity of the neurons within a class or group.

    5. Author response:

      The following is the authors’ response to the original reviews.

      General Response

      We are grateful for the constructive comments from reviewers and the editor.

      The main point converged on a potential alternative interpretation that top-down modulation to the visual cortex may be contributing to the NC connectivity we observed. For this revision, we address that point with new analysis in Fig. S8 and Fig. 6. These results indicate that top-down modulation does not account for the observed NC connectivity.

      We performed the following analyses.

      (1) In a subset of experiments, we recorded pupil dynamics while the mice were engaged in a passive visual stimulation experiment (Fig. S8A). We found that pupil dynamics, which indicate the arousal state of the animal, explained only 3% of the variance of neural dynamics. This is significantly smaller than the contribution of sensory stimuli and the activity of the surrounding neuronal population (Fig. S8B). In particular, the visual stimulus itself typically accounted for 10-fold more variance than pupil dynamics (Fig. S8C). This suggests that the population neural activity is highly stimulus-driven and that a large portion of functional connectivity is independent of top-down modulation. In addition, after subtracting the pupil-modulated portion from the neural activity, the cross-stimulus stability of the NC was preserved (Fig. S8D).
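
      The kind of analysis described in (1) can be sketched as follows. This is a hedged illustration, not the model used for Fig. S8: it assumes a single pupil trace, an ordinary least-squares fit with an intercept, and synthetic data, and the function name `regress_out_pupil` is hypothetical. In a real dataset the residual correlations would be computed separately for each stimulus condition.

```python
import numpy as np

def regress_out_pupil(activity, pupil):
    """Least-squares fit of each neuron's activity on pupil size.

    activity: array (n_samples, n_neurons); pupil: array (n_samples,).
    Returns the fraction of variance explained per neuron and the residuals.
    """
    activity = np.asarray(activity, dtype=float)
    pupil = np.asarray(pupil, dtype=float)
    design = np.column_stack([np.ones_like(pupil), pupil])  # intercept + pupil
    beta, *_ = np.linalg.lstsq(design, activity, rcond=None)
    residual = activity - design @ beta
    r2 = 1.0 - residual.var(axis=0) / activity.var(axis=0)
    return r2, residual

# Synthetic example: weak pupil coupling plus independent noise.
rng = np.random.default_rng(0)
pupil = rng.standard_normal(1000)
activity = 0.2 * pupil[:, None] + rng.standard_normal((1000, 4))
r2, residual = regress_out_pupil(activity, pupil)
nc_residual = np.corrcoef(residual, rowvar=False)  # correlations after removal
```

      Comparing correlations computed on `activity` with those computed on `residual` is one simple way to ask how much of the pairwise structure survives removal of the pupil-related component.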

      We note that the contribution from pupil dynamics to neural activity in this study is smaller than what was observed in an earlier study (Stringer et al. 2019 Science). This may be because mice were in quiet wakefulness in the current study, while mice were in spontaneous locomotion in the earlier study. We discuss this discrepancy in the main text, in the subsection “Functional connectivity is not explained by the arousal state”.

      (2) We performed network simulations with top-down input (Fig. 6F-H). With multidimensional top-down input comparable to the experimental data, recurrent connections within the network are necessary to generate cross-stimulus stable NC connectivity (Fig. 6G). Only when the contribution from the top-down input was increased to more than 1/3 of the contribution from the stimulus could the cross-stimulus NC connectivity be generated by the top-down modulation (Fig. 6H). Thus, this analysis provides further evidence that top-down modulation was not playing a major role in the NC connectivity we observed.

      These new results support our original conclusion that network connectivity is the principal mechanism underlying the stability of functional networks.

      Public Reviews:

      Reviewer #1 (Public Review):

      Using multi-region two-photon calcium imaging, the manuscript meticulously explores the structure of noise correlations (NCs) across the mouse visual cortex and uses this information to make inferences about the organization of communication channels between primary visual cortex (V1) and higher visual areas (HVAs). Using visual responses to grating stimuli, the manuscript identifies 6 tuning groups of visual cortex neurons and finds that NCs are highest among neurons belonging to the same tuning group whether or not they are found in the same cortical area. The NCs depend on the similarity of tuning of the neurons (their signal correlations) but are preserved across different stimulus sets - noise correlations recorded using drifting gratings are highly correlated with those measured using naturalistic videos. Based on these findings, the manuscript concludes that populations of neurons with high NCs constitute discrete communication channels that convey visual signals within and across cortical areas.

      Experiments and analyses are conducted to a high standard and the robustness of noise correlation measurements is carefully validated. However, the interpretation of noise correlation measurements as a proxy for network connectivity is fraught with challenges. While the data clearly indicates the existence of distributed functional ensembles, the notion of communication channels implies the existence of direct anatomical connections between them, which noise correlations cannot measure.

      The traditional view of noise correlations is that they reflect direct connectivity or shared inputs between neurons. While it is valid in a broad sense, noise correlations may reflect shared top-down input as well as local or feedforward connectivity. This is particularly important since mouse cortical neurons are strongly modulated by spontaneous behavior (e.g. Stringer et al, Science, 2019). Therefore, noise correlation between a pair of neurons may reflect whether they are similarly modulated by behavioral state and overt spontaneous behaviors. Consequently, noise correlation alone cannot determine whether neurons belong to discrete communication channels.

      Behavioral modulation can influence the gain of sensory-evoked responses (Niell and Stryker, Neuron, 2010). This can explain why signal correlation is one of the best predictors of noise correlations as reported in the manuscript. A pair of neurons that are similarly gain-modulated by spontaneous behavior (e.g. both active during whisking or locomotion) will have higher noise correlations if they respond to similar stimuli. Top-down modulation by the behavioral state is also consistent with the stability of noise correlations across stimuli. Therefore, it is important to determine to what extent noise correlations can be explained by shared behavioral modulation.

      We thank the reviewer for the constructive and positive feedback on our study.

      The reviewer acknowledged the quality of our experiments and analysis and stated a concern that the noise correlation can be explained by top-down modulation. We have addressed this concern carefully in the revision; please see the General Response above.

      Reviewer #2 (Public Review):

      Summary:

      This groundbreaking study characterizes the structure of activity correlations over a millimeter scale in the mouse cortex with the goal of identifying visual channels, specialized conduits of visual information that show preferential connectivity. Examining the statistical structure of the visual activity of L2/3 neurons, the study finds pairs of neurons located near each other or across distances of hundreds of micrometers with significantly correlated activity in response to visual stimulation. These highly correlated pairs have closely related visual tuning sharing orientation and/or spatial and/or temporal preference as would be expected from dedicated visual channels with specific connectivity.

      Strengths:

      The study presents best-in-class mesoscopic-scale 2-photon recordings from neuronal populations in pairs of visual areas (V1-LM, V1-PM, V1-AL, V1-LI). The study employs diverse visual stimuli that capture some of the specialization and heterogeneity of neuronal tuning in mouse visual areas. The rigorous data quantification takes into consideration functional cell groups as well as other variables that influence trial-to-trial correlations (similarity of tuning, neuronal distance, receptive field overlap). The paper convincingly demonstrates the robustness of the clustering analysis and of the activity correlation measurements. The calcium imaging results convincingly show that noise correlations are correlated across visual stimuli and are strongest within cell classes which could reflect distributed visual channels. A simple simulation is provided that suggests that recurrent connectivity is required for the stimulus invariance of the results. The paper is well-written and conceptually clear. The figures are beautiful and clear. The arguments are well laid out and the claims appear in large part supported by the data and analysis results (but see weaknesses).

      Weaknesses:

      An inherent limitation of the approach is that it cannot reveal which anatomical connectivity patterns are responsible for observed network structure. The modeling results presented, however, suggest interestingly that a simple feedforward architecture may not account for fundamental characteristics of the data. A limitation of the study is the lack of a behavioral task. The paper shows nicely that the correlation structure generalizes across visual stimuli. However, the correlation structure could differ widely when animals are actively responding to visual stimuli. I do think that, because of the complexity involved, a characterization of correlations during a visual task is beyond the scope of the current study.

      An important question that does not seem addressed (but if it is addressed indirectly, I could be mistaken) is the extent to which it is possible to obtain reliable measurements of noise correlation from cell pairs that have widely distinct tuning. L2/3 activity in the visual cortex is quite sparse. The cell groups laid out in Figure S2 have very sharp tuning. Cells whose tuning does not overlap may not yield significant trial-to-trial correlations because they do not show significant responses to the same set of stimuli, if ever. Could this bias the noise correlation measurements or explain some of the dependence of the observed noise correlations on signal correlations/similarity of tuning? Could the variable overlap in the responses to visual stimuli explain the dependence of correlations on cell classes and groups?

      With electrophysiology, this issue is less of a problem because many if not most neurons will show some activity in response to suboptimal stimuli. For the present study which uses calcium imaging together with deconvolution, some of the activity may not be visible to the experimenters. The correlation measure is shown to be robust to changes in firing rates due to missing spikes. However, the degree of overlap of responses between cell pairs and their consequences for measures of noise correlations are not explored.

      Beyond that comment, the remaining issues are relatively minor issues related to manuscript text, figures, and statistical analyses. There are typos left in the manuscript. Some of the methodological details and results of statistical testing also seem to be missing. Some of the visuals and analyses chosen to examine the data (e.g., box plots) may not be the most effective in highlighting differences across groups. If addressed, this would make a very strong paper.

      We thank the reviewer for acknowledging the contributions of our study.

      We agree with the reviewer that future studies on behaviorally engaged animals are necessary. Although we also agree with the reviewer that behavior studies are outside the scope of the current manuscript, we have included additional analysis and discussion on whether and how top-down input would affect the NC connectivity in the revision. Please see the General Response above.

      Reviewer #3 (Public Review):

      Summary:

      Yu et al harness the capabilities of mesoscopic 2P imaging to record simultaneously from populations of neurons in several visual cortical areas and measure their correlated variability. They first divide neurons into 65 classes depending on their tuning to moving gratings. They found that pairs of neurons of the same tuning class show higher noise correlations (NCs) both within and across cortical areas. Based on these observations and a model they conclude that visual information is broadcast across areas through multiple, discrete channels with little mixing across them.

      NCs can reflect indirect or direct connectivity, or shared afferents between pairs of neurons, potentially providing insight on network organization. While NCs have been comprehensively studied in neuron pairs of the same area, the structure of these correlations across areas is much less known. Thus, the manuscript presents novel insights into the correlation structure of visual responses across multiple areas.

      Strengths:

      The study uses state-of-the-art mesoscopic two-photon imaging.

      The measurements of shared variability across multiple areas are novel.

      The results are mostly well presented and many thorough controls for some metrics are included.

      Weaknesses:

      I have concerns that the observed large intra-class/group NCs might not reflect connectivity but shared behaviorally driven multiplicative gain modulations of sensory-evoked responses. In this case, the NC structure might not be due to the presence of discrete, multiple channels broadcasting visual information as concluded. I also find that the claim of multiple discrete broadcasting channels needs more support before discarding the alternative hypothesis that a continuum of tuning similarity explains the large NCs observed in groups of neurons.

      Specifically:

      Major concerns:

      (1) Multiplicative gain modulation underlying correlated noise between similarly tuned neurons

      (1a) The conclusion that visual information is broadcast in discrete channels across visual areas relies on interpreting NCs as reflecting direct or indirect connectivity between pairs, or common inputs. However, a large fraction of the activity in the mouse visual system is known to reflect spontaneous and instructed movements, including locomotion and face movements, among others. Running activity and face movements are some of the largest contributors to visual cortex activity and exert a multiplicative gain on sensory-evoked responses (Niell et al, Stringer et al, among others). Thus, trial-by-trial fluctuations of behavioral state would result in gain modulations that, due to their multiplicative nature, would result in more shared variability in cotuned neurons, as multiplication affects neurons that are responding to the stimulus more than those that are not responding (see Lin et al, Neuron 2015 for a similar point).

      As behavioral modulations are not considered, this confound affects most of the conclusions of the manuscript, as it would result in larger NCs the more similar the tuning of the neurons is, independently of any connectivity feature. It seems that this alternative hypothesis can explain most of the results without the need for discrete broadcasting channels or any particular network architecture and should be addressed to support its main claims.

      (1b) In Figure 5 the observations are interpreted as evidence for NCs reflecting features of the network architecture, as NCs measured using gratings predicted NCs to naturalistic videos. However, it seems from Figure 5A that signal correlations (SCs) from gratings had non-zero correlations with SCs during naturalistic videos (is this the case?). Thus, neurons that are cotuned to gratings might also tend to be coactivated during the presentation of videos. In this case, they are also expected to be susceptible to shared behaviorally driven fluctuations, independently of any circuit architecture as explained before. This alternative interpretation should be addressed before concluding that these measurements reflect connectivity features.

      We thank the reviewer for acknowledging the contributions of our study.

      The reviewer suggested that gain modulation might be interfering with the interpretation of the NC connectivity. We have addressed this issue in the General Response above.

      Here, we will elaborate on one additional analysis we performed, in case it might be of interest. We carried out multiplicative gain modeling by implementing an established method (Goris et al. 2014 Nat Neurosci) on our dataset. We were able to perform the modeling work successfully. However, we found that it is not a suitable model for explaining the current dataset because the multiplicative gain induced a negative correlation. This seemed odd but can be explained. First, top-down input is not purely multiplicative but rather both additive and multiplicative. Second, the top-down modulation is high dimensional. Third, the firing rate of layer 2/3 mouse visual cortex neurons is lower than the firing rates for non-human primate recordings used in the development of the method (Goris et al. 2014 Nat Neurosci). Thus, we did not pursue the model further. We just mention it here in case the outcome might be of interest to fellow researchers.

      (2) Discrete vs continuous communication channels

      (2a) One of the authors' main claims is that the mouse cortical network consists of discrete communication channels. This discreteness is based on an unbiased clustering approach to the tuning of neurons, followed by a manual grouping into six categories in relation to the stimulus space. I believe there are several problems with this claim. First, this clustering approach is inherently trying to group neurons and discretise neural populations. To make the claim that there are 'discrete communication channels' the null hypothesis should be a continuous model. An explicit test in favor of a discrete model is lacking, i.e. are the results better explained using discrete groups vs. when considering only tuning similarity? Second, the fact that 65 classes are recovered (out of 72 conditions) and that manual clustering is necessary to arrive at the six categories is far from convincing that we need to think about categorically different subsets of neurons. That we should think of discrete communication channels is especially surprising in this context as the relevant stimulus parameter axes seem inherently continuous: spatial and temporal frequency. It is hard to motivate the biological need for a discretely organized cortical network to process these continuous input spaces.

      (2b) Consequently, I feel the support for discrete vs continuous selective communication is rather inconclusive. Following the authors' claims, it would be important to establish whether belonging to the same group, rather than tuning similarity, is the defining feature for showing large NCs.

      Thanks for pointing this out so that we can clarify.

      We did not mean to argue that the tuning of neurons is discrete. Our conclusions are not dependent on asserting a particular degree of discreteness. We performed GMM clustering to label neurons with an identity so that we could analyze the NC connectivity structure with a degree of granularity supported by the data. Our analysis suggested that communication happens within a class, rather than through mixed classes. We realized that using the term “discrete” may be confusing. In the revised text we used the term “unmixed” or “non-mixing” instead to emphasize that the communication happens between neurons belonging to the same tuning cluster, or class. 

      However, we do see how the question of discreteness among classes might be interesting to readers. To provide further information, we have included a new Fig. S2 to visualize the GMM classes using t-SNE embedding.

      Finally, as stated in point 1, the larger NCs observed within groups than across groups might be due to the multiplicative gain of state modulations, due to the larger tuning similarity of the neurons within a class or group.

      We have addressed this issue in the General Response above and the response to comment (1).

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      A general recommendation discussed with the reviewers is to make use of behavioural recording to assess whether shared behaviourally driven modulations can explain the observed relation between SC and NC, independently of the network architecture. Alternatively, a simulation or model might also address this point as well as the possibility that the relation of SC and NC might be also independent of network architecture given the sparseness of the sensory responses in L2/3.

      We have addressed this in the General Response above.

      Broadly speaking, inferring network architecture based on NCs is extremely challenging. Consequently, the study could also be substantially improved by reframing the results in terms of distributed co-active ensembles without insinuation of direct anatomical connectivity between them.

      We agree that inferring network architecture based on NCs is challenging. The current study has revealed some principles of functional networks measured by NCs, and we showed that cross-stimulus NC connectivity provides effective constraints to network modeling. We are explicit about the nature of NCs in the manuscript. For example, in the Abstract, we write “to measure correlated variability (i.e., noise correlations, NCs)”, and in the Introduction, we write “NCs are due to connectivity (direct or indirect connectivity between the neurons, and/or shared input)”. We are following conventions in the field (e.g., Sporns 2016; Cohen and Kohn 2011).

      Notice also that the abstract or title should make clear that the study was made in mice.

      Sorry for the confusion, we now clearly state the study was carried out in mice in the Abstract and Introduction.

      Reviewer #1 (Recommendations For The Authors):

      The manuscript presents a meticulous characterization of noise correlations in the visual cortical network. However, as I outline in the public review, I think the use of noise correlations to infer communication channels is problematic and I urge the authors to carefully consider this terminology. Language such as "strength of connections" (Figure 4D) should be avoided.

      We now state in the figure legend that the plot in Fig. 4D shows the average NC value.

      My general suggestion to the authors, which primarily concerns the interpretation of analyses in Figures 4-6, is to consider the possible impact of shared top-down modulation on noise correlations. If behavioral data was recorded simultaneously (e.g. using cameras to record face and body movements), behavioral modulation should be considered alongside signal correlation as a possible factor influencing NCs.

      We have addressed this issue in the General Response above.

      I may be misunderstanding the analysis in Figure 4C but it appears circular. If the fraction of neurons belonging to a particular tuning group is larger, then the number of in-group high NC pairs will be higher for that group even if high NC pairs are distributed randomly. Can you please clarify? I frankly do not understand the analysis in Figure 4D and it is unclear to me how the analyses in Figure 4C-D address the hypotheses depicted in the cartoons.

      Sorry for the confusion, we have clarified this in the Fig. 4 legend.

      Each HVA has an SFTF bias (Fig. 1E,F; Marshel et al., 2011; Andermann et al., 2011; Vries et al., 2020). Each red marker on the graph in Fig. 4C is a single V1-HVA pair (blue markers are within an area) for a particular SFTF group (Fig. 1). The x-axis indicates the number of high NC pairs in the SFTF group in the V1-HVA pair divided by the total number of high NC pairs for that V1-HVA pair (summed over all SFTF groups). The trend is that for HVAs with a bias towards a particular SFTF group, there are also more high NC pairs in that SFTF group, and thus it is consistent with the model on the right side. This is not circular because it is possible to have an SFTF bias in an HVA and have uniformly low NCs. The reviewer is correct that a random distribution of high NCs could give a similar effect, which is still consistent with the model: that the number of high NC pairs (and not their specific magnitudes) can account for SFTF biases in HVAs.

      To contrast with that model, we tested whether the average NC value for each tuning group varies. That is, can a small number of very high NCs account for SFTF biases in HVAs? That is what is examined in Fig. 4D. We found that the average NC value does not account for the SFTF biases. Thus, the SFTF biases were not related to the modulation in NC (i.e., functional connection strength). 
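
      The two quantities contrasted above can be sketched in a few lines. This is an illustration under assumptions, not the code behind Fig. 4C-D: the threshold that defines a "high NC" pair, the choice to average only over high-NC pairs, and the function name `high_nc_summary` are all hypothetical.

```python
import numpy as np

def high_nc_summary(nc_values, group_labels, n_groups, threshold):
    """Per-group share of high-NC pairs and per-group mean NC of those pairs.

    nc_values: NC of each in-group neuron pair for one V1-HVA pair.
    group_labels: integer tuning-group index (0..n_groups-1) of each pair.
    threshold: hypothetical cutoff above which a pair counts as "high NC".
    """
    nc_values = np.asarray(nc_values, dtype=float)
    group_labels = np.asarray(group_labels, dtype=int)
    high = nc_values > threshold
    counts = np.bincount(group_labels[high], minlength=n_groups)
    total = counts.sum()
    # Count-based measure: share of all high-NC pairs that fall in each group.
    fraction = counts / total if total > 0 else np.zeros(n_groups)
    # Magnitude-based measure: mean NC of the high-NC pairs in each group.
    mean_nc = np.array([
        nc_values[high & (group_labels == g)].mean() if counts[g] > 0 else np.nan
        for g in range(n_groups)
    ])
    return fraction, mean_nc

fraction, mean_nc = high_nc_summary(
    nc_values=[0.5, 0.6, 0.1, 0.7], group_labels=[0, 0, 1, 1],
    n_groups=2, threshold=0.3,
)
```

      Separating the count-based measure from the magnitude-based one makes explicit that a group can hold a large share of the high-NC pairs without those pairs having a larger average NC.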

      I found the discussion section quite odd and did not understand the relevance of the discussion of the coefficient of variation of various quantities to the present manuscript. It would be more useful to discuss the limitations and possible interpretations of noise correlation measurements in more detail.

      We have revised the discussion section to focus on interpreting the results of the current study and comparing them with those of previous studies.

      Figure 3B: please indicate what the different colors mean - I assume it is the same as Figure 3A but it is unclear.

      We added text to the legend for clarification.

      Typos: Page 7: "direct/indirection wiring", Page 11: "pooled over all texted areas"

      We have fixed the typos.

      Reviewer #2 (Recommendations For The Authors):

      The significance of the results feels like it could be articulated better. The main conclusion is that V1 to HVA connections avoid mixing channels and send distinctly tuned information along distinct channels - a more explicit description of what this functional network understanding adds would be useful to the reader.

      Thanks for the suggestion. We have edited the introduction section and the discussion section to make the take-home message more clear.

      Previous studies with anatomical data already indicate distinctly tuned channels - several of which the authors cite - although inconsistently:

      • Kim et al 2018 https://doi.org/10.1016/j.neuron.2018.10.023

      • Glickfeld et al., 2013 (cited)

      • Han et al., 2022 (cited)

      • Han and Bonin 2023 (cited)

      Thanks for the suggestion, we now cite the Kim et al. 2018 paper.

      I think the information you provide is valuable, but the value should be more clearly spelled out. This section from the end of the discussion, for example, feels like it abdicates that responsibility:

      "In summary, mesoscale two-photon imaging techniques open up the window of cellular-resolution functional connectivity at the system level. How to make use of the knowledge of functional connectivity remains unclear, given that functional connectivity provides important constraints on population neuron behavior."

      A discussion of how the results relate to previous studies and a section on the limitations of the study seems warranted.

      Thanks for the suggestion, we have extensively edited the discussion section to make the take-home message clear and discuss prior studies and limitations of the present study.

      Details:

      Analyses or simulations showing that the dependency of correlations on similarity of tuning is not an artifact of how the data was acquired is in my mind missing and if that is the case it is crucial that this be addressed.

      At each step of data analysis, we performed control analyses to assess the fidelity of the conclusion, for example on the spike train inference (Fig. S4), GMM clustering (Fig. S1), and noise correlation analysis (Figs. 2, S5).

      None of the statistical testing seems to use animals as experimental units (instead of neurons). This could over-inflate the significance of the results. Wherever applicable and possible, I would recommend using hierarchical bootstrap for testing or showing that the differences observed are reproducible across animals.

      We analyzed the tuning selectivity of HVAs (Fig. 1F) using experimental units, rather than neurons. It is very difficult to observe all tuning classes in each experiment, so pooling neurons across animals is necessary for much of the analysis. We do take care to avoid overstating statistical results, and we show the data points in most figures to give the reader an impression of the distributions.
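
      For readers unfamiliar with the hierarchical bootstrap recommended in the comment above, a minimal sketch follows. It is not part of the authors' analysis. It assumes two levels (animals, then measurements within an animal) and a mean as the statistic, and the function name `hierarchical_bootstrap_mean` and the example values are hypothetical.

```python
import numpy as np

def hierarchical_bootstrap_mean(values_by_animal, n_boot=2000, seed=0):
    """Bootstrap distribution of the pooled mean, resampling animals first.

    values_by_animal: list of 1-D arrays, one array of measurements per animal.
    """
    rng = np.random.default_rng(seed)
    animals = [np.asarray(v, dtype=float) for v in values_by_animal]
    n_animals = len(animals)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        # Level 1: resample animals with replacement.
        chosen = rng.integers(0, n_animals, size=n_animals)
        # Level 2: resample measurements within each chosen animal.
        samples = [rng.choice(animals[i], size=animals[i].size, replace=True)
                   for i in chosen]
        boot[b] = np.concatenate(samples).mean()
    return boot

# Hypothetical NC values from three animals.
data = [[0.10, 0.12, 0.08], [0.20, 0.18, 0.22, 0.19], [0.05, 0.07]]
boot = hierarchical_bootstrap_mean(data, n_boot=500)
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
```

      Because animals are resampled before measurements, the resulting interval reflects between-animal variability rather than treating every neuron or pair as an independent unit.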

      Page 2. "The number of neurons belonged to the six tuning groups combined: V1, 5373; LM, 1316; AL, 656; PM, 491; LI, 334." Yet the total recorded number of neurons is 17,990. How neurons were excluded is mentioned in Methods but it should be stated more explicitly in Results.

      We have added text in the Fig. 1 legend to direct the audience to the Methods section for information on the exclusion / inclusion criteria.

      Figure 1C, left. I don't understand how correlation is the best way to quantify the consistency of class center with a subset of data. Why not use, for example, the mean squared error? The logic underlying this analysis is not explained in Methods.

      Sorry for the confusion, we have clarified this in the Methods section.

      We measured the consistency of the centers of the Gaussian clusters, which are 45-dimensional vectors in the PC dimensions. We measured the Pearson correlation of Gaussian center vectors independently defined by GMM clustering on random subsets of neurons. We found the center of the Gaussian profile of each class was consistent (Fig. 1C). The same class of different GMMs was identified by matching the center of the class.
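
      The center-consistency measure described above can be sketched roughly as follows. This is an illustration only: the centers here are synthetic stand-ins (in practice they might come from the `means_` attribute of a fitted scikit-learn `GaussianMixture`, which is an assumption about tooling, not something stated in the response), the number of classes is arbitrary, and the simple best-match rule is not guaranteed to be one-to-one.

```python
import numpy as np


def match_and_correlate(centers_a, centers_b):
    """For each center in A, find the best-correlated center in B.

    centers_a, centers_b: arrays of shape (n_classes, n_dims).
    Returns (match_index, match_corr), each of length n_classes.
    """
    n = centers_a.shape[0]
    # np.corrcoef treats rows as variables; the top-right block of the
    # stacked correlation matrix holds the A-versus-B Pearson correlations.
    corr = np.corrcoef(centers_a, centers_b)[:n, n:]
    match_index = corr.argmax(axis=1)
    match_corr = corr.max(axis=1)
    return match_index, match_corr


# Synthetic stand-in for class centers from two GMM fits:
# one on all neurons, one on a random subset (shuffled class order plus noise).
rng = np.random.default_rng(0)
n_classes, n_dims = 6, 45  # 45 PC dimensions as in the response; 6 classes is arbitrary
centers_full = rng.normal(size=(n_classes, n_dims))
perm = rng.permutation(n_classes)
centers_subset = centers_full[perm] + 0.1 * rng.normal(size=(n_classes, n_dims))

match_index, match_corr = match_and_correlate(centers_full, centers_subset)
print("matched class indices:", match_index)
print("lowest matched correlation:", round(float(match_corr.min()), 3))
```

      Unlike the mean squared error, the Pearson correlation is insensitive to the overall scale and offset of the center vectors, which may be one reason to prefer it when comparing fits on subsets of different sizes.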

      Figure 1E. There are statements in the text about cell groups being more represented in certain visual areas. These differences are not well represented in the box plots. Can't the individual data points be plotted? I have also not found the description and results of statistical testing for these data.

      We have replotted the figure (now Fig. 1F) with dot scatters which show all of the individual experiments.

      Figure 2A, right, since these are paired data, I am not quite sure why only marginal distributions are shown. It would be interesting to know the distributions of correlations that are significant.

      This is only for illustration showing that NCs are measurable and significantly different from zero or shuffled controls. The distribution of NCs is broad and has both positive and negative values. We are not using this for downstream analysis.

      Figure 4A, I wonder if it would not be better to concentrate on significant correlations.

      We focused on large correlation values rather than significant values because we wanted to examine the structure of “strongly connected” neuron pairs. Negative and small correlation values can be significant as well. Focusing on large values would allow us to generate a clear interpretation.  

      Figure 4B, 'Mean strength of connections', which I presume means correlations, is not defined anywhere that I can see.

      I believe the reviewer means Fig. 4D. It means the average NC value. We have edited the figure legend to add clarity.

      Figure 4F, a few words explaining how to understand the correlation matrix in text or captions would be helpful.

      Sorry for the confusion, we have clarified this part in the figure legend for Fig. 4F.

      Page 5, right column: Incomplete sentence: "To determine whether it is the number of high NC pairs or the magnitude of the NCs,".

      We have edited this sentence.

      Page 5, right column: "Prior findings from studies of axonal projections from V1 to HVAs indicated that the number of SF-TF-specific boutons -rather than the strength of boutons- contribute to the SF-TF biases among HVAs (Glickfeld et al., 2013)." Glickfeld et al. also reported that boutons with tuning matched to the target area showed stronger peak dF/F responses.

      Thank you. We have revised this part accordingly.

      Page 9, the Discussion and Figure 7 which situates the study results in a broader context is welcome and interesting, but I have the feeling that more words should be spent explaining the figure and conceptual framework to a non-expert audience. I am a bit at a loss about how to read the information in the figure.

      Sorry for the confusion, we have added an explanation about this section (page 10, right column).

      As far as I can see, data availability is not addressed in the manuscript. The data, code to analyze the data and generate the figures, and simulation code should be made available in a permanent public repository. This includes data for visual area mapping, calcium imaging data, and any data accessory to the experiments.

      We have stated in the manuscript that code and data are available upon request. We regularly share data with no conditions (e.g., no entitlement to authorship), and we often do so even prior to publication.

      The sex of the mice should be indicated in Figure T1.

      The sex of the mice was mixed. This is stated in the Methods section.

      Methods:

      Section on statistical testing, computation of explained variance missing, etc. I feel many analyses are not thoroughly described.

      Sorry for the confusion, we have improved our Methods section.

      Signal correlation (similarity between two neurons' average responses to stimuli) and its relation to noise correlation is not formally defined.

      We have included the definition of signal correlation in the Methods.
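
      As a reminder of the conventional definitions the reviewer is asking for, here is a minimal sketch. It assumes the common convention (signal correlation as the Pearson correlation of trial-averaged responses across stimuli; noise correlation as the Pearson correlation of trial-to-trial residuals around each stimulus mean, pooled across stimuli). The manuscript's own Methods may define these differently, and the data below are synthetic.

```python
import numpy as np


def signal_and_noise_correlation(resp_a, resp_b):
    """Compute signal and noise correlation for two simultaneously recorded neurons.

    resp_a, resp_b: arrays of shape (n_stimuli, n_trials), with trials aligned
    so that column j is the same trial for both neurons.
    """
    mean_a = resp_a.mean(axis=1)
    mean_b = resp_b.mean(axis=1)
    # Signal correlation: similarity of the trial-averaged tuning across stimuli.
    signal_corr = np.corrcoef(mean_a, mean_b)[0, 1]
    # Noise correlation: shared trial-to-trial fluctuations around each stimulus mean.
    resid_a = (resp_a - mean_a[:, None]).ravel()
    resid_b = (resp_b - mean_b[:, None]).ravel()
    noise_corr = np.corrcoef(resid_a, resid_b)[0, 1]
    return signal_corr, noise_corr


# Synthetic example: two neurons with identical tuning and partly shared noise.
rng = np.random.default_rng(0)
n_stimuli, n_trials = 12, 15
tuning = rng.uniform(0.0, 5.0, size=n_stimuli)
shared = rng.normal(size=(n_stimuli, n_trials))
resp_a = tuning[:, None] + shared + rng.normal(size=(n_stimuli, n_trials))
resp_b = tuning[:, None] + shared + rng.normal(size=(n_stimuli, n_trials))

signal_corr, noise_corr = signal_and_noise_correlation(resp_a, resp_b)
print(f"signal correlation: {signal_corr:.2f}, noise correlation: {noise_corr:.2f}")
```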

      The number of visual stimulation trials is not stated in the Methods; it is only stated in a figure caption.

      The number of visual stimulus trials is provided in the last paragraph of the Methods section (Visual Stimuli).

      Fix typos: incorrect spelling, punctuation, and missing symbols (e.g. closing parentheses).

      We have carefully examined the spelling, punctuation, and grammar. We have corrected errors and we hope that none remain.

      Why use intrinsic imaging to locate retinotopic boundaries in mice already expressing GCaMP6s?

      We agree with the reviewer that calcium imaging of visual cortex can be used to identify the visual cortex.

      It is true that areas can be mapped using the GCaMP signals. That is not our preferred approach. Using intrinsic imaging to define the boundary between V1 and HVAs has been a well-refined routine in our lab for over a decade. It is part of our standard protocol. One advantage is that the data (from intrinsic signals) are of the same nature every time. This enables us to use the same mapping procedure no matter which reporters the mice might be expressing (and in what pattern, e.g., patchy or restricted to certain cell types).

      Reviewer #3 (Recommendations For The Authors):

      The possibility that the larger intra-group NCs observed simply reflect a multiplicative gain on cotuned neurons could be addressed using pupil and/or face recordings: Does pupil size or facial motion predict NCs and, if factored out, does signal correlation still predict NCs?

      Perhaps a variant of the network model presented in Figure 6 with multiplicative gain could also be tested to investigate these issues.

      We have addressed this issue in general response.

      Here, we will elaborate on one additional analysis we performed, in case it might be of interest. We carried out multiplicative gain modeling by implementing an established method (Goris et al. 2014 Nat Neurosci) on our dataset. We were able to perform the modeling work successfully. However, we found that it is not a suitable model for explaining the current dataset because the multiplicative gain induced a negative correlation. This seemed odd but can be explained. First, top-down input is not purely multiplicative but rather both additive and multiplicative. Second, the top-down modulation is high dimensional. Third, the firing rate of layer 2/3 mouse visual cortex neurons is lower than the firing rates for non-human primate recordings used in the development of the method (Goris et al. 2014 Nat Neurosci). Thus, we did not pursue the model further. We just mention it here in case the outcome might be of interest to fellow researchers.

      Similarly further analyses can be done to strengthen support for the claims that the observed NCs reflect discrete communication channels. A direct test of continuous vs categorical channels would strengthen the conclusions. One possible analysis would be to compare pairs with similar tuning (same SC) belonging to the same or different groups.

      Thanks for pointing this out so that we can clarify.

      We did not mean to argue that the tuning of neurons is discrete. Our conclusions are not dependent on asserting a particular degree of discreteness. We performed GMM clustering to label neurons with an identity so that we could analyze the NC connectivity structure with a degree of granularity supported by the data. Our analysis suggested that communication happens within a class, rather than through mixed classes. We realized that using the term “discrete” may be confusing. In the revised text we used the term “unmixed” or “non-mixing” instead to emphasize that the communication happens between neurons belonging to the same tuning cluster, or class. 

      However, we do see how the question of discreteness among classes might be interesting to readers. To provide further information, we have included a new Fig. S2 to visualize the GMM classes using t-SNE embedding.

      I also found many places where the manuscript needs clarification and/or more methodological details:<br /> • How many times was each of the stimulus conditions repeated? And how many times for the two naturalistic videos? What was the total duration of the experiments?

      The number of visual stimulus trials is provided in the last paragraph of the Methods section entitled Visual Stimuli. About 15 trials were recorded for each drifting grating stimulus, and about 20 trials were recorded for each naturalistic video.

      • Typo: Suit2p should be Suite2p (section Calcium image processing - Methods).

      We have fixed the typo.

      • What do the error bars in Figure 1E represent? Differences in group representation across areas from Figure 1E are mentioned in the text without any statistical testing.

      We have revised Figure 1E (current Fig. 1F), and we now show all data points.

      • The manuscript would benefit from a comparison of the observed area-specific tuning biases across areas (Figure 1E and others) with the previous literature.

      We have included additional discussion on this in the last paragraph of the section entitled Visual cortical neurons form six tuning groups.

      • Why are inferred spike trains used to calculate NCs? Why can't dF/F be used? Do the results differ when using dF/F to calculate NC? Please clarify in the text.

      We believe inferred spike trains provide better resolution and make it easier to compare with quantitative values from electrical recordings. Notice that NC values computed using dF/F can be much larger than those computed by inferred spike trains. For example, see Smith & Hausser 2010 Nat Neurosci. Supplementary Figure S8.

      • The sentence seems incomplete or unclear: "That is, there are more high NC pairs that are in-group." Explicit vs what?

      We have revised this sentence.

      • Figure 1E is unclear to me. What is being plotted? Please add a color bar with the metric and the units for the matrix (left) and in the tuning curves (right panels). If the Y and X axes represent the different classes from the GMM, why are there more than 65 rows? Why is the matrix not full?

      We have revised this figure. Fig. 1D is the full 65 x 65 matrix. Fig. 1F has small 3x3 matrices mapping the responses to different TF and SF of gratings. We hope the new version is clearer.

      • How are receptive fields defined? How are their long and short axes calculated? How are their limits defined when calculating RF overlap?

      We have added further details in the Methods section entitled “Receptive field analysis”.

    1. eLife assessment

      This important study provides convincing evidence that both psychiatric dimensions (e.g. anhedonia, apathy, or depression) and chronotype (i.e., being a morning or evening person) influence effort-based decision-making. This is of importance to researchers and clinicians alike, who may make inferences about behaviour and cognition without taking into account whether the individual may be tested or observed out-of-sync with their phenotype. The current study can serve as a starting point for more targeted investigation of the relationship between chronotype, altered decision making and psychiatric illness.

    2. Reviewer #1 (Public Review):

      Summary:

      This study uses an online cognitive task to assess how reward and effort are integrated in a motivated decision-making task. In particular the authors were looking to explore how neuropsychiatric symptoms, in particular, apathy and anhedonia, and circadian rhythms affect behavior in this task. Amongst many results, they found that choice bias (the degree to which integrated reward and effort affect decisions) is reduced in individuals with greater neuropsychiatric symptoms, and late chronotypes (being an 'evening person').

      Strengths:

      The authors recruited participants to perform the cognitive task both in and out of sync with their chronotypes, allowing for the important insight that individuals with late chronotypes show a more reduced choice bias when tested in the morning.<br /> Overall, this is a well-designed and controlled online experimental study. The modelling approach is robust, with care being taken to both perform and explain to the readers the various tests used to ensure the models allow the authors to sufficiently test their hypotheses.

      Weaknesses:

      This study was not designed to test the interactions of neuropsychiatric symptoms and chronotypes on decision making, and thus can only make preliminary suggestions regarding how symptoms, chronotypes and time-of-assessment interact.

    3. Reviewer #2 (Public Review):

      Summary:

      The study combines computational modeling of choice behavior with an economic, effort-based decision-making task to assess how willingness to exert physical effort for a reward varies as a function of individual differences in apathy and anhedonia, or depression, as well as chronotype. They find an overall reduction in effort selection that scales with apathy, anhedonia and depression. They also find that later chronotypes are less likely to choose effort than earlier chronotypes and, interestingly, an interaction whereby later chronotypes are especially unwilling to exert effort in the morning versus the evening.

      Strengths:

      This study uses state-of-the-art tools for model fitting and validation and regression methods which rule out multicollinearity among symptom measures and Bayesian methods which estimate effects and uncertainty about those estimates. The replication of results across two different kinds of samples is another strength. Finally, the study provides new information about the effects not only of chronotype but also chronotype by timepoint interactions which are previously unknown in the subfield of effort-based decision-making.

      Weaknesses:

      The study has few weaknesses. The biggest drawback is that it does not provide evidence for the idea that a match between chronotype and delay matters is especially relevant for people with depression or continuous measures like anhedonia and apathy. It is unclear whether disorders further interact with chronotype and time of day to determine a bias against effort. On the other hand, the study does provide evidence that future studies should consider such interactions when examining questions about effort expenditure in psychiatric disorders.

    4. Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Mehrhof and Nord study a large dataset of participants collected online (n=958 after exclusions) who performed a simple effort-based choice task. They report that the level of effort and reward influence choices in a way that is expected from prior work. They then relate choice preferences to neuropsychiatric syndromes and, in a smaller sample (n<200), to people's circadian preferences, i.e., whether they are a morning-preferring or evening-preferring chronotype. They find relationships between the choice bias (a model parameter capturing the likelihood to accept effort-reward challenges, like an intercept) and anhedonia and apathy, as well as chronotype. People with higher anhedonia and apathy and an evening chronotype are less likely to accept challenges (more negative choice bias). People with an evening chronotype are also more reward sensitive and more likely to accept challenges in the evening, compared to the morning.

      Strengths:

      This is an interesting and well-written manuscript which replicates some known results and introduces a new consideration related to chronotype relationships which have not been explored before. It uses a large sample size and includes analyses related to transdiagnostic as well as diagnostic criteria.

      Weaknesses:

      The authors do not explore how chronotype and depression are related (does one mediate the effect of the other etc). Both variables are included in the same model in the revised article now which is a great improvement, but it also means psychopathology and circadian rhythms are treated as distinct phenomena and their relationship in predicting effort-reward preferences is not examined.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study uses an online cognitive task to assess how reward and effort are integrated in a motivated decision-making task. In particular the authors were looking to explore how neuropsychiatric symptoms, in particular, apathy and anhedonia, and circadian rhythms affect behavior in this task. Amongst many results, they found that choice bias (the degree to which integrated reward and effort affect decisions) is reduced in individuals with greater neuropsychiatric symptoms, and late chronotypes (being an 'evening person').

      Strengths:

      The authors recruited participants to perform the cognitive task both in and out of sync with their chronotypes, allowing for the important insight that individuals with late chronotypes show a more reduced choice bias when tested in the morning.<br /> Overall, this is a well-designed and controlled online experimental study. The modelling approach is robust, with care being taken to both perform and explain to the readers the various tests used to ensure the models allow the authors to sufficiently test their hypotheses.

      Weaknesses:

      This study was not designed to test the interactions of neuropsychiatric symptoms and chronotypes on decision making, and thus can only make preliminary suggestions regarding how symptoms, chronotypes and time-of-assessment interact.

      Reviewer #2 (Public Review):

      Summary:

      The study combines computational modeling of choice behavior with an economic, effort-based decision-making task to assess how willingness to exert physical effort for a reward varies as a function of individual differences in apathy and anhedonia, or depression, as well as chronotype. They find an overall reduction in effort selection that scales with apathy, anhedonia and depression. They also find that later chronotypes are less likely to choose effort than earlier chronotypes and, interestingly, an interaction whereby later chronotypes are especially unwilling to exert effort in the morning versus the evening.

      Strengths:

      This study uses state-of-the-art tools for model fitting and validation and regression methods which rule out multicollinearity among symptom measures and Bayesian methods which estimate effects and uncertainty about those estimates. The replication of results across two different kinds of samples is another strength. Finally, the study provides new information about the effects not only of chronotype but also chronotype by timepoint interactions which are previously unknown in the subfield of effort-based decision-making.

      Weaknesses:

      The study has few weaknesses. The biggest drawback is that it does not provide evidence for the idea that a match between chronotype and delay matters is especially relevant for people with depression or continuous measures like anhedonia and apathy. It is unclear whether disorders further interact with chronotype and time of day to determine a bias against effort. On the other hand, the study does provide evidence that future studies should consider such interactions when examining questions about effort expenditure in psychiatric disorders.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Mehrhof and Nord study a large dataset of participants collected online (n=958 after exclusions) who performed a simple effort-based choice task. They report that the level of effort and reward influence choices in a way that is expected from prior work. They then relate choice preferences to neuropsychiatric syndromes and, in a smaller sample (n<200), to people's circadian preferences, i.e., whether they are a morning-preferring or evening-preferring chronotype. They find relationships between the choice bias (a model parameter capturing the likelihood to accept effort-reward challenges, like an intercept) and anhedonia and apathy, as well as chronotype. People with higher anhedonia and apathy and an evening chronotype are less likely to accept challenges (more negative choice bias). People with an evening chronotype are also more reward sensitive and more likely to accept challenges in the evening, compared to the morning.

      Strengths:

      This is an interesting and well-written manuscript which replicates some known results and introduces a new consideration related to chronotype relationships which have not been explored before. It uses a large sample size and includes analyses related to transdiagnostic as well as diagnostic criteria.

      Weaknesses:

      The authors do not explore how chronotype and depression are related (does one mediate the effect of the other etc). Both variables are included in the same model in the revised article now which is a great improvement, but it also means psychopathology and circadian rhythms are treated as distinct phenomena and their relationship in predicting effort-reward preferences is not examined.

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      Two points in response to changes the authors made:

      (1) "motivational tendency" is in our opinion not an improved phrase over "choice bias". A paper by Jon Roiser calls it "overall bias to accept effortful challenges" (but that's maybe too long?)

      We thank the reviewer for their suggestion of renaming our computational parameter and agree it would be of value to introduce and label this parameter in line with other work, improving consistency across the literature. Hence, we have updated our manuscript and now introduce the parameter as bias to accept effortful challenges for reward and refer to the parameter as acceptance bias thereafter.

      We have updated this nomenclature throughout the manuscript text, figures and supplement.

      (2) The new title "Both neuropsychiatric symptoms and circadian rhythm alter effort-based decision-making" sounds slightly causal (as would be the case in a longitudinal or intervention study). Maybe instead the authors could use "are associated with" or similar?

      We agree with the reviewers that our current title could be interpreted in a causal manner. We have updated our title to now read A common alteration in effort-based decision-making in apathy, anhedonia, and late circadian rhythm.

    1. eLife assessment

      This important work provides insights into the neural mechanisms regulating specific parental behaviors. By identifying a key role for oxytocin synthesizing cells in the paraventricular nucleus of the hypothalamus and their projections to the medial prefrontal cortex in promoting pup care and inhibiting infanticide, this study advances our understanding of the neurobiological basis of these contrasting behaviors in male and female mandarin voles. The evidence supporting the authors' conclusions is solid, and this work should be of interest to researchers studying neuropeptide control of social behaviors in the brain.

    2. Reviewer #1 (Public review):

      Summary:

      This important study investigated the role of oxytocin (OT) neurons in the paraventricular nucleus (PVN) and their projections to the medial prefrontal cortex (mPFC) in regulating pup care and infanticide behaviors in mandarin voles. The researchers used techniques like immunofluorescence, optogenetics, OT sensors, and peripheral OT administration. Activating OT neurons in the PVN reduced the time it took pup-caring male voles to approach and retrieve pups, facilitating pup care behavior. However, this activation had no effect on females. Interestingly, this same PVN OT neuron activation also reduced the time for both male and female infanticidal voles to approach and attack pups, suggesting PVN OT neuron activity can promote pup care while inhibiting infanticide behavior. Inhibition of these neurons promoted infanticide. Stimulating PVN->mPFC OT projections facilitated pup care in males and in infanticide prone voles, activation of these terminals prolonged latency to approach and attack. Inhibition of PVN->mPFC OT projections promoted infanticide. Peripheral OT administration increased pup care in males and reduced infanticide in both sexes. However, some results differed in females, suggesting other mechanisms may regulate female pup care.

      Strengths:

      This multi-faceted approach provides converging evidence and strengthens the conclusions drawn from the study, making them very convincing. Additionally, the study examines both pup care and infanticide behaviors, offering insights into the mechanisms underlying these contrasting behaviors. The inclusion of both male and female voles allows for the exploration of potential sex differences in the regulation of pup-directed behaviors. The peripheral OT administration experiments also provide valuable information for potential clinical applications and wildlife management strategies.

      Weaknesses:

      While the study presents exciting findings, there are several weaknesses. The sample sizes used in some experiments, such as the Fos study and optogenetic manipulations, appear to be small, which may limit the statistical power and generalizability of the results.

      The potential effects of manipulating OT neurons on the release of other neurotransmitters (or the influence of other neurochemicals or brain regions) on pup-directed behaviors, especially in females, are not fully explored. Additionally, it is unclear whether back-propagation of action potentials during optogenetic manipulations causes the same behavioral effect as direct stimulation of PVN OT cells. However, the authors now discuss these possibilities. It is also uncertain whether more OT neurons were manipulated in females compared to males. All other comments have been addressed by the authors.

    3. Reviewer #2 (Public review):

      Summary:

      This series of experiments studied the involvement of PVN OT neurons and their projection to the mPFC in pup-care and attack behavior in virgin male and female Mandarin voles. Using Fos visualization, optogenetics, fiber photometry and IP injection of OT the results converge on OT regulating caregiving and attacks on pups. Some sex differences were found in the effects of the manipulations.

      Strengths:

      Major strengths are the modern multi-method approach and including both sexes of Mandarin vole in every experiment.

      Weaknesses:

      The few weaknesses include 1) Some experiments' groups have small sample sizes (4-5 animals) which may render some results difficult for others to replicate when different extraneous variables are likely to be present, and 2) the authors discuss PVN OT cell stimulation findings seen in other rodents so the work seems less conceptually novel. Overall, the findings add to the knowledge about OT regulation of pup-directed behavior in male and female rodents, especially the PVN-mPFC OT

    4. Reviewer #3 (Public review):

      Summary:

      Here Li et al. examine pup-directed behavior in virgin Mandarin voles. Some males and females tend towards infanticide, others tend towards pup care. c-Fos staining showed more oxytocin cells activated in the paraventricular nucleus (PVN) of the hypothalamus in animals expressing pup care behaviors than in infanticidal animals. Optogenetic stimulation of PVN oxytocin neurons (with an oxytocin-specific virus to express the opsin transgene) increased pup care, or in infanticidal voles increased the latency to approach and attack. Suppressing activity of PVN oxytocin neurons promoted infanticide. Use of a recent oxytocin GRAB sensor (OT1.0) showed changes in medial prefrontal cortex (mPFC) signals as measured with photometry in both sexes. Activating mPFC oxytocin projections increased latency to approach and attack in infanticidal females and males (similar to the effects of peripheral oxytocin injections), whereas in pup-caring animals only males showed a decrease in latency to approach. Inhibiting these projections increased infanticidal behaviors in both females and males, and had no effect on pup caretaking.

      Strengths:

      Adopting these methods for Mandarin voles is an impressive accomplishment, especially the valuable data provided by the oxytocin GRAB sensor. This is a major achievement and helps promote systems neuroscience in voles.

      The authors have done a good job responding to the comments on their preprint. I'd ask them to check their z-scored values, as the mean of a z-scored value should be 0 over time. Also I'm not sure I agree that the fiber photometry system "can automatically exclude effects of motion artifacts"; yes that is a function of imaging at a different wavelength but that process is also prone to error and imperfect.
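
      The reviewer's remark about z-scored values can be illustrated with a short sketch on a synthetic trace. It shows that the mean is zero over whatever window supplied the mean and SD; whether the authors normalized to the whole trace or to a baseline window is not stated here, so the baseline variant below is an assumption included only for contrast.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic photometry-like trace: a flat baseline followed by a step increase.
trace = np.concatenate([rng.normal(1.0, 0.1, 200), rng.normal(1.5, 0.1, 200)])

# z-score using the mean and SD of the whole trace: the mean over time is 0.
z_full = (trace - trace.mean()) / trace.std()

# z-score using only the first 200 samples as baseline: the mean is 0 over the
# baseline window, but not over the whole trace.
baseline = trace[:200]
z_base = (trace - baseline.mean()) / baseline.std()

print(f"whole-trace z-score, mean over time: {z_full.mean():.3f}")
print(f"baseline z-score, mean over baseline: {z_base[:200].mean():.3f}")
print(f"baseline z-score, mean over whole trace: {z_base.mean():.3f}")
```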

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This important study investigated the role of oxytocin (OT) neurons in the paraventricular nucleus (PVN) and their projections to the medial prefrontal cortex (mPFC) in regulating pup care and infanticide behaviors in mandarin voles. The researchers used techniques like immunofluorescence, optogenetics, OT sensors, and peripheral OT administration. Activating OT neurons in the PVN reduced the time it took pup-caring male voles to approach and retrieve pups, facilitating pup-care behavior. However, this activation had no effect on females. Interestingly, this same PVN OT neuron activation also reduced the time for both male and female infanticidal voles to approach and attack pups, suggesting PVN OT neuron activity can promote pup care while inhibiting infanticide behavior. Inhibition of these neurons promoted infanticide. Stimulating PVN->mPFC OT projections facilitated pup care in males and in infanticide-prone voles, activation of these terminals prolonged latency to approach and attack. Inhibition of PVN->mPFC OT projections promoted infanticide. Peripheral OT administration increased pup care in males and reduced infanticide in both sexes. However, some results differed in females, suggesting other mechanisms may regulate female pup care.

      Strengths:

      This multi-faceted approach provides converging evidence, strengthens the conclusions drawn from the study, and makes them very convincing. Additionally, the study examines both pup care and infanticide behaviors, offering insights into the mechanisms underlying these contrasting behaviors. The inclusion of both male and female voles allows for the exploration of potential sex differences in the regulation of pup-directed behaviors. The peripheral OT administration experiments also provide valuable information for potential clinical applications and wildlife management strategies.

      Weaknesses:

      While the study presents exciting findings, there are several weaknesses that should be addressed. The sample sizes used in some experiments, such as the Fos study and optogenetic manipulations, appear to be small, which may limit the statistical power and generalizability of the results. Effect sizes are not reported, making it difficult to evaluate the practical significance of the findings. The imaging parameters and analysis details for the Fos study are not clearly described, hindering the interpretation of these results (i.e., was the entire PVN counted?). Also, does the Fos colocalization align with previous studies that look at PVN Fos and maternal/ paternal care? Additionally, the study lacks electrophysiological data to support the optogenetic findings, which could provide insights into the neural mechanisms underlying the observed behaviors. 

      In some previous studies (He et al., 2019; Mei, Yan, Yin, Sullivan, & Lin, 2023), the sample sizes in morphological experiments were also small yet may still be representative. We agree with the reviewer that results from larger samples would be more statistically powerful and generalizable, and we will pay attention to this issue in future studies. As the reviewer suggested, we have added effect sizes (d, η² and odds ratio) to both the source data and the main text. We have added the objective magnification to the figure legend. The imaging parameters and analysis details for the Fos study have also been added to the revised manuscript. Brain slices 40 µm thick were collected consecutively on 4 slides; each slide had 6 brain slices spaced 160 µm apart. The PVN area was determined based on the Allen Mouse Brain Atlas and our previous study, and Fos-positive, OT-positive and merged (double-labeled) neurons were counted. Our results on Fos and OT colocalization are consistent with previous studies. In a previous study on virgin male prairie voles, OT and Fos colabeled neurons in the PVN increased after exposure to conspecific pups and experiencing paternal care (Kenkel et al., 2012). In another study of prairie voles, OT and c-Fos colabeled neurons in the PVN significantly increased after becoming parents, which may be due to the shift from virgin to parent (Kelly, Hiura, Saunders, & Ophir, 2017). To support the optogenetic findings, we used c-Fos expression as a marker of neuronal activity and found significant increases/decreases in c-Fos-positive neurons induced by optogenetic activation/inhibition (Supplementary Data Fig. 1). Additionally, using OT1.0 sensors, we found that optogenetic inhibition of OT neurons reduced OT release. Based on these two experiments, we verified that the optogenetic manipulation in the present study is valid and that the results of the optogenetic experiments are reliable (Supplementary Data Fig. 5).

      The study has several limitations that warrant further discussion. Firstly, the potential effects of manipulating OT neurons on the release of other neurotransmitters (or the influence of other neurochemicals or brain regions) on pup-directed behaviors, especially in females, are not fully explored. Additionally, it is unclear whether back-propagation of action potentials during optogenetic manipulations causes the same behavioral effect as direct stimulation of PVN OT cells. Moreover, the authors do not address whether the observed changes in behavior could be explained by overall increases or decreases in locomotor activity.

      We agree with the reviewer that several limitations should be discussed. Although we used a viral strategy to specifically activate or inhibit PVN OT neurons, other neurochemicals may also be released during optogenetic manipulation, because OT neurons can co-release other substances. In one of our previous studies, activation of OT neuron projections from the PVN to the VTA, as well as to the NAc, also altered pup-directed behaviors, which may have been accompanied by dopamine release (He et al., 2021). In addition, back-propagation of action potentials during optogenetic manipulation may cause the same behavioral effect as direct stimulation of PVN OT cells. These effects on pup-directed behaviors should be investigated in future studies. For the optogenetic experiments, we followed previous research (Mei et al., 2023; Murugan et al., 2017), and we also verified the reliability of the methods in our own study. To exclude effects of locomotor activity on pup-directed behaviors, we also tested the effect of optogenetic manipulation on the locomotor activity of the experimental animals and found that it did not change levels of locomotor activity (Supplementary Data Fig. 6).

      The authors do not specify the percentage of PVN->mPFC neurons labeled that were OT-positive, nor do they directly compare the sexes in their behavioral analysis (or if they did, it is not clear statistically). While the authors propose that the sex difference in pup-directed behaviors is due to females having greater OT expression, they do not provide evidence to support this claim from their labeling data. It is also uncertain whether more OT neurons were manipulated in females compared to males. The study could benefit from a more comprehensive discussion of other factors that could influence the neural circuit under investigation, especially in females.

      The AAV11-Ef1a-EGFP virus can infect fibers and travel retrogradely to the cell body, so it can be used to retrogradely trace neurons. We injected this virus (green, AAV11-Ef1a-EGFP) into the mPFC and observed neurons in the PVN that were both virus-infected and OT-positive (red; overlap in yellow). We also counted the OT neurons that project from the PVN to the mPFC and found that approximately 45.16% and 40.79% of cells projecting from the PVN to the mPFC were OT-positive, and approximately 18.48% and 18.89% of OT cells in the PVN projected to the mPFC, in females and males, respectively (Supplementary Data Fig. 4). In addition, as the reviewers suggested, we compared the numbers of OT neurons, activated OT neurons (OT and Fos double-labeled neurons) and the level of OT release between males and females. We found that females had more activated OT neurons (Figure 1 d, g) and released higher levels of OT into the mPFC (Figure 4 d, e) than males. This has been added to the results and discussion. We did not analyze whether more OT neurons were manipulated in females than in males, which is indeed a limitation of this study.

      As the reviewers suggested, we also discussed other factors that could influence the neural circuit under investigation. In addition to OT neurons, OTR neurons may also regulate behavioral responses to pups. In a study of virgin female mice, pup exposure was found to activate oxytocin and oxytocin receptor expressing neurons (Okabe et al., 2017). Other brain regions such as the preoptic area (POA) may also be involved in parental behaviors. For example, virgin female mice repeatedly exposed to pups showed shorter retrieval latencies and greater c-Fos expression in the POA; concentrations of OT in the POA were also significantly increased, and the facilitation of alloparental behavior by repeated exposure to pups occurred through the organization of the OT system (Okabe et al., 2017). A recent study suggests that PVN OT is involved in the care of pups by male voles (He et al., 2021). That study suggests that PVN to ventral tegmental area (VTA) OT projections, as well as VTA to nucleus accumbens (NAc) DA projections, are involved in the care of pups by male voles. Inhibition of OT projections from the PVN to the VTA reduces DA release in the NAc during licking and grooming of pups (He et al., 2021). The effects of these factors on pup-directed responses should also be considered in future studies.

      Reviewer #2 (Public Review):

      Summary:

      This series of experiments studied the involvement of PVN OT neurons and their projection to the mPFC in pup-care and attack behavior in virgin male and female Mandarin voles. Using Fos visualization, optogenetics, fiber photometry, and IP injection of OT the results converge on OT regulating caregiving and attacks on pups. Some sex differences were found in the effects of the manipulations.

      Strengths:

      Major strengths are the modern multi-method approaches and involving both sexes of Mandarin vole in every experiment.

      Weaknesses:

      Weaknesses include the lack of some specific details in the methods that would help readers interpret the results. These include:

      (1) No description of diffusion of centrally injected agents.

      Thank you for this careful consideration. Only individuals with appropriate viral expression and optical fiber implant location were included in the statistical analysis; all others were excluded. For the optogenetic experiments, the virus (AAV2/9-mOXT-hCHR2(H134R)–mCherry-ER2-WPRE-pA or rAAV-mOXT-eNpHR3.0-mCherry-WPRE-hGH-pA) was designed and constructed to infect only OT neurons, which limited the spread of expression. For the fiber photometry experiments, expression of the OT1.0 sensor was largely restricted to the mPFC, and individuals with an incorrect optical fiber position were not included in the statistical analysis. The diffusion of the central optogenetic viruses and the OT1.0 sensor is shown in the supplemental figure (Supplementary Data Fig. 7).

      (2) Whether all central targets were consistent across animals included in the data analyses. This includes that is not stated if the medial prelimbic mPFC target was in all optogenetic study animals as shown in Figure 4 and if that is the case, there is no discussion of that subregion's function compared to other mPFC subregions.

      As shown in Figure 4 and in the schematic diagram of the optogenetic experiment, the central targets of virus infection and fiber placement were consistent across the animals included in the data analysis; data from animals that did not meet this criterion were excluded. In the present study, viruses were injected into the prelimbic (PrL) region. The PrL and infralimbic (IL) regions of the mPFC play different roles in different social interaction contexts (Bravo-Rivera, Roman-Ortiz, Brignoni-Perez, Sotres-Bayon, & Quirk, 2014; Moscarello & LeDoux, 2013). One study has shown that the PrL region of the mPFC contributes to active avoidance in situations where conflict needs to be mitigated, but also contributes to the retention of conflict responses for reward (Capuzzo & Floresco, 2020). This may indicate that the suppression of infanticide by PVN to mPFC OT projections is a behavioral consequence of active conflict avoidance. In a study on pain in rats, OT neuron projections from the PVN to the PrL were found to increase the responsiveness of cell populations in the PrL, suggesting that OT may act by altering the local excitation-inhibition (E/I) balance in the PrL (Liu et al., 2023). A study on anxiety-related behaviors in male rats suggests that the anxiolytic effects of OT in the mPFC are specific to the PrL, and not the infralimbic or anterior cingulate regions, and that this is achieved primarily through the engagement of GABAergic neurons, which ultimately modulate downstream anxiety-related brain regions, including the amygdala (Sabihi, Dong, Maurer, Post, & Leuner, 2017). This finding may suggest possible downstream pathways for further research.

      (3) How groups of pup-care and infanticidal animals were created since there was no obvious pretest mentioned so perhaps there was the testing of a large number of animals until getting enough subjects in each group.  

      Before the experiments, we exposed the animals to pups, and subjects may exhibit pup care, infanticide, or neglect; we grouped subjects according to their behavioral responses to pups, and individuals who neglected pups were excluded.

      (4) The apparent use of a 20-minute baseline data collection period for photometry that started right after the animals were stressed from handling and placement in the novel testing chamber.

      In fiber photometric experiments, all experimental animals were required to acclimatize to the environment for at least 20 minutes prior to the experiment as described in the Methods section. The time 0 in Fig. 4 represents the point in time when a behavior or a segment of behavior started and is not the actual time 0 at which the test was started.

      (5) A weakness in the results reporting is that it's unclear what statistics are reported (2 x 2 ANOVA main effect of interaction results, t-test results) and that the degrees of freedom expected for the 2 X 2 ANOVAs in some cases don't appear to match the numbers of subjects shown in the graphs; including sample sizes in each group would be helpful because the graph panels are very small and data points overlap.

      Thank you for the suggestion. We now state the statistical methods used and the sample sizes for each group of experiments in the figure legends.

      The additional context that could help readers of this study is that the authors overlook some important mPFC and pup caregiving and infanticide studies in the introduction which would help put this work in better context in terms of what is known about the mPFC and these behaviors. These previous studies include Febo et al., 2010; Febo 2012; Pereira and Morrell, 2011 and 2020; and a very relevant study by Alsina-Llanes and Olazábal, 2021 on mPFC lesions and infanticide in virgin male and female mice. The introduction states that nothing is known about the mPFC and infanticide. In the introduction and discussion, stating the species and sex of the animals tested in all the previous studies mentioned would be useful. The authors also discuss PVN OT cell stimulation findings seen in other rodents, so the work seems less conceptually novel. Overall, the findings add to the knowledge about OT regulation of pup-directed behavior in male and female rodents, especially the PVN-mPFC OT projection.

      We greatly appreciate these valuable references and have cited them in the introduction and discussion. We agree with the reviewer that the statement that nothing is known about the mPFC and infanticide is incorrect. The statement should instead be that it remains unclear whether mPFC OT projections are involved in paternal care and infanticide. A study in mother rats indicated that inactivation or inhibition of neuronal activity in the mPFC largely reduced pup retrieval and grouping (Febo, Felix-Ortiz, & Johnson, 2010). A subsequent study on firing patterns in the mPFC of mother rats suggested that sensory-motor processing occurs in the mPFC and may affect decision-making about maternal care of pups (Febo, 2012). A study in new mother rats examining different regions of the mPFC (anterior cingulate (Cg1), PrL, IL) identified an involvement of the IL cortex in biased preference decision-making in favour of the offspring (Pereira & Morrell, 2020). A study on maternal motivation in rats suggests that in the early postpartum period the IL and Cg1 subregions of the mPFC are the motivating circuits for pup-specific biases (Pereira & Morrell, 2011), while the PrL subregion is recruited and contributes to the expression of maternal behaviors in the late postpartum period (Pereira & Morrell, 2011).

      Reviewer #3 (Public Review):

      Summary:

      Here Li et al. examine pup-directed behavior in virgin Mandarin voles. Some males and females tend towards infanticide, others tend towards pup care. c-Fos staining showed more oxytocin cells activated in the paraventricular nucleus (PVN) of the hypothalamus in animals expressing pup care behaviors than in infanticidal animals. Optogenetic stimulation of PVN oxytocin neurons (with an oxytocin-specific virus to express the opsin transgene) increased pup-care, or in infanticidal voles increased latency towards approach and attack.

      Suppressing the activity of PVN oxytocin neurons promoted infanticide. The use of a recent oxytocin GRAB sensor (OT1.0) showed changes in medial prefrontal cortex (mPFC) signals as measured with photometry in both sexes. Activating mPFC oxytocin projections increased latency to approach and attack in infanticidal females and males (similar to the effects of peripheral oxytocin injections), whereas in pup-caring animals only males showed a decrease in approach. Inhibiting these projections increased infanticidal behaviors in both females and males and had no effect on pup caretaking.

      Strengths:

      Adopting these methods for Mandarin voles is an impressive accomplishment, especially the valuable data provided by the oxytocin GRAB sensor. This is a major achievement and helps promote systems neuroscience in voles.

      Weaknesses:

      The study would be strengthened by an initial figure summarizing the behavioral phenotypes of voles expressing pup care vs infanticide: the percentages and behavioral scores of individual male and female nulliparous animals for the behaviors examined here. Do the authors have data about the housing or life history/experiences of these animals? How bimodal and robust are these behavioral tendencies in the population?

      As noted in our response to Reviewer 2, animals generally exhibit three types of behavioral responses toward pups, and data on the percentage of each behavioral type in the population will be included in another study from our lab. The reviewer's suggestion of scoring the behaviors is an inspiring idea that will help us to parse these behaviors more fully. Mandarin voles were captured from the wild in Henan, China. The experimental subjects were F2 generation voles reared in the Experimental Animal Centre of Shaanxi Normal University. In our observations, pup care and infanticide behaviors were conserved across several pup exposures, especially pup care behaviors; for infanticide behaviors we did not conduct more pup exposures, in order to protect the pups.

      Optogenetics with the oxytocin promoter virus is a nice advance here. More details about their preparation and methods should be in the main text, and not simply relegated to the methods section. For optogenetic stimulation in Figure 2, how were the stimulation parameters chosen? There is a worry that oxytocin neurons can co-release other factors- are the authors sure that oxytocin is being released by optogenetic stimulation as opposed to other transmitters or peptides, and acting through the oxytocin receptor (as opposed to a vasopressin receptor)?

      As the reviewer suggested, more detailed information about virus construction and the choice of optogenetic stimulation parameters has been added to the revised manuscript. Details of the construction of the CHR2 and mCherry viruses used for optogenetic manipulation can be found in a previous study, which constructed an rAAV expressing Venus from a 2.6 kb region upstream of OT exon 1 that is conserved in mammalian species (Knobloch et al., 2012). For the eNpHR 3.0 virus, expression of the vector is driven by the mouse OXT promoter, a 1 kb promoter upstream of exon 1 of the OXT gene, which has been shown to induce cell type-specific expression in OXT cells (Peñagarikano et al., 2015). Details of the construction of the OT1.0 sensor can be found in the work of Professor Li's group (Qian et al., 2023). The mapping of the viral vectors and OT1.0 sensor is shown below.

      The optogenetic stimulation parameters were based on a previous study (He et al., 2021). However, our original description of the parameters was not sufficiently detailed, so further information about them has been added to the Methods. In the pup-directed pup care behavioral test, light stimulation lasted for 11 min. The parameters used for optogenetic manipulation of PVN OT neurons were ~ 3 mW, 20 Hz, 20 ms, 8 s ON and 2 s OFF, and the parameters used for optogenetic manipulation of PVN OT neurons projecting to the mPFC were ~ 10 mW, 20 Hz, 20 ms, 8 s ON and 2 s OFF, to cover the entire interaction. We performed fiber photometry experiments to determine the role that OT plays in behavior, and these results and the optogenetic results support each other. In addition, we further confirmed the effect of optogenetic manipulation on OT release by combining optogenetic inhibition with OT1.0 sensors (Supplementary Data Fig. 2). It has been previously shown that OT is able to act specifically on OTR in the mPFC-PL (Sabihi et al., 2017). Our study focuses on oxytocin neurons and oxytocin release, and more research is needed to construct a more complete network describing the involvement of the OTR and other factors in the mPFC in these behaviors.
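      The stimulation parameters quoted above can be turned into pulse counts and a duty cycle with simple arithmetic. The sketch below is illustrative only: it assumes that "20 ms" is the pulse width, that the 11 min session consists of complete 8 s ON / 2 s OFF cycles, and the function name `stimulation_summary` is hypothetical rather than taken from the manuscript.

```python
# Values quoted in the response; their interpretation is an assumption.
FREQ_HZ = 20        # pulses per second during an ON epoch
PULSE_MS = 20       # assumed to be the width of each light pulse
ON_S, OFF_S = 8, 2  # ON/OFF epoch lengths in seconds
SESSION_S = 11 * 60 # 11 min of stimulation


def stimulation_summary(freq_hz=FREQ_HZ, pulse_ms=PULSE_MS,
                        on_s=ON_S, off_s=OFF_S, session_s=SESSION_S):
    """Summarise a pulsed ON/OFF light protocol using integer arithmetic."""
    cycles = session_s // (on_s + off_s)   # complete ON/OFF cycles
    pulses_per_cycle = freq_hz * on_s      # pulses delivered in one ON epoch
    total_pulses = cycles * pulses_per_cycle
    light_on_ms = total_pulses * pulse_ms  # cumulative time the light is on
    return {
        "cycles": cycles,
        "pulses_per_cycle": pulses_per_cycle,
        "total_pulses": total_pulses,
        "light_on_s": light_on_ms / 1000,
        "duty_cycle": light_on_ms / (session_s * 1000),
    }


summary = stimulation_summary()
```

      Under these assumptions the light is on for roughly a third of the session (a 40% duty cycle within each pulse train, applied for 80% of each 10 s cycle).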

      Author response image 1.

      Author response image 2.

       

      Given that they are studying changes in latency to approach/attack, having some controls for motion when oxytocin neurons are activated or suppressed might be nice. Oxytocin is reported to be an anxiolytic and a sedative at high levels.

      As noted in our response to Reviewer 1, to exclude effects of locomotor activity on pup-directed behaviors, we also tested the effect of optogenetic manipulation on the locomotor activity of the experimental animals and found that it did not change levels of locomotor activity (Supplementary Data Fig. 6).

      The OT1.0 sensor is also amazing, these data are quite remarkable. However, photometry is known to be susceptible to motion artifacts and I didn't see much in the methods about controls or correction for this. It's also surprising to see such dramatic, sudden, and large-scale suppression of oxytocin signaling in the mPFC in the infanticidal animals - does this mean there is a substantial tonic level of oxytocin release in the cortex under baseline conditions?

      The optical fiber recording system used in the present study can automatically exclude the effects of motion artifacts by simultaneously recording signals excited by a 405 nm light source. As shown in the formula below, the z-score data were calculated and presented, and the increase or decline of the OT signal is a trend relative to the baseline. For a smooth baseline, a decrease in the signal is generally amplified by this calculation. In our experiments combining optogenetic inhibition and OT1.0 sensors, we found that there was a certain level of OT release at baseline, which leaves room for a decrease in the signal recorded by the OT1.0 sensor.
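      The formula referred to in this response is not reproduced in this text. As a purely illustrative sketch, one common way to combine a 405 nm reference channel with a z-score is shown below; it is an assumption about the general approach, not the authors' actual pipeline, and the function names are hypothetical. The reference trace is scaled to the sensor trace by a least-squares line, ΔF/F is taken relative to that fit, and the result is z-scored against a baseline window.

```python
from statistics import mean, pstdev


def fit_reference(ref, sig):
    """Least-squares line sig ~ a * ref + b, used to scale the 405 nm trace."""
    mr, ms = mean(ref), mean(sig)
    sxx = sum((r - mr) ** 2 for r in ref)
    sxy = sum((r - mr) * (s - ms) for r, s in zip(ref, sig))
    a = sxy / sxx
    return a, ms - a * mr


def zscored_dff(sig, ref, baseline):
    """Reference-corrected dF/F, z-scored against a baseline slice of samples."""
    a, b = fit_reference(ref, sig)
    fitted = [a * r + b for r in ref]                  # motion-related component
    dff = [(s - f) / f for s, f in zip(sig, fitted)]   # dF/F relative to the fit
    base = dff[baseline]
    mu, sd = mean(base), pstdev(base)
    return [(x - mu) / sd for x in dff]
```

      Because the z-score divides by the baseline standard deviation, a very smooth baseline (small standard deviation) makes any later deviation look large in z units, which is in line with the remark above about a decreasing signal being amplified.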

      Figure 5 is difficult to parse as-is, and relates to an important consideration for this study: how extensive is the oxytocin neuron projection from PVN to mPFC?

      The AAV11-Ef1a-EGFP virus can infect fibers and travel retrogradely to the cell body, so it can be used to retrogradely trace neurons. We injected this virus (green, AAV11-Ef1a-EGFP) into the mPFC and observed neurons in the PVN that were both virus-infected and OT-positive (red; overlap in yellow). We also counted the OT neurons that project from the PVN to the mPFC and found that approximately 45.16% and 40.79% of cells projecting from the PVN to the mPFC were OT-positive, and approximately 18.48% and 18.89% of OT cells in the PVN projected to the mPFC, in females and males, respectively (Supplementary Data Fig. 4).

      In Figures 6 and 7, the authors use the phrase 'projection terminals'; however, to my knowledge, there have not been terminals (i.e., presynaptic formations opposed to a target postsynaptic site) observed in oxytocin neuron projections into target central regions.

      Following your suggestion, we replaced ‘terminals’ with ‘fibers’ to describe the projections more accurately.

      Projection-based inhibition as in Figure 7 remains a controversial issue, as it is unclear if the opsin activation can be fast enough to reduce the fast axonal/terminal action potential. Do the authors have confirmation that this works, perhaps with the oxytocin GRAB OT sensor?

      Thank you for the suggestion. We measured OT release using OT1.0 sensors while the OT neuron projections in the mPFC were optogenetically inhibited. The results showed that optogenetic inhibition of OT neuron fibers in the mPFC significantly reduced OT release, which validates the projection-based inhibition method (Supplementary Data Fig. 5).

      As females and males had similar GRAB OT1.0 responses in mPFC, why would the behavioral effects of increasing activity be different between the sexes?

      In the present study, females released higher levels of OT into the mPFC (Figure 4 d, e) than males when the different behaviors occurred. In addition, females already approached and retrieved pups more rapidly than males before optogenetic activation, which may explain why no effect of this manipulation was found in females.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Check for spelling and grammar errors throughout.

      Thank you for the suggestion; we have checked and revised the article.

      (2) Report effect sizes for all significant findings to allow evaluation of practical significance.

      As the reviewer suggested, we have added effect sizes (d, η² and odds ratio) to both the source data and the main text.
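      For readers less familiar with the three effect sizes named here, the textbook definitions can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the response does not say which variants were used (for example, partial η² from a two-way ANOVA), so the one-way η² and pooled-SD Cohen's d below are assumptions, and all function names are hypothetical.

```python
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5


def eta_squared(groups):
    """Eta squared for a one-way design: SS_between / SS_total."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_total = sum((x - grand) ** 2 for x in values)
    return ss_between / ss_total


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table [[a, b], [c, d]] (rows: groups; columns: yes/no)."""
    return (a * d) / (b * c)
```

      With made-up latencies such as `[10, 12, 14]` versus `[6, 8, 10]`, `cohens_d` gives 2.0 and `eta_squared` gives 0.6; these numbers are illustrative and are not data from the study.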

      (3) Provide detailed information on the imaging parameters and analysis methods used in the Fos study.

      The imaging parameters and analysis details for the Fos study have also been added to the revised manuscript. Brain slices 40 µm thick were collected consecutively on 4 slides; each slide had 6 brain slices spaced 160 µm apart. The PVN area was determined based on the Allen Mouse Brain Atlas and our previous study, and Fos-positive, OT-positive and merged (double-labeled) neurons were counted.
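      The sampling scheme described here can be checked with a short calculation. The sketch below assumes that consecutive 40 µm sections were distributed round-robin across the 4 slides, which is consistent with the stated 160 µm spacing (4 × 40 µm) but is not spelled out in the response; the names are hypothetical.

```python
SECTION_UM = 40         # section thickness stated in the response
N_SLIDES = 4            # consecutive sections are spread over 4 slides
SECTIONS_PER_SLIDE = 6  # sections per slide stated in the response


def slide_positions(slide):
    """Rostrocaudal offsets (in µm) of the sections on one slide (slide = 0..3)."""
    step = N_SLIDES * SECTION_UM  # 160 µm between neighbours on the same slide
    return [slide * SECTION_UM + j * step for j in range(SECTIONS_PER_SLIDE)]


positions = {s: slide_positions(s) for s in range(N_SLIDES)}
total_extent_um = N_SLIDES * SECTIONS_PER_SLIDE * SECTION_UM
```

      Under this assumption each slide samples the same 960 µm extent at 160 µm intervals, so counting a single slide covers the region with a one-in-four series.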

      (4) Compare the Fos colocalization results with previous studies examining PVN Fos and maternal/paternal care to contextualize the findings.

      Our result about Fos and OT colocalization is consistent with previous study. In a previous study on virgin male prairie voles, OT and Fos colabeled neurons in the PVN increased after exposure to conspecific pups and experiencing paternal care (Kenkel et al., 2012). In another study of prairie voles, OT and c-fos colabeled neurons in PVN significantly increased after becoming parents which may be due to a shift from virgin to parents (Kelly et al., 2017).

      (5) Discuss the limitations of the study, such as the potential effects of manipulating OT neurons on the release of other transmitters or the influence of other neurochemicals or brain regions on pup-directed behaviors, especially in females.

      We agree with the reviewer that several limitations should be discussed. Although we used a viral strategy to specifically activate or inhibit PVN OT neurons, other neurochemicals may also be released during optogenetic manipulation, because OT neurons can co-release other substances. In one of our previous studies, activation of OT neuron projections from the PVN to the VTA, as well as to the NAc, also altered pup-directed behaviors, which may have been accompanied by dopamine release (He et al., 2021). In addition, back-propagation of action potentials during optogenetic manipulation may cause the same behavioral effect as direct stimulation of PVN OT cells. These effects on pup-directed behaviors should be investigated in future studies.

      (6) Address the possibility of back-propagation of action potentials in the optogenetic manipulations causing the same behavioral effects as PVN OT cell stimulation.

      We agree with the reviewer that optogenetic manipulation may induce back-propagation of action potentials, which could produce the same behavioral effects as OT cell stimulation. We will pay attention to this issue in future studies.

      (7) Investigate whether changes in locomotor behavior could explain the observed effects on pup-directed behaviors.

      To exclude effects of locomotor activity on pup-directed behaviors, we also tested the effect of optogenetic manipulation on the locomotor activity of the experimental animals and found that it did not change levels of locomotor activity (Supplementary Data Fig. 6).

      (8) Report the percentage of PVN->mPFC neurons labeled that were OT-positive.

      The AAV11-Ef1a-EGFP virus can infect fibers and travel retrogradely to the cell body, so it can be used to retrogradely trace neurons. We injected this virus (green, AAV11-Ef1a-EGFP) into the mPFC and observed neurons in the PVN that were both virus-infected and OT-positive (red; overlap in yellow). We also counted the OT neurons that project from the PVN to the mPFC and found that approximately 45.16% and 40.79% of cells projecting from the PVN to the mPFC were OT-positive, and approximately 18.48% and 18.89% of OT cells in the PVN projected to the mPFC, in females and males, respectively (Supplementary Data Fig. 4).
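      The two percentages reported here share a numerator (double-labeled cells) but have different denominators, which is easy to overlook. A minimal sketch of the distinction follows; the function name and the counts in the usage note are hypothetical and are not data from the study.

```python
def overlap_percentages(n_traced, n_ot, n_both):
    """Return (% of retrogradely traced cells that are OT+,
               % of OT+ cells that are retrogradely traced)."""
    pct_traced_that_are_ot = 100 * n_both / n_traced
    pct_ot_that_are_traced = 100 * n_both / n_ot
    return pct_traced_that_are_ot, pct_ot_that_are_traced
```

      For example, with invented counts of 50 traced cells, 200 OT-positive cells and 20 double-labeled cells, the function returns 40% and 10%: a large share of the projection can be OT-positive even when only a small share of all OT cells project to the target.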

      (9)  Directly compare the sexes in the behavioral analysis and discuss any potential sex differences.

      We agree with the reviewer's suggestion and have added comparisons between two sexes and discussion about relevant results. 

      (10) If available, report and discuss the OT expression levels and the number of OT neurons manipulated in each sex.

      In the present study, we counted the number of OT cells but did not measure the level of OT expression using WB or qPCR. In addition, the percentages of total OT-positive neurons infected by the CHR2(H134R) and eNpHR3.0 viruses are presented (Supplementary Data Fig. 7), but we do not know how many cells were actually manipulated during the optogenetic experiments.

      (11) Expand the discussion to include what could be regulating or interacting with the OT circuit under investigation, particularly in females where the effects were less pronounced.

      As the reviewers suggested, we have also added relevant discussion. In addition to OT neurons, OTR neurons may also regulate behavioral responses to pups. In a study of virgin female mice, pup exposure was found to activate oxytocin and oxytocin receptor expressing neurons (Okabe et al., 2017). Other brain regions such as the preoptic area (POA) may also be involved in parental behaviors. For example, virgin female mice repeatedly exposed to pups showed shorter retrieval latencies and greater c-Fos expression in the POA; concentrations of OT in the POA were also significantly increased, and the facilitation of alloparental behavior by repeated exposure to pups occurred through the organization of the OT system (Okabe et al., 2017). A recent study suggests that PVN OT is involved in the care of pups by male voles (He et al., 2021). That study suggests that PVN to ventral tegmental area (VTA) OT projections, as well as VTA to nucleus accumbens (NAc) DA projections, are involved in the care of pups by male voles. Inhibition of OT projections from the PVN to the VTA reduces DA release in the NAc during licking and grooming of pups (He et al., 2021).

      Reviewer #2 (Recommendations For The Authors):

      A few additional things the authors may want to consider:

      (1) I don't understand the subject numbers in the peripheral OT study data shown in Figure 8. Panels p and q have 69 females shown and 50 males. Was there a second, much larger, IP injection study conducted that was different than the subjects shown in panels a-o that had ~5 subjects per treatment group per sex?

      We apologize for the confusion. More animals were used to test the effects of OT on infanticide behaviors in our pre-test. These data, combined with data from the formal pharmacological experiment, are presented in Fig. 8p, q. Detailed behavioral changes after OT treatment were recorded in only a subset of animals. We have clarified this in the revised manuscript.

      (2) The authors suggest higher baseline OT release in the female mPFC, which makes sense and helps explain some of their results. It seems that the data in Figure 1 show what is probably no sex difference in OT cell numbers in the PVN of Mandarin voles, which is unlike the old studies in mice or rats. If readers look at the data in Figure 1 showing what seems to be no sex difference in OT cell number, the authors' argument in the discussion about mPFC OT release levels higher in females would be inconsistent with their own data shown. The authors have the brain sections they need to help support or undermine this argument in the discussion, so maybe it would be useful to analyze the OT cell numbers across the PVN and report it in this paper or briefly mention it in the discussion.

      We compared the numbers of OT neurons, the numbers of activated OT neurons (OT and Fos double-labeled neurons), and the levels of OT release between males and females. We found that females have more activated OT neurons (Figure 1d, g) and release higher levels of OT into the mPFC (Figure 4d, e) than males. This has been added to the Results and Discussion. The inconsistency of the OT cell numbers with previous studies may be due to the cell-counting method, as we did not count all slides consecutively.

      (3) The discussion suggests visual cues are involved in mPFC OT release relevant for pup care or infanticide, but this is a very odd claim for nocturnal animals that live and nest with their pups in underground burrows.

      We apologize for the confusion. Here, we cited the finding in mice that activation of PVN OT neurons induced by visual stimulation promoted pup care in order to support our finding that the activity of OT cells of the PVN is involved in pup care, rather than to illustrate the role of visual stimulation in voles. We have clarified this in the revised manuscript.

      (4) The lack of decrease in mPFC OT release in the 2nd and 3rd approaches to pups is probably because the release was so high after the 1st approach that it didn't have time to drop before the subsequent approaches. The authors don't state how long those between-approach intervals were on average to help readers interpret this result.

      As described in our Methods, we left an interval of about 60 s between behavioral tests to allow the signal to return to the baseline level.

      (5) Do PVN-mPFC OT somata collateralize to other brain sites? Could mPFC terminal stimulation activate entire PVN cells and every site they project to? A caveat could be mentioned in the discussion if there's support for this from other optogenetic and PVN OT cell projection studies.

      We verified the OT projections from the PVN to the mPFC to validate the optogenetic manipulation of this pathway, but did not investigate whether the OT neurons projecting from the PVN to the mPFC also send collaterals to other brain regions. mPFC terminal stimulation is presumed to activate only the PVN OT cells that project to the mPFC; whether other OT neurons were also activated remains unclear.

      (6) I don't see an ethics statement related to the experiments obviously having to involve pup injury or death. Nothing is said in methods about what happened after adult subjects attacked pups. I assumed the tests were quickly terminated and pups euthanized.

      When pups were attacked, we removed them immediately to avoid unnecessary injuries, and injured pups were euthanized.

      (7) The authors could be more specific about what psychological diseases they refer to in the abstract and elsewhere that are relevant to this study. Depression? Rare cases of psychosis? Even within the already rare parental psychosis, infanticide is tragic but rare.

      Infanticide is caused by a variety of factors, among which mental illness, especially depression and psychosis, is often a major risk factor (Milia & Noonan, 2022; Naviaux, Janne, & Gourdin, 2020). In humans, the term infanticide has been used to refer to the killing, neglect, or abuse of newborn babies and older children (Jackson, 2006). We therefore believe that research on the neural mechanisms of infanticide can also contribute to the understanding and treatment of attacks on children, physical and verbal abuse, and direct killing of babies.

      (8) Figure 8 - in one case the "*" is a chi-square result, correct?

      Thank you for checking this carefully. In Figure 8p, q, we applied the chi-square test and have added this information to the legend.
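      For readers unfamiliar with this kind of comparison, the following is a minimal sketch of a Pearson chi-square test of independence on a 2x2 table (for example, treatment group by whether infanticide occurred). The counts are hypothetical placeholders, not the data from Figure 8p, q, and the function name `chi_square_2x2` is an assumption made for illustration; it is not taken from the authors' analysis.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    No continuity correction is applied. Returns (chi2, p_value).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # With 1 degree of freedom, chi2 is the square of a standard normal
    # variable, so the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: rows = treatment groups, columns = outcome (yes, no).
hypothetical_counts = [[12, 8], [4, 16]]
chi2, p = chi_square_2x2(hypothetical_counts)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

      With small expected counts, an exact test would usually be preferred over this approximation.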

      Reviewer #3 (Recommendations For The Authors):

      The only other thing is a typo on line 135: the authors mean 'stimulation' instead of 'simulation'.

      Corrected.

      References

      Bravo-Rivera, C., Roman-Ortiz, C., Brignoni-Perez, E., Sotres-Bayon, F., & Quirk, G. J. (2014). Neural structures mediating expression and extinction of platform-mediated avoidance. J Neurosci, 34(29), 9736-9742. doi:10.1523/jneurosci.0191-14.2014

      Capuzzo, G., & Floresco, S. B. (2020). Prelimbic and Infralimbic Prefrontal Regulation of Active and Inhibitory Avoidance and Reward-Seeking. J Neurosci, 40(24), 4773-4787. doi:10.1523/jneurosci.0414-20.2020

      Febo, M. (2012). Firing patterns of maternal rat prelimbic neurons during spontaneous contact with pups. Brain Res Bull, 88(5), 534-542. doi:10.1016/j.brainresbull.2012.05.012

      Febo, M., Felix-Ortiz, A. C., & Johnson, T. R. (2010). Inactivation or inhibition of neuronal activity in the medial prefrontal cortex largely reduces pup retrieval and grouping in maternal rats. Brain Res, 1325, 77-88. doi:10.1016/j.brainres.2010.02.027

      He, Z., Young, L., Ma, X. M., Guo, Q., Wang, L., Yang, Y., . . . Tai, F. (2019). Increased anxiety and decreased sociability induced by paternal deprivation involve the PVN-PrL OTergic pathway. Elife, 8. doi:10.7554/eLife.44026

      He, Z., Zhang, L., Hou, W., Zhang, X., Young, L. J., Li, L., . . . Tai, F. (2021). Paraventricular Nucleus Oxytocin Subsystems Promote Active Paternal Behaviors in Mandarin Voles. J Neurosci, 41(31), 6699-6713. doi:10.1523/jneurosci.2864-20.2021

      Jackson, M. (2006). Infanticide. The Lancet, 367(9513), 809. doi:10.1016/S0140-6736(06)68323-2

      Kelly, A. M., Hiura, L. C., Saunders, A. G., & Ophir, A. G. (2017). Oxytocin Neurons Exhibit Extensive Functional Plasticity Due To Offspring Age in Mothers and Fathers. Integr Comp Biol, 57(3), 603-618. doi:10.1093/icb/icx036

      Kenkel, W. M., Paredes, J., Yee, J. R., Pournajafi-Nazarloo, H., Bales, K. L., & Carter, C. S. (2012). Neuroendocrine and behavioural responses to exposure to an infant in male prairie voles. J Neuroendocrinol, 24(6), 874-886. doi:10.1111/j.1365-2826.2012.02301.x

      Knobloch, H. S., Charlet, A., Hoffmann, L. C., Eliava, M., Khrulev, S., Cetin, A. H., . . . Grinevich, V. (2012). Evoked axonal oxytocin release in the central amygdala attenuates fear response. Neuron, 73(3), 553-566. doi:10.1016/j.neuron.2011.11.030

      Liu, Y., Li, A., Bair-Marshall, C., Xu, H., Jee, H. J., Zhu, E., . . . Wang, J. (2023). Oxytocin promotes prefrontal population activity via the PVN-PFC pathway to regulate pain. Neuron, 111(11), 1795-1811.e1797. doi:10.1016/j.neuron.2023.03.014

      Mei, L., Yan, R., Yin, L., Sullivan, R. M., & Lin, D. (2023). Antagonistic circuits mediating infanticide and maternal care in female mice. Nature, 618(7967), 1006-1016. doi:10.1038/s41586-023-06147-9

      Milia, G., & Noonan, M. (2022). Experiences and perspectives of women who have committed neonaticide, infanticide and filicide: A systematic review and qualitative evidence synthesis. J Psychiatr Ment Health Nurs, 29(6), 813-828. doi:10.1111/jpm.12828

      Moscarello, J. M., & LeDoux, J. E. (2013). Active avoidance learning requires prefrontal suppression of amygdala-mediated defensive reactions. J Neurosci, 33(9), 3815-3823. doi:10.1523/jneurosci.2596-12.2013

      Murugan, M., Jang, H. J., Park, M., Miller, E. M., Cox, J., Taliaferro, J. P., . . . Witten, I. B. (2017). Combined Social and Spatial Coding in a Descending Projection from the Prefrontal Cortex. Cell, 171(7), 1663-1677.e1616. doi:10.1016/j.cell.2017.11.002

      Naviaux, A. F., Janne, P., & Gourdin, M. (2020). Psychiatric Considerations on Infanticide: Throwing the Baby out with the Bathwater. Psychiatr Danub, 32(Suppl 1), 24-28. 

      Okabe, S., Tsuneoka, Y., Takahashi, A., Ooyama, R., Watarai, A., Maeda, S., . . . Kikusui, T. (2017). Pup exposure facilitates retrieving behavior via the oxytocin neural system in female mice. Psychoneuroendocrinology, 79, 20-30. doi:10.1016/j.psyneuen.2017.01.036

      Peñagarikano, O., Lázaro, M. T., Lu, X. H., Gordon, A., Dong, H., Lam, H. A., . . . Geschwind, D. H. (2015). Exogenous and evoked oxytocin restores social behavior in the Cntnap2 mouse model of autism. Sci Transl Med, 7(271), 271ra278. doi:10.1126/scitranslmed.3010257

      Pereira, M., & Morrell, J. I. (2011). Functional mapping of the neural circuitry of rat maternal motivation: effects of site-specific transient neural inactivation. J Neuroendocrinol, 23(11), 1020-1035. doi:10.1111/j.1365-2826.2011.02200.x

      Pereira, M., & Morrell, J. I. (2020). Infralimbic Cortex Biases Preference Decision Making for Offspring over Competing Cocaine-Associated Stimuli in New Mother Rats. eNeuro, 7(4). doi:10.1523/eneuro.0460-19.2020

      Qian, T., Wang, H., Wang, P., Geng, L., Mei, L., Osakada, T., . . . Li, Y. (2023). A genetically encoded sensor measures temporal oxytocin release from different neuronal compartments. Nat Biotechnol, 41(7), 944-957. doi:10.1038/s41587-022-01561-2

      Sabihi, S., Dong, S. M., Maurer, S. D., Post, C., & Leuner, B. (2017). Oxytocin in the medial prefrontal cortex attenuates anxiety: Anatomical and receptor specificity and mechanism of action. Neuropharmacology, 125, 1-12. doi:10.1016/j.neuropharm.2017.06.024

    1. eLife assessment

      This fundamental work demonstrates that ABHD6 regulates AMPAR gating kinetics in a TARP γ-2-dependent manner. The evidence in this study is compelling. This study will be of interest to readers in the field of synaptic transmission.

    2. Reviewer #1 (Public Review):

      Summary:

      This research sheds light on the nuanced role of ABHD6 in the regulation of AMPARs, highlighting its interaction with TARP γ-2 as a critical factor in modulating receptor-gating kinetics. It is crucial to understand that while ABHD6 alone does not alter AMPAR kinetics, its presence alongside TARP γ-2 leads to accelerated deactivation and desensitization of AMPARs, impacting synaptic transmission dynamics.

      Strengths:

      Important findings in the research include:
      - ABHD6 does not affect the gating kinetics of GluA1 and GluA2(Q) homomeric receptors independently.
      - In the presence of TARP γ-2, ABHD6 accelerates deactivation and desensitization of these receptors, regardless of their splicing or editing isoforms.
      - The effect is consistent for both homomeric GluA1 and GluA2(Q) receptors and heteromeric GluA1i/GluA2(R)i-G receptors.
      - The recovery from desensitization of GluA1 with the flip splicing isoform is slowed by ABHD6 in the presence of TARP γ-2.

      Weaknesses:

      However, the study focuses on specific receptor subunits and isoforms, which may not fully represent the diversity of AMPAR compositions found in vivo (e.g. although the authors state that TARP γ-2 failed to significantly increase GluA3-induced currents, the effect on GluA4, or an explanation for its omission, is missing). Further research is needed to explore the implications of these findings in more complex neuronal environments.

    3. Reviewer #2 (Public Review):

      Summary:

      Cong et al. investigated the regulatory effects of ABHD6 on AMPARs. The authors performed adequate electrophysiology recordings to show the exact pattern of this regulation and covered major critical points.

      Strengths:

      The authors have performed high-quality ephys recordings and examined all potential regulatory aspects of ABHD6 on AMPARs. This is important to understand the AMPAR functions.

      Weaknesses:

      (1) The authors discussed CNIH-2 extensively in lines 92-110 of the introduction; however, they did not perform related experiments. I suggest they move this part to the discussion, where they also discuss the roles of CNIH.
      (2) The authors need to report the "n" for all the experiments they have presented in this manuscript. How many cells were recorded in each condition? How many batches? This information has to be in all of the figure legends, but it is missing except in Fig. 4.
      (3) One question is what the physiological meaning of this regulatory effect is. The authors may consider adding some discussion.
      (4) About statistics. The authors need to add more details and make sure their statistics are sound. For example, they also need to check the equality of variances. In their Table EVs, where the P values are reported, the authors need to report which statistics they have used (one-way ANOVA, K-W test, or others) and the exact post-hoc test type for each comparison. For one-way ANOVA, report the F values together with the P values in all figure legends.
      (5) Fig. 3J, the authors need to correct the label of the Y axis. It is shifted.

    1. eLife assessment

      This study presents important findings on the early development of cardiac and respiratory interoceptive sensitivity based on an investigation of infants aged 3, 9 and 18 months and on extensive statistical analyses. The evidence supporting the conclusions is convincing, although the research faced technical and recruitment challenges that limit the interpretation and generalizability of the findings. This study will be of significant interest to developmental psychologists and neuroscientists working on interoception and its influence on socio-cognitive development.

    2. Reviewer #1 (Public review):

      Summary:

      The authors of this study investigated the development of interoceptive sensitivity in the context of cardiac and respiratory interoception in 3-, 9-, and 18-month-old infants using a combination of both cross-sectional and longitudinal designs. They utilised the cardiac interoception paradigm developed by Maister et al (2017) and also developed a new paradigm to investigate respiratory interoception in infants. The main findings of this research are that 9-month-old infants displayed a preference for stimuli presented synchronously with their own heartbeat and respiration. The authors found less reliable effects in the 18-month-old group, and this was especially true for the respiratory interoceptive data. The authors replicated a visual preference for synchrony over asynchrony for the cardiac domain in 3-month-old infants, while they found inconclusive evidence regarding the respiratory domain. Considering the developmental nature of the study, the authors also investigated the presence of developmental trajectories and associations between the two interoceptive domains. They found evidence for a relationship between cardiac and respiratory interoceptive sensitivity at 18 months only and preliminary evidence for an increase in respiratory interoception between 9 and 18 months.

      Strengths:

      The conclusions of this paper are mostly well supported by data, and the data analysis procedures are rigorous and well justified. The main strengths of the paper are:

      - A first attempt to explore the association between two different interoceptive domains. How different organ-specific axes of interoception relate to each other is still an open question, and exploring this through a developmental lens can help shed light on possible relationships. The authors have to be commended for developing a novel interoceptive task aimed at assessing respiratory interoceptive sensitivity in infants and toddlers, and for trying to assess the relationship between cardiac and respiratory interoception across developmental time.
      - A thorough justification of the developmental ages selected for the study. The authors provide a rationale behind their choice to examine interoceptive sensitivity at 3, 9, and 18 months of age. These are well justified based on the literature pertaining to self- and social development. Sometimes, I wondered whether explaining the link between these self and social processes and interoception would have been beneficial, as a reader not familiar with the topics may miss the point.
      - An explanation of direction of looking behaviour using latent curve analysis. I found this additional analysis extremely helpful in providing a better understanding of the data based on previous research and analytical choices. As the authors explain in the manuscript, it is often difficult to interpret the direction of infant looking behaviour, as novelty and familiarity preferences can also be driven by hidden confounders (e.g. task difficulty). The authors provide compelling evidence that analytical choices can explain some of these effects. Beyond the field of interoception, these findings will be relevant to developmental psychologists and will inform future studies using looking time as a measure of infants' ability to discriminate among stimuli.
      - The use of simulation analysis to account for small sample size. The authors acknowledge that some of the effects reported in their study could be explained by a small sample size (i.e. the 3-month-olds and 18-month-olds data). Using a simulation approach, the authors try to overcome some of these limitations and provide convincing evidence of interoceptive abilities in infancy and toddlerhood (but see also my next point).

      Weaknesses:

      - While the research question is timely and the methodology is detailed, there is a critical flaw in the experimental design: the lack of randomization of stimuli due to an error in the programming script. The authors very honestly report this error and have performed additional analyses to investigate its potential impact on the study's results. Unfortunately, I am not fully convinced these analyses provide enough reassurance and I believe the technical error still undermines the validity of the findings, making it difficult to draw meaningful conclusions.

    3. Reviewer #2 (Public review):

      Summary:

      This study by Tünte et al. investigated the development of interoceptive sensitivity during the first year of life, focusing specifically on cardiac and respiratory sensitivity in infants aged 3, 9, and 18 months. The research employed a previously developed experimental paradigm for the cardiac domain and adapted it for a novel paradigm in the respiratory domain. This approach assessed infants' cardiac and respiratory sensitivity based on their preferential looking behavior toward visuo-auditory stimuli displayed on a monitor, which moved either in sync or out of sync with the infants' own heartbeats or breathing. The results in the cardiac domain showed that infants across all age groups preferred stimuli moving synchronously rather than asynchronously with their heartbeat, suggesting the presence of cardiac sensitivity as early as 3 months of age. However, it is noteworthy that this preference direction contradicts a previous study, which found that 5-month-old infants looked longer at stimuli moving asynchronously with their heartbeat (Maister et al., 2017). In the respiratory domain, only the group of 9-month-old infants showed a preference for stimuli presented synchronously with their breathing. The authors conducted various statistical analyses to thoroughly examine the obtained data, providing deeper insights valuable for future research in this field.

      Strengths:

      Few studies have explored the early development of interoception, making the replication of the original study by Maister et al. (2017) particularly valuable. Beyond replication, this study expands the investigation into the respiratory domain, significantly enhancing our understanding of interoceptive development. The provision of longitudinal and cross-sectional data from infants at 3, 9, and 18 months of age is instrumental in understanding their developmental trajectory.

      Weaknesses:

      Due to a technical error, this study failed to counterbalance the conditions of the first trial in both the iBEAT and iBREATH tests. Although the authors addressed this issue as much as possible by employing alternative analyses, it should be noted that this error may have critically influenced the results and, thus, the conclusions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 Public:

      - The authors should carefully address the potential confounding of not counterbalancing the conditions of the first trial in both interoceptive tasks for the 9-month and 18-month age groups. The results of these groups could indeed be driven by having seen the synchronous trial first. 

      Upon addressing this comment, we noticed an error in our presentation scripts that resulted in a fixed-experimental design for most of the infants. Therefore, it is crucial to investigate the impact of the fixed-experimental design on our results. We have conducted extensive additional analyses comparing data from infants with the inadvertent fixed design to data from infants for whom the randomization was achieved as intended, which can be found in Supplementary Materials A. In summary, we do not find that the fixed-order design had a strong impact on the findings: looking behavior did not differ systematically between randomization orders, and the looking patterns across ages and tasks indicate that we were able to adequately capture variance associated with these features. Further, we have adapted the interpretation of the results across the manuscript to acknowledge the experimental error and its implications for the interpretation of the results.

      For instance, on pages 30 and 31 we have added the following paragraphs:

      “The data presented in this study have several limitations. First, due to an error in our experimental scripts we unintentionally used a fixed-order design, in which almost all infants saw the same fixed order of conditions (always starting with a synchronous trial), image assigned to condition, and location of the image (left/right) instead of a semi-randomized design. Such a fixed-order design has several important limitations, as visual preferences might be influenced by the experimental design, i.e., the first trial always being synchronous might have influenced a mean group preference. Further, we cannot rule out that mean group preferences were influenced by the stimuli used (as in most cases the same stimuli were used for synchronous/asynchronous trials) or by the location of the image in a given trial (left/right). Still, there is no strong theoretical argument as to why the image used or its location should have an impact on infants’ preferences. The stimuli were selected to be similar to each other, in order not to evoke a priori preferences. To further illustrate the impact of the fixed-order design we have conducted several additional analyses, which can be found in Supplementary Materials A and which do not indicate that there was a strong impact of the fixed-order design. Specifically, we find no evidence for systematic differences between infants tested with the fixed design and infants tested with a randomized design.

      Despite these limitations, fixed-order designs also hold advantages, as they are more suitable to investigate individual differences (Dang et al., 2020; Hedge et al., 2018). When each participant is exposed to the same procedure, individual differences are less likely to be attributed to effects of randomization but are more likely to reflect real differences between participants. Also, when considering the impact of the randomization, one must consider our results in relation to earlier studies (Maister et al., 2017; Weijs et al., 2022; Imafuku et al., 2023), some of which used the exact same stimuli as we did (Maister et al., 2017), with fully randomized designs. Results of these studies indicate no looking-time differences depending on the stimulus assigned to each condition or systematic preferences for one of the stimuli.”
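      To make the distinction between a fixed-order and a semi-randomized design concrete, the following is a minimal sketch of how the condition of the first trial could be counterbalanced across infants while the remaining trials are shuffled. It is not the authors' presentation script: the function name `counterbalanced_orders`, the default of six trials, and the seeding are all assumptions made for illustration.

```python
import random

CONDITIONS = ("synchronous", "asynchronous")


def counterbalanced_orders(n_participants, n_trials=6, seed=0):
    """Return one trial order per participant.

    The first-trial condition is balanced across participants (as evenly
    as n_participants allows), and each order contains an equal number of
    trials per condition. n_trials is assumed to be even.
    """
    rng = random.Random(seed)

    # Alternate the starting condition, then randomize which participant
    # receives which starting condition.
    first_conditions = [CONDITIONS[i % 2] for i in range(n_participants)]
    rng.shuffle(first_conditions)

    orders = []
    for first in first_conditions:
        # Build a balanced pool, take out the pre-assigned first trial,
        # and shuffle what remains.
        remaining = [c for c in CONDITIONS for _ in range(n_trials // 2)]
        remaining.remove(first)
        rng.shuffle(remaining)
        orders.append([first] + remaining)
    return orders


orders = counterbalanced_orders(n_participants=10)
starts_sync = sum(order[0] == "synchronous" for order in orders)
print(f"{starts_sync} of {len(orders)} infants start with a synchronous trial")
```

      In a fixed-order design, by contrast, every entry of `orders` would be identical, so any effect of the first trial cannot be separated from the condition effect.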

      - The conclusion that cardiac interoception remains stable across infancy is not fully warranted by the data. Given the small sample size of 18-month-old toddlers included in the final analyses, it might be misleading to state this without including the caveat that the study may be underpowered. In other words, the small sample size could explain the direction of the results for this age group. 

      We agree with the reviewer and now explicitly acknowledge this issue in the discussion, p. 23:

      “However, due to the small sample size at 18 months the results regarding changes and stability of interoceptive sensitivity in the second year of life must be considered speculative and need to be validated in further research.”

      Reviewer #1 (Recommendations For The Authors): 

      Below are some comments that the authors may wish to take into account: 

      - Why did the authors choose to apply different statistical analyses across the dataset (i.e. Bayesian t-test is used with the 3-month-old sample, whereas a paired t-test is used for the 9 and 18-month-olds)? 

      The use of different statistical analyses was driven by the timeline of the project, as we had to update our initial plans. Due to challenges related to the Covid-19 pandemic, it was not possible to recruit 3-month-old babies for our study at the time we started the data collection. Thus, we first collected data from the 9- and 18-month-olds, and from the 3-month-olds later. For the 9- and 18-month-old samples we aimed at directly replicating the approach by Maister et al. (2017). However, for the 3-month-olds we wanted to focus more on classification of the strength of evidence in favor of/against an effect, taking the results of the equivalence tests for the 9- and 18-month-olds into account.

      The following parts have been added to the manuscript to clarify our approach:

      Sample (p. 33): “The 3-month-old sample was tested after completion of the 9- and 18-month-old samples. Initially, we had planned to start data collection with the 3-month-old sample. However, due to the Covid-19 pandemic this was not possible.”

      Statistical analysis (p. 41): “At 3 months we used a Bayesian paired t-test as the data collection was done after having collected the 9- and 18-month-old samples. Our intention in the analysis of the 3-month-old sample was to focus more strongly on strength of evidence in favor of/against an effect instead of a binary classification for/against an effect.”
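      As an illustration of the frequentist side of this comparison, the following is a minimal sketch of the paired t statistic on per-infant looking measures in the two conditions. The values are hypothetical and the function name `paired_t_statistic` is an assumption; the Bayesian paired t-test used for the 3-month-olds additionally requires a Bayes factor, which is not computed here.

```python
import math
import statistics


def paired_t_statistic(sync, asynch):
    """Return (t, degrees_of_freedom) for a paired comparison.

    sync and asynch hold one value per infant, in the same order.
    """
    if len(sync) != len(asynch):
        raise ValueError("Both conditions need one value per infant.")
    # The paired test operates on within-infant differences.
    diffs = [s - a for s, a in zip(sync, asynch)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical mean looking times (seconds) per infant and condition.
sync_looking = [6.1, 5.4, 7.0, 4.8, 6.5]
async_looking = [5.2, 5.0, 6.1, 4.9, 5.8]
t, df = paired_t_statistic(sync_looking, async_looking)
print(f"t({df}) = {t:.2f}")
```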

      - I found the way in which sample sizes are reported a little unclear. This may be due to having the Results section before the Methods section (in line with journal requirements), but it would be helpful if the authors could clarify their sample size from the outset. For example, sample size for the 3-month-olds first says N = 80 (page 9), but then it becomes apparent that N = 53 completed the iBEAT and N = 40 completed the iBREATH. I think for the purpose of explaining the results, it might be more helpful to the reader to only know the final sample size and then specify recruited participants and dropout in the Methods. 

      We have adapted the description of sample sizes in the Results section. We now only refer to the number of infants included in a given analysis when reporting the results of the analysis. In addition, we have added the following clarification for the MEGA-analysis (p. 11): “This approach allowed us to include 135 observations for the iBEAT from 125 infants, and 120 observations for the iBREATH from 107 infants. The sample size differs slightly from our preregistered approach given that we used the same preprocessing approach for the MEGA-analysis for all samples.”

      In addition, we now refer to the sample of the MEGA-analysis in the abstract, to make the understanding of our approach more intuitive.

      - I think the sentence "Interestingly, we find evidence for a positive relationship between cardiac and respiratory perception in our 18-month-old sample" at page 25 could be deleted given that the small sample size of 18-month-olds suggests this result should be interpreted with caution. The authors already explained this in the earlier paragraph (page 24) and simply re-stating this (weak) effect without further elaborating may not be necessary. 

      We have removed the sentence.

      - In multiple places in the manuscript, the authors hint at the association between interoception and certain social and self-related abilities (e.g. joint attention, mirror self-recognition), however, these are not fully elaborated on. Could the authors elaborate on the relation between mirror self-recognition and respiratory interoception (page 30)? Why would the ability to recognise the self-face be associated with the individual's ability to perceive their breathing pattern? How these two processes may be linked is not immediately obvious. 

      We have rephrased the sentence on page 30 to highlight that the increase in respiratory perception found in our results happens at a similar age as increases in other domains that might be related to interoception. “A hypothesis to be tested in future research is that developmental improvement in respiratory perception might be related to increases in other domains that show links to interoception. For instance, self-perception matures towards the end of the second year of life and has been conceptually related to interoception (Fotopoulou & Tsakiris, 2017; Musculus et al., 2021). Further, gross motor development may be considered in future research, which drastically matures in the first two years of life (WHO Multicentre Growth Reference Study Group, 2006) and has been shown to be related to respiratory function in children with cerebral palsy (Kwon & Lee, 2014).”

      - Aren't the 18-month-old infants effectively 19-month-olds? The mean age is 576.65 days, and the age window of recruitment was between 18 and 20 months. 

      We have added a sentence clarifying how we refer to the infants' age ranges. “To stay coherent, we refer to each age group throughout the manuscript with regard to the lower end of the age range in which we included infants (e.g., we tested infants between 9 and 10 months, but refer to them as the 9-month-old group).”

      Reviewer #2 Public:

      Weaknesses: 

      (1) My primary concern is that this study did not counterbalance the conditions of the first trial in both iBEAT and iBREATH tests for the 9-month and 18-month age groups. In these tests, the first trial invariably involved a synchronous stimulus. I believe that the order of trials can significantly influence an infant's looking duration, and this oversight could potentially impact the results, especially where a marked preference for synchronous stimuli was observed among infants. 

      Upon conducting further analyses to address this comment, we noticed an error in our presentation scripts that resulted in the inadvertent use of a fixed-experimental design for most infants. Therefore, we have conducted extensive additional analysis which can be found in Supplementary Materials A. Specifically, we compared data from infants who were tested with the inadvertent fixed design to data from infants for whom the randomization was achieved as intended. Further, we have adapted the interpretation of the results across the manuscript to acknowledge the experimental error and its potential implications for the interpretation of the results.

      (2) The analysis indicated that the study's sample size was too small to effectively assess the effects within each age group. This limitation fundamentally undermines the reliability of the findings. 

      We have added a statement addressing this issue to the limitation section: “The reduced sample size might have impacted the statistical power to detect mean preferences for some age groups. Still, it must be noted that even the smaller sample sizes included were of similar size as used in previous studies on infant interoceptive sensitivity (Imafuku et al., 2023; Maister et al., 2017; Weijs et al., 2023).”

      (3) The authors attribute the infants' preferential-looking behavior solely to the effects of familiarity and novelty. However, the meaning of "familiarity" in relation to external stimuli moving in sync with an infant's heartbeat or breathing is not clearly defined. A deeper exploration of the underlying mechanisms driving this behavior, such as from the perspectives of attention and perception, is necessary. 

      We have adapted the respective paragraph in the discussion to clarify the term familiarity, and to address that other aspects of attention and perception might also be relevant (p. 25): 

      “In this context familiarity might refer to the infant’s perception of congruence between internal signal and external stimuli which might drive the infant’s attention. Specifically, the synchronous condition should be easier to process due to the intersensory redundancy and predictability between interoceptive and external signals.”

      “However, it is important to consider that other cognitive and attentional mechanisms could also influence these responses.”

      Reviewer #2 (Recommendations For The Authors):  

      Introduction: 

      (1) The relevance of respiration to self-regulation and social interaction was not clearly described. 

      We have rephrased the relevant section to highlight that the increase in respiratory perception found in our results happens at a similar age as increases in other domains that might be related to interoception. “A hypothesis to be tested in future research is that developmental improvement in respiratory perception might be related to increases in other domains that show links to interoception. For instance, self-perception matures towards the end of the second year of life and has been conceptually related to interoception (Fotopoulou & Tsakiris, 2017; Musculus et al., 2021). Further, gross motor development may be considered in future research, which drastically matures in the first two years of life (WHO Multicentre Growth Reference Study Group, 2006) and has been shown to be related to respiratory function in children with cerebral palsy (Kwon & Lee, 2014).”

      (2) In the last line of page 5, it might be more appropriate to use the term "meta-cognitive awareness" instead of "meta-perception," as the latter can refer to a different concept. 

      We have changed the word as recommended. 

      (3) The authors predicted a positive correlation in sensitivity between the cardiac and respiratory domains, despite studies in adults suggesting these are not related. How did the authors arrive at this prediction, and how do they interpret the results showing a correlation only in 18-month-olds, the age group closest to adults in this study? 

      We have elaborated on our reasoning for our prediction (p. 7): “Adult cardiac and respiratory interoception paradigms typically use two conceptually different paradigms. Thus, null results in the adult literature might be due to the unique characteristics of those paradigms.”

      Further, we have expanded on this result in the discussion (p. 24): “Still, we find a relationship between cardiac and respiratory signals in the oldest sample tested here, the 18-month-olds, which is closest to adults. Although this effect needs to be interpreted with caution due to the small sample size, this might indicate that using conceptually similar experimental paradigms might be a promising avenue to investigate relationships between different interoceptive modalities in adults.”

      Results: 

      (4) Please provide the descriptive statistics (means and standard deviations of looking time) for each independent condition, especially for the 18-month and 3-month age groups where this information is missing and only differences in looking times between conditions were mentioned. Furthermore, since the asynchronous condition includes both fast and slow stimuli, descriptive statistics for each should be included to help readers determine whether effects are due to synchronicity or stimulus speed. 

      We have added the means and standard deviations of looking times for synchronous and asynchronous trials to the results section. Mean looking times for both types of asynchronous trials can be found in Supplementary Materials C, where we have also added the corresponding standard deviations. 

      (5) Regarding the MEGA analysis for iBEATs, where a main effect of condition was found (OR = 1.13, t(1769) = 2.541, p = .011), are these t-value and p-value based on the GLMM analysis, or did the authors conduct a separate t-test? This query arises because the p-value of the main effect differs from that in Table 2. Also, is it conventional to present GLMM results in the manner of Table 2, comparing specific level combinations (i.e., synchronous condition and 3-month age group), instead of listing main effects and interactions? 

      Thank you very much for pointing out that the results of the GLMM were not reported as precisely as possible, which might lead to confusion over the presented p-values. The main effect of condition refers to a post-hoc comparison using estimated marginal means from the GLMM across all age groups, while Table 2 refers to the main effect of condition for age group 3 months. 

      To make the results more accessible we have restructured parts of the manuscript following your suggestions: In the main manuscript we now focus on the interaction effects for condition and age, as well as the post hoc comparison, while we now report null-full model comparison, and tables for all age groups in the supplements. 

      We have added the following clarifying sentences to the manuscript, p. 12:

      “In reporting these results we focus on whether we found evidence for interactions between age groups, and whether we found evidence for a general effect across age groups. In-depth results and tables can be found in Supplementary Materials C. 

      […]

      Next, we computed post hoc comparisons using estimated marginal means from the MEGA-analysis across all age groups to investigate whether we find indications for a similar effect across ages.”

      (6) I am confused about the results indicating a significant effect of condition for the iBREATH dataset excluding 18-month-olds (Table 5, OR = 1.15, t(1050) = 2.397, p = .017), as the description in Table 5 suggests no statistical significance (p = .070). The decision to exclude the 18-month group seems arbitrary, particularly since the age-by-condition interaction was not significant in the GLMM across all three age groups. 

      Thank you very much for the comment. We have removed the analysis excluding the 18-month-old group.

      (7) Regarding the relationship between cardiac and respiratory interoceptive sensitivity, the statement "However, we found a significant interaction between iBEATs scores and age at the 18-month level" (p16) seems unclear. Clarification is needed, as mentioning age interaction at a specific age stage is unusual. A pairwise comparison between 3 and 9 months should also be included. 

      Thank you for pointing out that the results could be presented more clearly! Similar to the other MEGA analyses we have put detailed tables of the results of the beta regression in the supplements and have kept a single table with the most important results in the main manuscript. Further, we have clarified the text passage as follows: “However, we found a significant interaction between the iBEATs scores and age, specifically comparing the 3- and 18-month-old groups (β = 3.13, SE = 1.41, p = .027). This interaction indicates that the relationship between iBEATs and iBREATH scores changes between 3 and 18 months of age.”  Also, we have now included a pairwise comparison between 3- and 9-month-olds. 

      Discussion: 

      (8) In pages 27-28, the authors discuss the results of the specification curve analysis, but there is no explanation for the 7th entry (statistical analysis) in Table 9. This entry seems particularly important. 

      We did not include an explanation for the 7th entry, as the impact of the statistical test used was comparatively less pronounced. However, to acknowledge this result we have added the following sentence to the discussion: “Moreover, the statistical test used (paired t-test vs linear mixed model, Table 9, 7th entry) had a rather small impact on the results. However, given the large number of analyses conducted, this might be related to not being able to precisely formulate the model to fit the complexity of the data for each specification.”

      Methods: 

      (9) What were the colors of the stimuli? 

      We have added the colors of the stimuli to the methods section. Further, the stimuli can be found in the OSF project associated with the manuscript.

      (10) The percentage of trials excluded during preprocessing should be stated. Additionally, the number of trials included in the statistical analyses for each condition (including synchronous, fast, and slow) should be detailed separately. 

      We have added information on numbers of trials completed and included in Table 7.

    1. eLife assessment

      This valuable study advances the understanding of granuloma formation by identifying a key chemokine receptor in containing infection by a specific species of bacteria. The evidence supporting this is solid, providing a spatial transcriptomic dataset spanning granuloma formation and resolution by a specific species of bacteria. The work should be of interest to microbiologists and immunologists.

    2. Reviewer #1 (Public review):

      Amason et al. investigated the formation of granulomas in response to Chromobacterium violaceum infection, aiming to uncover the cellular mechanisms governing the granuloma response. They identify spatiotemporal gene expression of chemokines and receptors associated with the formation and clearance of granulomas, with a specific focus on those involved in immune trafficking, generating a valuable spatial transcriptomic reference. By analyzing the presence or absence of chemokine/receptor RNA expression, they infer the importance of immune cells in resolving infection. Despite observing increased expression of neutrophil-recruiting chemokines, treatment with reparixin (an inhibitor of CXCR1 and CXCR2) did not inhibit neutrophil recruitment during infection. Focusing on monocyte trafficking, they found that CCR2 knockout mice infected with C. violaceum were unable to form granulomas, ultimately succumbing to infection.

      Readers should note that due to the resolution of the spatial data, it is difficult to associate gene expression differences with individual cell types; the authors focus instead on changes in chemokines and chemokine receptors, and perform experiments to evaluate the importance of CCR2.

      Comments on the revised version:

      The authors have addressed all of my previous comments.

    3. Reviewer #2 (Public review):

      Summary:

      In this study Amason et al employ spatial transcriptomics and intervention studies to probe the spatial and temporal dynamics of chemokines and their receptors, and their influence on cellular dynamics in C. violaceum granulomas. As a result of their spatial transcriptomic analysis, the authors narrow in on the contribution of neutrophil- and monocyte-recruiting pathways to host response. This results in the observation that monocyte recruitment is critical for granuloma formation and infection control, while neutrophil recruitment via CXCR2 may be dispensable.

      Strengths:

      Since C. violaceum is a self-limiting granulomatous infection, it makes an excellent case study for 'successful' granulomatous inflammation. This stands in contrast to chronic, unproductive granulomas that can occur during M. tuberculosis infection, sarcoidosis, and other granulomatous conditions, infectious or otherwise. Given the short duration of C. violaceum infection, this study specifically highlights the importance of innate immune responses in granulomas.

      Another strength of this study is the temporal analysis. This proves to be important when considering the spatial distribution and timing of cellular recruitment. For example, the authors observe that the intensity and distribution of neutrophil- and monocyte-recruiting chemokines vary substantially across infection time and correlate well with their previous study of cellular dynamics in C. violaceum granulomas.

      The intervention studies done in the last part of the paper bolster the relevance of the authors' focus on chemokines. The authors provide important negative data demonstrating the null effect of CXCR1/2 inhibition on neutrophil recruitment during C. violaceum infection. That said, the authors' difficulty with solubilizing reparixin in PBS is an important technical consideration given the negative result. On the other hand, monocyte recruitment via CCR2 proves to be indispensable for granuloma formation and infection control.

      Weaknesses:

      There are several shortcomings that limit the impact of this study. The first is that the cohort size is very limited. While the transcriptomic data is rich, the authors analyze just one tissue from one animal per timepoint. This assumes that the selected individual will have a representative lesion and prevents any analysis of inter-individual variability. Granulomas in other infectious diseases, such as schistosomiasis and tuberculosis, are very heterogeneous. The authors do assert that in C. violaceum infection granulomas are very consistent in their composition and kinetics, alleviating, in part, this concern.

      Another caveat to these data is the limited or incompletely informative data analysis. This dataset has been previously published with more extensive and broad characterization. Here, the authors use Visium in a more targeted manner to interrogate certain chemokines and cytokines. While this is a great biological avenue, key findings rely on qualitative inspection of gene expression overlaid onto images or data that have been qualitatively binned or thresholded. Upon revision the authors did supplement their analyses with important information, such as the top expressed genes in each Visium cluster and the dynamic range of RNA counts retrieved across clusters.

      Furthermore, the authors are underutilizing the spatial information provided by Visium with no spatial analysis conducted to quantify the patterning of expression patterns or spatial correlation between factors. The authors acknowledge the challenge of conducting this analysis given the variable size and geometry of the granulomas. In future studies, this can be overcome through size- or distance-based normalization or spatial clustering approaches that evaluate local neighborhood composition across different scales.
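
      The size-based normalization suggested above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' or the reviewer's method: the function name `radial_profile`, the input layout (a list of `(x, y, expression)` tuples for one lesion), and the choice of the spot centroid as the lesion center are all hypothetical. Each spot's distance from the center is divided by the lesion's largest radius, so lesions of different sizes are compared on a common 0 to 1 scale.

```python
import math
from collections import defaultdict


def radial_profile(spots, n_bins=5):
    """Mean expression per normalized-radius bin for one lesion.

    spots: list of (x, y, expression) tuples (hypothetical input layout).
    Returns a list of length n_bins; a bin with no spots is None.
    """
    # Assumption: the centroid of the selected spots approximates the lesion center.
    cx = sum(x for x, _, _ in spots) / len(spots)
    cy = sum(y for _, y, _ in spots) / len(spots)

    dists = [math.hypot(x - cx, y - cy) for x, y, _ in spots]
    max_d = max(dists) or 1.0  # guard against a single-spot lesion

    sums = defaultdict(float)
    counts = defaultdict(int)
    for d, (_, _, expr) in zip(dists, spots):
        # Dividing by the lesion's own maximum radius removes the size effect.
        b = min(int(d / max_d * n_bins), n_bins - 1)
        sums[b] += expr
        counts[b] += 1

    return [sums[b] / counts[b] if counts[b] else None for b in range(n_bins)]
```

      Because the radius is rescaled per lesion, a small early lesion and a large late lesion with the same center-to-edge pattern give the same profile. Irregular lesion shapes are not handled here; a distance-to-boundary measure would be one alternative.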

      Impact:

      The authors' analysis helps highlight the chemokine profiles of protective, yet host protective granulomas. As the authors comment on in their discussion, these findings have important similarities and differences with other notable granulomatous conditions, such as tuberculosis. Beyond the relevance to C. violaceum infection, these data can help inform studies of other types of granulomas and hone candidate strategies for host-directed therapy strategies.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Amason et al. investigated the formation of granulomas in response to Chromobacterium violaceum infection, aiming to uncover the cellular mechanisms governing the granuloma response. They identify spatiotemporal gene expression of chemokines and receptors associated with the formation and clearance of granulomas, with a specific focus on those involved in immune trafficking. By analyzing the presence or absence of chemokine/receptor RNA expression, they infer the importance of immune cells in resolving infection. Despite observing increased expression of neutrophil-recruiting chemokines, treatment with reparixin (an inhibitor of CXCR1 and CXCR2) did not inhibit neutrophil recruitment during infection. Focusing on monocyte trafficking, they found that CCR2 knockout mice infected with C. violaceum were unable to form granulomas, ultimately succumbing to infection.

      The spatial transcriptomics data presented in the figures could be considered a valuable resource if shared, with the potential for improved and clarified analyses. The primary conclusion of the paper, that C. violaceum infection in the liver cannot be contained without macrophages, would benefit from clarification.

      We thank the reviewer for their time and effort in evaluating our manuscript.

      While the spatial transcriptomic data generated in the figures are interesting and valuable, they could benefit from additional information. The manual selection of regions of granulomas for analysis could use additional context - was the rest of the liver not sequenced, or excluded for other reasons? Including a healthy liver in the analysis could serve as a control for any lasting effects at the final time point of 21 days.

      We revised the text in the methods section to include additional information about manual selection of regions. The entire tissue section was sequenced, but using H&E as a guide, we manually selected each representative lesion and a surrounding layer of healthy hepatocytes at each timepoint. We agree that an uninfected control could be useful; however, we did not include an uninfected mouse in the experiment because we were most interested in the cells that make up the granuloma, not hepatocytes outside the lesion. Additionally, we find that at the 21 DPI timepoint the surrounding hepatocytes appear to have returned to a homeostatic transcriptional state; at 21 DPI the majority of mice have undetectable CFU burdens.

      Providing more context for the scalebars throughout the spatial analyses, such as whether the data are raw counts or normalized based on the number of reads per spatial spot, would be helpful for interpretation, as changes in expression could signal changes in the numbers of cells or changes in the gene expression of cells.

      The scalebars for the SpatialFeaturePlots display the normalized gene expression values. The data are normalized based on the number of reads per spatial spot, using the sctransform method published in (Hafemeister & Satija, 2019). We agree that the changes in expression could result from changes in cell numbers and/or changes in gene expression on a per cell basis. However, the sctransform method is designed to preserve biological variation while minimizing technical effects observed in transcriptomics platforms. Regardless of the heterogeneity of sequencing depth, it is clear from these plots that gene expression changes dynamically over time and space, which was the focus of our analysis. We have updated the figure legends to clarify scalebar units, and revised the methods section. 

      In Figure 4, qualitative measurements are valuable, but having an idea of the raw data for a few of the pursued chemokines/receptors would aid interpretation

      All of the SpatialFeaturePlots utilized to generate Figure 4 have been included in the manuscript, either in the main figures or in the supplemental figures. For example, the SpatialFeaturePlots of Cxcl4, Cxcl9, and Cxcl10 are all in Figure 4 – figure supplement 1.

      In Figure 4 it would also be beneficial to clarify whether the reported values are across all clusters and consider focusing on clusters with the greatest change in expression.

      Figure 4 summarizes the expression of each gene at each timepoint for the entire selected area, independently of cluster identity. Different clusters do show variability in the relative change in expression. To better show these data, we have included an additional graphic that summarizes the top twenty upregulated genes for each cluster, many of which include chemokines (new Table 4). The average log2FC values for each of these genes can be found in Table 4 – source data 1.   
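
      For readers unfamiliar with log2 fold-change rankings, the idea behind a "top upregulated genes per cluster" list can be sketched as below. This is a generic illustration and not necessarily how Table 4 was computed: the function name `top_upregulated`, the input layout, and the pseudocount of 1 are assumptions made for the example.

```python
import math


def top_upregulated(expr, in_cluster, n_top=20, pseudocount=1.0):
    """Rank genes by log2 fold change of mean expression, cluster vs. all other spots.

    expr: dict mapping gene name -> list of normalized values, one per spot.
    in_cluster: list of booleans, True where the spot belongs to the cluster.
    Both layouts are hypothetical, chosen only for this sketch.
    """
    results = []
    for gene, values in expr.items():
        inside = [v for v, m in zip(values, in_cluster) if m]
        outside = [v for v, m in zip(values, in_cluster) if not m]
        mean_in = sum(inside) / len(inside)
        mean_out = sum(outside) / len(outside)
        # The pseudocount avoids division by zero for genes absent outside the cluster.
        log2fc = math.log2((mean_in + pseudocount) / (mean_out + pseudocount))
        results.append((gene, log2fc))

    # Largest positive fold change first.
    results.sort(key=lambda item: item[1], reverse=True)
    return results[:n_top]
```

      A log2FC of 2 corresponds to a four-fold higher (pseudocount-adjusted) mean inside the cluster. Dedicated tools additionally apply statistical tests and multiple-testing correction, which this sketch omits.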

      Figures 5E and F would benefit from clarification regarding the x-axis units and whether the expression levels are summed across all clusters for each time point

      Figures 5E and 5F display the normalized gene expression values for all spots (independent of cluster identity) at each timepoint. We have updated the figure legend to reflect this clarification.

      Additionally, information on the sequencing depth of the samples would be helpful, particularly as shallow sequencing of RNA can result in poor capture of low-expression transcripts.

      We agree with the reviewer that sequencing depth is an additional factor to take into consideration. We have included an additional supplemental figure (Figure 1 – figure supplement 1A-B) to display raw counts spatially at the various timepoints, and within each cluster.

      Regarding the conclusion of the essentiality of macrophages in granuloma formation, it may be prudent to further investigate the role of macrophages versus CCR2. Consideration of experiments deleting macrophages directly, instead of CCR2, could provide more definitive evidence of the necessity of macrophage migration in containing infections.

      While CCR2 is expressed on a number of other cells besides monocytes, it is well-documented that loss of CCR2 results in accumulation of monocytes in the bone marrow and a significant reduction in the blood-monocyte population. As a result, monocytes are not recruited to the site of infection in numerous prior publications in the field; we confirm this as shown by flow cytometry and IHC. Nonetheless, future studies will aim to rescue Ccr2–/– mice via adoptive transfer of monocytes to further show that monocyte-derived macrophages are essential for defense against infection. We also intend to perform clodronate depletion experiments at various timepoints, however, clodronate will also deplete Kupffer cells and has off-target effects on neutrophils. Overall, the established importance of CCR2 for monocyte egress from the bone marrow and our observation that the macrophage ring fails to form give us sufficient confidence to conclude that monocyte-derived macrophages are essential for this innate granuloma.

      Analyzing total cell counts in the liver after infection could provide insight into whether the decrease in the fraction of macrophages is due to decreased numbers or infiltration of other cell types...

      Our flow data suggest that the decrease in macrophages in Ccr2–/– mice is due to both a decrease in macrophage number and an increase in the infiltration of other cell types (namely neutrophils). To better illustrate this, we now include an additional quantification of the total cell counts in the liver and spleen (new Figure 6 – figure supplement 1), which supports our conclusion that Ccr2–/– mice have a defect in granuloma macrophage numbers. We have also repeated the experiment to reach sufficient numbers to perform statistical analysis (revised Figure 6F–K).

      Reviewer #2 (Public Review):

      Summary:

      In this study, Amason et al employ spatial transcriptomics and intervention studies to probe the spatial and temporal dynamics of chemokines and their receptors and their influence on cellular dynamics in C. violaceum granulomas. As a result of their spatial transcriptomic analysis, the authors narrow in on the contribution of neutrophil- and monocyte-recruiting pathways to host response. This results in the observation that monocyte recruitment is critical for granuloma formation and infection control, while neutrophil recruitment via CXCR2 may be dispensable.

      We thank the reviewer for their thoughtful comments and suggestions.

      Strengths:

      Since C. violaceum is a self-limiting granulomatous infection, it makes an excellent case study for 'successful' granulomatous inflammation. This stands in contrast to chronic, unproductive granulomas that can occur during M. tuberculosis infection, sarcoidosis, and other granulomatous conditions, infectious or otherwise. Given the short duration of C. violaceum infection, this study specifically highlights the importance of innate immune responses in granulomas.

      Another strength of this study is the temporal analysis. This proves to be important when considering the spatial distribution and timing of cellular recruitment. For example, the authors observe that the intensity and distribution of neutrophil- and monocyte-recruiting chemokines vary substantially across infection time and correlate well with their previous study of cellular dynamics in C. violaceum granulomas.

      The intervention studies done in the last part of the paper bolster the relevance of the authors' focus on chemokines. The authors provide important negative data demonstrating the null effect of CXCR1/2 inhibition on neutrophil recruitment during C. violaceum infection. That said, the authors' difficulty with solubilizing reparixin in PBS is an important technical consideration given the negative result...

      We agree with the reviewer, and the limited solubility of reparixin and other chemokine-receptor inhibitors is a major caveat of this study and others in the field. In future studies, there are several other inhibitors that could be used to further assess the role of CXCR1/2.

      On the other hand, monocyte recruitment via CCR2 proves to be indispensable for granuloma formation and infection control. I would hesitate to agree with the authors' interpretation that their data proves macrophages are serving as a physical barrier from the uninvolved liver. It is possible and likely that they are contributing to bacterial control through direct immunological activity and not simply as a structural barrier.

      We agree that macrophages do not form a physical or structural barrier, a word that implies epithelial-like function. Instead, we agree that macrophages mostly act immunologically. We revised the text to remove the term barrier.

      Weaknesses:

      There are several shortcomings that limit the impact of this study. The first is that the cohort size is very limited. While the transcriptomic data is rich, the authors analyze just one tissue from one animal per time point. This assumes that the selected individual will have a representative lesion and prevents any analysis of inter-individual variability.

      Granulomas in other infectious diseases, such as schistosomiasis and tuberculosis, are very heterogeneous, both between and within individuals. It will be difficult to assert how broadly generalizable the transcriptomic features are to other C. violaceum granulomas...

      We thank the reviewers for highlighting this key difference between granulomas in other infectious diseases, and granulomas induced by C. violaceum. Based on many prior experiments, we observe that C. violaceum-induced granulomas are very reproducible between and within individuals (highlighted in our previous publication). As this is a major advantage of this model system, we chose specific timepoints based on key events that consistently occur in the majority of lesions assessed at each timepoint, allowing us to be confident in the selection of representative granulomas. However, it is worth noting that granulomas within an individual mouse are seeded and resolved somewhat asynchronously. This did indeed affect our spatial transcriptomic data, as the 7 DPI timepoint was not histologically representative of a typical 7 DPI granuloma. Therefore, we excluded the 7 DPI timepoint from our analyses.

      Furthermore, this undermines any opportunity for statistical testing of features between time points, limiting the potential value of the temporal data.

      We agree with the reviewer that there is much more characterization and quantification that can be done. As demonstrated by the abundance of spatial and temporal data for the chemokine family alone, the spatial transcriptomics dataset is rich and will likely supply us with many years of analyses and investigations. Our current approach is to use the spatial transcriptomics dataset as a hypothesis-generating tool, followed by in vivo studies that seek to uncover physiological relevance for our observations. In the current paper, the strength of the spatial transcriptomic data for CCL2, CCL7 and their receptor CCR2 prompted us to study Ccr2–/– mice. These mice then prove the relevance of the spatial transcriptomic data. In regard to conclusions about temporal changes in chemokine expression, in this manuscript we do not make conclusions that CCL2 is important at one timepoint but not another. We are characterizing the broad temporal trends of expression in order to cast a broad net to inform future in vivo studies. There is much work for us to do to explore all the induced chemokines and their receptors.

      Another caveat to these data is the limited or incompletely informative data analysis. The authors use Visium in a more targeted manner to interrogate certain chemokines and cytokines. While this is a great biological avenue, it would be beneficial to see more general analyses considering Visium captures the entire transcriptome. Some important questions that are left unanswered from this study are:

      What major genes defined each spatial cluster?...

      The initial characterization of each spatial cluster was performed in Harvest et al., 2023. In brief, we used a mixture of published single-cell sequencing data, histological-based parameters, and ImmGen to define each cluster. We have not re-stated those methods in the current manuscript, but instead reference our prior paper.

      What were the top differentially expressed genes across time points of infection?...

      Though the top differentially expressed genes for each cluster can be informative in some situations, we chose a more targeted approach because of the obvious importance of chemokines. Nonetheless, we have included an additional graphic that summarizes the top twenty upregulated genes for each cluster (new Table 4). The average log2FC values for each of these genes can be found in Table 4 – source data 1.  

      Did the authors choose to focus on chemokines/receptors purely from a hypothesis perspective or did chemokines represent a major signature in the transcriptomic differences across time points?

      We chose to focus on chemokines because of their obvious importance for recruitment of immune cells. They were also among the highest induced genes in the spatial transcriptome (new Table 4).

      In addition to the absence of deep characterization of the spatial transcriptomic data, the study lacks sufficient quantitative analysis to back up the authors' qualitative assessments...

      See above comment regarding statistical comparisons.

      Furthermore, the authors are underutilizing the spatial information provided by Visium with no spatial analysis conducted to quantify the patterning of expression patterns or spatial correlation between factors.

      Several factors make quantification challenging. Lesions grow considerably in size in the first few days of infection, and then shrink in size in the latter days. This makes quantification challenging between timepoints. Radial quantification is also challenging due to the irregular shapes of each granuloma (see comment below for further discussion). Most importantly, the key next experiments are to validate the importance of each chemokine and receptor in vivo. Once we know which ones are the most important, this will justify putting more effort into spatial quantitative analysis and patterning of expression for those chemokines. 

      Impact:

      The authors' analysis helps highlight the chemokine profiles of protective, yet host protective granulomas. As the authors comment on in their discussion, these findings have important similarities and differences with other notable granulomatous conditions, such as tuberculosis. Beyond the relevance to C. violaceum infection, these data can help inform studies of other types of granulomas and hone candidate strategies for host-directed therapy.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      The Visium analysis would be strengthened by

      (1) Showing several histology examples of granulomas at each timepoint to help aid the reader in seeing how 'representative' each Visium sample is...

      These histological analyses were performed in our previous manuscript (Harvest et al., 2023) and were a crucial aspect of the initial characterization of the spatial transcriptomics dataset. Full liver sections are shown in that paper at each timepoint, and readers can see that the architecture is highly reproducible.

      (2) Validating their results in other tissues, either with Visium or with more targeted assays for their study's key molecules, such as immunohistochemistry or in situ hybridization

      We agree on the importance of validation studies and have plans to perform single-cell RNA sequencing experiments to further enhance resolution. With key genes in mind, we then plan to perform more in vivo studies to assess physiological relevance of upregulated genes in specific cell types.

      At the very least it would be important to validate the expression of CXCL1 and CXCL2 in other tissues and at the protein level, given the importance of those findings

      We think that the reviewer is asking us to validate that CXCL1 and CXCL2 are actually expressed given the negative reparixin data. However, if we do prove that they are expressed, this will not resolve whether they have critical roles in neutrophil recruitment. To prove this, we would need either a better CXCR2 inhibitor or Cxcr2 knockout mice. Therefore, we are saving further exploration for the future. Regarding validating other chemokines, we establish that CCR2 is critical, and we now show by immunofluorescence and ELISA (new Figure 7 – figure supplement 4) that CCL2 is highly expressed in WT mice, and Ccr2–/– mice actually have strongly elevated CCL2 expression at 3 DPI compared to WT mice.

      In Figure 1B, the UMAP here is largely uninformative. To display the clusters, the authors should instead show a heatmap or equivalent visualization of which genes defined each cluster. It would be helpful for the authors to also write out the full name of each cluster before using the abbreviations shown.

      Please see our previous comment about the initial characterization of clusters performed in Harvest et al., 2023, which details the characteristic genes for each cluster. We have written the full names of each cluster in the legend of Figure 1.

      In Figure 1C, the authors use a binary representation of whether a cluster is present or not at a particular time point. However, the spot size is arbitrary, and the colors of the dots are the same as the cluster color code. It is not clear what threshold the authors (or SpatialDimPlots) use to declare that a given cluster is present at a given time point. Therefore, this chart does not give any sense of the extent of each cluster's presence at each time. The authors should revisualize these data to display the abundance of each cluster at each timepoint. This could simply be done by adjusting the size of the circle or using a more traditional heatmap.

      We have now updated this graphic to display the extent of a cluster’s presence, with the size of each dot corresponding to the abundance of each cluster.
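      The abundance-scaled dot plot described above depends on a per-timepoint abundance value for each cluster. The following is a minimal sketch of one way such values could be computed, assuming each capture spot carries a timepoint and a cluster label. The function name, the input layout, and the example labels are all hypothetical and are not taken from the authors' pipeline.

```python
from collections import Counter, defaultdict

def cluster_abundance(spots):
    """Return {timepoint: {cluster: fraction of spots at that timepoint}}.

    `spots` is an iterable of (timepoint, cluster_label) pairs, one pair
    per capture spot. This layout is an assumption made for illustration.
    """
    counts = defaultdict(Counter)
    for timepoint, cluster in spots:
        counts[timepoint][cluster] += 1

    abundance = {}
    for timepoint, per_cluster in counts.items():
        total = sum(per_cluster.values())
        abundance[timepoint] = {c: n / total for c, n in per_cluster.items()}
    return abundance

# Made-up spots with placeholder labels (not data from the study).
example_spots = [
    ("T1", "A"), ("T1", "A"), ("T1", "B"),
    ("T2", "B"), ("T2", "B"), ("T2", "C"), ("T2", "A"),
]
abundance = cluster_abundance(example_spots)
```

      The resulting fractions could then be mapped to dot size, so that a cluster covering half of the spots at one timepoint is drawn larger than one covering a handful of spots.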

      In Figures 2 and 3 the authors describe the kinetics of each chemokine by cluster. While the dynamic expression is evident in the images, it is challenging to determine which clusters are driving expression in the absence of cluster annotation in those figures. The authors should support their visual findings with quantification of each factor in each cluster across time points.

      In Figure 5, violin plots are shown for Cxcl1 and Ccl2 that depict gene expression by each cluster. However, because each capture area is approximately 50 µm in diameter, the data do not achieve single-cell resolution and are not as informative as one would hope. Therefore, violin plots for each chemokine were not shown, though we have generated these graphics. We did not add these graphics to the revision because we did not think readers would generally want to see several pages of violin plots in the supplement. As mentioned, we plan to do single-cell RNA sequencing to further assess chemokine expression by each cell type present within the granulomas at key timepoints.

      With respect to the lack of spatial analysis, the authors describe certain transcript signals (i.e., peripheral region versus central region of the granuloma) across each lesion. To back up these qualitative assertions, the authors could use line profiles from the center of each granuloma to the outside to plot the variation in expression of each transcript over radial space. This would provide a more direct way to determine the spatial coordination between various transcripts.

      We considered using line profiles to quantify spatial variation within each lesion at each timepoint. However, this was exceptionally challenging due to the asymmetrical nature of some lesions, and the size discrepancy at different timepoints as the granulomas grow (during infection) and shrink (during resolution). When attempting to decide where to draw the line profiles, we determined that this approach did not enhance our analyses beyond using the cluster overlay and H&E to identify and interrogate different clusters.
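      For readers unfamiliar with the radial approach discussed in this exchange, the sketch below shows the basic idea under a strong simplifying assumption: the lesion is treated as roughly circular around a chosen center, which is exactly the assumption the response above identifies as problematic for asymmetric lesions. The function name, coordinate units, and toy values are hypothetical.

```python
import math
from collections import defaultdict

def radial_profile(spots, center, bin_width):
    """Mean expression per radial distance bin.

    spots:     iterable of (x, y, expression) tuples, one per capture spot
    center:    (cx, cy) coordinates chosen as the lesion center
    bin_width: width of each distance bin, in the same units as x and y

    Returns {bin_index: mean expression}, where
    bin_index = floor(distance_from_center / bin_width).
    """
    cx, cy = center
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, expression in spots:
        distance = math.hypot(x - cx, y - cy)
        bin_index = int(distance // bin_width)
        sums[bin_index] += expression
        counts[bin_index] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}

# Toy spots (made-up coordinates and expression values).
toy_spots = [
    (0, 0, 1.0), (50, 0, 3.0), (0, 60, 5.0),   # within 100 units of the center
    (150, 0, 6.0), (0, -120, 8.0),             # 100 to 200 units from the center
]
profile = radial_profile(toy_spots, center=(0, 0), bin_width=100)
```

      Comparing lesions of different sizes would additionally require normalizing distance by lesion radius, and irregular lesions would need a distance-to-boundary measure instead of distance-to-center; neither is attempted here.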

      The data visualization in Figure 4 seems unnecessarily confusing. The authors put the transcriptomic signal into categories of 'absent', 'low', 'medium', and 'high.' Why not simply use a continuous scale? The data would also benefit from hierarchical clustering of the heatmap rows to highlight chemokines and their receptors with similar expression patterns across time.

      We considered using a continuous scale as suggested by the reviewer. However, we chose not to create a continuous scale because quantitation is challenging due to the size changes in the lesions over time, such that larger lesions have greater inclusion of surrounding hepatocytes as well as necrotic cores, which would dilute the signal if averaged with the active immunologic granuloma zones. Figure 4 was intended to simplify the entirety of the SpatialFeaturePlots in an easy-to-digest manner, to aid in hypothesis generation as we consider the potential function of each chemokine and receptor in this model. We chose to organize each chemokine ligand based on family, maintaining a numerical order to allow Figure 4 to serve as a quick reference for anyone who is interested in a particular chemokine ligand or receptor.

      Do the authors feel confident in the transcriptomic signal coming from regions of necrosis? Given that many of their bright signals are coming from within clusters annotated as necrosis or necrosis-adjacent this raises an important technical consideration. Can the authors use the H&E image to estimate the cellular density (based on nuclear counts) in each region annotated by Visium? Are there any studies supporting the accurate performance of spatial transcriptomic methods in necrosis? Necrosis can be a source of non-specific binding during in situ hybridization assays.

      The reviewer raises a good point. A defining characteristic of the areas of necrosis is the lack of defined cell borders, with faded or absent nuclei. In these regions, it is impossible to estimate cellular density. Given these concerns, we have included an additional figure (new Figure 1 – figure supplement 1A-B) to display raw counts in each cluster across all timepoints. Though regions of necrosis do display lower read quantity compared to other areas, we are still confident in the positive transcriptomic signal coming from adjacent regions because there are plenty of negative examples in which expression is not detected. In other words, temporal and spatial upregulation of key genes is still observed in the tissues, and future experiments will aim to interrogate the physiological relevance of each gene, while validating the spatial transcriptomics data with other methodologies.

      The methods should include a much more detailed description of the tissue preparation and collection for the Visium experiment. The section on the computational analysis of the Visium data is also extremely limited. At a minimum, the authors should include details on how they performed clustering of the Visium regions.

      The detailed description of tissue preparation, computational analysis, and clustering is in our previous manuscript, from which this dataset originates. We can add a direct quote of the methodology if the reviewer requests.

      The cluster labels in Figure 5 A-B are very difficult to see. Furthermore, it would help if the authors displayed the annotated cluster names (i.e., those shown in 5C) instead of their numerical coding for a more direct interpretation of the data.

      We agree and have updated this figure with annotated cluster names.

      The scale bars in Figure 7 are very difficult to see.

      The scale bars in histology images were kept small intentionally so as not to occlude data, and eLife is an online-only, digital media platform which allows readers to sufficiently zoom on high-resolution histology images. We have increased the DPI resolution for histology images to further aid in visualization.

      The information presented in Tables 2 and 3 is greatly appreciated and will really help guide the reader through the analyses.

      We assembled this information for our own learning about chemokines and hope that it is useful for the reader.

    1. eLife assessment

      By developing a framework to integrate metagenomic and metabolomic data with genome-scale metabolic models, this study establishes a toolkit to investigate trophic interactions between microbiota members in situ. The authors apply this method to the native rhizosphere bacterial communities of apple rootstocks, producing solid evidence and numerous detailed hypotheses on specific trophic exchanges and resource dependencies. The framework represents a valuable method to disentangle features of microbial interaction networks and will be of interest to microbiome scientists as well as plant and computational biologists.

    2. Reviewer #1 (Public review):

      The work by Ginatt et al. uses genome-scale metabolic modeling to identify and characterize trophic interactions between rhizosphere-associated bacteria. Beyond identifying microbial species associated with specific host and soil traits (e.g., disease tolerance), a detailed understanding of the interactions underlying these associations is necessary for developing targeted microbiome-centered interventions for plant health. It has nonetheless remained challenging to define the roles of specific organisms and metabolic species in natural rhizobiomes. Here, the authors combine microbial compositional data obtained through metagenomic sequencing with a new collection of genome-scale models to predict interactions in the native rhizosphere communities of apple rootstocks. To do this, they have established processes to integrate these sources of data and model specific trophic exchanges, which they use to obtain testable hypotheses for targeted modulation of microbiota members in situ.

      The authors carry out a careful model curation process based on metagenomic sequencing data and existing model generation tools, which, together with basing the in silico medium composition on known root exudates, strengthens their predictions of interaction network features. Moreover, its reliance on genome-scale models provides a broader basis for linking sequence-based information to predictions of function on a multispecies level beyond rhizosphere microbiomes.

      Having generated a set of predicted trophic interactions, the authors carried out a detailed analysis linking features of these interactions to organism taxonomy and broader ecosystem properties. Intriguingly, the organisms predicted to grow in the first iteration of their framework (i.e., on only root exudates) broadly correspond to taxonomic groups experimentally shown to benefit from these compounds. Additionally, the simulations predicted some patterns of vitamin and amino acid secretion that are known to form the basis for interactions in the rhizosphere. Together, these outcomes underscore the applicability of this method to help disentangle trophic interaction networks in complex microbiomes.

      The methodology described in this paper represents a useful and promising framework to better understand the complexity of microbial interaction networks in situ. In particular, the authors' simulation of trophic interactions based on cellulose degradation have generated predictions of interactions that can more readily be validated. While a more complete analysis of the method's sensitivity to environmental composition is still needed to fully interpret its conclusions - particularly those predicting the inability of many of the in silico organisms to produce biomass - it represents a valuable addition to the growing toolkit of computational and experimental methods for generating educated hypotheses on complex trophic networks.

    3. Reviewer #3 (Public review):

      Summary:

      This study presents a solid framework for the metabolic modeling of microbial species and resources in the rhizosphere environment. It is an ambitious effort to tackle the huge complexity of the rhizosphere and reveal the plant-microbiota interactions therein. Considering previously published data by Berihu et al., going through a series of steps, the framework then finds associations between an apple tree disease state and both microbes and metabolites. The framework is well explained and motivated. I think that further work should be done to validate the method, both using synthetic data, with a known ground truth and following up on key findings experimentally.

      Strengths:

      - The manuscript is well written with a good balance between detail and readability. The framework steps are well motivated and explained.

      - The authors faithfully acknowledge the limitations of their approach and do not try to "over-sell" their conclusions.

      - The presented framework has potential for significant discovery if the hypotheses generated are followed up with experimental validation.

      Weaknesses:

      - It would be better for the framework to be validated on synthetic data.

      Justification of claims and conclusions:

      The claims and conclusions are sufficiently well justified since the limitations of this approach are acknowledged by the authors.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      …the degree to which the predictions can vary according to environmental composition remains difficult to quantify, and the work does not address the sensitivity of the modeling predictions beyond a simulated medium containing 33 root exudates. I find this especially important given that relatively few (84 of 243) species were predicted to grow even after cross-feeding, suggesting that a richer medium could lead to different interaction network structures. While the authors do state the importance of environmental composition and have carefully designed an in silico medium, I believe that simulating a broader set of resource pools would add necessary insight into both the predictive power of the models themselves and trophic interactions in the rhizosphere more generally.

      The original analyses were indeed focused on a single well-defined environment supporting the growth of only a subset of the species. We have added a paragraph to the discussion section dealing with the potential limitations of this approach. 

      On line 289 we write:

      "Overall, the successive iterations connected 84 out of 243 native members of the apple rhizosphere GSMM community via trophic exchanges. The inability of the remaining bacteria to grow, despite being part of the native root microbiome, possibly reflects the selectiveness of the root environment, which fully supports the nutritional demands of only part of the soil species, whereas specific compounds that might be essential to other species are less abundant1. It is important to note that the specific exudate profile used here represents a snapshot of the root metabolome, as root secretion profiles are highly dynamic, reflecting both environmental and plant developmental conditions. A possible complementary explanation to the observed selective growth might be the partiality of our simulation platform, which examined only plant-bacteria and bacteria-bacteria interactions while ignoring other critical components of the rhizosphere system such as fungi, archaea, protists and mesofauna, as well as less abundant bacterial species, components all known to metabolically interact2. Finally, the MAG collection, while relatively substantial, represents only part of the microbial community. Accordingly, the iterative growth simulations represent a subset of the overall hierarchical-trophic exchanges in the root environment, necessarily reflecting the partiality of the dataset."
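      The logic of successive growth iterations described in the quoted passage can be illustrated with a deliberately simplified sketch. It replaces the constraint-based GSMM simulations with plain set operations: a species "grows" once every metabolite it requires is in the medium, and its secretions are added to the medium for the next iteration. This is an assumption-laden toy, not the authors' implementation, and all names and metabolites are hypothetical.

```python
def iterative_growth(species, initial_medium, max_iterations=10):
    """Simulate successive cross-feeding iterations on a toy community.

    species:        {name: (required_metabolites, secreted_metabolites)},
                    both given as sets
    initial_medium: iterable of metabolites available in iteration 1

    Returns (grown, medium), where grown maps each species that grew to
    the iteration in which it first grew, and medium is the final pool.
    """
    medium = set(initial_medium)
    grown = {}
    for iteration in range(1, max_iterations + 1):
        # A species grows if all of its requirements are already available.
        newly_grown = [
            name for name, (required, _) in species.items()
            if name not in grown and required <= medium
        ]
        if not newly_grown:
            break  # no new growers: the succession has converged
        for name in newly_grown:
            grown[name] = iteration
        # Secretions only become available in the following iteration.
        for name in newly_grown:
            medium |= species[name][1]
    return grown, medium

# Toy community (hypothetical species and metabolites).
toy_species = {
    "sp1": ({"exudate_a"}, {"met_x"}),
    "sp2": ({"met_x"}, {"met_y"}),
    "sp3": ({"exudate_a", "met_y"}, set()),
    "sp4": ({"met_z"}, {"met_x"}),   # met_z is never supplied
}
grown, medium = iterative_growth(toy_species, {"exudate_a"})
```

      In this toy run, `sp4` never grows because nothing supplies `met_z`, which mirrors how community members can remain disconnected when an essential compound is absent from the simulated environment.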

      In addition, we have tried to better explain the advantages of a limited/defined medium to such an analysis. On Line 231 we add:

      "By avoiding the inclusion of non-exudate organic metabolites, the true-to-source rhizosphere environment was designed to reveal the hierarchical directionality of the trophic exchanges in soil, as rich media often mask various trophic interactions taking place in native communities3"

      More generally, beyond the above justification of our specific medium selection, we agree that simulating a broader set of resource pools would contribute to a more comprehensive understanding of the trophic interactions. Therefore, we conducted the analysis in an additional environment, in which cellulose was used as an input. We were able to follow its well-documented degradation via multiple steps, conducted by different community members, to serve as a benchmark to our suggested framework. 

      On line 357 we add:

      "To validate the ability of MCSM to capture trophic dependencies and succession, we further tested whether it can trace the well-documented example of cellulose degradation - a multi-step process conducted by several bacterial strains that go through the conversion of cellulose and its oligosaccharide derivatives into ethanol, acetate and glucose, which are all eventually oxidized to CO2 (ref. 4). Here, the simulation followed the trophic interactions in an environment provided with cellulose oligosaccharides (4 and 6 glucose units) on the 1st iteration (Supp. Table 3). The formed trophic successions detected along iterations captured the reported multi-step process (Supp. Fig. 7)."

      Finally, we have included additional text regarding the challenge of defining our simulation environment in the Discussion section. 

      On line 532 we add:

      "In the current study, the root environment was represented by a single pool of resources (metabolites). As genuine root environments are highly dynamic and responsive to stimuli, a single environment can represent, at best, a temporary snapshot of the conditions. Conductance of simulations with several sets of resource pools (e.g., representing temporal variations in exudation profile) can add insights regarding their effect on trophic interactions and community dynamics. In parallel, confirming predictions made in various environments will support an iterative process that will strengthen the predictive power of the framework and improve its accuracy as a tool for generating testable hypotheses. Similarly, complementing the genomics-based approaches used here with additional layers of 'omics information (mainly transcriptomics & metabolomics) can further constrain the solution space, deflate the number of potential metabolic routes and yield more accurate predictions of GSMMs' performances5."

      And we add in Line 520:

      "For these reasons, among others, the framework presented here is not intended to be used as a stand-alone tool for determining microbial function. The framework presented is designed to be used as a platform to generate educated hypotheses regarding bacterial function in a specific environment in conjunction with actual carbon substrates available in the particular ecosystem under study. The hypotheses generated provide a starting point for experimental testing required to gain actual, targeted and feasible applicable insights6,7. While recognizing its limitations, this framework is in fact highly versatile and can be used for the characterization of a variety of microbial communities and environments. Given a set of MAGs derived from a specific environment and environmental metabolomics data, this computational framework provides a generic simulation platform for a wide and diverse range of future applications." 

      Reviewer #2 (Public review):

      There are two main drawbacks of approaches like the one described here, both only partially related to the authors' work yet with great impact on the presented framework. First, the usage of automatic GSMM reconstruction requires great caution. It is telling that the semi-curated AGORA models are still considered reconstructions, and the user is expected to parameterize them into a model. In this study, CarveMe was used. CarveMe is a well-known tool with several pros [1]. Yet, several challenges need to be considered when using it [2]. For example, the biomass function used might lead to an overestimation of auxotrophies. Also, as its authors admit in their reply paper, CarveMe does gap fill in a way [3]; models are constructed to ensure no gaps and also secure a minimum growth. However, curation of such a high number of GSMMs is probably not an option. Further, even if FVA is far more useful than FBA for the authors' aim, it does not ensure that when a species secretes one compound (let's say metabolite A), the same flux vector, i.e. the same metabolic functioning profile, secretes another compound (metabolite B) at the same time, even if the FVA solution suggests that metabolite B could be secreted in general.

      We thank Reviewer #2 for highlighting this key limitation of our analysis. Below and in the 'recommendations to authors' section we address these concerns. 

      Concerning the first point raised (models' accuracy), we have now clearly acknowledged in the text the limitations of using an automated GSMM reconstruction tool such as CarveMe. More generally, the framework applied here was built to meet the challenges of analyzing high-throughput data while acknowledging the inherent potential of introducing inaccuracies. Pros and cons are now discussed.

      On line 507 we write:

      "Moreover, the use of an automatic GSMM reconstruction tool (CarveMe8), though increasingly used for depicting phenotypic landscapes, is typically less accurate than manual curation of metabolic models9. This approach typically neglects specialized functions involving secondary metabolism10 and introduces additional biases such as the overestimation of auxotrophies11,12. Nevertheless, manual curation is practically non-realistic for hundreds of MAGs, an expected outcome considering the volume of nowadays sequencing projects. As the primary motivation of this framework is the development of a tool capable of transforming high-throughput, low-cost genomic information into testable predictions, the use of automatic metabolic network reconstruction tools was favored, despite their inherent limitations, in pursuit of addressing the necessity of pipelines systematically analyzing metagenomics data." 

      Regarding the use of FVA solutions: such solutions indeed return all potential metabolic fluxes in GSMMs (the ranges of all fluxes satisfying the objective function, which by default is set to biomass increase) in a given environment. However, as indicated by the reviewer, predicted fluxes do not necessarily co-occur (i.e., when one metabolite is secreted, another is not necessarily secreted too); yet they provide the full set of potential solutions (unlike the single solution provided by FBA). A possible strategy to reduce the inflated predictions provided by FVA and further constrain the solution space (reduce the set of metabolic fluxes) is the incorporation of additional 'omics data layers, as was done, for example, in the work of Zampieri et al5. Such an approach could allow, for instance, blocking fluxes through reactions in the network reconstructions that are not active in situ, thereby imposing further constraints and narrowing the solution space. We now refer in the text to this limitation and to potential routes to overcome it.
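      The co-occurrence caveat can be made concrete with a toy example. The sketch below is not a real FVA: it brute-forces a tiny integer flux grid in place of linear programming, and the three-flux "network" and its numbers are invented for illustration. It assumes a fixed carbon uptake, a biomass flux capped by some other nutrient, and excess carbon that must leave as metabolite A or metabolite B.

```python
from itertools import product

UPTAKE = 10       # fixed carbon uptake (arbitrary units, hypothetical)
BIOMASS_CAP = 4   # biomass flux limited by another nutrient (hypothetical)

def feasible_flux_vectors():
    """Yield integer (biomass, secrete_A, secrete_B) vectors that satisfy
    the toy constraints: biomass <= cap, and all carbon taken up is used."""
    for bio, a, b in product(range(UPTAKE + 1), repeat=3):
        if bio <= BIOMASS_CAP and bio + a + b == UPTAKE:
            yield bio, a, b

vectors = list(feasible_flux_vectors())

# FBA-like step: the best achievable value of the biomass objective.
best_biomass = max(bio for bio, _, _ in vectors)
optimal = [v for v in vectors if v[0] == best_biomass]

# FVA-like step: range of each secretion flux over all optimal vectors.
range_A = (min(a for _, a, _ in optimal), max(a for _, a, _ in optimal))
range_B = (min(b for _, _, b in optimal), max(b for _, _, b in optimal))

# Does any single optimal flux vector reach both maxima at once?
both_max = any(a == range_A[1] and b == range_B[1] for _, a, b in optimal)
```

      Here both secretion ranges span 0 to 6, so each metabolite "could be secreted", yet no single optimal flux vector secretes both at their maximum, because the two fluxes compete for the same 6 units of excess carbon. Additional constraints, such as those derived from other 'omics layers, would shrink the set of vectors that `feasible_flux_vectors` yields.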

      On line 541 we now write:

      "Similarly, complementing the genomics-based approaches done here with additional layers of 'omics information (mainly transcriptomics & metabolomics) can further constrain the solution space, deflate the number of potential metabolic routes and yield more accurate predictions of GSMMs' performances5."

      Reviewer #3 (Public review):

      When presenting a computational framework, best practices include running it on artificial (synthetic) data where the ground truth is known and therefore the precision and accuracy of the method may be assessed. This is not an optional step, the same way that positive/negative controls in lab experiments are not optional. Without this validation step, the manuscript is severely limited. The authors should ask themselves: what have we done to convince the reader that the framework actually works, at least on our minimal synthetic data? 

      Thank you for this suggestion. To validate the ability of MCSM to capture trophic succession, we conducted an additional analysis testing whether it can track the well documented example of cellulose degradation - a multi-step process conducted by several bacterial strains. This example has been included in the manuscript to serve as a case study (i.e. positive control) for metabolic interactions occurring within the bacterial community (Supp. Fig. 7). 

      On line 357 we add:

      "To validate the ability of MCSM to capture trophic dependencies and succession, we further tested whether it can track the well-documented example of cellulose degradation - a multi-step process conducted by several bacterial strains that go through the conversion of cellulose and its oligosaccharide derivatives into ethanol, acetate and glucose, which are all eventually oxidized to CO2 (ref. 4). Here, the simulation followed the trophic interactions in an environment provided with cellulose oligosaccharides (4 and 6 glucose units) on the 1st iteration (Supp. Table 3). The formed trophic successions detected along iterations captured the reported multi-step process (Supp. Fig. 7)."

      "Supplementary Figure 7. Application of MCSM over the process of cellulose decomposition as described by Kato et al4. 5-partite network exhibiting the uptake of cellulose oligomers (4 and 6 units of connected D-glucose) by primary decomposers, through secretion of intermediate compounds and their metabolization by secondary decomposers to CO2. Distribution of phyla of primary and secondary decomposers is denoted by pie charts. Though MAGs were not constructed for the original species as in Kato et al., among the primary consumers, species corresponding to the Acidobacteria (Acidobacteriales)13, Actinobacteria14, Bacteroidetes15, Proteobacteria (Xanthomonadales)16 and Verrucomicrobia17 groups are found to be capable of degrading cellulose compounds via enzymatic mechanisms."

      More generally, beyond the above addition, the relevance of the framework to the analysis of the data is discussed throughout the analysis (in the original version of the manuscript). We have scrutinized each of our observations in light of currently available information and provided corroborating evidence, as well as a few discrepancies, for multiple steps in the analysis. Examples include the following discussions:

      On line 312, we discuss the biological relevance of taxonomic classes classified as primary versus secondary degraders

      "As in the full GSMM data set (Community bar, Fig. 3C), most of the species which grew in the 1st iteration belonged to the phyla Acidobacteriota, Proteobacteria, and Bacteroidota. This result concurred with findings from the work of Zhalnina et al, which reported that bacteria assigned to these phyla are the primary beneficiaries of root exudates18. Species from three out of the 17 phyla that did not grow in the first iteration - Elusimicrobiota, Chlamydiota, and Fibrobacterota, did grow on the 2nd iteration (Fig. 3C). Members of these phyla are known for their specialized metabolic dependencies. Such is the case for example with members of the Elusimicrobiota phylum, which include mostly uncultured species whose nutritional preferences are likely to be selective19.

      At the order level, bacteria classified as Sphingomonadales (class Alphaproteobacteria), a group known to include typical inhabitants of the root environment20, grew in the initial Root environment. In comparison, other root-inhabiting groups including the orders Rhizobiales and Burkholderiales20, did not grow in the first iteration. Rhizobiales and Burkholderiales did, however, grow in the second and third iterations, respectively, indicating that in the simulations, the growth of these groups was dependent on exchange metabolites secreted by other community members (Supp. Fig. 4)."

      On line 331, we provide support to the classification of specific metabolites as exchange molecules

      "Overall, 158 organic compounds were secreted throughout the MCSM simulation (from which 12 compounds overlapped with the original exudate medium). These compounds varied in their distribution and were mapped into 12 biochemical categories (Fig. 3D). Whereas plant secretions are a source of various organic compounds, microbial secretions provide a source of multiple vitamins and co-factors not secreted by the plant. Microbial-secreted compounds included siderophores (staphyloferrin, salmochelin, pyoverdine, and enterochelin), vitamins (pyridoxine, pantothenate, and thiamin), and coenzymes (coenzyme A, flavin adenine dinucleotide, and flavin mononucleotide) – all known to be exchange compounds in microbial communities21,22. In addition, microbial secretions included 11 amino acids (arginine, lysine, threonine, alanine, serine, phenylalanine, tyrosine, leucine, glutamate, isoleucine, and methionine), also known as a common exchange currency in microbial communities23. Some microbial-secreted compounds, such as phenols and alkaloids, were reported to be produced by plants as secondary metabolites24,25. Additional information regarding mean uptake and secretion degrees of compounds classified to biochemical groups is found in Supp. Fig. 5."

      On line 432, we provide corroborative support to the classification of exudates as associated with beneficial/non beneficial root communities

      "Notably, the S-classified root exudates included compounds reported to support dysbiosis and ARD progression. For example, the S-classified compounds gallic acid and caffeic acid (3,4-dihydroxy-trans-cinnamate) are phenylpropanoids – phenylalanine intermediate phenolic compounds secreted from plant roots following exposure to replant pathogens26. Though secretion of these compounds is considered a defense response, it is hypothesized that high levels of phenolic compounds can have autotoxic effects, potentially exacerbating ARD. Additionally, it was shown that genes associated with the production of caffeic acid were upregulated in ARD-infected apple roots, relative to those grown in γ-irradiated ARD soil27,28, and that root and soil extracts from replant-diseased trees inhibited apple seedling growth and resulted in increased seedling root production of caffeic acid29."

      On line 446, we provide supporting evidence for the classification of secreted compounds as associated with beneficial/non-beneficial root communities:

      "Several secreted compounds classified as healthy exchanges (H) were reported to be potentially associated with beneficial functions. For instance, the compounds L-Sorbose (EX_srb__L_e) and Phenylacetaldehyde (EX_pacald_e), both over-represented in H paths (Fig. 5C), have been shown to inhibit the growth of fungal pathogens associated with replant disease30,31. Phenylacetaldehyde has also been reported to have nematicidal qualities32."

      On line 453 we discuss the correspondence of specific exudate uptakes and compound secretions via specific subnetwork motifs (PM) and their literature/experimental evidence:

      "Combining both exudate uptake data and metabolite secretion data, the full H-classified PM path 4-Hydroxybenzoate; GSMM_091; catechol (Fig. 4C; the consumed exudate, the GSMM, and the secreted compound, respectively) provides an exemplary model for how the proposed framework can be used to guide the design of strategies which support specific, advantageous exchanges within the rhizobiome. The root exudate 4-Hydroxybenzoate is metabolized by GSMM_091 (class Verrucomicrobiae, order Pedosphaerales) to catechol. Catechol is a precursor of a number of catecholamines, a group of compounds which was recently shown to increase apple tolerance to ARD symptoms when added to orchard6,33. This analysis (PM; Fig 4C), leads to formulating the testable prediction that 4-Hydroxybenzoate can serve as a selective enhancer of catecholamine synthesizing bacteria associated with reduced ARD symptoms, and therefore serve as a potential source for indigenously produced beneficial compounds."

      Moreover, we view our analysis as a strategy for converting high-throughput genomic data into testable predictions that narrow the solution space, while acknowledging the potential inaccuracies inherent to the analysis. We have revised the text to clearly acknowledge this limitation.

      On line 497 we write: 

      "The framework we present is currently conceptual."

      On line 520 we write: 

      "For these reasons, among others, the framework presented here is not intended to be used as a stand-alone tool for determining microbial function. The framework presented is designed to be used as a platform to generate educated hypotheses regarding bacterial function in a specific environment in conjunction with actual carbon substrates available in the particular ecosystem under study. The hypotheses generated provide a start point for experimental testing required to gain actual, targeted and feasibly applicable insights6,7."

      On line 532 we add: 

      "In the current study, the root environment was represented by a single pool of resources (metabolites). As genuine root environments are highly dynamic and responsive to stimuli, a single environment can represent, at best, a temporary snapshot of the conditions. Conductance of simulations with several sets of resource pools (e.g., representing temporal variations in exudation profile) can add insights regarding their effect on trophic interactions and community dynamics. In parallel, confirming predictions made in various environments will support an iterative process that will strengthen the predictive power of the framework and improve its accuracy as a tool for generating testable hypotheses. Similarly, complementing the genomics-based approaches used here with additional layers of 'omics information (mainly transcriptomics & metabolomics) can further constrain the solution space, deflate the number of potential metabolic routes and yield more accurate predictions of GSMMs' performances5."

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 219: "Feasibility" - this term/concept may be difficult to understand for readers unfamiliar with GSMMs. I would recommend either clarifying or rephrasing, perhaps as "simulations confirmed the existence of a feasible solution space for all the 243 models, as well as their capacity to predict growth in the respective environment."

      Thanks, done. We have modified this section as suggested (line 221). 

      (2) Line 244: How does MCSM fit within/build upon existing frameworks that simulate patterns of niche construction and cross-feeding with constraint-based modeling?

      This is now addressed. On line 250 we write:  

      "Unlike tools designed for modelling microbial interactions34,35, MCSM bypasses the need for defining a community objective function as the growth of each species is simulated individually. Trophic interactions are then inferred by the extent to which compounds secreted by bacteria could support the growth of other community members."
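      The idea quoted above, growing each species on its own and inferring trophic links from what earlier growers secrete, can be illustrated with a deliberately simplified sketch. It is not the MCSM implementation: real runs use GSMMs and flux simulations, whereas here each species is a hypothetical rule (compounds it needs, compounds it secretes), and all species and compound names are invented for illustration.

```python
# Toy illustration only: each "species" is a hypothetical rule set, not a real GSMM.
from typing import Dict, List, Set, Tuple

# name -> (compounds required for growth, compounds secreted once growing)
Species = Tuple[Set[str], Set[str]]


def simulate_iterations(species: Dict[str, Species], medium: Set[str]):
    """Test each species individually against the current pool, then enrich the pool."""
    pool = set(medium)
    grown: Set[str] = set()
    history: List[Set[str]] = []  # species that first grow at each iteration
    while True:
        newly_grown = {
            name
            for name, (needs, _) in species.items()
            if name not in grown and needs <= pool
        }
        if not newly_grown:
            break  # no additional species can grow: the pool has converged
        for name in newly_grown:
            pool |= species[name][1]  # secretions become available to others
        grown |= newly_grown
        history.append(newly_grown)
    return history, pool


# Hypothetical community; "glucose" stands in for the exudate medium.
toy_species = {
    "sp_A": ({"glucose"}, {"acetate"}),
    "sp_B": ({"acetate"}, {"thiamin"}),
    "sp_C": ({"glucose", "thiamin"}, {"catechol"}),
    "sp_D": ({"cobalamin"}, set()),  # never grows: its requirement is never supplied
}
history, final_pool = simulate_iterations(toy_species, {"glucose"})
```

      In this toy run, sp_B depends on sp_A and sp_C depends on sp_B, so the order in which species first grow traces a chain of inferred cross-feeding, without any community-level objective function being defined.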

      (3) Figure 4A: While illustrating the general complexity of the predicted trophic interactions, the density of the network makes it very difficult to interpret specific exchanges. Moreover, the naming conventions of the metabolites make it difficult to understand what they represent. I would recommend either restructuring the graph such that the label of each node is legible, or removing the labels altogether.

      Thanks, done. Labels were removed and a zoom-in window showing the exchanges highlighted in Figure 4C was added. The caption was revised to indicate that node colors correspond to the differential abundance classification of GSMMs in the different plots (H, S, NA are Healthy, Sick, Not-Associated, respectively).

      Reviewer #2 (Recommendations for the authors):

      CarveMe solves a Mixed Integer Linear Program (MILP) that enforces network connectivity, thus requiring gapless pathways. It's puzzling how to deal with such a great number of GSMMs that is for sure, especially when coming from such an environment as soil and the vast majority of their corresponding MAGs represent most likely novel taxa. One alternative approach for using CarveMe might be to use the rich medium as a medium to gap-fill during the reconstruction. In this case, the gene annotation scores that CarveMe calculates in its initial step, are used to prioritise the reactions selected for gap-filling. This would lead to a new series of challenges but might be a useful comparison with the current GSMMs of the study.

      Though CarveMe indeed includes a gap-filling option, here we purposely avoided it, as we aimed to adhere to the genomic content of the corresponding genomes and to avoid masking the metabolic dependencies that emerge from their incompleteness. This is noted in the Methods section, which we revised to emphasize the adherence to the genomic content of the models:

      On line 615 we now write:

      "All GSMMs were drafted without gap filling in order to adhere to genomic content and to avoid masking metabolic co-dependencies51"

      More generally, we now refer to the limitation of automatic reconstruction in the context of the current analysis. On line 507 we write:

      "Moreover, the use of an automatic GSMM reconstruction tool (CarveMe8), though increasingly used for depicting phenotypic landscapes, is typically less accurate than manual curation of metabolic models9. This approach typically neglects specialized functions involving secondary metabolism10 and introduces additional biases such as the overestimation of auxotrophies11,12. Nevertheless, manual curation is practically non-realistic for hundreds of MAGs, an expected outcome considering the volume of nowadays sequencing projects. As the primary motivation of this framework is the development of a tool capable of transforming high-throughput, low-cost genomic information into testable predictions, the use of automatic, semi-curated, metabolic network reconstruction tools was favored, despite their inherent limitations, in pursuit of developing pipelines for the systematic analysis of metagenomics data."

      Thermodynamically infeasible loops have been a challenge in constraint-based analysis [1].

      However, for FBA and FVA, time-efficient implementations are already available. Therefore, I would suggest using the loopless flag of the cobrapy package when performing FVA.

      Also, it would be nice to show/discuss how many exchange reactions each GSMM includes and what is the number of those with at least a non-zero minimum or maximum in the FVA using each of the three media.

      Done. In Supplementary Figure 4, we added a graphic summary of active FVA ranges for each GSMM in the three different environments (exchange reactions, non-zero flux). Additionally, we analyzed a subset of models and compared their regular FVA results vs loopless FVA results.

      On line 217 we write:

      "The number of active exchange fluxes in each medium corresponds with the respective growth performances, displaying a noticeably higher number of potentially active fluxes in the rich environment (also when applying loopless FVA) (Supp. Fig. 4). Overall, simulations confirmed the existence of a feasible solution space for all the 243 models as well as their capacity to predict growth in the respective environment (Supp. Data 5)."

      "Supplementary Figure 4. FVA performances of GSMMs in different environments (Supp. Fig. 3; Supp. Data 5). A. Distribution of potentially active exchange reactions (non-zero minimum FVA flux) in the different environments. Solid line inside each violin indicates the interquartile range (IQR). White point in IQR indicates the median value. Whiskers extending from the IQR indicate the range within 1.5 times the IQR from the quartiles. Violin width at a given value represents the density of data points at that value. B. Loopless FVA scores compared to regular FVA for models in the 3 different environments. Bars indicate the count of active fluxes (non-zero minimum FVA flux). Only a subset of models was used for this analysis."
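      As a rough sketch of how "active" exchange reactions could be counted from FVA output, the snippet below takes a table of (minimum, maximum) fluxes per exchange reaction and counts those that are non-zero. The data structure, the tolerance, and all reaction IDs except EX_pacald_e are assumptions made for illustration; the caption above counts reactions with a non-zero minimum, while the reviewer asked about a non-zero minimum or maximum, so both conventions are shown.

```python
from typing import Dict, Tuple

# reaction id -> (minimum flux, maximum flux), e.g. exported from an FVA run
FvaRanges = Dict[str, Tuple[float, float]]


def count_active(ranges: FvaRanges, tol: float = 1e-9, minimum_only: bool = False) -> int:
    """Count exchange reactions whose FVA range is distinguishable from zero."""
    count = 0
    for fmin, fmax in ranges.values():
        if minimum_only:
            active = abs(fmin) > tol  # convention used in the figure caption
        else:
            active = abs(fmin) > tol or abs(fmax) > tol  # reviewer's wording
        count += int(active)
    return count


# Hypothetical FVA ranges for one model in one medium (not data from the study).
toy_ranges = {
    "EX_glc__D_e": (-10.0, -2.5),  # obligatory uptake
    "EX_ac_e": (0.0, 4.0),         # optional secretion
    "EX_thm_e": (0.0, 0.0),        # inactive
    "EX_pacald_e": (0.0, 1e-12),   # numerical noise below the tolerance
}
n_min_or_max = count_active(toy_ranges)
n_min_only = count_active(toy_ranges, minimum_only=True)
```

      The two conventions can differ substantially: an optional secretion has a zero minimum and a positive maximum, so it is counted only under the broader definition.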

      This brings us to the main challenge of your framework in my opinion: FVA returns the minimum and the maximum a flux may get. However, it does not ensure that when a metabolite is being secreted, another does the same too. That could lead to an overrepresentation of secreted metabolites after each iteration. To my understanding, unbiased methods focusing on metabolite exchanges would be a much better alternative for such questions. Unbiased constraint-based methods are known for requiring essential computational requirements, yet when focusing on specific parts of the models, recent implementations support them. A great showcase of such techniques is presented in [2].

      Indeed, FVA solutions return all potential metabolic fluxes in GSMMs (ranges of all fluxes satisfying the objective function, which by default is set to biomass increase), but they do not ensure that all fluxes actually co-occur (i.e., when a metabolite is secreted necessarily another metabolite is secreted too). However, though FVA solutions do not necessarily ensure co-occurrence of secretion and uptake, they provide a broader metabolic picture (the full set of potential solutions), unlike the arbitrary single solution provided by FBA, which is limited in the information it provides about potential secretions and uptakes in a specific environment. Here, we tried to elucidate the connection between a specific environment (root exudates) and the growth and metabolic capabilities of native bacteria. To the best of our understanding, unbiased approaches (such as the one presented in Wedmark et al.36) are not environment dependent but rather calculate all possible metabolic elements and routes within a metabolic network. Therefore, FVA is well suited to exploring environment-dependent growth. The sensitivity of FVA-predicted active fluxes to the environment is now also implied by Supp. Fig. 3B, which demonstrates that the number of potential active fluxes is proportional to growth performance. In addition, examining all possible metabolic routes across a large dataset of hundreds of MAGs is central to the current analysis, so the easy implementation of FVA further justifies its use in the current study.
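      The co-occurrence caveat discussed above can be made concrete with a toy feasible set, sketched here under strong simplifying assumptions: two hypothetical secretion fluxes share a fixed budget, fluxes take integer values, and no biomass objective is imposed. Real FVA solves linear programs over a full network; this enumeration only mimics the per-reaction maxima it would report.

```python
from itertools import product

CAPACITY = 10  # hypothetical budget shared by two secretion fluxes, a and b

# Enumerate the (integer) feasible flux pairs: both non-negative, a + b <= CAPACITY.
feasible = [
    (a, b)
    for a, b in product(range(CAPACITY + 1), repeat=2)
    if a + b <= CAPACITY
]

# FVA-style summary: the maximum each flux can reach on its own.
max_a = max(a for a, _ in feasible)
max_b = max(b for _, b in feasible)

# The two maxima are each attainable, but not in the same solution.
both_at_max_feasible = (max_a, max_b) in feasible
```

      Each flux can individually reach the full budget, yet the pair of maxima lies outside the feasible set, which is why per-reaction FVA ranges may overstate what can be secreted simultaneously.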

      An alternative strategy to reduce inflated FVA predictions and further constrain the solution space of predicted active fluxes is to incorporate additional layers of 'omics data, as was done, for example, in the work of Zampieri et al.5 Such an approach could allow, for instance, removing reactions from the network reconstructions if they do not come into play in situ, thereby imposing further constraints and narrowing the solution space. Currently, the complexity of the soil community might impede or at least constrain high-coverage recovery of transcriptomic data, though future works utilizing additional layers of 'omics data are expected to significantly reduce the number of potential solutions and thus improve the accuracy of GEM predictions.

      This is now discussed in the text. In line 541 we write:

      "Similarly, complementing the genomic-based approaches done here, with additional layers of 'omics information (mainly transcriptomics & metabolomics) can further constrain the solution space, deflate the number of potential metabolic routes and yield more accurate predictions of GSMMs' performances5."  

      In case it was the first version of CheckM used, the authors could consider repeating this check with CheckM2. As they state in line 293, Archaea may play an essential role in the community. Yet, among the high-quality MAGs only one corresponded to Archaea. However, that is quite possible to be the case because CheckM underestimates the completeness of archaeal genomes. If CheckM2 suggests that archaeal MAGs could be used, these would probably benefit a lot for the aim of the study.

      The analysis was conducted with the first version of CheckM to assess MAG quality. In future analyses we will use CheckM2. However, even before MAG recovery, we already knew from the work of Berihu et al. that archaeal species have a very low representation in the metagenomics data used here (Berihu et al., Additional data 2. Supp. fig. 4; "others" group)6, with less than 0.5% of the contigs mapped to archaeal genomes. The overall taxonomic distribution of the high-quality MAGs was compared to the distribution inferred from the non-binned data (contigs) and amplicon sequencing, and the three data sets are very similar (Fig. 2).

      On line 130 we write:

      "Overall, the taxonomic distribution of the MAG collection corresponded with the profile reported for the same samples using alternative taxonomic classification approaches such as 16S rRNA amplicon sequencing and gene-based taxonomic annotations of the non-binned shotgun contigs (Fig. 2B)."

      The visualisation of the network in Figure 4A is hard to follow. An alternative could be a 5-partite plot having taxa in columns one, three, and five and compounds in the other two. An alternative visualisation is necessary.

      The full list of the 5- and 3-partite graphs is provided in Supplementary Data 10 (now also noted in the figure legend). Figure 4 was revised to improve its visualization: labels were removed and zoomed-in views of 5- and 3-partite plots were added (PMM and PM subnetworks, respectively).

      Line 509: If I get the point of the authors right, they refer to the "from shotgun data to GEMs" approach. I would suggest skipping this statement. Here is a recent study implementing this: https://doi.org/10.1016/j.crmeth.2022.100383.

      Thank you for your comment and reference. The intention behind the phrase in line 509 (in previous version) was to refer to going from metagenomics data to GEMs in soil-rhizosphere microbiome while linking environmental inputs (crop-plants exudates metabolomics data) and the agricultural-related metabolic function of bacteria. This phrase has been modified to clearly make a more modest claim while acknowledging other related studies.

      On line 548 we write:

      "Where recent studies begin to apply GSMM reconstruction and analysis starting from MAGs5,37, this work applies the MAGs-to-GSMMs approach to conduct a large-scale CBM analysis over high-quality MAGs derived from a native rhizosphere and explore the complex network of interactions in light of the functioning of the respective agro-ecosystem."

      Line 820: Reference format is broken.

      Corrected.

      In the caption of Figure 4, please add the meaning of H, S, and NA so it is self-explanatory.

      Done. In Figure 4 legend we added:

      "Node colors correspond to differential abundance classification of GSMMs in the different plots; H, S, NA are Healthy, Sick, Not-Associated, respectively."

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 4A is unreadable. It is not clear what insight the reader could gain by examining this figure.

      Thanks. The figure was revised. Labels were removed and a zoom-in window showing the exchanges highlighted in Figure 4C was added. The caption was revised to indicate that node colors correspond to the differential abundance classification of GSMMs in the different plots (H, S, NA are Healthy, Sick, Not-Associated, respectively).

      (2) In Figure 5, it is not apparent what the units of "prevalence" are, that is, what is the scale. What does 140 mean? How does that compare to 350?

      Thanks. Prevalence in the context of Figure 5B,C refers to the count of compounds in each category (significantly affiliated with either healthy or symptomized soils) in the sub-network motifs corresponding to this DA classification. We revised the figures (Y axes) and legend to be more specific (B: # of exudates; C: # of secreted compounds).

      "B. Bar plot indicating the number of exudates significantly associated with H or S-classified PM sub-networks (Hypergeometric test; FDR <= 0.05; green: healthy-H, red: sick-S). C. Bar plots indicate the number of secreted compounds in PM sub-networks, which are significantly associated with H-classified (upper, colored green), or S-classified (lower, colored red) (Hypergeometric test; FDR <= 0.05)."
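      For readers unfamiliar with the test named in the legend, the following is a minimal sketch of a hypergeometric enrichment test followed by a Benjamini-Hochberg FDR correction. It is an assumption that the authors used this particular correction and a one-sided upper-tail test; the counts and p-values below are invented and are not results from the study.

```python
from math import comb
from typing import List


def hypergeom_upper_tail(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) when drawing N items without replacement from M items,
    n of which belong to the category of interest."""
    upper = min(n, N)
    total = comb(M, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical example: 20 sub-networks in total, 5 of them H-classified;
# a compound appears in 4 sub-networks, 3 of which are H-classified.
p_example = hypergeom_upper_tail(k=3, M=20, n=5, N=4)

# Hypothetical raw p-values for four compounds, corrected jointly.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
significant = [q <= 0.05 for q in adjusted]
```

      A compound would be reported as significantly associated only when its adjusted value satisfies the FDR <= 0.05 threshold stated in the legend.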

      References

      (1) Buée, M., de Boer, W., Martin, F., van Overbeek, L. & Jurkevitch, E. The rhizosphere zoo: An overview of plant-associated communities of microorganisms, including phages, bacteria, archaea, and fungi, and of some of their structuring factors. Plant Soil 321, 189– 212 (2009).

      (2) Bardgett, R. D. & Van Der Putten, W. H. Belowground biodiversity and ecosystem functioning. Nature 515, 505–511 (2014).

      (3) Opatovsky, I. et al. Modeling trophic dependencies and exchanges among insects’ bacterial symbionts in a host-simulated environment. BMC Genomics 19, 1–14 (2018).

      (4) Kato, S., Haruta, S., Cui, Z. J., Ishii, M. & Igarashi, Y. Stable coexistence of five bacterial strains as a cellulose-degrading community. Appl. Environ. Microbiol. 71, 7099–7106 (2005).

      (5) Zampieri, G., Campanaro, S., Angione, C. & Treu, L. Metatranscriptomics-guided genomescale metabolic modeling of microbial communities. Cell Reports Methods 3, 100383 (2023).

      (6) Berihu, M. et al. A framework for the targeted recruitment of crop-beneficial soil taxa based on network analysis of metagenomics data. Microbiome 1–21 (2023) doi:10.1186/s40168-022-01438-1.

      (7) Dhakar, K. et al. Modeling-Guided Amendments Lead to Enhanced Biodegradation in Soil. mSystems 7, (2022).

      (8) Machado, D., Andrejev, S., Tramontano, M. & Patil, K. R. Fast automated reconstruction of genome-scale metabolic models for microbial species and communities. Nucleic Acids Res. 46, 7542–7553 (2018).

      (9) Henry, C. S. et al. High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat. Biotechnol. 28, 977–982 (2010).

      (10) Freilich, S. et al. Competitive and cooperative metabolic interactions in bacterial communities. Nat. Commun. 2, (2011).

      (11) Price, M. Erroneous predictions of auxotrophies by CarveMe. Nat. Ecol. Evol. 7, 194–195 (2023).

      (12) Machado, D. & Patil, K. R. Reply to: Erroneous predictions of auxotrophies by CarveMe. Nat. Ecol. Evol. 7, 196–197 (2023).

      (13) Kulichevskaya, I. S. et al. Acidicapsa borealis gen. nov., sp. nov. and Acidicapsa ligni sp. nov., subdivision 1 Acidobacteria from Sphagnum peat and decaying wood. Int. J. Syst. Evol. Microbiol. 62, 1512–1520 (2012).

      (14) Depart-, M. & Building, L. S. Lignocellulose-degrading actinomycetes. 46, 145–163 (1987).

      (15) Thomas, F., Hehemann, J. H., Rebuffet, E., Czjzek, M. & Michel, G. Environmental and gut Bacteroidetes: The food connection. Front. Microbiol. 2, 1–16 (2011).

      (16) Dow, J. M. & Daniels, M. J. Pathogenicity determinants and global regulation of pathogenicity of Xanthomonas campestris pv. campestris. Curr. Top. Microbiol. Immunol. 192, 29–41 (1994).

      (17) Bergmann, G. T. et al. The under-recognized dominance of Verrucomicrobia in soil bacterial communities. Soil Biol. Biochem. 43, 1450–1455 (2011).

      (18) Zhalnina, K. et al. Dynamic root exudate chemistry and microbial substrate preferences drive patterns in rhizosphere microbial community assembly. Nat. Microbiol. 3, 470–480 (2018).

      (19) Uzun, M. et al. Recovery and genome reconstruction of novel magnetotactic Elusimicrobiota from bog soil. ISME J. 1–11 (2022) doi:10.1038/s41396-022-01339-z.

      (20) Lei, S. et al. Analysis of the community composition and bacterial diversity of the rhizosphere microbiome across different plant taxa. Microbiologyopen 8, 1–10 (2019).

      (21) Ghosh, S. K., Banerjee, S. & Sengupta, C. Bioassay, characterization and estimation of siderophores from some important antagonistic fungi. J. Biopestic. 10, 105–112 (2017).

      (22) Lu, X., Heal, K. R., Ingalls, A. E., Doxey, A. C. & Neufeld, J. D. Metagenomic and chemical characterization of soil cobalamin production. ISME J. 14, 53–66 (2020).

      (23) Mee, M. T., Collins, J. J., Church, G. M. & Wang, H. H. Syntrophic exchange in synthetic microbial communities. Proc. Natl. Acad. Sci. U. S. A. 111, (2014).

      (24) Justin, K., Edmond, S., Ally, M. & Xin, H. Plant Secondary Metabolites: Biosynthesis, Classification, Function and Pharmacological Properties. J. Pharm. Pharmacol. 2, 377–392 (2014).

      (25) Yang, W. et al. A Genomic Analysis of Bacillus megaterium HT517 Reveals the Genetic Basis of Its Abilities to Promote Growth and Control Disease in Greenhouse Tomato. Genet. Res. (Camb). 2022, (2022).

      (26) Balbín-Suárez, A. et al. Root exposure to apple replant disease soil triggers local defense response and rhizoplane microbiome dysbiosis. FEMS Microbiol. Ecol. 97, 1–14 (2021).

      (27) Weiß, S., Liu, B., Reckwell, D., Beerhues, L. & Winkelmann, T. Impaired defense reactions in apple replant disease-Affected roots of Malus domestica ‘M26’. Tree Physiol. 37, 1672–1685 (2017).

      (28) Weiß, S., Bartsch, M. & Winkelmann, T. Transcriptomic analysis of molecular responses in Malus domestica ‘M26’ roots affected by apple replant disease. Plant Mol. Biol. 94, 303– 318 (2017).

      (29) Sun, N. et al. Effects of Organic Acid Root Exudates of Malus hupehensis Rehd. Derived from Soil and Root Leaching Liquor from Orchards with Apple Replant Disease. Plants 11, (2022).

      (30) Howell, C. R. Seed Treatment with L-Sorbose to Control Damping-Off or Cotton Seedlings by Rhizoctonia solani. Phytopathology 68, 1096 (1978).

      (31) Zou, C. S., Mo, M. H., Gu, Y. Q., Zhou, J. P. & Zhang, K. Q. Possible contributions of volatile-producing bacteria to soil fungistasis. Soil Biol. Biochem. 39, 2371–2379 (2007).

      (32) Gomes, V. A. et al. Activity of papaya seeds (Carica papaya) against Meloidogyne incognita as a soil biofumigant. J. Pest Sci. (2004). 93, 783–792 (2020).

      (33) Gao, T. et al. Exogenous dopamine and overexpression of the dopamine synthase gene MdTYDC alleviated apple replant disease. Tree Physiol. 41, 1524–1541 (2021).

      (34) Diener, C., Gibbons, S. M. & Resendis-Antonio, O. MICOM: Metagenome-Scale Modeling To Infer Metabolic Interactions in the Gut Microbiota. mSystems 5, (2020).

      (35) Dukovski, I. et al. A metabolic modeling platform for the computation of microbial ecosystems in time and space (COMETS). Nat. Protoc. 16, 5030–5082 (2021).

      (36) Katarina Wedmark, Y., Olav Vik, J. & Øyås, O. A hierarchy of metabolite exchanges in metabolic models of microbial species and communities. bioRxiv 1–19 (2023).

      (37) Zorrilla, F., Buric, F., Patil, K. R. & Zelezniak, A. MetaGEM: Reconstruction of genome scale metabolic models directly from metagenomes. Nucleic Acids Res. 49, (2021).

    1. eLife assessment

      In this valuable study, the authors investigate how inflammatory priming and exposure to irradiated Mycobacterium tuberculosis or the bacterial endotoxin LPS impact the metabolism of primary human airway macrophages and monocyte-derived macrophages. The work shows that metabolic plasticity is greater in monocyte-derived macrophages than alveolar macrophages, with solid experimental methods and evidence. The work is relevant to the field of immunometabolism.

    2. Reviewer #1 (Public review):

      Summary:

      The researchers demonstrated that when cytokine priming is combined with exposure to pathogens or pathogen-associated molecular patterns, human alveolar macrophages and monocyte-derived macrophages undergo metabolic adaptations, becoming more glycolytic while reducing oxidative phosphorylation. This metabolic plasticity is greater in monocyte-derived macrophages than in alveolar macrophages.

      Strengths:

      This study presents evidence of metabolic reprogramming in human macrophages, which significantly contributes to our existing understanding of this field primarily derived from murine models.

    3. Reviewer #2 (Public review):

      Summary:

      The current study is presented to assess the shift in metabolism (Glycolysis and Oxidative phosphorylation) of differently primed human Alveolar macrophages and Monocyte derived macrophages in response to TLR4 activating signals (such as LPS and dead Mtb bacteria). They conducted this macrophage characterization in response to type II interferon and IL-4 priming signals, followed by different stimuli of irradiated Mycobacterium tuberculosis and LPS.

      Strengths:

      (1) The study employs thorough measurement of metabolic shift in metabolism by assessing extracellular acidification rate (ECAR) and oxygen consumption rate (OCR) of differentially polarized primary human macrophages using the Seahorse XFe24 Analyzer.

      (2) The effect of differential metabolic shift on the expression of different surface markers for macrophage activation is evaluated through immunofluorescence flow cytometry and cytokine measurement via ELISA.

      Weaknesses:

      (1) Prior studies with human macrophages have shown a glycolytic shift with similar signals, including live Mycobacterium tuberculosis infection.

      (2) Results are often described with detailed methodology for each experiment, and data are replotted and presented in duplicates for cross-analyses which can be confusing.

      (3) The data presented shows a distinct functional profile of airway macrophages (AMs) compared to monocyte (blood)-derived macrophages (MDMs) in response to the same priming signals. However, the study does not attempt to explore the underlying mechanisms for this difference.

      Appraisal:

      (1) The authors have achieved their aim of preliminarily characterizing the glycolysis-dependent cytokine profile and activation marker expression of IFN-g and IL-4 primed primary human macrophages.

      (2) The results of the study support its conclusion of glycolysis-dependent phenotypical differences in cytokine secretion and activation marker expression of AMs and MDMs.

      (3) However, the study is descriptive in nature, and the results validate IFN-g-mediated glycolytic reprogramming in primary human macrophages without providing mechanistic insights.

      Impact:

      The study provides evidence of metabolic reprogramming in human primary macrophages and their dependence on glycolysis for downstream secretion of cytokines and expression of activation markers.

      Additional comments:

      The results of this study are generated from a very large experiment with different treatments and phenotypic characterization. The data is plotted and analyzed in different figures to aid the reader.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript the authors explore the contribution of metabolism to the response of two subpopulations of macrophages to bacterial pathogens commonly encountered in the human lung, as well as the influence of priming signals typically produced at a site of inflammation. The two subpopulations are resident airway macrophages (AM) isolated via bronchoalveolar lavage and monocyte-derived macrophages (MDM) isolated from human blood and differentiated using human serum. The two cell types were primed using IFNγ and IL-4, which are produced at sites of inflammation as part of initiation and resolution of inflammation respectively, followed by stimulation with either heat-killed tuberculosis (Mtb) or LPS to simulate interaction with a bacterial pathogen that is either gram-negative in the case of Mtb or gram-positive in the case of LPS. The authors use human cells for this work, which makes use of widely reported and thoroughly described priming signals, as well as model antigens. This makes the observations on the functional response of these two subpopulations relevant to human health and disease to a greater extent than the mouse models typically used to interrogate these interactions. To examine the relationship between metabolism and functional response, the authors measure rates of oxidative phosphorylation and glycolysis under baseline conditions, primed using IFNγ or IL-4, and primed and stimulated with Mtb or LPS.

      The authors addressed most of the initial critiques. The dose of IFNγ used was justified, figure legends were harmonized, a contextual definition was provided for the term "functional plasticity," and the airway macrophage population was partially characterized by flow cytometry. However, some concerns remain relating to the clarity of methods and use of statistics. The authors have not adequately explained how % change was calculated in Figure 1, in either the figure legend or the methods section. Additionally, the use of multiple statistical analyses on the same data set in figures 4 and 5, with data exclusion resulting in lower p values, is not satisfactorily justified.

      Strengths:

      • The data indicate that both populations of macrophages increase metabolic rates when primed, but MDMs decrease their rates of oxidative phosphorylation after IL-4 priming and bacterial exposure while AMs do not.

      • It is demonstrated that glycolysis rates are directly linked to the expression of surface molecules involved in T-cell stimulation and while secretion of TNFα in AM is dependent on glycolysis, in MDM this is not the case. IL-10 secretion does not appear regulated by glycolysis in either population. It is also demonstrated that Mtb and LPS stimulation produces responses that are not metabolically consistent across the two macrophage populations. The Mtb-induced response in MDMs differed from the LPS response, in that it relies on glycolysis, while this relationship is reversed in AMs. The difference in metabolic contributions to functional outcomes between these two macrophage populations is significant, despite acknowledgement of the reductive nature of the system by the authors.

      • The observations that AM and MDM rely on glycolysis for production of cytokines during a response to bacterial pathogens in the lung, but that only AM shift to Warburg metabolism following exposure to IL-4, are supported by the data and are a significant contribution to the study of the innate immune response.

      Weaknesses:

      Critiques:

      • It is still difficult to interpret the metabolism data due to inconsistent normalization. It appears that in the case of rate measurement the data is normalized to unstimulated macrophages where values are set to one, but in the case of % change the values from unstimulated cells are not set to 100% and the methods say that values were calculated using primed controls, which is ambiguous. It is therefore unclear how exactly the % change values were determined. This makes it difficult to conclude whether the changes in glycolysis and oxidative phosphorylation in primed cells after stimulation are proportional to changes in unprimed cells. If they are proportional, this would suggest that the majority of the observed effect on metabolism comes from priming itself and not from the subsequent stimulation as the authors claim.

      • The use of repeated statistical analyses with different comparison groups in the same figure/data set (e.g., in Fig.4) is still not justified. The current approach, using two-way ANOVA, removing a third of the dataset, and then applying another two-way ANOVA, produces the desired p values, but is not appropriate.

      Conclusion:

      Overall, this study reveals how inflammatory and anti-inflammatory cytokine priming contributes to the metabolic reprogramming of AM and MDM populations. Their conclusions regarding the relationship between cytokine secretion and inflammatory molecule expression in response to bacterial stimuli are supported by the data. The involvement of metabolism in innate immune cell function is relevant when devising treatment strategies that target the innate immune response during infection. The data presented in this paper further our understanding of that relationship and advance the field of innate immune cell biology.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The researchers demonstrated that when cytokine priming is combined with exposure to pathogens or pathogen-associated molecular patterns, human alveolar macrophages and monocyte-derived macrophages undergo metabolic adaptations, becoming more glycolytic while reducing oxidative phosphorylation. This metabolic plasticity is greater in monocyte-derived macrophages than in alveolar macrophages.

      Strengths:

      This study presents evidence of metabolic reprogramming in human macrophages, which significantly contributes to our existing understanding of this field primarily derived from murine models.

      Weaknesses:

      The study has limited conceptual novelty.

      We acknowledge that the study has limited conceptual novelty; however, the current manuscript provides the field with evidence of the changes in the phenotype and functions of human macrophages in response to IFN-γ or IL-4, which is currently lacking in the literature. Moreover, our data show for the first time that human airway macrophages change their function in response to IFN-γ.

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to functionally characterize primary human airway macrophages and monocyte-derived macrophages, correlating their glycolytic shift in metabolism. They conducted this macrophage characterization in response to type II interferon and IL-4 priming signals, followed by different stimuli of irradiated Mycobacterium tuberculosis and LPS.

      Strengths:

      (1) The study employs a thorough measurement of the metabolic shift by assessing extracellular acidification rate (ECAR) and oxygen consumption rate (OCR) of differentially polarized primary human macrophages using the Seahorse XFe24 Analyzer.

      (2) The effect of differential metabolic shift on the expression of different surface markers for macrophage activation is evaluated through immunofluorescence flow cytometry and cytokine measurement via ELISA.

      (3) The authors have achieved their aim of preliminarily characterizing the glycolysis-dependent cytokine profile and activation marker expression of IFN-γ and IL-4 primed primary human macrophages.

      (4) The results of the study support its conclusion of glycolysis-dependent phenotypical differences in cytokine secretion and activation marker expression of AMs and MDMs.

      Weaknesses:

      (1) The data are presented in duplicates for cross-analyses.

      (2) The data presented support a distinct functional profile of airway macrophages (AMs) compared to monocyte (blood)-derived macrophages (MDMs) in response to the same priming signals. However, the study does not attempt to explore the underlying mechanism for this difference.

      (3) The study is descriptive in nature, and the results validate IFN-γ-mediated glycolytic reprogramming in primary human macrophages without providing mechanistic insights.

      (1) We acknowledge the data is presented in duplicate for cross-analyses. This duplication allowed us both (A) to examine the effect of IFN-γ or IL-4 on primary human airway and monocyte-derived macrophages in the presence or absence of distinct stimulations and (B) to directly compare the fold change in function occurring in the AM with the changes in the MDM.

      (2 & 3) We acknowledge that our study is descriptive; however, by inhibiting glycolysis using 2DG we have demonstrated that increased flux through glycolysis is mechanistically required to mediate enhanced cytokine responses in both primary human AM and MDM primed with IFN-γ. However, we acknowledge that we have not determined the differential molecular mechanisms downstream of IFN-γ in the AM versus the MDM. IFN-γ promotes both pro- and anti-inflammatory cytokines in AM and this was reduced by inhibiting glycolysis with 2DG. This identifies glycolysis as a key mechanistic pathway which can be therapeutically targeted in AM to modulate inflammation. Mechanistic studies on human AM are limited due to the low number of AM retrieved from BAL samples. Nevertheless, the differences between AM and MDM identified in the current study indicate that future mechanistic studies are warranted to identify, for example, why IFN-γ promotes IL-10 in AM and not MDM, and why TNF is differentially regulated by glycolysis in the two macrophage subpopulations.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors explore the contribution of metabolism to the response of two subpopulations of macrophages to bacterial pathogens commonly encountered in the human lung, as well as the influence of priming signals typically produced at a site of inflammation. The two subpopulations are resident airway macrophages (AM) isolated via bronchoalveolar lavage and monocyte-derived macrophages (MDM) isolated from human blood and differentiated using human serum. The two cell types were primed using IFNγ and IL-4, which are produced at sites of inflammation as part of initiation and resolution of inflammation respectively, followed by stimulation with either irradiated Mycobacterium tuberculosis (Mtb) or LPS to simulate interaction with a bacterial pathogen. The authors use human cells for this work, which makes use of widely reported and thoroughly described priming signals, as well as model antigens. This makes the observations on the functional response of these two subpopulations relevant to human health and disease. To examine the relationship between metabolism and functional response, the authors measure rates of oxidative phosphorylation and glycolysis under baseline conditions, primed using IFNγ or IL-4, and primed and stimulated with Mtb or LPS.

      Strengths:

      • The data indicate that both populations of macrophages increase metabolic rates when primed, but MDMs decrease their rates of oxidative phosphorylation after IL-4 priming and bacterial exposure while AMs do not.

      • It is demonstrated that glycolysis rates are directly linked to the expression of surface molecules involved in T-cell stimulation and while secretion of TNFα in AM is dependent on glycolysis, in MDM this is not the case. IL-1β is regulated by glycolysis only after IFN-γ priming in both MDM and AM populations. It is also demonstrated that Mtb and LPS stimulation produces responses that are not metabolically consistent across the two macrophage populations. The Mtb-induced response in MDMs differed from the LPS response, in that it relies on glycolysis, while this relationship is reversed in AMs. The difference in metabolic contributions to functional outcomes between these two macrophage populations is significant, despite acknowledgement of the reductive nature of the system by the authors.

      • The observations that AM and MDM rely on glycolysis for the production of cytokines during a response to bacterial pathogens in the lung, but that only MDM shift to Warburg Metabolism, though this shift is blocked following exposure to IL-4, are supported by the data and are a significant contribution to the study of the innate immune response.

      Weaknesses:

      • It is unclear whether changes in glycolysis and oxidative phosphorylation in primed cells are due to priming or subsequent treatments. ECAR and OCR analyses were therefore difficult to interpret.

      All data sets have been presented and analysed relative to unprimed unstimulated cells to show the effect of both priming and subsequent stimulation. A second analysis was subsequently conducted where each data set was normalised to its own baseline in terms of percentage change. Therefore, each of unprimed, IFN-γ and IL-4 primed cells were set to 100% in order to assess the effect of stimulation independent of the baseline priming effect. For clarity we have removed the following line:

      “Percentage change for ECAR and OCR was calculated from the respective baseline of each data set to visualise the differential ability of IFN-γ, IL-4 primed or unprimed AM to respond to stimulation (Figure S1C,D).”

      We have amended the text in the manuscript (lines 164-173) to “Since IFN-γ priming increased cellular energetics in the AM at baseline, we calculated percent change in ECAR and OCR from the baseline rate of each group in order to assess if IFN-γ or IL-4 primed AM have altered capacity to change their metabolism in response to stimulation (Figure 1C,D). This was carried out to equalise all the primed data sets at baseline before stimulation (Figure S1C, S1D).  These data indicate that whilst the peak of glycolysis is elevated in IFN-γ primed AM (Figure 1A), all AM have a similar capacity to increase glycolysis upon stimulation when baseline differences in metabolism were adjusted for the effects of cytokine priming (Figure 1C). IFN-γ increased the percent change in OCR of AM in response to both bacterial stimuli compared to the unstimulated IFN-γ primed control (Figure 1D). These data indicate that priming AM alters the metabolic baselines of human tissue resident macrophages and not their ability to respond to bacterial stimuli.”
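
      The two normalisations described in this response can be illustrated with a minimal Python sketch. It assumes the convention stated above (each group's own unstimulated baseline set to 100%); the ECAR values, group labels and function names are hypothetical and are not taken from the study or its analysis pipeline.

```python
# Hypothetical ECAR readings (mpH/min); illustrative values only, not study data.
ecar = {
    "unprimed":  {"unstimulated": 20.0, "stimulated": 30.0},
    "IFN-gamma": {"unstimulated": 40.0, "stimulated": 60.0},
    "IL-4":      {"unstimulated": 18.0, "stimulated": 27.0},
}


def fold_change_vs_reference(rates, ref_group="unprimed", ref_cond="unstimulated"):
    """Normalise every value to one shared reference (unprimed, unstimulated = 1)."""
    ref = rates[ref_group][ref_cond]
    return {
        group: {cond: value / ref for cond, value in conds.items()}
        for group, conds in rates.items()
    }


def percent_of_own_baseline(rates, baseline_cond="unstimulated"):
    """Normalise each group to its own unstimulated baseline (baseline = 100%)."""
    return {
        group: {cond: 100.0 * value / conds[baseline_cond] for cond, value in conds.items()}
        for group, conds in rates.items()
    }


fold = fold_change_vs_reference(ecar)
pct = percent_of_own_baseline(ecar)

for group in ecar:
    print(group, "fold:", fold[group], "percent of baseline:", pct[group])
```

      In this made-up example the fold-change view retains the priming effect (the IFN-γ baseline sits at twice the unprimed baseline), whereas the per-group percentage view removes it and shows the same relative response to stimulation in every group. Whether the real data behave this way is the question the reviewer raises.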

      • The data may not support a claim that AM has greater "functional plasticity" without a direct comparison of antigen presentation. Moreover, MDM secrete more IL-1β than AM. The claim that AM "have increased ability to produce all cytokines assayed in response to Mtb stimulation" does not appear to be supported by the data.

      Our data suggest that the MDM are more phenotypically plastic (in terms of their ability to alter expression of cell surface markers in response to cytokine cues), whereas AM have a greater ability to alter cytokine production, our measure of functional plasticity. We have now defined the use of the terms ‘functional plasticity’ and ‘phenotypic plasticity’ in the context of our paper in lines 60-63. To account for the different culture and plating requirements of MDM versus AM, cytokine production was analysed relative to the average of the unprimed Mtb or LPS control of the respective MDM or AM. This allowed us to draw more accurate comparisons between the two macrophage populations by examining their relative ability to increase their cytokine production (expressed as fold change) rather than defining this functional plasticity only in terms of concentrations of cytokine produced in culture.

      We have therefore added the following sentence into the conclusion of the manuscript: “Cumulatively, the data presented herein suggest that the MDM may be more phenotypically plastic than the AM, while the AM have enhanced functional plasticity in their ability to modulate cytokine production after exposure to Th1 and Th2 cytokines.”

      We have edited the discussion (lines 421-423) to clarify the phrase "have increased ability to produce all cytokines assayed in response to Mtb stimulation", which now reads: “stimulated with Mtb have significantly more production of IL-1β, TNF and IL-10 compared with unprimed controls. This is in contrast with IFN-γ primed MDM which only upregulate TNF compared to their unprimed controls.”

      • The claim that AM are better for "innate training" via IFNγ may not be consistent with increased IL1β and a later claim that MDM have increased production and are "associated with optimal training."

      We have removed the word “better” and now simply state that AM are a tractable target to induce innate training in the human lung.

      • Statistical analyses may not appropriately support some of the conclusions.

      We have consulted with a statistician. Please see our response to Reviewer #3 (Recommendations For The Authors), point 1, below.

      • AM populations would benefit from further definition; presumably this is a heterogeneous, mixed population.

      The AM used in the current study are routinely >97% CD68+CD14+ (Author response image 1). However, we acknowledge that tissue resident macrophages represent a spectrum of phenotypes. Given limitations in cell numbers from primary human AM derived from BALF, we have not attempted to define the function of discrete subpopulations of AM.

      • The term "functional plasticity" could also be more stringently defined for the purposes of this study.

      We define functional plasticity as the macrophages’ ability to alter their production of cytokines in response to external cues like IFN-γ and IL-4, whereas phenotypic plasticity is measured by their ability to alter the cell surface expression of activation markers. We have now defined this in the manuscript (lines 60-63).

      Author response image 1.

      Expression of macrophage markers on AM. 

      Conclusion:

      Overall, the authors succeed in their goals of investigating how inflammatory and anti-inflammatory cytokine priming contributes to the metabolic reprogramming of AM and MDM populations. Their conclusions regarding the relationship between cytokine secretion and inflammatory molecule expression in response to bacterial stimuli are supported by the data. The involvement of metabolism in innate immune cell function is relevant when devising treatment strategies that target the innate immune response during infection. The data presented in this paper further our understanding of that relationship and advance the field of innate immune cell biology.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1)  The authors are encouraged to provide a rationale for their choice of IFN-gamma and IL-4 as cytokines. This will be useful for the readers.

      We have updated the following sentence (lines 44-46) in the manuscript to add more rationale for the choice of IFN-γ and IL-4: “There is a paucity of data on the role of metabolism in response to Th1 or Th2 microenvironments induced by cytokines, such as IFN-γ or IL-4 respectively, in human macrophages, especially in tissue resident macrophages, such as AM.”

      (2)  The authors have shown the final outcome of metabolic reprogramming in terms of expression of HLA-DR and CD40, and cytokine release. What pathways/receptors are activated or associated with IL-4 and IFN-gamma priming as a first line of response?

      IFN-γ- or IL-4-induced expression of CD40 is established in haematological cell lines and fibroblasts as well as APC, with roles defined for the JAK/STAT pathways and upregulation of IRFs (1-3). Similarly, the relationship between exogenous IFN-γ and upregulation of HLA-DR expression on human monocytes or endothelial cells is established (4, 5). Whilst our work does not outline the signalling pathways downstream of Th1 or Th2 cytokine priming, we have shown for the first time that glycolysis mechanistically underpins the shift in phenotype and function observed in human macrophages upon priming with IFN-γ or IL-4.

      (3)  What are the intracellular signals leading to glycolytic shift?

      One of the most likely mechanisms that underpin the shift to glycolytic metabolism is the stabilisation of HIF-1α mediated by activation of mTOR (see the response below and Author response image 2).

      (4)  Additional evidence is required to show Warburg effect such as stabilization and activation of HIF1alpha.

      We acknowledge that we have not shown the activation and stabilisation of HIF-1α, however, we have provided functional evidence of increased glycolysis with concomitant decreased oxidative phosphorylation indicative of Warburg metabolism.

      In order to address this gap in evidence we have reworded the manuscript to describe this functional change to “Warburg-like metabolism” throughout the manuscript. In addition, we have undertaken Western Blotting to provide evidence of mTOR activation when cells are primed with IFN-γ (Author response image 2).

      Author response image 2.

      IFN-γ activates mTOR in primary human monocytes. Monocytes were isolated from healthy donor PBMC using magnetic separation. Monocytes were left untreated (-), or stimulated with rapamycin as a negative control (Rap; 50 nM), IFN-γ (10 ng/ml), or IFN-γ and rapamycin simultaneously (IFN-γ + Rap) for 15 minutes. Phosphorylation of S6 was used as a readout of mTOR activation and measured by western blot using β-actin as a control. (A) Blot. (B) Densitometry results, shown as the relative expression of pS6:β-actin. Graphs show data of n=1 of unprimed (black dot) vs IFN-γ primed (red) with and without rapamycin. ImageLab (Bio-Rad) software was used to perform densitometric analysis.

      (5)  What is the importance of showing percentage change vs fold change in figure 1 (1C vs 1A)?

      All data sets have been presented and analysed relative to unprimed unstimulated cells to show the effect of priming first and of subsequent stimulation (Figure 1A). A second analysis was subsequently conducted where each data set was normalised to its own baseline in terms of percentage change (Figure 1C). Therefore, each of unprimed, IFN-γ or IL-4 primed cells were set to 100% to assess the effect of stimulation independent of the pre-existing effect of priming on the baseline metabolism. For clarity we have removed the following line:

      “Percentage change for ECAR and OCR was calculated from the respective baseline of each data set to visualise the differential ability of IFN-γ, IL-4 primed or unprimed AM to respond to stimulation (Figure S1C,D).”

      We have amended the text (lines 164-173) in the manuscript to “Since IFN-γ priming increased cellular energetics in the AM at baseline, we calculated percent change in ECAR and OCR from the baseline rate of each group in order to assess if IFN-γ or IL-4 primed AM have altered capacity to change their metabolism in response to stimulation (Figure 1C,D). This was carried out to equalise all the primed data sets at baseline before stimulation (Figure S1C, S1D).  These data indicate that whilst the peak of glycolysis is elevated in IFN-γ primed AM (Figure S1A), all AM have a similar capacity to increase glycolysis upon stimulation when baseline differences in metabolism were adjusted for the effects of cytokine priming (Figure 1C). IFN-γ increased the percent change in OCR of AM in response to both bacterial stimuli compared to the unstimulated IFN-γ primed control (Figure 1D). These data indicate that priming AM alters the metabolic baselines of human tissue resident macrophages and not their ability to respond to bacterial stimuli.”

      (6)  Why do IL-4 primed cells have lower glycolysis than unprimed control cells even in the absence of pathogen in Figure 1A?

      IL-4 primed AM do not have statistically significant changes in glycolysis compared with unprimed control cells in the absence of stimulation.  

      Reviewer #2 (Recommendations For The Authors):

      The manuscript entitled "Human airway macrophages are metabolically reprogrammed by IFN-γ resulting in glycolysis dependent functional plasticity" by Cox et al., characterizes glycolytic-linked cytokine secretion and surface receptor expression of primary human airway macrophages (AM) and monocyte-derived macrophages (MDM). The authors primed the primary macrophages with type II interferon (IFN-γ) or interleukin-4 (IL-4) into Th1 and Th2 polarized states. This was followed by measurement of the shift in macrophage metabolism to glycolysis (ECAR measurement) and/or oxidative phosphorylation (OCR measurement) in response to lipopolysaccharide and irradiated Mycobacterium tuberculosis. The authors then utilize 2-DG (an inhibitor of glycolysis) to show the reliance of glycolytic shift in metabolism to drive the expression of different macrophage activation markers in MDMs and cytokine secretion in AMs.

      Significance:

      The study provides important validation of IFN-γ-mediated glycolytic shift and its correlated functionalities in primary human macrophage populations.

      Highlights: The study characterizes glycolytic-linked cytokine secretion and expression of macrophage activation markers in primary human resident (lung) and monocyte (blood)-derived macrophages. The study also shows data in support of IFN-γ alone in mediating glycolytic reprogramming of human primary macrophages.

      Limitations:

      The study lacks novelty and does not provide any new or different information in relation to IFN-γ-mediated glycolytic shift in the metabolism of human macrophages.

      Major comments:

      (1) The authors have relied on irradiated Mycobacterium tuberculosis (Mtb) and LPS stimulation to measure different correlates of macrophage functions. Additionally, the authors have discussed their results with irradiated Mtb with that of infection with live Mtb. There are also recent reports that show Mtb infection limiting glycolytic reprogramming in murine and human macrophages (PMID: 31914380) in contrast to their observation with irradiated Mtb. The authors should also include live Mtb infection or other replicative live bacterium for the induction of surface activation markers and cytokine release in their setup.

      We thank the reviewer for this suggestion; however, this is beyond the scope of the current study which was to assess AM and MDM in the context of immune stimulation in a reductive manner using TLR4 ligand LPS and a more complete whole bacteria stimulation. The selected bacterial ligands were employed in the study to allow us to model an optimal macrophage host response. This minimises the confounding variable of live bacteria which can perturb cellular metabolism and immune responses, which we have highlighted in the discussion. Since both LPS and irradiated Mtb induced similar metabolic and phenotypic profiles, it is likely that the effects of priming are maintained with diverse stimuli.  

      (2) The authors should add a quantitative measure (like extracellular lactate secretion or ECAR level) for the extent of glycolytic inhibition by the use of 5 mM 2-DG in their setup.

      We would like to draw the attention of the reviewer to the data represented in supplementary figure 2B, demonstrating that 2DG at 5 mM lowers ECAR at both 1 and 24 h post stimulation with iH37Rv by an average of approximately 40%. In addition, we have acknowledged that inhibition with 5 mM 2DG does not fully inhibit glycolysis as outlined in the study limitations (lines 477-480).

      (3) Percent change and fold change have been used to show the same or similar result in Fig. 1 and 2, whereas supplementary Fig. 1 shows absolute ECAR/OCR values in addition to fold change. The authors can plot either fold change or percent change in different measurements to avoid confusion. For example, do ECAR changes upon LPS stimulation in Fig. 1A and 1C come from the same dataset? One of the data points in percent change shows a decrease in percent ECAR change under no cytokine control, whereas all the data points in fold change show an increase.

      We have addressed this comment above in response to reviewer 1 point 5 (recommendations for the authors).

      We thank the reviewer for highlighting this single error in the data points for percent change. We have fixed this data point which was a result of a calculation error. All data throughout the manuscript has now been rechecked.   

      Minor comments:

      (1) The manuscript for review should be line-marked for referencing and commenting during review.

      We have now included line-marking on the manuscript.  

      (2) The authors can depict marker legends differently for all figures. In all figures, circles to squares or triangles represent treatment/stimulation with iH37Rv or LPS. The authors can depict this as circles to squares/triangles in contrast to different legends.

      We have changed the legend to include a more detailed description of data represented inserting additional information regarding the colours and symbols represented in the figures.  

      (3) Describe bars in supplementary figure 1A - 1H in its legend?

      We thank the reviewer for highlighting this oversight. We have amended the legend to state “error bars represent standard deviation”.

      (4) Discuss the significant increase in CD86 expression in IFN-γ and IL-4 primed unstimulated AMs in Fig. 3E.

      We have updated the results section to state that IFN-γ increased the expression of CD86 when isolated in the absence of bacterial stimulations in Fig. 3E (lines 271-272). There is no significant increase in CD86 by IL-4 primed unstimulated AM. IL-4 primed human AM only upregulated CD86 when treated with 2DG or in the presence of stimulation.  

      (5) Contrary to Fig. 2, the data points of unstimulated cells in Fig. 4 vary for different treatment conditions (no cytokine, IFN-γ, and IL-4) for each cytokine measurement. What is the difference between unstimulated cells in Fig. 4 (for each cytokine) from that of Fig. 2 (for each receptor MFI)?

      Unstimulated cells change their surface activation markers and phenotype in response to IFN-γ and IL-4 in Fig. 2. For Fig. 4, IFN-γ and IL-4 are not sufficient to induce cytokine secretion in the absence of stimulation with bacterial ligands.  

      (6) The methodology for seeding and treatment of cells is reemphasized for almost all results. Defining macrophage priming and stimulation of macrophages in the method section and once at the start of results should be fine.

      Plating happens differently for Seahorse compared to the flow cytometric phenotyping and ELISA for cytokine production. For clarity we have stated and reemphasized the seeding and treatment of cells throughout the results section.  

      (7) Clarify "IL-4 reduced glycolysis in response to LPS stimulation" in relation to the results depicted in Fig. 1A and 1C. Similarly, clarify "IL-4 resulting in reduced IL-1β and IL-10 production" in relation to Fig. 4E.

      For clarity we have added the following lines (157-160, 164-170) to the manuscript:  

      “IL-4 primed iH37Rv stimulated AM increased ECAR to a similar extent as unprimed controls (Figure 1A; left). Conversely, IL-4 primed AM stimulated with LPS did not increase their ECAR to the same extent as controls (Figure 1A; right), suggesting that IL-4 reduces the AM ability to increase ECAR in response to LPS stimulation.”

      “Since IFN-γ priming increased cellular energetics in the AM at baseline, we calculated percent change in ECAR and OCR from the baseline rate of each group in order to assess if IFN-γ or IL-4 primed AM have altered capacity to change their metabolism in response to stimulation (Figure 1C,D). This was carried out to equalise all the primed data sets at baseline before stimulation (Figure S1C, S1D). These data indicate that whilst the peak of glycolysis is elevated in IFN-γ primed AM (Figure S1A), all AM have a similar capacity to increase glycolysis upon stimulation when baseline differences in metabolism were adjusted for the effects of cytokine priming (Figure 1C).”

      For clarity we have amended the sentence the reviewer has highlighted (lines 214-215): “IL-4 primed AM had reduced fold change in glycolysis upon stimulation with LPS compared with controls”.

      Since IFN-γ priming induced large effect sizes, we statistically analysed the IL-4 primed and unprimed data sets in the absence of the IFN-γ primed data sets to determine how IL-4 influenced macrophage function. The only data for which this yielded any statistical significance were the cytokine production data. We have now clarified this in the methods and relevant figure legends by stating, “Statistically significant differences were determined using two-way ANOVA with a Tukey post-test (A-D); *P≤0.05, **P≤0.01, ***P≤0.001, ****P≤0.0001 or #P≤0.05, ##P≤0.01 (where IFN-γ primed data sets were excluded for post-test analysis to analyse statistical differences between no cytokine and IL-4 treated data sets).”

      To further clarify this, we have amended the text of the manuscript (lines 307-310) to reflect this. “All stimulated AM secreted IL-10 regardless of priming (Figure 4E). IFN-γ significantly enhanced iH37Rv induced IL-10 in AM compared to unprimed or IL-4 primed comparators (Figure 4E). IL-4 priming of human AM significantly reduced IL-10 production in response to iH37Rv compared with unprimed AM (Figure 4E). LPS strongly induced IL-10 production in unprimed MDM, which was significantly attenuated by either IFN-γ or IL-4 priming (Figure 4F).”  

      (8) Clarify whether data points in unstimulated, iH37Rv stimulated, and LPS-stimulated control cells in Fig. 3A - 3F are from independent experiments from those in Fig. 2A - 2F? The distribution of data points of control (no 2-DG treatment) in Fig. 3 is highly similar to the corresponding data points in Fig. 2. Similarly, provide clarification for similarity in Fig. 5A - 5F and Fig. 4A - 4F.

      The data illustrated in figures 2 and 3 are from one very large dataset, as are the data in figures 4 and 5. This large experiment was designed to test the effect of priming macrophages with IFN-γ or IL-4 (in the presence or absence of stimulation), and also to determine if the differential responses elicited due to priming were dependent on glycolysis (by inhibiting with 2DG). For clarity and transparency, the same stimulated dataset is repeated in both figures. Given the size and complexity of the experiment, we chose to present the data this way to aid the reader.

      (9) Clarify the statement "where data was reanalyzed in the absence of IFN-γ" in the section pertaining to Statistical analysis. The authors should clearly mention the nature of biological and technical replicates for each experiment in its figure legend. The authors should also confirm multiple comparison correction in all 2-way ANOVA tests done in each figure legend.

      We have amended the text (lines 133-136) to clarify this point: “P-values of ≤0.05 were considered statistically significant and denoted with an asterisk. Alternatively, P-values of ≤0.05 were denoted with a hashtag where data was analysed in the absence of IFN-γ primed data sets, to analyse statistical differences between no cytokine and IL-4 treated data sets.”

      Figures represent biological replicates (which are the average of technical replicates, presented as a single data point). This is indicated by the following sentence in each figure legend: “Each linked data point represents the average of technical duplicates for one individual biological donor”.  

      Each legend has been amended to include the multiple comparison post-test applied.

      (10) Discuss the differences and similarities of IFN-γ driven metabolic reprogramming of primary murine macrophages with the results of this study relative to cytokine secretion and activation marker expression.

      We have added additional discussion and detail comparing human and murine macrophages in lines 381-382, 403, 407 and 412-415 of the manuscript.

      (11) The repetitive data plots of similar results can be significantly reduced to improve the interpretation of the results.

      Plotting the data in this way gives a clearer understanding and representation of the data. The repeated data plots first delineate the effect of priming and of priming plus stimulation and then, separately, allow the differences between AM and MDM to be examined further. The repetition of the primed data points then allows the reader to determine the effect of inhibiting glycolysis with 2DG on unprimed and primed macrophages (with and without stimulation).   

      Reviewer #3 (Recommendations For The Authors):

      The methods used and data reported in this manuscript contribute to our understanding of the role of metabolism in programming of macrophages during priming. Suggestions for improving the presentation and interpretation of results include:

      • Consult with a statistician regarding analyses of the multiple conditions used during these assays. The use of repeated statistical analyses with different comparison groups in the same figure/data set seems atypical and should either be amended or fully justified in the text. Also, use of two-way vs. one-way ANOVA should be evaluated and clarified.

      We have now consulted a statistician. We have amended the text (lines 133-136) to clarify this point “P-values of ≤0.05 were considered statistically significant and denoted with an asterisk. Alternatively, P-values of ≤0.05 were denoted with a hashtag where data was analysed in the absence of IFN-γ primed data sets, to analyse statistical differences between no cytokine and IL-4 treated groups.”  

      There are two variables in the data sets, cytokine priming and stimulation status; we therefore opted for a two-way ANOVA rather than a one-way ANOVA. There are three stimulation groups: unstimulated, Mtb-stimulated and LPS-stimulated. Cytokine priming also has three groups: no cytokine, IFN-γ, or IL-4. Because there are two variables (priming and stimulation), each with 3 groups, i.e., six treatment conditions in total, a two-way ANOVA with multiple comparisons tests helps pinpoint exactly which groups (e.g., the 6 different levels of the 'stimulation' and 'cytokine' treatments) are significantly different from each other. This was important for understanding the specific effects of our treatments. The reader can therefore also deduce how these six treatment conditions compare to each other.

      In contrast, performing multiple single comparisons independently of the rest of the dataset (e.g. t tests) increases the risk of false positives (type 1 error). An ANOVA with multiple comparisons post-tests adjusts for this, helping to reduce the likelihood of a type 1 error. These statistics are more stringent, and it is therefore harder to obtain P values <0.05. Hence, if we compared all six treatment groups without adjustment, we would increase the chance of finding false positives due to the sheer number of comparisons, leading to biased and incorrect conclusions.
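
      The inflation of the type 1 error rate described above can be illustrated with a short calculation. This is only a sketch under simplifying assumptions that are not taken from the manuscript: the tests are treated as independent, every pair among six groups is compared, and a Bonferroni correction stands in for whichever post-test was actually applied (the post-tests used are named in the figure legends). The function names are hypothetical.

```python
from math import comb


def familywise_error_rate(n_comparisons: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_comparisons


def bonferroni_alpha(n_comparisons: int, alpha: float = 0.05) -> float:
    """Per-test threshold under a Bonferroni correction (illustrative choice)."""
    return alpha / n_comparisons


# Assumption for illustration: all pairwise comparisons among six groups.
n_groups = 6
n_pairs = comb(n_groups, 2)

# Family-wise error rate if each pairwise test is run at alpha = 0.05.
uncorrected = familywise_error_rate(n_pairs)

# Family-wise error rate once each test uses the corrected threshold.
per_test_alpha = bonferroni_alpha(n_pairs)
corrected = familywise_error_rate(n_pairs, per_test_alpha)

print(f"pairwise comparisons: {n_pairs}")
print(f"family-wise error rate, uncorrected: {uncorrected:.3f}")
print(f"family-wise error rate, corrected:   {corrected:.3f}")
```

      Under these assumptions, unadjusted pairwise testing gives a better-than-even chance of at least one false positive, whereas the corrected threshold keeps the family-wise rate at or below 0.05.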

      In our case, multiple comparisons tests were essential after the two-way ANOVA because they helped to objectively identify specific treatment group differences and control the overall error rate when we were extracting our conclusions, thereby reducing any risk of biases in our conclusions.

      A one-way ANOVA is used to test the effect of a single variable with more than two groups contained in the dataset. For example, in our case if you only want to test how different 'stimulation' groups affect ECAR or OCR, only in unprimed macrophages, a one-way ANOVA would be used.

      The current study used two-way ANOVA to test the effects of two variables (priming and stimulation, or in some cases priming and inhibition) each containing 3 groups, and see if there is any interaction between the two factors. For example, in our case this allowed us to examine how the 'stimulation' and the 'cytokine' priming affect ECAR/OCR levels and to determine if the effect of 'stimulation' depends on the 'cytokine' priming.
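
      The partition that a two-way ANOVA performs (two main effects plus their interaction) can be sketched as follows. This is a minimal illustration, not the analysis pipeline used in the study: the values are invented, the design is assumed to be balanced, and the donor pairing present in the real data (linked data points) is ignored, so a repeated-measures model would be needed to mirror the actual analysis. The function and variable names are hypothetical.

```python
from itertools import product


def mean(values):
    return sum(values) / len(values)


def two_way_anova(data):
    """Sums of squares and F statistics for a balanced two-factor design.

    data maps (level_a, level_b) -> list of replicate values.
    """
    a_levels = sorted({a for a, _ in data})
    b_levels = sorted({b for _, b in data})
    n = len(next(iter(data.values())))  # replicates per cell (balanced)
    all_values = [v for cell in data.values() for v in cell]
    grand = mean(all_values)

    cell_mean = {key: mean(cell) for key, cell in data.items()}
    a_mean = {a: mean([v for b in b_levels for v in data[(a, b)]]) for a in a_levels}
    b_mean = {b: mean([v for a in a_levels for v in data[(a, b)]]) for b in b_levels}

    ss = {
        "priming": n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels),
        "stimulation": n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels),
        "interaction": n * sum(
            (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
            for a, b in product(a_levels, b_levels)
        ),
        "within": sum((v - cell_mean[k]) ** 2 for k, cell in data.items() for v in cell),
    }
    ss_total = sum((v - grand) ** 2 for v in all_values)

    df = {
        "priming": len(a_levels) - 1,
        "stimulation": len(b_levels) - 1,
        "interaction": (len(a_levels) - 1) * (len(b_levels) - 1),
        "within": len(a_levels) * len(b_levels) * (n - 1),
    }
    ms_within = ss["within"] / df["within"]
    f_stat = {k: (ss[k] / df[k]) / ms_within for k in ("priming", "stimulation", "interaction")}
    return ss, ss_total, df, f_stat


# Invented values for illustration only (arbitrary units, two replicates per cell).
data = {
    ("none", "unstim"): [1.0, 1.2], ("none", "Mtb"): [2.0, 2.4], ("none", "LPS"): [2.5, 2.7],
    ("IFN-g", "unstim"): [1.5, 1.7], ("IFN-g", "Mtb"): [3.8, 4.2], ("IFN-g", "LPS"): [4.5, 4.9],
    ("IL-4", "unstim"): [0.9, 1.1], ("IL-4", "Mtb"): [1.8, 2.0], ("IL-4", "LPS"): [2.2, 2.6],
}

ss, ss_total, df, f_stat = two_way_anova(data)
for term, value in f_stat.items():
    print(f"{term}: F({df[term]}, {df['within']}) = {value:.2f}")
```

      A large interaction F statistic would indicate that the effect of stimulation depends on the priming condition, which is the question the interaction term addresses.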

      • More justification could be given for the dose of IFNγ used for priming. Inflammatory priming is typically performed with a "low-dose" treatment (e.g., ~1 ng/ml), whereas the authors use 10 ng/ml, which would be considered a high dose. It would be useful to repeat select experiments with a more standard low-dose treatment of IFNg to demonstrate that this is also sufficient to induce the observed metabolic changes.

      Previous work has identified little difference in the response of AM and peripheral monocytes to low versus high doses of IFN-γ (6). We have inserted the following into the study limitations (lines 479-481).  

      “Furthermore, only one dose of IFN-γ was utilised due to limitations in AM yield, however, recently both low and high doses of IFN-γ have been shown to have similar effects on AM in vitro (6).”

      • Check for accuracy of the Fig.4 legend. Also check that 4G and 4B math is consistent.

      The legend for Figure 4 has been amended to state G,H in place of the incorrect A,B. The math has been double-checked for accuracy and is correct. 3 out of 10 MDM donors produced IL-1β in the absence of IFN-γ in Figure 4B; the average used to calculate the data represented in Figure 4G was therefore brought down markedly by donors who produced little or no IL-1β.  

      • Functional plasticity is a vague term and difficult to interpret in this context. It is stated that AM have greater functional plasticity, but MDMs appear to have greater capacity to secrete IL-1β and respond more robustly to IL-4 in terms of T cell stimulation. On that note, the claims regarding antigen presentation would be more impactful if a direct comparison of antigen presentation capacity was made between AM and MDM.

      Our data suggest that AM have a greater ability to alter cytokine production, such as IL-1β. To account for the different culture and plating requirements of MDM versus AM, cytokine concentrations were normalised and expressed as fold change. This gives a more controlled and accurate comparison of the ability of IFN-γ or IL-4 to modulate cytokine production in AM compared with MDM.  

      The terms ‘functional plasticity’ and ‘phenotypic plasticity’ have now been defined in the manuscript in lines 60-63.  

      We have therefore added the following sentence into the conclusion of the manuscript (lines 490-493). “Cumulatively, the data presented herein suggests that the MDM maybe more phenotypically plastic than the AM, while the AM have enhanced functional plasticity in their ability to produce cytokine after exposure Th1 and Th2 cytokines.”

      However, we acknowledge that the MDM may be regarded as more plastic because of their ability to respond robustly to IL-4, whereas the phenotypic and functional changes in the AM in response to IL-4 are more limited. Whilst the focus of our work was to determine if AM are a tractable target to promote immunity in the lungs through upregulation of pro-inflammatory effector function, their ability to downregulate inflammation in response to IL-4 is less profound than that of MDM.  

      We acknowledge the shortcomings of our work which did not allow us to directly measure antigen processing in the AM, due to limitations in the cellular yield from BALF. We have edited the text (lines 251-252 and 286) to clarify this for the reader.  

      • Inconsistent normalization complicates interpretation of metabolic data. For example, it is unclear whether changes in glycolysis and oxidative phosphorylation in primed cells are due to priming or subsequent treatments. Check harmony of methods for analysis of "metabolic assays" with Fig.1 data, axis, and legend.

      We have addressed this comment, which is similar to points made by the other reviewers and amended the manuscript to increase clarity. These changes are outlined in the response to reviewer 1, point 5 (recommendations for the author). In addition, we have amended the metabolic assay method (lines 111-112) to state that “Post stimulation the ECAR and OCR were continually sampled at 20-minute intervals for times indicated.”

      • A direct comparison of cytokine production after priming and stimulation with Mtb or LPS is limited by inconsistent axes. The data may not support a claim that AM has greater "functional plasticity" without a direct comparison of antigen presentation. Moreover, MDM secrete more IL-1β than AM. The claim that AM "have increased ability to produce all cytokines assayed in response to Mtb stimulation" does not appear to be supported by the data.

      We have amended the text to clarify this issue (lines 313-315). “These data suggest that the AM have greater functional plasticity in terms of their ability to upregulate cytokine production in response to IFN-γ, compared with the MDM. IFN-γ primed AM have enhanced IL-10 and TNF production in response to Mtb and LPS, respectively.”  

      We have amended the manuscript and have replaced “IFN-γ primed AM have increased ability to produce all cytokines assayed in response to Mtb stimulation” with the following (lines 421-423) “IFNγ primed AM stimulated with Mtb have significantly more production of IL-1β, TNF and IL-10 compared with unprimed controls. This is in contrast with IFN-γ primed MDM which only upregulate TNF compared to their unprimed controls.”

      • AM populations could be defined experimentally.

      Airway macrophages were adherence-purified from bronchoalveolar lavage fluid and defined as CD68+CD14+, as per rebuttal figure 1. The purpose of this study was to examine if human peripherally derived or lung resident macrophages were plastic in response to the classical polarising cytokines IFN-γ and IL-4. We have identified that the AM and MDM do indeed have different functional and metabolic responses to these cytokines. However, determining functional differences within the AM subpopulations is beyond the scope of the current study and hampered by low cell numbers in human BALF.  

      References

      (1) Conzelmann M, Wagner AH, Hildebrandt A, Rodionova E, Hess M, Zota A, Giese T, Falk CS, Ho AD, Dreger P, Hecker M, Luft T. IFN-γ activated JAK1 shifts CD40-induced cytokine profiles in human antigen-presenting cells toward high IL-12p70 and low IL-10 production. Biochemical pharmacology 2010; 80: 2074-2086.

      (2) Fries KM, Sempowski GD, Gaspari AA, Blieden T, Looney RJ, Phipps RP. CD40 Expression by human fibroblasts. Clinical Immunology and Immunopathology 1995; 77: 42-51.

      (3) Gu W, Chen J, Yang L, Zhao KN. TNF-α promotes IFN-γ-induced CD40 expression and antigen process in Myb-transformed hematological cells. TheScientificWorldJournal 2012; 2012: 621969.

      (4) Hershman MJ, Appel SH, Wellhausen SR, Sonnenfeld G, Polk HC, Jr. Interferon-gamma treatment increases HLA-DR expression on monocytes in severely injured patients. Clinical and experimental immunology 1989; 77: 67-70.

      (5) Maenaka A, Kenta I, Ota A, Miwa Y, Ohashi W, Horimi K, Matsuoka Y, Ohnishi M, Uchida K, Kobayashi T. Interferon-γ-induced HLA Class II expression on endothelial cells is decreased by inhibition of mTOR and HMG-CoA reductase. FEBS open bio 2020; 10: 927-936.

      (6) Thiel BA, Lundberg KC, Schlatzer D, Jarvela J, Li Q, Shaw R, Reba SM, Fletcher S, Beckloff SE, Chance MR, Boom WH, Silver RF, Bebek G. Human alveolar macrophages display marked hyporesponsiveness to IFN-γ in both proteomic and gene expression analysis. PLoS One 2024; 19: e0295312.

    1. eLife assessment

      This fundamental state-of-the-art modeling study explores neural mechanisms underlying walking control in cats, demonstrating the probability of three different states of operation of the spinal circuitry generating locomotion at different speeds. The authors' biophysical modeling sufficiently reproduces and provides explanations for experimental data on how the locomotor cycle and phase durations depend on treadmill walking speed and points to new principles of circuit functional architecture and operating regimes underlying how spinal circuits interact with supraspinal signals and limb sensory feedback signals to produce different locomotor behaviors at different speeds, which are major unresolved problems in the field. The modeling evidence is compelling, especially in advancing our understanding of locomotion control mechanisms and will interest neuroscientists studying the neural control of movement.

    2. Reviewer #1 (Public review):

      Summary:

      It is suggested that for each limb, the RG (rhythm generator) can operate in three different regimes: a non-oscillating state-machine regime, a flexor-driven regime, and a classical half-center oscillatory regime. This means that the field can move away from the old concept that there is only room for the classic half-center organization.

      Strengths:

      A major benefit of the present paper is that a bridge was made between various CPG concepts ("a potential contradiction between the classical half-center and flexor-driven concepts of spinal RG operation"). Another important step forward is the proposal about the neural control of slow gait ("at slow speeds (≤ 0.35 m/s), the spinal network operates in a state regime and requires external inputs for phase transitions, which can come from limb sensory feedback and/or volitional inputs (e.g. from the motor cortex)").

      Weaknesses:

      Some references are missing

    3. Reviewer #2 (Public review):

      Summary:

      The biologically realistic model of the locomotor circuits developed by this group continues to define the state of the art for understanding spinal genesis of locomotion. Here the authors have achieved a new level of analysis of this model to generate surprising and potentially transformative new insights. They show that these circuits can operate in three very distinct states and that, in the intact spinal cord, these states come into successive operation as the speed of locomotion increases. Equally important, they show that in spinal injury, the model is "stuck" in the low-speed "state machine" behavior.

      Strengths:

      There are many strengths in the simulation results presented here. The model itself has been closely tuned to match a huge range of experimental data and thus has a high degree of plausibility. The novel insight presented here, with the three different states, constitutes a truly major advance in the understanding of neural genesis of locomotion in spinal circuits. The authors systematically consider how the states of the model relate to presently available data from animal studies. Equally important, they provide a number of intriguing and testable predictions. It is likely that these insights are the most important achieved in the past 10 years. It is highly likely that the proposed multi-state behavior will have a transformative effect on this field.

      Weaknesses:

      I have no major weaknesses. A moderate concern is that the authors should consider some basic sensitivity analyses to determine if the 3-state behavior is especially sensitive to any of the major circuit parameters-e.g., connection strengths in the oscillators.

    4. Reviewer #3 (Public review):

      General Comments

      This work probes the control of walking in cats at different speeds and different states (split-belt and regular treadmill walking). Since the time of Sherrington there has been ongoing debate on this issue. The authors provide modeling data showing that they could reproduce data from cats walking on a specialized treadmill allowing for regular and split-belt walking. The data suggest that a non-oscillating state-machine regime best explains slow walking - where phase transitions are handled by external inputs into the spinal network. They then show at higher speeds a flexor-driven and then a classical half-center regime dominates. In spinal animals, it appears that a non-oscillating state-machine regime best explains the experimental data. The model is adapted from their previous work and raises interesting questions regarding the operation of spinal networks, that, at low speeds, challenge assumptions regarding central pattern generator function. This is an outstanding study which will be of general interest to the neuroscience community.

      Strengths

      The study has several strengths. Firstly the detailed model has been well established by the authors and provides details that relate to experimental data such as commissural interneurons (V0c and V0d), along with V3 and V2a interneuron data. Sensory input along with descending drive is also modelled and moreover the model reproduces many experimental data findings. Moreover, the idea that sensory feedback is more crucial at lower speeds, also is confirmed by presynaptic inhibition increasing with descending drive. The inclusion of experimental data from split-belt treadmills, and the ability of the model to reproduce findings here is a definite plus.

      Weaknesses

      Conceptually, this is a compelling study which provides interesting modeling data regarding the idea that the network can operate in different regimes, especially at lower speeds. The modelling data speaks for itself, but on the other hand, sensory feedback also provides generalized excitation of neurons which in turn project to the CPG. That is they are not considered part of the CPG proper. The authors have discussed this possibility in their revised paper.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      It is suggested that for each limb the RG (rhythm generator) can operate in three different regimes: a non-oscillating state-machine regime, a flexor-driven regime, and a classical half-center oscillatory regime. This means that the field can move away from the old concept that there is only room for the classic half-center organization.

      Strengths:

      A major benefit of the present paper is that a bridge was made between various CPG concepts ("a potential contradiction between the classical half-center and flexor-driven concepts of spinal RG operation"). Another important step forward is the proposal about the neural control of slow gait ("at slow speeds (≤ 0.35 m/s), the spinal network operates in a state regime and requires external inputs for phase transitions, which can come from limb sensory feedback and/or volitional inputs (e.g. from the motor cortex)").

      Weaknesses:

      Some references are missing

      We thank the Reviewer for the thoughtful and constructive comments. We have added text to address the Reviewer’s specific recommendations, along with several references suggested by the Reviewer.  

      Reviewer #2 (Public Review):

      Summary:

      The biologically realistic model of the locomotor circuits developed by this group continues to define the state of the art for understanding spinal genesis of locomotion. Here the authors have achieved a new level of analysis of this model to generate surprising and potentially transformative new insights. They show that these circuits can operate in three very distinct states and that, in the intact cord, these states come into successive operation as the speed of locomotion increases. Equally important, they show that in spinal injury the model is "stuck" in the low speed "state machine" behavior.

      Strengths:

      There are many strengths in the simulation results presented here. The model itself has been closely tuned to match a huge range of experimental data and thus has a high degree of plausibility. The novel insight presented here, with the three different states, constitutes a truly major advance in the understanding of neural genesis of locomotion in spinal circuits. The authors systematically consider how the states of the model relate to presently available data from animal studies. Equally important, they provide a number of intriguing and testable predictions. It is likely that these insights are the most important achieved in the past 10 years. It is highly likely that the proposed multi-state behavior will have a transformative effect on this field.

      Weaknesses:

      I have no major weaknesses. A moderate concern is that the authors should consider some basic sensitivity analyses to determine if the 3 state behavior is especially sensitive to any of the major circuit parameters - e.g. connection strengths in the oscillators or?

      We thank the Reviewer for the thoughtful and constructive comments. The sensitivity analysis has been included as a Supplemental file.

      Reviewer #3 (Public Review):

      Summary:

      This work probes the control of walking in cats at different speeds and different states (split-belt and regular treadmill walking). Since the time of Sherrington there has been ongoing debate on this issue. The authors provide modeling data showing that they could reproduce data from cats walking on a specialized treadmill allowing for regular and split-belt walking. The data suggest that a non-oscillating state-machine regime best explains slow walking - where phase transitions are handled by external inputs into the spinal network. They then show at higher speeds a flexor-driven and then a classical half-center regime dominates. In spinal animals, it appears that a non-oscillating state-machine regime best explains the experimental data. The model is adapted from their previous work, and raises interesting questions regarding the operation of spinal networks, that, at low speeds, challenge assumptions regarding central pattern generator function. This is an interesting study. I have a few issues with the general validity of the treadmill data at low speeds, which I suspect can be clarified by the authors.

      Strengths:

      The study has several strengths. Firstly the detailed model has been well established by the authors and provides details that relate to experimental data such as commissural interneurons (V0c and V0d), along with V3 and V2a interneuron data. Sensory input along with descending drive is also modelled and moreover the model reproduces many experimental data findings. Moreover, the idea that sensory feedback is more crucial at lower speeds, also is confirmed by presynaptic inhibition increasing with descending drive. The inclusion of experimental data from split-belt treadmills, and the ability of the model to reproduce findings here is a definite plus.

      Weaknesses:

      Conceptually, this is a very useful study which provides interesting modeling data regarding the idea that the network can operate in different regimes, especially at lower speeds. The modelling data speaks for itself, but on the other hand, sensory feedback also provides generalized excitation of neurons which in turn project to the CPG. That is they are not considered part of the CPG proper. In these scenarios, it is possible that an appropriate excitatory drive could be provided to the network itself to move it beyond the state-machine state - into an oscillatory state. Did the authors consider that possibility? This is important since work using L-DOPA, for example, in cats or pharmacological activation of isolated spinal cord circuits, shows the CPG capable of producing locomotion without sensory or descending input.

      We thank the Reviewer for the thoughtful and constructive comments. We have added text and references and discussed the issues raised by the Reviewer. In particular, in the section “Model limitations and future directions” we now acknowledge that afferent feedback can provide some constant level of excitation to the RG circuits after spinal transection, which can partly compensate for the lack of supraspinal drive and hence affect (shift) the timing of transitions between the considered regimes. We mention that this is one of the limitations of the present model. The potential effects of neuroactive drugs, like DOPA, on CPG circuits after spinal transection were left out because they are outside the scope of the present modeling studies.    

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      specific feedback to the authors:

      Nevertheless, there are some minor points, worth considering.

      Link to HUMAN DATA

      Here the authors may be interested to know that human data supports their proposal. This is relevant since there is ample evidence for the operation of spinal CPG's in humans (Duysens and van de Crommert,1998). The present model predicts that the basic output of the CPG remains even at very slow speeds, thus leading to similarity in EMG output. This prediction fits the experimental data (den Otter AR, Geurts AC, Mulder T, Duysens J. Speed related changes in muscle activity from normal to very slow walking speeds. Gait Posture. 2004 Jun;19(3):270-8). To investigate whether the basic CPG output remains basically the same even at very slow speeds (as also predicted by the current model), humans walked slowly on a treadmill (speeds as slow as 0.28 m s−1). Results showed that the phasing of muscle activity remained relatively stable over walking speeds despite substantial changes in its amplitude. Some minor additions were seen, consistent with the increased demands of postural stability. Similar results were obtained in another study: Hof AL, Elzinga H, Grimmius W, Halbertsma JP. Speed dependence of averaged EMG profiles in walking. Gait Posture. 2002 Aug;16(1):78-86. doi: 10.1016/s0966-6362(01)00206-5. PMID: 12127190.

      These authors wrote: "The finding that the EMG profiles of many muscles at a wide range of speeds can be represented by addition of few basic patterns is consistent with the notion of a central pattern generator (CPG) for human walking". The basic idea is that the same CPG can provide the motor program at slow and fast speeds but that the drive to the CPG differs. This difference is accentuated under some conditions in pathology, such as in Parkinson's Kinesia Paradoxa. It was argued that the paradox is not really a paradox but is explained as the CPGs are driven by different systems at slow and at fast speeds (Duysens J, Nonnekes J. Parkinson's Kinesia Paradoxa Is Not a Paradox. Mov Disord. 2021 May;36(5):1115-1118. doi: 10.1002/mds.28550. Epub 2021 Mar 3. PMID: 33656203.)

      These ideas are well in line with the current proposal ("Based on our predictions, slow (conditionally exploratory) locomotion is not "automatic", but requires volitional (e.g. cortical) signals to trigger step-by-step phase transitions because the spinal network operates in a state-machine regime. In contrast, locomotion at moderate to high speeds (conditionally escape locomotion) occurs automatically under the control of spinal rhythm-generating circuits receiving supraspinal drives that define locomotor speed, unless voluntary modifications or precise stepping are required to navigate complex terrain").

      As mentioned in the present paper, other examples exist from pathology ("...Another important implication of our results relates to the recovery of walking in movement disorders, where the recovered pattern is generally very slow. For example, in people with spinal cord injury, the recovered walking pattern is generally less than 0.1 m/s and completely lacks automaticity 77-79. Based on our predictions, because the spinal locomotor network operates in a state-machine regime at these slow speeds, subjects need volition, additional external drive (e.g., epidural spinal cord stimulation) or to make use of limb sensory feedback by changing their posture to perform phase transitions"). As mentioned above, another example is provided by Parkinson's disease. The authors may also be interested in work on flexible generators in SCI: Danner SM, Hofstoetter US, Freundl B, Binder H, Mayr W, Rattay F, Minassian K. Human spinal locomotor control is based on flexibly organized burst generators. Brain. 2015 Mar;138(Pt 3):577-88. doi: 10.1093/brain/awu372. Epub 2015 Jan 12. PMID: 25582580; PMCID: PMC4408427.

      We thank the reviewer for these additional and interesting insights. We added a new paragraph in the Discussion to bolster the link with human data that includes references suggested by the Reviewer.

      CHAIN OF REFLEXES

      It reads: "... in opposition to the previously prevailing viewpoint of Charles Sherrington 21,22 that locomotion is generated through a chain of reflexes, i.e., critically depends on limb sensory feedback (reviewed in 23)." This is correct but incomplete. The reference cited (23: Stuart, D.G. and Hultborn, H, "Thomas Graham Brown (1882-1965), Anders Lundberg (1920-), and the neural control of stepping," Brain Res. Rev. 59(1), 74-95 (2008)) actually reads: "Despite the above findings, the doctrinaire position in the early 1900s was that the rhythm and pattern of hind limb stepping movements was attributable to sequential hind limb reflexes. According to Graham Brown (1911c) this viewpoint was largely due to the arguments of Sherrington and a Belgian physiologist, Maurice Philippson (1877-1938). Philippson studied stepping movements in chronically maintained spinal dogs, using techniques he had acquired in the Strasbourg laboratory of the distinguished German physiologist, Friedrich Goltz (1834-1902). He also analyzed kinematically moving pictures of dog locomotion, which had been sent to him by the renowned French physiologist, Etienne-Jules Marey (1830-1904). Philippson (1905) certainly presented arguments explaining his perception of how sequential spinal reflexes contributed to the four phases of the step cycle (see Fig. 1 in Clarac, 2008). In retrospect, it is likely that Graham Brown was correct in attributing to Philippson and Sherrington the then-prevailing viewpoint that reflexes controlled spinal stepping. It is puzzling, nonetheless, that far less was said then and even now about Philippson's belief that the spinal control was due to a combination of central and reflex mechanisms (Clarac, 2008),4,5 4 We are indebted to François Clarac for drawing to our attention Philippson's statement on p. 37 of his 1905 article that "Nos expériences prouvent d'une part que la moelle lombaire séparée du reste de l'axe cérébro-spinal est capable de produire les mouvements coordonnés dans les deux types de locomotion, trot et gallop. [Our experiments prove that one side of the spinal cord separated from the cerebro-spinal axis is able to produce coordinated movements in two types of locomotion, trot and gallop]." Then, on p. 39 Philippson (1905) states that "Nous voyons donc, en résumé que la coordination locomotrice est une fonction exclusivement médullaire, soutenue d'une part par des enchainements de réflexes directs et croisés, dont l'excitant est tantot le contact avec le sol, tantot le mouvement même du membre. [In summary, we see that locomotor coordination is an exclusive function of the spinal cord supported by a sequencing of direct and crossed reflexes, which are activated sometimes by contact with the ground and sometimes even by leg movement]. A coté de cette coordination basée sur des excitations périphériques, il y a une coordination centrale provenant des voies d'association intra-médullaires. [In conjunction with this peripherally excited coordination, there is a central coordination arising from intraspinal pathways]." (The English translations have also been kindly supplied by François Clarac.) Clearly, Philippson believed in both a central spinal and a reflex control of stepping! 5 In part 1 of his 1913/1916 review Graham Brown discussed Philippson's 1905 article in much detail (pp. 345-350 in Graham Brown, 1913b). He concludes with the statement that "... Philippson die wesentlichen Factoren des Fortbewegungsaktes in das exterozeptive Nervensystem verlegt. Er nimmt an, dass die zyklischen Bewegungen automatisch durch äussere Reize erhalten werden, welche in sich selbst rhythmisch als Folge der Reflexakte welche sie selbst erzeugen, wiederholt werden. [Philippson assigns the important factors of the act of locomotion to the exteroceptive nervous system. He assumes that the cyclic movements are automatically maintained by external stimuli which, by themselves, are rhythmically repeated as a consequence of the reflexive actions that they generate themselves]." (English translation kindly supplied by Wulfila Gronenberg). This interpretation clearly ignores Philippson's emphasis on a central spinal component in the control of stepping....)."

      Hence it is a simplification to give all the credit to Sherrington and to ignore the role of Philippson concerning the chain-of-reflexes idea.

      We again thank the Reviewer for these additional and interesting insights. We added the Philippson (1905) and Clarac (2008) references. The important contribution of Philippson is now indicated.

      GTO Ib feedback

      It reads: "This effect and the role of Ib feedback from extensor afferents has been demonstrated and described in many studies in cats during real and fictive locomotion 2,57-59."

      These citations are appropriate but it is surprising to see that the Hultborn contribution is limited to the Gossard reference while the even more important earlier reference to Conway et al is missing (Conway BA, Hultborn H, Kiehn O. Proprioceptive input resets central locomotor rhythm in the spinal cat. Exp Brain Res. 1987;68(3):643-56. doi: 10.1007/BF00249807. PMID: 3691733).

      Yes, the Conway et al. reference has been added.

      Other species

      The authors may also look at other species. The flexible arrangement of the CPGs, as described in this article, is fully in line with work on other species, showing CPG networks capable of supporting gait, but also scratching, swimming, etc. (Berkowitz A, Hao ZZ. Partly shared spinal cord networks for locomotion and scratching. Integr Comp Biol. 2011 Dec;51(6):890-902. doi: 10.1093/icb/icr041. Epub 2011 Jun 22. PMID: 21700568. Berkowitz A, Roberts A, Soffe SR. Roles for multifunctional and specialized spinal interneurons during motor pattern generation in tadpoles, zebrafish larvae, and turtles. Front Behav Neurosci. 2010 Jun 28;4:36. doi: 10.3389/fnbeh.2010.00036. PMID: 20631847; PMCID: PMC2903196.)

      Similar ideas about flexible coupling can also be found in: Juvin L, Simmers J, Morin D. Locomotor rhythmogenesis in the isolated rat spinal cord: a phase-coupled set of symmetrical flexion extension oscillators. J Physiol. 2007 Aug 15;583(Pt 1):115-28. doi: 10.1113/jphysiol.2007.133413. Epub 2007 Jun 14. PMID: 17569737; PMCID: PMC2277226. Or zebrafish: Harris-Warrick RM. Neuromodulation and flexibility in Central Pattern Generator networks. Curr Opin Neurobiol. 2011 Oct;21(5):685-92. doi: 10.1016/j.conb.2011.05.011. Epub 2011 Jun 7. PMID: 21646013; PMCID: PMC3171584.

      We added a sentence in the Discussion along with supporting references.

      Standing

      In the view of the present reviewer, the model could even be extended to standing in humans. It reads: "at slow speeds (≤ 0.35 m/s), the spinal network operates in a state regime and requires external inputs"; similarly (personal experience) when going from sit to stand: as soon as weight is over support, extension is initiated and the body rises, as one would expect when the extensor center is activated by reinforcing load feedback, replacing GTO inhibition (Faist M, Hoefer C, Hodapp M, Dietz V, Berger W, Duysens J. In humans Ib facilitation depends on locomotion while suppression of Ib inhibition requires loading. Brain Res. 2006 Mar 3;1076(1):87-92. doi:

      Yes, we agree that the model could be extended to standing and the transition from standing to walking is particularly interesting. However, for this paper, we will keep the focus on locomotion over a range of speeds.

      Reviewer #2 (Recommendations For The Authors):

      The presentation is exceedingly well done and very clear.

      A moderate concern is that the authors do not make use of the capacity of computer simulations for sensitivity analyses. Perhaps these have been previously published? In any case, the question here is whether the 3 state behavior is especially sensitive to excitability of one of the main classes of neurons or a crucial set of connections.

      A sensitivity analysis has been performed and is included as a Supplemental file.

      Minor point. I have but two minor points. A bit more explanation should be provided for the use of the term "state machine" to describe the lowest speed state. Perhaps this is a term from control theory? In any case, it is not clear why this term is appropriate for a state in which the oscillator circuits are "stuck" in a constant output form and need to be "pushed" by sensory input.

      Yes, we now provide a definition in the Introduction.

      Minor point: it is of course likely that neuromodulation of multiple types of spinal neurons occurs via inputs that activate G protein coupled receptors. These types of inputs are absent from the model, which is fine, but some sort of brief discussion should be included. One possibility is to note that the circuit achieves transitions between different states without the need for neuromodulatory inputs. This appears to me to be a very interesting and surprising insight.

      In the section “Model limitations and future directions” of the Discussion, we now mention that the term “supraspinal drive” in our model represents supraspinal inputs that provide both electrical and neuromodulatory effects on spinal neurons, increasing their excitability, and that disappear after spinal transection. We think it is still too early to simulate the exact effects of descending neuromodulation, since there are almost no data on the effects of different modulators on specific types of spinal interneurons.

      Reviewer #3 (Recommendations For The Authors):

      Minor Comments  

      Page numbers would be useful.

      Abstract

      Following spinal transection, the network can only operate in a state-machine regime. This is a bit strong since it applies to computational data. Clarify this statement.

      We agree. Sentence has been changed to: “Following spinal transection, the model predicts that the spinal network can only operate in the state-machine regime.”

      Introduction

      Intro - "This is somewhat surprising...". It gives the impression that spinal cats are autonomously stable on the belt. They are stabilized by the experimenter.

      The text has been changed to: “This is somewhat surprising because intact and spinal cats rely on different control mechanisms. Intact cats walking freely on a treadmill engage vision for orientation in space and their supraspinal structures process visual information and send inputs to the spinal cord to control locomotion on a treadmill that maintains a fixed position of the animal relative to the external space. Spinal cats, whose position on the treadmill relative to the external space is fixed by an experimenter, can only use sensory feedback from the hindlimbs to adjust locomotion to the treadmill speed.”

      "Cannot consistently perform treadmill locomotion" - likely a context-dependent result. Certainly, cats can do this easily off a treadmill - stalking, for example. Perhaps somewhere, mention that treadmill locomotion is not entirely similar to overground locomotion.

      We completely agree. Stalking is an excellent example showing that during overground locomotion slow movements (and related phase transitions) can be controlled by additional voluntary commands from supraspinal structures. This differs from simple treadmill locomotion, which is performed outside of specific goal- or task-dependent contexts. Based on this, we suggest a difference between relatively slow (exploratory-type, including stalking) and relatively fast (escape-type) overground locomotion. We added the following sentence to the Introduction: “This is evidently context dependent and specific for the treadmill locomotion as cats, humans and other animals can voluntarily decide to perform consistent overground locomotion at slow speeds.”

      The authors introduce the concept of the state machine regime. In my opinion, this could use some more explanation and citations to the literature. Was it a term coined by the authors, or is there literature reinforcing this point?

      This is a computer science and automata theory term that has already been used in descriptions of locomotion (see our references in the 2nd paragraph of Discussion). We added a definition and corresponding references in the Introduction.

      In terms of sensory feedback, particularly group II input, it would be interesting to calculate if the conduction delay to the spinal cord at higher speeds would have a certain cutoff point at which it would no longer be timed effectively for phase transitions. This could reinforce your point.

      This is an interesting proposition but it is unlikely to be a factor over the range of speeds that we investigated (0.1 to 1.0 m/s). Assuming that group II afferents transmit their signals to spinal circuits at a latency of 10-20 ms, this is more than enough time to affect phase transitions, even at the highest speed considered. This might be a factor at very high speeds (e.g. galloping) or in small animals with high stepping frequencies.

      Results.

      The assertion that intact cats are inconsistent in terms of walking at slow speeds needs to be bolstered. For example, if a raised platform were built for a tray of food, would the intact cat consistently walk at slower speeds and eat? I suspect so. By the same token, would they walk slowly during bipedal walking? It is pretty easy to check this. Also, reports from the literature show differential effects of runway versus treadmill gait analysis, specifically when afferent input is removed.

      The Reviewer is correct that raising a platform for a food tray or even having intact cats walk with their hindlimbs only (with forelimbs on a stationary platform) may allow for consistent stepping at slow speeds (0.1 – 0.3 m/s). However, this effectively removes voluntary control of locomotion and makes the pattern more automatic (spinal + limb sensory feedback). These examples provide additional specific contexts, and we have already mentioned (see above) that slow locomotion of intact cats is context dependent.

      "We believe that intact animals walking on a treadmill..." Citations for this? Certainly, this is not a new point.

      No, this is not new. We changed the sentence and added a reference to the statement: “Intact animals walking on a treadmill use visual cues and supraspinal signals to adjust their speed and maintain a fixed position relative to the external space”, with reference to Salinas et al. (Salinas, M.M., Wilken, J.M., and Dingwell, J.B., "How humans use visual optic flow to regulate stepping during walking," Gait Posture 57, 15-20, 2017).

      The presentation of the results is somewhat disjointed. The intact data are presented for tied-belt and split-belt results, but this is not addressed explicitly until Figure 4. Would it not be better to create a figure incorporating both intact and modelling data and present the intact data where appropriate?

      We tried to do this initially, but this way required changing the style of the whole paper and we decided against this idea. Therefore, we prefer to keep the presentation of results as it is now. 

      Regarding the role of sensory feedback being especially important at low speeds, it is interesting that egr3+ mice (lacking spindle input) show an inability to walk at high speeds >40 cm/s but can walk at lower speeds (up to 7 cm/s) (Takeoka et al 2014). Similar findings were found with a lesion affecting Group I afferents in general (Takeoka and Arber 2019). Also, Grillner and colleagues show that cats can produce fictive locomotion in the absence of sensory input.

      In the Takeoka experiments it is difficult to assess the effect of removing somatosensory feedback because animals can simply decide to not step at higher speeds to avoid injury. Their mice deprived of somatosensory feedback can walk at slow speeds, likely thanks to voluntary commands, and cannot do so at higher speeds because (1) maybe somatosensory feedback is indeed necessary and/or (2) because they feel threatened because of impaired posture and poor control in general. In other words, they choose to not walk at faster speeds to avoid injury.

      Fictive locomotion by definition occurs without phasic somatosensory feedback, as the animals are curarized or the studies are performed in isolated spinal cord preparations. Depending on the preparation, pharmacology or brainstem stimulation is required to evoke fictive locomotion. If animals are deafferented, pharmacology or brainstem stimulation is required to induce fictive locomotion to offset the loss of spinal neuronal excitability provided by primary afferents. At the same time, our preliminary analysis of old fictive locomotion data from the University of Manitoba Spinal Cord center (Drs. Markin and Rybak had official access to this database during our collaboration with Dr. David McCrea) has shown that the frequency of stable fictive locomotion in cats usually exceeded 0.6 - 0.7 Hz, which approximately corresponds to speeds above 0.3 - 0.4 m/s. These data and estimates are only approximate; they have not been statistically analyzed or published and hence have not been included in our paper.

      Discussion. The statement that sensory feedback is required for animals to locomote may need to be qualified. Animals need some sensory feedback to locomote is perhaps better. For example, lesion studies by Rossignol in the early 2000s showed that cutaneous feedback from the paw was seemingly quite critical (in spinal cats). Also, see previous comments above.

      We changed this to: “… requires some sensory feedback to locomote, …”

      Figures

      Figure 1C. This figure is somewhat confusing. If intact cats do not walk (arrow), how are the data for swing and stance computed? Also raw traces would be useful to indicate that there is variability. Also, while duration is useful, would you not want to illustrate the co-efficient of variation as well as another way to show that the stepping pattern was inconsistent?

      This is probably a misunderstanding. The left panel of Fig. 1C superimposes data from intact cats in panel A (speed range 0.4 m/s to 1.0 m/s) and data from spinal cats in panel B (speed range 0.1 m/s to 1.0 m/s). Therefore, the left part of this panel (speed range 0.1 m/s to 0.4 m/s, pointed out by the arrow) corresponds only to spinal cats, not to intact cats. The standard deviations of all measurements are shown. All these figures were reproduced from previous publications. We did not apply new statistical analyses to these previously published data and figures.

      Figure 4. 'All supraspinal drives (and their suppression of sensory feedback) are eliminated from the schematic shown in A. ' However, it is labelled 'brainstem drives,' which is confusing. Moreover, many of the abbreviations are confusing. Do you need l-SF-E1 in the figure, or could you call it 'Feedback 1' and then refer to l-SF-E1 in the legend? The same goes for βr, etc. Can they move to the legend?

      In the intact model (Fig. 4A), we have supraspinal drives (𝛼𝐿 and 𝛼𝑅, and 𝛾𝐿 and 𝛾𝑅), some of which provide presynaptic inhibition of sensory feedback (SF-E1 and SF-E2) as shown in Fig. 4A. In the spinal-transected model (Fig. 4B), the above brainstem drives and their effects (presynaptic inhibition) on both feedback types are eliminated (therefore, there is no “Brainstem drives” label in Fig. 4B). Also, we do not see a strong reason to change the feedback names, since they are explained in the text.

      I appreciate the detail of these figures, but they are difficult to conceptualize. They are useful in the context of 3C. Perhaps move this figure to supplementary and then show the proposed schematics for the system operating at slow, medium, and fast speeds in a replacement figure?

      We apologize for the resistance, but we would like to keep the current presentation.

      There is a lack of raw data (models or experimental) data reinforcing the figures. I would add these to all figures, which would nicely complement the graphs.

      These raw data can be found in the cited manuscripts. It would be the same figures.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In their paper, Zhan et al. have used Pf genetic data from simulations and Ghanaian field samples to elucidate a relationship between multiplicity of infection (MOI) (the number of distinct parasite clones in a single host infection) and force of infection (FOI). Specifically, they use sequencing data from the var genes of Pf along with Bayesian modeling to estimate the MOI of individual infections, and they use these values along with methods from queueing theory that rely on various assumptions to estimate FOI. They compare these estimates to known FOIs in a simulated scenario and describe the relationship between these estimated FOI values and another commonly used metric of transmission, EIR (entomological inoculation rate).

      This approach does fill an important gap in malaria epidemiology, namely estimating the force of infection, which is currently complicated by several factors including superinfection, unknown duration of infection, and highly genetically diverse parasite populations. The authors use a new approach borrowing from other fields of statistics and modeling and make extensive efforts to evaluate their approach under a range of realistic sampling scenarios. However, the write-up would greatly benefit from added clarity both in the description of methods and in the presentation of the results. Without these clarifications, rigorously evaluating whether the authors' proposed method of estimating FOI is sound remains difficult. Additionally, there are several limitations that call into question the stated generalizability of this method that should at minimum be further discussed by the authors and in some cases require a more thorough evaluation.

      Major comments:

      (1) Description and evaluation of FOI estimation procedure.

      a. The methods section describing the two-moment approximation and the accompanying appendix lack several important details. The equations on lines 891 and 892 are only a small part of the equations in Choi et al. and do not adequately describe the procedure. Notably, several quantities in those equations are never defined, and some of them are important for understanding the method (e.g. A and S, the main random variables for inter-arrival times and service times; aR and bR, which are the known time-average quantities; and the squared coefficient of variation of the random variable, on which these rely and which is also never introduced in the paper). Without going back to the Choi paper to understand these quantities and the assumptions of this method, it was not possible to follow how this works in the paper. At a minimum, all variables used in the equations should be clearly defined.

      We thank the reviewer for this useful comment. We plan to clarify the method, including all the relevant variables in our revised manuscript. The reviewer is correct in pointing out that there are more sections and equations in Choi et al., including the derivation of an exact expression for the steady-state queue-length distribution and the two-moment approximation for the queue-length distribution. Since only the latter was directly utilized in our work, we included in the first version of our manuscript only material on this section and not the other. We agree with the reviewer on readers benefiting from additional information on the derivation of the exact expression for the steady-state queue-length distribution. Therefore, we will summarize the derivation of this expression in our revised manuscript. Regarding the assumptions of the method we applied, especially those for going from the exact expression to the two-moment approximation, we did describe these in the Materials and Methods of our manuscript. We recognize from this comment that the writing and organization of this information may not have been sufficiently clear. We had separated the information on this method into two parts, with the descriptive summary placed in the Materials and Methods and the equations or mathematical formula placed in the Appendix. This can make it difficult for readers to connect the two parts and remember what was introduced earlier in the Materials and Methods when reading the equations and mathematical details in the Appendix. For our revised manuscript, we plan to cover both parts in the Materials and Methods, and to provide more of the technical details in one place, which will be easier to understand and follow.

      b. Additionally, the description in the main text of how the queueing procedure can be used to describe malaria infections would benefit from a diagram; as currently written, it is very difficult to follow.

      We thank the reviewer for this suggestion. We will add a diagram illustrating the connection between the queueing procedure and malaria transmission.

      c. Just observing the box plots of mean and 95% CI on a plot with the FOI estimate (Figures 1, 2, and 10-14) is not sufficient to adequately assess the performance of this estimator. First, it is not clear whether the authors are displaying the bootstrapped 95% CIs or whether they are just showing the distribution of the mean FOI taken over multiple simulations, and then it seems that they are also estimating mean FOI per host on an annual basis. Showing a distribution of those per-host estimates would also be helpful. Second, a more quantitative assessment of the ability of the estimator to recover the truth across simulations (e.g. the proportion of simulations where the truth is captured in the 95% CI, or something like this) is important. In many cases it seems that the estimator is always underestimating the true FOI and may not even contain the true value in the FOI distribution (e.g. Figure 10, Figure 1 under the mid-IRS panel), but it is not possible to conclude one way or the other based on this visualization. This is a major issue since it calls into question whether there is in fact data to support that these methods give good and consistent FOI estimates.

      There appears to be some confusion on what we display in some key figures. We will clarify this further both here and in the revised text. In Figures 1, 2, and 10-14, we displayed the bootstrapped distributions including the 95% CIs. These figures do not show the distribution of the mean FOI taken over multiple simulations. We estimated mean FOI on an annual basis per host in the following sense. Both of our proposed methods require either a steady-state queue length distribution, or moments of this distribution for FOI inference. However, we only have one realization or observation for each individual host, and we do not have access to either the time-series observation of a single individual’s MOI or many realizations of a single individual’s MOI at the same sampling time. This is typically the case for empirical data, although numerical simulations could circumvent this limitation and generate such output. Nonetheless, we do have a queue length distribution at the population level for both the simulation output and the empirical data, which can be obtained by simply aggregating MOI estimates across all sampled individuals. We use this population-level queue length distribution to represent and approximate the steady-state queue length distribution at the individual level. Such representation or approximation does not consider explicitly any individual heterogeneity due to biology or transmission. The estimated FOI is per host in the sense of representing the FOI experienced by an individual host whose queue length distribution is approximated from the collection of all sampled individuals. The true FOI per host per year in the simulation output is obtained from dividing the total FOI of all hosts per year by the total number of all hosts. Therefore, our estimator, combined with the demographic information on population size, is for the total number of Plasmodium falciparum infections acquired by all individual hosts in the population of interest per year.
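
      A minimal sketch of the pooling step described above, not the authors' actual pipeline. The function names, the choice to resample hosts with replacement for the bootstrap, and the toy inputs are assumptions introduced here for illustration only.

```python
import numpy as np

def pooled_moi_distribution(moi_per_host, capacity):
    """Empirical probability of each MOI value 0..capacity, pooled over all hosts."""
    moi = np.asarray(moi_per_host, dtype=int)
    if moi.min() < 0 or moi.max() > capacity:
        raise ValueError("MOI values must lie between 0 and capacity")
    counts = np.bincount(moi, minlength=capacity + 1)
    return counts / counts.sum()

def bootstrap_pooled_distributions(moi_per_host, capacity, n_boot=200, seed=0):
    """Resample hosts with replacement (an assumed bootstrap scheme) and
    return one pooled MOI distribution per replicate."""
    rng = np.random.default_rng(seed)
    moi = np.asarray(moi_per_host, dtype=int)
    resampled = rng.choice(moi, size=(n_boot, moi.size), replace=True)
    return np.stack([pooled_moi_distribution(r, capacity) for r in resampled])

def true_foi_per_host(total_infections_per_year, n_hosts):
    """True FOI per host per year: total infections acquired divided by host count."""
    return total_infections_per_year / n_hosts
```

      Each bootstrap replicate yields one population-level queue length distribution, from which one FOI estimate would be obtained; the collection of those estimates forms the bootstrapped distribution shown in the figures.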

      We evaluated the impact of individual heterogeneity on FOI inference by introducing individual heterogeneity into the simulations. With a considerable amount of transmission heterogeneity across individuals (namely 2/3 of the population receiving more than 90% of all bites whereas the remaining 1/3 receives the rest of the bites), our two methods perform similarly to how they perform in the homogeneous transmission scenarios.

      Concerning the second point, we will add a quantitative assessment of the ability of the estimator to recover the truth across simulations and include this information in the legend of each figure. In particular, we will provide the proportion of simulations where the truth is captured by the entire bootstrap distribution, in addition to some measure of relative deviation, such as the relative difference between the true FOI value and the median of the bootstrap distribution for the estimate. This assessment will be a valuable addition, but please note that the comparisons we have provided in a graphical way do illustrate the ability of the methods to estimate “sensible” values, close to the truth despite multiple sources of errors. “Close” is here relative to the scale of variation of FOI in the field and to the kind of precision that would be useful in an empirical context. From a practical perspective based on the potential range of variation of FOI, the graphical results already illustrate that the estimated distributions would be informative.
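
      The two summary measures mentioned above could be computed along the following lines. This is a sketch under stated assumptions: the function name is hypothetical, "captured by the entire bootstrap distribution" is interpreted as the true value lying between the minimum and maximum bootstrap estimate, and the relative deviation is signed as (median minus truth) divided by truth.

```python
import numpy as np

def assess_estimator(true_foi, bootstrap_estimates):
    """Summarise how well bootstrap FOI estimates recover the truth.

    true_foi: one true FOI value per simulation.
    bootstrap_estimates: one array of bootstrap FOI estimates per simulation.
    Returns (proportion of simulations whose bootstrap range contains the
    truth, array of relative deviations of the bootstrap median).
    """
    covered = []
    relative_deviation = []
    for truth, boot in zip(true_foi, bootstrap_estimates):
        boot = np.asarray(boot, dtype=float)
        covered.append(boot.min() <= truth <= boot.max())
        relative_deviation.append((np.median(boot) - truth) / truth)
    return float(np.mean(covered)), np.array(relative_deviation)
```

      A negative relative deviation under this sign convention would indicate underestimation of the true FOI, which is the pattern the reviewer asks about.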

      d. Furthermore, the authors state in the methods that the choice of mean and variance (and thus second moment) parameters for inter-arrival times are varied widely; however, it is not clear what those ranges are. There needs to be a clear table or figure caption showing what combinations of values were tested and which results are produced from them. This is an essential component of the method, and it is impossible to fully evaluate its performance without this information. This relates to the issue of selecting the mean and variance values that maximize the likelihood of observing a given distribution of MOI estimates. This is very unclear since no likelihoods have been written down in the methods section of the main text. Which likelihood are the authors referring to? Is this the probability distribution of the steady-state queue length distribution? At other places the authors refer to these quantities as Maximum Likelihood estimators; how do they know they have found the MLE? There are no derivations in the manuscript to support this. The authors should specify the likelihood and include in an appendix an explanation of why their estimation procedure is in fact maximizing this likelihood, preferably with evidence of the shape of the likelihood, and how fine the grid of values they tested is for their mean and variance, since this could influence the overall quality of the estimation procedure.

      We thank the reviewer for pointing out these aspects of the work that can be further clarified. We will specify the ranges for the choice of mean and variance parameters for inter-arrival times as well as the grid of values tested in the corresponding figure caption or in a separate supplementary table. We maximized the likelihood of observing the set of individual MOI estimates in a sampled population given steady-state queue length distributions (with these distributions based on the two-moment approximation method for different combinations of the mean and variance of inter-arrival times). We will add a section to either the Materials and Methods or the Appendix in our revised manuscript including an explicit formulation of the likelihood.

      We will add example figures on the shape of the likelihood to the Appendix. We will also test how choices of the grid of values influence the overall quality of the estimation procedure. Specifically, we will further refine the grid of values to include more points and examine whether the results of FOI inference are consistent and robust against each other.
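
      To make the grid-based likelihood maximization concrete, the following sketch evaluates the log-likelihood of a set of MOI values over a grid and returns the maximizing grid point. It does not implement the two-moment approximation of Choi et al. As a stand-in it uses the truncated Poisson steady-state distribution of a loss system with Poisson arrivals, which depends only on the mean inter-arrival time, so the example grid is one-dimensional rather than spanning both mean and variance. The function names, the mean duration of infection, and the grid are hypothetical.

```python
import math
import numpy as np

def truncated_poisson_log_pmf(offered_load, capacity):
    """Log steady-state queue-length pmf on 0..capacity for a loss system with
    Poisson arrivals (stand-in model, NOT the two-moment approximation)."""
    k = np.arange(capacity + 1)
    log_w = k * math.log(offered_load) - np.array([math.lgamma(i + 1) for i in k])
    return log_w - np.logaddexp.reduce(log_w)  # normalise in log space

def log_likelihood(moi, log_pmf):
    """Log-likelihood of independent MOI observations under a queue-length pmf."""
    return float(np.sum(log_pmf[np.asarray(moi, dtype=int)]))

def grid_mle(moi, mean_interarrival_grid, mean_duration, capacity):
    """Evaluate the log-likelihood at every grid point; return the maximiser
    and the full log-likelihood profile (useful for plotting its shape)."""
    ll = np.array([
        log_likelihood(moi, truncated_poisson_log_pmf(mean_duration / m, capacity))
        for m in mean_interarrival_grid
    ])
    return mean_interarrival_grid[int(np.argmax(ll))], ll
```

      The arrival rate is the reciprocal of the selected mean inter-arrival time. Returning the whole profile makes it straightforward to inspect the shape of the likelihood and to check whether refining the grid moves the maximizer.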

      (2) Limitation of FOI estimation procedure.

      a. The authors discuss the importance of the duration of infection to this problem. While I agree that empirically estimating this is not possible, there are other options besides assuming that all 1-5-year-olds have the same duration of infection distribution as naïve adults co-infected with syphilis. E.g. it would be useful to test a wide range of assumed infection durations and assess their impact on the estimation procedure. Furthermore, if the authors are going to stick to the described method for duration of infection, the potentially limited generalizability of this method needs to be further highlighted in both the introduction and the discussion. In particular, for an estimated mean FOI of about 5 per host per year in the pre-IRS season as estimated in Ghana (Figure 3), it seems that this would not translate to 4-year-olds being immune naïve, and certainly this would not necessarily generalize well to a school-aged child population or an adult population.

      The reviewer is indeed correct about the difficulty of empirically measuring the duration of infection for 1-5-year-olds, and that of further testing whether these 1-5-year-olds exhibit the same distribution for duration of infection as naïve adults co-infected with syphilis. We will nevertheless continue to use the described method for duration of infection, while better acknowledging and discussing the limitations this aspect of the method introduces. We note that the infection duration from the historical clinical data we have relied on is used in the malaria modeling community as one of the credible sources for this parameter of untreated natural infections in malaria-naïve individuals in malaria-endemic settings of Africa (e.g. in the agent-based model OpenMalaria, see 1).

      It is important to emphasize that the proposed methods apply to the MOI estimates for naïve or close to naïve patients. They are not suitable for FOI inference for the school-aged children and the adult populations of high-transmission endemic regions, since individuals in these age classes have been infected many times and their duration of infection is significantly shortened by their immunity. To reduce the degree of misspecification in infection duration and take full advantage of our proposed methods, we will emphasize in the revision the need to prioritize in future data collection and sampling efforts the subpopulation class who has received either no infection or a minimum number of infections in the past, and whose immune profile is close to that of naïve adults, for example, infants. This emphasis is aligned with the top priority of all intervention efforts in the short term, which is to monitor and protect the most vulnerable individuals from severe clinical symptoms and death.

      Also, force of infection for naïve hosts is a key basic parameter for epidemiological models of a complex infectious disease such as falciparum malaria, whether for agent-based formulations or equation-based ones. This is because force of infection for non-naïve hosts is typically a function of their immune status and the force of infection of naïve hosts. Thus, knowing the force of infection of naïve hosts can help parameterize and validate these models by reducing degrees of freedom.

      b. The evaluation of the capacity parameter c seems to be quite important and is set at 30, however, the authors only describe trying values of 25 and 30, and claim that this does not impact FOI inference, however it is not clear that this is the case. What happens if the carrying capacity is increased substantially? Alternatively, this would be more convincing if the authors provided a mathematical explanation of why the carrying capacity increase will not influence the FOI inference, but absent that, this should be mentioned and discussed as a limitation. 

      Thank you for this question. We will investigate more values of the parameter c systematically, including substantially higher ones. We note however that this quantity is the carrying capacity of the queuing system, or the maximum number of blood-stage strains that an individual human host can be co-infected with. We do have empirical evidence for the value of the latter being around 20 (2). This observed value provides a lower bound for parameter c. To account for potential under-sampling of strains, we thus tried values of 25 and 30 in the first version of our manuscript.

      In general, this parameter influences the steady-state queue length distribution based on the two-moment approximation, more specifically, the tail of this distribution when the flow of customers/infections is high. Smaller values of parameter c put a lower cap on the maximum value possible for the queue length distribution. The system is more easily “overflowed”, in which case customers (or infections) often find that there is no space available in the queuing system/individual host upon their arrival. These customers (or infections) will not increment the queue length. The parameter c therefore has only a small impact on the part of the grid resulting in low flows of customers/infections, for which the system is unlikely to be overflowed. The empirical MOI distribution centers around 4 or 5 with most values well below 10, and only a small fraction of higher values between 15 and 20 (2). When one increases the value of c, the part of the grid generating very high flows of customers/infections results in queue length distributions with a heavy tail around large MOI values that are not supported by the empirical distribution. We therefore do not expect that substantially higher values for parameter c would change either the relative shape of the likelihood or the MLE.
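
      The argument about the capacity parameter can be illustrated with a minimal sketch under a simplifying assumption that is not the authors' two-moment approximation: if infections arrive as a Poisson process and are cleared independently, the host behaves as an M/G/c/c loss system, whose steady-state queue length is a Poisson distribution with mean equal to the offered load (arrival rate times mean duration), truncated at c. The function names and the offered loads of 5 and 40 are hypothetical and chosen only to contrast a low-flow and a high-flow part of the grid.

```python
import math


def queue_length_pmf(offered_load, capacity):
    """Steady-state pmf of the number in an M/G/c/c loss system.

    Under Poisson arrivals this is a Poisson(offered_load) distribution
    truncated at `capacity` (an assumption made for illustration only).
    """
    # Work in log space so that large loads or capacities do not overflow.
    log_weights = [
        n * math.log(offered_load) - math.lgamma(n + 1)
        for n in range(capacity + 1)
    ]
    shift = max(log_weights)
    weights = [math.exp(w - shift) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def mean_queue_length(pmf):
    """Mean of a pmf given as a list indexed by queue length."""
    return sum(n * p for n, p in enumerate(pmf))


# Offered load = arrival rate x mean infection duration (hypothetical values).
for offered_load in (5.0, 40.0):
    for capacity in (25, 30, 60):
        pmf = queue_length_pmf(offered_load, capacity)
        print(offered_load, capacity, round(mean_queue_length(pmf), 3))
```

      In this simplified setting, changing c between 25 and 60 leaves the low-flow distribution essentially unchanged, whereas the high-flow distribution shifts toward large queue lengths as c grows. Whether the same holds for the two-moment approximation with non-Poisson arrivals would need to be checked separately.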

      Reviewer #2 (Public Review):

      Summary:

      The authors combine a clever use of historical clinical data on infection duration in immunologically naive individuals and queuing theory to infer the force of infection (FOI) from measured multiplicity of infection (MOI) in a sparsely sampled setting. They conduct extensive simulations using agent-based modeling to recapitulate realistic population dynamics and successfully apply their method to recover FOI from measured MOI. They then go on to apply their method to real-world data from Ghana before and after an indoor residual spraying campaign.

      Strengths:

      (1) The use of historical clinical data is very clever in this context. 

      (2) The simulations are very sophisticated with respect to trying to capture realistic population dynamics. 

      (3) The mathematical approach is simple and elegant, and thus easy to understand. 

      Weaknesses: 

      (1) The assumptions of the approach are quite strong and should be made more clear. While the historical clinical data is a unique resource, it would be useful to see how misspecification of the duration of infection distribution would impact the estimates. 

      We thank the reviewer for bringing up the limitation of our proposed methods due to their reliance on a known and fixed duration of infection from historical clinical data. Please see our response to reviewer 1 comment 2a.

      (2) Seeing as how the assumption of the duration of infection distribution is drawn from historical data and not informed by the data on hand, it does not substantially expand beyond MOI. The authors could address this by suggesting avenues for more refined estimates of infection duration. 

      We thank the reviewer for pointing out a potential improvement to the work. We acknowledge that FOI is inferred from MOI, and thus is dependent on the information contained in MOI. FOI reflects risk of infection, is associated with risk of clinical episodes, and can relate local variation in malaria burden to transmission better than other proxy parameters for transmission intensity. It is possible that MOI can be as informative as FOI when one regresses the risk of clinical episodes and local variation in malaria burden with MOI. But MOI by definition is a number and not a rate parameter. FOI for naïve hosts is a key basic parameter for epidemiological models. This is because FOI of non-naïve hosts is typically a function of their immune status and the FOI of naïve hosts. Thus, knowing the FOI of naïve hosts can help parameterize and validate these models by reducing degrees of freedom. In this sense, we believe the transformation from MOI to FOI provides a useful step.

      Given the difficulty of measuring infection duration, estimating infection duration and FOI simultaneously appears to be an attractive alternative, as the referee pointed out. This will require however either cohort studies or more densely sampled cross-sectional surveys due to the heterogeneity in infection duration across a multiplicity of factors. These kinds of studies have not been, and will not be, widely available across geographical locations and time. This work aims to utilize more readily available data, in the form of sparsely sampled single-time-point cross-sectional surveys.

      (3) It is unclear in the example how their bootstrap imputation approach is accounting for measurement error due to antimalarial treatment. They supply two approaches. First, there is no effect on measurement, so the measured MOI is unaffected, which is likely false and I think the authors are in agreement. The second approach instead discards the measurement for malaria-treated individuals and imputes their MOI by drawing from the remaining distribution. This is an extremely strong assumption that the distribution of MOI of the treated is the same as the untreated, which seems unlikely simply out of treatment-seeking behavior. By imputing in this way, the authors will also deflate the variability of their estimates. 

      We thank the reviewer for pointing out aspects of the work that can be further clarified. It is difficult to disentangle the effect of drug treatment on measurement, including infection status, MOI, and duration of infection. Thus, we did not attempt to address this matter explicitly in the original version of our manuscript. Instead, we considered two extreme scenarios which bound reality, well summarized by the reviewer. First, if drug treatment has had no impact on measurement, the MOI of the drug-treated 1-5-year-olds would reflect their true underlying MOI. We can then use their MOI directly for FOI inference. Second, if the drug treatment had a significant impact on measurement, i.e., if it completely changed the infection status, MOI, and duration of infection of drug-treated 1-5-year-olds, we would need to either exclude those individuals’ MOI or impute their true underlying MOI. We chose to do the latter in the original version of the manuscript. If those 1-5-year-olds had not received drug treatment, they would have had MOI values similar to those of the non-treated 1-5-year-olds. We can then impute their MOI by sampling from the MOI estimates of non-treated 1-5-year-olds.

      The reviewer is correct in pointing out that this imputation does not add additional information and can potentially deflate the variability of MOI distributions, compared to simply excluding those drug-treated 1-5-year-olds from the analysis. Thus, we can include in our revision FOI estimates with the drug-treated 1-5-year-olds excluded in the estimation.

      - For similar reasons, their imputation of microscopy-negative individuals is also questionable, as it also assumes the same distributions of MOI for microscopy-positive and negative individuals. 

      We imputed the MOI values of microscopy-negative but PCR-positive 1-5-year-olds by sampling from the microscopy-positive 1-5-year-olds, effectively assuming that both have the same, or similar, MOI distributions. We did so because there is a weak relationship in our Ghana data between the parasitemia level of individual hosts and their MOI (or detected number of var genes, on the basis of which the MOI values themselves were estimated). Parasitemia levels underlie the difference in detection sensitivity of PCR and microscopy.

      We will elaborate on this matter in our revised manuscript and include information from our previous and on-going work on the weak relationship between MOI/the number of var genes detected within an individual host and their parasitemia levels. We will also discuss potential reasons or hypotheses for this pattern.
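
      The two ways of handling drug-treated children discussed in this response (imputing their MOI from non-treated children versus excluding them) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical and the MOI values are toy numbers, not data from Ghana.

```python
import random


def impute_moi(untreated_moi, n_treated, rng):
    """Draw one MOI per treated child, with replacement, from untreated values."""
    return [rng.choice(untreated_moi) for _ in range(n_treated)]


def build_samples(untreated_moi, n_treated, rng):
    """Return the MOI sample under imputation and under exclusion."""
    imputed = list(untreated_moi) + impute_moi(untreated_moi, n_treated, rng)
    excluded = list(untreated_moi)
    return imputed, excluded


rng = random.Random(0)                          # fixed seed for reproducibility
untreated = [1, 2, 2, 3, 4, 4, 5, 7, 9, 12]     # toy MOI values (hypothetical)
imputed_sample, excluded_sample = build_samples(untreated, n_treated=5, rng=rng)

print(len(imputed_sample), len(excluded_sample))
```

      The imputed sample is larger but contains no values beyond those already observed in non-treated children. Any interval computed as if all of its entries were independent observations would therefore tend to be too narrow, which is the concern raised by the reviewer.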

      Reviewer #3 (Public Review):

      Summary: 

      It has been proposed that the FOI is a method of using parasite genetics to determine changes in transmission in areas with high asymptomatic infection. The manuscript attempts to use queuing theory to convert multiplicity of infection estimates (MOI) into estimates of the force of infection (FOI), which they define as the number of genetically distinct blood-stage strains. They look to validate the method by applying it to simulated results from a previously published agent-based model. They then apply these queuing theory methods to previously published and analysed genetic data from Ghana. They then compare their results to previous estimates of FOI. 

      Strengths: 

      It would be great to be able to infer FOI from cross-sectional surveys which are easier and cheaper than current FOI estimates which require longitudinal studies. This work proposes a method to convert MOI to FOI for cross-sectional studies. They attempt to validate this process using a previously published agent-based model which helps us understand the complexity of parasite population genetics. 

      Weaknesses: 

      (1) I fear that the work could be easily over-interpreted as no true validation was done, as no field estimates of FOI (I think considered true validation) were measured. The authors have developed a method of estimating FOI from MOI which makes a number of biological and structural assumptions. I would not call being able to recreate model results that were generated using a model that makes its own (probably similar) defined set of biological and structural assumptions a validation of what is going on in the field. The authors claim this at times (for example, Line 153 ) and I feel it would be appropriate to differentiate this in the discussion. 

      We thank the reviewer for this comment, although we think there is a misunderstanding about what can and cannot be practically validated in the sense of a “true” measure of FOI that would be free from assumptions for a complex disease such as malaria. We would not want the results to be over-interpreted and will extend the discussion of what we have done to test the methods. We note that for the performance evaluation of statistical methods, the use of simulation output is quite common and often a necessary and important step. In some cases, the simulation output is generated by dynamical models, whereas in others, by purely descriptive ones. All these models make their own assumptions which are necessarily a simplification of reality. The stochastic agent-based model (ABM) of malaria transmission utilized in this work has been shown to reproduce several important patterns observed in empirical data from high-transmission regions, including aspects of strain diversity which are not represented in simpler models.

      It is not clear to us in what sense this ABM makes a set of biological and structural assumptions which are “probably similar” to those of the queuing methods we present. We agree that relying on models whose structural assumptions differ from those of a given method or model to be tested is the best approach. Our proposed methods for FOI inference based on queuing theory rely on the duration of infection distribution and the MOI distribution among sampled individuals, both of which can be direct outputs from the ABM. But these methods are agnostic on the specific mechanisms or biology underlying the regulation of duration and MOI.

      Another important point raised by this comment is what would be the “true” FOI value against which to validate our methods. Empirical MOI-FOI pairs for FOI measured directly by tracking cohort studies are still lacking. There are potential measurement errors for both MOI and FOI because the polymorphic markers typically used in different cohort studies cannot differentiate hyper-diverse antigenic strains fully and well (5). Also, these cohort studies usually start with drug treatment. Alternative approaches do not provide a measure of true FOI, in the sense of the estimation being free from assumptions. For example, one approach would be to fit epidemiological models to densely sampled/repeated cross-sectional surveys for FOI inference. In this case, no FOI is measured directly and further benchmarked against fitted FOI values. The evaluation of these models is typically based on how well they can capture other epidemiological quantities which are more easily sampled or measured, including prevalence or incidence. This is similar to what is done in this work. We selected the FOI values that maximize the likelihood of observing the given distribution of MOI estimates. Furthermore, we paired our estimated FOI value for the empirical data from Ghana with another independently measured quantity EIR (Entomological Inoculation Rate), typically used in the field as a measure of transmission intensity. We check whether the resulting FOI-EIR point is consistent with the existing set of FOI-EIR pairs and the relationship between these two quantities from previous studies. We acknowledge that as for model fitting approaches for FOI inference, our validation is also indirect for the field data.

      Prompted by the reviewer’s comment, we will discuss this matter in more detail in our revised manuscript, including clarifying further certain basic assumptions of our agent-based model, emphasizing the indirect nature of the validation with the field data and the existing constraints for such validation.
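
      The step of selecting the FOI value that maximizes the likelihood of the observed MOI estimates can be sketched as a grid search. This is a minimal sketch under stated assumptions, not the authors' procedure: it assumes Poisson arrivals, so that MOI follows a Poisson distribution with mean FOI times mean infection duration, truncated at the capacity c, rather than the two-moment approximation. The function names, the toy MOI values, the mean duration of 0.5 years, and the grid are all hypothetical.

```python
import math


def log_pmf(n, offered_load, capacity):
    """Log-probability of queue length n for a Poisson truncated at capacity."""
    log_weights = [
        k * math.log(offered_load) - math.lgamma(k + 1)
        for k in range(capacity + 1)
    ]
    shift = max(log_weights)
    log_norm = shift + math.log(sum(math.exp(w - shift) for w in log_weights))
    return log_weights[n] - log_norm


def log_likelihood(moi_values, foi, mean_duration, capacity):
    """Log-likelihood of independent MOI observations for a given FOI."""
    offered_load = foi * mean_duration
    return sum(log_pmf(n, offered_load, capacity) for n in moi_values)


def grid_mle(moi_values, foi_grid, mean_duration, capacity):
    """Return the grid value of FOI with the highest log-likelihood."""
    return max(
        foi_grid,
        key=lambda foi: log_likelihood(moi_values, foi, mean_duration, capacity),
    )


moi_values = [1, 2, 3, 4, 4, 5, 5, 6, 7, 8]     # toy MOI estimates (hypothetical)
foi_grid = [0.5 * k for k in range(1, 41)]      # 0.5 to 20 infections per year
best_foi = grid_mle(moi_values, foi_grid, mean_duration=0.5, capacity=30)
print(best_foi)
```

      The resolution of the grid bounds how close the selected value can be to the true maximizer, which is one reason the grid spacing matters when reporting such estimates.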

      (2) Another aspect of the paper is adding greater realism to the previous agent-based model, by including assumptions on missing data and under-sampling. This takes prominence in the figures and results section, but I would imagine is generally not as interesting to the less specialised reader. The apparent lack of impact of drug treatment on MOI is interesting and counterintuitive, though it is not really mentioned in the results or discussion sufficiently to allay my confusion. I would have been interested in understanding the relationship between MOI and FOI as generated by your queuing theory method and the model. It isn't clear to me why these more standard results are not presented, as I would imagine they are outputs of the model (though happy to stand corrected - it isn't entirely clear to me what the model is doing in this manuscript alone). 

      We thank the reviewer for this comment. We will add supplementary figures for the MOI distributions generated by the queuing theory method (i.e., the two-moment approximation method) and our agent-based model in our revised manuscript.

      In the first version of our manuscript, we considered two extreme scenarios which bound the reality, instead of simply assuming that drug treatment does not impact the infection status, MOI, and duration of infection. See our response to reviewer 2 point (3). The resulting FOI estimates differ but not substantially across the two extreme scenarios, partially because drug-treated individuals’ MOI distribution is similar to that of non-treated individuals (or the apparent lack of impact of drug treatment on MOI, as pointed out by the referee). We will consider potentially adding some formal test to quantify the difference between the two MOI distributions and how significant the difference is. We will discuss which of the two extreme scenarios reality is closer to, given the result of the formal test. We will also discuss in our revision possible reasons/hypotheses underlying the impact of drug treatment on MOI from the perspective of the nature, efficiency, and duration of the drugs administered.

      Regarding the last point of the reviewer, on understanding the relationship between MOI and FOI, we are not fully clear about what was meant. We are also confused about the statement on what the “model is doing in this manuscript alone”. We interpret the overall comment as the reviewer suggesting a better understanding of the relationship between MOI and FOI, either between their distributions, or the moments of their distributions, perhaps by fitting models including simple linear regression models. This approach is in principle possible, but it is not the focus of this work. It will be equally difficult to evaluate the performance of this alternative approach given the lack of MOI-FOI pairs from empirical settings with directly measured FOI values (from large cohort studies). Moreover, the qualitative relationship between the two quantities is intuitive. Higher FOI values should correspond to higher MOI values. Less variable FOI values should correspond to more narrow or concentrated MOI distributions, whereas more variable FOI values should correspond to more spread-out ones. We will discuss this matter in our revised manuscript.

      (3) I would suggest that outside of malaria geneticists, the force of infection is considered to be the entomological inoculation rate, not the number of genetically distinct blood-stage strains. I appreciate that FOI has been used to explain the latter before by others, though the authors could avoid confusion by stating this clearly throughout the manuscript. For example, the abstract says FOI is "the number of new infections acquired by an individual host over a given time interval" which suggests the former, please consider clarifying. 

      We thank the reviewer for this helpful comment as it is fundamental that there is no confusion on the basic definitions. EIR, the entomological inoculation rate, is closely related to the force of infection but is not equal to it. EIR focuses on the rate of arrival of infectious bites and is measured as such by focusing on the mosquito vectors that are infectious and arrive to bite a given host. Not all these bites result in actual infection of the human host. Epidemiological models of malaria transmission clearly make this distinction, as FOI is defined as the rate at which a host acquires infection. This definition comes from more general models for the population dynamics of infectious diseases. (For diseases simpler than malaria, with no super-infection, the typical SIR models define the force of infection as the rate at which a susceptible individual becomes infected.) For malaria, force of infection refers to the number of blood-stage new infections acquired by an individual host over a given time interval. This distinction between EIR and FOI is the reason why studies have investigated their relationship, with the nonlinearity of this relationship reflecting the complexity of the underlying biology and how host immunity influences the outcome of an infectious bite.

      We agree however with the referee that there could be some confusion in our definition resulting from the approach we use to estimate the MOI distribution (which provides the basis for estimating FOI). In particular, we rely on the non-existent to very low overlap of var repertoires among individuals with MOI=1, an empirical pattern we have documented extensively in previous work (see 2, 3, and 4). The method of varcoding and its Bayesian formulation rely on the assumption of negligible overlap. We note that other approaches for estimating MOI (and FOI) based on other polymorphic markers also make this assumption (reviewed in 5). Ultimately, the FOI we seek to estimate is the one defined as specified above and in both the abstract and introduction, consistent with the epidemiological literature. We will include clarification in the introduction and discussion of this point in the revision.
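
      For reference, the SIR definition of the force of infection mentioned in this response can be written out as below. This is the standard textbook form and assumes frequency-dependent transmission; the symbols (transmission rate \beta, recovery rate \gamma, population size N) are introduced here for illustration and do not appear in the response itself.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\lambda(t)\, S, \\
\frac{dI}{dt} &= \lambda(t)\, S - \gamma I, \\
\frac{dR}{dt} &= \gamma I, \\
\lambda(t) &= \beta\, \frac{I(t)}{N}.
\end{aligned}
```

      Here \lambda(t) is the per-capita rate at which a susceptible individual becomes infected. For malaria, super-infection means that already-infected hosts can also acquire new infections, so this simple form does not carry over directly.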

      (4) Line 319 says "Nevertheless, overall, our paired EIR (directly measured by the entomological team in Ghana (Tiedje et al., 2022)) and FOI values are reasonably consistent with the data points from previous studies, suggesting the robustness of our proposed methods". I would agree that the results are consistent, given that there is huge variation in Figure 4 despite the transformed scales, but I would not say this suggests a robustness of the method. 

      We will modify the relevant sentences to use “consistent” instead of “robust”.

      (5) The text is a little difficult to follow at times and sometimes requires multiple reads to understand. Greater precision is needed with the language in a few situations and some of the assumptions made in the modelling process are not referenced, making it unclear whether it is a true representation of the biology. 

      We thank the reviewer for this comment. As also mentioned in the response to reviewer 1’s comments, we will reorganize and rewrite parts of the text in our revision to improve clarity.

      References and Notes

      (1) Maire, N. et al. A model for natural immunity to asexual blood stages of Plasmodium falciparum malaria in endemic areas. Am J Trop Med Hyg., 75(2 Suppl):19-31 (2006).

      (2) Tiedje, K. E. et al. Measuring changes in Plasmodium falciparum census population size in response to sequential malaria control interventions. eLife, 12 (2023).

      (3) Day, K. P. et al. Evidence of strain structure in Plasmodium falciparum var gene repertoires in children from Gabon, West Africa. Proc. Natl. Acad. Sci. U.S.A., 114(20), 4103-4111 (2017).

      (4) Ruybal-Pesántez, S. et al. Population genomics of virulence genes of Plasmodium falciparum in clinical isolates from Uganda. Sci. Rep., 7(11810) (2017).

      (5) Labbé, F. et al. Neutral vs. non-neutral genetic footprints of Plasmodium falciparum multiclonal infections. PLoS Comput Biol 19(1) (2023).

    2. eLife assessment

      The ability to estimate the force of infection for Plasmodium falciparum from other more directly measurable epidemiological quantities is a useful contribution to malaria epidemiology. The authors propose a method to accomplish this using genetic data from the var genes of the Pf genome and novel applications of existing methods from queueing theory. While the simulations are sophisticated, the real-world application of the method is incomplete in its analysis and would benefit from clearer articulation of the assumptions being made. Given the lack of clarity in the methods and presentation of results, it is difficult to fully assess the performance of their proposed estimation procedure.

    3. Reviewer #1 (Public Review):

      Summary:

      In their paper, Zhan et al. have used Pf genetic data from simulated data and Ghanaian field samples to elucidate a relationship between multiplicity of infection (MOI) (the number of distinct parasite clones in a single host infection) and force of infection (FOI). Specifically, they use sequencing data from the var genes of Pf along with Bayesian modeling to estimate the MOI of individual infections and use these values along with methods from queueing theory that rely on various assumptions to estimate FOI. They compare these estimates to known FOIs in a simulated scenario and describe the relationship between these estimated FOI values and another commonly used metric of transmission, EIR (entomological inoculation rate).

      This approach does fill an important gap in malaria epidemiology, namely estimating the force of infection, which is currently complicated by several factors including superinfection, unknown duration of infection, and highly genetically diverse parasite populations. The authors use a new approach borrowing from other fields of statistics and modeling and make extensive efforts to evaluate their approach under a range of realistic sampling scenarios. However, the write-up would greatly benefit from added clarity both in the description of methods and in the presentation of the results. Without these clarifications, rigorously evaluating whether the authors' proposed method of estimating FOI is sound remains difficult. Additionally, there are several limitations that call into question the stated generalizability of this method that should at minimum be further discussed by authors and in some cases require a more thorough evaluation.

      Major comments:

      (1) Description and evaluation of FOI estimation procedure.

      a. The methods section describing the two-moment approximation and accompanying appendix is lacking several important details. Equations on lines 891 and 892 are only a small part of the equations in Choi et al. and do not adequately describe the procedure. Notably, several quantities in those equations are never defined, and some of them are important for understanding the method (e.g. A and S as the main random variables for inter-arrival times and service times, and aR and bR, which are the known time-average quantities; these also rely on the squared coefficient of variation of the random variable, which is also never introduced in the paper). Without going back to the Choi paper to understand these quantities and the assumptions of this method, it was not possible to follow how this works in the paper. At a minimum, all variables used in the equations should be clearly defined.

      b. Additionally, the description in the main text of how the queueing procedure can be used to describe malaria infections would benefit from a diagram; as currently written, it is very difficult to follow.

      c. Just observing the box plots of mean and 95% CI on a plot with the FOI estimate (Figures 1, 2, and 10-14) is not sufficient to adequately assess the performance of this estimator. First, it is not clear whether the authors are displaying the bootstrapped 95% CIs or whether they are just showing the distribution of the mean FOI taken over multiple simulations, and then it seems that they are also estimating mean FOI per host on an annual basis. Showing a distribution of those per-host estimates would also be helpful. Second, a more quantitative assessment of the ability of the estimator to recover the truth across simulations (e.g. the proportion of simulations where the truth is captured in the 95% CI or something like this) is important. In many cases it seems that the estimator is always underestimating the true FOI and may not even contain the true value in the FOI distribution (e.g. Figure 10, Figure 1 under the mid-IRS panel), but it is not possible to conclude one way or the other based on this visualization. This is a major issue since it calls into question whether there is in fact data to support that these methods give good and consistent FOI estimates.

      d. Furthermore, the authors state in the methods that the choice of mean and variance (and thus second moment) parameters for inter-arrival times is varied widely; however, it is not clear what those ranges are. There needs to be a clear table or figure caption showing what combinations of values were tested and which results are produced from them. This is an essential component of the method, and it is impossible to fully evaluate its performance without this information. This relates to the issue of selecting the mean and variance values that maximize the likelihood of observing a given distribution of MOI estimates, which is very unclear since no likelihoods have been written down in the methods section of the main text. Which likelihood are the authors referring to? Is this the probability distribution of the steady-state queue length? At other places the authors refer to these quantities as Maximum Likelihood estimators; how do they know they have found the MLE? There are no derivations in the manuscript to support this. The authors should specify the likelihood and include in an appendix an explanation of why their estimation procedure is in fact maximizing this likelihood, preferably with evidence of the shape of the likelihood, and how fine the grid of values they tested is for their mean and variance, since this could influence the overall quality of the estimation procedure.

      (2) Limitation of FOI estimation procedure.

      a. The authors discuss the importance of the duration of infection to this problem. While I agree that empirically estimating this is not possible, there are other options besides assuming that all 1-5-year-olds have the same duration of infection distribution as naïve adults co-infected with syphilis. E.g. it would be useful to test a wide range of assumed infection duration and assess their impact on the estimation procedure. Furthermore, if the authors are going to stick to the described method for duration of infection, the potentially limited generalizability of this method needs to be further highlighted in both the introduction, and the discussion. In particular, for an estimated mean FOI of about 5 per host per year in the pre-IRS season as estimated in Ghana (Figure 3) it seems that this would not translate to 4-year-old being immune naïve, and certainly this would not necessarily generalize well to a school-aged child population or an adult population.

      b. The evaluation of the capacity parameter c seems to be quite important and is set at 30, however, the authors only describe trying values of 25 and 30, and claim that this does not impact FOI inference, however it is not clear that this is the case. What happens if the carrying capacity is increased substantially? Alternatively, this would be more convincing if the authors provided a mathematical explanation of why the carrying capacity increase will not influence the FOI inference, but absent that, this should be mentioned and discussed as a limitation.
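
      The quantitative assessment suggested in comment 1c (the proportion of simulations in which the 95% CI captures the true FOI, together with the average bias) could be computed along the following lines. This is a minimal sketch with hypothetical function names and toy numbers; it is not taken from the manuscript.

```python
def interval_coverage(true_values, intervals):
    """Fraction of simulations whose interval contains the true value."""
    hits = sum(
        1
        for truth, (low, high) in zip(true_values, intervals)
        if low <= truth <= high
    )
    return hits / len(true_values)


def mean_bias(true_values, estimates):
    """Average of (estimate - truth); negative values mean underestimation."""
    differences = [est - truth for truth, est in zip(true_values, estimates)]
    return sum(differences) / len(differences)


# Toy example: four simulations sharing the same true FOI (hypothetical values).
true_foi = [5.0, 5.0, 5.0, 5.0]
point_estimates = [4.5, 4.0, 5.0, 6.0]
ci_95 = [(4.0, 6.0), (3.0, 4.5), (4.8, 5.2), (5.5, 7.0)]

print(interval_coverage(true_foi, ci_95))
print(mean_bias(true_foi, point_estimates))
```

      Reporting these two numbers per simulated scenario would show directly whether the estimator systematically underestimates the truth, something the box plots alone cannot establish.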

    4. Reviewer #2 (Public Review):

      Summary:

      The authors combine a clever use of historical clinical data on infection duration in immunologically naive individuals and queuing theory to infer the force of infection (FOI) from measured multiplicity of infection (MOI) in a sparsely sampled setting. They conduct extensive simulations using agent-based modeling to recapitulate realistic population dynamics and successfully apply their method to recover FOI from measured MOI. They then go on to apply their method to real-world data from Ghana before and after an indoor residual spraying campaign.

      Strengths:

      (1) The use of historical clinical data is very clever in this context.

      (2) The simulations are very sophisticated with respect to trying to capture realistic population dynamics.

      (3) The mathematical approach is simple and elegant, and thus easy to understand.

      Weaknesses:

      (1) The assumptions of the approach are quite strong and should be made more clear. While the historical clinical data is a unique resource, it would be useful to see how misspecification of the duration of infection distribution would impact the estimates.

      (2) Seeing as how the assumption of the duration of infection distribution is drawn from historical data and not informed by the data on hand, it does not substantially expand beyond MOI. The authors could address this by suggesting avenues for more refined estimates of infection duration.

      (3) It is unclear in the example how their bootstrap imputation approach is accounting for measurement error due to antimalarial treatment. They supply two approaches. First, there is no effect on measurement, so the measured MOI is unaffected, which is likely false and I think the authors are in agreement. The second approach instead discards the measurement for malaria-treated individuals and imputes their MOI by drawing from the remaining distribution. This is an extremely strong assumption that the distribution of MOI of the treated is the same as the untreated, which seems unlikely simply out of treatment-seeking behavior. By imputing in this way, the authors will also deflate the variability of their estimates.

      - For similar reasons, their imputation of microscopy-negative individuals is also questionable, as it also assumes the same distributions of MOI for microscopy-positive and negative individuals.
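
      The concern in point (3) can be made concrete with a toy sketch. It is not the authors' bootstrap procedure: the function name, the data, and the premise that treated individuals truly carry fewer strains are all assumptions made for illustration. The sketch replaces the MOI of treated individuals with draws from the untreated distribution and compares the resulting mean with the true one.

```python
import random
from statistics import mean


def impute_from_observed(moi, is_missing, rng):
    """Replace flagged entries with random draws from the unflagged entries."""
    observed = [m for m, miss in zip(moi, is_missing) if not miss]
    return [rng.choice(observed) if miss else m
            for m, miss in zip(moi, is_missing)]


def demo():
    """Return (true mean MOI, mean MOI after imputation) for a toy population."""
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    # Hypothetical population: six untreated hosts, four treated hosts whose
    # true MOI is assumed to be lower than that of the untreated hosts.
    true_moi = [3, 4, 5, 3, 4, 5, 1, 1, 1, 1]
    treated = [False] * 6 + [True] * 4
    imputed = impute_from_observed(true_moi, treated, rng)
    return mean(true_moi), mean(imputed)


if __name__ == "__main__":
    true_mean, imputed_mean = demo()
    print("true mean MOI:", true_mean)
    print("mean MOI after imputation:", imputed_mean)
```

      Because every imputed value is drawn from the untreated group, the imputed mean cannot fall below that group's minimum, so any real difference between treated and untreated hosts is erased rather than estimated.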

    5. Reviewer #3 (Public Review):

      Summary:

      It has been proposed that the FOI is a method of using parasite genetics to determine changes in transmission in areas with high asymptomatic infection. The manuscript attempts to use queuing theory to convert multiplicity of infection estimates (MOI) into estimates of the force of infection (FOI), which they define as the number of genetically distinct blood-stage strains. They look to validate the method by applying it to simulated results from a previously published agent-based model. They then apply these queuing theory methods to previously published and analysed genetic data from Ghana. They then compare their results to previous estimates of FOI.

      Strengths:

      It would be great to be able to infer FOI from cross-sectional surveys which are easier and cheaper than current FOI estimates which require longitudinal studies. This work proposes a method to convert MOI to FOI for cross-sectional studies. They attempt to validate this process using a previously published agent-based model which helps us understand the complexity of parasite population genetics.

      Weaknesses:

      (1) I fear that the work could be easily over-interpreted as no true validation was done, as no field estimates of FOI (which I would consider true validation) were measured. The authors have developed a method of estimating FOI from MOI which makes a number of biological and structural assumptions. I would not call being able to recreate model results that were generated using a model that makes its own (probably similar) defined set of biological and structural assumptions a validation of what is going on in the field. The authors claim this at times (for example, Line 153) and I feel it would be appropriate to differentiate this in the discussion.

      (2) Another aspect of the paper is adding greater realism to the previous agent-based model, by including assumptions on missing data and under-sampling. This takes prominence in the figures and results section, but I would imagine is generally not as interesting to the less specialised reader. The apparent lack of impact of drug treatment on MOI is interesting and counterintuitive, though it is not really mentioned in the results or discussion sufficiently to allay my confusion. I would have been interested in understanding the relationship between MOI and FOI as generated by your queuing theory method and the model. It isn't clear to me why these more standard results are not presented, as I would imagine they are outputs of the model (though happy to stand corrected - it isn't entirely clear to me what the model is doing in this manuscript alone).

      (3) I would suggest that outside of malaria geneticists, the force of infection is considered to be the entomological inoculation rate, not the number of genetically distinct blood-stage strains. I appreciate that FOI has been used to explain the latter before by others, though the authors could avoid confusion by stating this clearly throughout the manuscript. For example, the abstract says FOI is "the number of new infections acquired by an individual host over a given time interval" which suggests the former, please consider clarifying.

      (4) Line 319 says "Nevertheless, overall, our paired EIR (directly measured by the entomological team in Ghana (Tiedje et al., 2022)) and FOI values are reasonably consistent with the data points from previous studies, suggesting the robustness of our proposed methods". I would agree that the results are consistent, given that there is huge variation in Figure 4 despite the transformed scales, but I would not say this suggests a robustness of the method.

      (5) The text is a little difficult to follow at times and sometimes requires multiple reads to understand. Greater precision is needed with the language in a few situations and some of the assumptions made in the modelling process are not referenced, making it unclear whether it is a true representation of the biology.

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] The conclusions of the in vitro experiments using cultured hippocampal slices were well supported by the data, but aspects of the in vivo experiments and proteomic studies need additional clarification.

      (1) In contrast to the in vitro experiments in which a γ-secretase inhibitor was used to exclude possible effects of Aβ, this possibility was not examined in in-vivo experiments assessing synapse loss and function (Figure 3) and cognitive function (Figure 4). The absence of plaque formation (Figure 4B) is not sufficient to exclude the possibility that Aβ is involved. The potential involvement of Aβ is an important consideration given the 4-month duration of protein expression in the in vivo studies.

      Response: We thank the reviewer for raising this question. While our current data did not exclude the potential involvement of Aβ-induced toxicity in the synaptic and cognitive dysfunction observed in mice overexpressing β-CTF, addressing this directly remains challenging. Treatment with γ-secretase inhibitors could potentially shed light on this issue. However, γ-secretase inhibitors are known to cause brain dysfunction by themselves, likely because they block the γ-cleavage of other essential molecules, such as Notch[1, 2]. As a result, this approach is unlikely to provide a definitive answer, which also prevents us from pursuing it further in vivo. We hope the reviewer understands this limitation and agrees to a discussion of this issue in the revised manuscript instead.

      (2) The possibility that the results of the proteomic studies conducted in primary cultured hippocampal neurons depend in part on Aβ was also not taken into consideration.

      Response: We thank the reviewer for raising this interesting question. In the revised manuscript, we plan to address this experimentally by using a γ-secretase inhibitor to investigate the potential contribution of Aβ in this study.

      Likely impact of the work on the field, and the utility of the methods and data to the community:

      The authors' use of sparse expression to examine the role of β-CTF on spine loss could be a useful general tool for examining synapses in brain tissue.

      Response: We thank the reviewer for these comments. Indeed, it is a very robust assay and we would like to share this method with the scientific community as soon as possible.

      Additional context that might help readers interpret or understand the significance of the work:

      The discovery of BACE1 stimulated an international effort to develop BACE1 inhibitors to treat Alzheimer's disease. BACE1 inhibitors block the formation of β-CTF which, in turn, prevents the formation of Aβ and other fragments. Unfortunately, BACE1 inhibitors not only did not improve cognition in patients with Alzheimer's disease, they appeared to worsen it, suggesting that producing β-CTF actually facilitates learning and memory. Therefore, it seems unlikely that the disruptive effects of β-CTF on endosomes play a significant role in human disease. Insights from the authors that shed further light on this issue would be welcome.

      Response: We would like to express our gratitude to the reviewer for raising this interesting question. It remains puzzling why BACE1 inhibition has failed to yield benefits in AD patients, while amyloid clearance via Aβ antibodies has been shown to slow disease progression. One possible explanation is that pharmacological inhibition of BACE1 may not be as effective as genetic removal. Indeed, genetic depletion of BACE1 leads to the clearance of existing amyloid plaques[3], whereas its pharmacological inhibition slows plaque growth and prevents the formation of new plaques but does not stop the growth of the existing ones[4]. We think the negative results of BACE1 inhibitors in clinical trials may not be sufficient to rule out the potential contribution of β-CTF to AD pathogenesis. Given that cognitive function continues to deteriorate rapidly in plaque-free patients after 1.5 years of treatment with Aβ antibodies in phase three clinical studies[5], it is important to consider the possible role of other Aβ-related fragments, such as β-CTF. We will include some further discussion in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors investigate the potential role of other cleavage products of amyloid precursor protein (APP) in neurodegeneration. They combine in vitro and in vivo experiments, revealing that β-CTF, a product cleaved by BACE1, promotes synaptic loss independently of Aβ. Furthermore, they suggest that β-CTF may interact with Rab5, leading to endosomal dysfunction and contributing to the loss of synaptic proteins.

      Response: We would like to thank the reviewer for his/her insightful suggestions. We have addressed the specific comments in the following sections.

      Weaknesses:

      Most experiments were conducted in vitro using overexpressed β-CTF. Additionally, the study does not elucidate the mechanisms by which β-CTF disrupts endosomal function and induces synaptic degeneration.

      Response: We would like to thank the reviewer for this insightful comment. While a significant portion of our experiments were conducted in vitro, the main findings were also confirmed in vivo (Figures 3 and 4). Repeating all the experiments in vivo would be challenging and may not be necessary. Regarding the use of overexpressed β-CTF, we acknowledge that this is a common issue in neurodegenerative disease studies. These diseases progress slowly over many years, sometimes even decades in patients. To model this progression in cell or mouse models within a time frame feasible for research, overexpression of certain proteins is often required. While not ideal, it is sometimes unavoidable. Since β-CTF levels are elevated in AD patients[6], its overexpression is a reasonable approach to investigate its potential effects.

      We did not further investigate the mechanisms by which β-CTF disrupted endosomal function because our preliminary results align with previous findings. Kim et al. demonstrated that β-CTF recruits APPL1 (a Rab5 effector) via the YENPTY motif to Rab5 endosomes, where it stabilizes active GTP-Rab5, leading to pathologically accelerated endocytosis, endosome swelling and selectively impaired transport of Rab5 endosomes[6]. In our manuscript, we observed that co-expression of Rab5S34N with β-CTF effectively mitigated β-CTF-induced spine loss in hippocampal slice cultures (Figures 6I-J), indicating that Rab5 overactivation-induced endosomal dysfunction contributed to β-CTF-induced spine loss, which was consistent with their conclusions.

      Reviewer #3 (Public Review):

      Summary:

      Most previous studies have focused on the contributions of Abeta and amyloid plaques in the neuronal degeneration associated with Alzheimer's disease, especially in the context of impaired synaptic transmission and plasticity which underlies the impaired cognitive functions, a hallmark in AD. But processes independent of Abeta and plaques are much less explored, and to some extent, the contributions of these processes are less well understood. Luo et al. addressed this important question with an array of approaches, and their findings generally support the contribution of a beta-CTF-dependent but non-Abeta-dependent process to the impaired synaptic properties in the neurons. Interestingly, the above process appears to operate in a cell-autonomous manner. This cell-autonomous effect of beta-CTF as reported here may facilitate our understanding of some potentially important cellular processes related to neurodegeneration. Although these findings are valuable, it is key to understand the probability of this process occurring in a more natural condition, such as when this process occurs in many neurons at the same time. This will put the authors' findings into a context for a better understanding of their contribution to either physiological or pathological processes, such as Alzheimer's. The experiments and results using the cell system are quite solid, but the in vivo results are incomplete and hence less convincing (see below). The mechanistic analysis is interesting but primitive and does not add much more weight to the significance. Hence, further efforts from the authors are required to clarify and solidify their results, in order to provide a complete picture and support for the authors' conclusions.

      Response: We would like to thank the reviewer for the constructive suggestions. We have addressed the specific comments in the following sections.

      Strengths:

      (1) The authors have addressed an interesting and potentially important question.

      (2) The analysis using the cell system is solid and provides strong support for the authors' major conclusions. This analysis has used various technical approaches to support the authors' conclusions from different aspects and most of these results are consistent with each other.

      Response: We would like to thank the reviewer for these comments.

      Weaknesses:

      (1) The relevance of the authors' major findings to the pathology, especially the Abeta-dependent processes is less clear, and hence the importance of these findings may be limited.

      Response: We would like to thank the reviewer for pointing this out. Phase 3 clinical trial data for Aβ antibodies show that cognitive function continues to decline rapidly, even in plaque-free patients, after 1.5 years of treatment[5]. This suggests that plaque-independent mechanisms may drive AD progression. Therefore, it is crucial to consider the potential contributions of other Aβ species or related fragments, such as alternative forms of Aβ and β-CTF. While it is too early to definitively predict how β-CTF contributes to AD progression, it is notable that β-CTF, rather than Aβ, induced synaptic deficits in mice, which recapitulates a key pathological feature of AD. Ultimately, the true role of β-CTF in AD pathogenesis can only be confirmed through clinical studies.

      (2) In vivo analysis is incomplete, with certain caveats in the experimental procedures and some of the results need to be further explored to confirm the findings.

      Response: We would like to thank the reviewer for this suggestion. We plan to correct these caveats in the revised manuscript.

      (3) The mechanistic analysis is rather primitive and does not add further significance.

      Response: We would like to thank the reviewer for this comment. We did not delve further into the underlying mechanisms because our analysis indicates that Rab5 dysfunction underlies β-CTF-induced endosomal dysfunction, which is consistent with another study and has been addressed in detail there[6]. We hope the reviewer could understand that our focus in this paper is on how β-CTF triggers synaptic deficits, which is why we did not investigate the mechanisms of β-CTF-induced endosomal dysfunction further.

      References:

      1. GÜNER G, LICHTENTHALER S F. The substrate repertoire of γ-secretase/presenilin [J]. Seminars in cell & developmental biology, 2020, 105: 27-42.
      2. DOODY R S, RAMAN R, FARLOW M, et al. A phase 3 trial of semagacestat for treatment of Alzheimer's disease [J]. The New England journal of medicine, 2013, 369(4): 341-50.
      3. HU X, DAS B, HOU H, et al. BACE1 deletion in the adult mouse reverses preformed amyloid deposition and improves cognitive functions [J]. The Journal of experimental medicine, 2018, 215(3): 927-40.
      4. PETERS F, SALIHOGLU H, RODRIGUES E, et al. BACE1 inhibition more effectively suppresses initiation than progression of β-amyloid pathology [J]. Acta Neuropathol, 2018, 135(5): 695-710.
      5. SIMS J R, ZIMMER J A, EVANS C D, et al. Donanemab in Early Symptomatic Alzheimer Disease: The TRAILBLAZER-ALZ 2 Randomized Clinical Trial [J]. JAMA, 2023, 330(6): 512-27.
      6. KIM S, SATO Y, MOHAN P S, et al. Evidence that the rab5 effector APPL1 mediates APP-βCTF-induced dysfunction of endosomes in Down syndrome and Alzheimer's disease [J]. Molecular psychiatry, 2016, 21(5): 707-16.
    2. eLife assessment

      This study presents a useful demonstration that a specific protein fragment may induce the loss of synapses in Alzheimer's disease. The evidence supporting the data is solid but incomplete and would benefit from additional experiments. The application of the findings is limited because blocking the formation of the protein fragment has not benefited patients in several clinical trials.

    3. Reviewer #1 (Public Review):

      Summary of what the authors were trying to achieve:

      In this manuscript, the authors investigated the role of β-CTF on synaptic function and memory. They report that β-CTF can trigger the loss of synapses in neurons that were transiently transfected in cultured hippocampal slices and that this synapse loss occurs independently of Aβ. They confirmed previous research (Kim et al, Molecular Psychiatry, 2016) that β-CTF-induced cellular toxicity occurs through a mechanism involving a hexapeptide domain (YENPTY) in β-CTF that induces endosomal dysfunction. Although the current study also explores the role of β-CTF in synaptic and memory function in the brain using mice chronically expressing β-CTF, the studies are inconclusive because potential effects of Aβ generated by γ-secretase cleavage of β-CTF were not considered. Based on their findings, the authors suggest developing therapies to treat Alzheimer's disease by targeting β-CTF, but did not address the lack of clinical improvement in trials of several different BACE1 inhibitors, which target β-CTF by preventing its formation.

      Major strengths and weaknesses of the methods and results:

      The conclusions of the in vitro experiments using cultured hippocampal slices were well supported by the data, but aspects of the in vivo experiments and proteomic studies need additional clarification.

      (1) In contrast to the in vitro experiments in which a γ-secretase inhibitor was used to exclude possible effects of Aβ, this possibility was not examined in in-vivo experiments assessing synapse loss and function (Figure 3) and cognitive function (Figure 4). The absence of plaque formation (Figure 4B) is not sufficient to exclude the possibility that Aβ is involved. The potential involvement of Aβ is an important consideration given the 4-month duration of protein expression in the in vivo studies.

      (2) The possibility that the results of the proteomic studies conducted in primary cultured hippocampal neurons depend in part on Aβ was also not taken into consideration.

      Likely impact of the work on the field, and the utility of the methods and data to the community:

      The authors' use of sparse expression to examine the role of β-CTF on spine loss could be a useful general tool for examining synapses in brain tissue.

      Additional context that might help readers interpret or understand the significance of the work:

      The discovery of BACE1 stimulated an international effort to develop BACE1 inhibitors to treat Alzheimer's disease. BACE1 inhibitors block the formation of β-CTF which, in turn, prevents the formation of Aβ and other fragments. Unfortunately, BACE1 inhibitors not only did not improve cognition in patients with Alzheimer's disease, they appeared to worsen it, suggesting that producing β-CTF actually facilitates learning and memory. Therefore, it seems unlikely that the disruptive effects of β-CTF on endosomes play a significant role in human disease. Insights from the authors that shed further light on this issue would be welcome.

    4. Reviewer #2 (Public Review):

      Summary:

      In this study, the authors investigate the potential role of other cleavage products of amyloid precursor protein (APP) in neurodegeneration. They combine in vitro and in vivo experiments, revealing that β-CTF, a product cleaved by BACE1, promotes synaptic loss independently of Aβ. Furthermore, they suggest that β-CTF may interact with Rab5, leading to endosomal dysfunction and contributing to the loss of synaptic proteins.

      Weaknesses:

      Most experiments were conducted in vitro using overexpressed β-CTF. Additionally, the study does not elucidate the mechanisms by which β-CTF disrupts endosomal function and induces synaptic degeneration.

    5. Reviewer #3 (Public Review):

      Summary:

      Most previous studies have focused on the contributions of Abeta and amyloid plaques in the neuronal degeneration associated with Alzheimer's disease, especially in the context of impaired synaptic transmission and plasticity which underlies the impaired cognitive functions, a hallmark in AD. But processes independent of Abeta and plaques are much less explored, and to some extent, the contributions of these processes are less well understood. Luo et al. addressed this important question with an array of approaches, and their findings generally support the contribution of a beta-CTF-dependent but non-Abeta-dependent process to the impaired synaptic properties in the neurons. Interestingly, the above process appears to operate in a cell-autonomous manner. This cell-autonomous effect of beta-CTF as reported here may facilitate our understanding of some potentially important cellular processes related to neurodegeneration. Although these findings are valuable, it is key to understand the probability of this process occurring in a more natural condition, such as when this process occurs in many neurons at the same time. This will put the authors' findings into a context for a better understanding of their contribution to either physiological or pathological processes, such as Alzheimer's. The experiments and results using the cell system are quite solid, but the in vivo results are incomplete and hence less convincing (see below). The mechanistic analysis is interesting but primitive and does not add much more weight to the significance. Hence, further efforts from the authors are required to clarify and solidify their results, in order to provide a complete picture and support for the authors' conclusions.

      Strengths:

      (1) The authors have addressed an interesting and potentially important question.

      (2) The analysis using the cell system is solid and provides strong support for the authors' major conclusions. This analysis has used various technical approaches to support the authors' conclusions from different aspects and most of these results are consistent with each other.

      Weaknesses:

      (1) The relevance of the authors' major findings to the pathology, especially the Abeta-dependent processes is less clear, and hence the importance of these findings may be limited.

      (2) In vivo analysis is incomplete, with certain caveats in the experimental procedures and some of the results need to be further explored to confirm the findings.

      (3) The mechanistic analysis is rather primitive and does not add further significance.

    1. eLife assessment

      This manuscript describes the impact of modulating signaling by a key regulatory enzyme, Dual Leucine Zipper Kinase (DLK), on hippocampal neurons. The results are interesting and will be important for scientists interested in synapse formation, axon specification, and cell death. The methods and interpretation of the data are solid, but the study can be further strengthened with some additional studies and controls.

    2. Reviewer #1 (Public Review):

      Summary:

      In this work, Ritchie and colleagues explore functional consequences of neuronal over-expression or deletion of the MAP3K DLK that their labs and others have strongly implicated in axon degeneration, neuronal cell death, and axon regeneration. Their recent work in eLife (Li, 2021) showed that inducible over-expression of DLK (or the related LZK) induces neuronal death in the cerebellum. Here, they extend this work to show that inducible over-expression in Vglut1+ neurons also kills excitatory neurons in hippocampal CA1, but not CA3. They complement this very interesting finding with translatomics to quantify genes whose mRNAs are differentially translated in the context of DLK over-expression or knockout, the latter manipulation having little to no effect on the phenotypes measured. The authors note that several genes and pathways are differentially regulated according to whether DLK is over-expressed or knocked out. They note DLK-dependent changes in genes related to synaptic function and the cytoskeleton and ultimately relate this in cultured neurons to findings that DLK over-expression negatively impacts synapse number and changes microtubules and neurites, though with a less obvious correlation.

      Strengths:

      This work represents a conceptual advance in defining DLK-dependent changes in translation. Moreover, the finding that DLK may differentially impact neuronal death will become the basis for future studies exploring whether DLK contributes to differential neuronal susceptibility to death, which is a broadly important topic.

      Weaknesses:

      This seems like two works in parallel that the authors have not yet connected. First is that DLK affects the translation of an interesting set of genes, and second, that DLK(OE) kills some neurons, disrupts their synapses, and affects neurite growth in culture.

      Specific questions:

      (1) Is DLK effectively knocked out? The authors reference the floxed allele in their 2016 work (PMID: 27511108), however, the methods of this paper say that the mouse will be characterized in a future publication. Has this ever been published? The major concern is that here the authors show that Cre-mediated deletion results in a smaller molecular weight protein and the maintenance of mRNA levels.

      (2) Why does DLK(OE) not kill CA3 neurons? The phenomenon is clear but there is no link to gene expression changes. In fact, the highlighted transcript in this work, Stmn4, changes in a DLK-dependent manner in CA3.

      (3) Why are whole hippocampi analyzed to IP ribosome-associated mRNAs? The authors nicely show a differential effect of DLK on CA1 vs CA3, but then, at least according to their methods, lyse whole hippocampi to perform IP/sequencing. Their data are therefore a mix of cells where DLK does and does not change cell death. The key issue is whether DLK does/does not have an effect based on the expression changes it drives.

      (4) Is the subtle decrease in synapse number (Bassoon/Homer co-loc.) in the DLK(OE) simply a function of neurons (and their synapses, presumably) having died? At the P15 time point that the authors choose because cell death is minimal, there is still a ~25% reduction in CA1 thickness (Figure 2B), which is larger than the ~15% change in synapses (Figure 5H) they describe.

    3. Reviewer #2 (Public Review):

      This manuscript describes the impact of deleting or enhancing the expression of the neuronal-specific kinase DLK in glutamatergic hippocampal neurons using clever genetic strategies, which demonstrates that DLK deletion had minimal effects while overexpression resulted in neurodegeneration in vivo. To determine the molecular mechanisms underlying this effect, ribotag mice were used to determine changes in active translation which identified Jun and STMN4 as DLK-dependent genes that may contribute to this effect. Finally, experiments in cultured neurons were conducted to better understand the in vivo effects. These experiments demonstrated that DLK overexpression resulted in morphological and synaptic abnormalities.

      Strengths:

      This study provides interesting new insights into the role of DLK in the normal function of hippocampal neurons. Specifically, the study identifies:

      (1) CA1 vs CA3 hippocampal neurons have differing sensitivity to increased DLK signaling.

      (2) DLK-dependent signaling in these neurons is similar to but distinct from the downstream factors identified in other cell types, highlighted by the identification of STMN4 as a downstream signal.

      (3) DLK overexpression in hippocampal neurons results in signaling that is similar to that induced by neuronal injury.

      The study also provides confirmatory evidence that supports previously published work through orthogonal methods, which adds additional confidence to our understanding of DLK signaling in neurons. Taken together, this is a useful addition to our understanding of DLK function.

      Weaknesses:

      There are a few weaknesses that limit the impact of this manuscript, most of which are pointed out by the authors in the discussion. Namely:

      (1) It is difficult to distinguish whether the changes in the translatome identified by the authors are DLK-dependent transcriptional changes, DLK-dependent post-transcriptional changes or secondary gene expression changes that occur as a result of the neurodegeneration that occurs in vivo. Additional expression analysis at earlier time points could be one method to address this concern.

      (2) Related to the above, it is difficult to conclusively determine from the current data whether the changes in synaptic proteins observed in vivo are a secondary result of neuronal degeneration or a primary impact on synapse formation. The in vitro studies suggest this has the potential to be a primary effect, though the difference in experimental paradigm makes it impossible to determine whether the same mechanisms are present in vitro and in vivo.

      (3) The phenotype of DLK cKO mice is very subtle (consistent with previous reports) and while the outcome of increased DLK levels is interesting, the relevance to physiological DLK signaling is less clear. What does seem possible is that increased DLK may phenocopy other neuronal injuries but there are no real comparisons to directly address this in the manuscript. It would be helpful for the authors to provide this analysis as well as a table with all of the translational changes along with fold changes.

      (4) For the in vivo experiments, it is unclear whether multiple sections from each animal were quantified for each condition. More information here would be helpful, and it is important that any quantification includes multiple sections from each animal to account for natural variability.

    4. Reviewer #3 (Public Review):

      Dr Jin and colleagues revisit DLK and its established multifactorial roles in neuronal development, axonal injury, and neurodegeneration. The ambitious aim here is to understand the DLK-dependent gene network in the brain and, to pursue this, they explore the role of DLK in hippocampal glutamatergic neurons using conditional knockout and induced overexpression mice. They produce evidence that dorsal CA1 and dentate gyrus neurons are vulnerable to elevated expression of DLK, while CA3 neurons appear unaffected. Then they identify the DLK-dependent translatome, characterized by conserved molecular signatures and cell-type specificity. Their evidence suggests that increased DLK signaling is associated with possible STMN4 disruptions to microtubules, among other effects. They also produce evidence on cultured hippocampal neurons showing that expression levels of DLK are associated with changes in neurite outgrowth, axon specification, and synapse formation. They posit that downstream translational events related to DLK signaling in hippocampal glutamatergic neurons are a generalizable paradigm for understanding neurodegenerative diseases.

      Strengths

      This is an interesting paper based on a lot of work and a high number of diverse experiments that point to the pervasive roles of DLK in the development of select glutamatergic hippocampal neurons. One should applaud the authors for their work in constructing sophisticated molecular cre-lox tools and their expert Ribotag analysis, as well as technical skill and scholarly treatment of the literature. I am somewhat more skeptical of interpretations and conclusions on spatial anatomical selectivity without stereological approaches and also going directly from (extremely complex) Ribotag profiling patterns to relevance based on immunohistochemistry and no additional interventions to manipulate (e.g. by knocking down or blocking) their top Ribotag profile hits. Also, it seems to this reviewer that major developmental claims in the paper are based on gene translational profiling dependent on DLK expression, not DLK activation, despite some evidence in the paper that there is a correlation between the two. Therefore, observed patterns and correlations may or may not be physiologically or pathologically relevant. Generalizability to neurodegenerative diseases is an overreach not justified by the scope, approach, and findings of the paper.

      Weaknesses and Suggestions:

      The authors state that the rationale for the translatomic studies is "to gain molecular understanding of gene expression associated with DLK in glutamatergic neurons" and to characterize the "DLK-dependent molecular and cellular network". However, a problem with the experimental design is the selection of an anatomical region at a time point characterized by active neurodegeneration. It is therefore not straightforward to determine whether the differentially expressed genes or pathways are caused by DLK overexpression itself or are instead due to processes related to neurodegeneration. Indeed, the authors find enrichment of signals related to pathways involved in extracellular matrix organization, apoptosis, unfolded protein responses, the complement cascade, DNA damage responses, and depletion of signals related to mitochondrial electron transport, etc., all of which could be the consequence of neurodegeneration regardless of cause. A more appropriate design to discover DLK-dependent pathways might be to look at a region and/or a time point that is not confounded by neurodegeneration.

      In a related vein, the authors ask "if the differentially expressed genes associated with DLK(iOE) might show correlation to neuronal vulnerability" and, to answer this question, they select the set of differentially expressed genes after DLK overexpression and assess their expression patterns in various regions under normal conditions. It looks to me that this selection is already confounded by neurodegeneration which could be the cause for their downregulation. Therefore, such gene profiles may not be directly linked to neuronal vulnerability. A similar issue also relates to the conclusion that "...the enrichment of DLK-dependent translation of genes in CA1 suggests that the decreased expression of these genes may contribute to CA1 neuron vulnerability to elevated DLK".

      To understand the role and relevance of the DLK overexpression model, there should be a discussion of to what extent it corresponds to endogenous levels of DLK expression or DLK-MAPK pathway activation under baseline or pathological conditions.

      The authors posit that "dorsal CA1 neurons are vulnerable to elevated DLK expression, while neurons in CA3 appear largely resistant to DLK overexpression". This statement assumes that DLK expression levels start at a similar baseline among regions. Do the authors have any such data? Ideally, they should show whether DLK expression and p-c-Jun (as a marker of downstream DLK signaling) are the same or different across regions in both WT and overexpression mice. For example, what are the DLK/p-c-Jun expression levels in regions other than CA1 in Supplementary Figures 2-3 and how do they compare with each other? Normalization to baseline for each region does not allow such a comparison. Also, in Supplementary Figure 6, analyses and comparisons between regions are done at a time point when degeneration has already started. Ideally, these should be done at P10.

      Illustration of proposed selective changes in hippocampal sector volume needs to be very carefully prepared in view of the substantial claims on selective vulnerability. In 2A under P15 and especially P60, it is difficult to see the difference - this needs lower magnification and a lot of care that anteroposterior levels are identical because hippocampal sector anatomy and volumes of sectors vary from level to level. One wonders if the cortex shrinks, too. This is important.

      One cannot be sure that there is selective death of hippocampal sectors with DLK overexpression versus, say, rearrangement of hippocampal architecture. One may need stereological analysis, otherwise this substantial claim appears overinterpreted.

      Is the GFAP excess reflective of neuroinflammation? What do microglial markers show? The presence of neuroinflammation does not sit well with apoptosis. Speaking of which, TUNEL in one cell in Supplementary Figure 4E is not strong evidence of a more widespread apoptotic event in CA1.

      In several places in the paper (as illustrated in Figure 4B, Supplementary Figure 2B, etc.), individual cells appear to be treated as the unit of observation. The unit of biological observation in animal models is typically not a cell but an organism, for which averaged measures are generated. This is a significant methodological problem because it is not easy to sample neurons without involving stereological methods. With the approach taken here, there is a risk that significance may be overblown.

      Other Comments and Questions:

      Supplementary Figure 9: The authors state that data points are shown for individual ROIs - ideally, they should also show averages for biological replicates. Can the authors confirm that statistical analyses are based on biological replicates (mice) and not ROIs?
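
      To illustrate the distinction between ROIs and biological replicates raised here, the following is a minimal sketch of collapsing ROI-level measurements to one mean per animal before any statistical comparison. The animal identifiers, values, and the function name `per_animal_means` are hypothetical and are not taken from the manuscript; the sketch shows only the general idea, not the authors' analysis.

```python
from statistics import mean


def per_animal_means(measurements):
    """Collapse ROI-level values to a single mean per animal.

    measurements: iterable of (animal_id, value) pairs, one pair per ROI.
    Returns a dict mapping each animal_id to the mean of its ROI values.
    """
    by_animal = {}
    for animal_id, value in measurements:
        by_animal.setdefault(animal_id, []).append(value)
    return {animal_id: mean(values) for animal_id, values in by_animal.items()}


# Hypothetical ROI measurements: five ROIs drawn from three control mice.
control_rois = [
    ("c1", 10.0), ("c1", 12.0),
    ("c2", 9.0), ("c2", 11.0),
    ("c3", 10.0),
]

control_means = per_animal_means(control_rois)

# The sample size for statistics is the number of animals, not the number of ROIs.
n_control = len(control_means)
```

      In this sketch the five ROIs yield a sample size of three, so any downstream test would be run on the per-animal means rather than on the individual ROI values.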

      For in vitro experiments, what is the effect of DLK overexpression on neuronal viability and density? Could these variables confound effects on synaptogenesis/synapse maturation?

      Correlations between c-jun expression and phosphorylation are extremely important and need to be carefully and convincingly documented. I am a bit concerned about Supplementary Figure 6 images, especially 6B-CA1 (no difference between control and KO, too small images) and 6D (no p-c-Jun expression at all anywhere in the hippocampus at P15?).

    1. eLife assessment

      Data presented in this useful report suggest a potentially new model for chemotaxis regulation in the gram-negative bacterium P. putida. Data supporting interactions between CheA and the copper-binding protein CsoR reveal potential mechanisms for coordinating chemotaxis and copper resistance. There was, however, concern about the large number of CheA interactors identified in the initial screen and it was felt that the study was incomplete without a substantial number of additional experiments to test the model and bolster the authors' conclusions.

    2. Reviewer #1 (Public Review):

      This report contains two parts. In the first part, several experiments were carried out to show that CsoR binds to CheA, inhibits CheA phosphorylation, and impairs P. putida chemotaxis. The second part provides some evidence that CsoR is a copper-binding protein, binds to CheA in a copper-dependent manner, and regulates P. putida response to copper, a chemorepellent. Based on these results, a working model is proposed to describe how CsoR coordinates chemotaxis and resistance to copper in P. putida. While the second part of the study is relatively solid, there are some major concerns about the first part.

      Critiques:

      (1) The rigor from prior research is not clear. In addition to talking about other bacterial chemotaxis, the Introduction should briefly summarize previous work on P. putida chemotaxis and copper resistance.

      (2) The rationale for identifying those CheA-binding proteins is vague. CheA has been extensively studied and its functional domains (P1 to P5) have been well characterized. Compared to its counterparts from other bacteria, does P. putida CheA contain a unique motif or domain? Does CsoR bind to other bacterial CheAs or only to P. putida CheA?

      (3) Line 133-136, "Collectively, using pull-down, BTH, and BiFC assays, we identified 16 new CheA-interacting proteins in P. putida." It is surprising that so many proteins were identified but none of them were chemotaxis proteins, in particular those known to interact with CheA, such as CheW, CheY and CheZ, which raises a concern about the specificity of these methods. BTH and BiFC often give false-positive results and thus should be substantiated by other approaches such as co-IP, surface plasmon resonance (SPR), or isothermal titration calorimetry (ITC) along with mutagenesis studies.

      (4) Line 147-149, "Fig. 2a, five strains (WT+pcsoR, WT+pispG, WT+pnfuA, WT+pphaD, and WT+pPP_1644) displayed smaller colony than the control strain (WT+pVec), indicating a weaker chemotaxis ability in these five strains." If copper is a chemorepellent, these strains should swim away from high concentrations of copper; thus, the sizes of colonies couldn't be used to measure this response. In the cited reference (reference 29), bacterial response to phenol was measured using a response index (RI).

      (5) Figures 2 and 3 show both CsoR and PhaD bind to CheA and inhibit CheA autophosphorylation. Do these two proteins share any sequence or structural similarity? Does PhaD also bind to copper? Otherwise, it is difficult to understand these results.

      (6) Line 195-196, "CsoR/PhaD had no apparent influence on the phosphate transfer between CheA and CheY". CheA controls bacterial chemotaxis through CheY phosphorylation. If this is true, how do CsoR and PhaD affect chemotaxis?

      (7) Figure 3 shows that CsoR/PhaD bind to CheA through P1, P3, and P4. This result is intriguing. All CheA proteins contain these three domains. If this is true, CsoR/PhaD should bind to other bacterial CheAs too. That said, this experiment is premature and needs to be confirmed by other approaches.

      (8) Figure 5, does PhaD contain these three residues (C40, H65, and C69)? If not, how does PhaD inhibit CheA autophosphorylation and chemotactic response to copper?

      (9) Does deletion of csoR or cheA have any impact on P. putida resistance to high concentrations of copper?

    3. Reviewer #2 (Public Review):

      Summary:

      This manuscript focuses on the apparent involvement of a proposed copper-responsive regulator in the chemotactic response of Pseudomonas putida to Cu(II), a chemorepellent. Broadly, this area is of interest because it could provide insight into how soil microbes mitigate metal stress. Additionally, copper has some historical agricultural use as an antimicrobial, thus can accumulate in soil. The manuscript bases its conclusions on an in vitro screen to identify interacting partners of CheA, an essential kinase in the P. putida chemotaxis-signaling pathway. Much of the subsequent analysis focuses on a regulator of the CsoR/RcnR family (PP_2969).

      Weaknesses:

      The data presented in this work do not support the model (Figure 8). In particular, PP_2969 is linked to Ni/Co resistance, not Cu resistance. Further, it is not clear how the putative new interactions with CheA would be integrated into diverse responses to various chemoattractants/repellents. These two comments are justified below.

      PP_2969

      (1) The authors present a sequence alignment (Figure S5) that is the sole basis for their initial assignment of this ORF as a CsoR protein. There is a conservation of the primary coordinating ligands (highlighted with asterisks) known to be involved in Cu(I) binding to CsoR (ref 31). There are some key differences, though, in residues immediately adjacent to the conserved Cys (the preceding Ala, which is Tyr in the other sequences). The effect of these changes may be significant in a physiological context.

      (2) The gene immediately downstream of PP_2969 is homologous to E. coli RcnA, a demonstrated Ni/Co efflux protein, suggesting that PP_2969 may be Ni or Co responsive. Indeed PP_2970 has previously been reported as Ni/Co responsive (J. Bact 2009 doi:10.1128/JB.00465-09). The host cytosol plays a critical role in determining metal response, in addition to the protein, which can explain the divergence from the metal response expected from the alignment.

      (3) The previous JBact study also explains the lack of an effect (Figure 5b) of deleting PP_2969 on copper-efflux gene expression (copA-I, copA-II, and copB-II), as these are regulated by CueR, not PP_2969, consistent with the previous report. Deletion of a CsoR/RcnR family regulator will result in constitutive expression of the relevant efflux/detoxification gene, at a level generally equivalent to the de-repression observed in the presence of the signal.

      (4) Further, CsoR proteins are Cu(I) responsive so measuring Cu(II) binding affinity is not physiologically relevant (Figures 5a and S5b). The affinities of demonstrated CsoR proteins are 10⁻¹⁸ M and these values are determined by competition assay. The MTS assay and resulting affinities are not physiologically relevant.

      (5) The DNA-binding assays are carried out at protein concentrations well above physiological ranges (Figures 5c and d, and S5c, d). The weak binding will in part result from using DNA sequences upstream of the copA genes and not from PP_2970.

      CheA interactions

      (1) There is no consideration given to the likely physiological relevance of the new interacting partners for CheA.

      (2) How much CheA is present in the cell (copies) and how many copies of other proteins are present? How would specific responses involving individual interacting partners be possible with such a heterogeneous pool of putative CheA-complexes in a cell? For PP_2969, the affinity reported (Figure 5A) may lie at the upper end of the CsoR concentration range (for example, CueR in Salmonella is present at ~40 nM).

      (3) The two-hybrid system experiment uses a long growth time (60 h) before analysis. Even low LacZ activity levels will generate a blue colour, depending upon growth medium (see doi: 10.1016/0076-6879(91)04011-c). It is also not clear how Miller units can be accurately or precisely determined from a solid plate assay (the reference cited describes a protocol for liquid culture).
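
      To make concrete why Miller units presuppose liquid-culture measurements, the following is a minimal sketch of the widely used Miller formula for β-galactosidase activity. It assumes the standard form of the calculation; the function name `miller_units` and the example readings are hypothetical, and the protocol in the reference cited by the authors may differ in detail.

```python
def miller_units(od420, od550, od600, minutes, volume_ml):
    """Compute beta-galactosidase activity in Miller units.

    od420:     absorbance of the reaction at 420 nm (o-nitrophenol plus scatter)
    od550:     absorbance of the reaction at 550 nm (light scatter from cell debris)
    od600:     optical density of the liquid culture at the time of sampling
    minutes:   reaction time in minutes
    volume_ml: volume of culture used in the assay, in millilitres
    """
    if od600 <= 0 or minutes <= 0 or volume_ml <= 0:
        raise ValueError("od600, minutes, and volume_ml must be positive")
    # Standard Miller formula: scatter-corrected A420, normalized to
    # reaction time, culture volume, and cell density.
    return 1000.0 * (od420 - 1.75 * od550) / (minutes * volume_ml * od600)


# Hypothetical readings from a liquid culture.
example_units = miller_units(od420=0.9, od550=0.0, od600=0.5, minutes=10, volume_ml=0.1)
```

      Every input is a quantity measured on a defined volume of liquid culture over a defined reaction time, which is why it is unclear how equivalent values could be obtained from colonies on a solid plate.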

    1. eLife assessment

      This work describes for the first time the combined gene expression and chromatin structure at the genome level in isolated chondrocytes and classical (cranial) and non-classical (notochordal) osteoblasts. In a compelling analysis of RNA-Seq and ATAC data, the authors characterize the two osteoblast populations relative to their associated chondrocyte cells and further proceed with a convincing analysis of the crucial entpd5a gene regulatory elements by investigating their respective transcriptional activity and specificity in developing zebrafish.

    2. Reviewer #1 (Public Review):

      Summary:

      This work uses transgenic reporter lines to isolate entpd5a+ cells representing classical osteoblasts in the head and non-classical (osterix-) notochordal sheath cells. The authors also include entpd5a- cells, col2a1a+ cells to represent the closely associated cartilage cells. In a combination of ATAC and RNA-Seq analysis, the genome-wide transcriptomic and chromatin status of each cell population is characterized, validating their methodology and providing fundamental insights into the nature of each cell type, especially the less well-studied notochordal sheath cells. Using these data, the authors then turn to a thorough and convincing analysis of the regulatory regions that control the expression of the entpd5a gene in each cell population. Determination of transcriptional activities in developing zebrafish, again combined with ATAC data and expression data of putative regulators, results in a compelling and detailed picture of the regulatory mechanisms governing the expression of this crucial gene.

      Strengths:

      The major strength of this paper is the clever combination of RNA-Seq and ATAC analysis, further combined with functional transcriptional analysis of the regulatory elements of one crucial gene. This results in a very compelling story.

      Weaknesses:

      No major weaknesses were identified, except for all the follow-up experiments that one can think of, but that would be outside of the scope of this paper.

    3. Reviewer #2 (Public Review):

      Summary:

      Complementary to mammalian models, zebrafish has emerged as a powerful system to study vertebrate development and to serve as a go-to model for many human disorders. All vertebrates share the ancestral capacity to form a skeleton. Teleost fish models have been a key model to understand the foundations of skeletal development and plasticity, pairing with more classical work in amniotes such as the chicken and mouse. However, the genetic foundation of the diversity of skeletal programs in teleosts has been hampered by mapping similarities from amniotes back and not objectively establishing more ancestral states. This is most obvious in systematic, objective analysis of transcriptional regulation and tissue specification in differentiated skeletal tissues. Thus, the molecular events regulating bone-producing cells in teleosts have remained largely elusive. In this study, Petratou et al. leverage spatial experimental delineation of specific skeletal tissues -- that they term 'classical' vs 'non-classical' osteoblasts -- with associated cartilage of the endo/peri-chondrial skeleton and inter-segmental regions of the forming spine during development of the zebrafish, to delineate molecular specification of these cells by current chromatin and transcriptome analysis. The authors further show functional evidence of the utility of these datasets to identify functional enhancer regions delineating entpd5a expression in 'classical' or 'non-classical' osteoblast populations. By integration with paired RNA-seq, they delineate broad patterns of transcriptional regulation of these populations as well as specific details of regional regulation via predictive binding sites within ATACseq profiles. Overall the paper was very well written and provides an essential contribution to the field that will provide a foundation to promote modeling of skeletal development and disease in an evolutionary and developmentally informed manner.

      Strengths:

      Taken together, this study provides a comprehensive resource of ATAC-seq and RNA-seq data that will be very useful for a wide variety of researchers studying skeletal development and bone pathologies. The authors show specificity in the different skeletal lineages and show the utility of the broad datasets for defining regulatory control of gene regulation in these different lineages, providing a foundation for hypothesis testing of not only agents of skeletal change in evolution but also function of genes and variations of unknown significance as it pertains to disease modeling in zebrafish. The paper is excellently written, integrating a complex history and experimental analysis into a useful and coherent whole. The terminology of 'classical' and 'non-classical' will be useful for the community in discussing the biology of skeletal lineages and their regulation.

      Weaknesses:

      Two items arose that were not critical weaknesses but areas for extending the description of methods and integration into the existing data on the role of non-classical osteoblasts and establishment/canalization of this lineage of skeletal cells.

      (1) In reading the text it was unclear how specific the authors' experimental dissection of the head/trunk was in isolating different entpd5a osteoblast populations. Obviously, this was successful given the specificity in DEG of results; however, analysis of contaminating cells/lineages in each population would be useful - e.g. using specific marker genes to assess. The text uses terms such as 'specific to' and 'enriched in' without seemingly grounded meaning of the accuracy of these comments. Is it really specific - e.g. not seen in one or other dataset - or is there some experimental variation in this?

      (2) Further, it would be valuable to discuss NSC-specific genes such as calymmin (Peskin 2020), which has species- and lineage-specific regulation of non-classical osteoblasts, likely being a key mechanistic node for ratcheting centra-specific patterning of the spine in teleost fishes. What dynamics are observed in this gene in datasets between the different populations, especially when compared with paralogues - are there obvious cis-regulatory changes that correlate with the co-option of this gene in the early regulation of non-classical osteoblasts? The addition of this analysis/discussion would anchor discussions of the differential between different osteoblast lineages in the paper.

    4. Reviewer #3 (Public Review):

      Summary:

      This study characterizes classical and nonclassical osteoblasts as both types were analyzed independently (integrated ATAC-seq and RNAseq). It was found that gene expression in classical and nonclassical osteoblasts is not regulated in the same way. In classical osteoblasts, Dlx family factors seem to play an important role, while Hox family factors are involved in the regulation of spinal ossification by nonclassical osteoblasts. In the second part of the study, the authors focus on the promoter structure of entpd5a. Through the identification of enhancers, they reveal complex modes of regulation of the gene. The authors suggest candidate transcription factors that likely act on the identified enhancer elements. All the results taken together provide comprehensive new insights into the process of bone development, and point to spatio-temporally regulated promoter/enhancer interactions taking place at the entpd5a locus.

      Strengths:

      The authors have succeeded in justifying a sound and consistent buildup of their experiments, and meaningfully integrating the results into the design of each of their follow-up experiments. The data are solid, insightfully presented, and the conclusion valid. This makes this manuscript of great value and interest to those studying (fundamental) skeletal biology.

      Weaknesses:

      The study is solidly constructed, the manuscript is clearly written and the discussion is meaningful - I see no real weaknesses.

    1. eLife assessment

      In this manuscript, Chen et al. used cryo-ET and an in vitro reconstituted system to demonstrate that the autoinhibited form of LRRK2 can also assemble into filaments on the microtubule surface, with a new interface involving the N-terminal repeats that were disordered in the previous active-LRRK2 filament structure. The structure obtained in this study is the highest resolution of LRRK2 filaments done by subtomogram averaging, representing a major technical advance compared to the previous paper from the same group. This is an important study, especially considering the pharmacological implications of the effect of inhibitors of the protein. The strengths of the data are convincing, but the study would be considerably strengthened if the authors addressed several discrepancies relating to their earlier work, and explored the physiological significance of the new interfaces and the incomplete decoration of microtubules described here.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Chen et al. used cryo-ET and an in vitro reconstituted system to demonstrate that the autoinhibited form of LRRK2 can also assemble into filaments that wrap around the microtubule, although the filaments are typically shorter and less regular compared to the previously reported active-LRRK2 filaments. The structure revealed a new interface involving the N-terminal repeats that were disordered in the previous active-LRRK2 filament structure. The autoinhibited-LRRK2 filament also has different helical parameters compared to the active form.

      Strengths:

      The structure obtained in this study is the highest resolution of LRRK2 filaments done by subtomogram averaging, representing a major technical advance compared to the previous Cell paper from the same group. Overall, I think the data are well presented with beautiful graphic rendering, and valuable insights can be gained from this structural study.

      Weaknesses:

      (1) There are only three main figures, together with 9 supplemental figures. The authors may consider breaking the currently overwhelming Figures 1 and 3 into smaller figures and moving some of the supplemental figures to the main figure, e.g., Figure S7.

      (2) The key analysis of this manuscript is to compare the current structure with the previous active-LRRK2 filament structure. Currently, such a comparison is buried in Figure 3H. It should be part of Figure 1.

    3. Reviewer #2 (Public review):

      The authors of this paper have done much pioneering work to decipher and understand LRRK2 structure and function, to uncover the mechanism by which LRRK2 binds to microtubules, and to study the roles that this may play in biology. Their previous data demonstrated that LRRK2 in the active conformation (pathogenic mutation or Type I inhibitor complex) bound to microtubule filaments in an ordered helical arrangement. This they showed induced a "roadblock" in the microtubule impacting vesicular trafficking. The authors have postulated that this is a potentially serious flaw with Type 1 inhibitors and that companies should consider generating Type 2 inhibitors in which the LRRK2 is trapped in the inactive conformation. Indeed the authors have published much data that LRRK2 complexed to Type 2 inhibitors does not seem to associate with microtubules and cause roadblocks in parallel experiments to those undertaken with type 1 inhibitors published above.

      In the current study, the authors have undertaken an in vitro reconstitution of microtubule-bound filaments of LRRK2 in the inactive conformation, which surprisingly revealed that inactive LRRK2 can also interact with microtubules in its auto-inhibited state. The authors' data show that while the same interfaces are seen with both the active and inactive microtubule-bound forms of LRRK2, they identified a new interface that involves the WD40-ARM-ANK domains and that reportedly contributes to the ability of the inactive form of LRRK2 to bind to microtubule filaments. The structures of the inactive LRRK2 complexed to microtubules are of medium resolution and do not allow visualisation of side chains.

      This study is extremely well-written and the figures are incredibly clear and well-presented. The finding that LRRK2 in the inactive autoinhibited form can be associated with microtubules is an important observation that merits further investigation. This new observation makes an important contribution to the literature and builds upon the pioneering research that this team of researchers has contributed to the LRRK2 fields. However, in my opinion, there is still significant work that could be considered to further investigate this question and understand the physiological significance of this observation.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Chen et al. examines the structure of the inactive LRRK2 bound to microtubules using cryo-EM tomography. Mutations in this protein have been shown to be linked to Parkinson's Disease. It is already shown that the active-like conformation of LRRK2 binds to the MT lattice, but this investigation shows that full-length LRRK2 can oligomerize on MTs in its autoinhibited state with different helical parameters than were observed with the active-like state. The structural studies suggest that the autoinhibited state is less stable on MTs.

      Strengths:

      The protein of interest is very important biomedically and a novel conformation binding to microtubules is proposed.

      Weaknesses:

      (1) The structures are all low resolution.

      (2) There are no measurements of the affinity of the various LRRK2 molecules (with and without inhibitors) to microtubules. This should be addressed through a biochemical sedimentation assay.

    1. eLife assessment

      The authors provide convincing data that identify a novel, non-opioid biologic from human birth tissue products with anti-nociceptive properties in a preclinical mouse model of surgical pain. This important study highlights the potential use of naturally derived biologics from human birth tissues as safe and sustainable pain treatment options that do not possess the adverse side effects associated with opioids and synthetic pharmaceuticals. Whether these results will translate to the clinic remains to be seen, nevertheless, these preclinical findings are promising.

    2. Reviewer #1 (Public review):

      Summary:

      Opioids and related drugs are powerful analgesics that reduce suffering from pain. Unfortunately, their use often leads to addiction and there is an opioid-abuse epidemic that affects people worldwide. This study represents an ongoing effort to develop non-opioid analgesics for pain management. The findings point to an alternative approach to control post-surgical pain in lieu of opioid medications.

      Strengths:

      (1) The study responds to the urgent need for the development of non-opioid analgesics.

      (2) The study demonstrates the efficacy of Clarix Flo (FLO) and HC-HA/PTX3 from the human amniotic membrane (AM) in reducing pain in a mouse model without the adverse effects of opioids.

      (3) The study further explored the underlying mechanisms of how HC-HA/PTX3 produces its effects on neurons, suggesting the molecules/pathways involved in pain relief.

      (4) The potential use of naturally derived biologics from human birth tissues (AM) is safe and sustainable, compared to synthetic pharmaceuticals.

      (5) The study was conducted with scientific rigor, involving purification of active components, comparative analysis with multiple controls, and mechanistic explorations.

      Weaknesses:

      (1) It should be cautioned that while the preclinical findings are promising, these results still need to be translated into clinical settings that are complex and often unpredictable.

      (2) The study shows the efficacy of FLO and HC-HA/PTX3 in one preclinical model of post-surgical pain. The observed effect may be variable in other pain conditions.

    3. Reviewer #2 (Public review):

      Summary:

      This is an outstanding piece of work on the potential of FLO as a viable analgesic biologic for the treatment of postsurgical pain. The authors purified the HC-HA/PTX3 from FLO and demonstrated its potential as an effective non-opioid therapy for postsurgical pain. They further unraveled the mechanisms of action of the compound at cellular and molecular levels.

      Strengths:

      Prominent strengths include the incorporation of behavioral assessment, electrophysiological and imaging recordings, the use of knockout and knockdown animals, and the use of antagonist agents to verify biological effects. The integrated use of these techniques, combined with the hypothesis-driven approach and logical reasoning, provides compelling evidence and novel insight into the mechanisms of the significant findings of this work.

      Weaknesses:

      I did not find any significant weaknesses even with a critical mindset. The only minor suggestion is that the Results section may focus on the results from this study and minimize the discussions of background information.

    4. Reviewer #3 (Public review):

      Summary:

      Non-opioid analgesics derived from human amniotic membrane (AM) products represent a novel and unique approach to analgesia that may avoid the traditional harms associated with opioids. Here, the study investigators demonstrate that HC-HA/PTX3 is the primary bioactive component of the AM product FLO responsible for anti-nociception in mouse-model and in-vitro dorsal root ganglion (DRG) cell culture experiments. The mechanism is demonstrated to act via CD44, inducing an acute cytoskeleton rearrangement that inhibits Na+ and Ca++ currents through ion channels. Taken together, the studies reported in the manuscript provide supportive evidence clarifying the mechanisms and efficacy of HC-HA/PTX3 antinociception and analgesia.

      Strengths:

      Extensive experiments, including murine behavioral paw withdrawal latency and Catwalk test data, demonstrate analgesic properties. The breadth and depth of the experimental data clearly support the proposed mechanisms and antinociceptive properties.

      Weaknesses:

      A few changes to the text of the manuscript would be recommended but no major weaknesses were identified.

    1. eLife assessment

      Saijilafu et al. describe that MLCK and MLCP bidirectionally regulate NMII phosphorylation, ultimately impinging on axonal growth during regeneration in the central and peripheral nervous systems. However, the evidence is in most cases incomplete, since some key controls are missing, some major claims are too broad to be supported by the data, and some claims and evidence present internal contradictions. In sum, this knowledge is potentially useful for the field due to the relevance of identifying mechanisms that regulate axonal regeneration, provided that the inconsistent claims are better supported and properly discussed.

    2. Reviewer #1 (Public review):

      This paper examines the role of MLCK (myosin light chain kinase) and MLCP (myosin light chain phosphatase) in axon regeneration. Using loss-of-function approaches based on small molecule inhibitors and siRNA knockdown, the authors explore axon regeneration in cell culture and in animal models. Their evidence shows that MLCK activity facilitates axon extension/regeneration, while MLCP prevents it.

      Major concern:

      A global inconsistency in the conclusions of the authors is evident when trying to understand the role of NMII in axon growth and to understand the present results in light of previous reports by the authors and many others on the role of NMII in axon extension. The discussion of the matter fails to acknowledge a vast literature on how NMII activity is regulated. The authors study enzymes responsible for the phosphorylation and dephosphorylation of NMII, referring to something that is strongly proven elsewhere, that phosphorylation activates NMII and dephosphorylation deactivates it. The authors mention their own previous evidence using inhibitors of NMII ATPase activity (blebbistatin, Bleb for short) and inhibitors of a kinase that phosphorylates NMII (ROCK), highlighting that Bleb increases axon growth. Since Bleb inhibits the ATPase activity of NMII, it follows that NMII is in itself an inhibitor of axon growth, and hence when NMII is inhibited, the inhibition on axon growth is relieved, and axonal growth takes place (REF1). It is known that NMII exists in an inactive folded state, and Ser19 phosphorylation (by MLCK or ROCK) extends the protein, allowing NMII filament formation, ATPase activity, and force generation on actin filaments (REF2). From this, it is derived that if MLCK is inhibited, then there is no NMII phosphorylation, and hence no NMII activity, and, according to their previous work, this should promote axon growth. On the contrary, the authors show the opposite effect: in the absence of phospho-MLC, the authors show axon growth inhibition.

      Reporting evidence challenging previous conclusions is common business in scientific endeavors, but the problem with the current manuscript is that it fails to point to and appropriately discuss this contradiction. Instead, the authors refer to the fact that MLCK and Bleb inhibit NMII in different steps of the activation process. While this is true, this explanation does not solve the contradiction. There are many options to accommodate the information, but it is not the purpose of this revision to provide them. Since the manuscript is focused solely on phosphorylation states of MLC and axon extension, the claims are simply at odds with the current literature, and this important finding, if true, is not properly discussed.

      What follows is a discussion of the merits and limitations of different claims of the manuscript in light of the evidence presented.

      (1) Using western blot and immunohistochemical analyses, authors first show that MLCK expression is increased in DRG sensory neurons following peripheral axotomy, concomitant to an increase in MLC phosphorylation, suggesting a causal effect (Figure 1). The authors claim that it is common that axon growth-promoting genes are upregulated. It would have been interesting at this point to study in this scenario the regulation of MLCP, which is a main subject in this work, and expect its downregulation.

      (2) Using DRG cultures and sciatic nerve crush in the context of MLCK inhibition and down-regulation, authors conclude that MLCK activity is required for mammalian peripheral axon regeneration both in vitro and in vivo (Figure 2).

      The in vitro evidence is of standard methods and convincing. However, here, as well as in all other experiments using siRNAs, it is not clear what the control is about (the identity of the plasmids and sequences, if any).

      Related to this, it is not helpful to show the same exact picture as a control example in Figures 2 and 3 (panels J and E, respectively). Either because they should not have received the same control treatment, or simply because it raises concern that there are no other control examples worth showing. In these images, it is also not clear where and how the crush site is determined in the GFP channel. This is of major importance since the axonal length is measured from the presumed crush site. Apart from providing further details in the text, the authors should include convincing images.

      (3) The authors then examined the role of the phosphatase MLCP in axon growth during regeneration. The authors first use a known MLCP blocker, phorbol 12,13-dibutyrate (PDBu), to show that it is able to increase the levels of p-MLC, with a concomitant increase in the extent of axon regrowth of DRG neurons, in both permissive and non-permissive conditions. The authors repeat the experiments using the knockdown of MYPT1, a key component of the MLC-phosphatase, and again can observe a growth-promoting effect (Figure 3).

      The authors further show evidence for the growth-enhancing effect in vivo, in nerve crush experiments. The evidence in vivo deserves more evidence and experimental details (see comment 2). Some key weaknesses of the data were mentioned previously (unclear RNAi controls and duplication of shown images), but in this case, it is also not clear if there is a change only in the extent of growth, or also in the number of axons that are able to regenerate.

      (4) In the next set of experiments (presented in Figure 4) authors extend the previous observations in primary cultures from the CNS. For that, they use cortical and hippocampal cultures, and pharmacological and genetic loss-of-function using the above-mentioned strategies. The expected results were obtained in both CNS neurons: inhibition or knockdown of the kinase decreases axon growth, whereas inhibition or knockdown of the phosphatase increases growth. A main weakness in this set is that it is not indicated when (at what day in vitro, DIV) the treatments are performed. This is important to correctly interpret the results, since in the first days in vitro these neurons follow well-characterized stages of development, with characteristic cellular events with relevance to what is being evaluated. Importantly, this would be of value to understand whether the treatments affect axonal specification and/or axonal extension. Although these events are correlated, they imply a different set of molecular events.

      The title of this section is misleading: line 241 "MLCK/MLCP activity regulated axon growth in the embryonic CNS"... the title (and the conclusion) implies that the experiments were performed in situ, looking at axons in the developing brain. The most accurate title and conclusion should mention that the evidence was collected in CNS primary cultures derived from embryos.

      (5) Performing nerve crush injury in CNS nerves (optic nerve and spinal cord), and the local application of PDBu, the authors show contrasting results (Figure 5). In the optic nerve, they can see axons extending beyond the lesion site due to PDBu. On the contrary, the authors fail to observe this in the corticospinal tract present in the spinal cord. The authors fail to discuss this matter in detail. Also, they accommodate the interpretation of the evidence in light of a process known as axon retraction, and its prevention by MLCP inhibition. Since the whole paper is on axon extension, and it is known that mechanistically axon retraction is not merely the opposite of axon extension, the claim needs far more evidence.

      In panel 5F and the supplementary data, the authors mention the occurrence of retraction bulbs, but the images are too small to support the claim, and it is not clear how these numbers were normalized to the number of axons labeled in each condition.

      (6) The authors combine MLCK and MLCP inhibitors with Bleb, trying to verify whether both pairs of inhibitors act on the same target/pathway (Figure 6). The rationale is wrong for at least two reasons.

      a- Because both lines of evidence point to contrasting actions of NMII on axon growth, one approach could never "rescue" the other.

      b- Because the approaches target different steps on NMII activation, one could never "prevent" or rescue the other. For example, for Bleb to provide a phenotype, it should find any p-MLC, because it is only that form of MLC that is capable of inhibiting its ATPase site. In light of this, it is not surprising that Bleb is unable to exert any action in a situation where there is no p-MLC (ML-7, which by inhibiting the kinase drives the levels of p-MLC to zero, Figure 4A). Hence, the results are not possible to validate in the current general interpretation of the authors. (See 'major concern').

      (7) In Figure 7, the authors argue that the scheme of replating and using ML7 before or after replating is evidence for a local cytoskeletal action of the drug. However, an alternative simpler explanation is that the drug acts acutely on its target, and that, as such, does not "survive" the replating procedure. Hence, the conclusion raised by the evidence shown is not supported.

      (8) In Figure 8, the authors show that the inhibitory treatments on MLCK and MLCP (ML7 and PDBu) alter the morphology of growth cones. However, it is not clear how this is correlated with axon growth. The authors also mention in various parts of the text that a local change in the growth cone is evidence for a local action/activity of the drug or enzyme. However, the equivalence local change<->local action is not a logical truth. It can well be that MLCK and MLCP activity trigger molecular events that ultimately have an effect elsewhere, and by looking at "elsewhere" one observes of course a local effect, but this is not because the direct actions of MLCK or MLCP are localized. To prove true localized effects there are numerous efforts that can be made, starting from live imaging, fluorescent sensors, and compartmentalized cultures, just to mention a few.

      References:

      (1) Eun-Mi Hur, In Hong Yang, Deok-Ho Kim, Justin Byun, Saijilafu, Wen-Lin Xu, Philip R Nicovich, Raymond Cheong, Andre Levchenko, Nitish Thakor, Feng-Quan Zhou. 2011. Engineering neuronal growth cones to promote axon regeneration over inhibitory molecules. Proc Natl Acad Sci U S A. 2011 Mar 22;108(12):5057-62. doi: 10.1073/pnas.1011258108.

      (2) Garrido-Casado M, Asensio-Juárez G, Talayero VC, Vicente-Manzanares M. 2024. Engines of change: Nonmuscle myosin II in mechanobiology. Curr Opin Cell Biol. 2024 Apr;87:102344. doi: 10.1016/j.ceb.2024.102344.

      (3) Karen A Newell-Litwa, Rick Horwitz, Marcelo L Lamers. 2015. Non-muscle myosin II in disease: mechanisms and therapeutic opportunities. Dis Model Mech. 2015 Dec;8(12):1495-515. doi: 10.1242/dmm.022103.

    3. Reviewer #2 (Public review):

      Summary:

      Saijilafu et al. demonstrate that MLCK/MLCP proteins promote axonal regeneration in both the central nervous system (CNS) and peripheral nervous system (PNS) using primary cultures of adult DRG neurons, hippocampal and cortical neurons, as well as in vivo experiments involving sciatic nerve injury, spinal cord injury, and optic nerve crush. The authors show that axon regrowth is possible across different contexts through genetic and pharmacological manipulation of these proteins. Additionally, they propose that MLCK/MLCP may regulate F-actin reorganization in the growth cone, which is significant as it suggests a novel strategy for promoting axonal regeneration.

      Strengths:

      This manuscript presents a comprehensive array of experimental models, addressing the biological question in a broad manner. Particularly noteworthy is the use of multiple in vivo models, which significantly strengthens the overall validity of the study.

      Weaknesses:

      The following aspects apply:

      (1) The manuscript initially references prior research by the authors suggesting that NMII inhibition enhances axonal growth and that MLCK activates NMII. However, the study introduces a contradiction by demonstrating that MLCK inhibition (via ML-7 or siMLCK) inhibits axonal growth. This inconsistency is not adequately addressed or discussed in the manuscript.

      (2) While the study proposes that MLCK/MLCP regulates F-actin redistribution in the growth cone, the mechanism is not explored in depth. The only figure showing how pharmacological manipulation affects the growth cone suggests that not only F-actin but also the microtubule cytoskeleton might be affected, indicating that the mechanism may not be specific. A deeper exploration of this relationship in DRG neurons, in addition to cortical neurons, as shown in the study, would be beneficial.

      (3) In the sciatic nerve injury experiments, it would be crucial to include additional controls that clearly demonstrate that siMYPT1 treatment increases MLCP in the L4-L5 ganglia. Additionally, although the manuscript mentions quantifying axons expressing EGFP, the Materials and Methods section only discusses siMYPT1 electroporation, which could lead to confusion.

      (4) In some panels, it is difficult to differentiate the somas from the background (Figures 3, 4, 7). In conditions where images with shorter axonal lengths are represented, it is unclear whether this is due to fewer cells or reduced axonal growth (Figures 2, 4, 6).

    1. eLife assessment

      Examination of (a)periodic brain activity has gained particular interest in the last few years in the neuroscience fields relating to cognition, disorders, and brain states. Using large EEG/MEG datasets from younger and older adults, the current study provides compelling evidence that age-related differences in aperiodic EEG/MEG signals can be driven by cardiac rather than brain activity. Their findings have important implications for all future research that aims to assess aperiodic neural activity, suggesting control for the influence of cardiac signals is essential.

    2. Reviewer #1 (Public review):

      Summary:

      The present study addresses whether physiological signals influence aperiodic brain activity with a focus on age-related changes. The authors report age effects on aperiodic cardiac activity derived from ECG in low and high-frequency ranges in roughly 2300 participants from four different sites. Slopes of the ECGs were associated with common heart rate variability measures, which, according to the authors, shows that ECG, even at higher frequencies, conveys meaningful information. Using temporal response functions on concurrent ECG and M/EEG time series, the authors demonstrate that cardiac activity is instantaneously reflected in neural recordings, even after applying ICA to remove cardiac activity. This was more strongly the case for EEG than MEG data. Finally, spectral parameterization was done in large-scale resting-state MEG and ECG data in individuals between 18 and 88 years, and age effects were tested. A steepening of spectral slopes with age was observed particularly for ECG and, to a lesser extent, in cleaned MEG data in most frequency ranges and sensors investigated. The authors conclude that commonly observed age effects on neural aperiodic activity can mainly be explained by cardiac activity.
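
      For readers unfamiliar with the term, the "spectral slope" discussed above can be illustrated with a minimal sketch: a straight-line fit to the power spectrum in log-log coordinates over a chosen frequency range. This is not the authors' pipeline and not either of the parameterization algorithms used in the manuscript; the function name `aperiodic_exponent`, the synthetic spectrum, and the 1-45 Hz fitting range are assumptions made purely for illustration. Unlike dedicated algorithms, this sketch does not separate oscillatory peaks from the aperiodic component.

```python
import math


def aperiodic_exponent(freqs, power, fmin, fmax):
    """Estimate a 1/f exponent by ordinary least squares in log-log space.

    Hypothetical helper for illustration only: it fits
    log10(power) = intercept + slope * log10(freq) within [fmin, fmax]
    and returns -slope, so a steeper spectrum gives a larger exponent.
    """
    xs, ys = [], []
    for f, p in zip(freqs, power):
        if fmin <= f <= fmax and f > 0 and p > 0:
            xs.append(math.log10(f))
            ys.append(math.log10(p))
    if len(xs) < 2:
        raise ValueError("need at least two points inside the fitting range")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope


# Synthetic, noise-free spectrum with a known exponent of 1.5 (assumed values).
freqs = [0.5 * k for k in range(1, 291)]      # 0.5 Hz to 145 Hz
power = [10.0 / f ** 1.5 for f in freqs]

exponent = aperiodic_exponent(freqs, power, fmin=1.0, fmax=45.0)
print(round(exponent, 3))
```

      Because the estimate depends on `fmin` and `fmax`, a sketch like this also makes concrete why the choice of fitting range, which the study varies extensively, can change the reported slope.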

      Strengths:

      Compared to previous investigations, the authors demonstrate the effects of aging on the spectral slope in the currently largest MEG dataset with equal age distribution available. Their efforts of replicating observed effects in another large MEG dataset and considering potential confounding by ocular activity, head movements, or preprocessing methods are commendable and valuable to the community. This study also employs a wide range of fitting ranges and two commonly used algorithms for spectral parameterization of neural and cardiac activity, hence providing a comprehensive overview of the impact of methodological choices. Based on their findings, the authors give recommendations for the separation of physiological and neural sources of aperiodic activity.

      Weaknesses:

      While the aim of the study is well-motivated and analyses rigorously conducted, the overall structure of the manuscript, as it stands now, is partially misleading. Some of the described results are not well-embedded and lack discussion.

    3. Reviewer #2 (Public review):

      I previously reviewed this important and timely manuscript at a previous journal where, after two rounds of review, I recommended publication. Because eLife practices an open reviewing format, I will recapitulate some of my previous comments here, for the scientific record.

      In that previous review, I revealed my identity to help reassure the authors that I was doing my best to remain unbiased because I work in this area and some of the authors' results directly impact my prior research. I was genuinely excited to see the earlier preprint version of this paper when it first appeared. I get a lot of joy out of trying to - collectively, as a field - really understand the nature of our data, and I continue to commend the authors here for pushing at the sources of aperiodic activity!

      In their manuscript, Schmidt and colleagues provide a very compelling, convincing, thorough, and measured set of analyses. Previously I recommended that they push even further, and they added the current Figure 5 analysis of event-related changes in the ECG during working memory. In my opinion this result practically warrants a separate paper of its own!

      The literature analysis is very clever, and expanded upon from any other prior version I've seen.

      In my previous review, the broadest, most high-level comment I wanted to make was that the authors are correct. We (in my lab) have tried to be measured in our approach to talking about aperiodic analyses - including now measuring ECG when possible - because there are so many sources of aperiodic activity: neural, ECG, respiration, skin conductance, muscle activity, electrode impedances, room noise, electronics noise, etc. The authors discuss this all very clearly, and I commend them on that. We, as a field, should move more toward a model where we can account for all of those sources of noise together. (This was less of an action item, and more of an inclusion of a comment for the record.)

      I also very much appreciate the authors' excellent commentary regarding the physiological effects that pharmacological challenges such as propofol and ketamine also have on non-neural (autonomic) functions such as ECG. Previously I also asked them to discuss the possibility that, while their manuscript focuses on aperiodic activity, it is possible that the wealth of literature regarding age-related changes in "oscillatory" activity might be driven partly by age-related changes in neural (or non-neural, ECG-related) changes in aperiodic activity. They have included a nice discussion on this, and I'm excited about the possibilities for cognitive neuroscience as we move more in this direction.

      Finally, I previously asked for recommendations on how to proceed. The authors convinced me that we should care about how the ECG might impact our field potential measures, but how do I, as a relative novice, proceed? They now include three strong recommendations at the end of their manuscript that I find to be very helpful.

      As was obvious from my previous review, I consider this to be an important and impactful cautionary report that is incredibly well supported by multiple thorough analyses. The authors have done an excellent job responding to all my previous comments and concerns and, in my estimation, those of the previous reviewers as well.

    4. Reviewer #3 (Public review):

      Summary:

      Schmidt et al. aimed to provide an extremely comprehensive demonstration of the influence cardiac electromagnetic fields have on the relationship between age and the aperiodic slope measured from electroencephalographic (EEG) and magnetoencephalographic (MEG) data.

      Strengths:

      Schmidt et al. used a multiverse approach to show that the cardiac influence on this relationship is considerable, by testing a wide range of different analysis parameters (including extensive testing of different frequency ranges assessed to determine the aperiodic fit), algorithms (including different artifact reduction approaches and different aperiodic fitting algorithms), and multiple large datasets to provide conclusions that are robust to the vast majority of potential experimental variations.

      The study showed that across these different analytical variations, the cardiac contribution to aperiodic activity measured using EEG and MEG is considerable, and likely influences the relationship between aperiodic activity and age to a greater extent than the influence of neural activity.

      Their findings have significant implications for all future research that aims to assess aperiodic neural activity, suggesting control for the influence of cardiac fields is essential.

      Weaknesses:

      Figure 4I: The regressions explained here seem to contain a very large number of potential predictors. Based on the way it is currently written, I'm assuming it includes all sensors for both the ECG component and ECG rejected conditions?

      I'm not sure about the logic of taking a complete signal, decomposing it with ICA to separate out the ECG and non-ECG signals, then including these latent contributions to the full signal back into the same regression model. It seems that there could be some circularity or redundancy in doing so. Can the authors provide a justification for why this is a valid approach?

      I'm not sure whether there is good evidence or rationale to support the statement in the discussion that the presence of the ECG signal in reference electrodes makes it more difficult to isolate independent ECG components. The ICA algorithm will still function to detect common voltage shifts from the ECG as statistically independent from other voltage shifts, even if they're spread across all electrodes due to the referencing montage. I would suggest there are other reasons why the ICA might lead to imperfect separation of the ECG component (assumption of the same number of source components as sensors, non-Gaussian assumption, assumption of independence of source activities).

      The inclusion of only 32 channels in the EEG data might also have reduced the performance of ICA, increasing the chances of imperfect component separation and the mixing of cardiac artifacts into the neural components, whereas the higher number of sensors in the MEG data would enable better component separation. This could explain the difference between EEG and MEG in the ability to clean the ECG artifact (and perhaps higher-density EEG recordings would not show the same issue).

      In addition to the inability to effectively clean the ECG artifact from EEG data, ICA and other component subtraction methods have also all been shown to distort neural activity in periods that aren't affected by the artifact due to the ubiquitous issue of imperfect component separation (https://doi.org/10.1101/2024.06.06.597688). As such, component subtraction-based (as well as regression-based) removal of the cardiac artifact might also distort the neural contributions to the aperiodic signal, so even methods to adequately address the cardiac artifact might not solve the problem explained in the study. This poses an additional potential confound to the "M/EEG without ECG" conditions.

      Literature Analysis, Page 23: was there a method applied to address studies that report reducing artifacts in general, but are not specific to a single type of artifact? For example, there are automated methods for cleaning EEG data that use ICLabel (a machine learning algorithm) to delete "artifact" components. Within these studies, the cardiac artifact will not be mentioned specifically, but is included under "artifacts".

      Statistical inferences, page 23: as far as I can tell, no methods to control for multiple comparisons were implemented. Many of the statistical comparisons were not independent (or even overlapped with similar analyses in the full analysis space to a large extent), so I wouldn't expect strong multiple comparison controls. But addressing this point to some extent would be useful (or clarifying how it has already been addressed if I've missed something).
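
      As one illustration of what a multiple comparison control could look like, the sketch below implements the Benjamini-Hochberg step-up procedure for false discovery rate control. It is offered only as an example of the kind of correction referred to above, not as a description of anything in the manuscript; the function name and the p-values are hypothetical. Note that the standard procedure assumes independent or positively dependent tests, which is relevant here because many of the comparisons overlap.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Step-up rule: sort the p-values, find the largest rank k such that
    p_(k) <= (k / m) * alpha, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_k = rank

    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values, for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
rejected = benjamini_hochberg(p_values, alpha=0.05)
print(rejected)
```

      In this made-up example, two tests that would pass an uncorrected 0.05 threshold (p = 0.039 and p = 0.041) are no longer rejected after correction.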

      Methods:

      Applying ICA components from 1Hz high pass filtered data back to the 0.1Hz filtered data leads to worse artifact cleaning performance, as the contribution of the artifact in the 0.1Hz to 1Hz frequency band is not addressed (see Bailey, N. W., Hill, A. T., Biabani, M., Murphy, O. W., Rogasch, N. C., McQueen, B., ... & Fitzgerald, P. B. (2023). RELAX part 2: A fully automated EEG data cleaning algorithm that is applicable to Event-Related-Potentials. Clinical Neurophysiology, result reported in the supplementary materials). This might explain some of the lower frequency slope results (which include a lower frequency limit <1Hz) in the EEG data - the EEG cleaning method is just not addressing the cardiac artifact in that frequency range (although it certainly wouldn't explain all of the results).

      It looks like no methods were implemented to address muscle artifacts. These can affect the slope of EEG activity at higher frequencies. Perhaps the Riemannian Potato addressed these artifacts, but I suspect it wouldn't eliminate all muscle activity. As such, I would be concerned that remaining muscle artifacts affected some of the results, particularly those that included high frequency ranges in the aperiodic estimate. Perhaps if muscle activity were left in the EEG data, it could have disrupted the ability to detect a relationship between age and 1/f slope in a way that didn't disrupt the same relationship in the cardiac data (although I suspect it wouldn't reverse the overall conclusions given the number of converging results including in lower frequency bands). Is there a quick validity analysis the authors can implement to confirm muscle artifacts haven't negatively affected their results? I note that an analysis of head movement in the MEG is provided on page 32, but it would be more robust to show that removing ICA components reflecting muscle doesn't change the results. The results/conclusions of the following study might be useful for objectively detecting probable muscle artifact components: Fitzgibbon, S. P., DeLosAngeles, D., Lewis, T. W., Powers, D. M. W., Grummett, T. S., Whitham, E. M., ... & Pope, K. J. (2016). Automatic determination of EMG-contaminated components and validation of independent component analysis using EEG during pharmacologic paralysis. Clinical neurophysiology, 127(3), 1781-1793.

    1. eLife assessment

      This study demonstrates the potential role of 17α-estradiol in modulating neuronal gene expression in the aged hypothalamus of male rats, identifying key pathways and neuron subtypes affected by the drug. While the findings are useful and provide a foundation for future research, the strength of supporting evidence is incomplete due to the lack of a female comparison and a young male control group, the unclear link to 17α-estradiol lifespan extension in rats, the lack of a demonstration of physiological effects of the treatment, and insufficient analysis of glial cells and cellular senescence in CRH neurons.

    2. Reviewer #1 (Public Review):

      Summary:

      Previous studies have shown that treatment with 17α-estradiol (a stereoisomer of 17β-estradiol) extends lifespan in male mice but not in females. The current study by Li et al. aimed to identify cell-specific clusters and populations in the hypothalamus of aged male rats treated with 17α-estradiol (treated for 6 months). This study identifies genes and pathways affected by 17α-estradiol in the aged hypothalamus.

      Strengths:

      Using single-nucleus transcriptomic sequencing (snRNA-seq) on the hypothalamus from aged male rats treated with 17α-estradiol they show that 17α-estradiol significantly attenuated age-related increases in cellular metabolism, stress, and decreased synaptic activity in neurons.

      Moreover, sc-analysis identified GnRH as one of the key mediators of 17α-estradiol's effects on energy homeostasis. Furthermore, they show that CRH neurons exhibited a senescent phenotype, suggesting a potential side effect of the 17α-estradiol. These conclusions are supported by supervised clustering by neuropeptides, hormones, and their receptors.

      Weaknesses:

      However, the study has several limitations that reduce the strength of the key claims in the manuscript. In particular:

      (1) The study focused only on males and did not include comparisons with females. However, previous studies have shown that 17α-estradiol extends lifespan in a sex-specific manner in mice, affecting males but not females. Without the comparison with the female data, it's difficult to assess its relevance to the lifespan.

      (2) It is not known whether 17α-estradiol leads to lifespan extension in male rats similar to male mice. Therefore, it is not possible to conclude that the observed effects in the hypothalamus are linked to the lifespan extension.

      (3) The effect of 17α-estradiol on non-neuronal cells such as microglia and astrocytes is not well-described (Figure 1). Previous studies demonstrated that 17α-estradiol reduces microgliosis and astrogliosis in the hypothalamus of aged male mice. Current data suggest that the proportions of oligodendrocytes and microglia were increased by the drug treatment, while the proportion of astrocytes was decreased. These data might suggest possible species differences, differences in the treatment regimen, or differences in drug efficiency. This has to be discussed.

      (4) A more detailed analysis of glial cell types within the hypothalamus in response to drugs should be provided.

      (5) The conclusion that CRH neurons are going into senescence is not clearly supported by the data. A more detailed analysis of the hypothalamus, such as histological examination to assess cellular senescence markers in CRH neurons, is needed to support this claim.

    3. Reviewer #2 (Public Review):

      Summary:

      Li et al. investigated the potential anti-ageing role of 17α-Estradiol on the hypothalamus of aged rats. To achieve this, they employed a very sophisticated method for single-cell genomic analysis that allowed them to analyze effects on various groups of neurons and non-neuronal cells. They were able to sub-categorize neurons according to their capacity to produce specific neurotransmitters, receptors, or hormones. They found that 17α-Estradiol treatment led to an improvement in several factors related to metabolism and synaptic transmission by bringing the expression levels of many of the genes of these pathways closer or to the same levels as those of young rats, reversing the ageing effect. Interestingly, among all neuronal groups, the proportion of Oxytocin-expressing neurons seems to be the one most significantly changing after treatment with 17α-Estradiol, suggesting an important role of these neurons in mediating its anti-ageing effects. This was also supported by an increase in circulating levels of oxytocin. It was also found that gene expression of corticotropin-releasing hormone neurons was significantly impacted by 17α-Estradiol even though it was not different between aged and young rats, suggesting that these neurons could be responsible for side effects related to this treatment. This article revealed some potential targets that should be further investigated in future studies regarding the role of 17α-Estradiol treatment in aged males.

      Strengths:

      (1) Single-nucleus mRNA sequencing is a very powerful method for gene expression analysis and clustering. The supervised clustering of neurons was very helpful in revealing otherwise invisible differences between neuronal groups and helped identify specific neuronal populations as targets.

      (2) There is a variety of functions used that allow the differential analysis of a very complex type of data. This led to a better comparison between the different groups on many levels.

      (3) There were some physiological parameters measured such as circulating hormone levels that helped the interpretation of the effects of the changes in hypothalamic gene expression.

      Weaknesses:

      (1) One main control group is missing from the study, the young males treated with 17α-Estradiol.

      (2) Even though the technical approach is a sophisticated one, analyzing the whole rat hypothalamus instead of specific nuclei or subregions makes the study weaker.

      (3) Although the authors claim to have several findings, the data fail to support these claims.

      (4) The study is about improving ageing but no physiological data from the study demonstrated such a claim with the exception of the testes histology which was not properly analyzed and was not even significantly different between the groups.

      (5) Overall, the study remains descriptive with no physiological data to demonstrate that any of the effects on hypothalamic gene expression are related to metabolic, synaptic, or other functions.

    1. eLife assessment

      This study presents a valuable combination of X-ray and cryo-EM structures of the bacterial adhesin PrgB, an atypical microbial cell surface-anchored polypeptide that binds DNA. There is convincing support for the claims regarding the overall function and importance of individual domains. The model for PrgB's binding of eDNA is thought-provoking, but the evidence for it based on low-resolution volumes of cryoEM data is incomplete. If additional experimental evidence for the model is produced, this work will be impactful in the field of bacterial adhesins, conjugation, and biofilm formation, as it focuses on a clinically relevant Gram-positive pathogen, whereas most work in the field has been focused on Gram-negative model systems.

    1. eLife assessment

      This valuable work describes a novel role of Vangl2, a core planar cell polarity protein, in mechanistically linking the inflammatory NF-kB pathway to selective autophagic protein degradation. Using solid methods, the authors also establish the functional significance of the proposed mechanism in sepsis. The work may advance our understanding of NF-kB control, particularly in the context of aberrant inflammation. However, some gaps remain, and additional studies are needed to unequivocally establish the role of Vangl2 in regulating NF-kB signaling.

    1. Author Response

      eLife assessment

      Tilk and colleagues present a computational analysis of tumor transcriptomes to investigate the hypothesis that the large number of somatic mutations in some tumors is detrimental such that these detrimental effects are mitigated by an up-regulation by pathways and mechanisms that prevent protein misfolding. The authors address this question by fitting a model that explains the log expression of a gene as a linear function of the log number of mutations in the tumor and show that specific categories of genes (proteasome, chaperones, ...) tend to be upregulated in tumors with a large number of somatic mutations. Some of the associations presented could arise through confounding, but overall the authors present solid evidence that mutational load is associated with higher expression of genes involved in mitigation of protein misfolding – an important finding with general implications for our understanding of cancer evolution.

      We thank the reviewers for these kind words. The summary statement and public review highlight our work in understanding how human tumors phenotypically respond to mutational load by assessing changes in gene expression. This work provides a mechanistic underpinning to our previous finding that the accumulation of passenger mutations in tumors creates a substantial cost because even substantially damaging passenger mutations can fix in non-recombining clonal tumor lineages. At the same time, we believe the summary statement and the public review do not mention a key remaining part of our paper that validates our findings and establishes causal connections between protein misfolding due to coding passenger mutations and tumor fitness. Specifically, we replicate and cross-validate our findings in human tumors by examining expression responses in an independent dataset of cancer cell lines (CCLE), where we demonstrate similar expression responses to an accumulation of mutations, indicating generic, cell intrinsic responses. We then establish a causal link by demonstrating that mitigation of protein misfolding through protein degradation and re-folding is necessary for high mutational load cancer cells to maintain viability through perturbation experiments via shRNA knock-down and treatment with targeted agents. These analyses and results are important because they show that the adaptive responses we observe are evidence of a generic, cell intrinsic phenomenon that cannot be explained by organismal effects, such as aging, changes in the immune system or microenvironment.

      Joint Public Review:

      Tilk and colleagues present a computational investigation of tumor transcriptomes to investigate the hypothesis that the large number of somatic mutations in some tumors is detrimental and that these detrimental effects are mitigated by an up-regulation by pathways and mechanisms that prevent protein misfolding.

      The authors address this question by fitting a model that explains the log expression of a gene as a linear function of the log number of mutations in the tumor and additional effects for tumor homogeneity and type. This analysis identified a large number of genes (5000) that are more highly expressed at high mutational load at an FDR of 0.05. These genes are enriched in many core categories, most prominently in the proteasome, translation, and mitochondrial translation. The authors then proceed to investigate specific categories of upregulated genes further.
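
      The kind of analysis described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the paper fits a GLMM with terms for tumor type and homogeneity, whereas this example uses a plain least-squares fit for a single gene and a generic Benjamini-Hochberg step for the FDR cutoff. The function names `loglog_slope` and `benjamini_hochberg` and the input values are hypothetical, chosen only for illustration.

```python
import math


def loglog_slope(mutation_counts, expression):
    """Least-squares slope of log10(expression) on log10(mutation count).

    Simplified stand-in for the per-gene model: no cancer-type or
    tumor-homogeneity terms are included here.
    """
    x = [math.log10(m) for m in mutation_counts]
    y = [math.log10(e) for e in expression]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[i] = True
    return rejected


# Hypothetical gene whose expression scales as the square root of mutational load.
loads = [10, 100, 1000, 10000]
expr = [2 * m ** 0.5 for m in loads]
slope = loglog_slope(loads, expr)

# Hypothetical per-gene p-values; only genes passing the FDR cutoff are kept.
calls = benjamini_hochberg([0.001, 0.04, 0.03, 0.5], alpha=0.05)
```

      A positive slope corresponds to a gene whose expression rises with mutational load; in the real analysis the slope's p-value for each gene would feed into the FDR step.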

      The individual reviews, and the discussion among the reviewers, raised several issues that could potentially undermine or weaken some of the findings presented in this paper.

      1) Systematic differences in expression of some genes from one tumor class to another might generate spurious associations with mutational load (ML), which would affect the results presented in Figs 1 and 3. The case of a causal link between ML and over-expression of genes that mitigate deleterious effects of misfolding would be stronger if these results were replicated within single cancer types with many samples with different ML (similar to how Fig S6 relates to Fig 3). A related concern might be an association between increased variance of expression and ML. The compositional nature of expression data could generate trends like the ones shown in Fig. 2 with changing variance.

      We agree with the reviewers that possible confounders should be considered since TCGA data is heterogeneous. In this paper, we investigated possible confounders such as multicollinearity with different mutational types (SNVs and CNVs), controlled for expression responses within cancer types in the GLMM, and used the jackknifing procedure to ensure that no one cancer type dominates the signal. However, in principle unknown hidden confounders could remain, which is why a large part of our paper was focused on validating these effects in an independent dataset (CCLE) where many other covariates are not relevant (immune system, donor variability, stage, age, sex, etc.). Importantly, we also used data from perturbation screens that are completely orthogonal to expression responses in CCLE to get at a cause and effect. 

      Our reasoning for using all of the data in Figure 1 while controlling for differences due to cancer type in the GLMM was to maximize the variation in mutational load across all of the samples in this dataset to identify what genes increase in expression as mutational load increases over 5 orders of magnitude. As noted here, we also already further validated that the signal we observe in Figure 1 is still robust for our gene sets of interest within cancer types in Supplemental Figure 6.

      2) Fig 4, Fig S5 and Fig S8 show results for the regression coefficient of expression on ML after leaving out one cancer at a time. All of us initially read this as results for 'one cancer at a time', rather than 'leave-one-out'. These figures are used to argue that the results are not driven by specific cancer types. However, this analysis would not reveal if the signal was driven by a (small) subset of cancer types. To justify claims like "significant negative relationship between mutational load and cell viability across almost all cancer types", one needs to analyze individual cancer types. Results for specific genes, rather than broad groups would also help interpret these results.

      Our reasoning for grouping genes together in Figure 4 was that the shRNA screen was done on a single gene at a time, and we were interested in measuring the joint effect on viability after knocking down all of the genes in a given complex.

      Given that the expression responses in Figure 3 already validate within cancer types in TCGA in Supplemental Figure 6, we believe that it’s very unlikely that the signal we observe is driven by individual cancer types or smaller groups of cancer types. In addition, we did not perform a within cancer analysis in CCLE for Figure 4, because not all available cancer types in CCLE were profiled evenly in the shRNA screen (Total < 300). The vast majority of cancer types in CCLE for the shRNA screen (23/26) have sample sizes <20 within each group that we believe are unlikely to lead to meaningful results that are not driven by noise.

      3) You use a different model architecture for the TCGA and CCLE analyses because you suspect that the sample size imbalance in the latter might mean that a GLMM cannot capture the different variance components accurately. Did you test this? Could you downsample to avoid this? Cancer type is likely a strong confounder of ML.

      That was indeed our reasoning, that within group sample sizes in CCLE are too low to robustly estimate variance within cancer types. Given that many cancer types have <20 samples within each group, we don’t think that evenly downsampling would enable us to get an estimate not driven by noise. As noted above, our approach to control for this was to perform a jackknifing procedure that eliminates a single cancer type at a time and re-estimates the effect. 
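
      For readers unfamiliar with the leave-one-out ("jackknife") procedure mentioned here, the following is a minimal sketch under stated assumptions, not the authors' code. It assumes each sample is a `(cancer_type, x, y)` tuple, where `x` stands for log mutational load and `y` for a response such as viability, and it re-estimates a simple least-squares slope after dropping one cancer type at a time. The names `ols_slope` and `leave_one_group_out` and the data are hypothetical.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def leave_one_group_out(records):
    """Re-estimate the slope with each group removed in turn.

    records: iterable of (group, x, y) tuples.
    Returns a dict mapping the held-out group to the slope fitted
    on all remaining samples.
    """
    records = list(records)
    groups = sorted({g for g, _, _ in records})
    estimates = {}
    for held_out in groups:
        kept = [(x, y) for g, x, y in records if g != held_out]
        xs, ys = zip(*kept)
        estimates[held_out] = ols_slope(xs, ys)
    return estimates


# Hypothetical data: three cancer types whose samples all lie on y = -2x + 1.
records = [
    ("A", 1.0, -1.0), ("A", 2.0, -3.0),
    ("B", 3.0, -5.0), ("B", 4.0, -7.0),
    ("C", 5.0, -9.0), ("C", 6.0, -11.0),
]
estimates = leave_one_group_out(records)
```

      If the estimate stays stable regardless of which type is held out, no single cancer type dominates the signal. As the reviewers note in point 2, this check does not rule out a signal driven by a small subset of types.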

      4) In the splicing analysis (Fig 2 and Fig S4), you report a 10% variation in splicing for a 100-fold variation in ML. This weak trend is replicated in very similar ways for many different types of alternative splicing events. It is not clear why different events (exon skipping, intron retention, etc) should respond in the same way to ML. A weak but homogeneous effect like the one shown here might result from some common confounder (see point 1). Similarly, it is not clear why with increasing intron retention PSI threshold the fraction of under-expressed transcripts would decrease and not increase.

      We agree that the effects of all the different alternative splicing events are complex. Our focus was on intron retention, which is known to occur in cancer (Lindeboom et al. 2016, Nature Genetics), and our analysis is consistent with the idea that damaging passenger mutations can shift cellular phenotypic states that require the use of many different mechanisms to mitigate protein misfolding.

      For Figure S4, as the PSI threshold for calling an alternative splicing event increases, fewer samples are called as having an intron retention event in the gene. This uniformly decreases the numerator across all the mutational load bins, so that when the threshold is increased the fraction of under-expressed transcripts with intron retention events is lower.

    2. eLife assessment

      Tilk and colleagues present a computational analysis of tumor transcriptomes to investigate the hypothesis that the large number of somatic mutations in some tumors is detrimental such that these detrimental effects are mitigated by an up-regulation by pathways and mechanisms that prevent protein misfolding. The authors address this question by fitting a model that explains the log expression of a gene as a linear function of the log number of mutations in the tumor and show that specific categories of genes (proteasome, chaperones, ...) tend to be upregulated in tumors with a large number of somatic mutations. Some of the associations presented could arise through confounding, but overall the authors present solid evidence that mutational load is associated with higher expression of genes involved in mitigation of protein misfolding – an important finding with general implications for our understanding of cancer evolution.

    1. eLife assessment

      This important manuscript reveals signatures of co-evolution of two nucleosome remodeling factors, Lsh/HELLS and CDCA7, which are involved in the regulation of eukaryotic DNA methylation. The results suggest that the roles for the two factors in DNA methylation maintenance pathways can be traced back to the last eukaryotic common ancestor and that the CDCA7-HELLS-DNMT axis shaped the evolutionary retention of DNA methylation in eukaryotes. The evolutionary analyses are solid, although more refined phylogenetic approaches could have strengthened some of the claims. Overall, this study could be used by researchers studying DNA methylation pathways in different organisms, and it should be of general interest to colleagues in the fields of evolutionary biology, chromatin biology and genome biology.

    1. eLife assessment

      The specific questions taken up by the authors, namely the function of HDAC and Polycomb in mice in the context of vascular endothelial cell (EC) gene expression relevant to the blood-brain barrier (BBB), are potentially useful for understanding vascular diversification and for remedying situations where BBB function is compromised. The strength of the evidence presented is incomplete: it is known that the culturing of endothelial cells can have a strong effect on gene expression. This is a significant issue, as the manuscript does not state how long the cells were cultured or how this point was addressed.

    1. eLife assessment

      This study presents fundamental new insight into the regulatory apparatus of PI3Kγ, a kinase in signaling pathways that control the immune response and cancer. A suite of biophysical and biochemical approaches provide convincing evidence for new sites of allosteric control over enzyme activity. The rigorous findings provide structure and dynamic information that may be exploited in efforts to control PI3Kγ activity in a therapeutic setting.

    1. eLife assessment

      This fundamental study has successfully identified four key transcription factors (MECOM, PAX8, SOX17, and WT1) that exhibit synergistic effects and are potentially responsible for the transformation of fallopian tube secretory epithelial cells into high-grade serous 'ovarian' cancer cells. Convincing data strongly support the drawn conclusion and significantly contribute to our understanding of the etiology of this devastating cancer. The implications of this finding are substantial, as it provides molecular insights that can potentially pave the way for innovative diagnostics and therapeutics in the field of gynecological oncology. Enhancing the clarity and impact of this study would be achieved through improvements in data presentation.

    1. eLife assessment

      The bacterial neurotransmitter:sodium symporter homologue LeuT is a well-established model system for understanding how human monoamine transporters, such as the dopamine and serotonin transporters, couple ions with neurotransmitter uptake. Here the authors provide convincing data to show that K+ catalyses the return step of the transport cycle in LeuT by binding to one of the two sodium sites. The paper is an important contribution, but it is still unclear exactly where K+ binds in LeuT, and how to incorporate K+ binding into a transport cycle mechanism.

    1. eLife assessment

      There was a range of opinion among three highly expert reviewers from different perspectives in the field. This is a significant topic and it was felt that the contribution at present is valuable to those in the field. However, it was agreed after consultation that the description of the simulation methodology was inadequate.

    1. eLife assessment

      This valuable study is of relevance for those interested in the mechanisms required for infections of humans by Klebsiella pneumoniae. The authors apply TraDIS (high-density TnSeq) to K. pneumoniae with the goal of identifying genes required for survival under various infection-relevant conditions. In general, the evidence supporting the identity of the identified genes is convincing, but testing additional individual genes to validate the list inferred from TraDIS data, in addition to complementing the mutants, would help to provide full support for the claims made. Additional work would also help to unravel novel mechanisms beyond the ones reported.

    1. eLife assessment

      This valuable study reports on the structure and function of the capsid size-determining external scaffolding protein encoded by a Vibrio phage satellite. The structural work is of high quality and the presented reconstructions are compelling, but some of the experiments could benefit from a more rigorous statistical analysis of capsid sizes and shapes. The paper offers an advance in the field of phage and virus structure and assembly with implications for understanding the evolution of phage satellites.

    1. eLife assessment

      This important study illustrates the value of museum samples for understanding past genetic variability in the genomes of populations and species, including those that no longer exist. The authors present genomic sequencing data for the extinct Xerces Blue butterfly and report convincing evidence of declining population sizes and increases in inbreeding beginning 75,000 years ago, which strongly contrasts to the patterns observed in similar data from its closest relative, the extant Silvery Blue butterfly. Such long-term population health indicators may be used to highlight still extant but especially vulnerable-to-extinction insect species – irrespective of their current census population size abundance.

    1. eLife assessment

      This paper is a valuable step in multi-subject behavioral modeling using an extension of the Variational Autoencoder (VAE) framework. Using a novel partition of the latent space and in tandem with a recently proposed regularization scheme, the paper provides a rich set of computational analyses analyzing social behavior data of mice with results that represent the state-of-the-art in this subfield. The strength of evidence is convincing, with the methodology being well documented and the results being reproducible, although some additional quantifications would have been helpful to fully gauge the circumstances where the approach would be most effectively applied.

    1. eLife assessment

      This article presents important results describing how the gathering, integration, and broadcasting of information in the brain changes when consciousness is lost either through anesthesia or injury. They provide convincing evidence to support their conclusions, although the paper relies on a single analysis tool (partial information decomposition) and could benefit from a clearer explication of its conceptual basis, methodology, and results. The work will be of interest to both neuroscientists and clinicians interested in basic and clinical aspects of consciousness.

    1. eLife assessment

      This valuable study focuses on the impact of growth feedback on the performance of artificial gene circuits capable of achieving adaptive responses, a significant problem in synthetic biology. Through solid computational analysis, the authors identify specific failure mechanisms, as well as core topologies associated with robust performance based on systematic analysis of over four hundred circuit topologies. The results will be of interest to those working on engineering gene circuits for diverse applications.

    1. eLife assessment

      This paper reports a valuable set of new results. The main claim is that the projection from adult-born granule cells in the dentate gyrus to the hippocampal subfield CA2 is necessary for the retrieval of social memories formed during development. However, the reviewers agreed that evidence for this major claim is currently incomplete.

    1. eLife assessment

      This study presents valuable new insights from the protist Tetrahymena regarding radial spokes, conserved protein complexes that are relevant for cilia motility. The work employs interdisciplinary approaches to provide convincing support for radial spoke composition with some experiments, but there are weaker areas with partially incomplete support, such as relying on knockouts alone rather than including localization studies of tagged proteins.